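Fully Automated Machine Learning: Efficient Time Series Forecasting with AutoML
We place great value on original articles. To respect intellectual property and avoid potential copyright issues, we provide only a summary of the article here as an initial overview. For the complete article, please visit the author's official account page.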
Summary of Time Series Data Transformation into Tabular Format for Machine Learning
Time series data is central to forecasting problems such as predicting energy consumption. Traditional time series models such as Prophet are commonly used, but transforming the series into a tabular format makes a much broader range of machine learning algorithms available, which can improve both predictions and insights.
Transforming Time Series into Table Format
In this article, we show how to turn a time series data set of daily energy consumption into tabular form using open-source libraries. We tested several machine learning models, including gradient boosting decision trees and AutoML, and compared them against the Prophet model. Gradient boosting decision trees reduced out-of-sample prediction error by 67% (an accuracy gain of 38 percentage points), and AutoML reduced the error by a further 42% relative to gradient boosting (8 more percentage points of accuracy). Overall, AutoML cut prediction error by 81% compared with Prophet (an accuracy gain of 46 percentage points).
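The three error figures fit together because relative reductions compound: AutoML's 42% cut applies to the error that remains after gradient boosting, not to Prophet's original error. The short Python check below recomputes the overall reduction in two ways. It assumes that prediction error means 1 − accuracy, which the summary does not state explicitly.

```python
# Relative error reductions quoted in the summary
gb_vs_prophet = 0.67   # gradient boosting vs. Prophet
automl_vs_gb = 0.42    # AutoML vs. gradient boosting

# Route 1: relative reductions compound on the error left after each step
overall = 1 - (1 - gb_vs_prophet) * (1 - automl_vs_gb)
print(f"compounded reduction: {overall:.1%}")

# Route 2 (assumption: error = 1 - accuracy, so a gain of x percentage points
# removes x points of error). The 38-point gain then fixes Prophet's error.
prophet_err = 0.38 / gb_vs_prophet
gb_err = prophet_err - 0.38
automl_err = gb_err - 0.08
print(f"AutoML vs. gradient boosting: {(gb_err - automl_err) / gb_err:.1%}")
print(f"AutoML vs. Prophet:           {(prophet_err - automl_err) / prophet_err:.1%}")
```

Both routes land at roughly 81%, so the 67%, 42%, and 81% reductions and the 38, 8, and 46 percentage-point gains are mutually consistent under that assumption.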
Examining and Preparing the Data
The data set, from PJM Interconnection LLC, contains hourly energy consumption in megawatts. The goal is to predict the next day's consumption level, categorized into four quartiles: low, below average, above average, and high. The first approach uses Prophet for time series forecasting; the problem is then reframed as a multi-class classification problem. The hourly data is aggregated into average daily consumption, and the columns are renamed for compatibility with Prophet, which expects a date column named ds and a value column named y. The quartile boundaries are computed from the training data only, to prevent data leakage.
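A minimal pandas sketch of this preparation step follows. The input is synthetic: the column names Datetime and PJME_MW, the random load values, and the 2015-12-01 cut-off date are hypothetical placeholders, not details taken from the article. It shows daily averaging, renaming to Prophet's ds/y columns, quartile edges fitted on the training period only, and a next-day classification target.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the PJM hourly file: column names and values are invented
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=24 * 400, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame({"Datetime": idx,
                       "PJME_MW": 30_000 + 5_000 * rng.standard_normal(len(idx))})

# Average the hourly megawatt readings per day and rename to Prophet's ds / y columns
daily = (hourly.set_index("Datetime")["PJME_MW"]
         .resample("D").mean()
         .rename_axis("ds")
         .reset_index(name="y"))

# Chronological split at a hypothetical cut-off date (never shuffle a time series)
cutoff = pd.Timestamp("2015-12-01")
is_train = daily["ds"] < cutoff

# Quartile edges are fitted on the training period only to avoid data leakage
edges = daily.loc[is_train, "y"].quantile([0.25, 0.5, 0.75]).to_list()
labels = ["low", "below average", "above average", "high"]
daily["level"] = pd.cut(daily["y"], bins=[-np.inf, *edges, np.inf], labels=labels)

# Classification target: the next day's level (undefined for the final day)
daily["target"] = daily["level"].shift(-1)

train, test = daily[is_train], daily[~is_train]
print(train["level"].value_counts().sort_index())
```

The ds and y columns can be passed to Prophet as they are. Because the quartile edges come only from the training period, the test period may end up with unequal class counts; that imbalance is expected and is the price of avoiding leakage.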
Feature Engineering and Machine Learning
The time series data is transformed into tabular format and feature-engineered with libraries such as sktime, tsfresh, and tsfel, which extract statistical, temporal, and spectral features. After careful feature selection to remove highly correlated or irrelevant features, we obtained a feature set well suited to machine learning. We then applied gradient boosting and AutoML models to predict the energy consumption level.
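The article relies on sktime, tsfresh, and tsfel for feature extraction, and their APIs are not reproduced here. Instead, the sketch below hand-rolls a few features of each kind (statistical, temporal, and spectral) over a sliding window so the tabular transform is visible, then drops constant features and one feature from each highly correlated pair. The window length of 7, the 0.95 correlation threshold, and the feature names are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def window_features(y: pd.Series, window: int = 7) -> pd.DataFrame:
    """One row per day, built only from the `window` days before it.

    Hand-rolled stand-in for features that tsfresh / tsfel / sktime
    would extract automatically (the feature names here are hypothetical).
    """
    values = y.to_numpy()
    steps = np.arange(window)
    rows = {}
    for t in range(window, len(values)):
        w = values[t - window:t]                       # past only: no look-ahead
        spectrum = np.abs(np.fft.rfft(w - w.mean()))   # magnitude spectrum, mean removed
        rows[y.index[t]] = {
            "mean": w.mean(),                          # statistical
            "std": w.std(),
            "min": w.min(),
            "max": w.max(),
            "lag_1": w[-1],                            # temporal
            "slope": np.polyfit(steps, w, 1)[0],
            "spec_peak_bin": int(spectrum[1:].argmax()) + 1,  # spectral
            "spec_energy": float((spectrum ** 2).sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


def drop_correlated(X: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop constant features, then any feature whose |Pearson r| with an
    earlier column exceeds `threshold`."""
    X = X.loc[:, X.std() > 0]                          # constant columns carry no signal
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return X.drop(columns=to_drop)


# Usage on a synthetic daily series with weekly seasonality
rng = np.random.default_rng(1)
days = pd.date_range("2015-01-01", periods=120, freq="D")
y = pd.Series(30_000 + 3_000 * np.sin(2 * np.pi * np.arange(120) / 7)
              + rng.normal(0, 500, 120), index=days)
X = window_features(y)
X_sel = drop_correlated(X)
print(X.shape, "->", X_sel.shape)
```

Each row uses only the days before its index date, so its features are available at the end of the previous day, which matches the next-day prediction setup.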
Advantages of AutoML
Having shown the benefits of feature engineering and of machine learning models such as gradient boosting, we explored AutoML as a way to automate model selection and hyperparameter tuning. AutoML platforms such as Cleanlab Studio automatically train and optimize a range of models, including gradient boosting, for a given prediction task.
Conclusion
Transforming the time series into tabular format and applying feature engineering and machine learning algorithms outperformed the Prophet time series baseline. For predicting daily energy consumption in the PJM region, this approach reduced prediction error by 81% relative to Prophet, illustrating its potential for better forecasting in real-world applications. More generally, using tabular data for time series forecasting is a promising direction for research.
The article concludes by promoting the author's original official account “数据STUDIO”, which focuses on Python and data science, with content ranging from the basics to advanced topics.