Fully Automated Machine Learning (AutoML) for Efficient Time Series Forecasting

2024-10-19

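We take original articles seriously. To respect intellectual property and avoid potential copyright issues, we provide only a summary of the article here as a first overview. For the full details, visit the author's public account page to read the complete article.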

View original: 全自动机器学习 AutoML 高效预测时间序列
Source: 数据STUDIO
Summary of Time Series Data Transformation into Tabular Format for Machine Learning

Time series data is central to forecasting problems such as energy consumption. Dedicated time series models like Prophet are commonly used, but converting the series into a tabular format opens the problem to a broader range of machine learning algorithms, which can improve both predictions and insights.

Transforming Time Series into Table Format

In this article, we discuss how to turn a time series data set of daily energy consumption into tabular form using open-source libraries. We experimented with several machine learning approaches, including gradient boosting decision trees and AutoML, and compared them with the Prophet model. Gradient boosting decision trees reduced out-of-sample prediction error by 67% relative to Prophet (an accuracy increase of 38 percentage points). AutoML reduced prediction error by a further 42% relative to gradient boosting (an accuracy increase of 8 percentage points). Overall, AutoML reduced prediction error by 81% compared to Prophet (an accuracy increase of 46 percentage points).
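The three percentages above are relative reductions, so they compound rather than add. The short check below shows how the 67% and 42% figures combine into the reported 81%, while the percentage-point gains simply sum. The helper name `combined_reduction` is introduced here for illustration only.

```python
def combined_reduction(*relative_reductions):
    """Combine successive relative error reductions into one overall reduction."""
    remaining = 1.0
    for r in relative_reductions:
        # Each step keeps (1 - r) of whatever error was left before it.
        remaining *= 1.0 - r
    return 1.0 - remaining


# Prophet -> gradient boosting (67%), then gradient boosting -> AutoML (42%).
overall = combined_reduction(0.67, 0.42)
print(f"Overall error reduction vs. Prophet: {overall:.0%}")

# Accuracy gains are stated in percentage points, so they add directly.
accuracy_gain_points = 38 + 8
print(f"Overall accuracy gain: {accuracy_gain_points} percentage points")
```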

Examining and Preparing the Data

The data set, from PJM Interconnection LLC, contains hourly energy consumption in megawatts. The goal is to predict the next day's energy consumption level, expressed as one of four quartile-based categories: low, below average, above average, and high. The initial approach uses Prophet for time series forecasting; the problem is then reframed as multi-class classification. The hourly data is aggregated into average daily energy consumption, with columns renamed for compatibility with Prophet, and the quartile thresholds are computed from the training data only to prevent data leakage.
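The preparation steps described above might look roughly like the following pandas sketch. It is not the article's code: the hourly series is synthetic, the input column names `Datetime` and `MW` and the 80/20 chronological split are assumptions, and only the `ds` / `y` column names follow Prophet's documented input convention.

```python
import numpy as np
import pandas as pd

# Synthetic hourly load in megawatts (a stand-in for the real PJM series).
rng = np.random.default_rng(0)
n_hours = 24 * 200
hourly = pd.DataFrame({
    "Datetime": pd.date_range("2020-01-01", periods=n_hours,
                              freq=pd.Timedelta(hours=1)),
})
seasonal = 5000 * np.sin(np.arange(n_hours) * 2 * np.pi / (24 * 365))
hourly["MW"] = 30000 + seasonal + rng.normal(0, 1000, n_hours)

# Average to one value per day and rename to the ds / y columns Prophet expects.
daily = (
    hourly.set_index("Datetime")["MW"]
    .resample("D").mean()
    .rename("y").rename_axis("ds")
    .reset_index()
)

# Chronological split (assumed 80/20): the most recent days are held out.
split = int(len(daily) * 0.8)
train, test = daily.iloc[:split].copy(), daily.iloc[split:].copy()

# Quartile thresholds come from the training period only, to avoid leakage.
q25, q50, q75 = train["y"].quantile([0.25, 0.50, 0.75])
bins = [-np.inf, q25, q50, q75, np.inf]
labels = ["low", "below average", "above average", "high"]
for part in (train, test):
    part["level"] = pd.cut(part["y"], bins=bins, labels=labels)

print(train["level"].value_counts())
```

Reusing the training thresholds for the test period means the test labels never influence where the category boundaries fall.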

Feature Engineering and Machine Learning

The time series is transformed into tabular format and feature-engineered using libraries such as sktime, tsfresh, and tsfel, which extract statistical, temporal, and spectral features. After feature selection to remove highly correlated or irrelevant features, we obtained a feature set suited to machine learning modeling. We then applied gradient boosting and AutoML models to predict energy consumption levels.
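To make the idea of a tabular feature set concrete, here is a minimal sketch that uses plain pandas instead of sktime, tsfresh, or tsfel, so it only approximates what those libraries produce. The function names `make_features` and `drop_correlated`, the chosen lags and windows, and the 0.95 correlation threshold are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a weekly cycle (placeholder data).
rng = np.random.default_rng(1)
days = pd.date_range("2021-01-01", periods=120, freq="D")
values = 100 + 10 * np.sin(np.arange(120) * 2 * np.pi / 7) + rng.normal(0, 2, 120)
y = pd.Series(values, index=days, name="y")


def make_features(series: pd.Series) -> pd.DataFrame:
    """Build one row per day from values observed strictly before that day."""
    past = series.shift(1)  # shift so a row never sees its own value
    feats = pd.DataFrame({
        "lag_1": past,
        "lag_7": series.shift(7),
        "roll_mean_7": past.rolling(7).mean(),
        "roll_std_7": past.rolling(7).std(),
        "roll_max_7": past.rolling(7).max(),
    })
    feats["day_of_week"] = feats.index.dayofweek
    return feats.dropna()  # drop the warm-up rows that lack a full window


def drop_correlated(features: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop the later column of any pair whose absolute correlation exceeds threshold."""
    corr = features.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)


X = make_features(y)
X_selected = drop_correlated(X)
print(X_selected.head())
```

Each row of `X_selected` can then be paired with the next day's consumption level as the classification target.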

Advantages of AutoML

After highlighting the benefits of feature engineering and applying machine learning models like gradient boosting, we explored AutoML as a solution for model selection and hyperparameter tuning. AutoML platforms like Cleanlab Studio automate the training and optimization of various models, including gradient boosting, for prediction tasks.
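The kind of search an AutoML system automates can be illustrated, in a much reduced form, with scikit-learn. This sketch is not Cleanlab Studio and does not use its API; the candidate models, hyperparameter grids, and synthetic four-class data are assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic feature table with a four-class target (placeholder data).
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
signal = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, 300)
y = np.digitize(signal, bins=[-0.8, 0.0, 0.8])  # classes 0..3

# Candidate model families, each with a small hyperparameter grid.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50], "max_depth": [3, None]},
    ),
    "gradient_boosting": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [50], "learning_rate": [0.05, 0.1]},
    ),
}

# Time-ordered folds: each validation fold comes after its training fold.
cv = TimeSeriesSplit(n_splits=3)
results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

best_name = max(results, key=lambda n: results[n][0])
print(best_name, results[best_name])
```

A full AutoML platform typically searches far more model families and settings than this loop, but the structure of trying candidates and keeping the best validated one is the same.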

Conclusion

Transforming time series data into tabular format and then applying feature engineering and machine learning algorithms outperformed the Prophet time series model in this study. For predicting daily energy consumption in the PJM region, the method delivered a significant improvement in prediction accuracy, illustrating its potential for real-world forecasting applications. The general approach of using tabular data for time series forecasting presents a promising direction for research.

The article concludes by promoting the original public account “数据STUDIO” which focuses on Python and data science, providing content on various topics from basics to advanced levels.