Original article information:
Prediction and explanation of the formation of the Spanish day-ahead electricity price through machine learning regression
Original article link: https://www.sciencedirect.com/science/article/pii/S0306261919302260
Highlights
We propose a regression-tree-based method for modeling electricity price formation.
The explanatory variables are extracted from publicly accessible energy-related data.
The energy-related data are free and published by the TSO in a graphical interface.
The model shows good accuracy in predicting price formation.
It also allows for a non-linear analysis of the dependence of price on predictors.
Overview
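Until recently, detailed information on the state of the power system, needed to estimate future spot prices by regression analysis, was largely restricted to qualified parties. However, to ensure transparency in its operation, the Spanish Transmission System Operator (TSO) has launched an informative website on which a large amount of real-time energy-related data can be consulted through a graphical interface. Undoubtedly, this gives non-qualified parties the opportunity to develop applications and algorithms that require price forecasts and, possibly, knowledge of how the price is determined. The paper explores the use of data extracted from that interface with two aims: predicting the day-ahead price in a simple way, and exploring the influence of the underlying energy drivers on it. For prediction, the authors specify a quantile regression model based on Gradient Boosted Regression Trees (GBRT). It improves on the accuracy of multiple linear regression models at the cost of greater complexity, yet its specification and tuning remain simpler than those of other machine learning approaches. The computed metrics show that, when the median is used as the point prediction, the model yields very low prediction errors (RMSE = 2.78 €/MWh, MAE = 1.94 €/MWh, MAPE = 0.059). Interestingly, the quantile regression model also defines prediction intervals inherently, which offer a different interpretation of accuracy. The results show that, on average, 90% of the time the prediction error does not exceed 6.8 €/MWh.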
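The quantile GBRT idea summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it boosts depth-1 trees (stumps) rather than the deeper trees a real GBRT would use, sets each leaf value to the tau-quantile of the residuals in that leaf (the usual gradient boosting recipe for the quantile, or pinball, loss), and runs on hypothetical toy data. The function names `fit_quantile_gbrt`, `fit_stump`, `empirical_quantile` and `predict` are illustrative only. One model is fitted per quantile level tau: tau = 0.5 gives the median used as the point prediction, and a pair such as tau = 0.05 and 0.95 can bound a prediction interval. In practice, scikit-learn exposes the same loss through `GradientBoostingRegressor(loss="quantile", alpha=tau)`.

```python
from typing import List, Optional, Sequence, Tuple

# (feature index, threshold, step added if value <= threshold, step added otherwise)
Stump = Tuple[int, float, float, float]


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], tau: float) -> float:
    """Mean quantile (pinball) loss at level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


def empirical_quantile(values: Sequence[float], tau: float) -> float:
    """Lower empirical tau-quantile; it minimizes the pinball loss over constants."""
    ordered = sorted(values)
    return ordered[min(int(tau * len(ordered)), len(ordered) - 1)]


def fit_stump(X: Sequence[Sequence[float]], g: Sequence[float]) -> Optional[Tuple[int, float]]:
    """Depth-1 regression tree: the (feature, threshold) split that best fits g in least squares."""
    n, total = len(g), sum(g)
    best: Optional[Tuple[float, int, float]] = None
    for j in range(len(X[0])):
        pairs = sorted(zip([row[j] for row in X], g))
        left_sum = 0.0
        for i in range(1, n):
            left_sum += pairs[i - 1][1]
            if pairs[i - 1][0] == pairs[i][0]:
                continue  # no threshold separates equal feature values
            right_sum = total - left_sum
            # Maximizing this score is equivalent to minimizing the split's squared error.
            score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
            if best is None or score > best[0]:
                best = (score, j, pairs[i - 1][0])
    return None if best is None else (best[1], best[2])


def fit_quantile_gbrt(X, y, tau, n_rounds=50, learning_rate=0.1):
    """Gradient boosting of stumps for the tau-quantile; returns (base, stumps, loss history)."""
    base = empirical_quantile(y, tau)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    history = [pinball_loss(y, pred, tau)]
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        # Negative gradient of the pinball loss with respect to the current prediction.
        grads = [tau if r > 0 else tau - 1.0 for r in residuals]
        split = fit_stump(X, grads)
        if split is None:
            break
        j, thr = split
        left = [r for row, r in zip(X, residuals) if row[j] <= thr]
        right = [r for row, r in zip(X, residuals) if row[j] > thr]
        # Leaf value = tau-quantile of the leaf residuals, shrunk by the learning rate.
        step_left = learning_rate * empirical_quantile(left, tau)
        step_right = learning_rate * empirical_quantile(right, tau)
        stumps.append((j, thr, step_left, step_right))
        pred = [p + (step_left if row[j] <= thr else step_right) for row, p in zip(X, pred)]
        history.append(pinball_loss(y, pred, tau))
    return base, stumps, history


def predict(base: float, stumps: List[Stump], row: Sequence[float]) -> float:
    """Sum the base quantile and every stump's contribution for one feature vector."""
    out = base
    for j, thr, step_left, step_right in stumps:
        out += step_left if row[j] <= thr else step_right
    return out


if __name__ == "__main__":
    # Hypothetical toy data: one feature, noisy upward trend.
    X = [[float(i)] for i in range(20)]
    y = [i + (3.0 if i % 2 else -3.0) for i in range(20)]
    for tau in (0.05, 0.5, 0.95):
        base, stumps, history = fit_quantile_gbrt(X, y, tau)
        print(tau, round(history[0], 3), round(history[-1], 3), round(predict(base, stumps, [10.0]), 2))
```

For each tau, the script prints the training pinball loss before and after boosting and the prediction at x = 10. Because each shrunken leaf step moves toward the leaf's loss minimizer, the training loss never increases from round to round. Separately fitted quantile models can still cross on small data, a known quirk of estimating each quantile independently.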
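The paper also applies a partial dependence analysis to the model. This analysis, to the authors' knowledge the first time the technique has been used to study the formation of electricity prices, proved highly useful for detecting strongly non-linear relationships.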
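Partial dependence, as described above, can be computed by brute force for any fitted model: force one predictor to a grid value in every row of the data, average the model's predictions, and repeat across the grid. The sketch below assumes a generic `predict` callable and a hypothetical toy model standing in for the paper's GBRT; `partial_dependence` is an illustrative name, and tree libraries may compute the same curve more efficiently by traversing the trees.

```python
from typing import Callable, List, Sequence


def partial_dependence(
    predict: Callable[[List[float]], float],
    X: Sequence[Sequence[float]],
    feature: int,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction over the data when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)        # copy so the original data is untouched
            modified[feature] = value   # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(X))
    return curve


if __name__ == "__main__":
    # Hypothetical model: quadratic in feature 0, linear in feature 1.
    def toy_model(row: List[float]) -> float:
        return row[0] ** 2 + 2.0 * row[1]

    data = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
    print(partial_dependence(toy_model, data, feature=0, grid=[0.0, 1.0, 2.0]))  # [2.0, 3.0, 6.0]
```

The curve recovers the quadratic shape in feature 0, shifted by the average contribution of feature 1. This is the kind of non-linear relationship that a linear coefficient would flatten into a single slope.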
Abstract
Until recently, detailed information on the power system state to estimate future spot prices by regression analysis was generally restricted to qualified parties. However, to ensure transparency in operation, the Spanish Transmission System Operator has launched an informative website in which a sizable amount of real-time energy-related data can be consulted through a graphical interface. Undoubtedly, this provides the opportunity for non-qualified parties to develop applications and algorithms in which price forecasts, and perhaps knowledge about how the price is determined, are required.
This paper approaches the use of data extracted from that interface with two aims: the prediction of the day-ahead price in a simple way, and the exploration of the influence that the underlying energy drivers have on it. For the prediction we specified a quantile regression model based on Gradient Boosted Regression Trees. It improves the accuracy over multiple linear regression models at the cost of more complexity, and it still has simpler specification and tuning compared to other machine learning approaches. The calculated metrics show that our model produces remarkably low prediction errors when using the median as the point prediction method (RMSE = 2.78 €/MWh, MAE = 1.94 €/MWh, and MAPE = 0.059). Interestingly, the quantile regression model also allows prediction intervals to be defined inherently, with a different interpretation of accuracy. Our results show that, on average, 90% of the time the prediction error will not exceed 6.8 €/MWh.
We also implemented a partial dependence analysis on that model. This implementation, as far as we know the first time it has been employed to analyze the formation of electricity prices, has proven significantly useful in detecting highly non-linear relationships.
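The accuracy figures quoted above follow standard definitions, which can be applied to any set of predictions. This sketch is an illustration under assumptions, not the authors' evaluation code. `point_metrics` computes RMSE and MAE in the price unit and MAPE as a fraction, so 0.059 reads as 5.9 %. `interval_coverage` checks how often observed prices fall inside quantile-based bounds. `error_bound` gives one empirical way to state a claim of the form "90% of the time the error does not exceed X"; the paper may derive its 6.8 €/MWh figure differently. All function names and sample numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def point_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float, float]:
    """RMSE and MAE (same unit as the price, e.g. €/MWh) and MAPE as a fraction."""
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumes no zero prices; a MAPE of 0.059 corresponds to 5.9 %.
    mape = sum(abs(e) / abs(yt) for e, yt in zip(errors, y_true)) / n
    return rmse, mae, mape


def interval_coverage(y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Fraction of observations falling inside [lower, upper]."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


def error_bound(y_true: Sequence[float], y_pred: Sequence[float], level: float = 0.9) -> float:
    """Smallest observed |error| that at least `level` of the predictions do not exceed."""
    errs = sorted(abs(yt - yp) for yt, yp in zip(y_true, y_pred))
    k = max(math.ceil(level * len(errs)) - 1, 0)
    return errs[k]


if __name__ == "__main__":
    # Hypothetical hourly prices (€/MWh) and median predictions.
    observed = [50.0, 40.0, 60.0, 55.0]
    median_pred = [48.0, 41.0, 63.0, 55.0]
    print(point_metrics(observed, median_pred))
    print(interval_coverage(observed, [45.0, 38.0, 58.0, 50.0], [52.0, 44.0, 59.0, 60.0]))
    print(error_bound(observed, median_pred, 0.9))
```

RMSE weights large misses more heavily than MAE, which is why the reported RMSE (2.78 €/MWh) exceeds the MAE (1.94 €/MWh).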
Keywords: Linear regression; Principal components; Quantile regression; Gradient boosting regression; Day-ahead electricity price
Schematics
Fig. 1. Summary of the methodology.
Fig. 2. Pearson correlation between the variables employed in the GBRT and PCR models.
Fig. 8. Variable importance (only the most important, 20 out of 66, are represented) of the percentile 50 prediction model.
Fig. 10. Partial dependence (more precisely, the deviation, as in Fig. 9) between non-categorical predictors and the predicted day-ahead price. The four categories of predictors described in Section 2.2 are plotted separately. From top to bottom: forecasts, availability of international links, available dispatchable generation, and power generation at 11.00 a.m.
Fig. 11. Partial dependence between coal-based generation and the day-ahead price, depending on the values ahead of 11.00 a.m.
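Fig. 2 summarizes pairwise Pearson correlations among the predictors. A minimal way to compute such a matrix from predictor series is sketched below. It assumes equally long, non-constant columns and uses hypothetical data; `pearson` and `correlation_matrix` are illustrative names.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)  # undefined (division by zero) for a constant series


def correlation_matrix(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise Pearson correlations, one row and one column per predictor series."""
    return [[pearson(c1, c2) for c2 in columns] for c1 in columns]


if __name__ == "__main__":
    # Hypothetical predictor series (e.g. a demand forecast and a wind forecast).
    demand = [28.0, 30.5, 33.0, 31.2, 29.4]
    wind = [9.1, 7.8, 5.2, 6.0, 8.3]
    for row in correlation_matrix([demand, wind]):
        print([round(v, 3) for v in row])
```

Strongly correlated predictors show up as off-diagonal entries near +1 or -1, which makes a correlation map a quick way to spot redundant inputs before fitting.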