A neural network enhanced volatility component model

Volatility prediction, a central issue in financial econometrics, has attracted increasing attention in the data science literature as advances in computational methods enable models with ever greater forecasting precision. In this paper, we draw on both strands of the literature and develop a novel two-component volatility model. Realized volatility is decomposed by a nonparametric filter into long- and short-run components, which are modeled by an artificial neural network and an ARMA process, respectively. We use intraday data on four major exchange rates and a Chinese stock index to construct daily realized volatility and perform out-of-sample evaluation of volatility forecasts generated by our model and by well-established alternatives. Empirical results show that our model outperforms the alternatives across all statistical metrics and over different forecasting horizons. Furthermore, volatility forecasts from our model offer economic gains to a mean-variance utility investor in the form of higher portfolio returns and Sharpe ratios.


Introduction
Volatility modeling and prediction play a crucial role in asset allocation, portfolio construction, and risk management, as accurate volatility forecasts are of great importance to traders, fund managers, and regulators. Traditional models, including the Autoregressive Conditional Heteroskedastic (ARCH) model of Engle (1982), the Generalized ARCH model of Bollerslev (1986), and the Autoregressive Fractionally Integrated Moving Average (ARFIMA) model of Granger (1980), capture stylized facts such as volatility persistence and clustering. Meanwhile, Taylor (1994) and Shephard (1996), among others, model volatility as an unobserved component following some latent stochastic process. These stochastic volatility models are theoretically well founded and widely implemented.
More recently, volatility component models have attracted growing attention in the literature since the seminal work of Engle and Lee (1999). This formulation breaks the volatility dynamics into two additive components: a short-run and transitory component and a long-term and persistent one.* This parsimonious parameterization is not only able to capture complicated volatility dynamics but is also capable of handling structural breaks in asset return volatility (Wang and Ghysels 2015). Empirically, two-component models generate more accurate forecasts than one-factor ones (Adrian and Rosenberg 2008, Engle et al. 2013). Some component models have been applied to range-based volatility measures (Alizadeh et al. 2002, Harris et al. 2011) and asset return correlation (Colacito et al. 2011). Multiplicative component models have also been developed (Engle and Rangel 2008, Engle and Sokalska 2012).

*Corresponding author. Email: xiaoquan.liu@nottingham.edu.cn
Despite the empirical success of two-component volatility models, exactly what processes these two components follow is an open question. This leaves a lot of room for innovation. Some studies motivate the model via economic theories: Engle et al. (2013) link macroeconomic fundamentals to stock price volatility and incorporate macroeconomic variables in the long-term component specification, whereas Adrian and Rosenberg (2008) interpret the two components as asset pricing factors for financial constraints and business cycles, respectively, and model them both as a mean-reverting process.

Comparing forecasting errors across the three models, for the long-term component the absolute percentage error (APE) is under 1.47% for our neural network enhanced model. By contrast, the APE for the long-term component is between 16% and 25% for the component GARCH and between 17% and 33% for the Harris et al. (2011) model, an order of magnitude larger than for our model. In other words, our neural network enhanced model is much better at capturing long-term persistence, and this is where the difference in forecasting performance comes from. Hence, the artificial neural network method is the key to achieving prediction precision and underlines our contribution to the literature.

† Launched in April 2005, the CSI 300 index is the most comprehensive and widely followed index in the Chinese stock market. The index is based on the largest and most liquid A-shares on the Shanghai and Shenzhen Stock Exchanges and is re-balanced every 6 months.
‡ In the online appendix, we report statistical comparisons between our model and the EGARCH model of Nelson (1991), the ARFIMA model of Granger (1980), and the HAR model of Corsi (2009). The results are qualitatively the same.
We perform a number of robustness tests along three directions. First, we re-run the forecasting evaluation between the models when volatility is computed as the intraday price range. The range-based volatility measure has received renewed attention in the literature as it is shown to be more efficient than the squared return and more robust than realized volatility in handling market microstructure noise (Alizadeh et al. 2002, Brandt and Jones 2006). Second, following Patton (2011) and Bollerslev et al. (2016a), we use QLIKE as the loss function when comparing volatility predictions. The ranking of the volatility models remains unchanged, and our neural network enhanced model always comes out on top. Third, we use two popular recurrent deep neural network models, namely the long short-term memory (LSTM) model and the gated recurrent unit (GRU) model, to describe the long-term component and obtain qualitatively similar results.
As statistical significance does not necessarily translate into economic gains, we conduct a portfolio exercise to explore the economic value of the volatility forecasts. We assume that a mean-variance utility investor allocates her wealth between the risk-free asset and one of the foreign exchange rates or the stock index. We use historical average returns as the expected returns to the assets and optimize over the weights at different risk aversion levels. Thus the optimal weights, as well as the overall portfolio performance, are determined entirely by the volatility forecasts. We show that when we use forecasts from our model, the portfolios offer higher annualized returns, Sharpe ratios, and certainty equivalent returns (CER) across a range of risk aversion levels and for all risky assets. This highlights the economic significance of the volatility forecasts generated by our model.
Finally, it is worth mentioning that our empirical results do not imply that the proposed model would outperform every conceivable alternative in volatility prediction. Instead, our objective is to showcase the value of data science methods in addressing traditional research questions in finance. In our paper, we illustrate this by showing that a statistically motivated component volatility model, when coupled with state-of-the-art neural networks for modeling the long-term component, generates improved volatility predictions in comparison to traditional models.
The rest of the paper is structured as follows. In Section 2, we outline our neural network enhanced component model for volatility in detail and introduce the evaluation metrics for the out-of-sample forecasting exercises. Section 3 introduces the data. In Section 4, we discuss empirical results, provide robustness checks, and undertake a portfolio exercise to test the economic value of volatility forecasts. Finally, Section 5 concludes.

The neural network enhanced volatility model
In our neural network enhanced volatility component model, we assume that daily realized volatility follows a two-component process specified as follows:
$$\sigma_t = L_t + S_t, \tag{1}$$
$$S_t = c + \sum_{i=1}^{p} \phi_i S_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i} + \varepsilon_t,$$
where $\sigma_t$ is the realized volatility; $L_t$ and $S_t$ are the long- and short-run components of $\sigma_t$, respectively, at time $t$; $c$ is a constant; $\phi_i$ and $\theta_i$ are model parameters; and $\varepsilon_t$ is the random error term with zero mean and constant variance. We assume that $L_t$ follows a smooth and non-stationary process but leave its precise dynamics unspecified. The short-term component $S_t$ is assumed to follow a stationary ARMA process. We implement this two-component model in three steps. First, we extract the long-term component $L_t$ from the realized volatility via the wavelet method; the short-term component can subsequently be obtained as $S_t = \sigma_t - L_t$. In Step 2, to describe the dynamics of $L_t$ and forecast its $n$-step ahead value $L_{t+n}$, we apply an autoregressive artificial neural network to $L_t$. In the final step, the short-term component is modeled by an ARMA(p, q) process, and the $n$-step ahead value $S_{t+n}$ is obtained from the estimated ARMA(p, q) model. This procedure allows us to forecast the $n$-step ahead volatility as $\hat\sigma_{t+n} = \hat L_{t+n} + \hat S_{t+n}$. These three steps are elaborated below.
Step 1. Volatility decomposition
We adopt wavelet analysis, which has been successfully applied to different types of raw data such as option prices (Haven et al. 2012) and exchange rates (Barunik et al. 2016), to extract the long-term component $L_t$. A key feature of the wavelet transform is that it can decompose any square-integrable function into a combination of a scaling function and wavelet functions, each weighted by the corresponding approximation and detail coefficients. Once the original function is decomposed, its detail coefficients can be used for de-noising via hard or soft thresholding (Daubechies 1992).
As discussed in Haven et al. (2012), the choice of decomposition level is important for the de-noising effect. In figure 1, we compare the long-/short-term components from the wavelet transform at decomposition levels 7, 4, and 1 in the top, middle, and bottom panels, respectively. The data are the daily realized volatility of EUR/USD from 27 September 2009 to 7 December 2012. We clearly observe that as the decomposition level decreases, the long-run component becomes less smooth and behaves more like the original volatility series, whereas the short-term component looks more stationary, which is the assumption underlying volatility component models. To determine the appropriate decomposition level, we conduct the Augmented Dickey-Fuller (ADF) test (Fuller 1976) with the null hypothesis that a unit root is present in the short-run component.† We run the ADF test at every decomposition level from 7 down to 1 and choose the first level at which the null is rejected. In figure 1, the null is rejected at level 3 with a p-value of 0.01.

Step 2. Modeling the long-run component
In the second step, an autoregressive artificial neural network (ARNN) is applied to the long-term component $L_t$. The ARNN has been applied to time series modeling and shown to outperform traditional models such as the GARCH, EGARCH, and ARFIMA in volatility forecasting in the computer science literature (Kristjanpoller et al. 2014, Kristjanpoller and Minutolo 2016), especially after the data are deseasonalized (Kristjanpoller and Minutolo 2015). This is particularly true for the three-layer ARNN (Zhang and Qi 2005, Patil et al. 2008), which holds an advantage over recurrent feed-forward neural networks in being less sensitive to the problem of long-term dependence (Mustafaraj et al. 2011).
Motivated by this, we utilize an ARNN with three layers to model the long-term component $L_t$. The three layers are, respectively, an input layer that feeds lagged values of $L_t$ into the network; a hidden layer with hyperbolic tangent activation functions; and an output layer with a linear activation function. The model assumes the following general form for one-step ahead forecasts (Siegelmann et al. 1997, Mustafaraj et al. 2011):
$$\hat L_{t+1} = g[\varphi_i(t), \theta_{\mathrm{ARNN}}] = F_j\Bigg(\sum_{j=1}^{N_h} W_{j,u}\, f_u\Big(\sum_{u=1}^{N_u} w_{u,i}\,\varphi_i(t) + w_{u,o}\Big) + W_{j,o}\Bigg),$$
where $g[\varphi_i(t), \theta_{\mathrm{ARNN}}]$ is the ARNN function; $N_h$ is the number of hidden neurons; $N_u$ is the number of input variables; $W_{j,u}$ is the weight vector from the hidden neurons to the output layer; $w_{u,i}$ is the matrix containing the weights from the external inputs to the hidden neurons; $w_{u,o}$ and $W_{j,o}$ are the biases of the hidden layer and the output layer, respectively, which can be interpreted as intercepts; $\varphi_i(t)$ is the vector containing the input variables of the autoregressive part of the neural network; and $\theta_{\mathrm{ARNN}}$ is the parameter vector containing all adjustable parameters of the ARNN, including the weights and biases.
We follow the widely used configuration for the ARNN, where $f_u$ is the hyperbolic tangent function and $F_j$ is a linear function. The forecast is therefore based on the current value of the data at time $t$ as well as stored values at previous times, i.e. $t-1$, $t-2$, $\ldots$ We select four input neurons and one hidden layer with 10 neurons to estimate the ARNN, following the widely used configurations in Mandic and Chambers (2001), Mustafaraj et al. (2011), and Norgaard et al. (2000).‡ This is a typical application of a supervised-learning neural network, where the model parameters are obtained via the modified Levenberg-Marquardt algorithm (Hagan and Menhaj 1994) to map the input variables to the output target variables.

† We choose the ADF test because it is widely used in the literature, although it is not the most powerful.
‡ As a robustness check, we have used 5 neurons and 15 neurons and obtained qualitatively similar results. These are available from the authors upon request.
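A minimal sketch of a three-layer autoregressive network of this type can be written with scikit-learn's MLPRegressor. Note that scikit-learn trains with L-BFGS rather than the Levenberg-Marquardt algorithm used in the paper, and the lag construction and toy series below are our own illustration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_lagged(series, n_lags=4):
    """Design matrix of lagged values: row t holds
    [x_t, x_{t-1}, ..., x_{t-n_lags+1}]; the target is x_{t+1}."""
    X = np.column_stack([series[i: len(series) - n_lags + i]
                         for i in range(n_lags)])[:, ::-1]
    y = series[n_lags:]
    return X, y

# smooth toy long-run component
t = np.arange(400)
L = 0.5 + 0.2 * np.sin(t / 50)

# 4 input neurons, one hidden layer of 10 tanh units, linear output
X, y = make_lagged(L, n_lags=4)
arnn = MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                    solver="lbfgs", max_iter=5000, random_state=0)
arnn.fit(X, y)
one_step = arnn.predict(X[-1:])   # forecast of the next value of L
```

Because the long-run component is smooth by construction, even this small network fits it closely; the hidden-layer size mirrors the 10-neuron configuration described in the text.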
To estimate the ARNN model, we extract the long-term component $L_t$ from the realized volatility in the training dataset as in Step 1, use the first 70% of the data for training the ARNN, and use the remaining 30% for validation, in which early-stopping regularization is used to avoid overfitting. The training, including validation, and the testing of our proposed model are conducted via a rolling-forward procedure: the training, validation, and testing datasets are re-constructed each time we move forward.
The one-step ahead forecast of $L_{t+1}$ uses the current long-term component $L_t$ and the lagged values at $t-1$, $t-2$, and $t-3$. Multi-step ahead forecasts are obtained in closed loop (Norgaard et al. 2000). For example, to forecast $L_{t+2}$, we feed the predicted value $\hat L_{t+1}$ back into the input of the ARNN model and shift the input vector from $(L_t, L_{t-1}, L_{t-2}, L_{t-3})$ to $(\hat L_{t+1}, L_t, L_{t-1}, L_{t-2})$, so that $L_{t-3}$ is removed from the input vector. Once we obtain the predicted value $\hat L_{t+2}$, we can forecast $L_{t+3}$ by feeding $\hat L_{t+2}$ back into the input vector.
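The closed-loop scheme just described can be sketched as follows; the linear AR(4) stand-in for the fitted ARNN is purely illustrative:

```python
import numpy as np

def closed_loop_forecast(predict, last_inputs, n_ahead):
    """Iterated multi-step forecast: feed each prediction back into
    the lag vector, dropping the oldest lag (closed-loop scheme).

    `predict` maps a lag vector [L_t, L_{t-1}, ...] to the scalar
    one-step ahead forecast; `last_inputs` is the latest lag vector."""
    inputs = list(last_inputs)
    path = []
    for _ in range(n_ahead):
        nxt = predict(np.asarray(inputs))
        path.append(nxt)
        inputs = [nxt] + inputs[:-1]   # newest in, oldest out
    return np.array(path)

# illustration with a linear AR(4) stand-in for the fitted ARNN
coefs = np.array([0.6, 0.2, 0.1, 0.05])
predict = lambda x: float(coefs @ x)
path = closed_loop_forecast(predict, [1.0, 0.9, 0.8, 0.7], n_ahead=3)
```

Any fitted one-step model can be plugged in as `predict`, so the same loop serves the ARNN, LSTM, or GRU variants discussed later.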
Step 3. Modeling the short-run component
In the third step, we estimate an ARMA(p, q) model for the short-run component $S_t = \sigma_t - L_t$. We use the Bayesian Information Criterion (BIC) (Schwarz 1978) to select the lag orders. Following the empirical studies in McQuarrie and Tsai (1998), we start with the ARMA(4,4) and estimate all $4 \times 4 = 16$ combinations of $p = 1, \ldots, 4$ and $q = 1, \ldots, 4$. To illustrate, for the data used in figure 1, the BIC suggests $p = 2$ and $q = 2$; hence, the ARMA(2,2) model is chosen and used to generate $n$-step ahead forecasts for the short-term component. Finally, the $n$-step ahead forecast of the realized volatility is the sum of the outputs from the ARNN and ARMA models according to equation (1).

Forecast evaluation
We estimate and compare the forecasting performance of three models: (1) our neural network enhanced model (Hybrid); (2) the component GARCH model of Engle and Lee (1999) (EL); and (3) the cyclical model of Harris et al. (2011) (HSY). We select these two alternatives because they are popular members of the component volatility family and directly comparable to ours. In the online appendix, we also compare volatility forecasts generated by the EGARCH model of Nelson (1991), the ARFIMA model of Granger (1980), and the HAR model of Corsi (2009), as they all show strong empirical performance in the literature.
We adopt the following metrics to evaluate the forecasting performance of the volatility models.

The root mean squared forecast error (RMSFE)
The RMSFE compares the forecasted volatility from a given model with the true volatility proxy and is computed as follows:
$$\mathrm{RMSFE} = \sqrt{\frac{1}{R}\sum_{t=1}^{R}\big(\sigma_t(\tau_1,\tau_2) - \hat\sigma_t(\tau_1,\tau_2)\big)^2},$$
where $R$ is the number of observations, and $\sigma_t(\tau_1,\tau_2)$ and $\hat\sigma_t(\tau_1,\tau_2)$ are the true volatility proxy and the volatility forecast, respectively.
The Diebold and Mariano (1995) statistic
We implement the pairwise comparison of Diebold and Mariano (1995) and test whether differences in RMSFE between two models are statistically significant. The test statistic is defined as
$$d_{i,j} = \frac{\bar d}{\sqrt{\widehat{\mathrm{Var}}(\bar d)}}, \qquad \bar d = \frac{1}{R}\sum_{t=1}^{R}\big(e_{i,t}^2 - e_{j,t}^2\big),$$
where $e_{i,t}$ and $e_{j,t}$ are the forecast errors of models $i$ and $j$, and the variance estimate is adjusted for heteroskedasticity and serial correlation. If $d_{i,j}$ is significantly greater than zero, then model $j$ is preferred to model $i$, and vice versa.
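A sketch of the test on squared-error losses follows; the simple Newey-West variance with $h-1$ lags and the toy error series are simplifying assumptions, and the paper's exact adjustment may differ:

```python
import numpy as np
from scipy import stats

def diebold_mariano(e_i, e_j, h=1):
    """DM statistic on squared-error losses; positive values favour
    model j over model i.  The long-run variance of the mean loss
    differential uses h-1 autocovariance lags (Newey-West style)."""
    d = e_i ** 2 - e_j ** 2
    R = len(d)
    dbar = d.mean()
    lrv = np.var(d, ddof=0)
    for k in range(1, h):
        lrv += 2 * np.cov(d[k:], d[:-k], ddof=0)[0, 1]
    dm = dbar / np.sqrt(lrv / R)
    pval = 2 * stats.norm.sf(abs(dm))
    return dm, pval

rng = np.random.default_rng(2)
e_i = 1.5 * rng.standard_normal(500)   # noisier model i
e_j = rng.standard_normal(500)
dm, pval = diebold_mariano(e_i, e_j)
```

With model i's errors 50% noisier than model j's, the statistic comes out large and positive, matching the sign convention described above.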

The Superior Predictive Ability (SPA) test of Hansen (2005)
To address the multiple-testing problem in the light of data mining, we conduct the SPA test of Hansen (2005). The null hypothesis states that the benchmark model is not inferior to any of the alternative models. A rejection of the null indicates that at least one competing model produces forecasts more accurate than the benchmark, which is the model with the lowest RMSFE. We report the stationary bootstrapped p-values obtained with 1000 replications for inference.
The interval forecast evaluation of Christoffersen (1998)
Christoffersen (1998) proposes metrics for evaluating the adequacy of the risk management measure Value-at-Risk (VaR). Since VaR critically hinges upon volatility forecasts, the metrics can be readily applied to volatility forecasts. The intuition is that intervals around point volatility predictions should be narrow in tranquil times and wide in volatile times, so that occurrences of volatility outside a pre-specified interval should be few, spread out over the sample, and not clustered (Engle 1982).
Based upon this intuition, let $B_{t|t-1}(m)$ and $U_{t|t-1}(m)$ denote the lower and upper limits of the ex ante interval forecast for time $t$ made at time $t-1$ for coverage probability $m$. Christoffersen (1998) defines the indicator variable $I_t$ as
$$I_t = \begin{cases} 1 & \text{if } y_t \in \big[B_{t|t-1}(m),\, U_{t|t-1}(m)\big], \\ 0 & \text{otherwise}, \end{cases}$$
where $\{y_t\}_{t=1}^{T}$ is a path of the time series $y_t$. A sequence of interval forecasts has correct unconditional coverage if $E[I_t] = m$, which is tested with the likelihood ratio statistic
$$LR_{uc} = -2\ln\frac{L(m;\, I_1,\ldots,I_T)}{L(\hat\pi;\, I_1,\ldots,I_T)} \sim \chi^2(1),$$
where $L(m;\cdot)$ and $L(\hat\pi;\cdot)$ are the likelihoods under the null and alternative hypotheses, respectively, and $\hat\pi = n_1/(n_0 + n_1)$ is the maximum likelihood estimate of $\pi$, with $n_1$ and $n_0$ the numbers of ones and zeros in the indicator sequence. The intuition for the independence test is that the zeros and ones should appear as a random process and not in a time-dependent manner. Hence, independence is tested against the alternative of a first-order Markov process:
$$LR_{ind} = -2\ln\frac{L(\hat\pi_2;\, I_1,\ldots,I_T)}{L(\hat\Pi_1;\, I_1,\ldots,I_T)} \sim \chi^2(1),$$
where
$$\hat\Pi_1 = \begin{bmatrix} \dfrac{n_{00}}{n_{00}+n_{01}} & \dfrac{n_{01}}{n_{00}+n_{01}} \\[2ex] \dfrac{n_{10}}{n_{10}+n_{11}} & \dfrac{n_{11}}{n_{10}+n_{11}} \end{bmatrix}, \qquad \hat\pi_2 = \frac{n_{01}+n_{11}}{n_{00}+n_{10}+n_{01}+n_{11}},$$
and $n_{ij}$ is the number of observations with value $i$ followed by value $j$.
The tests for unconditional coverage and independence can be combined into a complete test for conditional coverage that considers both correct coverage and the randomness of the sequence of occurrences:
$$LR_{cc} = LR_{uc} + LR_{ind} \sim \chi^2(2).$$
To summarize, the interval forecast evaluation captures the probability of unusually frequent consecutive exceedances, thus offering additional insight into volatility forecast violations and clustering. For example, with a coverage probability of 95% over 100 days, there should be fewer than 5 exceedances, and they should not arrive consecutively. A low p-value for the likelihood ratio tests implies repeated errors and suggests model misspecification (Christoffersen and Pelletier 2004).
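The three likelihood ratio statistics can be computed directly from a 0/1 hit sequence; the simulated sequence below is illustrative:

```python
import numpy as np
from scipy import stats
from scipy.special import xlogy

def christoffersen_tests(I, m=0.95):
    """LR tests of Christoffersen (1998) on a 0/1 hit sequence I_t
    (1 = realization inside the interval forecast)."""
    I = np.asarray(I, dtype=int)
    n1 = I.sum()
    n0 = len(I) - n1
    pi = n1 / (n0 + n1)
    # unconditional coverage: hit rate pi against nominal coverage m
    lr_uc = -2 * ((xlogy(n0, 1 - m) + xlogy(n1, m))
                  - (xlogy(n0, 1 - pi) + xlogy(n1, pi)))
    # transition counts n[i, j]: value i followed by value j
    n = np.zeros((2, 2))
    for a, b in zip(I[:-1], I[1:]):
        n[a, b] += 1
    p01, p11 = n[0, 1] / n[0].sum(), n[1, 1] / n[1].sum()
    p2 = (n[0, 1] + n[1, 1]) / n.sum()
    ll_markov = (xlogy(n[0, 0], 1 - p01) + xlogy(n[0, 1], p01)
                 + xlogy(n[1, 0], 1 - p11) + xlogy(n[1, 1], p11))
    ll_iid = xlogy(n[0, 0] + n[1, 0], 1 - p2) + xlogy(n[0, 1] + n[1, 1], p2)
    lr_ind = -2 * (ll_iid - ll_markov)
    lr_cc = lr_uc + lr_ind
    pvals = (stats.chi2.sf(lr_uc, 1), stats.chi2.sf(lr_ind, 1),
             stats.chi2.sf(lr_cc, 2))
    return lr_uc, lr_ind, lr_cc, pvals

rng = np.random.default_rng(3)
I = (rng.random(1000) < 0.95).astype(int)  # i.i.d. hits at the nominal rate
lr_uc, lr_ind, lr_cc, pvals = christoffersen_tests(I)
```

Using `xlogy` keeps the log-likelihoods well defined when a transition count is zero, which happens routinely in short hit sequences.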

Data
For exchange rates, our data are 5-minute data for EUR/USD, GBP/EUR, GBP/JPY, and GBP/USD from 27 September 2009 to 12 August 2015, a total of 2.16 million observations over 2145 days. Data from 27 September 2009 to 7 December 2012 are used for in-sample estimation and the rest for out-of-sample forecasts. For the CSI 300 index, the data are 5-minute index levels from 1 August 2005 to 29 September 2017 with 144,667 observations. The in-sample period is from 1 August 2005 to 31 July 2008 and the rest is for out-of-sample forecasting exercises. We aggregate intraday squared returns to obtain daily realized volatility as the proxy of the latent true volatility process, following Andersen and Bollerslev (1998):
$$\sigma_t^2 = \sum_{n=1}^{N} r_{t,n}^2,$$
where $\sigma_t^2$ is the daily realized variance on day $t$, and $r_{t,n}^2$ is the squared logarithmic return on day $t$ for interval $n$ ($n = 1, 2, \ldots, N$). Table 1 summarizes descriptive statistics for daily returns over the full sample. Panel A reports the mean, standard deviation, skewness, and excess kurtosis, while Panel B tabulates the autocorrelation coefficients and the Ljung and Box (1978) statistic for autocorrelation at the first five lags, with the p-values for the Ljung-Box statistic reported in parentheses. The null hypothesis of no autocorrelation is strongly rejected. In figure 2, we plot the autocorrelation of daily realized volatility for up to 30 lags for all assets.
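The realized volatility construction above can be sketched with pandas; the simulated 5-minute price path is illustrative:

```python
import numpy as np
import pandas as pd

def realized_variance(prices):
    """Daily realized variance: sum of squared 5-minute log returns
    within each calendar day (Andersen and Bollerslev 1998)."""
    r = np.log(prices).diff()
    return (r ** 2).groupby(r.index.date).sum()

# toy 5-minute price path spanning three calendar days
idx = pd.date_range("2012-12-07 09:00", periods=576, freq="5min")
rng = np.random.default_rng(4)
prices = pd.Series(
    100 * np.exp(np.cumsum(0.0005 * rng.standard_normal(576))), index=idx)
rv = realized_variance(prices)        # one value per day
daily_vol = np.sqrt(rv)               # realized volatility
```

In practice the grouping would respect trading sessions rather than raw calendar dates, but the aggregation logic is the same.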

Empirical analyses
In table 2, we summarize the RMSFE of realized volatility forecasts from the Hybrid, HSY, and EL models over six forecast horizons for all assets. In this comparison of point forecasts, the Hybrid model performs very strongly and dominates the other models in producing the smallest RMSFE, usually by a wide margin, except on two occasions: the HSY is the best-performing model for GBP/EUR over the shortest forecasting horizon and for the CSI index over the horizon between 1 and 100 days. Our results are consistent with evidence in the literature that artificial neural networks produce significantly more accurate forecasts when applied to deseasonalized data (Nelson et al. 1999, Zhang and Qi 2005).
We conduct the Diebold and Mariano (1995) pairwise comparison of forecast accuracy between the volatility models and report the heteroskedasticity- and serial correlation-adjusted t-statistics in table 3. A positive t-statistic indicates that the model in the row is preferred to the model in the column, while a negative t-statistic suggests that the model in the column is preferred. The out-of-sample period is from 8 December 2012 to 15 August 2015 for the exchange rates and from 1 August 2008 to 29 September 2017 for the CSI 300 index, and the forecast horizon runs from day t + τ1 to day t + τ2. Between the Hybrid and HSY models, the Hybrid model is always preferred, with significant t-statistics. It is also often preferred when compared with the EL model, especially over longer horizons. The forecasting performances of the HSY and EL models are statistically similar.

We implement the superior predictive ability (SPA) test of Hansen (2005) and tabulate the stationary bootstrapped p-values, obtained via 1000 replications, in table 4.† When the Hybrid is the benchmark model, the null hypothesis that the benchmark is not inferior to any of the competing models cannot be rejected and carries a high p-value.

In table 5, we undertake the likelihood ratio tests formulated in Christoffersen (1998) and report the p-values for the null hypotheses that, at the 5% level, the volatility forecasts exhibit correct unconditional coverage (LR_uc), are independent (LR_ind), and exhibit correct conditional coverage (LR_cc), respectively. Not surprisingly, the Hybrid model passes the tests comfortably, with the highest p-values, between 0.62 and 0.87. This once again attests to the accuracy and adequacy of the volatility forecasts of our proposed model. For the HSY and EL models, the null hypotheses are not rejected either, but with much lower p-values, typically between 0.30 and 0.53.

† We aggregate volatility forecasts over all six horizons for each asset in order to provide a comprehensive and clear picture.
To summarize, in our baseline analysis we conduct a number of statistical tests to evaluate the out-of-sample performance of our neural network enhanced model and alternative two-component models. Our proposed model substantially outperforms the others in these econometric tests.

Forecasting error analysis
What drives the superior forecasting performance of our proposed model? To obtain a better understanding of the differences in the empirical performance between the models, we break down and analyze the forecasting errors for each component across the three models.
In figure 3, we plot in Panels (a) and (b) the average absolute percentage error (APE) of the short- and long-term components, respectively, for EUR/USD for the three models. In Panel (a), the Hybrid model exhibits the smallest forecasting error for the short-term component, followed by the EL model, while the HSY on average produces the largest errors. Nevertheless, the errors for the short-term component are not too different across the models,† typically lying between 10% and 35%. Panel (b), however, shows a very different pattern. Over the two shortest horizons of up to 20 days, all models produce quite accurate forecasts with an APE of less than 1%. But as we move to longer horizons, we see a massive difference in the APE for the long-term component: for the Hybrid model the APE stays below 2%, whereas for the EL and the HSY it rises to 23% and 28%, respectively. So it is the precision of the long-term component that drives the distinct performance of the models, and this underlines the prowess of the neural network in capturing the trend component in time series data.‡ In table 6, we report the average APE in percent for all test assets. The same pattern emerges: our neural network enhanced model shows an extraordinary ability to capture the trend, producing an average APE of at most 1.47% even for the 16-month horizon, while for the other two models the average APE lies between 15% and 32% for the long-term component over the longest forecasting horizon.

† We also conduct the Mincer and Zarnowitz (1969) regression, which shows the ability of the forecasted volatility to explain the true volatility proxy. We find that the R² decreases markedly with increasing forecast horizons for all models, as expected. However, our proposed Hybrid model still shows 30% to 70% explanatory power for the longest forecast horizon and comes out stronger than the competing models. These results are available upon request from the authors.

Alternative volatility proxy
We perform a number of robustness tests to show that our baseline results are not driven by specific modeling choices. First, instead of using realized volatility, we adopt the intraday range-based measure for volatility modeling and as the proxy for true volatility. The range-based measure is robust to microstructure noise and has received renewed interest in the literature in recent years (Alizadeh et al. 2002, Engle and Gallo 2006). It is specified as follows:
$$\sigma_t^2 = \frac{\big(\ln P_t^H - \ln P_t^L\big)^2}{4\ln 2},$$
where $P_t^H$ and $P_t^L$ are the highest and lowest prices on day $t$, respectively. Table A1 in the online appendix reports the summary statistics for the daily range-based volatility over the full sample.
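A range-based daily variance of this Parkinson type can be computed in one line; the scaling constant and the toy high/low prices below are our own illustrative assumptions:

```python
import numpy as np
import pandas as pd

def range_variance(high, low):
    """Parkinson-style daily variance from the intraday high and low;
    the 1/(4 ln 2) scaling is the standard choice that makes the
    squared log range comparable to a squared daily return."""
    return np.log(high / low) ** 2 / (4 * np.log(2))

# toy daily highs and lows
high = pd.Series([101.2, 102.5, 100.9])
low = pd.Series([99.8, 100.7, 99.5])
sigma2 = range_variance(high, low)
```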

Alternative loss function for volatility prediction accuracy
Our second robustness check adopts the QLIKE loss function between the true and forecasted volatility, defined as
$$\mathrm{QLIKE} = \frac{\sigma^2}{\hat\sigma^2} - \ln\frac{\sigma^2}{\hat\sigma^2} - 1,$$
where $\hat\sigma^2$ and $\sigma^2$ are the variance forecast and the true variance proxy, respectively. This loss function, robust when the proxy is unbiased but imperfect, is recommended by Patton (2011) and Bollerslev et al. (2016b). In table 8, we report the heteroskedasticity- and serial correlation-adjusted t-statistics for the Diebold and Mariano (1995) pairwise comparison with QLIKE as the loss function. We find that the Hybrid model is always preferred to the HSY and EL models.

This table reports the heteroskedasticity- and serial correlation-adjusted t-statistics for the Diebold and Mariano (1995) pairwise comparison of out-of-sample forecasts when the long-term component is modeled by the ARNN (Hybrid), the LSTM (Hybrid-LSTM), or the GRU (Hybrid-GRU). A positive t-statistic indicates that the model in the row is preferred to that in the column, and a negative t-statistic indicates that the model in the column is preferred. The out-of-sample period is from 8 December 2012 to 15 August 2015 for the exchange rates and from 1 August 2008 to 29 September 2017 for the CSI 300 index. The forecast horizon is from day t + τ1 to day t + τ2.
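Patton's QLIKE loss, as used in the second robustness check, can be sketched numerically with made-up variance values:

```python
import numpy as np

def qlike(sigma2_true, sigma2_fcst):
    """Patton (2011) QLIKE loss: zero when the forecast equals the
    true variance and positive otherwise."""
    ratio = sigma2_true / sigma2_fcst
    return ratio - np.log(ratio) - 1

sigma2_true = np.array([1.0, 2.0, 0.5])
perfect = qlike(sigma2_true, sigma2_true)      # zero loss everywhere
biased = qlike(sigma2_true, 2 * sigma2_true)   # positive loss
```

Unlike squared error, QLIKE penalizes under-prediction of variance asymmetrically, which is why it is favoured when the volatility proxy is noisy.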
In our third robustness check, we replace the ARNN with the LSTM and the GRU when modeling the long-term component, and the Diebold and Mariano (1995) pairwise comparison results are reported in table 10. Consistent with our conjecture, the performance of these three hybrid models is statistically comparable, as the t-statistics are always insignificant.
To summarize, the robustness tests further corroborate the baseline findings that the neural network enhanced volatility model outperforms the two-component models of Harris et al. (2011) and Engle and Lee (1999), producing more accurate volatility predictions over different horizons for all test assets.† This attests to the validity of the proposed framework: neural network enhanced component volatility models generate more precise volatility predictions.

Economic value of volatility forecasts
A strong statistical performance does not necessarily indicate superior economic significance in the out-of-sample exercise. We therefore analyze the economic value of the volatility forecasts, assuming a mean-variance utility investor who allocates her wealth between a risk-free asset and one of the four exchange rates or the stock index. We follow Wang et al. (2016) and construct the utility function as
$$U_t(r_t) = w_t r_t + r_{t,f} - \frac{\gamma}{2} w_t^2 \sigma_t^2,$$
where $w_t$ is the weight of the risky asset in the portfolio, $r_t$ is the return to the risky asset in excess of the risk-free rate $r_{t,f}$, and $\gamma$ denotes the level of risk aversion. We maximize the utility function $U_t(r_t)$ with respect to the weight $w_t$ and obtain the ex ante optimal weight on day $t+1$:
$$\hat w_t = \frac{1}{\gamma}\frac{\hat r_{t+1}}{\hat\sigma_{t+1}^2},$$
where $\hat r_{t+1}$ and $\hat\sigma_{t+1}^2$ are the mean and volatility forecasts, respectively, of the excess returns. In our study, risk-free rates come from the 3-month US Treasury bill or the 3-month Chinese national bond yield.

† We have conducted yet another robustness check by adopting the low-pass Hodrick and Prescott (1997) filter to see whether the performance of the Hybrid model is due to our choice of decomposition method, i.e. the wavelet method. We obtain qualitatively similar results: the model with Hodrick-Prescott filtering, with the long-/short-term components still modeled by the artificial neural network and the ARMA process, respectively, outperforms the HSY and EL models. The results are available from the authors upon request.
Following Rapach et al. (2010) and Wang et al. (2016), we use the historical average as the mean forecast for returns, $\hat r_{t+1} = \frac{1}{t}\sum_{j=1}^{t} r_j$. Hence, for each level of risk aversion $\gamma$, the optimal portfolio weight $\hat w_t = (1/\gamma)(\hat r_{t+1}/\hat\sigma_{t+1}^2)$ is determined solely by the volatility forecasts, as the different strategies share the same mean forecasts of returns. We use the Sharpe ratio (SR),
$$\mathrm{SR} = \frac{\hat\mu_p}{\hat\sigma_p},$$
and the certainty equivalent return (CER),
$$\mathrm{CER} = \hat\mu_p - \frac{\gamma}{2}\hat\sigma_p^2,$$
to evaluate portfolio performance, where $\hat\mu_p$ and $\hat\sigma_p$ are the mean and standard deviation of portfolio excess returns over the out-of-sample period, and $\hat\sigma_p^2$ is the corresponding variance. For robustness, we adopt $\gamma = 3$, 6, and 9 to represent different levels of risk aversion.
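The weight rule and the two performance measures can be sketched as follows; the simulated return series and the constant stand-in for the volatility forecasts are illustrative placeholders, not the paper's data:

```python
import numpy as np

def optimal_weight(mu_hat, sigma2_hat, gamma):
    """Mean-variance optimal risky weight w = mu_hat / (gamma * sigma2_hat)."""
    return mu_hat / (gamma * sigma2_hat)

def sharpe_and_cer(excess_returns, gamma):
    """Per-period Sharpe ratio and certainty equivalent return
    (no annualization applied)."""
    mu = excess_returns.mean()
    sr = mu / excess_returns.std(ddof=0)
    cer = mu - 0.5 * gamma * excess_returns.var(ddof=0)
    return sr, cer

rng = np.random.default_rng(5)
r = 0.0003 + 0.01 * rng.standard_normal(750)   # risky excess returns
# expanding-window historical mean forecasts; a constant stands in
# for the model-based volatility forecasts
mu_hat = np.array([r[:t].mean() for t in range(250, 750)])
sigma2_hat = np.full(500, r.var())
w = optimal_weight(mu_hat, sigma2_hat, gamma=3)
port = w * r[250:]                              # portfolio excess returns
sr, cer = sharpe_and_cer(port, gamma=3)
```

Swapping `sigma2_hat` for forecasts from competing volatility models, while holding `mu_hat` fixed, reproduces the logic of the comparison: any difference in SR or CER is attributable to the volatility forecasts alone.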
In table 11, we report the annualized average return, the Sharpe ratio, and the CER of the portfolios constructed using volatility forecasts from the different models. When the investor is assumed to have a relatively low level of risk aversion, γ = 3, the portfolio achieves an annual return of 6.63% when volatility forecasts are generated by the Hybrid model for EUR/USD, with a Sharpe ratio of 0.37 and a certainty equivalent return slightly above 3%. These figures are evidently higher than when the forecasts are generated by the competing models. As the investor becomes more risk averse with a larger γ, she assigns a greater weight to the risk-free asset, which lowers the portfolio return, Sharpe ratio, and CER. Nevertheless, the same pattern persists: the portfolios formed using the Hybrid volatility forecasts offer higher returns and risk-adjusted returns, substantiating the economic value of the volatility forecasts from our proposed model.

This table reports the annualized excess return, the Sharpe ratio (SR), and the certainty equivalent return (CER) of portfolios constructed from one of the exchange rates or the stock index. Volatility forecasts are generated from the different component models, and γ represents the investor's risk aversion level.

Meanwhile, given the inextricable link between the macroeconomy and financial market volatility, especially its long-term component (see Bloom 2009, Engle et al. 2013, Conrad et al. 2014, Chiu et al. 2018, for example), we perform a vector autoregression to explore how macroeconomic conditions affect the persistent component of exchange rate volatility. In particular, we examine the impulse responses of the long-term component of EUR/USD and GBP/USD to changes in two important US macroeconomic variables: GDP growth and the Federal funds rate. We find that for both exchange rates, long-term volatility trends down over the next 5 months after a disturbance to GDP growth but picks up in the medium run, whereas a shock to the Federal funds rate increases the long-term component of volatility over the next 5 to 10 months.†

Conclusion
The evidence that conditional volatility dynamics comprise both a long-term trend component and a strongly oscillating short-run component has crucial implications for volatility forecasting over both short and long horizons. In this paper, we build upon and extend the two-component volatility literature and develop a novel neural network enhanced volatility component model. We first decompose daily realized volatility into long- and short-run components using the wavelet transform, implementing the ADF test to choose the appropriate decomposition level and to ensure that the stationarity assumption for the short-run component is satisfied. We then model the long- and short-run components separately with an artificial neural network and an ARMA model, respectively. The model is empirically evaluated using data on four exchange rates and a stock index over a number of forecast horizons. We compare the RMSFE and perform statistical evaluations such as the Diebold and Mariano (1995) test, the SPA test of Hansen (2005), and the interval forecast evaluation of Christoffersen (1998). In the out-of-sample comparison of volatility predictions generated by our proposed model and popular alternative models, we provide strong and robust evidence that our neural network enhanced model significantly outperforms the competing models. In economic terms, the volatility forecasts from the Hybrid model offer improved economic value to a mean-variance utility investor.

† These results are not reported to conserve space. They are available upon request from the authors.