Option valuation under no-arbitrage constraints with neural networks



Introduction
Option valuation is an extensively studied area in the financial economics literature. Since the seminal work of Black and Scholes (1973), many pricing models have been developed that relax the restrictive assumptions of the Black and Scholes (BS) model and advance our understanding of options and the markets in which they are traded. Well-known parametric models include the stochastic volatility model of Heston (1993), the jump-diffusion model of Merton (1976), the stochastic volatility with jumps models of Bates (1996) and Bakshi et al. (1997), and the double-jump model of Eraker et al. (2003). In parallel, nonparametric specifications such as the GARCH option pricing model of Duan (1995), the spline-based method of Bliss and Panigirtzoglou (2002), and the quadratic approximation of Jackwerth and Rubinstein (1996) have been formulated. From different perspectives, these models try to accommodate the stylized volatility smile with different levels of empirical success.
Option valuation is also a topical issue in the data science literature, as sophisticated numerical methods make it possible to achieve high pricing and forecasting precision (see Dugas et al., 2001; Garcia and Gencay, 2000; Gradojevic et al., 2009; Liu et al., 2019, for example). Among data science methods, the neural network has been increasingly adopted in option valuation and hedging since the early effort of Hutchinson et al. (1994) (see Buehler et al., 2019; Culkin and Das, 2017; Dugas et al., 2009; Liang et al., 2009; Yao et al., 2000, for example). Ruf and Wang (2020) offer an excellent and up-to-date literature review.
Motivated by the above strands of the literature, in this paper we propose an economically meaningful hybrid gated neural network (hGNN) based option valuation model. We start from both the neural network architecture and six necessary and sufficient no-arbitrage constraints in the option valuation theory. We select the softplus function as the activation function in hidden neurons and construct a multiplicative structure to maintain its differentiability. Meanwhile, the slope and weights of the linear function in input layers must satisfy the no-arbitrage constraints. We finalize the structure with a pre-processing module before the input layer. This pre-processing module also meets economic constraints and is added to enhance the input-output mapping capability.
Furthermore, we build a separate neural network for modeling and predicting latent option-implied volatilities, an essential input in option pricing models. Hence, our model is called a hybrid gated neural network (hGNN) based model.
The model design contributes to the financial economics literature as well as the data science literature. Our first contribution is that by taking into account the essential no-arbitrage constraints, our model attaches economic meaning to the neural network architecture. Quite often, data science methods start with specifically designed architectures, e.g., a number of neuron layers or an infinite-dimensional hyperplane, and treat option pricing as a nonlinear and complex regression problem. The mapping between the input and the output is learned from a large amount of data. These models usually aim at building complex neural network algorithms to enhance the learning capability for improved mapping performance (Hutchinson et al., 1994; Gradojevic et al., 2009). Hence, they are often compared to a black box, indicating that the economic interpretability and internal functionality of these models are opaque to most users (Knight, 2017a,b; McNelis, 2005).
More recently, a burgeoning theme has emerged in this area as studies implement an economic hint or select particular activation functions to satisfy economic constraints, thus offering economic interpretability (see Gilpin et al., 2018; Guidotti et al., 2018; Mudrakarta et al., 2018; Ribeiro et al., 2016; Yang et al., 2017, for example). Our paper builds upon and extends this strand of the literature as we introduce economic intuition into neural network models, a traditionally technical and data-driven approach. As an important aspect of this contribution, our proposed hGNN model allows us to obtain analytical expressions for European option Greeks, and we believe our paper is the first to do so. These analytical Greeks would help option traders design and implement trading and hedging strategies, and they underscore the practicality of our model.
We also contribute to the literature by offering comprehensive empirical evidence that our hGNN model outperforms popular neural network based models as well as economically motivated models such as those featuring stochastic volatility and jumps in underlying asset returns. The benchmarks include a deep neural network (dNN) model, the best-performing specification in Andreou et al. (2008, 2010) (AnNN), a stochastic volatility model, a stochastic volatility model with stochastic interest rates, and a stochastic volatility model with jumps. The three stochastic volatility based models come from the seminal work of Bakshi et al. (1997). Our sample includes more than 2 million S&P 500 options over 952 trading days from 22 May 2014 to 2 March 2018. We implement a rolling scheme to produce 7- and 30-day ahead option price forecasts, which are evaluated by two loss functions. Furthermore, we perform a delta hedging exercise based on 7- and 30-day re-balancing frequencies.
Our empirical results show that the hGNN model performs better in generating more precise option price forecasts and smaller errors in the hedging exercise. This is the case regardless of forecasting horizons, put or call option type, option moneyness, time to maturity, or the loss function.
What drives this superior performance? To address this question, we form two groups of options. The first group includes the most mispriced options according to the dNN model, and the second group contains a randomly selected sample of the remaining options. We observe that options in Group 1, which the dNN model has a hard time predicting, tend to be short-dated ones with extreme moneyness. Because our hGNN model is constructed based on no-arbitrage conditions, including option boundary conditions, and is trained with synthetic prices for thinly traded options, it substantially outperforms the other two neural network based models in predicting options in Group 1. This underscores the empirical strength of our model.

The rest of the paper is structured as follows. In Section 2, we provide the methodological and empirical considerations that motivate our specific model design. Section 3 discusses the model structure, introduces no-arbitrage constraints in the option pricing literature, proves that our model satisfies these constraints, and constructs a separate neural network for modeling option-implied volatilities. In Section 4, we derive analytical expressions for European option Greeks based on the hGNN model, and outline a delta-hedging strategy. Section 5 conducts empirical analyses and robustness tests using S&P 500 options. Finally, Section 6 concludes.

Design motivation
The architecture and algorithm of our hGNN model are motivated by the following considerations.
1. Empirically, Gradojevic et al. (2009) show that neural network based models usually perform poorly for deep OTM and very long- and/or short-dated options. The paper addresses this issue by grouping options according to their moneyness and maturity, and constructing separate models for each group. Although this leads to improved empirical performance, the grouping is static and cannot adapt to changing market conditions. Furthermore, the paper does not cover very deep out-of-the-money call options and the algorithm is computationally cumbersome. More recently, Yang et al. (2017) design a neural network architecture with selected no-arbitrage conditions for European call options but do not consider European put options or important boundary conditions. This leaves a gap in the literature.
2. As a key input for option valuation, the volatility is a latent variable that needs to be proxied. In an unreported empirical examination, we follow the hybrid neural network model in Andreou et al. (2008) and find that using the BS implied volatilities instead of realized volatilities as the volatility proxy significantly improves option pricing accuracy. This motivates us to build a separate structure for predicting option-implied volatilities.
3. To the best of our knowledge, analytic expressions for European option Greeks, i.e., the partial derivatives of option prices with respect to underlying asset price, volatility, strike price, and time to maturity, have not been derived for neural network based models. This contributes to the black box criticism of this type of model (Knight, 2017a,b).
4. Most importantly, the economic interpretation of option pricing models represents the biggest gap between theory-motivated models in the finance literature and data science based models. The latter may fit the data very well due to their data-driven and data-intensive nature but fall short of offering economic intuition. Our paper represents a step towards bridging this gap.

Model construction
In this section, we first briefly outline the no-arbitrage constraints for European options. We then develop our hGNN model and show that it fully satisfies the constraints. Finally, a separate neural network is constructed for modeling and predicting implied volatilities.

No-arbitrage constraints
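In financial economics theory, there exists a risk-neutral probability measure $Q$ under which the discounted asset price is a martingale (Delbaen and Schachermayer, 1994; Cochrane, 2001). This allows us to write the call option price as follows:

$$C(K, S_t, \tau, \sigma_t, r_t) = e^{-r_t \tau}\, \mathbb{E}^{Q}_{t}\left[ (S_T - K)^{+} \right].$$

Likewise for put options we have the following:

$$P(K, S_t, \tau, \sigma_t, r_t) = e^{-r_t \tau}\, \mathbb{E}^{Q}_{t}\left[ (K - S_T)^{+} \right],$$

where $\mathbb{E}^{Q}_{t}$ denotes the expectation under $Q$ conditional on information at time $t$, $K$ is the strike price, $S_t$ is the underlying asset price at time $t$, $T$ is the expiry date, $\tau = T - t$ is time to maturity, $\sigma_t$ is the volatility at time $t$, $r_t$ is the risk-free interest rate, and $C$ and $P$ are the call and put option prices, respectively.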
We follow Theorem 2.1 in Roper (2010) and the call option surface constraints in Fengler and Hin (2015) to estimate a continuous function of option prices. Thus, option prices $C$ and $P$ and the variables $(K, S_t, \tau, \sigma_t, r)$ observed at time $t$ are subject to the following constraints:

(c1) Convexity in $K$. Both $C$ and $P$ are convex in $K$ for $\tau \ge 0$. $C$ is monotonically non-increasing in $K$, whereas $P$ is monotonically non-decreasing in $K$. Hence, $\partial C / \partial K \le 0$ and $\partial P / \partial K \ge 0$.

Model design under constraints
Existing neural network based option valuation models usually adopt the traditional three-layer architecture: an input layer with $N$ input variables, a hidden layer with $H$ neurons, and an output layer with a single neuron. Each hidden neuron includes a certain type of activation function: either a sigmoid function as in Gradojevic et al. (2009), or a hyperbolic tangent function as in Andreou et al. (2008, 2010). These models achieve good empirical performance by utilizing the input-output mapping capability of a neural network, or a stack of neural networks, but pay little attention to the no-arbitrage constraints central to option valuation. Therefore, they perform poorly for options with extreme moneyness and very long- or short-dated maturities.

Our proposed model improves upon these. It also has a three-layer structure, but before the input layer we add a division module with three inputs $(K, S_t, \Psi)$ and one output, the option moneyness:

$$m = \left( \frac{K}{S_t} \right)^{\Psi},$$

where $\Psi$ is a call/put indicator: it is 1 for call options and $-1$ for put options. This design accommodates both call and put options with $m = K/S_t$ and $m = S_t/K$, respectively. The input layer has $N_i = 4$ input variables $(m, \tau, \sigma_t, r)$, which are called features in the data science literature.
In contrast to Andreou et al. (2010) and Das and Padhy (2017), we use the same input variables as the Black-Scholes model, for two reasons. First, our model aims to solve the canonical option valuation problem and our approach is the same as those in the traditional finance literature. Second, using these variables as the input makes it possible for our proposed model to provide analytical expressions for the Greeks.
The hidden layer of our model consists of $N_h$ neurons, and each neuron contains a gated activation function following the architecture in Memisevic (2013) and Sigaud et al. (2015). This gated structure is an extension of the deep learning building block in Bengio (2013) and LeCun et al. (2015). It is particularly well-suited for multiplicative interactions between the input and output as in our case, and is selected to maintain the model's first- and second-order differentiability so that it satisfies no-arbitrage constraints (c1)-(c6). Finally, the output layer contains an additive linear function with one output variable called the target.
Based on the above considerations, our proposed GNN option valuation model $y(m, \tau, \sigma_t, r)$, illustrated in Figure 1, is given in Eq. (4), where $\sigma_+(\cdot)$ is the softplus function $\sigma_+(x) = \log(1 + e^x)$. The weights $(w^m_j, w^\tau_j, w^r_j, w^{\sigma_t}_j)$ and biases $(b^m_j, b^\tau_j, b^r_j, b^{\sigma_t}_j)$ are parameters to be estimated. The $+$ and $-$ in $(b^r_j \pm r e^{w^r_j})$ are for call and put options, respectively. The sign in each $\sigma_+(\cdot)$ function is designed according to specific constraints.
In the architecture of Figure 1, the model has four input variables: moneyness $m$, time to maturity $\tau$, volatility $\sigma_t$, and interest rate $r$. Each input variable is directly connected to $N_h$ softplus activation functions, and their outputs are directly connected to $N_h$ multiplication gates. Thus, each multiplication gate has inputs from four softplus activation functions corresponding to the four input variables. The outputs of the $N_h$ multiplication gates are aggregated in an addition gate that generates the final output. Hence, the output $y(m, \tau, \sigma_t, r)$ can be expressed as in Eq. (4).

Our proposed model exhibits two clear advantages compared with existing models in the literature. First and very importantly, from the option pricing perspective, our model integrates the no-arbitrage constraints as a prior to support the logic of option valuation. Hence, the model goes beyond a large-scale connection of neurons and is able to reflect option pricing theories.
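To make the gated architecture concrete, the following is a minimal sketch rather than the model of Eq. (4) itself. It assumes each hidden unit multiplies four softplus terms of the form $\sigma_+(b \pm x\,e^{w})$, with a negative slope on moneyness and positive slopes on the other inputs (the call-option case), and it omits any output-layer weights. All function and variable names are hypothetical. The analytic derivative uses the fact that the derivative of the softplus is the logistic sigmoid.

```python
import math
import random

def softplus(x):
    # Numerically stable log(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    # Logistic sigmoid, the derivative of softplus.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# ASSUMPTION: sign of the slope on each input (call-option case).
SIGNS = {"m": -1.0, "tau": 1.0, "sigma": 1.0, "r": 1.0}

def init_params(n_hidden, seed=0):
    # One (bias b, log-slope w) pair per input and hidden unit.
    rng = random.Random(seed)
    return [{k: (rng.gauss(0, 1), rng.gauss(0, 1)) for k in SIGNS}
            for _ in range(n_hidden)]

def gate_inputs(unit, x):
    # b + sign * x * exp(w): exp(w) keeps the slope magnitude positive.
    return {k: unit[k][0] + SIGNS[k] * x[k] * math.exp(unit[k][1])
            for k in SIGNS}

def gnn_output(params, x):
    # Addition gate over the multiplication gates of all hidden units.
    total = 0.0
    for unit in params:
        z = gate_inputs(unit, x)
        total += math.prod(softplus(z[k]) for k in SIGNS)
    return total

def gnn_dy_dm(params, x):
    # Analytic derivative of the output with respect to moneyness m.
    total = 0.0
    for unit in params:
        z = gate_inputs(unit, x)
        others = math.prod(softplus(z[k]) for k in SIGNS if k != "m")
        slope = SIGNS["m"] * math.exp(unit["m"][1])
        total += slope * sigmoid(z["m"]) * others
    return total

params = init_params(n_hidden=5, seed=1)
x = {"m": 1.0, "tau": 0.25, "sigma": 0.2, "r": 0.02}
print(gnn_output(params, x), gnn_dy_dm(params, x))
```

Under these assumptions the output is positive, non-increasing in $m$, and convex in $m$ for any parameter values, because each factor is a positive softplus and only the moneyness factor depends on $m$. This is the kind of by-construction guarantee the section describes.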

Proof: Constraint (c1)
The derivative of the softplus function $\sigma_+(x)$ can be obtained as follows:

$$\frac{d\sigma_+(x)}{dx} = \frac{e^{x}}{1 + e^{x}} = \frac{1}{1 + e^{-x}}.$$

The function $1/(1 + e^{-x})$ is called the sigmoid, which can also be used as an activation function. We denote it by $\sigma_s(x) = 1/(1 + e^{-x})$, thus $\sigma_+'(x) = \sigma_s(x)$. In this way, constraint (c1) can be written as follows: Hence, $\partial y / \partial m \le 0$. Considering the definition of moneyness, we have the following for call options: Likewise for put options: Similarly, we can express the constraint (c2) for call and put options as follows: Thus, $\lim_{K \to \infty} C = 0$ for calls and $\lim_{K \to 0} P = 0$ for puts.

Proof: Constraint (c6)
Based on the proof for constraint (c1), we have the following: where $\sigma_s'(x) = \sigma_s(x)(1 - \sigma_s(x)) \ge 0$, thus $\partial^2 y / \partial m^2 \ge 0$ and we have the following: For call options, $\partial^2 m / \partial K^2 = 0$, thus: Likewise for put options: Since dividing both sides of an equation by a positive constant does not change its sign, we divide both sides of equation (14) by $S_t / K^3$ and let $F(y, m) = m\, \partial^2 y / \partial m^2 + 2\, \partial y / \partial m$ for $K > 0$ and $S_t > 0$. To determine the value of $F(y, m)$, we approximate it by the second-order Taylor expansion and obtain the following: Finally, to approximate the value of $F(y, m)$, we solve the two equations $m - a = 2$ and $m = (m - a)^2 / 2$ and obtain $a = 0$ and $m = 2$. Therefore, $F(y, m) \approx y(2) \ge 0$. This completes the proof of constraints (c1), (c2), (c3) and (c6).
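The appearance of $F(y, m)$ in the put-option case is consistent with a direct application of the chain rule. As a sketch, assuming $y$ depends on $K$ only through the put moneyness $m = S_t / K$ with the other inputs held fixed (the mapping from $y$ to the put price itself is not reproduced here):

```latex
\begin{aligned}
\frac{\partial m}{\partial K} &= -\frac{S_t}{K^2},
\qquad
\frac{\partial^2 m}{\partial K^2} = \frac{2 S_t}{K^3}, \\
\frac{\partial^2 y}{\partial K^2}
  &= \frac{\partial^2 y}{\partial m^2} \left( \frac{\partial m}{\partial K} \right)^{2}
   + \frac{\partial y}{\partial m} \, \frac{\partial^2 m}{\partial K^2}
   = \frac{S_t^2}{K^4} \frac{\partial^2 y}{\partial m^2}
   + \frac{2 S_t}{K^3} \frac{\partial y}{\partial m} \\
  &= \frac{S_t}{K^3} \left( m \, \frac{\partial^2 y}{\partial m^2}
   + 2 \, \frac{\partial y}{\partial m} \right)
   = \frac{S_t}{K^3} \, F(y, m).
\end{aligned}
```

Since $S_t / K^3 > 0$, the sign of $\partial^2 y / \partial K^2$ is the sign of $F(y, m)$, which is why the argument can focus on $F$ alone.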
The output layer contains one neuron for the final estimated option prices $\hat{C}$ and $\hat{P}$ as follows: where $y(m, \tau, \sigma_t, r)$ is the output value of the GNN.
To summarize, our GNN is able to map four features, i.e., the moneyness $m$, time to maturity $\tau$, volatility $\sigma_t$, and the interest rate $r$, to the target option price $y(m, \tau, \sigma_t, r)$, while ensuring that important no-arbitrage constraints are satisfied.

Boundary conditions
Usually the output bound of regression-type applications of a neural network is achieved by scaling or normalizing the target variables rather than modifying the model structure, as the structure is determined by data. The downside of this data-driven approach is manifested in the poor pricing performance for deep OTM and extremely short- or long-dated options (Andreou et al., 2008, 2010; Gradojevic et al., 2009). This underlines the importance of satisfying the option boundary constraints (c4) and (c5).
The options on the boundary are those with strike prices approaching zero, $S_t$, or infinity, and those very close to maturity. These options are thinly traded in the market, and this lack of data undermines data science models. To cope with this, we synthesize prices for these options based on available market data and use the synthesized prices as hints for our model. This approach was first developed in Abu-Mostafa (1993, 1994) and has become a popular way of compensating for an imbalanced dataset in the literature (Barua et al., 2014; Chawla et al., 2002; Galar et al., 2012; Khoshgoftaar et al., 2011). Abu-Mostafa (1995), Cao et al. (2015), Cao et al. (2016), and Garcia and Gencay (2000), in particular, implement the method with financial data.
For every $\tau$, we synthesize virtual option prices with $K = 0$, and for every $K$, we synthesize virtual option prices with $\tau = 0$. For ATM options with $K = S_t$, we synthesize virtual calls that are slightly ITM, and obtain put prices via the put-call parity. This follows Song and Xiu (2016), who compensate for the low trading volume of ITM calls by obtaining call prices from OTM puts.
Options with $K = \infty$ do not exist. We synthesize prices for these options using the Black-Scholes model with an almost zero call option price. We take advantage of the precision of modern computers and take $2^{-126}$, the smallest positive normal number in the single-precision floating-point format (IS Committee, 2008), as the option price for iteratively calculating the strike price. For example, to synthesize a call option with $\tau = 15$, we obtain the strike price $K$ by solving $2^{-126} = BS(K, S_t = 4973.07, \tau = 15, r = 0.02, \sigma_t = 0.296)$, where the implied volatility $\sigma_t = 0.296$ corresponds to the strike price $K = 5700$. Due to the convexity constraint, we solve this by the traditional Newton-Raphson method, and obtain $K = 11087.52$ and $C = 2.003 \times 10^{-39}$. Likewise, for each $\tau$, we synthesize a call option with almost zero price and infinite strike price. All alternative models in this paper are trained with the same dataset that includes market data and the prices for these synthesized options on the boundary.
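The strike search described above can be illustrated with a short sketch. It is not the authors' implementation and is not expected to reproduce the reported $K$ exactly: it assumes the textbook Black-Scholes call formula without dividends, interprets $\tau = 15$ as 15 calendar days (15/365 years), and replaces Newton-Raphson with plain bisection, which relies only on the call price being decreasing in $K$. The function names are hypothetical.

```python
import math

TARGET = 2.0 ** -126  # smallest positive normal single-precision number

def norm_cdf(x):
    # Standard normal CDF via erfc, accurate far into the left tail.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bs_call(K, S, tau, r, sigma):
    # Textbook Black-Scholes call price (ASSUMPTION: no dividends).
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / vol
    d2 = d1 - vol
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def strike_for_price(target, S, tau, r, sigma, hi_mult=100.0, iters=200):
    # Bisection on K in [S, hi_mult * S]; the call price falls as K rises.
    lo, hi = S, hi_mult * S
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bs_call(mid, S, tau, r, sigma) > target:
            lo = mid  # price still too high, so move the strike up
        else:
            hi = mid
    return 0.5 * (lo + hi)

# ASSUMPTION: tau = 15 is read as 15 calendar days.
S, tau, r, sigma = 4973.07, 15 / 365, 0.02, 0.296
K = strike_for_price(TARGET, S, tau, r, sigma)
print(K, bs_call(K, S, tau, r, sigma))
```

Bisection is slower than Newton-Raphson but needs no derivative and cannot overshoot, which is convenient when the target price sits dozens of orders of magnitude below the spot price.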
To train our hGNN model, we use the stochastic gradient-based optimization algorithm of Kingma and Ba (2017) with the mean absolute percentage error (MAPE) as the loss function. The R package GradDescent is obtained from Wijaya et al. (2018). We follow the empirical study of Zhou et al. (2016) and set the number of hidden neurons to $N_h = 100$, the number of epochs to 100, the batch size to 100, and the learning rate to $10^{-3}$ (see also Cheng et al., 2020; Yu et al., 2019).

Modeling implied volatilities
Volatility is an essential input to option pricing models. In the literature, modeling option-implied volatilities follows two main directions. For deterministic volatility functions (DVF), implied volatilities are usually expressed as a function of option moneyness, time to maturity, and lagged implied volatilities (Andreou et al., 2010, 2014; Chalamandaris and Tsekrekos, 2014). Meanwhile, Dunis et al. (2013) and Konstantinidi et al. (2008) incorporate economic variables, such as the yield curve slope, interest rate, and stock index returns, in a vector autoregression (VAR) to obtain volatility forecasts. In these studies, the determinants $m$, $\tau$ and $IV^{m,\tau}_{t-l}$ enter either linearly through AR or VAR models (Dunis et al., 2013), or nonlinearly as $m^2$, $\tau^2$, or $m\tau$ in the DVF. The implicit assumption is that implied volatilities are well described by these determinants via a pre-defined relation.
We extend these studies in two ways. First, data-wise, we implement the principal component analysis (PCA) to extract the most relevant information from the determinants while reducing the computational complexity; second, methodologically, we adopt a deep neural network for a data-defined nonlinear model with two modules. We assume that implied volatilities $IV^{m,\tau}_{t}$ are determined by moneyness $m$, time to maturity $\tau$, and lagged volatilities $IV^{m,\tau}_{t-l}$, $l \in \{1, \ldots, L\}$, where $L = 10$ is considered the optimal VAR lag in Dunis et al. (2013). Hence, we stack the input vectors for all available $m_i$ and $\tau_j$ ($i = 1, \ldots, I$ and $j = 1, \ldots, J$) on day $t$ into a matrix of input data for day $t$, and the final data consist of $N$ such matrices from day $t$ to $t - N + 1$.
Our implied volatility model includes two modules. For the first module, we implement the PCA to extract the five most important components, $PC_p$, $p = 1, \ldots, 5$, that are able to explain most of the variation in volatility (Kolanovic and Krishnamachari, 2017). The corresponding principal component scores $PS_p$, $p = 1, \ldots, 5$, are used to predict implied volatilities. The second module is a multi-layer feed-forward artificial neural network, also termed the deep neural network (Fischer and Krauss, 2018; Krauss et al., 2017). It has one input layer with five neurons corresponding to the five principal components, two hidden layers with 30 and 10 neurons, respectively, and an output layer with a linear transfer function. Each neuron in the hidden layers contains a hyperbolic tangent sigmoid activation function. To control overfitting, we perform regularization with an input dropout ratio of 0.2 and a hidden dropout ratio of 0.5 (Hinton et al., 2012; H2O, 2018). We train this model for 200 epochs, i.e., 200 passes over the training dataset.
To forecast one-day ahead implied volatilities on day $t+1$ from $m_i$ and $\tau_j$, we construct the input vector $X_{t+1}$, and multiply it with the $[12 \times 5]$ principal component coefficient matrix to obtain the principal component scores $PS_{t+1}$.

To predict $N$-day ahead implied volatilities, a daily rolling scheme is performed. We first predict implied volatilities on day $t+2$ with data from $t-8$ to $t+1$, whereby volatilities on day $t+1$ are forecasted; we then predict implied volatilities on day $t+3$ with data from $t-7$ to $t+2$, whereby volatilities on day $t+1$ and day $t+2$ are forecasted; and so forth. If $N > 10$, the forecasts are based entirely on forecasted implied volatilities. Although this rolling scheme potentially deteriorates the precision of long-term volatility forecasts, our empirical results exhibit strong performance.
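The daily rolling scheme can be sketched as a loop that feeds each forecast back into a ten-day lag window. This is only an illustration of the recursion: `predict` is a hypothetical placeholder for the PCA-plus-network predictor, and the function name and return values are assumptions made for the example.

```python
from collections import deque

def rolling_forecast(history, predict, n_ahead, window=10):
    """Iterated multi-step forecast with a fixed-length lag window.

    history : observed values, oldest first, at least `window` long
    predict : callable mapping a list of `window` lags to the next value
              (hypothetical stand-in for the trained volatility model)
    Returns the forecasts and, for each step, how many of the lags
    used were observed data rather than earlier forecasts.
    """
    lags = deque(history[-window:], maxlen=window)
    forecasts, n_observed = [], []
    observed_left = window
    for _ in range(n_ahead):
        n_observed.append(observed_left)
        nxt = predict(list(lags))
        forecasts.append(nxt)
        lags.append(nxt)  # oldest lag drops out, forecast enters
        observed_left = max(observed_left - 1, 0)
    return forecasts, n_observed

# Toy usage with a persistence predictor (tomorrow equals today).
iv_history = [0.18, 0.19, 0.20, 0.21, 0.22, 0.21, 0.20, 0.19, 0.20, 0.21, 0.22]
fc, used = rolling_forecast(iv_history, lambda lags: lags[-1], n_ahead=12)
print(fc[:3], used)
```

The second return value makes the point in the text explicit: from the eleventh step onward, no observed implied volatility remains in the window.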
To summarize, we first obtain the option-implied volatility $IV^{m,\tau}_{t}$ for day $t$ from moneyness $m$, option time to maturity $\tau$, the interest rate $r$, and implied volatilities $IV^{m,\tau}_{t-n}$, $n = 1, \ldots, 10$, from the previous 10 days. Once we know $IV^{m,\tau}_{t}$, it is used together with $m$, $\tau$, and $r$ to forecast option prices on day $t$.

Option Greeks
A main motivation for this paper is to develop the hGNN model in such a way that it offers analytical expressions for European option Greeks. This allows the model to lend itself readily to hedging strategies, and represents a key contribution of our paper. Given the structure of our model in equation (4), we first derive the option Greeks in this section. We then outline how they are used in hedging exercises.

Analytical option Greeks
Option ∆. The ∆ of a European call option is expressed as follows: where $m = K/S_t$, $\sigma_+(x) = \log(1 + e^x)$, and $\sigma_s(x) = 1/(1 + e^{-x})$. Likewise, the ∆ of a European put option can be written as follows:

Option ν. The European option $\nu_{\hat{C}}$ is given as follows:

Option Θ. The option $\Theta_{\hat{C}}$ can be expressed as follows:

Option ρ. The $\rho_{\hat{C}}$ of a European call option can be written as follows: Likewise for put options:

Hedging exercises
We are interested in the hedging performance of our hGNN model as options are an essential risk management tool for investors. We implement a conventional delta-neutral hedge following Bakshi et al. (1997): For the SV model, we hedge both the price and volatility risks with positions in the underlying asset and in a second option contract; for the SVSI model, we involve a bond for hedging the interest rate risk in addition to the hedging strategy of the SV model; for the SVJ model, due to the difficulty associated with stochastic jump sizes (Bates, 1996;Merton, 1976), we implement a partial hedge for which only the diffusion risks are neutralized but the jump risk is unhedged. For the hGNN model, we hedge the risks in the underlying price, volatility, and interest rate with positions in the underlying asset, a second option contract, and a bond.
Suppose we sell one call option with time to maturity $\tau$ and strike price $K$. We need to hedge the price risk with a position in $V_{S,t}$ shares of the underlying asset, the interest rate risk with a position in $V_{B,t}$ units of a $\tau$-period discount bond, and the volatility risk with a position in $V_{C,t}$ units of a second call option with the same maturity $\tau$ but a different strike price $\bar{K}$. The overall portfolio value at time $t$ can be expressed as $V_{0,t} + V_{S,t} S_t + V_{B,t} B_{t,\tau} + V_{C,t} C_{t,\tau,\bar{K}}$, where $V_{0,t}$ represents the initial cash position. The derivation of $V_{S,t}$, $V_{B,t}$, and $V_{C,t}$ is outlined in Bakshi et al. (1997) as follows: For the hGNN model, $\Delta_S$, $\Delta_V$, and $\Delta_R$ are equivalent to the Greeks $\Delta_{\hat{C}}$, $\nu_{\hat{C}}$, and $\rho_{\hat{C}}$ specified in equations (17), (19), and (21), respectively.
The hedged portfolio thus constructed is updated at each time interval $\Delta t$ when it is re-balanced. Hence, the hedging error can be written as follows: In the empirical analysis, we choose $\Delta t = 7$ and 30 days following Bakshi and Madan (2000) and summarize the average hedging error. We train the model with the first 60% of data and use the remaining 40% to evaluate the hedging performance without re-training the model.

Data and empirical analysis
In this section, we first describe the options data used for the empirical analysis. We then compare the performance of the hGNN model with that of well-established models in the literature in terms of forecasting option prices and hedging.

Data
We use options and futures written on the S&P 500 index. We categorize all options into one of five moneyness groups, as the literature shows that neural network based option valuation models are better at pricing some moneyness groups than others. In terms of option maturity, short-, medium- and long-term options have fewer than 90 days, between 90 and 180 days, and more than 180 days to maturity, respectively. We evaluate forecasting performance with two loss functions: the mean absolute percentage error (MAPE) and the root mean square error (RMSE). The statistical significance of forecasting error differences is gauged via the popular pairwise comparison developed in Diebold and Mariano (1995).
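For reference, the two loss functions can be written in a few lines. The sketch assumes the standard textbook definitions, with MAPE reported in percent; the function names are illustrative.

```python
import math

def mape(actual, predicted):
    # Mean absolute percentage error, in percent (actual values nonzero).
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def rmse(actual, predicted):
    # Root mean square error, in the units of the option price.
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

market = [100.0, 200.0]
model = [110.0, 190.0]
print(mape(market, model), rmse(market, model))
```

MAPE is scale-free, so it penalizes a given dollar error more heavily on cheap OTM options, whereas RMSE is dominated by errors on expensive ITM options. Reporting both gives a more balanced picture across moneyness groups.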
We  LeDell, 2018). We also include three popular traditional option pricing models specified in Bakshi et al. (1997): the stochastic volatility (SV) model, the stochastic volatility and stochastic interest rate (SVSI) model, and the stochastic volatility with random jumps (SVJ) model. We follow exactly the specifications in Bakshi et al. (1997) for these three models.
The models are used to generate out-of-sample option prices from 19 March 2015 to 2 March 2018 in a rolling scheme. We set the training and validation window to be 300 days, and the trained models are tested in the following 7 or 30 days. Afterwards, the training and validation window rolls forward by 7 or 30 days and is used for the second test. This goes on until the end of the sample period.

Option price prediction

Implied volatility prediction
In Table 3, we report the baseline out-of-sample prediction results for the 7-day ahead horizon between our proposed hGNN model and alternative models. We summarize the forecasting performance for call (Panel A) and put (Panel B) options across different time to maturity and moneyness over two loss functions.
We find that the hGNN model consistently produces lower forecasting errors compared with the dNN and AnNN. For deep ITM calls that traditional neural network based models have a hard time predicting, the MAPE and RMSE for the hGNN are a mere 0.9% and 14.5, respectively, and they compare favorably with the other two models, for which the MAPE and RMSE are 4.7% and 111 for the dNN, and 7.8% and 187 for the AnNN, respectively. This superior forecasting performance is also evident for put options. For long-term VDOTM puts, the forecasting errors measured by the MAPE and RMSE for the proposed model are 1.44% and 0.03, respectively, whereas they are 14% and 0.25 for the dNN, and 17% and 0.26 for the AnNN, respectively. Furthermore, we show that the hGNN model exhibits significantly improved performance compared with the traditional SVJ, SVSI, and SV models, with much smaller MAPE and RMSE. This is the case regardless of the option type, time to maturity, or option moneyness, and highlights the computational strength of neural network based option valuation models.
For robustness, we conduct the same exercises over the 30-day ahead forecasting horizon. We further determine the statistical significance of the pricing error differences for 7- and 30-day ahead forecasting errors between all six models via the Diebold and Mariano (1995) and Giacomini and White (2006) tests. These robustness tests show that the hGNN model continues to outperform the other five models with more precise forecasts, and is always statistically preferred to the alternative models according to both tests.

What drives this significantly improved forecasting performance by the hGNN model? To address this question, we form two groups of options. Group 1 includes 29,771 options, which are among the top 1% of our sample with the largest absolute percentage errors (APE) according to the dNN model, whereas Group 2 contains 294,795 options that are randomly selected from the rest of the sample. By taking a closer look at the characteristics and pricing performance of these two groups of options, we hope to better understand the driving factor behind the success of the hGNN model relative to the dNN and AnNN models.
In Table 4, we provide a simple summary of these two groups of options along the moneyness (K/F) and maturity dimensions. We note that overall, options in Group 1 tend to be OTM in terms of moneyness, and much more short-dated in option maturity. In Figure 2, we illustrate the MAPE differences between the dNN and hGNN models in orange bars and between the AnNN and hGNN models in grey bars across forecasting horizons, option moneyness, and time to maturity. It is striking that for Group 1 options on the left of each panel, the MAPE differences are substantial: moving from DITM to DOTM options, the error differences increase dramatically within the same time-to-maturity group. For example, for 30-day ahead forecasts for call options, the MAPE differences are around 40% for DITM options but they go beyond 100% for DOTM options.

Meanwhile, the error differences for Group 2 options also increase gradually from DITM to DOTM options but with a much smaller magnitude. For DOTM options, the biggest MAPE differences tend to take place for short-term options at around 40-50%. This indicates that although the hGNN model generates more accurate out-of-sample option price predictions for Group 2 options as well, the improvement tends to be milder.
In Figure 3, we further visualize the forecasting errors generated by the three models for individual options in Group 1. It is evident that the black dots, representing pricing errors generated by the hGNN model, lie below the blue dots, i.e., pricing errors from the dNN model, whereas the green dots, i.e., pricing errors generated by the AnNN model, are widely scattered.

Hedging performance
Finally, we focus on the hedging performance of alternative models with 7- and 30-day rebalancing. Figure 4 shows the option Greeks δ, ν, Θ, and ρ for call options across strike prices and time-to-maturity. The average hedging errors for options with different time-to-maturity and moneyness are reported in Table 5. We note that across the board, the hGNN model consistently generates smaller average hedging errors than the alternative models, except for one case in which its hedging error is slightly bigger. As the hedging performance is based on trading strategies involving options, the underlying asset, and bonds, smaller hedging errors represent the true economic value that the hGNN model generates for market participants relative to the other models.
To summarize, the empirical analyses conducted in this section show that the proposed hGNN option valuation model outperforms the other neural network based models as well as traditional models, both in generating significantly smaller option price prediction errors and in offering smaller errors on average in hedged portfolios using options. The results are robust with respect to different option moneyness, time to maturity, forecasting horizons, and the put/call type.

Robustness check
In addition to the baseline results, we have further examined the prediction performance of these six option pricing models over the 30-day ahead forecasting horizon. Using the Diebold and Mariano (1995) and Giacomini and White (2006) tests, we show that the forecasting errors from our proposed model are significantly smaller than those from the other models over both 7- and 30-day ahead horizons. This is also the case for grouped options.
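The pairwise comparison of forecast errors can be illustrated with a simplified Diebold-Mariano statistic. The sketch below is a deliberately reduced version and not the procedure used for the reported results: it assumes an absolute-error loss, treats the loss differentials as serially uncorrelated (so no long-run variance correction is applied, which multi-step horizons would normally require), and uses the asymptotic standard normal distribution. Function names are hypothetical.

```python
import math

def dm_statistic(errors_a, errors_b, loss=abs):
    # Loss differential d_t = L(e_a,t) - L(e_b,t); negative favors model A.
    d = [loss(ea) - loss(eb) for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    # ASSUMPTION: no autocorrelation, so the plain sample variance is used.
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

def two_sided_p_value(stat):
    # P(|Z| > |stat|) for a standard normal Z.
    return math.erfc(abs(stat) / math.sqrt(2.0))

# Toy pricing errors for two models on the same options.
errors_model_a = [0.10, -0.20, 0.15, -0.10, 0.05]
errors_model_b = [0.50, -0.60, 0.40, -0.70, 0.55]
stat = dm_statistic(errors_model_a, errors_model_b)
print(stat, two_sided_p_value(stat))
```

A significantly negative statistic indicates that the first model's losses are smaller on average. With overlapping 7- or 30-day forecasts the differentials are typically autocorrelated, so a full implementation would replace the sample variance with a heteroskedasticity and autocorrelation consistent estimate.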

Conclusion
The recent literature has seen substantial advances in option pricing models based on data science methods that generate more precise option price forecasts. However, a major issue for these models is the lack of economic intuition and interpretation. Hence, they are considered a black box by the mainstream finance industry.
This study represents a novel approach to addressing this issue. We develop a hybrid gated neural network (hGNN) model for option valuation that not only produces superior prediction accuracy as many neural network based models do, but also offers analytical expressions for option Greeks, which improves its hedging performance. We start from no-arbitrage constraints in the option pricing theory that all models need to meet, construct a multiplicative structure for the hidden neurons to maintain its differentiability, and select the slope and weights in the input layer to satisfy the no-arbitrage constraints. We further train this model with synthesized theoretical values for options on the boundaries, thus ensuring that all constraints are satisfied. Furthermore, we construct a separate neural network model for forecasting option-implied volatilities with information from option moneyness, time to maturity, and lagged implied volatilities.
Using daily data from May 2014 to March 2018, we show empirically that our hGNN model consistently and significantly outperforms both neural network based models such as the dNN and AnNN, and traditional models such as the stochastic volatility model with jumps, in forecasting S&P 500 option prices across the board. It also offers smaller hedging errors in a delta hedging strategy involving stocks and bonds compared with alternative models. These empirical findings substantiate the novelty of our model as a step towards formulating neural network based models with economic insight.