Forecasting stock market return with nonlinearity: a genetic programming approach

Whether stock market returns are predictable remains an open question. This paper develops new return forecasting models to help address it. In contrast to the existing literature, we first show that forecasting accuracy can be improved through better model specification without adding any new variables. Instead of having a unified return forecasting model, we argue that stock markets in different countries should have different forecasting models. Furthermore, we adopt an evolutionary procedure called genetic programming (GP) to develop our new models with nonlinearity. Our newly developed forecasting models are shown to be more accurate than traditional AR-family models. More importantly, the trading strategy we propose based on our forecasting models is shown to be highly profitable in stock index futures trading across different types of stock markets.


Introduction
A crucial open question in finance is whether future stock returns are predictable (see Fama 1970), and the issue remains controversial (e.g. Ang and Bekaert 2006). A plethora of studies (such as Fama and French 1988; Campbell and Yogo 2006; Bollerslev et al. 2015; Golez and Koudijs 2018; Liu et al. 2019) have shown that stock returns are predictable using relevant variables, such as dividends. On the other hand, many others remain skeptical about stock return predictability (Welch and Goyal 2007).

Research background and contributions
This paper aims to further examine this issue by building new return forecasting models using the genetic programming approach. Contributions of our paper stem from several aspects. Firstly, we show that forecasting accuracy can be improved through better model specification without adding any new variables. A burgeoning literature has sought relevant variables for forecasting future returns (see Fama and French 1988; Nelson and Kim 1993; Campbell and Shiller 1988). These works focus only on demonstrating the potential of different variables in forecasting stock market returns. Nevertheless, works dedicated to calibrating model specifications are scant. As a result, in this paper, in contrast to the existing literature, we use only lagged market returns as predictors of future returns. We do not add any new variable to our model because we intend to show that a new model specification can improve the model's predictive power. Our result complements the existing literature by suggesting that better model specification can be as vital as including new variables for reinforcing the predictive power of a return forecasting model.
Furthermore, we adopt an evolutionary procedure, namely genetic programming (GP), to develop our new models with nonlinearity. The nonlinear dependence of the return time series has been well documented (see Scheinkman and LeBaron 1989; Ding et al. 1993), especially for emerging markets (see Avdoulas et al. 2018). We use GP to search the potential forms of the return forecasting model using only the lagged returns as predictors. GP is a specialized form of evolutionary algorithm (EA) inspired by Darwin's theory of evolution. The basic idea is to simulate the survival-of-the-fittest principle in biology, such that the favoured individuals of successive generations are naturally chosen for preservation. A distinct feature of GP compared with other evolutionary methods is its tree structure, which gives not only an optimised solution but also the solution method.
More importantly, discontinuous movements such as jumps happen frequently in the stock market, and traditional return forecasting models have difficulty capturing such discontinuity (see Kim and Mei 2001; Chan and Maheu 2002; Cremers et al. 2015). A jump process expressed through the natural log function is usually used to approximate discontinuous price movements in the futures and options markets, and this formulation tends to be nonlinear (Bates 1996; Kou 2002). Therefore, we also include the natural log function in our GP framework in order to capture such discontinuity. GP is advantageous here because the solution model it provides is normally nonlinear owing to its evolutionary nature. The accuracy of the GP model specification might thus be heightened by capturing such nonlinearity.
Besides, stock markets in different countries might exhibit distinguishing characteristics. It is arguable that the characteristics of emerging markets could be entirely different from those of developed markets. Therefore, we classify markets into subgroups and employ different model specifications for different market types. For example, Gencay and Selcuk (2004) show that different countries have different moment properties at the right and left tails of their return distributions, which may entail different risk-reward relationships. Lee et al. (2015) illustrate that, under the structural VAR model, empirical evidence from the US stock market might be quite distinct from that of other countries, especially Asian countries. It is also well documented in the literature that different stock markets have different characteristics (see Chen et al. 2006; Choudhry and Garg 2008). Consequently, we argue that different countries should have different return forecasting models that suit their own characteristics. Basically, developed countries and emerging countries should be categorized into two different types of economies, which should have different return forecasting models. Furthermore, stock markets may also exhibit different features during different time periods. Therefore, dynamic model specification with GP could be considerably more favourable than static models, even those with new variables.
Therefore, the goal of the paper is to adopt GP to generate the best models that can predict future stock returns with high accuracy without adding any other variables. We categorize different countries into different groups and develop an appropriate model for each group. More importantly, our proposed models are more accurate in predicting returns and can be used to develop corresponding trading strategies with high profitability. The trading strategy is used in the stock index futures market. It is worth noting that futures trading differs from stock trading. Specifically, a futures contract has a maturity, which means that it has an expiry date and all futures positions are closed automatically on that day. Therefore, we use our model to forecast the 1-day-ahead return and implement an intra-day trading strategy. In other words, our futures position is opened and closed on the same date. Additionally, unlike stock investment, investors can earn money even if the market return is negative because they can take short positions in stock index futures. As a result, a traditional stock trading strategy such as buy and hold would be irrelevant, and it might be inappropriate to use such a strategy as a benchmark. Therefore, we adopt the same trading strategy for all tested models. Compared with AR-family models under the same trading strategy in the futures market, our model exhibits 55% profitability on average while the other models only have 40%.
Empirically, we are able to demonstrate that our models have superior performance in forecasting future returns compared with autoregressive (AR) family models in both linear and nonlinear forms. The improvement rate is around 30% for in-sample fitting and around 40% for out-of-sample forecasting. Furthermore, we have exploited a trading strategy based on our models. The profitability of our trading strategy is around 20% for developed markets and around 60% for emerging markets from 2012 to 2017, which is noticeably higher than that of traditional AR-family models. Moreover, we also adopt a traditional non-linear model for the robustness check, and our models outperform it, which verifies the robustness of our results.

Literature overview
Predicting future returns with relevant variables has been a focal point of the literature. Dividends are the most popular variable for predicting future stock returns. Fama and French (1988), Nelson and Kim (1993) and Campbell and Shiller (1988) show that the ability of dividends to convey information about future dividend growth and expected returns may explain successful cases of stock return prediction. The price/earnings (P/E) ratio has also been well documented in the return predictability literature. For example, Lamont (1998) maintains that the P/E ratio has power to predict future stock returns in addition to dividends. Moreover, the book/market (B/M) ratio also plays an important role in stock predictability research. Jiang and Lee (2007) demonstrate the predictive power of the B/M ratio and log dividend yield in terms of return forecasting performance. Aydogan and Gursoy (2000) find that the P/E and B/M ratios are able to predict future returns, especially over long time periods. More recently, Cremers and Weinbaum (2010) use deviations from put-call parity to predict future stock returns. However, the predictive power of those models is quite limited: Ang and Bekaert (2006) argue that return forecasting models with dividends fail to exhibit any long-horizon predictive power. More importantly, forecasting models based on the dividend and earnings yield may also have instability problems (see Lettau and Ludvigson 2001; Goyal and Welch 2003; Paye and Timmermann 2006; Cai et al. 2015). Moreover, since our paper also focuses on developing trading strategies, technical analysis papers such as Park and Irwin (2007), Batten et al. (2018) and Jiang et al. (2019), as well as recent non-linear modelling works such as Zhao et al. (2019) and Facchini et al. (2020), are also relevant.
GP also has the elegant property that one can build the relevant performance criterion directly into the search procedure. Furthermore, GP has been adopted in various financial areas. For example, Manahov et al. (2015) utilize a Strongly Typed Genetic Programming (STGP) based trading algorithm to forecast 1-day-ahead stock returns. The STGP-based system enables them to investigate stock return forecasting through groups of artificial traders. They find that the STGP-based forecasting results dominate other benchmark forecasts over a short time horizon. Pimenta et al. (2017) apply genetic programming with multiobjective optimization to develop an automated investing method, which proves to be quite profitable in the Brazilian stock exchange (BOVESPA). More recently, applications of GP have also been seen in research fields other than finance (see Bhola et al. 2019; Chen and Gao 2019; Shoba and Rajavel 2020). Therefore, we utilize GP to build our new return forecasting models with nonlinearity; the nonlinearity embedded in our models could enhance their performance in predicting future returns.
Therefore, developing a new return forecasting model without adding new variables is essential, since fewer variables might make the model more stable. Further, a model specification with features such as nonlinearity would also be helpful, and the GP method is well suited to finding one.

Paper structure
The remainder of the paper is organized as follows. Section 2 gives detailed information about the data and methodology we use. Section 3 describes the GP algorithm. Section 4 shows the empirical return forecasting results. Section 5 presents the empirical results of the trading strategy based on our return forecasting models. Section 6 concludes our paper.

The data
We obtain daily stock index data for four countries from the WIND database, from January 1, 2006 to December 31, 2017. The full sample of four countries contains two subsamples: developed economies and emerging economies. For developed economies, we use the S&P 500 index of the US and the Nikkei 225 index of Japan. For emerging economies, we use the Sensex 30 index of India and the CSI 300 index of China. As pointed out in Batten et al. (2018), the sample composition could have an impact on model performance. So that the in-sample and out-of-sample periods have the same number of observations, we divide the sample into 2006 to 2011 and 2012 to 2017. This helps the models to perform in a relatively similar way in both the in-sample and out-of-sample periods and helps us enhance model performance in the out-of-sample period. For the in-sample test, we use the full sample period, which means that both the input data and the forecasted returns cover the whole sample period. For the out-of-sample test, we use January 1, 2006-December 31, 2011 as the estimation period and January 1, 2012-December 31, 2017 as the forecasting period; that is, we use data from the estimation period as input to forecast stock returns over the forecasting period. Specifically, an in-sample test uses available data to forecast values within the estimation period, while an out-of-sample test uses available data to forecast values outside the estimation period. For the trading strategy empirical test, we also use the corresponding stock index futures data for the four stock markets from January 1, 2012 to December 31, 2017. In addition, for both in-sample and out-of-sample tests, we use 1-day-ahead prediction during the data period, and a statistical test for error differences is also employed.
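The out-of-sample split described above can be sketched in a few lines. This is only an illustration: the names `ESTIMATION`, `FORECASTING` and `split_sample` are hypothetical, and the observations are made-up (date, return) pairs rather than WIND data.

```python
from datetime import date

# Window boundaries taken from the text; both ends are inclusive.
ESTIMATION = (date(2006, 1, 1), date(2011, 12, 31))
FORECASTING = (date(2012, 1, 1), date(2017, 12, 31))


def in_window(day, window):
    """True when `day` falls inside the inclusive (start, end) window."""
    return window[0] <= day <= window[1]


def split_sample(observations):
    """Split (date, return) pairs into estimation and forecasting subsamples."""
    estimation = [(d, r) for d, r in observations if in_window(d, ESTIMATION)]
    forecasting = [(d, r) for d, r in observations if in_window(d, FORECASTING)]
    return estimation, forecasting


# Made-up observations, for illustration only.
observations = [
    (date(2005, 12, 30), 0.001),   # before the sample: dropped
    (date(2006, 1, 3), 0.004),
    (date(2011, 12, 30), -0.002),
    (date(2012, 1, 4), 0.003),
    (date(2017, 12, 29), -0.001),
]
estimation, forecasting = split_sample(observations)
```

For the in-sample test the two windows would simply be merged into one, since the text uses the full period for both fitting and evaluation.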

Model specifications and variable estimation
The main variable we use in this paper is the return, which can be defined as (Andersen and Bollerslev 1998):

$$r_t = \ln P_t - \ln P_{t-1},$$

where $P_t$ is the spot price of a stock or a stock index at time $t$.
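As a small worked example, the sketch below computes daily returns from a price series. It assumes the return is the log price difference, which is the usual reading of the Andersen and Bollerslev (1998) definition; the function name `log_returns` and the price levels are made up for illustration.

```python
import math


def log_returns(prices):
    """Return r_t = ln(P_t) - ln(P_{t-1}) for each consecutive pair of prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]


prices = [100.0, 101.0, 99.0, 99.0]   # made-up index levels
returns = log_returns(prices)         # one fewer element than `prices`
```

Under this assumption returns add up over time, so the sum of the daily values equals the log return over the whole window.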
Table 1 summarizes the detailed statistics of stock index returns for four countries.
The AR-class model has been widely used in the financial literature for return forecasting (see Ferrara et al. 2015; Avdoulas et al. 2018). We use AR-class models to forecast stock index returns, with both linear and nonlinear specifications as benchmark models. For the linear benchmark model, we use the standard autoregressive (AR) model. For the non-linear benchmark models, we use the SETAR (self-exciting threshold autoregressive) model and the STAR (smooth transition autoregressive) model. All of these are time series models, which assume that the data are ordered in time. The linear ARMA model assumes a linear relationship between past asset returns and future asset returns. In other words, future asset returns can be viewed as a linear combination of past asset returns. On the other hand, nonlinear models such as SETAR and STAR assume a nonlinear relationship between past asset returns and future asset returns.
Those models can be viewed as structural models that use thresholds to distinguish returns in different regimes.
Then, we provide a brief description of the models implemented in our analysis (Terasvirta 1994; Hurn et al. 2016). The benchmark linear autoregressive forecasting model (AR) of order $p$ ($p$ is the number of lagged autoregressive terms of $y_t$), for a given horizon $h$, is:

$$y_{t+h} = \alpha + \sum_{i=1}^{p} \beta_i \, y_{t-i+1} + \varepsilon_{t+h}, \tag{1}$$

where $\alpha$ is a constant, $\beta = (\beta_1, \ldots, \beta_p)'$ is a $p$-vector of parameters and $\varepsilon_{t+h}$ is the error term. The specific model is selected by the Bayesian Information Criterion (BIC), which is mathematically defined as:

$$\mathrm{BIC} = k \ln(n) - 2 \ln L(\hat{\theta}),$$

where $n$ is the data size, $k$ is the number of parameters estimated, $\theta$ is the set of all parameters and $L(\hat{\theta})$ represents the maximized value of the likelihood function for the estimated model with parameters $\hat{\theta}$.
Since a larger maximized likelihood $L(\hat{\theta})$ lowers the BIC, the model with the lowest BIC is preferred. Moreover, parameters are estimated by the ordinary least squares (OLS) linear regression method. OLS finds the parameters of a linear function of a number of independent variables by minimizing the sum of squared differences between the observed dependent variable and the predicted dependent variable (calculated through the linear function of the independent variables). As has been documented in the literature (Marcellino et al. 2006), the forecasting model in Eq. (1) often outperforms alternative and more sophisticated univariate and multivariate models. In this work, we focus on three classes of well-known autoregressive models that nest the AR($p$) model in Eq. (1), namely the ARMA model, the TAR model and the STAR model.
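The estimation and selection steps above can be illustrated with a minimal pure-Python sketch. It assumes a one-step horizon ($h = 1$), Gaussian errors for the likelihood, and counts $k = p + 1$ regression coefficients; these choices, the helper names `solve` and `fit_ar`, and the simulated series are illustrative assumptions, not the paper's implementation.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ar(y, p, start):
    """OLS fit of y_t = a + sum_i b_i * y_{t-i} on targets t = start..len(y)-1.

    Returns (coefficients, BIC), with BIC = k ln(n) - 2 ln L and a Gaussian likelihood.
    """
    rows = [[1.0] + [y[t - i] for i in range(1, p + 1)] for t in range(start, len(y))]
    target = y[start:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    coef = solve(xtx, xty)                      # normal equations (X'X) b = X'y
    resid = [v - sum(c * x for c, x in zip(coef, r)) for r, v in zip(rows, target)]
    n = len(target)
    sigma2 = sum(e * e for e in resid) / n
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return coef, k * math.log(n) - 2 * loglik


# Simulated AR(1) series with coefficient 0.5, for illustration only.
rng = random.Random(0)
y = [0.0]
for _ in range(500):
    y.append(0.5 * y[-1] + rng.gauss(0.0, 1.0))

max_p = 4
# Every order is fitted on the same targets so that the BIC values are comparable.
fits = {p: fit_ar(y, p, start=max_p) for p in range(1, max_p + 1)}
best_p = min(fits, key=lambda p: fits[p][1])    # lowest BIC is preferred
```

Fitting all candidate orders on the same target observations is a deliberate choice here: the BIC comparison is only meaningful when $n$ is identical across models.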
In addition to the traditional ARMA model, we use two nonlinear models, TAR and STAR. The threshold autoregressive (TAR) model was developed by Tong (1978). It assumes that the regime switching that occurs at time $t$ is determined by an observable variable $q$ relative to a threshold value, denoted by $c$. The model presumes that the time series may behave differently in different regimes, where the regime-switching point depends on the past values of the time series and the specific threshold value $c$. A specific case of the TAR model is the SETAR (self-exciting threshold autoregressive) model, in which the threshold variable $q$ is a lagged value of the time series itself (Tong 1990; Hansen 1997, 2000). The most general case is to presume that the model has two regimes, where the specific model for order $p$ can be defined as:

$$y_{t+h} = (\alpha_1 + \beta_1' X_t)\, I[q_t \le c] + (\alpha_2 + \beta_2' X_t)\, I[q_t > c] + \varepsilon_{t+h},$$

where $I[A]$ is an indicator function with $I[A] = 1$ if the event $A$ occurs and $I[A] = 0$ otherwise. Besides, we also use the STAR model, and the most general case of the STAR model for order $p$ can be expressed as:

$$y_{t+h} = (\alpha_1 + \beta_1' X_t)\,\bigl(1 - G(z_{t-d}; \gamma, c)\bigr) + (\alpha_2 + \beta_2' X_t)\, G(z_{t-d}; \gamma, c) + \varepsilon_{t+h},$$

where $X_t = (y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-p+1})'$; $(\alpha_1, \beta_1')'$ and $(\alpha_2, \beta_2')'$ are $(p+1)$-vectors of parameters. $G(\cdot)$ is the smooth-transition function.
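Specifically, the smooth transition function can be one of the following: a logistic function,

$$G(z_{t-d}; \gamma, c) = \left[1 + \exp\left(-\gamma \, \frac{z_{t-d} - c}{\sigma_{z_{t-d}}}\right)\right]^{-1},$$

or an exponential function,

$$G(z_{t-d}; \gamma, c) = 1 - \exp\left(-\gamma \, \frac{(z_{t-d} - c)^2}{\sigma_{z_{t-d}}^2}\right),$$

where $\gamma$ is the smoothing parameter that controls the shape of regime changes; $z_{t-d}$ is the transition variable, $\sigma_{z_{t-d}}$ is the standard deviation of the transition variable and $c$ is the threshold parameter.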
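To make the regime mechanics concrete, here is a minimal sketch of the two transition functions and of a two-regime forecast. It assumes the common textbook forms in which the exponent is scaled by the standard deviation of the transition variable, and a STAR forecast written as a weighted blend of two linear regimes; all function names and parameter values are illustrative.

```python
import math


def logistic_transition(z, gamma, c, sigma_z):
    """Logistic G: rises smoothly from 0 to 1 as z moves from below to above c."""
    return 1.0 / (1.0 + math.exp(-gamma * (z - c) / sigma_z))


def exponential_transition(z, gamma, c, sigma_z):
    """Exponential G: 0 at z = c and approaching 1 as z moves away in either direction."""
    return 1.0 - math.exp(-gamma * (z - c) ** 2 / sigma_z ** 2)


def star_forecast(x, params1, params2, g):
    """Blend two linear regimes: (1 - g) * regime 1 + g * regime 2.

    params = [intercept, slope_1, ..., slope_p]; x holds the p lagged values.
    """
    lin1 = params1[0] + sum(b * v for b, v in zip(params1[1:], x))
    lin2 = params2[0] + sum(b * v for b, v in zip(params2[1:], x))
    return (1.0 - g) * lin1 + g * lin2


def setar_forecast(x, params1, params2, q, c):
    """SETAR as the limiting case: the weight jumps from 0 to 1 when q exceeds c."""
    return star_forecast(x, params1, params2, 0.0 if q <= c else 1.0)


# Made-up parameters: regime 1 is y = x, regime 2 is y = 1 + 2x.
low = setar_forecast([1.0], [0.0, 1.0], [1.0, 2.0], q=-0.5, c=0.0)
high = setar_forecast([1.0], [0.0, 1.0], [1.0, 2.0], q=0.5, c=0.0)
mid = star_forecast([1.0], [0.0, 1.0], [1.0, 2.0], logistic_transition(0.0, 2.0, 0.0, 1.0))
```

As $\gamma$ grows, the logistic weight approaches the hard SETAR switch, which is why the SETAR helper can reuse the STAR blend.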

Preliminaries
In this section, we develop our return forecasting model based on the variables estimated in Sect. 2. For the specific model development, we adopt an evolutionary search method, genetic programming (GP). GP is an evolutionary computation (EC) technique inspired by biological processes (see Banzhaf et al. 1998; Hirsh et al. 2000; Poli et al. 2008).
Since the form of a return forecasting model with nonlinearity is uncertain, it is beneficial to adopt the GP method. One big advantage of adopting GP in this work is that it allows one to be agnostic about the general form of the model. In GP, a population of computer programs is evolved based on the principles of natural selection that originate from Darwin's theory of evolution. After a certain number of generations, GP can transform populations of programs into new and better programs. As stated in Poli et al. (2008), GP has been very successful at evolving novel and unexpected ways of solving problems. The main idea of our GP approach is as follows. It first generates a random population of functions and then evaluates the quality of each individual function, which is the difference between the generated function and the targeted function ($r_t$ in this work, see Sect. 3.2 for details). Such quality is usually called the fitness of the individual. Next, one or two functions are probabilistically selected based on their fitness to participate in the genetic operations. Normally there are two genetic operations: crossover and mutation. The crossover operation creates a new child function (called offspring) by randomly choosing some subitems from two selected functions (called parents, which are usually polynomials) and recombining those subitems. The mutation operation creates a new child function by choosing some random subitems from one selected function and altering them. After new individuals are created, their fitness is calculated again, and the genetic operations are applied again to the newly generated functions. The genetic operations are undertaken with the probabilities of crossover and mutation, which will be outlined later. This whole process is based on the aforementioned principles of evolution and is repeated until an acceptable solution is found or another termination criterion is satisfied (usually a maximum number of generations). The best individual is returned as the solution, which is effectively the new return forecasting model.
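The loop described above can be sketched as a small tree-based GP for symbolic regression. This is a toy illustration under stated assumptions, not the system used in the paper: the function set (`add`, `sub`, `mul` and a protected log `plog`), tournament selection, elitism, the node limit and the much smaller population and generation counts are all choices made here for brevity, and the return series is simulated.

```python
import math
import random

FUNCS = {"add": 2, "sub": 2, "mul": 2, "plog": 1}  # assumed function set: name -> arity
TERMS = ["r1", "r2", "r3"]                         # lagged returns r_{t-1}, r_{t-2}, r_{t-3}


def random_tree(rng, depth):
    """Grow a random expression tree: a terminal name or (function, child, ...)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    name = rng.choice(sorted(FUNCS))
    return (name,) + tuple(random_tree(rng, depth - 1) for _ in range(FUNCS[name]))


def evaluate(tree, env):
    if isinstance(tree, str):
        return env[tree]
    args = [evaluate(child, env) for child in tree[1:]]
    if tree[0] == "add":
        return args[0] + args[1]
    if tree[0] == "sub":
        return args[0] - args[1]
    if tree[0] == "mul":
        return args[0] * args[1]
    return math.log(abs(args[0])) if args[0] != 0 else 0.0  # protected natural log


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path lists child indices from the root."""
    yield path, tree
    if not isinstance(tree, str):
        for i in range(1, len(tree)):
            yield from subtrees(tree[i], path + (i,))


def replace_at(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]


def fitness(tree, samples):
    """Mean absolute error against the target; lower is fitter."""
    err = sum(abs(evaluate(tree, env) - y) for env, y in samples) / len(samples)
    return err if math.isfinite(err) else float("inf")


def evolve(samples, rng, pop_size=100, generations=10, p_cross=0.8, p_mut=0.1, max_nodes=30):
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(t, samples), t) for t in pop]

        def pick():  # tournament selection of size 3
            return min(rng.sample(scored, 3), key=lambda s: s[0])[1]

        nxt = [min(scored, key=lambda s: s[0])[1]]  # keep the current best (elitism)
        while len(nxt) < pop_size:
            child, u = pick(), rng.random()
            if u < p_cross:            # crossover: graft a random subtree of a second parent
                path, _ = rng.choice(list(subtrees(child)))
                _, piece = rng.choice(list(subtrees(pick())))
                child = replace_at(child, path, piece)
            elif u < p_cross + p_mut:  # mutation: replace a random subtree with a fresh one
                path, _ = rng.choice(list(subtrees(child)))
                child = replace_at(child, path, random_tree(rng, 2))
            if sum(1 for _ in subtrees(child)) <= max_nodes:
                nxt.append(child)      # oversized children are discarded
        pop = nxt
    return min(pop, key=lambda t: fitness(t, samples))


rng = random.Random(1)
series = [rng.gauss(0.0, 0.01) for _ in range(63)]  # made-up daily returns
samples = [({"r1": series[t - 1], "r2": series[t - 2], "r3": series[t - 3]}, series[t])
           for t in range(3, len(series))]
best = evolve(samples, rng)
```

Representing programs as nested tuples keeps crossover and mutation to plain subtree replacement; because the best individual is carried over unchanged, the best fitness never worsens from one generation to the next.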

Genetic programming system
For our model development, we reduce the forecasting task to the computation of the following function based on our GP approach, using the data sample period from January 1, 2006 to December 31, 2017:

$$r_t = f(r_{t-1}, r_{t-2}, r_{t-3}),$$

where $r_{t-1}$, $r_{t-2}$, $r_{t-3}$ are the lagged terms of the stock index return. Our goal is to find the most relevant terms that have effects on predicting the future stock index return.
Our GP approach consists of the following parts:
• Terminal set: $r_{t-1}$, $r_{t-2}$, $r_{t-3}$.
• Fitness measure: the error between the value of the individual function and the corresponding desired output (i.e. $r_t$).
• GP parameters: population = 10,000, the maximum length of the program = 1000 (i.e. up to 1000 subitems within one polynomial function), probability of crossover operation = 0.8 (i.e. 80% of population functions will be mixed with other functions to generate new functions) and probability of mutation operation = 0.1 (i.e. 10% of population functions will be mutated to generate new functions).
• Termination criterion: when the fitness measure reaches 0 or the system has run for 100 generations, the system terminates (in our work, the fitness measure never reaches 0, therefore the system terminates after 100 generations).
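The settings listed above translate directly into a small configuration object. The sketch below is illustrative only: the class name `GPSettings` and its methods are hypothetical, and treating the remaining 10% of the population as copied unchanged is an assumption, since the text does not say what happens to individuals that undergo neither operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPSettings:
    population: int = 10_000
    max_program_length: int = 1_000   # subitems allowed in one function
    p_crossover: float = 0.8
    p_mutation: float = 0.1
    max_generations: int = 100

    def expected_counts(self):
        """Expected number of individuals per operation in one generation."""
        crossed = round(self.population * self.p_crossover)
        mutated = round(self.population * self.p_mutation)
        # Assumption: whatever is neither crossed nor mutated is copied unchanged.
        return {"crossover": crossed, "mutation": mutated,
                "copied": self.population - crossed - mutated}

    def should_stop(self, best_fitness, generation):
        """Terminate on a perfect fit or once the generation budget is used up."""
        return best_fitness == 0 or generation >= self.max_generations


settings = GPSettings()
counts = settings.expected_counts()
```

With these values, roughly 8,000 individuals per generation come from crossover and 1,000 from mutation.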
The general procedure of our GP approach can be found in Algorithm 1.

Model development
In order to enhance the accuracy of the developed model, we categorize our sample into two subsamples, namely developed economies (including the US and Japan) and emerging economies (including China and India), and we run the two subsamples separately. With the settings stated in the previous section, we ran our GP algorithm 50 times for each subsample. After simplification, the best function we obtained is the following model for the developed economies: where $r_{t-q}$ is the lagged term of return and $I$ is the indicator function: $I = 1$ if the condition in the parenthesis holds and $I = 0$ otherwise.
We denote this model as the nonlinear return forecasting model 1 (NRFM1).
The best function we obtained is the following model for emerging economies: where $r_{t-q}$ is the lagged term of return.
We denote this model as the nonlinear return forecasting model 2 (NRFM2).
These two newly developed models display distinctive components. NRFM1 has three natural log items whereas NRFM2 has none. The natural log items might be linked to jumps embedded in the stock price process. Consider the following jump-diffusion stochastic differential equation (SDE) that depicts a stock process with log-normal distribution:

$$d \ln S(t) = \mu_d \, dt + \sigma_d \, dW(t) + \ln(1 + J(Q)) \, dP(t),$$

where $S(t)$ is the stock price, $W(t)$ is a standard Wiener process, $\mu_d$ is the log-diffusion drift, $\sigma_d$ is the volatility of the stock return and $\ln(1 + J(Q))$ is the log-return jump-amplitude, with a simple Poisson jump process $dP(t)$ with jump rate $\lambda$; the process ensures that $J(Q) > -1$. The natural log items in our model are therefore analogous to the jump function in the SDE. More importantly, the indicator function is analogous to the Poisson jump process: the Poisson jump process occurs at a predetermined rate, while our indicator function reflects strong serial correlation of returns, which may also be interpreted as the probability of a jump occurring. When the returns are positively related, for example when returns are all positive or all negative over a couple of days, a jump is more likely to happen. Therefore, NRFM1 may capture the jump ingredient in the developed markets through those natural log items.
On the other hand, there is no natural log item in NRFM2 for emerging markets. Accordingly, NRFM2 suggests that jumps are less likely to occur in emerging markets. The reason is that returns in the chosen emerging markets are more bounded. For instance, there is a price limit system in the Chinese stock market, which binds the daily return within ±10%. Similarly, price movements in the Indian stock market are also constrained. In particular, the Bombay Stock Exchange (BSE) has implemented a circuit filter system and set the trigger of the circuit filters at 10% (rise or fall). Those binding regulations markedly reduce the jump probabilities in both emerging markets. As a result, we obtain two models with distinguishing features that can represent different types of markets.

Empirical results of return forecasting
This section gives both the empirical results for the regression models and the model performance in return forecasting. In particular, we compare our data fitting results as well as prediction results with three ARMA-family models, namely ARMA, SETAR and STAR, as well as a high moments return forecasting model (HMRFM). For the model performance evaluation, we use the mean absolute error (MAE) for the model accuracy test. The periodic averaged MAE can be defined as:

$$\mathrm{MAE} = \frac{1}{T} \sum_{t=1}^{T} \left| \mathrm{Observed}_t - \mathrm{Predicted}_t \right|,$$

where $T$ represents the number of observations in the forecasting period, $\mathrm{Observed}_t$ is the return observed in the market and $\mathrm{Predicted}_t$ is the return predicted by the models. Lower MAE indicates higher forecasting accuracy.
For robustness, we also use the mean squared error (MSE) to measure model performance for both in-sample fitting and out-of-sample forecasting, since our daily data could be quite noisy (Pong et al. 2004; Golosnoy et al. 2014; Bollerslev et al. 2016). The periodic averaged MSE can be defined as:

$$\mathrm{MSE} = \frac{1}{T} \sum_{t=1}^{T} \left( \mathrm{Observed}_t - \mathrm{Predicted}_t \right)^2,$$

where $T$ represents the number of observations in the forecasting period, $\mathrm{Observed}_t$ is the return observed in the market and $\mathrm{Predicted}_t$ is the return predicted by the models. Lower MSE indicates higher forecasting accuracy.
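The two error measures can be computed as in the sketch below. The `improvement_rate` helper is an assumption: the text reports improvement rates without defining them, so the relative error reduction against a benchmark is used here as one plausible reading. All numbers are made up.

```python
def mae(observed, predicted):
    """Mean absolute error over the forecasting period."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mse(observed, predicted):
    """Mean squared error over the forecasting period."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def improvement_rate(benchmark_error, model_error):
    """Assumed definition: relative error reduction versus the benchmark."""
    return (benchmark_error - model_error) / benchmark_error


observed = [0.01, -0.02, 0.00]     # made-up realised returns
predicted = [0.00, -0.01, 0.01]    # made-up forecasts
model_mae = mae(observed, predicted)
model_mse = mse(observed, predicted)
gain = improvement_rate(0.020, 0.015)   # a benchmark MAE of 0.020 against a model MAE of 0.015
```

Because MSE squares each error, it penalises occasional large misses more heavily than MAE, which is why reporting both is a useful check on noisy daily data.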
For the ARMA model estimation, we use the AIC (Akaike Information Criterion) and the BIC (Bayesian Information Criterion) to determine the optimal lag. Specifically, we use ARMA(1, 1) for Japan and India and ARMA(2, 2) for China and the US in the in-sample test, and ARMA(1, 1) for Japan and ARMA(2, 2) for China, the US and India in the out-of-sample test.

In-sample data fitting
For the in-sample modeling, we compare three AR-class models with our models in fitting stock market returns. In particular, we use NRFM1 to forecast the stock returns of the US and Japan, and NRFM2 to forecast the stock returns of China and India. Table 2 shows the in-sample fitting MAE against the ARMA, SETAR and STAR models. In general, our models outperform the other three models, with an average improvement rate of around 25%. NRFM1 performs better in predicting the stock return for developed countries than NRFM2 does for emerging markets (see Tables 2, 3). This might be because prices in developed markets reflect more information than prices in emerging markets, whose market efficiency tends to be low. This result accords with existing evidence that most Asian markets display weak or no market efficiency (Kim and Shamsuddin 2008). When less information is reflected in market prices, in-sample data fitting that uses only the market price as the predicting variable becomes less accurate. This also explains why our model performs better in predicting the stock return for the US than for Japan.

Out-of-sample forecasting
On the other hand, for the out-of-sample forecasting, we compare three AR-class models with our models in forecasting future stock market returns. Table 4 shows the out-of-sample forecasting MAE against the ARMA, SETAR and STAR models. In general, our models outperform the other three models, with an average improvement rate of around 32%.
Unlike the results from the in-sample fitting, the NRFM1 model exhibits weaker performance in predicting the stock returns of the US and Japan than NRFM2 does for China and India (see Tables 4, 5). Because out-of-sample prediction only uses information from the past, returns in developed markets with market efficiency are unpredictable (Timmermann and Granger 2004). Emerging markets with no market efficiency, however, might produce predictable returns. Therefore, in the next section, we propose a trading strategy based on our return prediction models. The trading strategy profit can demonstrate whether the strategy earns higher returns in emerging stock markets than in developed markets. From a theoretical perspective, the return forecasting models should earn higher returns in emerging stock markets since returns in those markets are more predictable.

Robustness check
In order to demonstrate that our results are robust, we adopt a non-linear return forecasting model other than ARMA-family models as the benchmark model, which we denote as a High Moments Return Forecasting Model (HMRFM).
High moments such as skewness are of great concern to investors in the stock market (see Kozhan et al. 2013; Kelly and Jiang 2014). Therefore, we adopt the HMRFM proposed by Jondeau et al. (2019) as our benchmark model, which investigates the nonlinear relation between return and high moments. The format of the model can be written as follows: where $\sigma_t$ is the volatility of the return at time $t$ (also known as second central moment), calculated by $\sqrt{\sum_{t=1}^{n} (r_t - \bar{r}_t)^2}$, $sk_t$ is the skewness at time $t$ (also known as third central moment), calculated by $\sum_{t=1}^{n} (r_t - \bar{r}_t)^3$, and $\bar{r}_t$ is the average return during the period.
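The two moment inputs can be computed exactly as they are written above, that is, as unnormalised sums of squared and cubed deviations from the mean. The sketch below follows that wording; the function name `high_moments` and the sample returns are made up, and the HMRFM regression itself is not reproduced here.

```python
import math


def high_moments(returns):
    """Return (volatility, skewness) as defined in the text.

    volatility = sqrt(sum of squared deviations from the mean return)
    skewness   = sum of cubed deviations from the mean return
    """
    mean = sum(returns) / len(returns)
    volatility = math.sqrt(sum((r - mean) ** 2 for r in returns))
    skewness = sum((r - mean) ** 3 for r in returns)
    return volatility, skewness


# Made-up window with one large positive return: the skewness term is positive.
vol, skew = high_moments([0.0, 0.0, 0.03])
# A symmetric window has (numerically) zero skewness.
_, skew_sym = high_moments([-0.01, 0.0, 0.01])
```

Note that these are raw sums rather than the standardised skewness coefficient; dividing by the number of observations and by the cubed standard deviation would give the more familiar scale-free measure.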
Table 6 shows the in-sample fitting MAE and MSE of our models against HMRFM. In general, our models outperform HMRFM, with an average improvement rate of around 50% for MAE and around 90% for MSE. Our models also surpass the HMRFM in out-of-sample forecasting.

Stock index futures trading strategy based on the return forecasting models
In order to show different return predictability in different markets, we propose a trading strategy by trading the corresponding stock index futures based on the two return forecasting models we have developed.We aim to reveal different trading profitability in different markets by using the same trading strategy, which can be described as follows.Suppose we are now at time t before the market open time.We use our model to forecast the stock index return for time t.If the forecasted return is positive, then we long the corresponding stock index futures at its open price and we close our contract at its close price.On the other hand, if the forecasted return is negative, then we short the corresponding stock index futures at its open price and we also close our contract at its close price.At the end of time t, it is observable whether our strategy is successful or not.If the actual return is positive and we long the futures, then we earn the corresponding stock index futures return at time t.Otherwise, we lose the return of the same amount.The situation is exactly identical for the short position and negative return.
Therefore, the cumulative return of the trading strategy for each stock market is defined by Eq. (10), where $r_t^{TS}$ is the return from the trading strategy at time t, $F_t^{o}$ and $F_t^{c}$ are the stock index futures open and close prices at time t, $r_t^{IF}$ is the stock index futures return at time t, $r_t^{FR}$ is the forecasted stock index futures return at time t, and $I(\cdot)$ is an indicator function with $I = 1$ when $r_t^{IF} r_t^{FR} \geq 0$ and $I = -1$ otherwise.
We then apply the trading strategy to test our models against the other four models on the out-of-sample period (i.e., from January 1, 2012 to December 31, 2017). Transaction costs do not affect this comparison. Our models, the AR-family models and the HMRFM all trade under the same mechanics and therefore incur the same transaction cost, so deducting it from each model's return has little impact on the result.
The empirical results are presented in Table 8. Our model earns substantial positive returns in all four countries over the 6-year period, and it earns additional returns relative to the three AR-family models and the HMRFM. Specifically, in developed countries our model (NRFM1) slightly outperforms the other four models, whereas in emerging countries our model (NRFM2) considerably outperforms them. Comparing NRFM1 and NRFM2, the returns earned by the trading strategy in emerging markets are substantially higher than those earned in developed markets. This result aligns with the theoretical prediction that returns in emerging markets are more predictable, which may lead to higher returns. More importantly, these empirical results also show that the nonlinear return forecasting model has a large advantage in emerging countries, which is consistent with existing arguments (see Avdoulas et al. 2018).
Among trading strategies, the simple moving average (MA) strategy remains popular in the stock market (Fong and Yong 2005). Its advantage is that it smooths out market noise and follows the underlying market trend, and financial practitioners have used it to make buy and sell decisions for decades. Consistent with this, the ARMA model has a higher cumulative return than the other two AR-family models.
More importantly, information inefficiency can prevent prices from adjusting promptly to reflect all publicly available information. Under the efficient market hypothesis, such inefficiency can be present in markets that are efficient only in the weak or semi-strong form. As a consequence, market prices can be heavily affected by psychological factors, which should be analyzed under an irrational theoretical framework (Menkhoff 2010). The models we built incorporate jumps in the return process to reflect informational surprises or news. This component helps our models capture discontinuities in informationally inefficient markets and thus deliver higher returns in those markets.
For further model comparison, we present the hit ratios of all models in Table 9, which indicate the percentage of successful trades of each model during the sample period. Our model has the largest hit ratio among all models. As a result, our model can be regarded as the best-performing model in terms of both returns earned and trading success.
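The hit ratio can be illustrated with a short sketch. It assumes that a trade counts as successful when the realised and forecasted futures returns have the same sign, following the indicator convention stated above (a product of zero counts as a success); the function name and the toy numbers are hypothetical.

```python
def hit_ratio(actual_returns, forecast_returns):
    """Share of periods in which forecast and realised return agree in sign."""
    if not actual_returns or len(actual_returns) != len(forecast_returns):
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumption: a trade is a hit when actual * forecast >= 0.
    hits = sum(1 for a, f in zip(actual_returns, forecast_returns) if a * f >= 0)
    return hits / len(actual_returns)


# Toy values, invented for illustration only.
actual = [0.01, -0.02, 0.03, -0.01]
forecast = [0.02, 0.01, 0.01, -0.03]
print(hit_ratio(actual, forecast))
```

Here three of the four forecasts agree in sign with the realised return, so the ratio is three quarters.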

Conclusion
To conclude, we have built two return forecasting models based on the GP method, one for developed markets and one for emerging markets. The models we developed have several advantages. Firstly, they use only lagged returns as predictors rather than loading the model with many variables. Secondly, they are AI-based specifications that incorporate special relations in the stock market, such as nonlinearity. Thirdly, they distinguish developed markets from emerging markets, which builds market characteristics into the models. Empirically, we show that our models deliver significant improvements in return forecasting compared with autoregressive (AR) family models in both linear and nonlinear forms. The improvement rate is around 30% for in-sample fitting and around 40% for out-of-sample forecasting. Based on these return forecasting models, we also propose a trading strategy, which has been verified to be highly profitable in both developed and emerging markets. In particular, our model earned 55% profitability on average, while the other AR-family models earned only 40%. Specifically, our model earned annualized returns of 35% and 30% in the US and Japan respectively, whereas the other models earned 23% and 21%. For emerging economies, our model earned annualized returns of 57% and 98% in China and India respectively, whereas the other models earned 40% and 75%. Our model therefore displays superior performance over the other models under the same trading strategy in the futures markets. This superior performance can be attributed to the nonlinearity captured by our model. Moreover, our models also outperform the traditional nonlinear model, which illustrates the strong information extraction ability of the GP approach.

Table 1
Statistical summary of variables used for the 12-year returns of four countries

Algorithm 1: GP for Stock Market Return Forecasting Model
1 Initialisation: Initialise the population of the first generation;
2 while a "good enough" forecasted model has not been found and the maximum number of generations has not been reached
5   Evaluation: Evaluate each forecasted model's fitness;
9   Elitism: Select the best forecasted model from the population of the current generation and insert it into the next new generation;
10   Update Population: Update the population of the current generation;
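The loop in Algorithm 1 can be illustrated with a small, self-contained sketch. It is not the authors' implementation. The steps that survive in the listing (initialisation, evaluation, elitism, population update) are marked in comments; everything else is an assumption made only so the example runs: the function set, the number of lags, the depth cap, MAE as the fitness measure, size-3 tournament selection, subtree crossover and mutation, and the toy return series.

```python
import math
import random

# Hypothetical function set: name -> (arity, implementation).
FUNCS = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    "tanh": (1, math.tanh),
}
N_LAGS = 3      # predictors are r_{t-1}, ..., r_{t-N_LAGS} (assumed lag count)
MAX_DEPTH = 5   # depth cap to limit tree growth (assumption)


def random_tree(rng, max_depth):
    """Grow a random expression tree stored as nested tuples."""
    if max_depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("lag", rng.randint(1, N_LAGS))
        return ("const", rng.uniform(-1.0, 1.0))
    name = rng.choice(sorted(FUNCS))
    return (name,) + tuple(random_tree(rng, max_depth - 1) for _ in range(FUNCS[name][0]))


def evaluate(tree, lags):
    """Evaluate a tree, where lags[k - 1] holds r_{t-k}."""
    if tree[0] == "lag":
        return lags[tree[1] - 1]
    if tree[0] == "const":
        return tree[1]
    return FUNCS[tree[0]][1](*(evaluate(child, lags) for child in tree[1:]))


def fitness(tree, returns):
    """Mean absolute one-step forecast error (lower is better)."""
    errors = [
        abs(returns[t] - evaluate(tree, [returns[t - k] for k in range(1, N_LAGS + 1)]))
        for t in range(N_LAGS, len(returns))
    ]
    return sum(errors) / len(errors)


def depth(tree):
    return 1 + max(depth(c) for c in tree[1:]) if tree[0] in FUNCS else 0


def paths(tree, prefix=()):
    """Yield the index path of every node in the tree."""
    yield prefix
    if tree[0] in FUNCS:
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def vary(rng, a, b, p_mut=0.2):
    """Subtree crossover of a with b, then optional subtree mutation."""
    child = replace(a, rng.choice(list(paths(a))), subtree(b, rng.choice(list(paths(b)))))
    if rng.random() < p_mut:
        child = replace(child, rng.choice(list(paths(child))), random_tree(rng, 2))
    return child if depth(child) <= MAX_DEPTH else a


def run_gp(returns, pop_size=30, generations=10, good_enough=0.0, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]   # Initialisation
    history = []
    for _ in range(generations):                  # stop at the generation limit ...
        scored = sorted(((fitness(t, returns), t) for t in population),
                        key=lambda pair: pair[0])                 # Evaluation
        best_fit, best = scored[0]
        history.append(best_fit)
        if best_fit <= good_enough:               # ... or once a model is good enough
            break
        next_gen = [best]                                         # Elitism
        while len(next_gen) < pop_size:
            # Size-3 tournaments pick the parents (assumed selection scheme).
            p1 = min(rng.sample(scored, 3), key=lambda pair: pair[0])[1]
            p2 = min(rng.sample(scored, 3), key=lambda pair: pair[0])[1]
            next_gen.append(vary(rng, p1, p2))
        population = next_gen                                     # Update Population
    return best, history


# Toy nonlinear return series, invented for illustration only.
demo_rng = random.Random(1)
series = [0.0, 0.0, 0.0]
for _ in range(120):
    series.append(0.5 * math.tanh(series[-1]) - 0.3 * series[-1] * series[-2]
                  + demo_rng.gauss(0.0, 0.01))
best_model, mae_by_generation = run_gp(series)
print(best_model, mae_by_generation[-1])
```

Because the best model of each generation is copied unchanged into the next one, the best fitness recorded per generation can never get worse, which is the practical purpose of the elitism step.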

Table 2
MAE of in-sample fitting of stock index returns. This table presents the in-sample fitting results of four countries' stock index return forecasting for four models using the mean absolute error (MAE). The p values for statistical differences of the forecasting errors are also presented. Our models outperform the other three models.

Table 3
MSE of in-sample fitting of stock index returns. This table presents the in-sample fitting results of four countries' stock index return forecasting for four models using the mean squared error (MSE). The p values for statistical differences of the forecasting errors are also presented. Our models outperform the other three models. Here e$n$ denotes $\times 10^{n}$, e.g. e−06 = $\times 10^{-6}$.

Table 4
MAE of out-of-sample forecasting of stock index returns. This table presents the out-of-sample prediction results of four countries' stock index return forecasting for four models using the mean absolute error (MAE). The p values for statistical differences of the forecasting errors are also presented. Our models outperform the other three models.

Table 5
MSE of out-of-sample forecasting of stock index returns. This table presents the out-of-sample prediction results of four countries' stock index return forecasting for four models using the mean squared error (MSE). The p values for statistical differences of the forecasting errors are also presented. Our models outperform the other three models. Here e$n$ denotes $\times 10^{n}$, e.g. e−06 = $\times 10^{-6}$.

Table 6
MAE and MSE for in-sample fitting of stock index returns. This table presents both the mean absolute error (MAE) and the mean squared error (MSE) of the in-sample fitting of four countries' stock index returns for the two models. The p values for statistical differences of the forecasting errors are also presented. Our models outperform HMRFM and our results are robust. Here e$n$ denotes $\times 10^{n}$, e.g. e−06 = $\times 10^{-6}$.

Table 7
MAE and MSE for out-of-sample forecasting of stock index returns. This table presents both the mean absolute error (MAE) and the mean squared error (MSE) of the out-of-sample forecasting of four countries' stock index returns for the two models. The p values for statistical differences of the forecasting errors are also presented. Our models outperform HMRFM and our results are robust. Here e$n$ denotes $\times 10^{n}$, e.g. e−06 = $\times 10^{-6}$.
Table 7 presents the out-of-sample forecasting MAE and MSE of our model against HMRFM. Our models outperform HMRFM by around 60% for MAE and around 90% for MSE. These results demonstrate the robustness of our models.
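For readers who want to reproduce this kind of comparison, the two error measures and a relative improvement figure can be computed as in the sketch below. The definition of the improvement rate as the relative reduction in error against a benchmark is an assumption, since the text does not state the formula; the function names and the toy numbers are hypothetical as well.

```python
def mean_absolute_error(actual, forecast):
    """MAE: average absolute forecast error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mean_squared_error(actual, forecast):
    """MSE: average squared forecast error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def improvement_rate(benchmark_error, model_error):
    """Assumed definition: relative error reduction versus the benchmark."""
    return (benchmark_error - model_error) / benchmark_error


# Toy returns and forecasts, invented for illustration only.
actual = [0.010, -0.020, 0.015]
model_forecast = [0.008, -0.015, 0.010]
benchmark_forecast = [0.0, 0.0, 0.0]

model_mae = mean_absolute_error(actual, model_forecast)
benchmark_mae = mean_absolute_error(actual, benchmark_forecast)
print(model_mae, benchmark_mae, improvement_rate(benchmark_mae, model_mae))
```

Because MSE squares each error, a model that mainly removes large errors shows a bigger relative improvement in MSE than in MAE, so the two percentages are not expected to match.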

Table 8
Cumulative returns of trading strategy based on the return forecasting models. This table presents the out-of-sample results of four countries' stock index futures returns according to the trading strategy based on the return forecasting models. Our models have significantly higher returns than all other models.

Table 9
Hit ratios of trading strategy based on the return forecasting models