Mitchell, Emily
(2022)
Statistical analysis of agricultural soils climate data to aid food security under environmental change.
PhD thesis, University of Nottingham.
Abstract
Wheat is one of the most important food crops in the world, used for human consumption, livestock feed and biofuels. Demand for wheat has increased with a rising population, while a changing climate raises concerns about crop growth. By exploring novel uses of the data on farming practices gathered by the Farm Business Survey, this thesis aims to identify the farming practices most associated with high yields.
The first part of this thesis is concerned with modelling wheat yield as a linear combination of variables from the Farm Business Survey, such as annual crop protection costs, labour costs and the organic status of the farms, and from the UK Met Office, such as monthly rainfall in each year. We compute coefficient estimates for the linear model using quantile regression, linear regression and principal component regression. We also take a two-step approach, fitting a linear regression model after selecting variables with either forward stepwise regression (with and without orthogonalisation after every step) or Lasso regression. The variable selection methods consistently select organic status, crop protection and rainfall in June as the first variables to enter the model. Comparing all models by their mean squared prediction error for an average year, we find that linear regression applied to the subset of variables selected by forward stepwise regression with orthogonalisation after every step achieves the smallest mean squared prediction error. This model includes the majority of the variables corresponding to farming practices and a small number of weather variables.
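The selection step that performed best can be sketched as greedy forward selection in which, after each step, the residual and the remaining candidate columns are orthogonalised against the chosen column (Gram-Schmidt). This is a minimal NumPy sketch under our own assumptions, not the code used in the thesis. The function names `forward_stepwise_orth` and `ols_mspe` are hypothetical. So are the scoring rule (reduction in residual sum of squares) and the simple hold-out estimate of the mean squared prediction error. The Farm Business Survey and Met Office variables would be supplied as the columns of `X`.

```python
import numpy as np


def forward_stepwise_orth(X, y, n_steps):
    """Hypothetical helper: greedy forward selection with Gram-Schmidt
    orthogonalisation of the remaining columns after every step.
    Returns the selected column indices in the order they entered."""
    Z = np.asarray(X, dtype=float)
    Z = Z - Z.mean(axis=0)          # centring absorbs the intercept
    r = np.asarray(y, dtype=float)
    r = r - r.mean()                # current residual
    remaining = list(range(Z.shape[1]))
    selected = []
    for _ in range(min(n_steps, Z.shape[1])):
        # Drop in RSS from adding column j to the current fit: (z_j'r)^2 / z_j'z_j
        norms = np.array([Z[:, j] @ Z[:, j] for j in remaining])
        gains = np.array([(Z[:, j] @ r) ** 2 for j in remaining])
        gains = np.where(norms > 1e-12, gains / np.maximum(norms, 1e-12), 0.0)
        if gains.max() <= 0.0:
            break                   # nothing left that explains the residual
        best = remaining.pop(int(np.argmax(gains)))
        selected.append(best)
        q = Z[:, best] / np.linalg.norm(Z[:, best])
        # Remove the chosen direction from the residual and the other candidates
        r = r - (q @ r) * q
        for j in remaining:
            Z[:, j] = Z[:, j] - (q @ Z[:, j]) * q
    return selected


def ols_mspe(X_train, y_train, X_test, y_test, cols):
    """Hypothetical helper: least-squares fit on the chosen columns (with an
    intercept) and mean squared prediction error on held-out data."""
    A = np.column_stack([np.ones(len(y_train)), X_train[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([np.ones(len(y_test)), X_test[:, cols]])
    return float(np.mean((y_test - B @ coef) ** 2))
```

With orthogonalisation, each gain equals the exact reduction in residual sum of squares from adding that variable to the current model. A variant without orthogonalisation would instead score the raw centred columns against the residual.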
To account for the uncertainty at both the variable selection stage and the parameter estimation stage, we next turn to Bayesian shrinkage priors as a means of simultaneous model selection and inference. If uncertainty is accounted for only after variable selection, the confidence intervals of the coefficient estimates will be unrealistically narrow, making us overconfident about our estimates. The Bayesian Lasso, the Bayesian analogue of the frequentist Lasso, and the horseshoe prior both provide credible intervals for the parameters of the linear model. To apply these shrinkage priors, we use the Gibbs sampler when the global shrinkage parameter is allowed to vary and Hamiltonian Monte Carlo when the global shrinkage parameter is fixed. These methods also consistently identify organic status, crop protection and rainfall in June as important factors when modelling wheat yields. However, for these factors the horseshoe prior yields appropriate credible intervals that capture the combined uncertainty of the model selection and parameter estimation stages, which the two-step frequentist approach aims to capture but does not.
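A Gibbs sampler for the horseshoe prior with the global shrinkage parameter allowed to vary can be sketched with the auxiliary-variable scheme of Makalic and Schmidt (2016). In that scheme each half-Cauchy scale is written as a mixture of inverse-gamma distributions, so every full conditional has a standard form. This is an illustrative NumPy sketch, not the sampler used in the thesis. The function name `horseshoe_gibbs`, the Jeffreys prior on the noise variance, the starting values and the iteration counts are our assumptions. The data are also assumed to be centred, so no intercept is needed.

```python
import numpy as np


def horseshoe_gibbs(X, y, n_iter=3000, burn_in=1000, seed=0):
    """Hypothetical sketch: Gibbs sampler for y ~ N(X beta, sigma2 I) with
    beta_j ~ N(0, lam2_j * tau2 * sigma2), lam_j, tau ~ C+(0, 1) and
    p(sigma2) proportional to 1/sigma2. X and y are assumed centred.
    Returns the post-burn-in draws of beta, shape (n_iter - burn_in, p)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y

    def inv_gamma(shape, rate):
        # If G ~ Gamma(shape, scale=1/rate) then 1/G ~ InvGamma(shape, rate)
        return 1.0 / rng.gamma(shape, 1.0 / rate)

    lam2, nu = np.ones(p), np.ones(p)   # local scales and their auxiliaries
    tau2, xi, sigma2 = 1.0, 1.0, 1.0    # global scale, its auxiliary, noise variance
    draws = []
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 A^{-1}), A = X'X + diag(1/(tau2*lam2))
        A = XtX + np.diag(1.0 / (tau2 * lam2))
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, Xty)
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))

        # sigma2 | rest: residual sum of squares plus the prior term for beta
        resid = y - X @ beta
        prior_ss = np.sum(beta**2 / (tau2 * lam2))
        sigma2 = inv_gamma((n + p) / 2.0, (resid @ resid + prior_ss) / 2.0)

        # Local shrinkage parameters and their auxiliaries
        lam2 = inv_gamma(1.0, 1.0 / nu + beta**2 / (2.0 * tau2 * sigma2))
        nu = inv_gamma(1.0, 1.0 + 1.0 / lam2)

        # Global shrinkage parameter is sampled, i.e. allowed to vary
        tau2 = inv_gamma((p + 1) / 2.0, 1.0 / xi + np.sum(beta**2 / lam2) / (2.0 * sigma2))
        xi = inv_gamma(1.0, 1.0 + 1.0 / tau2)

        if it >= burn_in:
            draws.append(beta)
    return np.array(draws)
```

For example, `np.percentile(draws, [2.5, 97.5], axis=0)` gives equal-tailed 95% credible intervals for each coefficient. Because tau2 is sampled rather than fixed, these intervals also reflect uncertainty about how strongly the prior shrinks each coefficient.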
The second part of this thesis is specifically concerned with modelling the highest yields attainable under current technologies and growing conditions. We address this with an extreme value analysis, which in our context amounts to modelling the highest-yielding farms. We find that wheat yields have a finite upper bound, estimated at 17.60 tonnes per hectare, so the scope to improve yields on high-yielding farms diminishes as their yields per hectare approach this bound. Furthermore, we find no difference between the maximum attainable yields for the macro regions of west England and Wales, north England, and east England. Lastly, we show that the difference between the maximal yields of medium and high spenders on crop protection and fertilisers is not statistically significant.
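A finite upper bound arises when the fitted extreme value distribution has a negative shape parameter. For illustration, assume a generalised extreme value (GEV) model with location mu, scale sigma > 0 and shape xi < 0. Its upper end-point is mu - sigma/xi. The same formula holds for a generalised Pareto tail above a threshold u, with u in place of mu. The thesis does not state here which of these models it uses. The helper `upper_endpoint` is hypothetical, and the parameter values are made up purely to show the arithmetic.

```python
import math


def upper_endpoint(loc, scale, shape):
    """Hypothetical helper: upper end-point of a GEV(loc, scale, shape)
    distribution in the xi sign convention, or of a generalised Pareto
    tail when loc is the threshold. Finite only when shape < 0."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    if shape >= 0:
        return math.inf   # shape >= 0 gives an unbounded upper tail
    return loc - scale / shape


# Illustrative parameter values only, not estimates from the thesis
print(upper_endpoint(10.0, 2.0, -0.25))  # 18.0
```

If such a model is fitted with SciPy, note that `scipy.stats.genextreme` uses a shape parameter `c` equal to -xi. A fitted `c > 0` there therefore corresponds to a bounded upper tail. By contrast, `scipy.stats.genpareto` uses the same sign as xi.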