Remote sensing technology and machine learning algorithms for crop yield prediction in Bambara groundnut and grapevines

Jewan, Shaikh Yassir Yousouf (2024) Remote sensing technology and machine learning algorithms for crop yield prediction in Bambara groundnut and grapevines. PhD thesis, University of Nottingham.

PDF (Thesis revised with post-viva corrections incorporated. Final version.) (Thesis - as examined) - Repository staff only until 31 December 2025. Subsequently available to anyone.
Available under Licence All Rights Reserved.

Abstract

Accurate and timely crop yield prediction (CYP) is important for ensuring food security, optimising agricultural practices and supporting sustainable agriculture. Remote sensing (RS) technology uses satellites, unmanned aerial vehicles (UAVs), unmanned ground vehicles (UGVs) and a range of other platforms and sensors, operating across various wavelengths and spatial scales, for precision agriculture (PA) applications such as yield prediction. This thesis explores the use of RS for CYP on two distinct crops, with a primary focus on grapevines and a minor focus on Bambara groundnut. The research draws on multiple years of RS and ground-based datasets acquired in Malaysia and South Australia. The aerial and ground-based data were used in conjunction with statistical and machine learning (ML) models to predict groundnut and grapevine yield.

In Chapter 2, we evaluated the efficacy of a low-cost color infrared camera for predicting groundnut yield from RS and ground data acquired at several key phenological stages. UAVs have become increasingly popular in recent years for agricultural RS applications such as crop yield forecasting and PA. The objective of this study was to evaluate the performance of a low-cost UAV-based RS technology for Bambara groundnut yield prediction. A multirotor UAV equipped with a near-infrared sensitive consumer-grade digital camera was used to collect image data during the 2018 growing season (April to August). Flight missions were carried out six times during critical phenological stages of the crop's life cycle. Yield was recorded at harvest. Four vegetation indices (VIs), namely the normalized difference vegetation index (NDVI), enhanced vegetation index 2 (EVI2), green normalized difference vegetation index (GNDVI) and simple ratio (SR), were calculated from the Red-Green-Near Infrared bands of the georeferenced orthomosaic UAV images. Pearson’s correlation and Bland–Altman testing showed significant agreement between remotely and proximally sensed VIs. Significant and positive correlations were found between all four VIs and yield, with the strongest relationship obtained for SR at the podfilling stage (R = 0.81**). Multi-temporal accumulative VIs improved yield prediction significantly, with the best index being SR and the best interval being from podfilling to maturity (R = 0.88**). The accumulated SR from podfilling to maturity gave higher prediction accuracy (R² = 0.71, RMSE = 0.20, MAPE = 14.2%) than the spectral index at a single stage (R² = 0.68, RMSE = 0.24, MAPE = 15.1%). Finally, a yield map was generated from the resulting model to better understand within-field spatial variation in yield for future site-specific or variable-rate application operations.
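
As an illustration of the indices and accuracy metrics named in this chapter summary, the following minimal Python sketch computes NDVI, GNDVI, SR and EVI2 from band reflectances using their standard published formulas, then fits a straight line between an accumulated SR and yield. The plot values, the choice to accumulate by simple summation over flight dates, and all function names are assumptions made for illustration; the abstract does not state how the thesis implemented these steps.

```python
import math

def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def gndvi(nir, green):
    """Green normalized difference vegetation index."""
    return (nir - green) / (nir + green)

def simple_ratio(nir, red):
    """Simple ratio (SR)."""
    return nir / red

def evi2(nir, red):
    """Two-band enhanced vegetation index (EVI2)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def r2_rmse_mape(y_true, y_pred):
    """Coefficient of determination, root mean square error, and MAPE in percent."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mape = 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return r2, rmse, mape

# Hypothetical plots: (NIR, red) reflectance at podfilling and at maturity, plus yield.
plots = [
    ((0.42, 0.10), (0.38, 0.12), 1.1),
    ((0.50, 0.08), (0.45, 0.10), 1.6),
    ((0.36, 0.12), (0.33, 0.13), 0.9),
    ((0.55, 0.07), (0.48, 0.09), 1.9),
    ((0.46, 0.09), (0.41, 0.11), 1.4),
]

# Assumed accumulation rule: sum the index over the flight dates in the interval.
acc_sr = [simple_ratio(*pod) + simple_ratio(*mat) for pod, mat, _ in plots]
yields = [y for _, _, y in plots]

slope, intercept = fit_line(acc_sr, yields)
predicted = [slope * x + intercept for x in acc_sr]
r2, rmse, mape = r2_rmse_mape(yields, predicted)
print(f"R2={r2:.2f} RMSE={rmse:.2f} MAPE={mape:.1f}%")
```

Applying the fitted line to the accumulated index of every pixel or plot in an orthomosaic is one way a yield map of the kind described above could be produced.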

Chapter 3 investigates the use of UAVs equipped with a multispectral camera, together with ground-truth data and ML algorithms, to predict grape yield and quality. Data were collected at the veraison stage during the 2019/2020 and 2020/2021 growing seasons in Coombe’s Vineyard, Adelaide, South Australia. Multispectral and thermal data and canopy state variables, including fractional intercepted photosynthetically active radiation, stomatal conductance, photosynthesis and chlorophyll content, were collected simultaneously. Yield components and quality parameters were measured at harvest. Spectral and thermal indices were computed and used together with the canopy state variables as input to the ML models, namely ridge regression, random forest regression, adaboost regression, principal component regression, and partial least squares regression. Results indicate that random forest (RF) outperformed all other models in predicting grape yield components and quality parameters in validation. For total number of clusters per vine, R² was 0.80 and RMSE 0.49. For average cluster weight, R² was 0.87 and RMSE 0.00. Weight of 50 berries had R² of 0.78 and RMSE 0.00. Average berry weight had R² of 0.79 and RMSE 0.00. Total cluster yield had R² of 0.85 and RMSE 0.23. Average berries per bunch had R² of 0.82 and RMSE 1.43. Yield had R² of 0.95 and RMSE 0.77. Total soluble solids had R² of 0.80 and RMSE 0.39. Titratable acidity had R² of 0.77 and RMSE 0.39. pH had R² of 0.73 and RMSE 0.04. Maturity index had R² of 0.83 and RMSE 0.24. The key predictors identified for grape yield components and quality parameters were stem water potential, fractional intercepted photosynthetically active radiation, canopy chlorophyll content, canopy to air temperature difference at midday, stomatal conductance, stomatal conductance index, crop water stress index, conductance index, triangular greenness index, and visible atmospherically resistant index. These findings offer insights for optimising vineyard management and winegrape production.
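
The general workflow summarised above (fit a random forest on canopy and index features, score it on held-out data, then rank the predictors) can be sketched with scikit-learn as follows. The data are synthetic, the four feature names are illustrative stand-ins, and the use of impurity-based feature importances is an assumption; the abstract does not say how the thesis ranked its predictors.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-vine predictors (hypothetical names and values).
feature_names = ["stem_water_potential", "fipar", "chlorophyll", "cwsi"]
X = rng.normal(size=(200, len(feature_names)))
# Synthetic target driven mainly by the first two features.
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.3, size=200)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Validation metrics of the kind reported in the abstract (R² and RMSE).
pred = model.predict(X_val)
r2 = r2_score(y_val, pred)
rmse = float(np.sqrt(mean_squared_error(y_val, pred)))

# Rank predictors by impurity-based importance (one possible ranking method).
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
print(f"R2={r2:.2f} RMSE={rmse:.2f}")
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Permutation importance on the validation set is a common alternative when predictors are strongly correlated, as canopy and thermal variables often are.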

Chapter 4 extends the investigation by integrating proximally sensed hyperspectral VIs acquired using a spectroradiometer and thermal data together with ground-truth measurements to predict grape yield components and quality parameters. The study was conducted during the 2019/2020 and 2020/2021 grapevine growing seasons in Coombe’s Vineyard, Adelaide, South Australia, and was validated in the 2021/2022 season in Coonawarra Vineyards, Coonawarra, South Australia. At veraison, hyperspectral and thermal data were collected using a handheld spectroradiometer and a thermal imaging camera. Concurrently, grapevine canopy variables (fractional intercepted photosynthetically active radiation, stem water potential, chlorophyll content and gas exchange) were measured. Yield components (clusters per vine, cluster weight, berries per cluster, berry weight) and quality parameters (total soluble solids, titratable acidity, pH, maturity index) were measured at harvest. From the hyperspectral and thermal data, 20 VIs and 3 thermal indices were derived. These data and crop state variables were used to model grape yield components and quality parameters using linear and non-linear regression models: ridge (RR), Bayesian ridge (BRR), random forest (RF), gradient boosting (GB), k-nearest neighbour (kNN), and decision trees (DT). Results indicated that GB outperformed the other models for most targets. GB had the best performance for total number of clusters per vine (R² = 0.77, RMSE = 0.56), average cluster weight (R² = 0.93, RMSE = 0.00), weight of 50 berries (R² = 0.93, RMSE = 0.00), average berry weight (R² = 0.95, RMSE = 0.00), total cluster yield (R² = 0.95, RMSE = 0.13) and average berries per bunch (R² = 0.93, RMSE = 0.83). For yield, RF performed best (R² = 0.97, RMSE = 0.55). GB performed best for total soluble solids (R² = 0.83, RMSE = 0.34), pH (R² = 0.93, RMSE = 0.02), and maturation index (R² = 0.88, RMSE = 0.19), whereas RF performed best for titratable acidity (R² = 0.83, RMSE = 0.33). Our results also revealed the top 10 predictor variables for grapevine yield components and quality parameters: canopy temperature depression, leaf chlorophyll content, fractional intercepted photosynthetically active radiation, normalised difference infrared index, stem water potential, stomatal conductance, net photosynthesis, modified triangular vegetation index, modified red-edge simple ratio and ANTgitelson index. These predictors significantly influence grapevine growth, berry quality, and yield.
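
A comparison across the six model families listed in this chapter summary could be set up roughly as below, here using five-fold cross-validated R² on synthetic data. The data, the default hyperparameters and the cross-validation scheme are assumptions for illustration only; the thesis validated on a separate season and site, which this sketch does not reproduce.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import BayesianRidge, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

# Synthetic predictors and a non-linear synthetic target (hypothetical data).
X = rng.uniform(-1.0, 1.0, size=(150, 5))
y = (
    np.sin(3.0 * X[:, 0])
    + X[:, 1] ** 2
    + 0.5 * X[:, 2]
    + rng.normal(scale=0.1, size=150)
)

# Model abbreviations follow the abstract; hyperparameters are library defaults.
models = {
    "RR": Ridge(),
    "BRR": BayesianRidge(),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "GB": GradientBoostingRegressor(random_state=0),
    "kNN": KNeighborsRegressor(),
    "DT": DecisionTreeRegressor(random_state=0),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: float(cross_val_score(model, X, y, cv=cv, scoring="r2").mean())
    for name, model in models.items()
}

best = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean R2 = {score:.2f}")
print("best model:", best)
```

Running the same loop once per target variable (clusters per vine, berry weight, pH and so on) would yield a per-target winner, which is the form in which the results above are reported.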

In Chapter 5, the complexity of modelling grapevine yield components and quality parameters was further explored. The study was conducted over four grapevine growing seasons (2018/19 to 2021/22) on Cabernet Sauvignon and Shiraz in Coonawarra, South Australia. It included four irrigation treatments: conventional, crop evapotranspiration, soil moisture, and two plant water status sensors. Data were collected at six phenological stages: budburst, flowering, fruit set, pea-sized, veraison, preharvest, and harvest. Weather data were also recorded. A network of proximal thermography sensors continuously measured canopy temperature and microclimatic conditions. Ground measurements included soil moisture, plant water status, fractional intercepted photosynthetically active radiation, crop coefficient, photosynthesis, stomatal conductance, transpiration rate, internal carbon dioxide concentration, and water use efficiency. Remote sensing data (thermal, multispectral, RGB) and proximal hyperspectral data were acquired concurrently with ground data. At harvest, yield components (number of clusters per vine, cluster weight, number of berries per cluster, berry weight) and quality parameters (total soluble solids, titratable acidity, pH, hue, color, total anthocyanin, total phenolics, and maturation index) were measured. Various structural, spectral, thermal, and composite indices were derived from the remotely sensed data. These indices, along with ground-truth canopy state variables and weather data, were first subjected to principal component analysis (PCA); different combinations of principal components (PCs) were then used as input to linear and non-linear regression models, including random forest regression (RFR), multiple linear regression (MLR), decision tree regression (DTR), and adaboost regression (ABR), to predict grapevine yield components and quality parameters. For yield, RFR with PC1 to PC10 performed best (R² = 0.90, RMSE = 0.31). For average berry weight, RFR with PC1 to PC9 was the most accurate (R² = 0.92, RMSE = 0.02). For average bunch weight, MLR with PC1 to PC7 performed best (R² = 0.92, RMSE = 2.59). For berries per bunch, MLR with PC1 to PC7 had the best performance (R² = 0.84, RMSE = 2.73). Bunch count was best predicted by MLR with PC1 to PC6 (R² = 0.77, RMSE = 2.45). For pruning weight, MLR with PC1 to PC6 was optimal (R² = 0.67, RMSE = 0.04). For Ravaz Index, RFR with PC1 to PC10 performed best (R² = 0.85, RMSE = 0.17). For color density, ABR with all variables as input performed best (R² = 0.81, RMSE = 0.03). Colour/berry and titratable acidity were best predicted using RFR with PC1 to PC10 (R² = 0.90, RMSE = 0.05 and R² = 0.87, RMSE = 0.08, respectively). Total soluble solids were best predicted using RFR with PC1 to PC7 as input (R² = 0.92, RMSE = 0.18). Tannin concentration was best predicted using DTR with PC1 and PC2 as input (R² = 0.99, RMSE = 0.03). RFR with PC1 to PC8 had the highest performance for total anthocyanin content (R² = 0.85, RMSE = 4.39). Total phenolics were most accurately predicted using MLR with PC1 to PC6 (R² = 0.90, RMSE = 0.03). pH was best predicted using RFR with PC1 to PC9 as input (R² = 0.91, RMSE = 0.02). The top 10 predictors for grapevine yield and quality components were water use efficiency, photosynthetic rate, leaf area index, stem water potential, predawn water potential, TCARI/OSAVI, evapotranspiration, stomatal conductance, photochemical reflectance index and green normalised difference vegetation index. The findings of this thesis highlight the importance of employing a diverse dataset, including RS data and ground-truth data, together with ML techniques for accurate CYP.
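
The idea of feeding the first k principal components into a regression model and choosing k by predictive accuracy can be sketched as follows. The data are synthetic, and standardising before PCA, the five-fold cross-validation and the use of linear regression as the downstream model are assumptions; the abstract does not describe these details of the thesis pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Synthetic data: 12 correlated predictors generated from 3 hidden factors.
latent = rng.normal(size=(120, 3))
mixing = rng.normal(size=(3, 12))
X = latent @ mixing + rng.normal(scale=0.2, size=(120, 12))
y = 2.0 * latent[:, 0] - latent[:, 1] + rng.normal(scale=0.2, size=120)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for k in range(1, 11):
    # PCA is fitted inside the pipeline, so each fold derives its own components
    # from the training portion only.
    pipeline = make_pipeline(
        StandardScaler(), PCA(n_components=k), LinearRegression()
    )
    scores[k] = float(
        cross_val_score(pipeline, X, y, cv=cv, scoring="r2").mean()
    )

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"PC1 to PC{k}: mean R2 = {score:.2f}")
print("best number of components:", best_k)
```

Swapping `LinearRegression()` for a tree-based regressor in the same pipeline gives the non-linear counterpart of this sweep.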

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Murchie, Erik
Sparkes, Debbie
Singh, Ajit
Pagay, Vinay
Tyerman, Stephen D.
Gautam, Deepak
Billa, Lawal
Keywords: yield prediction, remote sensing, machine learning, multimodal data, vegetation indices, thermal indices, weather data, modelling
Subjects: S Agriculture > SB Plant culture
Faculties/Schools: UK Campuses > Faculty of Science > School of Biosciences
Item ID: 79839
Depositing User: Jewan, Shaikh Yassir Yousouf
Date Deposited: 18 Feb 2025 14:45
Last Modified: 21 Feb 2025 17:42
URI: https://eprints.nottingham.ac.uk/id/eprint/79839
