Tsantila, Efterpi
(2025)
The application of machine learning to predict disease, production and reproduction outcomes from the transition period of dairy cattle.
PhD thesis, University of Nottingham.
Abstract
Data collected under a transition period monitoring service, from 133 herds over
the course of 2 years, were used to build predictive models for
disease, production and reproductive outcomes. Both cow-level and pen-level
variables were used as potential predictor variables. A variety of methods were
applied: linear regression, decision trees, random forests, multivariate adaptive
regression splines (MARS) and artificial neural networks (ANNs) for continuous
outcomes; and logistic regression, decision trees, random forests, ANNs, support
vector machines (SVMs) and naïve Bayes for binary outcomes. Models
generating predictions on both the individual and the herd/quarter-year group
level were produced.
Various health outcomes (occurrence or not of milk fever, LDA, RFM and
metritis, as well as a collective disease status outcome) were explored. On the
individual lactation level all models lacked predictive value; the best performing
model was that for collective disease outcome, with a kappa value (measuring
agreement between predicted and observed data) of 0.16, although accuracy
was relatively high at 0.86. When building models on the herd/quarter-year
level, the best performing model was for the milk fever outcome; predicted
group prevalence of milk fever explained around 44% of variation in observed
prevalence, suggesting relatively low predictiveness. Better prediction
performance was revealed when individual lactation level model predictions
were aggregated at herd-quarter-year level and compared with observed
aggregated disease prevalences; just over two thirds (67%) of the variation in
observed outcome was explained by the aggregated predictions for occurrence
of metritis.
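
The aggregation step described above can be illustrated with a minimal sketch. It assumes that each lactation carries a predicted disease probability and an observed 0/1 outcome, that group prevalence is the mean within each herd-quarter-year, and that "variation explained" is the squared Pearson correlation between predicted and observed group prevalences. The function names, group labels and toy numbers are hypothetical and are not taken from the thesis, which may have computed these quantities differently.

```python
from collections import defaultdict


def aggregate_by_group(records):
    """Return {group: (mean predicted probability, observed prevalence)}.

    records: iterable of (group_key, predicted_probability, observed_0_or_1).
    """
    totals = defaultdict(lambda: [0.0, 0, 0])  # [sum of probs, sum of cases, n]
    for group, probability, observed in records:
        entry = totals[group]
        entry[0] += probability
        entry[1] += observed
        entry[2] += 1
    return {g: (p / n, o / n) for g, (p, o, n) in totals.items()}


def r_squared(xs, ys):
    """Squared Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data: three herd-quarter-year groups, two lactations each.
records = [
    ("herd_A/2023Q1", 0.2, 0), ("herd_A/2023Q1", 0.4, 1),
    ("herd_B/2023Q1", 0.1, 0), ("herd_B/2023Q1", 0.1, 0),
    ("herd_C/2023Q1", 0.6, 1), ("herd_C/2023Q1", 0.8, 1),
]
groups = aggregate_by_group(records)
predicted = [pred for pred, _ in groups.values()]
observed = [obs for _, obs in groups.values()]
r2 = r_squared(predicted, observed)
```

Averaging over a group cancels much of the noise in individual predictions, which is one plausible reason aggregated predictions can track observed prevalence better than the individual-level metrics would suggest.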
Moving to the reproductive outcomes, probability of insemination success, as
well as time from calving to successful insemination, were investigated. Kappa
values for the former ranged from 0.04 to 0.17, while the R2 value describing
the relationship between aggregated predictions and actual aggregated values
on the herd-quarter-year level was 0.37. When building models on
the aggregated level instead, the maximum R2 value was 0.24,
for the MARS model. For the time to insemination outcome, the
maximum R2 value was just 0.024, for the linear regression model,
indicating very low predictive value. Interestingly, while no strong predictive
value was found in these models, inferential models were built for those same
outcomes and found strong associations between insemination success and
lactation number, calving month, as well as calf mortality; and between time to
insemination and metritis, corrected protein percentage in milk, calving month
and lactation number.
For the production outcomes, models for both the 305-day predicted milk yield
and the daily residual milk yield (difference between observed yield for a given
cow on a given day, and expected daily yield based on lactation curve shape
for the appropriate parity in the cow’s herd) were built. For the individual
lactation level of the 305-day milk yield models, R2 values were again relatively
low, at around 0.1, with the exception of the random forest that had a value of
0.34. Similarly, when comparing aggregated predictions using the individual
lactation models and actual aggregated values, the R2 was as low as 0.024.
Building models on a herd/quarter-year level yielded similar results with R2
ranging from 0.12 to 0.39 for the linear regression and the random forest
models respectively. For the daily residual milk yield outcome, the R2 values of
individual lactation models had a maximum value of 0.21 for the random forest
model, while for the aggregated models the maximum value was
0.134. When the individual lactation level models were used to compare aggregated
predictions with actual aggregated values, the R2 was 0.34. Not
unlike our results on the reproductive outcomes, various strong inferential
associations were identified for these outcomes, regardless of the predictive
models’ performance.
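
The daily residual milk yield defined above (observed yield minus expected yield for the cow's parity and herd on that day) can be sketched as follows. The expected yields are supplied here as a plain lookup table with invented values; how the thesis fitted the underlying lactation curves is not stated in this abstract, so the table, the function name and the data layout are all hypothetical.

```python
def daily_residuals(test_days, expected_yield):
    """Return (cow_id, days_in_milk, residual_kg) for each test-day record.

    test_days: list of (cow_id, parity, days_in_milk, observed_kg).
    expected_yield: dict mapping (parity, days_in_milk) to the expected daily
        yield in kg for the cow's herd (assumed to come from a fitted
        lactation curve; values below are made up for illustration).
    """
    return [
        (cow_id, dim, observed_kg - expected_yield[(parity, dim)])
        for cow_id, parity, dim, observed_kg in test_days
    ]


# Hypothetical expected yields at 30 days in milk for one herd.
expected_yield = {(1, 30): 28.0, (2, 30): 35.0}
test_days = [("cow_1", 1, 30, 30.5), ("cow_2", 2, 30, 33.0)]
residuals = daily_residuals(test_days, expected_yield)
```

A positive residual means the cow produced more than expected for her parity and stage of lactation in that herd; a negative residual means she produced less.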
Since transition management is key to successful dairy farming, machine
learning would be useful both in terms of predicting which individuals may get
a negative outcome and possibly require enhanced observation or other
preventive interventions, and also in providing a potential monitoring metric.
The latter would mean that even if individual predictions are not good, the
predicted disease prevalence, insemination success or yield in each group of
cows could be used as a measure of overall transition “success”. Overall, very
few of our models were likely to be predictive enough to be useful in either
context, but that could perhaps improve if other data were available, such as
sensor data or history from previous lactations. The project as a whole provides
a good example of why it is important to be cautious with choice of prediction
performance metrics and avoid accuracy as the main measure in unbalanced
data, and of how in many areas inferential models can find strongly significant
associations but still generate very poor predictions when applied to new data.
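
The caution about accuracy in unbalanced data can be made concrete with a small sketch comparing accuracy with Cohen's kappa for a binary outcome. The data are invented for illustration (100 lactations, 10 of them diseased) and are not results from the thesis; the function name is hypothetical.

```python
def accuracy_and_kappa(observed, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length 0/1 sequences."""
    n = len(observed)
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    observed_agreement = (tp + tn) / n
    # Agreement expected by chance, from the marginal class frequencies.
    chance_agreement = ((tp + fn) / n) * ((tp + fp) / n) \
        + ((tn + fp) / n) * ((tn + fn) / n)
    if chance_agreement == 1.0:
        return observed_agreement, 0.0  # kappa is undefined; report 0
    kappa = (observed_agreement - chance_agreement) / (1.0 - chance_agreement)
    return observed_agreement, kappa


# Hypothetical data: 10 diseased lactations followed by 90 healthy ones.
observed = [1] * 10 + [0] * 90

# A "model" that predicts healthy for every lactation.
always_healthy = [0] * 100
acc_trivial, kappa_trivial = accuracy_and_kappa(observed, always_healthy)

# A weak model: 2 true positives, 8 missed cases, 4 false alarms.
weak_model = [1] * 2 + [0] * 8 + [1] * 4 + [0] * 86
acc_weak, kappa_weak = accuracy_and_kappa(observed, weak_model)
```

In this toy case the trivial predictor reaches an accuracy of 0.90 with a kappa of 0, and the weak model reaches an accuracy of 0.88 with a kappa of about 0.19. High accuracy alone therefore says little when the outcome is rare.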
Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Hudson, Christopher; Randall, Laura; Green, Martin; Remnant, John
Keywords: Dairy cows; Predictive models; Health outcomes; Reproductive outcomes; Production outcomes; Transition management; Prediction performance metrics
Subjects: S Agriculture > SF Animal culture
Faculties/Schools: UK Campuses > Faculty of Medicine and Health Sciences > School of Veterinary Medicine and Science
Item ID: 81298
Depositing User: Tsantila, Efterpi
Date Deposited: 24 Jul 2025 04:40
Last Modified: 24 Jul 2025 04:40
URI: https://eprints.nottingham.ac.uk/id/eprint/81298