The application of machine learning to predict disease, production and reproduction outcomes from the transition period of dairy cattle

Tsantila, Efterpi

Tsantila, Efterpi (2025) The application of machine learning to predict disease, production and reproduction outcomes from the transition period of dairy cattle. PhD thesis, University of Nottingham.

PDF (Thesis - as examined)
Available under Licence Creative Commons Attribution.

Abstract

Data collected under a transition period monitoring service, from 133 herds over the course of 2 years, were used to build predictive models for disease, production and reproductive outcomes. Both cow-level and pen-level variables were used as potential predictors. A variety of methods were applied: linear regression, decision tree, random forest, multivariate adaptive regression splines (MARS) and artificial neural networks (ANNs) for continuous outcomes; and logistic regression, decision tree, random forest, ANNs, support vector machines (SVM) and naïve Bayes for binary outcomes. Models generating predictions at both the individual and the herd/quarter-year group level were produced.

Various health outcomes (occurrence or not of milk fever, left displaced abomasum (LDA), retained fetal membranes (RFM) and metritis, as well as a collective disease status outcome) were explored. At the individual lactation level all models lacked predictive value; the best-performing model was that for the collective disease outcome, with a kappa value (measuring agreement between predicted and observed data) of 0.16, although accuracy was relatively high at 0.86. When building models at the herd/quarter-year level, the best-performing model was for the milk fever outcome; predicted group prevalence of milk fever explained around 44% of the variation in observed prevalence, suggesting relatively low predictiveness. Prediction performance was better when individual lactation level model predictions were aggregated at the herd/quarter-year level and compared with observed aggregated disease prevalences; just over two thirds (67%) of the variation in observed outcome was explained by the aggregated predictions for occurrence of metritis.
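
The gap between a kappa of 0.16 and an accuracy of 0.86 is typical of unbalanced outcomes. The following is a minimal sketch, not the thesis code, of how the two metrics can diverge; the function names and the toy data (10 diseased lactations out of 100, with a model that predicts "healthy" for every cow) are illustrative assumptions.

```python
def accuracy(observed, predicted):
    """Proportion of cases where prediction and observation agree."""
    correct = sum(1 for o, p in zip(observed, predicted) if o == p)
    return correct / len(observed)


def cohen_kappa(observed, predicted):
    """Cohen's kappa for binary (0/1) outcomes.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the marginal totals.
    """
    n = len(observed)
    obs_pos = sum(observed)
    pred_pos = sum(predicted)
    p_o = accuracy(observed, predicted)
    p_e = (obs_pos * pred_pos + (n - obs_pos) * (n - pred_pos)) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical class; kappa is undefined,
        # so report 0.0 (no agreement beyond chance) by convention here.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical data: 10% disease prevalence, model never predicts disease.
    observed = [1] * 10 + [0] * 90
    predicted = [0] * 100
    print(f"accuracy = {accuracy(observed, predicted):.2f}")
    print(f"kappa    = {cohen_kappa(observed, predicted):.2f}")
```

In this toy case accuracy is high purely because most lactations are disease-free, while kappa shows no agreement beyond chance.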

Moving to the reproductive outcomes, probability of insemination success, as well as time from calving to successful insemination, were investigated. Kappa values for the former ranged from 0.04 to 0.17, while the R² value describing the relationship between aggregated predictions and actual aggregated values at the herd/quarter-year level was 0.37. When building models at the aggregated level instead, the maximum R² value was 0.24, for the MARS model. For the time to insemination outcome, the maximum R² value was just 0.024, for the linear regression, indicating very low predictive value. Interestingly, while these models showed no strong predictive value, inferential models built for the same outcomes found strong associations between insemination success and lactation number, calving month and calf mortality; and between time to insemination and metritis, corrected protein percentage in milk, calving month and lactation number.
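
As an illustration of comparing aggregated predictions with aggregated observations, the sketch below averages individual predicted probabilities and observed outcomes within each group and then computes R² across groups. It is a hedged example only: the abstract does not say how R² was calculated, so the squared Pearson correlation is assumed here, and the function names and group labels are hypothetical.

```python
from collections import defaultdict


def aggregate_by_group(records):
    """Average predictions and outcomes within each group.

    records: iterable of (group_id, predicted_probability, observed_outcome),
    where group_id could be a (herd, quarter-year) label.
    Returns {group_id: (mean_predicted, observed_prevalence)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for group, prob, outcome in records:
        entry = totals[group]
        entry[0] += prob
        entry[1] += outcome
        entry[2] += 1
    return {g: (p / n, o / n) for g, (p, o, n) in totals.items()}


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one variable, so nothing is explained
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical individual-level predictions for three groups.
    records = [
        ("herd1-2023Q1", 0.2, 0), ("herd1-2023Q1", 0.4, 1),
        ("herd2-2023Q1", 0.1, 0), ("herd2-2023Q1", 0.1, 0),
        ("herd3-2023Q1", 0.5, 1), ("herd3-2023Q1", 0.5, 1),
    ]
    groups = aggregate_by_group(records)
    predicted = [groups[g][0] for g in sorted(groups)]
    observed = [groups[g][1] for g in sorted(groups)]
    print(f"group-level R^2 = {r_squared(predicted, observed):.2f}")
```

A design note: averaging predicted probabilities, rather than thresholded class labels, keeps information from cows whose individual predictions fall below the classification cut-off, which is one plausible reason group-level agreement can exceed individual-level agreement.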

For the production outcomes, models were built for both the 305-day predicted milk yield and the daily residual milk yield (the difference between the observed yield for a given cow on a given day and the expected daily yield based on lactation curve shape for the appropriate parity in the cow’s herd). For the individual lactation level 305-day milk yield models, R² values were again relatively low, at around 0.1, with the exception of the random forest, which had a value of 0.34. Similarly, when comparing aggregated predictions from the individual lactation models with actual aggregated values, the R² was as low as 0.024. Building models at the herd/quarter-year level yielded similar results, with R² ranging from 0.12 for the linear regression to 0.39 for the random forest. For the daily residual milk yield outcome, the R² values of individual lactation models reached a maximum of 0.21 for the random forest model, while for the aggregated models the maximum was 0.134. When the individual lactation level models were used to compare aggregated predictions with actual aggregated values, the R² was 0.34. As with the reproductive outcomes, various strong inferential associations were identified for these outcomes, regardless of the predictive models’ performance.
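
The daily residual milk yield defined above can be sketched as follows. The abstract does not state which lactation curve was fitted, so this example assumes a Wood-type curve, y(t) = a · t^b · exp(−c·t); the function names, parameters and numbers are hypothetical and serve only to show the subtraction.

```python
import math


def wood_curve(days_in_milk, a, b, c):
    """Expected daily yield (kg) from a Wood-type lactation curve.

    This curve form is an assumption for illustration; a, b and c would be
    estimated per herd and parity group.
    """
    return a * days_in_milk ** b * math.exp(-c * days_in_milk)


def daily_residual_yield(observed_yield, days_in_milk, curve_params):
    """Observed yield minus the expected yield for that day in milk."""
    a, b, c = curve_params
    return observed_yield - wood_curve(days_in_milk, a, b, c)


if __name__ == "__main__":
    # Hypothetical curve parameters for one herd/parity group.
    params = (20.0, 0.2, 0.004)
    residual = daily_residual_yield(observed_yield=38.0, days_in_milk=60,
                                    curve_params=params)
    print(f"residual yield at 60 DIM: {residual:.1f} kg")
```

A positive residual means the cow is yielding more than expected for her herd and parity on that day; a negative residual means less.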

Since transition management is key to successful dairy farming, machine learning would be useful both for predicting which individuals may experience a negative outcome, and so possibly require enhanced observation or other preventive interventions, and for providing a potential monitoring metric. The latter would mean that, even if individual predictions are not good, the predicted disease prevalence, insemination success or yield in each group’s cows could be used as a measure of overall transition “success”. Overall, very few of our models were likely to be predictive enough to be useful in either context, but this could perhaps improve if other data were available, such as sensor data or history from previous lactations. The project as a whole provides a good example of why it is important to be cautious with the choice of prediction performance metrics and to avoid accuracy as the main measure in unbalanced data, and of how in many areas inferential models can find strongly significant associations but still generate very poor predictions when applied to new data.

Item Type:	Thesis (University of Nottingham only) (PhD)
Supervisors:	Hudson, Christopher; Randall, Laura; Green, Martin; Remnant, John
Keywords:	Dairy cows; Predictive models; Health outcomes; Reproductive outcomes; Production outcomes; Transition management; Prediction performance metrics
Subjects:	S Agriculture > SF Animal culture
Faculties/Schools:	UK Campuses > Faculty of Medicine and Health Sciences > School of Veterinary Medicine and Science
Item ID:	81298
Depositing User:	Tsantila, Efterpi
Date Deposited:	24 Jul 2025 04:40
Last Modified:	24 Jul 2025 04:40
URI:	https://eprints.nottingham.ac.uk/id/eprint/81298
