Comparison of machine learning methods for detecting lameness in dairy cows

Authors :: Zoche-Golob, Veit
Beise, Hans-Peter
Belik, Vitaly
Publication Year :: 2022
Publisher :: Zenodo, 2022.
Abstract: Lameness is one of the most important health disorders of dairy cows. Detecting lame cows reliably and early enough is very time-consuming. For this reason, automatic lameness detection systems are being developed. Such systems can capture the cows’ gait with different sensors such as scales or cameras. Alternatively, existing data from automatic heat detection systems that describe the cows’ behavior are used. These data are usually already aggregated and are called indirect variables. An essential part of automatic lameness detection systems is the algorithm that classifies the measured data as "lame" or "not lame". The objective of my present thesis was to compare different classification algorithms for detecting lameness from indirect variables describing the activity, the performance and characteristics of the cows. For my thesis, data from the project Klauenfitnet were available, including the indirect variables lactation number, days in milk, daily milk yield, average steps per hour and average lying duration per lying bout, and the labels "lame" or "not lame" according to the results of assessments of the cows’ gait. The values of the variables were collected daily throughout the project period. The classification was based on time-dependent features like daily activity and milk yield as well as variables that were constant over the observation period, like lactation number. The cows’ gait had only been assessed approximately every two weeks. Therefore, not all observations were labelled and available for supervised learning methods. I included discriminative and generative approaches for multivariate time series classification in the comparison. In addition to classical machine learning methods like random forests and Support Vector Machines (SVMs), deep learning methods like Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) were applied.
For the discriminative approaches, extra features were created by feature engineering. In the end-to-end approaches, the final features used for classification were learnt in deep learning models while the discriminative classifier was trained. To make the unlabelled data available for training the classifiers, unsupervised pretraining with autoencoders was applied to deep learning models in the generative approaches. In total, the comparison consisted of nine approaches: three approaches with feature engineering (FE-SVM, FE-RF, FE-MLP), three end-to-end approaches (E2E-MLP, E2E-CNN, E2E-GRU), and three generative approaches with unsupervised pretraining (AE-MLP, AE-CNN, AE-GRU). The metrics accuracy, precision, sensitivity and specificity were calculated to assess the classification performance of the different approaches. In addition, I recorded the training and classification run times and the amount of main memory that was required. The nine approaches were tested in ten-fold cross-validation based on the same splits of the data. To analyse the differences between the expected classification performances of the nine approaches, I developed a graphical model as a Bayesian multivariate linear model. This made it possible to analyse the differences between the classification approaches in all four metrics simultaneously. The data set contained 727,008 observations. Of these, 17,114 observations had labels from the gait assessments. Of the labels, 48.0% fell into the class "lame" and 52.0% into the class "not lame". Expected performances were similar among all nine classification approaches. Only the approaches E2E-MLP and AE-MLP had significantly worse classification results. Overall, the mean expected classification performance was only moderate, with accuracy and precision of 0.71, sensitivity of 0.65, and specificity of 0.75. FE-SVM, E2E-GRU, E2E-CNN and AE-CNN classified best.
However, none of these approaches achieved an expected accuracy or precision of at least 0.75. Less than 3 GB of main memory was required for training any of the examined models, and only the generative approaches needed more than 1.5 GB. Training duration of all discriminative classification approaches was below five minutes, whereas all generative approaches with unsupervised pretraining needed considerably more time to learn the model parameters. The expected classification performance of all nine approaches was not sufficient for practical application in automatic lameness detection systems using indirect variables of activity, performance and characteristics of the cows. Consequently, further models need to be evaluated and optimised for this use case. Approaches with CNNs and RNNs seem to have the greatest potential because these deep neural networks can be adapted very flexibly to specific use cases. In my studies, the gain in classification accuracy from generative classification approaches, compared to end-to-end approaches with similar models, did not justify the effort and resources needed to develop and train them. Therefore, developing approaches with unsupervised pretraining for the automatic detection of lameness in dairy cows does not seem sensible as long as enough data is available for training.
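
For readers unfamiliar with the four metrics named in the abstract (accuracy, precision, sensitivity and specificity), the following is a minimal sketch of how they can be computed for a binary "lame" / "not lame" classifier. It is not the thesis code: the function name `binary_metrics` is hypothetical, and the labels are made-up toy values (1 = lame, 0 = not lame), not data from the Klauenfitnet project.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, sensitivity and specificity.

    Assumes binary labels where 1 means "lame" and 0 means "not lame".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # lame, predicted lame
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # not lame, predicted not lame
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not lame, predicted lame
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # lame, predicted not lame

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a class or prediction is absent.
        return numerator / denominator if denominator else math.nan

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy example with invented labels (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity is the share of truly lame cows that are flagged, while specificity is the share of non-lame cows that are correctly left unflagged; precision is the share of flagged cows that are truly lame.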