Our objective was to use data mining to develop and validate a detection model for clinical mastitis (CM) using sensor data collected at nine Dutch dairy herds milking automatically. Sensor data were available for almost 3.5 million quarter milkings (QM) from 1,109 cows; 348 QM with CM were observed by the participating farmers. The data were divided into a training set and a test set, stratified at the cow level. For model building, the QM with CM from the training set (n = 243) were combined with 24,987 QM that had a somatic cell count below 200,000 cells/ml on a milk production test day. These QM came from cows that never exceeded this threshold on any test day within the parity and that were never visually checked for CM by the farmers. The model used decision tree (DT) induction as the base classifier, with and without bagging and boosting. Both bagging and boosting work by building multiple models with the base classifier on different samples of the training data. For validation, two test sets were created. The first (Test_GreyOut) included 105 QM with CM and 13,313 QM without CM, selected in the same way as for the training set; it therefore excluded the large pool of QM with a less clear mastitis status. The second (Test_GreyIn) included the same 105 QM with CM but also QM with a less clear mastitis status: every QM that was not scored as having CM by the farmers was labeled as negative for CM. From this large pool of negative examples (n = 1,146,544), a random sample of 50,000 QM was selected. To evaluate performance, sensitivity was computed at fixed specificity levels, and the transformed partial area under the curve (pAUC) was calculated for specificity values of 97% or more. To visualize the performance of the detection models over the same specificity range, receiver operating characteristic (ROC) curves were constructed.
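As an illustration of how bagging builds models on different samples of the training data, the following is a minimal sketch and not the study's implementation. It assumes a one-split decision tree (a stump) as a stand-in for full DT induction, uses toy one-feature data, and all function names (`fit_stump`, `fit_bagging`, `predict_score`) are hypothetical. Boosting, which reweights or resamples the training data according to earlier errors, is not shown.

```python
import random


def fit_stump(X, y):
    """Fit a one-split tree: choose the feature, threshold and direction
    with the fewest training errors (stand-in for full DT induction)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                errors = sum(
                    (1 if sign * (row[j] - t) > 0 else 0) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return lambda row: 1 if sign * (row[j] - t) > 0 else 0


def fit_bagging(X, y, n_models=25, seed=0):
    """Train one base classifier per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_score(models, row):
    """Fraction of base classifiers voting positive; usable as a ranking score."""
    return sum(m(row) for m in models) / len(models)


# Toy data (assumed): one sensor-like feature, 1 = CM, 0 = no CM.
X = [[0.1], [0.2], [0.3], [0.4], [2.0], [2.1], [2.2], [2.3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_bagging(X, y)
high = predict_score(models, [2.5])
low = predict_score(models, [0.0])
```

The vote fraction returned by `predict_score` gives a continuous score, which is what allows a threshold to be chosen for a required specificity.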
On the Test_GreyOut set, the transformed pAUC increased from 0.713 for the base classifier alone to 0.787 when combined with boosting and to 0.800 when combined with bagging. At a specificity of 99%, sensitivity was 43.5% for the base classifier, 60.0% when combined with boosting, and 61% when combined with bagging. On the Test_GreyIn set, pAUC values were lower but still increased when bagging and boosting were used: from 0.643 for the base classifier alone to 0.677 when combined with boosting and to 0.702 when combined with bagging. At a specificity of 99%, sensitivity was 24.7% for the base classifier, 30.5% when combined with boosting, and 35.2% when combined with bagging. These results were obtained using very narrow time windows. It is therefore concluded that models developed by DT induction are promising for future implementation.
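The two performance measures reported above can be sketched as follows. This is a hedged illustration only: the abstract does not state which transformation of the pAUC was used, so the code assumes the McClish standardization, which rescales the partial area so that 0.5 corresponds to chance and 1.0 to perfect discrimination. The function names and the toy labels and scores are hypothetical.

```python
def roc_points(labels, scores):
    """ROC points (false positive rate, sensitivity), one per distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Process tied scores together so they yield a single ROC point.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def sensitivity_at_specificity(points, specificity):
    """Highest sensitivity reachable while keeping at least this specificity."""
    max_fpr = 1.0 - specificity
    return max(tpr for fpr, tpr in points if fpr <= max_fpr + 1e-12)


def partial_auc(points, max_fpr):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def standardized_pauc(points, min_specificity):
    """McClish-standardized pAUC (assumed transformation): 0.5 = chance, 1 = perfect."""
    max_fpr = 1.0 - min_specificity
    area = partial_auc(points, max_fpr)
    min_area = 0.5 * max_fpr ** 2  # area under the chance diagonal
    max_area = max_fpr             # area under a perfect classifier
    return 0.5 * (1.0 + (area - min_area) / (max_area - min_area))


# Toy example (assumed data): 1 = QM with CM, 0 = QM without CM.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
pts = roc_points(labels, scores)
se_99 = sensitivity_at_specificity(pts, 0.99)
tpauc_97 = standardized_pauc(pts, 0.97)
```

Restricting the area to specificity values of 97% or more focuses the comparison on the low false-alert region that matters for practical use, rather than on the full ROC curve.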