Major Depressive Disorder (MDD) is a common, disabling psychiatric condition and a major contributor to the global burden of disease. There are no biomarkers to aid the diagnosis of MDD, which currently relies on subjective reporting of symptoms. MDD symptom profiles are also highly variable, which reflects the heterogeneous nature of the disorder and likely different aetiological pathways to it. To better understand the underlying mechanisms, past studies have used a range of neuroimaging techniques with the aim of identifying neurobiological biomarkers associated with MDD. With growing sample sizes in imaging datasets and advances in data-driven analysis techniques, researchers are beginning to see the potential of machine learning (ML) to uncover informative biomarkers from high-dimensional data for potential diagnostic applications. Data-driven stratification of MDD could help categorise the disorder into aetiologically homogeneous subgroups and enable the application of ML classification for clinical purposes. Moreover, the success of deep learning in fields such as image recognition, speech recognition and self-driving cars has encouraged researchers to apply it to neuroimaging data from MDD patients. Despite this apparent promise, however, studies using ML to predict MDD have not been consistently successful. In part, this may be due to the curse of dimensionality in predictive modelling and in similarity estimation for clustering, as well as differences in data processing, data representation, analytical methods, data modalities, and MDD assessment tools. In this thesis, ML methods are applied to different imaging modalities in two large datasets: UK Biobank (UKB) (N = 6,247 to 14,507) and a subset of the Generation Scotland: Scottish Family Health Study (GS subsample) (N = 980).
The aims are to investigate (i) the replicability of clustering analyses based on brain morphometry data, (ii) the potential of deep learning for predicting general mental health (MHQ-factor), and (iii) the effect of MDD phenotyping on ML classification performance. In the first study, participants in the GS subsample are clustered into two groups defined by distinct patterns of brain structural properties, controlling for sex, age, intracranial volume and imaging site. The clustering is replicated in the UKB sample, where inter-cluster differences are highly correlated with those of the GS subsample (r = 0.79 to 0.94). The regions that most strongly drive the cluster separation are higher-order cortical regions commonly associated with executive function and decision making. The clusters are associated with general cognitive ability in both datasets, but not with MDD case-control status. In the second study, deep learning, specifically the BrainNetCNN model, which is designed to capture topological locality in graphs, is applied to structural connectivity data from UKB (N ≈ 8,000) to predict sex, age, general cognition (g-factor) and mental health (MHQ-factor). Among all network weightings, streamline count (SC) connectivity features give the best prediction performance: 86.91% accuracy for sex, a mean absolute error of 4.245 years for age, and correlations of 0.201 and 0.143 for g-factor and MHQ-factor, respectively. Connections from the putamen, right precuneus and thalamus, and left superior temporal regions are found to be important for predicting the four phenotypes from SC. The advantage of SC in predicting g-factor and MHQ-factor diminishes after sex and age are added as predictors. Moreover, the deep learning model does not outperform linear models, which is consistent with previous results based on data from other brain imaging modalities.
Analysis of the models' coefficients for each prediction task shows moderate-to-high correlations in feature ranking across ML methods, which indicates that the different models, including deep learning, use similar prediction strategies. In the third study, functional and structural connectivity data from UKB are used to classify different MDD phenotypes, with childhood trauma scores as additional phenotype criteria to capture more homogeneous MDD subtypes. Functional connectomes are found to predict MDD better than structural connectomes. Higher accuracies are achieved for severe MDD phenotypes, with an accuracy of 65.74% for Currently Depressed cases with childhood trauma. The robust features identified for predicting MDD with and without childhood trauma differ, as do the features identified as important for predicting different MDD phenotypes. Across all MDD phenotypes, edges from sensorimotor and visual networks are more frequently selected as key predictors. I leverage different MDD phenotypes, a wide range of brain imaging modalities, and diverse ML methods to broadly examine the clinical feasibility and challenges of ML in MDD prediction. In Chapter 2, I demonstrate the importance of controlling for common covariates in clustering analysis to remove undesired confounding effects. In Chapter 3, the results suggest that the confounding effect of age and sex via head size may be larger on SC than on other network weightings, and that the superiority of SC in predictive modelling of various complex phenotypes reported in previous studies could be driven by common covariates. The results also suggest that greater model complexity may not improve prediction of complex phenotypes from structural connectomic data for sample sizes below 8,000.
I also demonstrate that, for categorical data, using polychoric correlation instead of Pearson's correlation to derive the MHQ-factor may be better, as indicated by the higher variance explained. Results from Chapter 4 indicate that functional networks involved in the processing of sensory information are more stable biomarkers for MDD in general. Overall, these studies show that differences in data processing and MDD assessment methods could have contributed to the differences in findings reported by previous studies, and they highlight the importance of checking for confounding effects of common covariates. Further work may employ a multimodal approach and large-scale data to identify stable biomarkers for MDD, and may include different diagnostic criteria for MDD phenotyping to assess the robustness of biomarkers.
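
As an illustration of the covariate control emphasised above, the following is a minimal sketch of one common approach: regressing covariates out of each imaging feature with ordinary least squares before any clustering or prediction step. It is not the thesis's actual pipeline. The function name `residualize`, the synthetic data, and the use of a single binary site indicator are all assumptions made for this example; a study with more than two imaging sites would need one dummy column per additional site.

```python
import numpy as np


def residualize(features, covariates):
    """Remove the linear effect of covariates from each feature column.

    Hypothetical helper for illustration: fits an ordinary least squares
    model (with intercept) per feature and returns the residuals.
    """
    features = np.asarray(features, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    # Design matrix: intercept column followed by the covariates.
    design = np.column_stack([np.ones(len(covariates)), covariates])
    # Least-squares coefficients for all feature columns at once.
    beta, *_ = np.linalg.lstsq(design, features, rcond=None)
    return features - design @ beta


# Synthetic stand-in data (assumed values, not taken from UKB or GS).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(40, 70, n)
sex = rng.integers(0, 2, n)
icv = rng.normal(1500, 100, n)   # intracranial volume, arbitrary units
site = rng.integers(0, 2, n)     # binary indicator for two imaging sites
covariates = np.column_stack([age, sex, icv, site])

# Five fake morphometry features that depend on age and ICV plus noise.
features = (
    rng.normal(size=(n, 5))
    - 0.05 * age[:, None]
    + 0.002 * icv[:, None]
)

residuals = residualize(features, covariates)
```

The residuals are, by construction, linearly uncorrelated with each covariate in this sample, so a subsequent clustering step cannot separate participants simply along age, sex, head size or site. Non-linear covariate effects are not removed by this sketch.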