Abstract
An artificial neural network with multiple hidden layers (known as a deep neural network, or DNN) was employed as a predictive model (DNNp) for the first time to predict emotional responses using whole-brain functional magnetic resonance imaging (fMRI) data from individual subjects. During fMRI data acquisition, 10 healthy participants listened to 80 International Affective Digital Sound stimuli and rated the emotions evoked by each sound stimulus along the arousal, dominance, and valence dimensions. The whole-brain spatial patterns from a general linear model (i.e., beta-valued maps) for each sound stimulus and the emotional response ratings were used as the input and output of the DNNp, respectively. Based on a nested five-fold cross-validation scheme, the paired input and output data were divided into training (three folds), validation (one fold), and test (one fold) data. The DNNp was trained and optimized using the training and validation data and was evaluated using the test data. With weight sparsity optimization, an input denoising level of 0.3, and a mini-batch size of 1, the Pearson's correlation coefficients between the rated and predicted emotional responses from our DNNp model were 0.52 ± 0.02 for arousal, 0.51 ± 0.03 for dominance, and 0.51 ± 0.03 for valence (mean ± standard error). These coefficients were significantly greater than those of DNN models with conventional regularization schemes, including elastic net regularization (0.15 ± 0.05, 0.15 ± 0.06, and 0.21 ± 0.04 for arousal, dominance, and valence, respectively), those of shallow models including logistic regression (0.11 ± 0.04, 0.10 ± 0.05, and 0.17 ± 0.04 for arousal, dominance, and valence, respectively; average of logistic regression and sparse logistic regression), and those of support vector machine-based predictive models (SVMp; 0.12 ± 0.06, 0.06 ± 0.06, and 0.10 ± 0.06 for arousal, dominance, and valence, respectively; average of linear and non-linear SVMp models).
This difference was confirmed to be significant, with a Bonferroni-corrected p-value of less than 0.001 from a one-way analysis of variance (ANOVA) and a subsequent paired t-test. The weights of the trained DNNp models were interpreted, and the input patterns that maximized or minimized the output of the DNNp models (i.e., the emotional responses) were estimated. Based on a binary classification of each emotion category (e.g., high arousal vs. low arousal), the error rates for the DNNp (31.2% ± 1.3% for arousal, 29.0% ± 1.7% for dominance, and 28.6% ± 3.0% for valence) were significantly lower than those for the linear SVMp (44.7% ± 2.0%, 50.7% ± 1.7%, and 47.4% ± 1.9% for arousal, dominance, and valence, respectively) and the non-linear SVMp (48.8% ± 2.3%, 52.2% ± 1.9%, and 46.4% ± 1.3% for arousal, dominance, and valence, respectively), as confirmed by a Bonferroni-corrected p < 0.001 from the one-way ANOVA. Our study demonstrates that the DNNp model is able to reveal neuronal circuitry associated with human emotional processing, including structures in the limbic and paralimbic areas such as the amygdala, prefrontal areas, anterior cingulate cortex, insula, and caudate. Our DNNp model was also able to use activation patterns in these structures to predict and classify emotional responses to stimuli.

Highlights
• A deep neural network (DNN) was trained to predict human emotion measured from fMRI.
• The prediction performance of the DNN was superior to that of the support vector machine.
• Weight representation and input pattern estimation were introduced to interpret the trained DNN.
• Brain regions related to emotional processing were identified from the DNN interpretation.
• Representations of neuronal activations were readily separable at the higher hidden layer of the DNN.
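
The data partitioning and the evaluation metric described in the abstract can be illustrated with a minimal sketch. It divides the 80 stimuli into five folds (three for training, one for validation, one for testing) and computes a Pearson's correlation coefficient between rated and predicted responses. The function names `three_way_splits` and `pearson_r`, the random seed, and the rule that the validation fold is the one following the test fold are assumptions made for illustration; the abstract does not specify how folds were assigned, and this is not the authors' implementation.

```python
import numpy as np


def three_way_splits(n_samples, n_folds=5, seed=0):
    """Yield (train, validation, test) index arrays for each test fold.

    Assumption: the validation fold is simply the fold after the test
    fold; the remaining folds form the training set.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for test_id in range(n_folds):
        val_id = (test_id + 1) % n_folds
        train_ids = [k for k in range(n_folds) if k not in (test_id, val_id)]
        train = np.concatenate([folds[k] for k in train_ids])
        yield train, folds[val_id], folds[test_id]


def pearson_r(rated, predicted):
    """Pearson's correlation coefficient between two 1-D sequences."""
    rated = np.asarray(rated, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rated_c = rated - rated.mean()
    predicted_c = predicted - predicted.mean()
    denom = np.sqrt((rated_c @ rated_c) * (predicted_c @ predicted_c))
    return float((rated_c @ predicted_c) / denom)


# 80 stimuli per participant, as stated in the abstract.
splits = list(three_way_splits(80))
for train, val, test in splits:
    print(len(train), len(val), len(test))
```

In such a scheme, a model would be fitted on the training indices, its hyperparameters chosen on the validation indices, and `pearson_r` applied only to predictions for the held-out test indices.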