Early prediction of Alzheimer’s disease (AD) is crucial for delaying its progression. Because AD is a chronic disease, ignoring the temporal dimension of AD data degrades progression-detection performance and is medically unacceptable. In addition, AD patients are described by heterogeneous yet complementary modalities, and multitask modeling improves the performance, robustness, and stability of progression detection. However, multimodal multitask modeling has not been evaluated with time series data under a deep learning paradigm, especially for AD progression detection. In this paper, we propose a robust ensemble deep learning model based on a stacked convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network. This multimodal multitask model jointly predicts multiple variables from the fusion of five types of multimodal time series data plus a set of background (BG) knowledge. The predicted variables comprise a multiclass AD progression task and four regression tasks for critical cognitive scores. The proposed model extracts local and longitudinal features from each modality using a stacked CNN and BiLSTM network. Concurrently, local features are extracted from the BG data using a feed-forward neural network. The resulting features are fused in a deep network to detect common patterns, which are then used jointly to predict the classification and regression tasks. To validate our model, we performed six experiments on five modalities from 1536 subjects in the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The proposed approach achieves state-of-the-art performance on both the multiclass progression task and the regression tasks. Moreover, our approach can be generalized to other medical domains to analyze heterogeneous temporal data for predicting a patient’s future status.
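To make the idea of jointly predicting one classification task and four regression tasks concrete, the following is a minimal sketch of a combined multitask objective in plain Python. It is an illustration under stated assumptions, not the loss used in this work: the function name `joint_loss`, the choice of cross-entropy plus mean squared error, the `regression_weight` parameter, and the three-class example are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Classification loss for one subject on the progression task."""
    return -math.log(softmax(logits)[true_class])


def mean_squared_error(predicted, actual):
    """Regression loss averaged over the cognitive scores."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def joint_loss(class_logits, true_class, predicted_scores, true_scores,
               regression_weight=1.0):
    """Hypothetical joint objective: one classification term plus a weighted
    regression term over the cognitive scores (an assumption, not the
    objective reported in the paper)."""
    classification_term = cross_entropy(class_logits, true_class)
    regression_term = mean_squared_error(predicted_scores, true_scores)
    return classification_term + regression_weight * regression_term


# Illustrative values only: three progression classes, four cognitive scores.
example = joint_loss(
    class_logits=[0.0, 0.0, 0.0],
    true_class=1,
    predicted_scores=[1.0, 2.0, 3.0, 4.0],
    true_scores=[1.0, 2.0, 3.0, 6.0],
    regression_weight=0.5,
)
```

Because both terms are computed from the same fused representation, minimizing a sum of this kind pushes the shared layers toward patterns that serve all five tasks at once; the relative weight of the regression term is a tuning choice.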