Abstract
Climate models are basic tools for obtaining reliable estimates of future climate change and its effects on water resources and agriculture in a given basin. However, not all climate models perform equally well in all areas, so determining the most appropriate climate model for a specific study area is essential. This study examines the performance of 10 CORDEX-AFRICA-220 Regional Climate Models (RCMs), three ensemble means grouped by downscaling institution (Reg ensemble, CCLM ensemble, and REMOO ensemble), and the multi-model ensemble mean. The models were evaluated on their ability to replicate seasonal and annual rainfall, minimum and maximum temperature, and inter-annual variability for the period 1986–2005, using BIAS, Root Mean Square Error (RMSE), the Pearson correlation coefficient (r), the coefficient of variation (CV), the Kling-Gupta Efficiency (KGE), and the Taylor diagram. The findings indicated that HadREMOO, MPI-Reg4-7, HadReg4-7, the Reg ensemble, and the multi-model ensemble mean performed relatively better in representing the observed mean annual rainfall at the Adiramets, Debarik, Ketema Niguse, Maytsebri, and Zarima stations, respectively. In contrast, NorESM-CCLM, MPI-CCLM, NorESM-Reg4-7, and NorESM-REMOO performed weakly in reproducing the observed mean annual rainfall at the Adiramets, Debarik and Ketema Niguse, Maytsebri, and Zarima stations, respectively. Similarly, the RCMs generally capture the mean annual maximum temperature at the climatic stations of the Zarima subbasin well. Specifically, the MPI-Reg4-7 simulation represents the observed mean annual maximum temperature well at the Adiramets and Maytsebri stations, the HadReg4-7 simulation performs best at the Debarik and Ketema Niguse stations, and the CCLM ensemble gives the best representation at the Zarima station.
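The abstract names the evaluation metrics without giving their formulas. The sketch below shows one common way to compute them for a paired observed and simulated series (for example, annual rainfall at one station over 1986–2005). It assumes that BIAS is the mean of simulated minus observed values (positive means overestimation, in the data's own units), that CV is the population standard deviation divided by the mean, and that KGE uses its original formulation built from r, the variability ratio, and the bias ratio. The function names are hypothetical and this is not the authors' code.

```python
import math
from statistics import fmean, pstdev


def bias(obs, sim):
    """Assumed definition: mean of (sim - obs); positive values mean overestimation."""
    return fmean(s - o for o, s in zip(obs, sim))


def rmse(obs, sim):
    """Root mean square error between paired simulated and observed values."""
    return math.sqrt(fmean((s - o) ** 2 for o, s in zip(obs, sim)))


def pearson_r(obs, sim):
    """Pearson correlation using population (divide-by-n) covariance and std."""
    mo, ms = fmean(obs), fmean(sim)
    cov = fmean((o - mo) * (s - ms) for o, s in zip(obs, sim))
    return cov / (pstdev(obs) * pstdev(sim))


def cv(series):
    """Coefficient of variation: population std divided by the mean."""
    return pstdev(series) / fmean(series)


def kge(obs, sim):
    """Kling-Gupta Efficiency, original form: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    r = pearson_r(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)  # variability ratio
    beta = fmean(sim) / fmean(obs)     # bias (mean) ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

For instance, a hypothetical simulation that is uniformly 0.5 mm wetter than the observations gives BIAS = RMSE = 0.5 and r = 1; its KGE falls below 1 only through the bias ratio, because the variability ratio stays at 1.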
The majority of the model simulations represent the mean annual minimum temperature well at the Adiramets, Debarik, and Zarima stations. Specifically, the CanESM-RCM, HadReg4-7, REMOO ensemble, multi-model ensemble, and Reg ensemble simulations perform better at the Adiramets, Debarik, Ketema Niguse, Maytsebri, and Zarima stations, respectively. Nonetheless, the differences in performance among stations suggest that the models may have biases or shortcomings in capturing temperature in the subbasin. In contrast, NorESM-CCLM at the Adiramets, Ketema Niguse, and Zarima stations, NorESM-REMOO at Debarik station, and HadReg4-7 at Maytsebri station perform poorly in representing the observed mean minimum temperature. The majority of the RCMs, all institution-based ensemble means, and the multi-model ensemble mean overestimate the observed mean annual rainfall of the Zarima subbasin, with biases ranging from a minimum of 0.02 mm (HadReg4-7 at Ketema Niguse) to a maximum of 2.81 mm (MPI-CCLM at Maytsebri). Similarly, for kiremit (the main June–September rainy season) mean rainfall, the HadReg4-7 simulation at Ketema Niguse showed the minimum bias (0.02 mm) and the MPI-CCLM simulation at Maytsebri showed the maximum bias (2.99 mm). The mean annual and kiremit season maximum and minimum temperatures of the Zarima subbasin were overestimated by the majority of the simulations and ensemble means. The correlation (r) between observed and simulated mean annual and kiremit season rainfall was strong (0.60–0.79) or very strong (0.80–0.99) for the majority of the simulations. The exceptions, which showed moderate correlation, were the MPI-REMOO and NorESM-Reg4-7 simulations of mean annual and kiremit season rainfall at Ketema Niguse station, and the NorESM-CCLM, NorESM-REMOO, MPI-Reg4-7, and MPI-REMOO simulations of kiremit season rainfall at Debarik station. The performance of the RCMs, the institution-based ensemble means, and the multi-model ensemble mean differed across the statistical metrics (BIAS, RMSE, r, CV, and KGE) and the Taylor diagram.
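The correlation bands quoted above can be applied mechanically to a table of station and simulation correlations, as in the sketch below. Only the "strong" (0.60–0.79) and "very strong" (0.80–0.99) bands come from the text; the "moderate" (0.40–0.59), "weak", and "very weak" bounds are assumptions taken from a commonly used verbal scale, and `correlation_strength` and `below_strong` are hypothetical helpers.

```python
# Verbal strength bands, checked from the highest lower bound down.
# Only "very strong" and "strong" are stated in the study; the rest are ASSUMED.
STRENGTH_BANDS = [
    (0.80, "very strong"),
    (0.60, "strong"),
    (0.40, "moderate"),   # assumed lower bound
    (0.20, "weak"),       # assumed lower bound
    (0.00, "very weak"),  # assumed lower bound
]


def correlation_strength(r: float) -> str:
    """Return the verbal band for the magnitude of a correlation coefficient."""
    magnitude = abs(r)
    for lower, label in STRENGTH_BANDS:
        if magnitude >= lower:
            return label
    # Only reached for NaN, since abs() of any real number is >= 0.
    raise ValueError(f"cannot classify r={r!r}")


def below_strong(correlations: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Return sorted (station, simulation) keys whose |r| is below the 'strong' band."""
    return sorted(key for key, r in correlations.items() if abs(r) < 0.60)
```

For example, `correlation_strength(0.72)` returns `"strong"`, and `below_strong` would pick out exactly the kind of station and simulation exceptions listed above.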
Among the simulations and ensemble means, the multi-model ensemble mean was superior in two or more of the statistical metrics at each station of the Zarima subbasin, except for kiremit season rainfall at Maytsebri station, where the CCLM ensemble performed better. Consistent with this, the Taylor diagram showed that the multi-model ensemble mean better replicated the areal annual and kiremit season rainfall and the maximum and minimum temperatures of the subbasin. These findings show that selecting the best-performing RCMs and ensemble means is necessary for climate projection and climate change impact assessment studies.
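The multi-model ensemble mean and the coordinates plotted on a Taylor diagram can be sketched as follows. This assumes the ensemble mean is a simple unweighted average of all simulations at each time step and that the diagram uses the standard deviation and centered RMS difference normalized by the observed standard deviation; the abstract states neither convention, and the helper names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def ensemble_mean(simulations):
    """ASSUMED unweighted ensemble: average all equal-length series at each time step."""
    return [fmean(values) for values in zip(*simulations)]


def taylor_stats(obs, sim):
    """Return (normalized std, correlation r, normalized centered RMS difference)."""
    mo, ms = fmean(obs), fmean(sim)
    so, ss = pstdev(obs), pstdev(sim)
    r = fmean((o - mo) * (s - ms) for o, s in zip(obs, sim)) / (so * ss)
    # Centered RMS difference: RMS of the difference between the two anomaly series,
    # so a constant offset (mean bias) does not affect it.
    crmsd = math.sqrt(fmean(((s - ms) - (o - mo)) ** 2 for o, s in zip(obs, sim)))
    return ss / so, r, crmsd / so
```

With population statistics the three outputs satisfy E'^2 = 1 + s^2 - 2sr, where s is the normalized standard deviation and E' the normalized centered RMS difference; this identity is what lets one point on the diagram show all three quantities at once, and it is why a model closer to the reference point scores better on all of them together.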