Purpose: Radiation doses accumulated during complex image-guided x-ray procedures can cause not only stochastic but also deterministic effects, such as skin rashes or even hair loss. To monitor and reduce radiation-related risks to patients' skin, x-ray imaging devices are equipped with online air kerma monitoring components. Traditionally, such measurements have been used to estimate skin entrance dose by (a) estimating air kerma at the interventional reference point (IRP), (b) forward projecting the dose distribution, and (c) applying a backscatter factor among other correction factors. Unfortunately, the complicated interaction between incident x-ray photons, secondary electrons, and skin tissue cannot be properly accounted for by assuming a linear relationship between forward-projected air kerma and a backscatter factor. Gold standard skin dose models are therefore determined using Monte Carlo (MC) techniques. However, MC simulations are computationally expensive in general, and the achievable acceleration depends mainly on the employed hardware and variance reduction techniques. To obtain reliable and fast dose estimates, we propose to combine MC-based simulations with learning-based methods.

Methods: The basic idea of our method is to approximate the radiation physics in order to calculate a first-order exposure estimate quickly. This initial estimate is then refined using prior knowledge derived from MC simulations. To this end, the primary photon propagation inside a voxelized patient model is estimated using a less accurate but fast photon ray casting (RC) simulation based on the Beer-Lambert law. The results of the RC simulation are then fed into a convolutional neural network (CNN), which maps the propagation of primary photons to the dose deposition inside the patient model. Additionally, the patient model itself, including anatomy and material properties such as mass density and mass energy-absorption coefficients, is fed into the CNN as well.
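To illustrate the Beer-Lambert attenuation underlying the RC step, the following minimal sketch computes the primary photon fluence along a single ray through a row of voxels. It is not the implementation used in this work: the function name `primary_fluence_along_ray`, the monoenergetic beam, the constant step length, and the example attenuation values are all assumptions made for illustration.

```python
import math


def primary_fluence_along_ray(mu_per_cm, step_cm, fluence_in=1.0):
    """Primary (unscattered) fluence arriving at each voxel along one ray.

    mu_per_cm  -- linear attenuation coefficients (1/cm) of the voxels
                  crossed by the ray, in traversal order (assumed values)
    step_cm    -- path length of the ray inside each voxel (assumed constant)
    fluence_in -- fluence at the entrance of the first voxel
    """
    fluence = []
    optical_depth = 0.0  # running sum of mu * path length
    for mu in mu_per_cm:
        # Beer-Lambert law: I = I0 * exp(-sum(mu_i * d_i)) over voxels already crossed
        fluence.append(fluence_in * math.exp(-optical_depth))
        optical_depth += mu * step_cm
    return fluence


# Hypothetical ray: one air-like voxel followed by two tissue-like voxels.
ray_fluence = primary_fluence_along_ray([0.0, 0.2, 0.2], step_cm=0.5)
```

Scattered photons and secondary electrons are ignored in such a calculation, which is why the RC result serves only as a first-order input that the CNN refines.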
The CNN is trained using smoothed results of MC simulations as output and RC simulations of identical imaging settings and patient models as input.

Results: In total, 163 MC and associated RC simulations are carried out for the head, thorax, abdomen, and pelvis in three different voxel phantoms. The simulations use either 10^8 or 10^9 primary photons sampled from a 125 kV peak voltage spectrum. Edge-preserving smoothing (EPS) is applied to reduce (a) general stochastic uncertainties and (b) the additional stochastic uncertainty of MC simulations with fewer primary photons. The CNN is trained using seven imaging settings of the abdomen in a single phantom. When tested on the remaining datasets, the CNN estimates skin dose with an error below 10% for the majority of test cases.

Conclusion: The combination of deep neural networks and MC simulation of particle physics has the potential to decrease the computational complexity of accurate skin dose estimation. The proposed approach can provide dose distributions in under one second when running on high-end hardware. On lower-cost hardware, it took up to 2 min to arrive at the same result. This makes our approach applicable in high-end environments as well as in budget solutions. Furthermore, the number of primary photons only affects the training time, while the execution time is independent of the number of primary photons.