Physics-informed neural networks (PINNs), introduced in [M. Raissi, P. Perdikaris, and G. Karniadakis, J. Comput. Phys., 378 (2019), pp. 686-707], are effective in solving integer-order partial differential equations (PDEs) based on scattered and noisy data. PINNs employ standard feedforward neural networks (NNs) with the PDEs explicitly encoded into the NN using automatic differentiation, while the sum of the mean-squared PDE residuals and the mean-squared error in initial-boundary conditions is minimized with respect to the NN parameters. Here we extend PINNs to fractional PINNs (fPINNs) to solve space-time fractional advection-diffusion equations (fractional ADEs), and we study systematically their convergence, hence explaining both fPINNs and PINNs for the first time. Specifically, we demonstrate their accuracy and effectiveness in solving multidimensional forward and inverse problems with forcing terms whose values are only known at randomly scattered spatio-temporal coordinates (black-box (BB) forcing terms). A novel element of the fPINNs is the hybrid approach that we introduce for constructing the residual in the loss function, using automatic differentiation for the integer-order operators and numerical discretization for the fractional operators. This approach bypasses the difficulties stemming from the fact that automatic differentiation is not applicable to fractional operators, because the standard chain rule of integer-order calculus is not valid in fractional calculus. To discretize the fractional operators, we employ the Grünwald-Letnikov (GL) formula in one-dimensional fractional ADEs and the vector GL formula in conjunction with the directional fractional Laplacian in two- and three-dimensional fractional ADEs. We first consider the one-dimensional fractional Poisson equation and compare the convergence of the fPINNs against the finite difference method (FDM).
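To make the GL discretization concrete, the following is a minimal sketch of the one-dimensional Grünwald-Letnikov formula referenced above. It is not the authors' code: the function names, the uniform grid, and the test function u(x) = x are illustrative assumptions, and only the first-order (unshifted) GL weights are implemented.

```python
import math
import numpy as np

def gl_weights(alpha, n):
    """Grünwald-Letnikov weights g_k = (-1)^k * binom(alpha, k),
    computed with the stable recurrence g_0 = 1, g_k = g_{k-1} * (1 - (alpha+1)/k)."""
    g = np.empty(n + 1)
    g[0] = 1.0
    for k in range(1, n + 1):
        g[k] = g[k - 1] * (1.0 - (alpha + 1.0) / k)
    return g

def gl_derivative(u, alpha, h):
    """GL approximation of the order-alpha derivative on a uniform grid
    u = [u(0), u(h), ..., u(Nh)]:
        D^alpha u(x_j) ~= h^(-alpha) * sum_{k=0}^{j} g_k * u(x_{j-k})."""
    g = gl_weights(alpha, len(u) - 1)
    return np.array([np.dot(g[: j + 1], u[j::-1])
                     for j in range(len(u))]) / h**alpha

# Illustrative check: for u(x) = x the exact (Riemann-Liouville) derivative
# of order alpha on [0, x] is x^(1 - alpha) / Gamma(2 - alpha).
h = 1e-3
x = np.arange(0.0, 1.0 + h / 2, h)
approx = gl_derivative(x, 0.5, h)[-1]   # GL value at x = 1
exact = 1.0 / math.gamma(1.5)
```

Note that for alpha = 1 the recurrence yields weights (1, -1, 0, 0, ...), so the same formula collapses to the ordinary backward difference, which is why a single GL expression can interpolate between integer and fractional orders inside the residual.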
We present the solution convergence using both the mean L2 error and the standard deviation due to sensitivity to NN parameter initializations. Using different GL formulas, we observe first-, second-, and third-order convergence rates for small training sets, but the error saturates for larger training sets. We explain these results by analyzing the four sources of numerical error: discretization, sampling, NN approximation, and optimization errors. The total error decays monotonically (below 10^{-5} for a third-order GL formula), but it saturates beyond that point due to the optimization error. We also analyze the relative balance between discretization and sampling errors and observe that the sampling size and the number of discretization points (auxiliary points) should be comparable to achieve the highest accuracy. As we increase the depth of the NN up to a certain value, the mean error decreases and the standard deviation increases, whereas the width has essentially no effect unless its value is either too small or too large. We next consider time-dependent fractional ADEs and compare white-box (WB) and BB forcing. We observe that for WB forcing our results are similar to the aforementioned cases; however, for BB forcing fPINNs outperform FDM. Subsequently, we consider multidimensional time-, space-, and space-time-fractional ADEs using the directional fractional Laplacian, and we observe relative errors of 10^{-3} to 10^{-4}. Finally, we solve several inverse problems in one, two, and three dimensions to identify the fractional orders, diffusion coefficients, and transport velocities, and we obtain accurate results given proper initializations, even in the presence of significant noise.