While Generative Adversarial Networks (GANs) are popular for producing higher-quality samples than other generative models such as Variational Autoencoders (VAEs) and Boltzmann machines, evaluating their overall performance remains difficult. Various aspects must be considered, such as the quality of generated samples, the diversity of generated classes (both within a class and across classes), avoidance of overfitting, the use of disentangled latent spaces, and the agreement of the evaluation metric with human perception. In this paper, we propose a new score, the GM Score, which takes into account factors such as sample quality, intra-class and inter-class diversity, and overfitting, and employs metrics such as precision, recall, and F1 score to assess the discriminability of the latent spaces of a Deep Belief Network (DBN) and a Restricted Boltzmann Machine (RBM). The evaluation is performed for different GANs (GAN, DCGAN, BiGAN, CGAN, CoupledGAN, LSGAN, SGAN, WGAN, and WGAN Improved) trained on the benchmark MNIST dataset.
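To make the precision/recall/F1 component concrete, the following is a minimal sketch of one plausible way such discriminability scores could be computed: a simple linear probe is trained to separate real from generated samples in a shared latent feature space, and standard classification metrics are reported. This is an illustrative assumption, not the paper's actual GM Score computation; the function name `discriminability_scores` and the use of logistic regression and random placeholder features (stand-ins for DBN/RBM latents) are hypothetical.

```python
# Hypothetical sketch (not the authors' implementation): precision/recall/F1
# of a linear probe distinguishing real (label 1) from generated (label 0)
# samples given their latent feature vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

def discriminability_scores(real_feats, fake_feats, seed=0):
    """Train a logistic-regression probe on latent features and return
    (precision, recall, F1) on a held-out split."""
    X = np.vstack([real_feats, fake_feats])
    y = np.concatenate([np.ones(len(real_feats)), np.zeros(len(fake_feats))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    y_hat = clf.predict(X_te)
    return (precision_score(y_te, y_hat),
            recall_score(y_te, y_hat),
            f1_score(y_te, y_hat))

if __name__ == "__main__":
    # Random placeholder features; in practice these would be DBN/RBM
    # latent codes of real MNIST images and GAN-generated images.
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(500, 64))
    fake = rng.normal(0.5, 1.0, size=(500, 64))
    p, r, f1 = discriminability_scores(real, fake)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Intuitively, the harder the probe finds it to separate the two populations (scores near chance), the closer the generated latent distribution is to the real one; scores near 1.0 indicate easily distinguishable, lower-fidelity samples.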