Effective data-driven fault diagnosis of rotating machines has recently become an active research topic in the diagnosis and health management of machinery systems, owing to benefits such as guaranteed safety, reduced labor, and improved reliability. However, in many real-world applications, a classifier trained on one dataset must be applied to datasets collected under different working conditions. The distribution deviation between such datasets is easily induced by rotating-speed oscillation and load variation, and it severely degrades the performance of machine learning-based fault diagnosis methods. Hence, a novel algorithm for measuring dataset distribution discrepancy, called high-order Kullback–Leibler (HKL) divergence, is proposed. Based on HKL divergence and transfer learning, a new fault diagnosis network that is robust to working-condition variation is constructed in this paper. In feature extraction, sparse filtering with HKL divergence is proposed to learn shared and discriminative features of the source and target domains. In feature classification, HKL divergence is introduced into softmax regression to link domain adaptation with health conditions. The effectiveness of the method is verified by experiments on a rolling bearing dataset and a gearbox dataset, which together comprise 18 transfer learning cases. Furthermore, the asymmetric transfer performance observed in the experiments is also analyzed.
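To make the notion of dataset distribution discrepancy concrete, the sketch below estimates the standard Kullback–Leibler divergence between empirical feature histograms drawn from a source and a target domain. This is only an illustrative toy example of the base quantity that HKL divergence extends; the HKL formulation itself, and all variable names here, are not taken from the paper.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Standard KL divergence between two discrete distributions.

    A small epsilon avoids log(0) / division by zero for empty bins.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Toy stand-ins for learned features under two working conditions:
# a shift/scale change mimics the effect of speed or load variation.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, 5000)   # hypothetical source-domain features
target = rng.normal(0.5, 1.2, 5000)   # hypothetical target-domain features

bins = np.linspace(-5.0, 5.0, 41)
p, _ = np.histogram(source, bins=bins)
q, _ = np.histogram(target, bins=bins)

d = kl_divergence(p, q)
print(d)  # larger values indicate greater domain discrepancy
```

In the paper's framework, a divergence term of this kind is minimized jointly with the feature-learning and classification objectives, so that the learned representation stays discriminative while the source and target distributions are pulled together.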