En Tao Wang, Aregu Amsalu Aserse, Raúl Rivas González, José David Flores Félix, Anastasia P. Tampakaki, Marta Maluk, Encarna Velázquez, Arvind Gulati, Stéphane Boivin, M. Harun-or Rashid, Marc Lepetit, Euan K. James, Chang Fu Tian, Sara Moeskjær, Beatriz Jorrin, Sameh H. Youseif, Alexey M. Afonin, Michael F. Hynes, Evgeny E. Andronov, Martha-Helena Ramírez-Bahena, Gregory Kenicer, Alvaro Peix, J. Peter W. Young, Praveen Rahi, Benjamin J. Perry, Maria Izabel A. Cavassim

Ecosystems and Environment Research Programme; European Commission; University of York [York, UK]; Aarhus University [Aarhus]; All-Russia Research Institute for Agricultural Microbiology [Saint Petersburg, Russia] (ARRIAM); Ctr Natl Rech Sci, Inst Biochim & Biophys Mol & Cellulaire, Unite Mixte Rech 8619; Université Paris-Sud - Paris 11 (UP11); The James Hutton Institute; University of California [Los Angeles] (UCLA); University of California (UC); Bangladesh Institute of Nuclear Agriculture; Helsingin yliopisto = Helsingfors universitet = University of Helsinki; University of Otago [Dunedin, New Zealand]; Inst Politecn Nacl, CICATA, Queretaro 76090, Mexico; Instituto Politecnico Nacional [Mexico] (IPN); Universidad de Salamanca; Agricultural University of Athens; University of Beira Interior [Portugal] (UBI); International Center for Agricultural Research in the Dry Areas [Egypt] (ICARDA); Consultative Group on International Agricultural Research (CGIAR); Institut Sophia Agrobiotech (ISA); Université Nice Sophia Antipolis (1965 - 2019) (UNS); COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Université Côte d'Azur (UCA); Laboratoire des symbioses tropicales et méditerranéennes (UMR LSTM); Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Institut de Recherche pour le Développement (IRD)-Université de Montpellier (UM)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Institut Agro - Montpellier SupAgro; Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro); University of Oxford; Royal Botanic Garden [Edinburgh]; University of Calgary; Institute of Himalayan Bioresource Technology; China Agricultural University (CAU)

ANR-16-CE20-0021, GRaSP, Caractérisation du déterminisme génétique du choix du partenaire symbiotique pour une amélioration de la symbiose fixatrice d'azote chez le pois (2016); European Project: 613551, EC:FP7:KBBE, FP7-KBBE-2013-7-single-stage, LEGATO (2014)

Velázquez, Encarna [0000-0002-5946-7241]; Rivas, Raúl [0000-0003-2202-1470]; Peix, Álvaro [0000-0001-5084-1586]; Ramírez Bahena, M. Helena [0000-0002-0744-8313]

Bacteria currently included in Rhizobium leguminosarum are too diverse to be considered a single species, so we can refer to this as a species complex (the Rlc). We have found 429 publicly available genome sequences that fall within the Rlc and these show that the Rlc is a distinct entity, well separated from other species in the genus. Its sister taxon is R. anhuiense. We constructed a phylogeny based on concatenated sequences of 120 universal (core) genes, and calculated pairwise average nucleotide identity (ANI) between all genomes. From these analyses, we concluded that the Rlc includes 18 distinct genospecies, plus 7 unique strains that are not placed in these genospecies. Each genospecies is separated by a distinct gap in ANI values, usually at approximately 96% ANI, implying that it is a 'natural' unit. Five of the genospecies include the type strains of named species: R. laguerreae, R. sophorae, R. ruizarguesonis, "R. indicum" and R. leguminosarum itself. The 16S ribosomal RNA sequence is remarkably diverse within the Rlc, but does not distinguish the genospecies. Partial sequences of housekeeping genes, which have frequently been used to characterize isolate collections, can mostly be assigned unambiguously to a genospecies, but alleles within a genospecies do not always form a clade, so single genes are not a reliable guide to the true phylogeny of the strains. We conclude that access to a large number of genome sequences is a powerful tool for characterizing the diversity of bacteria, and that taxonomic conclusions should be based on all available genome sequences, not just those of type strains.
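
To make the idea of an ANI gap concrete, the following minimal Python sketch groups genomes whose pairwise ANI is at or above a cutoff, using single-linkage grouping. It is an illustration only and not the procedure used in the study, which combined the core-gene phylogeny with the observed gaps in ANI values. The function name `cluster_by_ani`, the strain labels and the ANI values are all hypothetical, and the fixed 96% cutoff is an assumption taken from the "approximately 96%" figure quoted above.

```python
from itertools import combinations


def cluster_by_ani(names, ani, threshold=96.0):
    """Single-linkage grouping: two genomes end up in the same group if a
    chain of pairs with ANI >= threshold connects them.

    names: list of genome labels
    ani:   dict mapping (label_a, label_b) -> ANI in percent (either order)
    """
    # Union-find structure: every genome starts as its own group.
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        value = ani.get((a, b), ani.get((b, a)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Largest groups first, ties broken alphabetically, for stable output.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical strains and made-up ANI values, for illustration only.
names = ["s1", "s2", "s3", "s4", "s5"]
ani = {
    ("s1", "s2"): 98.1, ("s1", "s3"): 97.4, ("s2", "s3"): 97.9,
    ("s1", "s4"): 93.2, ("s2", "s4"): 93.0, ("s3", "s4"): 92.8,
    ("s1", "s5"): 93.1, ("s2", "s5"): 92.9, ("s3", "s5"): 93.3,
    ("s4", "s5"): 97.2,
}

for group in cluster_by_ani(names, ani):
    print(group)
```

With real data, a fixed cutoff is only a starting point: the abstract describes the gap as "usually" near 96%, so the position of the gap would need to be checked for each genospecies, and any grouping compared against the core-gene phylogeny.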