Authors: Hirokazu Kameoka, Hsin-Te Hwang, Driss Matrouf, Markus Becker, Quan Wang, Sahidullah, Ye Jia, Yu Zhang, Lauri Juvela, Hsin-Min Wang, Wen-Chin Huang, Zhen-Hua Ling, Yuan Jiang, Yi-Chiao Wu, Héctor Delgado, Massimiliano Todisco, Yu Tsao, Li-Juan Liu, Junichi Yamagishi, Jean-François Bonastre, Tomoki Toda, Nicholas Evans, Robert A. J. Clark, Kai Onuma, Yu-Huai Peng, Sébastien Le Maguer, Avashna Govender, Takashi Kaneda, Andreas Nautsch, Kong Aik Lee, Xin Wang, Srikanth Ronanki, Ville Vestman, Koji Mushika, Ingmar Steiner, Tomi Kinnunen, Fergus Henderson, Jing-Xuan Zhang, Kou Tanaka, Paavo Alku

Affiliations: Hitotsubashi University; University of Edinburgh; EURECOM [Sophia Antipolis]; Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Université de Lorraine (UL), Centre National de la Recherche Scientifique (CNRS); University of Eastern Finland; NEC Corporation; Aalto University; Academia Sinica; ADAPT Centre, Sigmedia Lab, EE Engineering, Trinity College Dublin; Google Inc [Mountain View]; Hoya Corporation; iFlytek Research; Nagoya City University [Nagoya, Japan]; Nagoya University; NTT Communication Science Laboratories, NTT Corporation; audEERING GmbH; Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU), Centre d'Enseignement et de Recherche en Informatique (CERI); The Centre for Speech Technology Research [Edinburgh] (CSTR); National Institute of Informatics; University of Science and Technology of China; Southern University of Science and Technology (SUSTech)

Acknowledgments: This work was partially supported by JST CREST Grant No. JPMJCR18A6 (VoicePersonae project), Japan; MEXT KAKENHI Grant Nos. 16H06302, 16K16096, 17H04687, 18H04120, 18H04112, and 18KT0051, Japan; the VoicePersonae and RESPECT projects funded by the French Agence Nationale de la Recherche (ANR); the Academy of Finland (project no. 309629, entitled "NOTCH: NOn-cooperaTive speaker CHaracterization"); and Region Grand Est, France. The authors at the University of Eastern Finland gratefully acknowledge the use of the computational infrastructure at CSC – the IT Center for Science, and the support of NVIDIA Corporation through the donation of a Titan V GPU used in this research. The numerical calculations for some of the spoofed data were carried out on the TSUBAME3.0 supercomputer at the Tokyo Institute of Technology. The ADAPT Centre (13/RC/2106) is funded by Science Foundation Ireland (SFI).