The enclosed notebooks comprise the analysis for:

A large National Institute for Health Research (NIHR) Biomedical Research Centre Facilitates Cross-Disciplinary and Collaborative Research Outputs: A bibliometric analysis

Vasiliki Kiparoglou, Laurence A. Brown, Helen McShane, Keith M. Channon, Syed Ghulam Sarwar Shah

The final analysis for the metrics was run on 27th January 2021, when the data were obtained from the respective APIs.

Python (Jupyter) notebooks are available describing the entire analysis, from the original curated list of publications through to the lists of DOIs used to generate the author networks. The majority of the analysis can be run from these notebooks, except for a final manual check of the available titles and identifiers.

The analyses make use of a number of packages from the PyData ecosystem, including Jupyter, IPython, Pandas, NumPy, SciPy, the HoloViz libraries (Bokeh, hvPlot, HoloViews, Panel), NetworkX, Requests, FuzzyWuzzy and Habanero (for the Crossref API).

The research environment can be created using `conda env create -f BRC_biblio.yml`

Section A: Matching Digital Object Identifiers (DOIs)

The publications included in the current paper were defined as those reported to the NIHR as the output of the Oxford Biomedical Research Centre (BRC) between 1st April 2012 and 31st March 2017. This was the second period of funding for this research centre, hence ‘OxBRC2’. Individual papers were identified by staff involved in research facilitation within the Oxford BRC and by the Bodleian Health Care Libraries at the University of Oxford. Inclusion criteria for publications are supplied by NIHR and stipulate, amongst other things, that work [...]

This first section deals with the aggregated data returned across the BRC, where final details of publications were not yet available, or where publication lag meant that individual items were returned in more than one year.
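Items returned in more than one year can be spotted by comparing cleaned titles. The following is a minimal sketch, not the notebooks' own code: it uses the standard-library `difflib.SequenceMatcher` as a stand-in for the FuzzyWuzzy scoring used in the analysis, and the record layout, function names, threshold and example titles are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalise_title(title: str) -> str:
    """Lower-case a title, turn punctuation into spaces and collapse whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score between two titles after cleaning.

    difflib is used here only as a stdlib stand-in for a fuzzy-match score.
    """
    ratio = SequenceMatcher(None, normalise_title(a), normalise_title(b)).ratio()
    return round(100 * ratio)


def flag_repeats(records, threshold=90):
    """Group records whose cleaned titles score at or above the threshold.

    records: list of (reporting_year, title) tuples (hypothetical layout).
    Returns a list of groups, each a list of indices into records.
    """
    groups = []
    for idx, (_, title) in enumerate(records):
        for group in groups:
            # Compare against the first record of each existing group.
            if similarity(records[group[0]][1], title) >= threshold:
                group.append(idx)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([idx])
    return groups


# Invented example records: the first two are the same item reported twice.
records = [
    ("2013/14", "A Study of Widgets: results from a cohort."),
    ("2014/15", "A study of widgets - results from a cohort"),
    ("2014/15", "An unrelated paper on something else"),
]
groups = flag_repeats(records)
print(groups)
```

Groups containing more than one index are candidates for merging into a single reference. The threshold of 90 is arbitrary, and weaker matches would still need the kind of manual check described in this README.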
The notebooks use available online resources to find the closest match to each title and, where possible, the Digital Object Identifier (DOI) recorded for it. Initially the records were cleaned to aid matching a single DOI to each reference. The DOI is a unique identifier that makes it possible to obtain further information, such as citation data.

A first attempt was made to find a matching title and DOI for every entry in the collated list of references from OxBRC2. The title field was used to query the Crossref API ( https://api.Crossref.org) and obtain the closest matching title on record and its DOI. After cleaning the text, the data returned from Crossref were compared to the original file for exact matches in title, DOI, or both. Titles and DOIs obtained from Crossref were then compared to the original record, and the fuzzy-match score for each reference was visualised.

For references with no match, or only a weak match, after this process, the Crossref text query tool for matching references (https://apps.Crossref.org/simpleTextQuery) was used. Data from both searches were used alongside manual searches of the bibliographic databases PubMed ( https://www.ncbi.nlm.nih.gov/pubmed/ ) and Europe PMC ( https://europepmc.org/ ) and, finally, further internet searches where required. This process produced a single DOI for each of the publications in the original list (where one existed).

Section B: Gathering Metrics

The curated list of DOIs for publications generated during OxBRC2, created in Section A, was used to query both the Dimensions.ai metrics API (using the Requests Python package) and the Crossref API (using the Habanero package).

To confirm the results found with the OxBRC2 data, a check on the assumption that the Field Citation Ratio (FCR) across all available references averages 1 was carried out using randomly selected DOIs from Crossref and the Dimensions metrics API.

Section C: Aggregation and Plotting

Calculation of the overall (OxBRC2-wide) metrics and plots, without separating the data by research group or type of group.

Section D: Construction of Author-association Networks

Using a list of all the valid DOIs, we query the Crossref API to acquire the list of authors and the publication date for each reference. This is followed by a look at the journals and publishers used in OxBRC2.

The lists of authors in each reference can then be used to generate a long list of pairwise connections between authors, each with date information and a weighting for the connection.

1. Split the author list for each reference into 1-to-1 author connections (edges)
2. Clean up the names of each of the authors (defaulting to dotted initials for each preceding name, followed by the capitalised last name)
3. Extract primary affiliations for authors if present
4. Add additional details about research group and type of group (Themes/Working Groups/Other) to each line/edge
5. Save list of edges
6. Find unique list of authors and link DOIs and group information to each
7. Look for Oxford in affiliations
8. Export author list

Nodes and edges are then reindexed for matching.

The data can be built into networks using the Python NetworkX library. These can be visualised, or exported to Gephi, as can sub-networks focused either on part of the OxBRC2 funding period or on certain research groups or group types.

Section E(xtras): Other notebooks that might be of interest

Extra_C3: an example of a Panel app to explore the data
Extra_D8: Splitting the list of DOIs into sublists that can be used in VOSviewer
Extra_D9: Splitting the author-association network into subnetworks based on research group type

---

The optional use of VOSviewer and Gephi

Author-association networks can also be created in VOSviewer (version 1.6.11, https://www.vosviewer.com/) for comparison.
Where individual authors were associated with more than one research group, all associations were recorded, with the most prevalent used as the primary group (or type of group) for the author (notebook Extra_D8). For VOSviewer, each list of DOIs was imported via the Crossref DOI resource. Networks were then created with fractional counting of co-authorship, with no exclusion of papers with large numbers of authors. Additionally, a thesaurus file was constructed to aid the aggregation of records where authors have multiple names or initials that are recorded inconsistently. No restriction was made on the minimum number of publications for inclusion.

The resulting network files (.gml or .gexf) were exported for visualisation and analysis in Gephi (version 0.9.2, https://gephi.org/). Each of the three networks was analysed within Gephi to obtain measures of complexity (nodes and edges) and connectivity (average path length), and to filter the networks for the final figures. Results were the same as with NetworkX.

Funding statement: This work was funded/supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (Research Grant Number IS-BRC-1215-20008). The views expressed are those of the author(s) and not necessarily those of the National Health Service, the NIHR, or the Department of Health.