46 results on '"Pedro J. Ballester"'
Search Results
2. Interpretable Machine Learning Models to Predict the Resistance of Breast Cancer Patients to Doxorubicin from Their microRNA Profiles
- Author
Adeolu Z. Ogunleye, Chayanit Piyawajanusorn, Anthony Gonçalves, Ghita Ghislat, and Pedro J. Ballester. Affiliations: Centre de Recherche en Cancérologie de Marseille (CRCM; Aix Marseille Université (AMU), Institut Paoli-Calmettes, Fédération nationale des Centres de lutte contre le Cancer (FNCLCC), Institut National de la Santé et de la Recherche Médicale (INSERM), Centre National de la Recherche Scientifique (CNRS)); Centre d'Immunologie de Marseille - Luminy (CIML; AMU, INSERM, CNRS). Funding: This work was supported by grant funding from the Indo-French Centre for the Promotion of Advanced Research - CEFIPRA and the Petroleum Technology Development Fund (PTDF), Nigeria.
- Subjects
[SDV]Life Sciences [q-bio] ,General Chemical Engineering ,General Engineering ,General Physics and Astronomy ,Medicine (miscellaneous) ,Breast Neoplasms ,artificial intelligence ,Biochemistry, Genetics and Molecular Biology (miscellaneous) ,tumor profiling ,Machine Learning ,MicroRNAs ,Doxorubicin ,precision oncology ,Humans ,General Materials Science ,Female ,multiomics ,Algorithms
- Abstract
Doxorubicin is a common treatment for breast cancer. However, not all patients respond to this drug, which sometimes causes life-threatening side effects. Accurately anticipating which patients are doxorubicin-resistant would therefore make it possible to spare them this risk while considering alternative treatments without delay. Stratifying patients based on molecular markers in their pretreatment tumors is a promising approach to advance toward this ambitious goal, but single-gene markers such as HER2 expression have not been shown to be sufficiently predictive. The recent availability of matched doxorubicin-response and diverse molecular profiles across breast cancer patients now permits analysis at a much larger scale. Sixteen machine learning algorithms and 8 molecular profiles are systematically evaluated on the same cohort of patients. Only 2 of the 128 resulting models are substantially predictive, showing that such models can easily be missed by a standard-scale analysis. The best model is a classification and regression tree (CART) that nonlinearly combines 4 selected miRNA isoforms to predict doxorubicin response (median Matthews correlation coefficient (MCC) and area under the curve (AUC) of 0.56 and 0.80, respectively). By contrast, HER2 expression is significantly less predictive (median MCC and AUC of 0.14 and 0.57, respectively). As the predictive accuracy of this CART model increases with larger training sets, updating it with future data should result in even better accuracy.
- Published
- 2022
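The abstract above compares models by the Matthews correlation coefficient (MCC) and the area under the ROC curve (AUC). The following is a minimal sketch of both metrics using only the standard library. It is not the authors' evaluation code. The toy labels, scores and the 0.5 threshold are hypothetical. AUC is computed through its rank interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any confusion-matrix marginal is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = resistant, 0 = sensitive.
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    s = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
    y_hat = [1 if v >= 0.5 else 0 for v in s]  # hypothetical threshold
    print(f"MCC = {mcc(y, y_hat):.3f}, AUC = {auc(y, s):.3f}")
```

MCC uses all four cells of the confusion matrix, so it remains informative when resistant and sensitive patients are imbalanced. AUC, by contrast, ignores any particular threshold.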
3. On the Best Way to Cluster NCI-60 Molecules
- Author
Saiveth Hernández-Hernández and Pedro J. Ballester
- Subjects
NCI-60 panel ,small molecules ,clustering ,model validation ,Molecular Biology ,Biochemistry
- Abstract
Machine learning-based models have been widely used in the early drug-design pipeline. To validate these models, cross-validation strategies have been employed, including those that cluster molecules by their chemical structures. However, poor clustering of compounds compromises such validation, especially on test molecules dissimilar to those in the training set. This study aims to find the best way to cluster the molecules screened by the National Cancer Institute (NCI)-60 project by comparing hierarchical, Taylor–Butina, and uniform manifold approximation and projection (UMAP) clustering methods. The best-performing algorithm can then be used to generate clusters for model validation strategies. This study also aims to measure the impact of removing outlier molecules prior to the clustering step. Clustering results are evaluated using three well-known clustering quality metrics. In addition, we compute an average similarity matrix to assess the quality of each cluster. The results show that clustering quality varies from method to method. The clusters obtained by the hierarchical and Taylor–Butina methods are more computationally expensive to use in cross-validation strategies, and both methods cluster the molecules poorly. In contrast, the UMAP method provides the best quality, and we therefore recommend it for analyzing this highly valuable dataset.
- Published
- 2023
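Of the three methods compared above, Taylor–Butina is the simplest to state. Two molecules are neighbours when their similarity is at or above a cutoff. The molecule with the most unassigned neighbours becomes a cluster centroid and claims those neighbours, and the process repeats. The following is a minimal sketch under several assumptions:

- Fingerprints are Python sets of "on" bit indices.
- Similarity is measured as Tanimoto.
- The cutoff is a hypothetical default.
- Neighbours are re-counted among still-unassigned molecules at every step. The classic formulation ranks molecules once by their initial neighbour count.

Real fingerprints would normally come from a cheminformatics toolkit, which is not shown.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def taylor_butina(fps: List[Set[int]], cutoff: float = 0.6) -> List[List[int]]:
    """Cluster molecules; each returned cluster lists its centroid index first."""
    n = len(fps)
    # Neighbour lists: j is a neighbour of i if similarity >= cutoff.
    neighbours: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fps[i], fps[j]) >= cutoff:
                neighbours[i].add(j)
                neighbours[j].add(i)

    unassigned = set(range(n))
    clusters: List[List[int]] = []
    while unassigned:
        # Centroid = unassigned molecule with the most unassigned neighbours;
        # iterating in sorted order makes ties go to the lowest index.
        centroid = max(sorted(unassigned),
                       key=lambda i: len(neighbours[i] & unassigned))
        members = sorted(neighbours[centroid] & unassigned)
        clusters.append([centroid] + members)
        unassigned -= {centroid, *members}
    return clusters
```

A molecule with no neighbours ends up as a singleton cluster. This is one reason outlier removal, as studied in the abstract, can change the resulting partition.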
4. Artificial intelligence for drug response prediction in disease models
- Author
Pedro J. Ballester, Rick Stevens, Benjamin Haibe-Kains, R. Stephanie Huang, and Tero Aittokallio
- Subjects
Artificial Intelligence
- Published
- 2021
5. Selecting machine-learning scoring functions for structure-based virtual screening
- Author
Pedro J. Ballester
- Subjects
0303 health sciences ,Virtual screening ,010405 organic chemistry ,Computer science ,3d model ,Machine learning ,01 natural sciences ,0104 chemical sciences ,Structure-Activity Relationship ,03 medical and health sciences ,Software ,Pharmaceutical Preparations ,Docking (molecular) ,Drug Discovery ,Humans ,Molecular Medicine ,Structure based ,Artificial intelligence ,030304 developmental biology
- Abstract
Interest in docking technologies has grown in parallel with the ever-increasing number and diversity of 3D models for macromolecular therapeutic targets. Structure-Based Virtual Screening (SBVS) aims to leverage these experimental structures to discover the starting points needed for the drug discovery process. It is now established that Machine Learning (ML) can strongly enhance the predictive accuracy of scoring functions for SBVS by exploiting large datasets of targets, molecules and their associations. However, with greater choice, the question of which ML-based scoring function is most suitable for prospective use on a given target has gained importance. Here we analyse two approaches to select an existing scoring function for the target, along with a third approach that consists of generating a scoring function tailored to the target. These analyses required discussing the limitations of popular SBVS benchmarks, the alternatives for benchmarking scoring functions for SBVS, and how to generate or use them with freely available software.
- Published
- 2019
6. Does a more precise chemical description of protein-ligand complexes lead to more accurate prediction of binding affinity?
- Author
Adrian Schreyer, Tom L. Blundell (ORCID 0000-0002-2708-8992), and Pedro J. Ballester; source: Apollo - University of Cambridge Repository
- Subjects
Models, Molecular ,Protein Conformation ,General Chemical Engineering ,Plasma protein binding ,Library and Information Sciences ,Machine learning ,Ligands ,01 natural sciences ,Models, Biological ,03 medical and health sciences ,Protein structure ,Scoring functions for docking ,Artificial Intelligence ,0103 physical sciences ,Databases, Protein ,030304 developmental biology ,Binding affinities ,0303 health sciences ,010304 chemical physics ,Chemistry ,Intermolecular force ,Computational Biology ,Proteins ,General Chemistry ,Computer Science Applications ,Docking (molecular) ,Proteins metabolism ,Biological system ,Protein ligand ,Protein Binding
- Abstract
Predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for exploiting and analyzing the outputs of docking, which is in turn an important tool in problems such as structure-based drug design. Classical scoring functions assume a predetermined theory-inspired functional form for the relationship between the variables that describe an experimentally determined or modeled structure of a protein-ligand complex and its binding affinity. The inherent problem of this approach lies in the difficulty of explicitly modeling the various contributions of intermolecular interactions to binding affinity. New scoring functions based on machine-learning regression models, which can effectively exploit much larger amounts of experimental data and circumvent the need for a predetermined functional form, have already been shown to outperform a broad range of state-of-the-art scoring functions in a widely used benchmark. Here, we investigate the impact of the chemical description of the complex on the predictive power of the resulting scoring function using a systematic battery of numerical experiments. These experiments resulted in the most accurate scoring function to date on the benchmark. Strikingly, we also found that a more precise chemical description of the protein-ligand complex does not generally lead to a more accurate prediction of binding affinity. We discuss four factors that may contribute to this result: modeling assumptions, codependence of representation and regression, data restricted to the bound state, and conformational heterogeneity in data.
- Published
- 2021
7. OUP accepted manuscript
- Author
R. Stephanie Huang, Pedro J Ballester, Benjamin Haibe-Kains, Tero Aittokallio, and Rick Stevens
- Subjects
0303 health sciences ,0206 medical engineering ,MEDLINE ,02 engineering and technology ,Disease ,03 medical and health sciences ,Drug response ,Intensive care medicine ,Molecular Biology ,020602 bioinformatics ,030304 developmental biology ,Information Systems
- Published
- 2021
8. An NF-κB/IRF1 axis programs cDC1s to drive anti-tumor immunity
- Author
Christophe Verthuy, Pedro J. Ballester, Bernard Malissen, Karine Crozat, Elodie Baudoin, Toby Lawrence, Pierre Milpied, Noudjoud Attaf, Thien-Phong Vu Manh, Nathalie Auphan-Anezin, Marc Dalod, Chuang Dong, Ammar Sabir Cheema, and Ghita Ghislat
- Subjects
Chemokine ,Cell ,Biology ,Cell biology ,Cancer immunotherapy ,Antigen ,Cytotoxic T cell ,Signal transduction ,Reprogramming ,CD8
- Abstract
Conventional type 1 dendritic cells (cDC1s) are critical for anti-tumor immunity. They acquire antigens from dying tumor cells and cross-present them to CD8+ T cells, promoting the expansion of tumor-specific cytotoxic T cells. However, the signaling pathways that govern the anti-tumor functions of cDC1s are poorly understood. We mapped the molecular pathways regulating intra-tumoral cDC1 maturation using single-cell RNA sequencing. We identified NF-κB and IFN pathways as being highly enriched in a subset of functionally mature cDC1s. The specific targeting of NF-κB or IFN pathways in cDC1s prevented the recruitment and activation of CD8+ T cells and the control of tumor growth. We identified an NF-κB-dependent, IFNγ-regulated gene network in cDC1s, including cytokines and chemokines specialized in the recruitment and activation of cytotoxic T cells. We used single-cell transcriptomes to map the trajectory of intra-tumoral cDC1 maturation, which revealed the dynamic reprogramming of tumor-infiltrating cDC1s by NF-κB and IFN signaling pathways. This maturation process was perturbed by specific inactivation of either NF-κB or IRF1 in cDC1s, resulting in impaired expression of IFN-γ-responsive genes and consequently a failure to efficiently recruit and activate anti-tumoral CD8+ T cells. Finally, we demonstrate the relevance of these findings to cancer patients, showing that activation of the NF-κB/IRF1 axis in association with cDC1s is linked with improved clinical outcome. The NF-κB/IRF1 axis in cDC1s may therefore represent an important focal point for the development of new diagnostic and therapeutic approaches to improve cancer immunotherapy.
One Sentence Summary: NF-κB and IRF1 coordinate intra-tumoral cDC1 maturation and control of immunogenic tumor growth.
- Published
- 2020
9. Drug Repurposing for Covid-19: Discovery of Potential Small-Molecule Inhibitors of Spike Protein-ACE2 Receptor Interaction Through Virtual Screening and Consensus Scoring
- Author
Michael Oravic, Autumn Campbell, Juliette DiFlumeri, Elena Fattakhova, Pedro J. Ballester, Jeremy Hofer, and Sachin Patil
- Abstract
Objective: There is increased interest in drug repurposing against Covid-19 (SARS-CoV-2), as its spread has significantly outpaced the development of effective therapeutics. Our aim is to identify approved drugs that can inhibit the interaction of the SARS-CoV-2 spike protein with the human angiotensin-converting enzyme 2 (ACE2) receptor, which is critical for coronavirus infection. Methods: The published crystal structure of the SARS-CoV-2 spike protein-ACE2 receptor complex was first analyzed for druggable binding pockets. The binding interface was then probed by an integrated virtual screening protocol executed on a high-performance computer cluster, involving docking and consensus scoring with various machine-learning, empirical and knowledge-based scoring functions. The consensus-ranked lists of screened drugs were generated via ‘rank-by-rank’ and ‘rank-by-number’ schemes. Findings: Although the spike protein and ACE2 lacked druggable pockets in their unbound forms, they presented a well-defined pocket when bound together. Accordingly, we identified many drugs with high binding potential against this protein-protein interaction pocket. Importantly, several antivirals against two major (+)ssRNA viruses (HCV and HIV) constituted a major group of our top hits, of which Atazanavir, Grazoprevir, Saquinavir, Simeprevir, Telaprevir and Tipranavir could be the most important for immediate experimental/clinical investigation. Additional notable hits included many anti-inflammatory/antioxidant, antibiotic/antifungal, and other relevant compounds with proven activity against respiratory diseases, further emphasizing the robustness of our current study. Notably, we also discovered Maraviroc, the only FDA-approved drug capable of targeting virus-host interaction and blocking HIV entry.
Conclusion: Our newly identified compounds warrant further experimental investigation against the SARS-CoV-2 spike-ACE2 interaction; if proven effective, they may offer much-needed immediate clinical potential against Covid-19.
- Published
- 2020
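The abstract above names two consensus schemes without defining them. Under their usual definitions, which are an assumption here because the exact protocol is not given:

- **Rank-by-rank** orders compounds by their mean rank across scoring functions.
- **Rank-by-number** orders compounds by their mean score after every function has been put on a common scale.

The following is a minimal sketch. It uses min-max normalisation as that common scale and assumes higher scores are better. Energy-like scores, where lower is better, would need their sign flipped first. The three scoring functions and their values in the usage example are invented.

```python
from statistics import mean
from typing import Dict, List


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank compounds 1..n, best (highest score) first; ties broken by name."""
    order = sorted(scores, key=lambda c: (-scores[c], c))
    return {c: r for r, c in enumerate(order, start=1)}


def minmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one scoring function's outputs to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all scores are equal
    return {c: (s - lo) / span for c, s in scores.items()}


def rank_by_rank(per_function: List[Dict[str, float]]) -> List[str]:
    """Consensus list ordered by mean rank (lower is better)."""
    all_ranks = [ranks(s) for s in per_function]
    compounds = per_function[0].keys()  # assumes every function scores the same compounds
    avg = {c: mean(r[c] for r in all_ranks) for c in compounds}
    return sorted(avg, key=lambda c: (avg[c], c))


def rank_by_number(per_function: List[Dict[str, float]]) -> List[str]:
    """Consensus list ordered by mean normalised score (higher is better)."""
    normed = [minmax(s) for s in per_function]
    compounds = per_function[0].keys()
    avg = {c: mean(n[c] for n in normed) for c in compounds}
    return sorted(avg, key=lambda c: (-avg[c], c))
```

For example, with hypothetical scores `{"A": 9.0, "B": 7.0, "C": 5.0}`, `{"A": 0.2, "B": 0.9, "C": 0.1}` and `{"A": 60.0, "B": 50.0, "C": 40.0}`, both schemes return `["A", "B", "C"]`. Rank-by-rank ignores how large the score gaps are, whereas rank-by-number lets a function with a wide gap pull a compound up or down.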
10. Machine‐learning scoring functions for structure‐based virtual screening
- Author
Kam-Heung Sze, Hongjian Li, Pedro J. Ballester, and Gang Lu
- Subjects
Virtual screening ,Computer science ,Machine learning ,Biochemistry ,Computer Science Applications ,Computational Mathematics ,Materials Chemistry ,Structure based ,Artificial intelligence ,Physical and Theoretical Chemistry
- Published
- 2020
11. Predicting Cancer Drug Response In Vivo by Learning an Optimal Feature Selection of Tumour Molecular Profiles
- Author
Alejandra Bruna, Stefan Naulaerts, Pedro J. Ballester, Ghita Ghislat, Linh C. Nguyen, and Anita Dumenil. Affiliations: Centre de Recherche en Cancérologie de Marseille (CRCM; Aix Marseille Université (AMU), Institut Paoli-Calmettes, Fédération nationale des Centres de lutte contre le Cancer (FNCLCC), Institut National de la Santé et de la Recherche Médicale (INSERM), Centre National de la Recherche Scientifique (CNRS)); Institut Paoli-Calmettes; Fédération nationale des Centres de lutte contre le Cancer (FNCLCC); Aix Marseille Université (AMU); de Duve Institute; Ludwig Institute for Cancer Research; The Institute of Cancer Research [London]; Centre d'Immunologie de Marseille - Luminy (CIML; AMU, INSERM, CNRS).
- Subjects
Oncology ,patient-derived xenograft ,QH301-705.5 ,Colorectal cancer ,[SDV]Life Sciences [q-bio] ,Medicine (miscellaneous) ,Feature selection ,General Biochemistry, Genetics and Molecular Biology ,Breast cancer ,Internal medicine ,biomarker discovery ,Medicine ,Biology (General) ,Cetuximab ,Cancer ,Binimetinib ,tumour profiling ,machine learning ,precision oncology ,Pharmacogenomics
- Abstract
(1) Background: Inter-tumour heterogeneity is one of cancer’s most fundamental features. Patient stratification based on drug response prediction is hence needed for effective anti-cancer therapy. However, single-gene markers of response are rare and/or may fail to achieve a significant impact in the clinic. Machine Learning (ML) is emerging as a particularly promising complementary approach to precision oncology. (2) Methods: Here we leverage comprehensive Patient-Derived Xenograft (PDX) pharmacogenomic data sets with dimensionality-reducing ML algorithms for this purpose. (3) Results: Combining multiple gene alterations via ML leads to better discrimination between sensitive and resistant PDXs in 19 of the 26 analysed cases. Highly predictive ML models employing concise gene lists were found for three cases: paclitaxel (breast cancer), binimetinib (breast cancer) and cetuximab (colorectal cancer). Interestingly, each of these multi-gene ML models identifies some treatment-responsive PDXs that do not harbour the best actionable mutation for that case. Thus, ML multi-gene predictors generally have far fewer false negatives than the corresponding single-gene marker. (4) Conclusions: As PDXs often recapitulate clinical outcomes, these results suggest that many more patients could benefit from precision oncology if ML algorithms were also applied to existing clinical pharmacogenomics data, especially those algorithms generating classifiers that combine data-selected gene alterations.
- Published
- 2021
12. Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data
- Author
Justin Guinney, Tao Wang, Teemu D Laajala, Kimberly Kanigel Winner, J Christopher Bare, Elias Chaibub Neto, Suleiman A Khan, Gopal Peddinti, Antti Airola, Tapio Pahikkala, Tuomas Mirtti, Thomas Yu, Brian M Bot, Liji Shen, Kald Abdallah, Thea Norman, Stephen Friend, Gustavo Stolovitzky, Howard Soule, Christopher J Sweeney, Charles J Ryan, Howard I Scher, Oliver Sartor, Yang Xie, Tero Aittokallio, Fang Liz Zhou, James C Costello, Catalina Anghe, Helia Azima, Robert Baertsch, Pedro J Ballester, Chris Bare, Vinayak Bhandari, Cuong C Dang, Maria Bekker-Nielsen Dunbar, Ann-Sophie Buchardt, Ljubomir Buturovic, Da Cao, Prabhakar Chalise, Junwoo Cho, Tzu-Ming Chu, R Yates Coley, Sailesh Conjeti, Sara Correia, Ziwei Dai, Junqiang Dai, Philip Dargatz, Sam Delavarkhan, Detian Deng, Ankur Dhanik, Yu Du, Aparna Elangovan, Shellie Ellis, Laura L Elo, Shadrielle M Espiritu, Fan Fan, Ashkan B Farshi, Ana Freitas, Brooke Fridley, Christiane Fuchs, Eyal Gofer, Gopalacharyulu Peddinti, Stefan Graw, Russ Greiner, Yuanfang Guan, Jing Guo, Pankaj Gupta, Anna I Guyer, Jiawei Han, Niels R Hansen, Billy HW Chang, Outi Hirvonen, Barbara Huang, Chao Huang, Jinseub Hwang, Joseph G Ibrahim, Vivek Jayaswa, Jouhyun Jeon, Zhicheng Ji, Deekshith Juvvadi, Sirkku Jyrkkiö, Kimberly Kanigel-Winner, Amin Katouzian, Marat D Kazanov, Shahin Khayyer, Dalho Kim, Agnieszka K Golinska, Devin Koestler, Fernanda Kokowicz, Ivan Kondofersky, Norbert Krautenbacher, Damjan Krstajic, Luke Kumar, Christoph Kurz, Matthew Kyan, Michael Laimighofer, Eunjee Lee, Wojciech Lesinski, Miaozhu Li, Ye Li, Qiuyu Lian, Xiaotao Liang, Minseong Lim, Henry Lin, Xihui Lin, Jing Lu, Mehrad Mahmoudian, Roozbeh Manshaei, Richard Meier, Dejan Miljkovic, Krzysztof Mnich, Nassir Navab, Elias C Neto, Yulia Newton, Subhabrata Pal, Byeongju Park, Jaykumar Patel, Swetabh Pathak, Alejandrina Pattin, Donna P Ankerst, Jian Peng, Anne H Petersen, Robin Philip, Stephen R Piccolo, Sebastian Pölsterl, Aneta Polewko-Klim, Karthik Rao, Xiang Ren, Miguel Rocha, Witold R. Rudnicki, Hyunnam Ryu, Hagen Scherb, Raghav Sehgal, Fatemeh Seyednasrollah, Jingbo Shang, Bin Shao, Howard Sher, Motoki Shiga, Artem Sokolov, Julia F Söllner, Lei Song, Josh Stuart, Ren Sun, Nazanin Tahmasebi, Kar-Tong Tan, Lisbeth Tomaziu, Joseph Usset, Yeeleng S Vang, Roberto Vega, Vitor Vieira, David Wang, Difei Wang, Junmei Wang, Lichao Wang, Sheng Wang, Yue Wang, Russ Wolfinger, Chris Wong, Zhenke Wu, Jinfeng Xiao, Xiaohui Xie, Doris Xin, Hojin Yang, Nancy Yu, Xiang Yu, Sulmaz Zahedi, Massimiliano Zanin, Chihao Zhang, Jingwen Zhang, Shihua Zhang, Yanchun Zhang, Hongtu Zhu, Shanfeng Zhu, and Yuxin Zhu. Affiliations: Universidade do Minho; Institute for Molecular Medicine Finland; University of Helsinki; Department of Pathology; Medicum; Clinicum; HUSLAB; Tero Aittokallio / Principal Investigator; Bioinformatics.
- Subjects
Male ,0301 basic medicine ,Oncology ,BIOMEDICAL-RESEARCH ,DOUBLE-BLIND ,Prostate cancer ,0302 clinical medicine ,Randomized controlled trial ,Antineoplastic Combined Chemotherapy Protocols ,health care economics and organizations ,DOCETAXEL ,PLACEBO ,Hazard ratio ,MEN ,Middle Aged ,CHEMOTHERAPY ,Prognosis ,3. Good health ,Survival Rate ,Prostatic Neoplasms, Castration-Resistant ,030220 oncology & carcinogenesis ,Crowdsourcing ,Taxoids ,Adult ,PREDNISONE ,ENZALUTAMIDE ,Adolescent ,3122 Cancers ,Young Adult ,03 medical and health sciences ,SDG 3 - Good Health and Well-being ,Internal medicine ,Humans ,Aged ,Neoplasm Staging ,Models, Statistical ,Science & Technology ,Proportional hazards model ,Clinical study design ,Bayes Theorem ,PHASE-III ,SIPULEUCEL-T IMMUNOTHERAPY ,Surgery ,Clinical trial ,Nomograms ,030104 developmental biology ,Follow-Up Studies
- Abstract
Background: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease. Methods: Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance.
Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone. Findings: 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0·791; Bayes factor >5) and surpassed the reference model (iAUC 0·743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3·32, 95% CI 2·39–4·62, p …).
Funding: The European Union, within the ERC grant LatentCauses, supported the work of C.F. and I.K. German Research Foundation (DFG) funding within the Collaborative Research Centre 1243 (subproject A17) was awarded to C.F. The German Federal Ministry of Education and Research (BMBF), through the Research Consortium e:AtheroMED (Systems medicine of myocardial infarction and stroke) under the auspices of the e:Med Programme (grant # 01ZX1313C), supported the work of D.P.A., P.D., C.F., C.K., I.K., N.K., M.L., H.S. and J.F.S. at the Institute of Computational Biology. NIH Grants RR025747-01, MH086633 and 1UL1TR001111, and NSF Grants SES-1357666, DMS-14-07655 and BCS0826844 supported the work of C.H., J.I., E.L., Y.W., H.Y., H.Z. and J.Z. NSFC Grant Nos. 61332013 and 61572139 supported the work of X.L., Y.L., Y.Z., and S.Z. National Natural Science Foundation of China grants [Nos. 61422309, 61379092] were awarded to S.Z. The Patrick C. Walsh Prostate Research Fund and the Johns Hopkins Individualized Health Initiative supported the work of R.Y.C., D.D., Y.D., Z.J., K.R., Z.W. and Y.Z. FCT Ph.D. Grant SFRH/BD/80925/2011 was awarded to S.C. Clinical Persona Inc., East Palo Alto, CA supported the work of L.B. and D.K. The Finnish Cultural Foundation and the Drug Research Doctoral Programme (DRDP) at the University of Turku supported T.D.L. The National Research Foundation Singapore and the Singapore Ministry of Education, under its Research Centres of Excellence initiative, supported the work of J.G. and K.T. A grant from the Russian Science Foundation (14-24-00155) was awarded to M.D.K. An A*MIDEX grant (no. ANR-11-IDEX-0001-02) was awarded to P.J.B. NSERC supported the work of R.G. The Israeli Centers of Research Excellence (I-CORE) program (Center No. 4/11) supported the work of E.G. The Academy of Finland (grants 292611, 269862, 272437, 279163, 295504), the National Cancer Institute (16X064), and the Cancer Society of Finland supported the work of T.A. The Academy of Finland (grant 268531) supported the work of T.M.
- Published
- 2017
13. Building Machine-Learning Scoring Functions for Structure-Based Prediction of Intermolecular Binding Affinity
- Author
Maciej Wójcikowski, Pawel Siedlecki, and Pedro J. Ballester
- Subjects
Machine Learning ,Models, Molecular ,Databases, Genetic ,Proteins ,Quantitative Structure-Activity Relationship ,Web Browser ,Ligands ,Software ,Protein Binding ,Workflow
- Abstract
Molecular docking enables large-scale prediction of whether and how small molecules bind to a macromolecular target. Machine-learning scoring functions are particularly well suited to predicting the strength of this interaction. Here we describe how to build RF-Score, a scoring function that uses the machine-learning technique known as Random Forest (RF). We also point out how to use different data, features, and regression models with either the R or the Python programming language.
- Published
- 2019
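RF-Score's original descriptor counts, for each pairing of a protein element with a ligand element, the protein-ligand atom pairs closer than a distance cutoff. The following is a minimal sketch of that featurisation. It assumes the element sets and 12 Å cutoff of the original RF-Score, as we understand them: protein C, N, O, S × ligand C, N, O, F, P, S, Cl, Br, I, which gives 36 features. It takes atoms as hypothetical `(element, x, y, z)` tuples rather than parsing structure files. Training the random forest on these vectors would be a standard regression step and is not shown.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z), coordinates in Å

# Element types assumed from the original RF-Score descriptor (4 x 9 = 36 pairs).
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")
FEATURE_NAMES = [f"{p}-{l}" for p, l in product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS)]


def rf_score_features(protein: List[Atom], ligand: List[Atom],
                      cutoff: float = 12.0) -> List[int]:
    """Count protein-ligand element pairs whose distance is within the cutoff."""
    counts: Dict[str, int] = {name: 0 for name in FEATURE_NAMES}
    for pe, px, py, pz in protein:
        if pe not in PROTEIN_ELEMENTS:
            continue  # other elements (e.g. hydrogens, metals) are ignored
        for le, lx, ly, lz in ligand:
            if le not in LIGAND_ELEMENTS:
                continue
            if math.dist((px, py, pz), (lx, ly, lz)) <= cutoff:
                counts[f"{pe}-{le}"] += 1
    # Fixed feature order so every complex maps to the same 36-element vector.
    return [counts[name] for name in FEATURE_NAMES]
```

Keeping the feature order fixed via `FEATURE_NAMES` is what allows vectors from different complexes to be stacked into one training matrix. Swapping in other element sets, cutoffs or distance bins changes only the two tuples and the counting rule, which fits the abstract's point about trying different features.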
14. Predicting Synergism of Cancer Drug Combinations Using NCI-ALMANAC Data
- Author
Eddy Pasquier, Pedro J. Ballester, Stefan Naulaerts, Pavel Sidorov, and Jeremy Ariey-Bonnet. Affiliation: Centre de Recherche en Cancérologie de Marseille (CRCM; Aix Marseille Université (AMU), Institut Paoli-Calmettes, Fédération nationale des Centres de lutte contre le Cancer (FNCLCC), Institut National de la Santé et de la Recherche Médicale (INSERM), Centre National de la Recherche Scientifique (CNRS)).
- Subjects
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI] ,Drug ,[SDV.SP.MED] Life Sciences [q-bio]/Pharmaceutical sciences/Medication ,drug synergy ,Computer science ,QSAR (quantitative structure-activity relationships) ,In silico ,Cancer drugs ,[SDV.CAN]Life Sciences [q-bio]/Cancer ,Machine learning ,chemoinformatics ,03 medical and health sciences ,0302 clinical medicine ,[CHIM.CHEM] Chemical Sciences/Cheminformatics ,Reliability (statistics) ,[INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM] ,030304 developmental biology ,0303 health sciences ,predictive (QSPR) models ,Random forest ,Cancer treatment ,Chemistry ,[SDV.SP.PHARMA] Life Sciences [q-bio]/Pharmaceutical sciences/Pharmacology ,030220 oncology & carcinogenesis ,Artificial intelligence
- Abstract
BackgroundDrug combinations are of great interest for cancer treatment. Unfortunately, the discovery of synergistic combinations by purely experimental means is only feasible on small sets of drugs.In silicomodeling methods can substantially widen this search by providing tools able to predict which of all possible combinations in a large compound library are synergistic. Here we investigate to which extent drug combination synergy can be predicted by exploiting the largest available dataset to date (NCI-ALMANAC, with over 290,000 synergy determinations).MethodsEach cell line is modeled using primarily two machine learning techniques, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), on the datasets provided by NCI-ALMANAC. This large-scale predictive modeling study comprises more than 5000 pair-wise drug combinations, 60 cell lines, 4 types of models and 5 types of chemical features. The application of a powerful, yet uncommonly used, RF-specific technique for reliability prediction is also investigated.ResultsThe evaluation of these models shows that it is possible to predict the synergy of unseen drug combinations with high accuracy (Pearson correlations between 0.43 and 0.86 depending on the considered cell line, with XGBoost providing slightly better predictions than RF). We have also found that restricting to the most reliable synergy predictions results in at least two-fold error decrease with respect to employing the best learning algorithm without any reliability estimation. Alkylating agents, tyrosine kinase inhibitors and topoisomerase inhibitors are the drugs whose synergy with other partner drugs are better predicted by the models.ConclusionsDespite its leading size, NCI-ALMANAC comprises an extremely small part of all conceivable combinations. 
Given their accuracy and reliability estimation, the developed models should drastically reduce the number of required in vitro tests by predicting in silico which of the considered combinations are likely to be synergistic.
- Published
- 2019
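The abstract refers to an RF-specific technique for reliability prediction without detailing it. A common RF-specific reliability proxy is the disagreement between the individual trees of the forest; the minimal sketch below assumes that proxy (it is not necessarily the technique used in the study) to show how predictions can be restricted to the most reliable ones. The function name `most_reliable` and the per-tree input format are hypothetical.

```python
from statistics import mean, pstdev


def most_reliable(per_tree_predictions, keep_fraction):
    """Keep the fraction of predictions on which the trees agree most.

    per_tree_predictions: one list per drug combination, holding the synergy
    score predicted by every tree of the forest (hypothetical input format).
    Assumed reliability proxy: a smaller spread across trees means a more
    reliable prediction.
    Returns (index, forest_prediction, spread) tuples, most reliable first.
    """
    scored = [
        (i, mean(preds), pstdev(preds))  # forest output = mean of its trees
        for i, preds in enumerate(per_tree_predictions)
    ]
    scored.sort(key=lambda item: item[2])  # lowest spread first
    n_keep = max(1, round(keep_fraction * len(scored)))
    return scored[:n_keep]


# Toy per-tree predictions for four combinations (made-up numbers).
trees = [[10.0, 11.0, 9.0], [5.0, 25.0, -10.0], [3.0, 3.0, 3.0], [0.0, 8.0, 4.0]]
top = most_reliable(trees, keep_fraction=0.5)
print([i for i, _, _ in top])  # -> [2, 0]
```

Measuring error only on the retained subset, for several values of `keep_fraction`, is one way to examine the trade-off between how many combinations are predicted and how accurate those predictions are.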
15. USR-VS: a web server for large-scale prospective virtual screening using ultrafast shape recognition techniques
- Author
Pedro J. Ballester, Hongjian Li, Man Hon Wong, and Kwong-Sak Leung
- Subjects
0301 basic medicine ,Web server ,Indoles ,Similarity (geometry) ,Drug Evaluation, Preclinical ,Fluspirilene ,Computational biology ,Biology ,Ligands ,Bioinformatics ,03 medical and health sciences ,Genetics ,Web Server issue ,Interactive visualization ,Internet ,Sulfonamides ,Activity profile ,Virtual screening ,Scale (chemistry) ,Reproducibility of Results ,030104 developmental biology ,Pharmaceutical Preparations ,Vemurafenib ,Drug Design ,Software
- Abstract
Ligand-based Virtual Screening (VS) methods aim at identifying molecules with an activity profile, across phenotypic and macromolecular targets, similar to that of a query molecule used as a search template. VS using 3D similarity methods has the advantage of biasing this search toward active molecules with innovative chemical scaffolds, which are highly sought after in drug design to provide novel leads with improved properties over the query molecule (e.g. patentable, of lower toxicity or increased potency). Ultrafast Shape Recognition (USR) has demonstrated excellent performance in the discovery of molecules with previously unknown phenotypic or target activity, with retrospective studies suggesting that its pharmacophoric extension (USRCAT) should obtain even better hit rates once it is used prospectively. Here we present USR-VS (http://usr.marseille.inserm.fr/), the first web server using these two validated ligand-based 3D methods for large-scale prospective VS. In about 2 s, 93.9 million 3D conformers, expanded from 23.1 million purchasable molecules, are screened and the 100 most similar molecules among them in terms of 3D shape and pharmacophoric properties are shown. USR-VS functionality also provides interactive visualization of the similarity of the query molecule against the hit molecules, as well as vendor information for purchasing selected hits to be tested experimentally.
- Published
- 2016
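The abstract describes screening tens of millions of conformers and returning the 100 most similar molecules. The minimal sketch below assumes the standard USR similarity score, 1/(1 + mean absolute difference between descriptor vectors), scores each molecule by its best-matching conformer, and keeps the top k with a heap. The function names, the in-memory data layout and the toy 3-value vectors (real USR vectors have 12 values) are assumptions; the server's actual implementation is not described in the abstract.

```python
import heapq


def usr_similarity(query, target):
    """USR score: 1 / (1 + mean absolute descriptor difference); 1.0 = identical."""
    if len(query) != len(target):
        raise ValueError("descriptor vectors must have the same length")
    mad = sum(abs(q - t) for q, t in zip(query, target)) / len(query)
    return 1.0 / (1.0 + mad)


def screen(query, conformers, k=100):
    """Return the k molecules whose best conformer is most similar to the query.

    conformers: iterable of (molecule_id, descriptor_vector) pairs; a molecule
    may appear several times, once per conformer (hypothetical layout).
    """
    best = {}
    for mol_id, descriptors in conformers:
        score = usr_similarity(query, descriptors)
        if score > best.get(mol_id, -1.0):
            best[mol_id] = score  # keep the best-scoring conformer per molecule
    # nlargest avoids sorting the whole library when only k hits are needed.
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])


# Toy 3-value descriptors (real USR vectors have 12 values).
library = [
    ("A", [1.0, 2.0, 3.0]),
    ("A", [2.0, 2.0, 3.0]),
    ("B", [1.5, 2.5, 3.5]),
    ("C", [4.0, 5.0, 6.0]),
]
hits = screen([1.0, 2.0, 3.0], library, k=2)
print([mol for mol, _ in hits])  # -> ['A', 'B']
```

Because the score needs only a handful of subtractions per conformer and no molecular alignment, a linear scan like this stays cheap even over very large conformer libraries.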
16. Identification and Validation of Carbonic Anhydrase II as the First Target of the Anti-Inflammatory Drug Actarit
- Author
Pedro J. Ballester, Ghita Ghislat, Taufiq Rahman [ORCID 0000-0003-3830-5160], and Apollo - University of Cambridge Repository
- Subjects
Drug ,In silico ,Carbonic anhydrase II ,Anti-Inflammatory Agents ,lcsh:QR1-502 ,target prediction ,actarit ,Pharmacology ,Proof of Concept Study ,Biochemistry ,Article ,lcsh:Microbiology ,Arthritis, Rheumatoid ,Orphan drug ,03 medical and health sciences ,Drug Delivery Systems ,0302 clinical medicine ,malotilate ,Humans ,Molecular Biology ,carbonic anhydrase II ,MolTarPred ,Phenylacetates ,030304 developmental biology ,0303 health sciences ,Dose-Response Relationship, Drug ,Drug discovery ,Actarit ,Reproducibility of Results ,Malotilate ,Mechanism of action ,Antirheumatic Agents ,030220 oncology & carcinogenesis
- Abstract
Background and purpose: Identifying the macromolecular targets of drug molecules is a fundamental aspect of drug discovery and pharmacology. Several drugs remain without known targets (orphan) despite large-scale in silico and in vitro target prediction efforts. Ligand-centric chemical-similarity-based methods for in silico target prediction have been found to be particularly powerful, but the question remains of whether they are able to discover targets for target-orphan drugs. Experimental Approach: We used one of these in silico methods to carry out a target prediction analysis for two orphan drugs: actarit and malotilate. The top target predicted for each drug was carbonic anhydrase II (CAII). Each drug was therefore quantitatively evaluated for CAII inhibition to validate these two prospective predictions. Key Results: Actarit showed in vitro concentration-dependent inhibition of CAII activity with submicromolar potency (IC50 = 422 nM) whilst no consistent inhibition was observed for malotilate. Among the other 25 targets predicted for actarit, RORγ (RAR-related orphan receptor gamma) is promising in that it is strongly related to actarit's indication, rheumatoid arthritis (RA). Conclusion and Implications: This study is a proof-of-concept of the utility of MolTarPred for the fast and cost-effective identification of targets of orphan drugs. Furthermore, the mechanism of action of actarit as an anti-RA agent can now be re-examined from a CAII-inhibitor perspective, given existing relationships between this target and RA. Moreover, the confirmed CAII-actarit association supports investigating the repositioning of actarit on other CAII-linked indications (e.g., hypertension, epilepsy, migraine, anemia and bone, eye and cardiac disorders).
- Published
- 2020
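The ligand-centric idea behind this kind of target prediction can be sketched as follows: compare the query drug with the known ligands of each target and rank targets by their most similar ligand. This is a minimal sketch under assumptions: fingerprints are modeled as toy sets of feature identifiers, Tanimoto similarity and a max-similarity rule are used, and the target names are placeholders. MolTarPred's actual fingerprints and scoring are not given in the abstract.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_targets(query_fp, ligands_by_target, top_n=3):
    """Rank targets by the similarity of their most similar known ligand.

    ligands_by_target maps a target name to a list of ligand fingerprints,
    each a set of feature identifiers (hypothetical toy representation).
    """
    scores = {
        target: max(tanimoto(query_fp, fp) for fp in fps)
        for target, fps in ligands_by_target.items()
        if fps  # skip targets with no known ligands
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Toy data: placeholder target names and made-up feature sets.
known = {
    "target_A": [{1, 2, 3, 5}, {7, 8}],
    "target_B": [{1, 9}],
    "target_C": [{10, 11}],
}
ranked = predict_targets({1, 2, 3, 4}, known)
print(ranked)  # -> [('target_A', 0.6), ('target_B', 0.2), ('target_C', 0.0)]
```

The top-ranked targets are the prospective predictions that would then be checked experimentally, as was done for CAII.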
17. Precision and recall oncology: combining multiple gene mutations for improved identification of drug-sensitive tumours
- Author
Stefan Naulaerts, Cuong C. Dang, and Pedro J. Ballester
- Subjects
machine learning ,biomarker discovery ,genomics ,cancer ,drug sensitivity ,Research Paper
- Abstract
Cancer drug therapies are only effective in a small proportion of patients. To make things worse, our ability to identify these responsive patients before administering a treatment is generally very limited. The recent arrival of large-scale pharmacogenomic data sets, which measure the sensitivity of molecularly profiled cancer cell lines to a panel of drugs, has boosted research on the discovery of drug sensitivity markers. However, no systematic comparison of widely used single-gene markers with multi-gene machine-learning markers exploiting genomic data has so far been conducted. We therefore assessed the performance offered by these two types of models in discriminating between cell lines that are sensitive and resistant to a given drug. This was carried out for each of 127 drugs using genomic data characterising the cell lines. We found that the proportion of cell lines predicted to be sensitive that are actually sensitive (precision) varies strongly with the drug and the type of model used. Furthermore, the recall (the proportion of sensitive cell lines that are correctly predicted as sensitive) of the best single-gene marker was lower than that of the multi-gene marker for 118 of the 127 tested drugs. We conclude that single-gene markers are only able to identify those drug-sensitive cell lines with the considered actionable mutation, unlike multi-gene markers, which can in principle combine multiple gene mutations to identify additional sensitive cell lines. We also found that cell line sensitivities to some drugs (e.g. Temsirolimus, 17-AAG or Methotrexate) are better predicted by these machine-learning models.
- Published
- 2017
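The two metrics defined in the abstract can be computed directly from sets of cell lines. The sketch below uses made-up cell-line identifiers and hypothetical marker outputs to illustrate the single-gene versus multi-gene contrast described in the abstract; the numbers are illustrative only and are not results from the study.

```python
def precision_recall(predicted_sensitive, actually_sensitive):
    """Precision and recall for the 'sensitive' class, from sets of cell-line IDs.

    precision = true positives / cell lines predicted sensitive
    recall    = true positives / cell lines actually sensitive
    Returns 0.0 for a metric whose denominator is empty (a convention chosen here).
    """
    true_pos = len(predicted_sensitive & actually_sensitive)
    precision = true_pos / len(predicted_sensitive) if predicted_sensitive else 0.0
    recall = true_pos / len(actually_sensitive) if actually_sensitive else 0.0
    return precision, recall


sensitive = {"cl1", "cl2", "cl3", "cl4"}       # made-up ground truth
single_gene = {"cl1"}                          # e.g. only the line carrying the mutation
multi_gene = {"cl1", "cl2", "cl3", "cl5"}      # a hypothetical multi-gene model
print(precision_recall(single_gene, sensitive))  # -> (1.0, 0.25)
print(precision_recall(multi_gene, sensitive))   # -> (0.75, 0.75)
```

A marker that flags only mutated lines can be perfectly precise while missing most sensitive lines, which is why both metrics are reported.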
18. Biochemical evaluation of virtual screening methods reveals a cell-active inhibitor of the cancer-promoting phosphatases of regenerating liver
- Author
Maja Köhn, Birgit Hoeger, Maren Diether, and Pedro J. Ballester
- Subjects
USR, ultrafast shape recognition ,Thienopyridone ,Protein tyrosine phosphatase ,ROCS, rapid overlay of chemical structures ,VS, virtual screening ,2-Cyano-2-ene-esters ,Molecular Docking Simulation ,MTT, 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide ,PTP1B, protein tyrosine phosphatase 1B ,Cell Movement ,Neoplasms ,Drug Discovery ,Enzyme Inhibitors ,Cells, Cultured ,Molecular Structure ,Chemistry ,Phosphatases of regenerating liver ,TCPTP, T-cell protein tyrosine phosphatase ,General Medicine ,Small molecule ,3. Good health ,Liver ,Biochemistry ,Original Article ,DSP, dual specificity phosphatase ,PTP, protein tyrosine phosphatase ,PRL, phosphatases of regenerating liver ,Cell Survival ,UFSRAT, ultrafast shape recognition with atom types ,DIFMUP, 6,8-difluoro-4-methylumbelliferyl phosphate ,Small Molecule Libraries ,Structure-Activity Relationship ,SAR, structure–activity relationship ,Humans ,Structure–activity relationship ,Dual specificity phosphatases ,Cell Proliferation ,Pharmacology ,Virtual screening ,UPLC-MS, ultra performance liquid chromatography–mass spectrometry ,Dose-Response Relationship, Drug ,Cell growth ,Organic Chemistry ,Phosphoric Monoester Hydrolases ,VHR, vaccinia H1-related phosphatase ,High-Throughput Screening Assays ,Liver Regeneration ,Virtual screening methods ,HEK293 Cells ,Docking (molecular)
- Abstract
Computationally supported development of small molecule inhibitors has successfully been applied to protein tyrosine phosphatases in the past, revealing a number of cell-active compounds. Similar approaches have also been used to screen for small molecule inhibitors of the cancer-related phosphatases of regenerating liver (PRL) family. Still, selective and cell-active compounds are of limited availability. Since PRL-3 in particular remains an attractive drug target due to its clear role in cancer metastasis, such compounds are in high demand. In this study, we investigated various virtual screening approaches for their applicability to identify novel small molecule entities for PRL-3 as target. Biochemical evaluation of purchasable compounds revealed ligand-based approaches to be well suited for this target, compared to docking-based techniques, which did not perform well in this context. The best hit of this study, a 2-cyano-2-ene-ester and hence a novel chemotype targeting the PRLs, was further optimized by a structure–activity relationship (SAR) study, leading to a low micromolar PRL inhibitor with acceptable selectivity over other protein tyrosine phosphatases. The compound is active in cells, as shown by its ability to specifically revert PRL-3-induced cell migration, and exhibits similar effects on PRL-1 and PRL-2. It is furthermore suitable for fluorescence microscopy applications, and it is commercially available. These features make it the only purchasable, cell-active and acceptably selective PRL inhibitor to date that can be used in various cellular applications.
Highlights:
• Computational ligand- and docking-based approaches were tested for PRL-3 as a target.
• Ligand-based screening was shown to be a feasible approach for PRL-3 inhibitor discovery.
• A low micromolar, non-competitive inhibitor with a novel chemotype for PRLs was discovered.
• The inhibitor efficiently blocks PRL-induced cell migration.
• The inhibitor is non-cytotoxic, commercially available and suitable for fluorescence microscopy applications.
- Published
- 2014
19. Prospective virtual screening for novel p53–MDM2 inhibitors using ultrafast shape recognition
- Author
Pedro J. Ballester, Sachin P. Patil, and Cassidy R. Kerezsi
- Subjects
Models, Molecular ,Databases, Factual ,Computational biology ,Biology ,Ligands ,Bioinformatics ,Benzoates ,Molecular Docking Simulation ,Piperazines ,Proto-Oncogene Proteins c-mdm2 ,Cell Line, Tumor ,Drug Discovery ,Image Processing, Computer-Assisted ,Humans ,Prospective Studies ,Telmisartan ,Physical and Theoretical Chemistry ,Receptor ,Cell Proliferation ,Virtual screening ,Dose-Response Relationship, Drug ,Imidazoles ,Small molecule ,Computer Science Applications ,Docking (molecular) ,Mdm2 ,Benzimidazoles ,Drug Screening Assays, Antitumor ,Tumor Suppressor Protein p53 ,DrugBank
- Abstract
The p53 protein, known as the guardian of the genome, is mutated or deleted in approximately 50% of human tumors. In the remaining cancers, p53 is expressed in its wild-type form, but its function is inhibited by direct binding with the murine double minute 2 (MDM2) protein. Therefore, inhibiting the p53-MDM2 interaction, and thereby activating the tumor suppressor p53, presents a fundamentally novel therapeutic strategy against several types of cancers. The present study used ultrafast shape recognition (USR), a virtual screening technique based on ligand-receptor 3D shape complementarity, to screen the DrugBank database for novel p53-MDM2 inhibitors. Specifically, using the 3D shape of MI-63, one of the most potent crystal ligands of MDM2, as the query molecule, six compounds were identified as potential p53-MDM2 inhibitors. These six USR hits were then subjected to molecular modeling investigations through flexible receptor docking followed by comparative binding energy analysis. These studies suggested a potential role of the USR-selected molecules as p53-MDM2 inhibitors. This was further supported by experimental tests showing that treating human colon tumor cells with the top USR hit, telmisartan, led to dose-dependent cell growth inhibition in a p53-dependent manner. It is noteworthy that telmisartan has a long history of safe human use as an approved anti-hypertension drug and thus may have immediate clinical potential as a cancer therapeutic. Furthermore, it could also serve as a structurally novel lead molecule for the development of more potent small-molecule p53-MDM2 inhibitors against a variety of cancers. Importantly, the present study demonstrates that the adopted USR-based virtual screening protocol is a useful tool for hit identification in the domain of small-molecule p53-MDM2 inhibitors.
- Published
- 2014
20. A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking
- Author
Pedro J. Ballester and John B. O. Mitchell
- Subjects
Statistics and Probability ,Protein Conformation ,Computer science ,Chemical biology ,Overfitting ,Ligands ,Machine learning ,Biochemistry ,Article ,Protein structure ,Artificial Intelligence ,Cluster Analysis ,Databases, Protein ,Molecular Biology ,Models, Statistical ,Drug discovery ,Ligand ,Binding protein ,Computational Biology ,Proteins ,Reproducibility of Results ,Ligand (biochemistry) ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Structural biology ,Docking (molecular) ,Data Interpretation, Statistical ,Drug Design ,Artificial intelligence ,Algorithms ,Protein Binding ,Protein ligand
- Abstract
Motivation: Accurately predicting the binding affinities of large sets of diverse protein–ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of molecular docking, which in turn is an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined, theory-inspired functional form for the relationship between the variables that characterize the complex (which also include parameters fitted to experimental or simulation data) and its predicted binding affinity. The inherent problem of this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against overfitting the calibration data when estimating the parameters of scoring functions. Results: We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score's performance was shown to improve dramatically with training set size, and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score. Contact: pedro.ballester@ebi.ac.uk; jbom@st-andrews.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2010
21. Prospective virtual screening with Ultrafast Shape Recognition: the identification of novel inhibitors of arylamine N-acetyltransferases
- Author
Isaac M. Westwood, W. Graham Richards, Pedro J. Ballester, Nicola Laurieri, and Edith Sim
- Subjects
Virtual screening ,Arylamine N-acetyltransferase ,Arylamine N-Acetyltransferase ,Protein Conformation ,Drug discovery ,Biomedical Engineering ,Biophysics ,Computational Biology ,Bioengineering ,Economic shortage ,Acetyltransferases ,Computational biology ,Biology ,Biochemistry ,Combinatorial chemistry ,Small molecule ,Biomaterials ,Identification (information) ,Research articles ,Drug Discovery ,Hit rate ,Enzyme Inhibitors ,Databases, Protein ,Algorithms ,Biotechnology
- Abstract
There is currently a shortage of chemical molecules that can be used as bioactive probes to study molecular targets and potentially as starting points for drug discovery. One inexpensive way to address this problem is to use computational methods to screen a comprehensive database of small molecules to discover novel structures that could lead to alternative and better bioactive probes. Despite that pleasing logic, the results have been somewhat mixed. Here we describe a virtual screening technique based on ligand–receptor shape complementarity, Ultrafast Shape Recognition (USR). USR is specifically applied to identify novel inhibitors of arylamine N-acetyltransferases by computationally screening almost 700 million molecular conformers in a time- and resource-efficient manner. A small number of the predicted active compounds were purchased and tested, yielding a confirmed hit rate of 40 per cent, which is an outstanding result for prospective virtual screening.
- Published
- 2009
22. A parallel real-coded genetic algorithm for history matching and its application to a real petroleum reservoir
- Author
Pedro J. Ballester and Jonathan Carter
- Subjects
Structure (mathematical logic) ,Engineering ,Reliability (computer networking) ,Computation ,Inverse problem ,Geotechnical Engineering and Engineering Geology ,Fuel Technology ,Genetic algorithm ,Scalability ,Quality (business) ,Cluster analysis ,Algorithm
- Abstract
A new methodology to tackle History Matching problems is presented. It is based upon the repeated application of a Real-coded Genetic Algorithm (GA). In order to shorten the computation time, the possible solutions generated by the GA are evaluated in parallel on a group of computers. This required the GA to be adapted to a multi-processor structure, so that the scalability of the computation is maximised. The best solutions of each run enter the ensemble of history matched models, which is finally analysed using a clustering algorithm. The aim is to identify the optimal regions contained in the ensemble and thus to reveal the distinct types of reservoir models consistent with the historic production data, as a way to assess the uncertainty in the Reservoir Characterisation due to the limited reliability of optimisation algorithms. The developed methodology is applied to the characterisation of a real petroleum reservoir. Results show a large improvement with respect to previous studies on that reservoir in terms of the quality and diversity of the obtained history matched models. Our main conclusion is that, even with regularisation, many distinct history matched models are possible, which highlights the importance of applying optimisation methods capable of identifying all such solutions.
- Published
- 2007
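The loop below sketches a real-coded GA whose candidate evaluations run in parallel, as described in the abstract. It is a minimal sketch under assumptions: the `misfit` function is a placeholder quadratic with made-up "true" parameters (a real history-matching misfit would run a reservoir simulator against historical production data), BLX-alpha crossover with elitism is chosen as a common real-coded operator set rather than the paper's own operators, and a thread pool stands in for the group of computers. All function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def misfit(params):
    """Placeholder misfit: squared distance to a made-up 'true' parameter set.

    In real history matching this would run a reservoir simulation and
    compare simulated with historical production data.
    """
    true_model = (0.3, -1.2)  # hypothetical values
    return sum((p - t) ** 2 for p, t in zip(params, true_model))


def blx_crossover(a, b, rng, alpha=0.5):
    """BLX-alpha: sample each gene from the parents' interval widened by alpha."""
    child = []
    for x, y in zip(a, b):
        lo, hi = min(x, y), max(x, y)
        span = hi - lo
        child.append(rng.uniform(lo - alpha * span, hi + alpha * span))
    return child


def run_ga(bounds, pop_size=20, generations=30, seed=0, workers=4):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []  # best misfit of each generation
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Evaluate all candidates concurrently; map() preserves input order.
            scores = list(pool.map(misfit, pop))
            ranked = [p for _, p in sorted(zip(scores, pop), key=lambda sp: sp[0])]
            history.append(min(scores))
            parents = ranked[: pop_size // 2]
            next_pop = [ranked[0]]  # elitism: the best candidate survives unchanged
            while len(next_pop) < pop_size:
                a, b = rng.sample(parents, 2)
                child = blx_crossover(a, b, rng)
                # Clip genes back into their allowed ranges.
                next_pop.append([min(max(g, lo), hi) for g, (lo, hi) in zip(child, bounds)])
            pop = next_pop
        scores = list(pool.map(misfit, pop))
    best_score, best = min(zip(scores, pop), key=lambda sp: sp[0])
    history.append(best_score)
    return best, best_score, history


best, best_score, history = run_ga([(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_score)
```

Because each simulation is independent, evaluation parallelizes trivially; for CPU-bound simulators a process pool or a cluster job scheduler would replace the thread pool. Running the GA repeatedly with different seeds and pooling the best model of each run would give an ensemble of the kind the abstract then analyses by clustering.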
23. Ultrafast shape recognition for similarity search in molecular databases
- Author
W. Graham Richards and Pedro J. Ballester
- Subjects
Virtual screening ,Theoretical computer science ,Similarity (geometry) ,Orientation (computer vision) ,General Mathematics ,Nearest neighbor search ,Perspective (graphical) ,General Engineering ,General Physics and Astronomy ,Pattern recognition ,Identification (information) ,Orders of magnitude (time) ,Pattern recognition (psychology) ,Artificial intelligence ,Mathematics
- Abstract
Molecular databases are routinely screened for compounds that most closely resemble a molecule of known biological activity to provide novel drug leads. It is widely believed that three-dimensional molecular shape is the most discriminating pattern for biological activity as it is directly related to the steep repulsive part of the interaction potential between the drug-like molecule and its macromolecular target. However, efficient comparison of molecular shape is currently a challenge. Here, we show that a new approach based on moments of distance distributions is able to recognize molecular shape at least three orders of magnitude faster than current methodologies. Such an ultrafast method permits the identification of similarly shaped compounds within the largest molecular databases. In addition, the problematic requirement of aligning molecules for comparison is circumvented, as the proposed distributions are independent of molecular orientation. Our methodology could also be adapted to tackle similar hard problems in other fields, such as designing content-based Internet search engines for three-dimensional geometrical objects or performing fast similarity comparisons between proteins. From a broader perspective, we anticipate that ultrafast pattern recognition will soon become not only useful, but also essential to address the data explosion currently experienced in most scientific disciplines.
- Published
- 2007
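The abstract only states that shape is described by moments of distance distributions. The minimal sketch below follows the commonly described USR recipe: distances from every atom to four reference points (the centroid, the atom closest to the centroid, the atom farthest from the centroid, and the atom farthest from that one), summarized by three moments each, giving 12 numbers. Taking the third moment as the signed cube root of the third central moment is one of several conventions in use and is an assumption here, as are the function names and the toy coordinates.

```python
import math


def _moments(values):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    third = sum((v - mu) ** 3 for v in values) / n
    return [mu, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]


def usr_descriptors(coords):
    """12 alignment-free shape descriptors from a list of (x, y, z) atom coordinates."""
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))  # centroid
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptors = []
    for ref in (ctd, cst, fct, ftf):
        descriptors.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptors


def usr_similarity(a, b):
    """1 / (1 + mean absolute difference); 1.0 means identical descriptors."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.3), (-1.1, -0.7, 1.9), (2.2, 1.3, -0.8)]
# Same toy "molecule" rotated 90 degrees about z and translated.
moved = [(-y + 3.0, x - 1.0, z + 0.5) for x, y, z in mol]
print(usr_similarity(usr_descriptors(mol), usr_descriptors(moved)))  # ~1.0
```

The usage lines illustrate the orientation independence claimed in the abstract: a rotated and translated copy yields the same descriptors up to floating-point rounding, so no alignment step is needed before comparison.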
24. The Impact of Docking Pose Generation Error on the Prediction of Binding Affinity
- Author
Pedro J. Ballester, Man Hon Wong, Hongjian Li, and Kwong-Sak Leung
- Subjects
Structural bioinformatics ,Computational Technique ,Chemistry ,Docking (molecular) ,Molecule ,A protein ,Computational biology ,Biochemical function ,Ligand molecule
- Abstract
Docking is a computational technique that predicts the preferred conformation and binding affinity of a ligand molecule as bound to a protein pocket. It is often employed to identify a molecule that binds tightly to the target, so that a small concentration of the molecule is sufficient to modulate its biochemical function. The use of non-parametric machine learning, a data-driven approach that circumvents the need for modeling assumptions, has recently been shown to yield a large improvement in the accuracy of docking scoring. However, the impact of pose generation error on binding affinity prediction has yet to be investigated.
- Published
- 2015
25. The Importance of the Regression Model in the Structure-Based Prediction of Protein-Ligand Binding
- Author
Pedro J. Ballester, Man Hon Wong, Hongjian Li, and Kwong-Sak Leung
- Subjects
Structural bioinformatics ,Drug discovery ,Docking (molecular) ,Computer science ,Structure based ,Regression analysis ,Algorithm ,Random forest ,Protein ligand
- Abstract
Docking is a key computational method for the structure-based design of starting points in the drug discovery process. Recently, the use of non-parametric machine learning to circumvent modelling assumptions has been shown to result in a large improvement in the accuracy of docking. As a result, these machine-learning scoring functions can substantially outperform classical scoring functions. The latter are characterized by their reliance on a predetermined theory-inspired functional form for the relationship between the variables that characterise the complex and its predicted binding affinity.
- Published
- 2015
26. The Use of Random Forest to Predict Binding Affinity in Docking
- Author
Hongjian Li, Pedro J. Ballester, Man Hon Wong, and Kwong-Sak Leung
- Subjects
Structural bioinformatics ,Scoring functions for docking ,Docking (molecular) ,Chemistry ,Computational biology ,Data mining ,Biochemical function ,Macromolecule ,Ligand molecule ,Random forest
- Abstract
Docking is a structure-based computational tool that can be used to predict the strength with which a small ligand molecule binds to a macromolecular target. Such binding affinity prediction is crucial for designing molecules that bind more tightly to a target and are thus more likely to provide the most efficacious modulation of the target’s biochemical function. Despite intense research over the years, improving this type of predictive accuracy has proven to be a very challenging task for any class of method.
- Published
- 2015
27. Our calibrated model has poor predictive value: An example from the petroleum industry
- Author
Peter King, Pedro J. Ballester, Zohreh Tavassoli, and Jonathan Carter
- Subjects
Engineering ,Operations research ,Calibration (statistics) ,Fossil fuel ,Predictive capability ,Predictive value ,Industrial and Manufacturing Engineering ,Petroleum industry ,Petroleum ,Biochemical engineering ,Safety, Risk, Reliability and Quality ,Energy source
- Abstract
It is often assumed that once a model has been calibrated to measurements, it will have some level of predictive capability, although this may be limited. If the model does not have predictive capability, then the assumption is that the model needs to be improved in some way. Using an example from the petroleum industry, we show that cases can exist where calibrated models have limited predictive capability. This occurs even when there is no modelling error present. It is also shown that the introduction of a small modelling error can make it impossible to obtain any models with useful predictive capability. We have been unable to find ways of identifying which calibrated models will have some predictive capability and which will not.
- Published
- 2006
28. Characterising the parameter space of a highly nonlinear inverse problem
- Author
Pedro J. Ballester and Jonathan Carter
- Subjects
Mathematical optimization ,Nonlinear system ,Estimation theory ,Applied Mathematics ,Genetic algorithm ,General Engineering ,Uniqueness ,Inverse problem ,Parameter space ,Finite set ,Computer Science Applications ,Mathematics ,Physical quantity
- Abstract
In inverse problems, often there is no available analytical expression relating the physical quantities of interest and the available data. In these cases, one resorts to using a numerical model with a finite number of parameters, resulting in a discrete problem. Also, many discrete inverse problems involve a highly nonlinear mapping between the model parameters and the simulation of the data by the model. Algorithms exist for estimating the model parameters in nonlinear discrete inverse problems. However, one needs to investigate how these estimated models relate to the true structure of the studied system (i.e., the truth model). This is known as model appraisal and it is greatly affected by three sources of uncertainty: misleading search, non-uniqueness and errors. In this work, we aim at analysing the impact of the first two sources of uncertainty in model appraisal (misleading search and non-uniqueness) by characterising the parameter space of a highly nonlinear geophysical inverse problem. Our appro...
- Published
- 2006
29. Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data
- Author
Pedro J. Ballester, Cuong Cao Dang, Linh C. Nguyen, Centre de Recherche en Cancérologie de Marseille (CRCM), Aix Marseille Université (AMU)-Institut Paoli-Calmettes, Fédération nationale des Centres de lutte contre le Cancer (FNCLCC)-Fédération nationale des Centres de lutte contre le Cancer (FNCLCC)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS), Institut Paoli-Calmettes, Fédération nationale des Centres de lutte contre le Cancer (FNCLCC), 911 Programme PhD scholarship from Vietnam National International Development, and ANR-11-IDEX-0001,Amidex,INITIATIVE D'EXCELLENCE AIX MARSEILLE UNIVERSITE(2011)
- Subjects
0301 basic medicine ,Drug ,pharmacotranscriptomics ,Computer science ,drug response ,[SDV.CAN]Life Sciences [q-bio]/Cancer ,Genomics ,Computational biology ,Biology ,Gene mutation ,Bioinformatics ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,0302 clinical medicine ,Text mining ,[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN] ,benchmarking ,Sensitivity (control systems) ,General Pharmacology, Toxicology and Pharmaceutics ,Selection (genetic algorithm) ,030304 developmental biology ,pharmacogenomics ,0303 health sciences ,General Immunology and Microbiology ,biomarkers ,Articles ,bioinformatics ,General Medicine ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,Regression ,3. Good health ,Random forest ,machine learning ,030104 developmental biology ,030220 oncology & carcinogenesis ,precision oncology ,Pharmacogenomics ,[SDV.SP.PHARMA]Life Sciences [q-bio]/Pharmaceutical sciences/Pharmacology ,Research Article
- Abstract
Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets, such as those from the Genomics of Drug Sensitivity in Cancer (GDSC) consortium, were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. This has revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely used single-gene markers based on genomic data. Methods: Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors, and show that this is a more realistic validation than standard k-fold cross-validation. Results and Discussion: Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally having a poor recall (i.e. correctly detecting only a small part of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by the multi-gene RF classifiers. The drugs with the most predictive of these models include pyrimethamine, sunitinib and 17-AAG. Conclusions: Thanks to this unbiased validation, we now know that this type of model can predict in vitro tumour response to some of these drugs.
These models can thus be further investigated on in vivo tumour models. R code to facilitate the construction of alternative machine learning models and their validation in the presented benchmark is available at http://ballester.marseille.inserm.fr/gdsc.transcriptomicDatav2.tar.gz.
- Published
- 2017
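The precision/recall trade-off described in the GDSC abstract above can be made concrete with a short Python sketch. The labels are toy values invented for illustration (1 = sensitive cell line, 0 = resistant), not GDSC data, and the function names are hypothetical. The abstract does not name its overall classification metric; the Matthews correlation coefficient (MCC) is included only as one common choice, which is an assumption on our part.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels, where 1 means 'sensitive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision(tp, fp):
    # Share of lines predicted sensitive that really are sensitive.
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    # Share of truly sensitive lines that the predictor detects.
    return tp / (tp + fn) if tp + fn else 0.0


def mcc(tp, fp, tn, fn):
    # Matthews correlation coefficient: 1 = perfect, 0 = no better than chance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (invented values): 6 sensitive and 4 resistant cell lines.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
marker_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # flags few lines, all correctly
model_pred = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]   # flags more lines, one wrongly

for name, pred in [("marker", marker_pred), ("model", model_pred)]:
    tp, fp, tn, fn = confusion_counts(y_true, pred)
    print(name, precision(tp, fp), recall(tp, fn), mcc(tp, fp, tn, fn))
```

In this toy case the marker-like predictor is perfectly precise but recovers only a third of the sensitive lines, while the model-like predictor gives up a little precision for a much higher recall and a higher MCC (about 0.58 versus 0.41).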
30. Comments on 'Leave-Cluster-Out Cross-Validation Is Appropriate for Scoring Functions Derived from Diverse Protein Data Sets': Significance for the Validation of Scoring Functions
- Author
-
John B. O. Mitchell and Pedro J. Ballester
- Subjects
Computer science ,General Chemical Engineering ,Drug Evaluation, Preclinical ,MEDLINE ,General Chemistry ,Data mining ,Library and Information Sciences ,Databases, Protein ,Disease cluster ,Cross-validation ,Computer Science Applications
- Published
- 2011
31. Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets
- Author
-
Hongjian Li, Pedro J. Ballester, Man Hon Wong, and Kwong-Sak Leung
- Subjects
Generality ,Training set ,Computer science ,Organic Chemistry ,Regression analysis ,Models, Theoretical ,Machine learning ,Regression ,Computer Science Applications ,Autodock vina ,Random forest ,Machine Learning ,Molecular Docking Simulation ,Software ,Structural Biology ,Docking (molecular) ,Drug Discovery ,Molecular Medicine ,Data mining ,Artificial intelligence - Abstract
There is a growing body of evidence showing that machine learning regression results in more accurate structure-based prediction of protein-ligand binding affinity. Docking methods that aim at optimizing the affinity of ligands for a target rely on the accuracy of their predicted ranking. However, despite their proven advantages, machine-learning scoring functions are still not widely applied. This seems to be due to an insufficient understanding of their properties and the lack of user-friendly software implementing them. Here we present a study in which the accuracy of AutoDock Vina, arguably the most commonly used docking software, is strongly improved by following a machine learning approach. We also analyse the factors responsible for this improvement and their generality. Most importantly, with the help of a proposed benchmark, we demonstrate that this improvement grows as more data become available for training Random Forest models, whereas regression models assuming additive functional forms do not improve with more training data. We discuss how the latter opens the door to new opportunities in scoring function development. To facilitate the translation of this advance to structure-based molecular design, we provide software to directly re-score Vina-generated poses and thus strongly improve the accuracy of their predicted binding affinity. The software is available at http://istar.cse.cuhk.edu.hk/rf-score-3.tgz and http://crcm.marseille.inserm.fr/fileadmin/rf-score-3.tgz.
- Published
- 2014
32. Ligand-Based Virtual Screening for the Discovery of Inhibitors for Protein Arginine Deiminase Type 4 (PAD4)
- Author
-
Chian Ying Teo, Pedro J Ballester, Mohd Basyaruddin Abdul Rahman, Abu Bakar Salleh, Bimo Ario Tejo, and Adam Leow Thean Chor
- Subjects
Virtual screening ,Drug repositioning ,Streptonigrin ,Biochemistry ,High-throughput screening ,Protein-Arginine Deiminase Type-4 ,Citrullination ,Biology ,Ligand (biochemistry) ,IC50 - Abstract
Protein Arginine Deiminase type 4 (PAD4) is a new therapeutic target for the treatment of rheumatoid arthritis. In this study, ligand-based virtual screening integrated with a drug repurposing strategy was applied to the discovery of PAD4 inhibitors. Ultrafast Shape Recognition (USR) was used to search for compounds with a shape similar to that of streptonigrin, a previously reported inhibitor with harmful side effects. Thirty-five lead-like compounds and two existing drugs were obtained from virtual screening, and their inhibitory activity was tested at a fixed concentration of 100 μM. Five lead-like compounds showed significant inhibition of the enzymatic activity of PAD4. The potency of the best compound was then quantified in an IC50 study. Importantly, the structure of the best of these new active molecules was strikingly different from that of streptonigrin. Furthermore, this new PAD4 inhibitor is the most potent found to date by a computational approach, and its structure can be optimized in the future to design an even better PAD4 inhibitor.
- Published
- 2013
33. Hierarchical virtual screening for the discovery of new molecular scaffolds in antibacterial hit identification
- Author
-
Nigel I. Howard, Pedro J. Ballester, John B. O. Mitchell, Martina Mangold, Richard L. Marchese Robinson, Chris Abell, Jochen Blumberger, University of St Andrews. School of Chemistry, University of St Andrews. Biomedical Sciences Research Complex, and University of St Andrews. EaSTCHEM
- Subjects
Computer science ,Biochemistry ,Antibacterial hit identification ,Drug Discovery ,High-throughput screening ,Hit to lead ,Bioinformatics ,Anti-Bacterial Agents ,Machine learning ,Cheminformatics ,Biotechnology ,Virtual screening ,In silico ,Biomedical Engineering ,Biophysics ,Bioengineering ,Streptomyces coelicolor ,Computational biology ,Biomaterials ,Small Molecule Libraries ,SDG 3 - Good Health and Well-being ,Bacterial Proteins ,High-Throughput Screening Assays ,Computer Simulation ,Hydro-Lyases ,Mycobacterium tuberculosis ,QD Chemistry ,Combinatorial chemistry ,Databases, Chemical - Abstract
One of the initial steps of modern drug discovery is the identification of small organic molecules able to inhibit a target macromolecule of therapeutic interest. A small proportion of these hits are further developed into lead compounds, which in turn may ultimately lead to a marketed drug. A commonly used protocol for this task is high-throughput screening (HTS). However, the performance of HTS against antibacterial targets has generally been unsatisfactory, with high costs and low rates of hit identification. Here, we present a novel computational methodology that is able to identify a high proportion of structurally diverse inhibitors by searching unusually large molecular databases in a time-, cost- and resource-efficient manner. This virtual screening methodology was tested prospectively on two versions of an antibacterial target (type II dehydroquinase from Mycobacterium tuberculosis and Streptomyces coelicolor), for which HTS has not provided satisfactory results and consequently practically all known inhibitors are derivatives of the same core scaffold. Overall, our protocols identified 100 new inhibitors, with calculated Ki values ranging from 4 to 250 μM (confirmed hit rates of 60% and 62% against the two versions of the target). Most importantly, over 50 new active molecular scaffolds were discovered that underscore the benefits that a wide application of prospectively validated in silico screening tools is likely to bring to antibacterial hit identification.
- Published
- 2012
34. Molecular Shape
- Author
-
Pedro J. Ballester and Nathan Brown
- Published
- 2012
35. Machine Learning Scoring Functions Based on Random Forest and Support Vector Regression
- Author
-
Pedro J. Ballester
- Subjects
Structure (mathematical logic) ,Computer science ,Regression analysis ,Machine learning ,Random forest ,Task (project management) ,Support vector machine ,Range (mathematics) ,Structural bioinformatics ,Structural biology ,Artificial intelligence ,Data mining - Abstract
Accurately predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. The scoring functions that attempt such computational prediction exploiting structural data are essential for analysing the outputs of Molecular Docking, which is in turn an important technique for drug discovery, chemical biology and structural biology. Conventional scoring functions assume a predetermined theory-inspired functional form for the relationship between the variables that characterise the complex and its predicted binding affinity. The inherent problem of this approach lies in the difficulty of explicitly modelling the various contributions of intermolecular interactions to binding affinity. Recently, a new family of 3D structure-based regression models for binding affinity prediction has been introduced that circumvents the need for such modelling assumptions. These machine learning scoring functions have been shown to outperform conventional scoring functions by a wide margin. However, to date no direct comparison among machine learning scoring functions has been made. Here the performance of the two most popular machine learning scoring functions for this task, based on Random Forest and Support Vector Regression, is analysed under exactly the same experimental conditions.
- Published
- 2012
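The contrast above between predetermined functional forms and machine learning scoring functions starts with how a complex is described. The abstract does not specify the descriptors; as an assumption, the Python sketch below computes features in the style of the original RF-Score: counts of ligand-protein atom pairs, grouped by element type, lying within a distance cutoff (4 protein and 9 ligand element types, giving 36 counts, with a 12 Å cutoff). The function name and toy coordinates are hypothetical. A regression model such as Random Forest or Support Vector Regression would then be trained on such vectors against measured affinities; that step is not shown.

```python
import math
from itertools import product

# Element types as we recall them from the original RF-Score descriptor (an assumption here).
PROTEIN_TYPES = ("C", "N", "O", "S")
LIGAND_TYPES = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")


def pair_count_features(protein_atoms, ligand_atoms, cutoff=12.0):
    """Count ligand-protein atom pairs per (ligand element, protein element) within `cutoff`.

    Each atom is an (element, (x, y, z)) tuple with coordinates in angstroms;
    elements outside the type lists (hydrogens, metals, ...) are ignored.
    """
    features = {pair: 0 for pair in product(LIGAND_TYPES, PROTEIN_TYPES)}
    for l_elem, l_xyz in ligand_atoms:
        if l_elem not in LIGAND_TYPES:
            continue
        for p_elem, p_xyz in protein_atoms:
            if p_elem in PROTEIN_TYPES and math.dist(l_xyz, p_xyz) <= cutoff:
                features[(l_elem, p_elem)] += 1
    return features


# Hypothetical toy complex: three protein heavy atoms plus one hydrogen, and a two-atom ligand.
protein = [("C", (0.0, 0.0, 0.0)), ("O", (5.0, 0.0, 0.0)),
           ("N", (20.0, 0.0, 0.0)), ("H", (1.0, 0.0, 0.0))]
ligand = [("C", (1.0, 1.0, 0.0)), ("Cl", (3.0, 0.0, 0.0))]

features = pair_count_features(protein, ligand)
vector = [features[pair] for pair in product(LIGAND_TYPES, PROTEIN_TYPES)]  # fixed order, 36 values
print(sum(vector), len(vector))
```

Because no functional form is imposed on these counts, the learning algorithm is free to discover how they relate to affinity, which is the point the abstract makes about circumventing modelling assumptions.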
36. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties
- Author
-
Pedro J. Ballester, Francesco Iorio, Ultan McDermott, Julio Saez-Rodriguez, Mathew J. Garnett, Cyril H. Benes, and Michael P. Menden
- Subjects
FOS: Computer and information sciences ,Biochemistry ,Workflow ,Machine Learning (cs.LG) ,Computational Engineering, Finance, and Science (cs.CE) ,Neoplasms ,Drug Discovery ,Basic Cancer Research ,Cell Behavior (q-bio.CB) ,Multidisciplinary ,Systems Biology ,Genomics ,Drug repositioning ,Oncology ,Medicine ,Biotechnology ,Computer Modeling ,Drug ,In silico ,Antineoplastic Agents ,Biology ,Machine learning ,Inhibitory Concentration 50 ,Artificial Intelligence ,Genetics ,Humans ,Computer Simulation ,Clinical Genetics ,Genomics (q-bio.GN) ,Analysis of Variance ,Personalized Medicine ,Computational Biology ,Cancer ,Human Genetics ,Small Molecules ,Drug Resistance, Neoplasm ,Pharmacogenetics ,FOS: Biological sciences ,Computer Science - Abstract
Predicting the response of a specific cancer to a therapy is a major goal in modern oncology that should ultimately lead to personalised treatment. High-throughput screenings of potentially active compounds against a panel of genomically heterogeneous cancer cell lines have unveiled multiple relationships between genomic alterations and drug responses. Various computational approaches have been proposed to predict sensitivity based on genomic features, while others have used the chemical properties of the drugs to ascertain their effect. In an effort to integrate these complementary approaches, we developed machine learning models to predict the response of cancer cell lines to drug treatment, quantified through IC50 values, based on both the genomic features of the cell lines and the chemical properties of the considered drugs. Models predicted IC50 values in an 8-fold cross-validation and in an independent blind test with coefficients of determination (R²) of 0.72 and 0.64, respectively. Furthermore, models were able to predict the IC50 values of cell lines from a tissue not used in the training stage with comparable accuracy (R² of 0.61). Our in silico models can be used to optimise the experimental design of drug-cell screenings by estimating a large proportion of missing IC50 values rather than measuring them experimentally. The implications of our results go beyond virtual drug screening design: potentially thousands of drugs could be probed in silico to systematically test their potential efficacy as anti-tumour agents based on their structure, thus providing a computational framework to identify new drug repositioning opportunities and, ultimately, to support personalized medicine by linking the genomic traits of patients to drug sensitivity. Comment: 26 pages, 7 figures, including supplemental information, presented by Michael Menden at the 5th annual RECOMB Conference on Regulatory and Systems Genomics with DREAM Challenges; accepted in PLOS ONE
- Published
- 2012
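The coefficient of determination quoted in the abstract above compares prediction errors with the spread of the measured values. The abstract does not state which definition was used; the Python sketch below assumes the common form R² = 1 - SS_res / SS_tot (some studies instead report the squared Pearson correlation, which can give a different number). The function name and the log(IC50) values are invented for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: how far predictions fall from the measurements.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented log(IC50) values for four drug-cell line pairs.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(measured, predicted))  # approximately 0.98
```

Under this definition, always predicting the mean measured value gives R² = 0, and predictions worse than that give negative values.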
37. Ultrafast shape recognition: method and applications
- Author
-
Pedro J. Ballester
- Subjects
Pharmacology ,Models, Molecular ,Virtual screening ,Similarity (geometry) ,Time Factors ,Process (engineering) ,Computation ,Bioactive molecules ,Biology ,Complementarity (molecular biology) ,Drug Design ,Drug Discovery ,Key (cryptography) ,Molecular Medicine ,Animals ,Humans ,Computer vision ,Computer Simulation ,Artificial intelligence ,Data mining ,Ultrashort pulse - Abstract
Molecular shape complementarity is widely recognized as a key indicator of biological activity. Unfortunately, efficient computation of shape similarity is challenging, which severely limits the potential of shape-based virtual screening. Ultrafast shape recognition (USR) is a recent shape similarity technique that is characterized by its extremely high speed of operation. Here we review important methodological aspects for the optimal application of USR as well as its first applications to medicinal chemistry problems. These applications already include several particularly successful prospective virtual screens, which shows the important role that USR can play in identifying bioactive molecules to be used as chemical probes and potentially as starting points for the drug-discovery process.
- Published
- 2011
38. Ultrafast shape recognition: evaluating a new ligand-based virtual screening technology
- Author
-
Pedro J. Ballester, W. Graham Richards, and Paul W. Finn
- Subjects
Models, Molecular ,Similarity (geometry) ,Time Factors ,Nearest neighbor search ,Molecular Conformation ,Biology ,Ligands ,Pattern Recognition, Automated ,Structure-Activity Relationship ,User-Computer Interface ,Imaging, Three-Dimensional ,Encoding (memory) ,Drug Discovery ,Materials Chemistry ,Cluster Analysis ,Computer Simulation ,Physical and Theoretical Chemistry ,Spectroscopy ,Virtual screening ,Reproducibility of Results ,Computer Graphics and Computer-Aided Design ,Chemical space ,Databases as Topic ,Cheminformatics ,Pattern recognition (psychology) ,Data mining - Abstract
Large-scale database searching to identify molecules that share a common biological activity for a target of interest is widely used in drug discovery. Such an endeavour requires a method encoding molecular properties that are indicative of biological activity, and at least one active molecule to be used as a template. Molecular shape has been shown to be an important indicator of biological activity; however, currently used shape comparison methods are relatively slow, so faster and more reliable methods are highly desirable. Recently, a new non-superposition-based method for molecular shape comparison, called Ultrafast Shape Recognition (USR), has been devised with computational performance at least three orders of magnitude faster than previously existing methods. In this study, we investigate the performance of USR in retrieving biologically active compounds through retrospective virtual screening experiments. Results show that USR performs better on average than a commercially available shape similarity method, while screening conformers more than 2500 times faster. This outstanding computational performance is particularly useful for searching much larger portions of chemical space than previously possible, which makes USR a very valuable new tool in the search for new lead molecules for drug discovery programs.
- Published
- 2008
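Retrospective virtual screening performance is often summarised by how strongly the known actives are concentrated at the top of the similarity ranking. The abstract above does not say which metric was used; the enrichment factor (EF) in this Python sketch is one widely used option, included as an assumption, and the function name, scores and activity labels are invented. EF at a fraction x compares the hit rate among the top-ranked x of the database with the hit rate of the whole database, so EF = 1 means no better than a random ordering.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor among the top `fraction` of molecules ranked by score.

    Higher scores are assumed to mean 'more similar to the template'.
    """
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty database containing at least one active")
    n_top = max(1, round(fraction * n_total))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    actives_top = sum(1 for _, a in ranked[:n_top] if a)
    # Hit rate in the top of the list divided by the hit rate of the whole database.
    return (actives_top / n_top) / (n_actives / n_total)


# Invented similarity scores for a 10-molecule database containing 2 actives.
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50]
actives = [True, False, False, False, True, False, False, False, False, False]
print(enrichment_factor(scores, actives, fraction=0.2))  # 1 active in the top 2
```

Here one of the two actives lands in the top 20%, giving EF = (1/2) / (2/10) = 2.5; screening the whole list (fraction 1.0) always gives EF = 1.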
39. Model calibration of a real petroleum reservoir using a parallel real-coded genetic algorithm
- Author
-
Pedro J. Ballester and Jonathan Carter
- Subjects
Structure (mathematical logic) ,Mathematical optimization ,Calibration (statistics) ,Computer science ,Computation ,Reliability (computer networking) ,Genetic algorithm ,Scalability ,Petroleum ,Cluster analysis ,Petroleum reservoir - Abstract
An application of a Real-coded Genetic Algorithm (GA) to the model calibration of a real petroleum reservoir is presented. In order to shorten the computation time, the possible solutions generated by the GA are evaluated in parallel on a group of computers. This required the GA to be adapted to a multi-processor structure, so that the scalability of the computation is maximised. The best solutions of each run enter the ensemble of calibrated models, which is finally analysed using a clustering algorithm. The aim is to identify the optimal regions contained in the ensemble and thus to reveal the distinct types of reservoir models consistent with the historic production data, as a way to assess the uncertainty in the Reservoir Characterisation due to the limited reliability of optimisation algorithms. The developed methodology is applied to the characterisation of a real petroleum reservoir. Results show a large improvement with respect to previous studies on that reservoir in terms of the quality and diversity of the obtained calibrated models. Our main conclusion is that, even with regularisation, many distinct calibrated models are possible, which highlights the importance of applying optimisation methods capable of identifying all such solutions.
- Published
- 2007
40. Ultrafast shape recognition to search compound databases for similar molecular shapes
- Author
-
W. Graham Richards and Pedro J. Ballester
- Subjects
Validation study ,Similarity (geometry) ,Time Factors ,Database ,Molecular Conformation ,Computational Biology ,General Chemistry ,Organic molecules ,Set (abstract data type) ,Computational Mathematics ,Molecular geometry ,Databases as Topic ,Drug Design ,Molecule ,Ultrashort pulse ,Algorithms ,Mathematics ,Macromolecule - Abstract
Finding the molecules that closely resemble a given lead molecule in a database containing potentially billions of chemical structures is an important but daunting problem. Similar molecular shapes are particularly important, given that in biology small organic molecules frequently act by binding into a defined and complex site on a macromolecule. Here, we present a new method for molecular shape comparison, named ultrafast shape recognition (USR), capable of screening billions of compounds for similar shapes using a single computer and without the need to align the molecules before testing for similarity. Despite its extremely fast comparison rate, USR is shown to be highly accurate at describing, and hence comparing, molecular shapes.
- Published
- 2007
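The alignment-free comparison described in the USR abstract above can be sketched in Python. As we understand USR, each conformer's atomic coordinates are reduced to 12 numbers: the first three moments of the distributions of atomic distances to four reference points (the molecular centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one), and two molecules are scored by the inverse of one plus their mean absolute descriptor difference. Treat this as a sketch under assumptions: the second and third moments are taken here as the standard deviation and the signed cube root of the third central moment, so that all descriptors share distance units, which may differ from the published normalisation; the function names and coordinates are hypothetical.

```python
import math


def _moments(distances):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(distances)
    mean = sum(distances) / n
    var = sum((d - mean) ** 2 for d in distances) / n
    third = sum((d - mean) ** 3 for d in distances) / n
    return mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)


def usr_descriptor(coords):
    """12 shape descriptors from a list of (x, y, z) atom coordinates."""
    n = len(coords)
    ctd = tuple(sum(c[k] for c in coords) / n for k in range(3))  # centroid
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor


def usr_similarity(desc_a, desc_b):
    """Score in (0, 1]; 1 means identical descriptors."""
    mean_abs_diff = sum(abs(a - b) for a, b in zip(desc_a, desc_b)) / len(desc_a)
    return 1.0 / (1.0 + mean_abs_diff)


# Hypothetical 5-atom conformer, a rigidly shifted copy, and an enlarged copy.
mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0), (3.7, 1.1, 0.4), (1.1, -1.2, 0.8)]
shifted = [(x + 10.0, y - 5.0, z + 3.0) for x, y, z in mol]
enlarged = [(1.5 * x, 1.5 * y, 1.5 * z) for x, y, z in mol]

query = usr_descriptor(mol)
print(usr_similarity(query, usr_descriptor(shifted)))   # ~1.0: same shape, moved
print(usr_similarity(query, usr_descriptor(enlarged)))  # below 1.0: different size
```

Because only interatomic-style distances to internally defined reference points are used, the descriptor is unchanged by translation or rotation, which is why no alignment step is needed, and comparing two molecules costs only 12 subtractions.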
41. Real-Parameter Optimization Performance Study on the CEC-2005 benchmark with SPC-PNX
- Author
-
Kerry Gallagher, Pedro J. Ballester, John Stephenson, and Jonathan Carter
- Subjects
Mathematical optimization ,Meta-optimization ,Estimation theory ,Computer science ,Quality control and genetic algorithms ,Genetic algorithm ,Benchmark (computing) ,Artificial intelligence ,Machine learning - Abstract
This paper presents a performance study of a real-parameter genetic algorithm (SPC-PNX) on a new benchmark of real-parameter optimisation problems. This benchmark provides a systematic way to compare different optimisation methods on exactly the same test problems. These problems were designed to be hard, as they incorporate features that have been shown to pose great difficulty to many optimisation methods.
- Published
- 2005
42. Tackling an Inverse Problem from the Petroleum Industry with a Genetic Algorithm for Sampling
- Author
-
Pedro J. Ballester and Jonathan Carter
- Subjects
Mathematical optimization ,Petroleum industry ,Calibration (statistics) ,Truth model ,Genetic algorithm ,Sampling (statistics) ,Model parameters ,Parameter space ,Inverse problem ,Mathematics - Abstract
When direct measurement of model parameters is not possible, these need to be inferred indirectly from calibration data. To solve this inverse problem, an algorithm that preferentially samples all regions of the parameter space that fit data well is needed.
- Published
- 2004
43. An Effective Real-Parameter Genetic Algorithm with Parent Centric Normal Crossover for Multimodal Optimisation
- Author
-
Pedro J. Ballester and Jonathan Carter
- Subjects
Cultural algorithm ,Computer science ,Population-based incremental learning ,Crossover ,Evolutionary algorithm ,Machine learning ,Rosenbrock function ,Unimodality ,Genetic algorithm ,Artificial intelligence ,Rastrigin function - Abstract
Evolutionary Algorithms (EAs) are a useful tool to tackle real-world optimisation problems. Two important features that make these problems hard are multimodality and high dimensionality of the search landscape.
- Published
- 2004
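The two difficulties named in the abstract above can be illustrated with two classic test functions that appear among this entry's subjects, Rastrigin and Rosenbrock. The Python definitions below use their standard textbook forms and are not taken from the paper. Rastrigin is highly multimodal, with a regular grid of local minima around its global minimum of 0 at the origin, and the number of local minima grows exponentially with dimension. Rosenbrock has its minimum of 0 at (1, ..., 1), at the bottom of a narrow curved valley that many methods follow only slowly.

```python
import math


def rastrigin(x):
    """Multimodal test function; global minimum 0 at x = (0, ..., 0)."""
    return 10.0 * len(x) + sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x)


def rosenbrock(x):
    """Curved-valley test function; global minimum 0 at x = (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))


# (1, 1) lies in a Rastrigin basin whose floor is about 2, not 0:
# small steps away from it in either direction make the value worse.
here = rastrigin([1.0, 1.0])
print(here, rastrigin([1.1, 1.0]), rastrigin([0.9, 1.0]), rastrigin([0.0, 0.0]))
print(rosenbrock([1.0] * 10), rosenbrock([0.0] * 10))
```

A purely local search started near (1, 1) would stall in that basin, which is why population-based methods that keep several distinct candidates are attractive for such landscapes.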
44. A Real Parameter Genetic Algorithm for Cluster Identification in History Matching
- Author
-
Jonathan Carter and Pedro J. Ballester
- Subjects
Identification (information) ,Mathematical optimization ,Local optimum ,Genetic algorithm ,Markov chain Monte Carlo ,Parameter space ,Inverse problem ,Cluster analysis ,Free parameter ,Mathematics - Abstract
Non-linear inverse problems, by their very nature, can be expected to yield multiple solutions. This will occur even when the problem is well defined, in the sense that the number of measurements is significantly greater than the number of free parameters. These solutions will manifest themselves as local optima of some objective function, and will be separated by regions of poor objective function value. In history matching, the challenge is to identify all of the high-quality local optima and to sample the parameter space around them. Within a Bayesian framework, this allows us to estimate the likelihood and quantify the uncertainty associated with a solution. Algorithms such as Markov chain Monte Carlo (MCMC) allow us to do this. However, in practice they are not very efficient and are unsuitable for practical problems. In this paper we present a real-parameter Genetic Algorithm that has been designed to search for multiple local optima and to sample the parameter space around them. The methodology has been implemented within a non-generational, steady-state scheme. Possible solutions generated by the Genetic Algorithm are evaluated in parallel on a cluster of computers. All of the solutions generated are finally clustered using a new clustering algorithm which, unlike most other clustering algorithms, does not require the user to specify the number of clusters to be identified. The application of the algorithms is illustrated on two inverse problems. The first is a simple three-parameter cross-sectional model, which was already known to have multiple solutions. The second is a real-world case study with 82 free parameters. In each case it is shown that the Genetic Algorithm can find multiple optima and that the results can be clustered with the clustering algorithm.
- Published
- 2004
45. An Algorithm to Identify Clusters of Solutions in Multimodal Optimisation
- Author
-
Jonathan Carter and Pedro J. Ballester
- Subjects
Clustering high-dimensional data ,Fuzzy clustering ,Computer science ,Correlation clustering ,Single-linkage clustering ,Population ,Machine learning ,CURE data clustering algorithm ,Consensus clustering ,Cluster analysis ,k-medians clustering ,Brown clustering ,Constrained clustering ,Hierarchical clustering ,Data stream clustering ,Canopy clustering algorithm ,Affinity propagation ,FLAME clustering ,Anomaly detection ,Data mining ,Artificial intelligence - Abstract
Clustering can be used to identify groups of similar solutions in Multimodal Optimisation. However, poor clustering quality reduces the benefit of this application. The vast majority of clustering methods in the literature rely on a priori assumptions about the data, such as the number of clusters or the cluster radius. Clusters are forced to conform to these assumptions, which may not be valid for the considered population, and this can severely degrade the clustering quality. In this paper, we apply a clustering method that does not require a priori knowledge. We demonstrate the effectiveness and efficiency of the method on real and synthetic data sets emulating solutions in Multimodal Optimisation problems.
- Published
- 2004
46. Real-Parameter Genetic Algorithms for Finding Multiple Optimal Solutions in Multi-modal Optimization
- Author
-
Pedro J. Ballester and Jonathan Carter
- Subjects
Mathematical optimization ,Truncation selection ,Fitness proportionate selection ,Quality control and genetic algorithms ,Crossover ,Genetic algorithm ,Genetic operator ,Tournament selection ,Selection (genetic algorithm) ,Mathematics - Abstract
The aim of this paper is to identify Genetic Algorithms (GAs) which perform well over a range of continuous and smooth multimodal real-variable functions. In our study, we focus on testing GAs combining three classes of genetic operators: selection, crossover and replacement. The approach followed is time-constrained and thus our stopping criterion is a fixed number of generations. Results show that GAs with random selection of parents and crowding replacement are robust optimizers. By contrast, GAs with tournament selection of parents and random replacement perform poorly in comparison.
- Published
- 2003
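The operator classes compared in the study above (parent selection, crossover and replacement) can be wired into a small steady-state GA. This Python sketch pairs random parent selection with one common form of crowding replacement, in which a child competes with the most similar member of a random subset of the population and replaces it only if fitter; the paper's exact variants may differ. For crossover it borrows a parent-centric normal operator in the spirit of the PNX named in entry 43, sampling each gene from a normal distribution centred on one parent with standard deviation |x2 - x1| / eta; that form, the parameter values, the clipping to the search box and all function names are our assumptions. Following the abstract's time-constrained setup, the run stops after a fixed budget, here counted as iterations that each produce one child.

```python
import math
import random


def rastrigin(x):
    """Multimodal test function used here as the objective (to be minimised)."""
    return 10.0 * len(x) + sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x)


def pnx_child(p1, p2, eta, rng):
    """Parent-centric normal crossover sketch: child genes are sampled around p1."""
    return [rng.gauss(a, abs(b - a) / eta) for a, b in zip(p1, p2)]


def steady_state_ga(f, dim, bounds, pop_size=50, iterations=2000,
                    crowding_factor=5, eta=2.0, seed=0):
    """Minimise f with random parent selection and crowding replacement."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(iterations):
        # Random selection: any two distinct members may become parents.
        i, j = rng.sample(range(pop_size), 2)
        child = [min(max(g, lo), hi) for g in pnx_child(pop[i], pop[j], eta, rng)]
        child_fit = f(child)
        # Crowding: the child competes with the closest member of a random subset.
        subset = rng.sample(range(pop_size), crowding_factor)
        k = min(subset, key=lambda c: math.dist(pop[c], child))
        if child_fit < fit[k]:
            pop[k], fit[k] = child, child_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


best_x, best_f = steady_state_ga(rastrigin, dim=2, bounds=(-5.12, 5.12), seed=1)
print(best_x, best_f)
```

Replacing only the nearest member of the subset tends to keep distant individuals alive, so the population can hold several optima at once, which matches the abstract's finding that crowding replacement yields robust optimisers on multimodal functions.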