1. Parallelization of large-scale drug–protein binding experiments
- Author
- Antonios Makris, Dimitrios Michail, Mark Sawyer, and Iraklis Varlamis
- Subjects
Computer Networks and Communications, business.industry, Computer science, Drug discovery, Pipeline (computing), Process (computing), 020206 networking & telecommunications, 02 engineering and technology, Parallel computing, Task (project management), Software, Memory management, Protein similarity, Hardware and Architecture, 0202 electrical engineering, electronic engineering, information engineering, Leverage (statistics), 020201 artificial intelligence & image processing, business, Pharmaceutical industry
- Abstract
The pharmaceutical industry invests billions of dollars every year in new drug research. Part of this research focuses on repositioning established drugs to new disease indications and is based on “drug promiscuity”, that is, the ability of certain drugs to bind multiple proteins. The increased cost of wet-lab experiments makes in-silico alternatives a promising solution. To find similar protein targets for an existing drug, it is necessary to analyse the protein and drug structures and identify potential similarities, a task that is highly demanding in computational resources. However, algorithmic advances combined with increased computational resources can make this task more manageable and increase the success rate of drug discovery at significantly lower cost. The current work proposes several algorithms that implement the protein similarity task in a parallel high-performance computing environment, resolve several load imbalance and memory management issues, and take maximum advantage of the available resources. The proposed optimizations achieve better memory and CPU balancing and faster execution times. Several parts of the previously linear processing pipeline, which used different software packages, have been re-engineered to improve process parallelization. Experimental results on a high-performance computing environment with up to 1024 cores and 2048 GB of memory demonstrate the effectiveness of our approach, which scales well to large numbers of protein pairs.
- Published
- 2019