Strategies for effectively modelling promoter-driven gene expression using transfer learning.

Authors :
Reddy AJ
Herschl MH
Geng X
Kolli S
Lu AX
Kumar A
Hsu PD
Levine S
Ioannidis NM
Source :
BioRxiv : the preprint server for biology [bioRxiv] 2024 May 19. Date of Electronic Publication: 2024 May 19.
Publication Year :
2024

Abstract

The ability to deliver genetic cargo to human cells is enabling rapid progress in molecular medicine, but designing this cargo for precise expression in specific cell types is a major challenge. Expression is driven by regulatory DNA sequences within short synthetic promoters, but relatively few of these promoters are cell-type-specific. The ability to design cell-type-specific promoters using model-based optimization would be impactful for research and therapeutic applications. However, models of expression from short synthetic promoters (promoter-driven expression) are lacking for most cell types due to insufficient training data in those cell types. Although there are many large datasets of both endogenous expression and promoter-driven expression in other cell types, which provide information that could be used for transfer learning, transfer strategies remain largely unexplored for predicting promoter-driven expression. Here, we propose a variety of pretraining tasks, transfer strategies, and model architectures for modelling promoter-driven expression. To thoroughly evaluate various methods, we propose two benchmarks that reflect data-constrained and large dataset settings. In the data-constrained setting, we find that pretraining followed by transfer learning is highly effective, improving performance by 24-27%. In the large dataset setting, transfer learning leads to more modest gains, improving performance by up to 2%. We also propose the best architecture to model promoter-driven expression when training from scratch. The methods we identify are broadly applicable for modelling promoter-driven expression in understudied cell types, and our findings will guide the choice of models that are best suited to designing promoters for gene delivery applications using model-based optimization. Our code and data are available at https://github.com/anikethjr/promoter_models.

Details

Language :
English
ISSN :
2692-8205
Database :
MEDLINE
Journal :
BioRxiv : the preprint server for biology
Accession number :
36909524
Full Text :
https://doi.org/10.1101/2023.02.24.529941