
High-quality genome assembly enables prediction of allele-specific gene expression in hybrid poplar

Authors :
Shi, Tian-Le
Jia, Kai-Hua
Bao, Yu-Tao
Nie, Shuai
Tian, Xue-Chan
Yan, Xue-Mei
Chen, Zhao-Yang
Li, Zhi-Chao
Zhao, Shi-Wei
Ma, Hai-Yao
Zhao, Ye
Li, Xiang
Zhang, Ren-Gang
Guo, Jing
Zhao, Wei
El-Kassaby, Yousry Aly
Müller, Niels
Van de Peer, Yves
Wang, Xiao-Ru
Street, Nathaniel Robert
Porth, Ilga
An, Xinmin
Mao, Jian-Feng
Publication Year :
2024

Abstract

Poplar (Populus) is a well-established model system for tree genomics and molecular breeding, and hybrid poplar is widely used in forest plantations. However, distinguishing its diploid homologous chromosomes is difficult, which complicates advanced functional studies of specific alleles. In this study, we applied a trio-binning design and PacBio high-fidelity long-read sequencing to obtain haplotype-phased telomere-to-telomere genome assemblies for the 2 parents of the well-studied F1 hybrid “84K” (Populus alba × Populus tremula var. glandulosa). All chromosomes, including telomeres and centromeres, were completely assembled for each haplotype subgenome, except for 2 small gaps on one chromosome. By combining these haplotype assemblies with extensive RNA-seq data, we analyzed gene expression patterns between the 2 subgenomes and between alleles. No transcriptional bias was detected at the subgenome level, but extensive expression differences were detected between alleles. We developed machine-learning (ML) models that predict allele-specific expression (ASE) with high accuracy and identified the genomic features that most strongly influence ASE. One of our models, which uses 15 predictor variables, achieved 77% accuracy on the training set and 74% accuracy on the testing set. The ML models identified gene-body CHG methylation, sequence divergence, and transposon occupancy both upstream and downstream of alleles as important factors for ASE. Our haplotype-phased genome assemblies and ML strategy open an avenue for functional studies in Populus and provide additional tools for studying ASE and heterosis in hybrids.

Details

Database :
OAIster
Notes :
application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1457588124
Document Type :
Electronic Resource
Full Text :
https://doi.org/10.1093/plphys/kiae078