1
Zhou L, Yu H, Wang K, Chen T, Ma Y, Huang Y, Li J, Liu L, Li Y, Kong Z, Zheng Q, Wang Y, Gu Y, Xia N, Li S. Genome re-sequencing and reannotation of the Escherichia coli ER2566 strain and transcriptome sequencing under overexpression conditions. BMC Genomics 2020; 21:407. [PMID: 32546194 PMCID: PMC7296898 DOI: 10.1186/s12864-020-06818-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/09/2020] [Accepted: 06/10/2020] [Indexed: 02/06/2023]
Abstract
BACKGROUND: The Escherichia coli ER2566 strain (NC_CP014268.2) was developed as a BL21 (DE3) derivative and has been widely used for recombinant protein expression. However, like many other current RefSeq annotations, the annotation of the ER2566 strain was incomplete, with missing gene names and miscellaneous RNAs, as well as uncorrected annotations of some pseudogenes. Here, we performed a systematic reannotation of the ER2566 genome by combining multiple annotation tools with manual revision to provide a comprehensive understanding of the E. coli ER2566 strain, and used high-throughput sequencing to explore how the strain adapted under external pressure.
RESULTS: The reannotation included noteworthy corrections to all protein-coding genes, led to the exclusion of 190 hypothetical genes or pseudogenes, and resulted in the addition of 237 coding sequences, 230 miscellaneous noncoding RNAs, and 2 tRNAs. In addition, we manually examined all 194 pseudogenes in the RefSeq annotation and directly identified 123 (63%) as coding genes. We then used whole-genome sequencing and high-throughput RNA sequencing to assess mutational adaptations under consecutive subculture or overexpression burden. Whereas no mutations were detected in response to consecutive subculture, overexpression of the human papillomavirus type 16 capsid led to the identification of a mutation (position 1,094,824 within the 3' non-coding region) located 19 bp from the lacI gene in the transcribed RNA, which was not detected at the genomic level by Sanger sequencing.
CONCLUSION: The ER2566 strain is used by both the general scientific community and the biotechnology industry. Reannotation of the E. coli ER2566 strain not only improved the RefSeq data but also uncovered a key site that might be involved in the transcription and translation of genes encoding the lactose operon repressor. We propose that our pipeline might offer a universal method for the fast and accurate reannotation of other bacterial genomes. This study might facilitate a better understanding of gene function in the ER2566 strain under external burden and provides more clues for engineering bacteria for biotechnological applications.
Affiliation(s)
- Lizhi Zhou
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen, 361102, Fujian, China
- Hai Yu
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen, 361102, Fujian, China
- Kaihang Wang
  - National Institute of Diagnostics and Vaccine Development in Infectious Disease, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Tingting Chen
  - National Institute of Diagnostics and Vaccine Development in Infectious Disease, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Yue Ma
  - National Institute of Diagnostics and Vaccine Development in Infectious Disease, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Yang Huang
  - National Institute of Diagnostics and Vaccine Development in Infectious Disease, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Jiajia Li
  - National Institute of Diagnostics and Vaccine Development in Infectious Disease, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Liqin Liu
  - National Institute of Diagnostics and Vaccine Development in Infectious Disease, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Yuqian Li
  - National Institute of Diagnostics and Vaccine Development in Infectious Disease, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Zhibo Kong
  - National Institute of Diagnostics and Vaccine Development in Infectious Disease, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Qingbing Zheng
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen, 361102, Fujian, China
- Yingbin Wang
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen, 361102, Fujian, China
- Ying Gu
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen, 361102, Fujian, China
  - National Institute of Diagnostics and Vaccine Development in Infectious Disease, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Ningshao Xia
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen, 361102, Fujian, China
  - National Institute of Diagnostics and Vaccine Development in Infectious Disease, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Shaowei Li
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen, 361102, Fujian, China
  - National Institute of Diagnostics and Vaccine Development in Infectious Disease, School of Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
2
Yu X, Wang Y, Kohnen MV, Piao M, Tu M, Gao Y, Lin C, Zuo Z, Gu L. Large Scale Profiling of Protein Isoforms Using Label-Free Quantitative Proteomics Revealed the Regulation of Nonsense-Mediated Decay in Moso Bamboo (Phyllostachys edulis). Cells 2019; 8:E744. [PMID: 31330982 PMCID: PMC6678154 DOI: 10.3390/cells8070744] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/13/2019] [Revised: 07/12/2019] [Accepted: 07/16/2019] [Indexed: 12/13/2022]
Abstract
Moso bamboo is an important forest species with a variety of ecological, economic, and cultural values. However, the gene annotation of moso bamboo is based only on transcriptome sequencing and lacks proteome-level evidence. The lignification and fiber content of moso bamboo make it difficult to extract protein using conventional methods, which seriously hinders proteomics research on this species. The purpose of this study was to establish efficient methods for extracting total proteins from moso bamboo for subsequent mass spectrometry-based quantitative proteome identification. Here, we successfully established a set of efficient methods for extracting total proteins from moso bamboo, followed by mass spectrometry-based label-free quantitative proteome identification, which further improved the protein annotation of moso bamboo genes. In this study, 10,376 predicted coding genes were confirmed by quantitative proteomics, accounting for 35.8% of all annotated protein-coding genes. Proteome analysis also revealed the protein-coding potential of 1015 predicted long noncoding RNAs (lncRNAs), accounting for 51.03% of annotated lncRNAs. Thus, mass spectrometry-based proteomics provides a reliable method for gene annotation. In particular, quantitative proteomics revealed the translation patterns of proteins in moso bamboo. In addition, 3284 transcript isoforms from 2663 genes identified by Pacific BioSciences (PacBio) single-molecule real-time long-read isoform sequencing (Iso-Seq) were confirmed at the protein level by mass spectrometry. Furthermore, domain analysis of mass spectrometry-identified proteins encoded in the same genomic locus revealed variations in domain composition, pointing towards a functional diversification of protein isoforms. Finally, we found that some transcripts targeted by nonsense-mediated mRNA decay (NMD) could also be translated into proteins.
In summary, the proteomic analysis in this study improves the proteomics-assisted genome annotation of moso bamboo and is valuable for large-scale functional genomics research in this species. It also provides a theoretical basis and technical support for directional gene function analysis at the proteomics level in moso bamboo.
Affiliation(s)
- Xiaolan Yu
  - Basic Forestry and Proteomics Research Center, College of Life Science, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Yongsheng Wang
  - Basic Forestry and Proteomics Research Center, College of Life Science, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Markus V Kohnen
  - Basic Forestry and Proteomics Research Center, College of Life Science, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Mingxin Piao
  - Basic Forestry and Proteomics Research Center, College of Life Science, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Jilin Province Engineering Laboratory of Plant Genetic Improvement, College of Plant Science, Jilin University, 5333 Xi'an Road, Changchun 130062, China
- Min Tu
  - Basic Forestry and Proteomics Research Center, College of Life Science, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Yubang Gao
  - Basic Forestry and Proteomics Research Center, College of Life Science, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Chentao Lin
  - Basic Forestry and Proteomics Research Center, College of Life Science, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Department of Molecular, Cell & Developmental Biology, University of California, Los Angeles, CA 90095, USA
- Zecheng Zuo
  - Basic Forestry and Proteomics Research Center, College of Life Science, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - Jilin Province Engineering Laboratory of Plant Genetic Improvement, College of Plant Science, Jilin University, 5333 Xi'an Road, Changchun 130062, China
- Lianfeng Gu
  - Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China
3
Guillot L, Delage L, Viari A, Vandenbrouck Y, Com E, Ritter A, Lavigne R, Marie D, Peterlongo P, Potin P, Pineau C. Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes. BMC Genomics 2019; 20:56. [PMID: 30654742 PMCID: PMC6337836 DOI: 10.1186/s12864-019-5431-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/26/2018] [Accepted: 01/03/2019] [Indexed: 01/02/2023]
Abstract
Background: Accurate structural annotation of genomes is still a challenge, despite the progress made over the past decade. The prediction of gene structure remains difficult, especially for eukaryotic species, and is often erroneous and incomplete. We used a proteogenomics strategy, taking advantage of the combination of proteomics datasets and bioinformatics tools, to identify novel protein-coding genes and splice isoforms, assign correct start sites, and validate predicted exons and genes.
Results: Our proteogenomics workflow, Peptimapper, was applied to the genome annotation of Ectocarpus sp., a key reference genome for both the brown algal lineage and stramenopiles. We generated proteomics data from various life cycle stages of Ectocarpus sp. strains and sub-cellular fractions using a shotgun approach. First, we directly generated peptide sequence tags (PSTs) from the proteomics data. Second, we mapped PSTs onto the translated genomic sequence. Closely located hits (i.e., PST locations on the genome) were then clustered to detect potential coding regions based on parameters optimized for the organism. Third, we evaluated each cluster and compared it to gene predictions from existing conventional genome annotation approaches. Finally, we integrated cluster locations into GFF files for use in a genome viewer. We identified two potential novel genes, a ribosomal protein L22 and an aryl sulfotransferase, and corrected the gene structure of a dihydrolipoamide acetyltransferase. We experimentally validated the results by RT-PCR and using transcriptomics data.
Conclusions: Peptimapper is a complementary tool for the expert annotation of genomes. It is suitable for any organism and is distributed through a Docker image available on two public bioinformatics Docker repositories: Docker Hub and BioShaDock. This workflow is also accessible through the Galaxy framework, for use by non-computer scientists, at https://galaxy.protim.eu. Data are available via ProteomeXchange under identifier PXD010618.
Electronic supplementary material: The online version of this article (10.1186/s12864-019-5431-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Laetitia Guillot
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France
  - Protim, Univ Rennes, F-35042, Rennes cedex, France
- Ludovic Delage
  - Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
- Alain Viari
  - INRIA Grenoble-Rhône-Alpes, F-38330, Montbonnot-Saint-Martin, France
- Yves Vandenbrouck
  - University Grenoble Alpes, CEA, Inserm, BIG-BGE, 38000, Grenoble, France
- Emmanuelle Com
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France
  - Protim, Univ Rennes, F-35042, Rennes cedex, France
- Andrés Ritter
  - Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
  - Present address: Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
- Régis Lavigne
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France
  - Protim, Univ Rennes, F-35042, Rennes cedex, France
- Dominique Marie
  - Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
- Philippe Potin
  - Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
- Charles Pineau
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France
  - Protim, Univ Rennes, F-35042, Rennes cedex, France
4
N-terminome and proteogenomic analysis of the Methylobacterium extorquens DM4 reference strain for dichloromethane utilization. J Proteomics 2018; 179:131-139. [DOI: 10.1016/j.jprot.2018.03.012] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/26/2018] [Revised: 02/28/2018] [Accepted: 03/16/2018] [Indexed: 12/29/2022]