1
Szakállas N, Barták BK, Valcz G, Nagy ZB, Takács I, Molnár B. Can long-read sequencing tackle the barriers, which the next-generation could not? A review. Pathol Oncol Res 2024; 30:1611676. [PMID: 38818014 PMCID: PMC11137202 DOI: 10.3389/pore.2024.1611676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 04/30/2024] [Indexed: 06/01/2024]
Abstract
The large-scale heterogeneity of genetic diseases necessitated the deeper examination of nucleotide sequence alterations, enhancing the discovery of new targeted drug attack points. The appearance of new sequencing techniques was essential to get more interpretable genomic data. In contrast to the previous short reads, longer read lengths can provide better insight into potentially health-threatening genetic abnormalities. Long reads offer more accurate variant identification and genome assembly methods, indicating advances in nucleotide defect-related studies. In this review, we introduce the historical background of sequencing technologies and show their benefits and limits as well. Furthermore, we highlight the differences between short- and long-read approaches, including their unique advances and difficulties in methodologies and evaluation. Additionally, we provide a detailed description of the corresponding bioinformatics and the current applications.
Affiliation(s)
- Nikolett Szakállas
  - Department of Biological Physics, Faculty of Science, Eötvös Loránd University, Budapest, Hungary
- Barbara K. Barták
  - Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Gábor Valcz
  - Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
  - HUN-REN-SU Translational Extracellular Vesicle Research Group, Budapest, Hungary
- Zsófia B. Nagy
  - Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- István Takács
  - Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Béla Molnár
  - Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
2
Petraccioli A, Maio N, Carotenuto R, Odierna G, Guarino FM. The Satellite DNA PcH-Sat, Isolated and Characterized in the Limpet Patella caerulea (Mollusca, Gastropoda), Suggests the Origin from a Nin-SINE Transposable Element. Genes (Basel) 2024; 15:541. [PMID: 38790169 PMCID: PMC11121367 DOI: 10.3390/genes15050541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 04/16/2024] [Accepted: 04/23/2024] [Indexed: 05/26/2024] Open
Abstract
Satellite DNA (sat-DNA) was previously described as junk and selfish DNA in the cellular economy, without a clear functional role. However, during the last two decades, evidence has been accumulated about the roles of sat-DNA in different cellular functions and its probable involvement in tumorigenesis and adaptation to environmental changes. In molluscs, studies on sat-DNAs have been performed mainly on bivalve species, especially those of economic interest. Conversely, in Gastropoda (which includes about 80% of the currently described molluscs species), studies on sat-DNA have been largely neglected. In this study, we isolated and characterized a sat-DNA, here named PcH-sat, in the limpet Patella caerulea using the restriction enzyme method, particularly HaeIII. Monomeric units of PcH-sat are 179 bp long, AT-rich (58.7%), and with an identity among monomers ranging from 91.6 to 99.8%. Southern blot showed that PcH-sat is conserved in P. depressa and P. ulyssiponensis, while a smeared signal of hybridization was present in the other three investigated limpets (P. ferruginea, P. rustica and P. vulgata). Dot blot showed that PcH-sat represents about 10% of the genome of P. caerulea, 5% of that of P. depressa, and 0.3% of that of P. ulyssiponensis. FISH showed that PcH-sat was mainly localized on pericentromeric regions of chromosome pairs 2 and 4-7 of P. caerulea (2n = 18). A database search showed that PcH-sat contains a large segment (of 118 bp) showing high identity with a homologous trait of the Nin-SINE transposable element (TE) of the patellogastropod Lottia gigantea, supporting the hypothesis that TEs are involved in the rising and tandemization processes of sat-DNAs.
Affiliation(s)
- Gaetano Odierna
  - Department of Biology, University of Naples Federico II, Via Cinthia, I-80126 Naples, Italy; (A.P.); (N.M.); (R.C.); (F.M.G.)
3
Ferreira JS, Bruschi DP. Tracking the Diversity and Chromosomal Distribution of the Olfactory Receptor Gene Repertoires of Three Anurans Species. J Mol Evol 2023; 91:793-805. [PMID: 37906255 DOI: 10.1007/s00239-023-10135-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 10/02/2023] [Indexed: 11/02/2023]
Abstract
Olfaction is a crucial capability for most vertebrates and is realized through olfactory receptors in the nasal cavity. The enormous diversity of olfactory receptors has been created by gene duplication, following a birth-and-death model of evolution. The olfactory receptor genes of the amphibians have received relatively little attention up to now, although recent studies have increased the number of species for which data are available. This study analyzed the diversity and chromosomal distribution of the OR genes of three anuran species (Engystomops pustulosus, Bufo bufo and Hymenochirus boettgeri). The OR genes were identified through searches for homologies, and sequence filtering and alignment using bioinformatic tools and scripts. A high diversity of OR genes was found in all three species, ranging from 917 in B. bufo to 1194 in H. boettgeri, and a total of 2076 OR genes in E. pustulosus. Six OR groups were recognized using an evolutionary gene tree analysis. While E. pustulosus has one of the highest numbers of genes of the gamma group (which detect airborne odorants) yet recorded in an anuran, B. bufo presented the smallest number of pseudogene sequences ever identified, with no pseudogenes in either the beta or epsilon groups. Although H. boettgeri shares many morphological adaptations for an aquatic lifestyle with Xenopus, and presented a similar number of genes related to the detection of water-soluble odorants, it had comparatively far fewer genes related to the detection of airborne odorants. This study is the first to describe the complete OR repertoire of the three study species and represents an important contribution to the understanding of the evolution and function of the sense of smell in vertebrates.
Affiliation(s)
- Johnny Sousa Ferreira
  - Laboratório de Citogenética Evolutiva e Conservação Animal (LabCECA), Departamento de Genética, Universidade Federal do Paraná (UFPR), Paraná, Brazil
- Daniel Pacheco Bruschi
  - Laboratório de Citogenética Evolutiva e Conservação Animal (LabCECA), Departamento de Genética, Universidade Federal do Paraná (UFPR), Paraná, Brazil
4
Liu Y, Shen X, Gong Y, Liu Y, Song B, Zeng X. Sequence Alignment/Map format: a comprehensive review of approaches and applications. Brief Bioinform 2023; 24:bbad320. [PMID: 37668049 DOI: 10.1093/bib/bbad320] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/16/2023] [Revised: 08/16/2023] [Accepted: 08/18/2023] [Indexed: 09/06/2023] Open
Abstract
The Sequence Alignment/Map (SAM) format file is the text file used to record alignment information. Alignment is the core of sequencing analysis, and downstream tasks accept mapping results for further processing. Given the rapid development of the sequencing industry today, a comprehensive understanding of the SAM format and related tools is necessary to meet the challenges of data processing and analysis. This paper is devoted to retrieving knowledge in the broad field of SAM. First, the format of SAM is introduced to understand the overall process of the sequencing analysis. Then, existing work is systematically classified in accordance with generation, compression and application, and the involved SAM tools are specifically mined. Lastly, a summary and some thoughts on future directions are provided.
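As a rough illustration of the text format this abstract refers to, the sketch below splits one SAM alignment line into its fields. The eleven mandatory tab-separated columns and the `TAG:TYPE:VALUE` optional fields follow the public SAM specification rather than anything stated in the abstract; the `SamRecord` class, the `parse_sam_line` function, and the example read are hypothetical names and data chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class SamRecord:
    """One alignment line; attribute names mirror the SAM column names."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict = field(default_factory=dict)

    @property
    def is_unmapped(self) -> bool:
        # Bit 0x4 of FLAG marks an unmapped segment.
        return bool(self.flag & 0x4)


def parse_sam_line(line: str) -> SamRecord:
    """Parse a single non-header SAM line (hypothetical helper)."""
    if line.startswith("@"):
        raise ValueError("header lines are not alignment records")
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    tags = {}
    for raw in cols[11:]:
        # Optional fields look like TAG:TYPE:VALUE; only 'i' is converted here.
        tag, typ, value = raw.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return SamRecord(
        qname=cols[0], flag=int(cols[1]), rname=cols[2], pos=int(cols[3]),
        mapq=int(cols[4]), cigar=cols[5], rnext=cols[6], pnext=int(cols[7]),
        tlen=int(cols[8]), seq=cols[9], qual=cols[10], tags=tags,
    )


example = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar, record.tags)
```

For real data, an established parser is likely a better choice than hand-rolled splitting; this sketch only shows the shape of a record.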
Affiliation(s)
- Yuansheng Liu
  - College of Computer Science and Electronic Engineering, Hunan University, 410086, Changsha, China
- Xiangzhen Shen
  - College of Computer Science and Electronic Engineering, Hunan University, 410086, Changsha, China
- Yongshun Gong
  - School of Software, Shandong University, 250100, Jinan, China
- Yiping Liu
  - College of Computer Science and Electronic Engineering, Hunan University, 410086, Changsha, China
- Bosheng Song
  - College of Computer Science and Electronic Engineering, Hunan University, 410086, Changsha, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, 410086, Changsha, China
5
Shukla H, Suryamohan K, Khan A, Mohan K, Perumal RC, Mathew OK, Menon R, Dixon MD, Muraleedharan M, Kuriakose B, Michael S, Krishnankutty SP, Zachariah A, Seshagiri S, Ramakrishnan U. Near-chromosomal de novo assembly of Bengal tiger genome reveals genetic hallmarks of apex predation. Gigascience 2022; 12:giac112. [PMID: 36576130 PMCID: PMC9795480 DOI: 10.1093/gigascience/giac112] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/10/2022] [Revised: 07/17/2022] [Accepted: 10/20/2022] [Indexed: 12/29/2022] Open
Abstract
The tiger, a poster child for conservation, remains an endangered apex predator. Continued survival and recovery will require a comprehensive understanding of genetic diversity and the use of such information for population management. A high-quality tiger genome assembly will be an important tool for conservation genetics, especially for the Indian tiger, the most abundant subspecies in the wild. Here, we present high-quality near-chromosomal genome assemblies of a female and a male wild Indian tiger (Panthera tigris tigris). Our assemblies had a scaffold N50 of >140 Mb, with 19 scaffolds corresponding to the 19 numbered chromosomes, containing 95% of the genome. Our assemblies also enabled detection of longer stretches of runs of homozygosity compared to previous assemblies, which will help improve estimates of genomic inbreeding. Comprehensive genome annotation identified 26,068 protein-coding genes, including several gene families involved in key morphological features such as the teeth, claws, vision, olfaction, taste, and body stripes. We also identified 301 microRNAs, 365 small nucleolar RNAs, 632 transfer RNAs, and other noncoding RNA elements, several of which are predicted to regulate key biological pathways that likely contribute to the tiger's apex predatory traits. We identify signatures of positive selection in the tiger genome that are consistent with the Panthera lineage. Our high-quality genome will enable use of noninvasive samples for comprehensive assessment of genetic diversity, thus supporting effective conservation and management of wild tiger populations.
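The scaffold N50 quoted in this abstract is a standard contiguity statistic: the length L such that scaffolds of length L or longer together cover at least half of the total assembly. The sketch below computes it under that usual definition; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the tiger assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of the
    summed assembly length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example (made-up lengths, in arbitrary units).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))
```

With the toy values, the total is 290, and the two longest scaffolds (80 + 70 = 150) already cover half of it, so the N50 is 70.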
Affiliation(s)
- Harsh Shukla
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore 560065, India
- Kushal Suryamohan
  - MedGenome Inc., Department of Research and Development, Foster City, CA 94404, USA
  - SciGenom Research Foundation, Narayana Health City, Bangalore, Karnataka 560099, India
- Anubhab Khan
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore 560065, India
- Krishna Mohan
  - Department of Research and Development, AgriGenome Labs Private Ltd, Kochi, Kerala 682030, India
- Rajadurai C Perumal
  - Department of Research and Development, AgriGenome Labs Private Ltd, Kochi, Kerala 682030, India
- Oommen K Mathew
  - Department of Research and Development, AgriGenome Labs Private Ltd, Kochi, Kerala 682030, India
- Ramesh Menon
  - MedGenome Labs Ltd., Narayana Health City, Bangalore, Karnataka 560099, India
- Mandumpala Davis Dixon
  - Department of Research and Development, AgriGenome Labs Private Ltd, Kochi, Kerala 682030, India
- Megha Muraleedharan
  - Department of Research and Development, AgriGenome Labs Private Ltd, Kochi, Kerala 682030, India
- Boney Kuriakose
  - Department of Research and Development, AgriGenome Labs Private Ltd, Kochi, Kerala 682030, India
- Saju Michael
  - Department of Research and Development, AgriGenome Labs Private Ltd, Kochi, Kerala 682030, India
- Sajesh P Krishnankutty
  - Department of Research and Development, AgriGenome Labs Private Ltd, Kochi, Kerala 682030, India
- Arun Zachariah
  - SciGenom Research Foundation, Narayana Health City, Bangalore, Karnataka 560099, India
  - Wayanad Wildlife Sanctuary, Sultan Bathery, Kerala 673592, India
- Somasekar Seshagiri
  - SciGenom Research Foundation, Narayana Health City, Bangalore, Karnataka 560099, India
  - MedGenome Labs Ltd., Narayana Health City, Bangalore, Karnataka 560099, India
- Uma Ramakrishnan
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore 560065, India
6
Sharma P, Masouleh AK, Topp B, Furtado A, Henry RJ. De novo chromosome level assembly of a plant genome from long read sequence data. Plant J 2022; 109:727-736. [PMID: 34784084 PMCID: PMC9300133 DOI: 10.1111/tpj.15583] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 09/10/2021] [Revised: 11/08/2021] [Accepted: 11/10/2021] [Indexed: 05/16/2023]
Abstract
Recent advances in the sequencing and assembly of plant genomes have allowed the generation of genomes with increasing contiguity and sequence accuracy. Chromosome level genome assemblies using sequence contigs generated from long read sequencing have involved the use of proximity analysis (Hi-C) or traditional genetic maps to guide the placement of sequence contigs within chromosomes. The development of highly accurate long reads by repeated sequencing of circularized DNA (HiFi; PacBio) has greatly increased the size of contigs. We now report the use of HiFiasm to assemble the genome of Macadamia jansenii, a genome that has been used as a model to test sequencing and assembly. This achieved almost complete chromosome level assembly from the sequence data alone without the need for higher level chromosome map information. Eight of the 14 chromosomes were represented by a single large contig (six with telomere repeats at both ends) and the other six assembled from two to four main contigs. The small number of chromosome breaks appears to be the result of highly repetitive regions including ribosomal genes that cannot be assembled by these approaches. De novo assembly of near complete chromosome level plant genomes now appears possible using these sequencing and assembly tools. Further targeted strategies might allow these remaining gaps to be closed.
Affiliation(s)
- Priyanka Sharma
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD 4072, Australia
- Bruce Topp
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD 4072, Australia
- Agnelo Furtado
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD 4072, Australia
- Robert J. Henry
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD 4072, Australia
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Queensland, Brisbane, QLD 4072, Australia
7
Li B, Zhang X, Liu Z, Wang L, Song L, Liang X, Dou S, Tu J, Shen J, Yi B, Wen J, Fu T, Dai C, Gao C, Wang A, Ma C. Genetic and Molecular Characterization of a Self-Compatible Brassica rapa Line Possessing a New Class II S Haplotype. Plants (Basel) 2021; 10:2815. [PMID: 34961286 PMCID: PMC8709392 DOI: 10.3390/plants10122815] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/04/2021] [Revised: 12/01/2021] [Accepted: 12/03/2021] [Indexed: 05/20/2023]
Abstract
Most flowering plants have evolved a self-incompatibility (SI) system to maintain genetic diversity by preventing self-pollination. Brassica species possess sporophytic self-incompatibility (SSI), which is controlled by the pollen- and stigma-determinant factors SP11/SCR and SRK. However, the molecular mechanism of SI remains largely unknown. Here, a new class II S haplotype, named BrS-325, was identified in a pak choi line '325', which was responsible for the completely self-compatible phenotype. To obtain the entire S locus sequences, a complete pak choi genome was obtained through Nanopore sequencing and de novo assembly, which provided a good reference genome for breeding and molecular research in B. rapa. S locus comparative analysis showed that the closest relative of BrS-325 was BrS-60, and high sequence polymorphism existed in the S locus. Meanwhile, two duplicated SRKs (BrSRK-325a and BrSRK-325b) were distributed in the BrS-325 locus with opposite transcription directions. BrSRK-325b and BrSCR-325 were expressed normally at the transcriptional level. The multiple sequence alignment of SCRs and SRKs in class II S haplotypes showed that a number of amino acid variations were present in the contact regions (CR II and CR III) of BrSCR-325 and the hypervariable regions (HV I and HV II) of BrSRK-325s, which may influence the binding and interaction between the ligand and the receptor. Thus, these results suggested that amino acid variations in contact sites may lead to the SI destruction of a new class II S haplotype BrS-325 in B. rapa. The complete SC phenotype of '325' showed the potential for practical breeding application value in B. rapa.
Affiliation(s)
- Bing Li
  - National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
- Xueli Zhang
  - Wuhan Vegetable Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan 430345, China; (X.Z.); (L.S.)
- Zhiquan Liu
  - Hunan Vegetable Research Institute, Hunan Academy of Agricultural Science, Changsha 410125, China
- Lulin Wang
  - National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
- Liping Song
  - Wuhan Vegetable Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan 430345, China; (X.Z.); (L.S.)
- Xiaomei Liang
  - National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
- Shengwei Dou
  - National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
- Jinxing Tu
  - National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
- Jinxiong Shen
  - National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
- Bin Yi
  - National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
- Jing Wen
  - National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
- Tingdong Fu
  - National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
- Cheng Dai
  - National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
- Changbin Gao
  - Wuhan Vegetable Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan 430345, China; (X.Z.); (L.S.)
  - Correspondence: (C.G.); (A.W.); (C.M.); Tel.: +86-27-8728-18-07 (C.M.)
- Aihua Wang
  - Wuhan Vegetable Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan 430345, China; (X.Z.); (L.S.)
  - Correspondence: (C.G.); (A.W.); (C.M.); Tel.: +86-27-8728-18-07 (C.M.)
- Chaozhi Ma
  - National Sub-Center of Rapeseed Improvement in Wuhan, National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; (B.L.); (L.W.); (X.L.); (S.D.); (J.T.); (J.S.); (B.Y.); (J.W.); (T.F.); (C.D.)
  - Correspondence: (C.G.); (A.W.); (C.M.); Tel.: +86-27-8728-18-07 (C.M.)
8
Brázda V, Bohálová N, Bowater RP. New telomere to telomere assembly of human chromosome 8 reveals a previous underestimation of G-quadruplex forming sequences and inverted repeats. Gene 2021; 810:146058. [PMID: 34737002 DOI: 10.1016/j.gene.2021.146058] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/31/2021] [Revised: 10/14/2021] [Accepted: 10/29/2021] [Indexed: 11/04/2022]
Abstract
Taking advantage of evolving and improving sequencing methods, human chromosome 8 is now available as a gapless, end-to-end assembly. Thanks to advances in long-read sequencing technologies, its centromere, telomeres, duplicated gene families and repeat-rich regions are now fully sequenced. We were interested to assess if the new assembly altered our understanding of the potential impact of non-B DNA structures within this completed chromosome sequence. It has been shown that non-B secondary structures, such as G-quadruplexes, hairpins and cruciforms, have important regulatory functions and potential as targeted therapeutics. Therefore, we analysed the presence of putative G-quadruplex forming sequences and inverted repeats in the current human reference genome (GRCh38) and in the new end-to-end assembly of chromosome 8. The comparison revealed that the new assembly contains significantly more inverted repeats and G-quadruplex forming sequences compared to the current reference sequence. This observation can be explained by improved accuracy of the new sequencing methods, particularly in regions that contain extensive repeats of bases, as is preferred by many non-B DNA structures. These results show a significant underestimation of the prevalence of non-B DNA secondary structure in previous assembly versions of the human genome and point to their importance being not fully appreciated. We anticipate that similar observations will occur as the improved sequencing technologies fill in gaps across the genomes of humans and other organisms.
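To give a feel for what scanning a sequence for putative G-quadruplex forming sequences can look like, the sketch below uses a simple, widely used textbook motif: four runs of at least three guanines separated by loops of 1 to 7 bases. This is a stand-in technique, not the method used in the study, which the abstract does not specify; the pattern, the function name `find_g4_motifs`, and the test sequence are assumptions made for illustration.

```python
import re

# Four G-runs (>= 3 G each) separated by three loops of 1-7 bases.
# A simplified motif for illustration; dedicated tools score sequences
# differently and usually also scan the complementary (C-rich) strand.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_g4_motifs(sequence):
    """Return (start, end, motif) for non-overlapping motif matches.

    Coordinates are 0-based, end-exclusive, on the given strand only.
    """
    sequence = sequence.upper()
    return [
        (match.start(), match.end(), match.group())
        for match in G4_PATTERN.finditer(sequence)
    ]


# Made-up sequence containing one motif flanked by thymines.
hits = find_g4_motifs("TTGGGAGGGTGGGAGGGTT")
for start, end, motif in hits:
    print(start, end, motif)
```

Comparing two assemblies of the same chromosome would then amount to running such a scan on each sequence and comparing the counts, with the caveat that the choice of motif definition strongly affects the numbers.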
Affiliation(s)
- Václav Brázda
  - Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, Brno 612 65, Czech Republic
- Natália Bohálová
  - Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, Brno 612 65, Czech Republic; Department of Experimental Biology, Faculty of Science, Masaryk University, Kamenice 5, Brno 62500, Czech Republic
- Richard P Bowater
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, United Kingdom
9
Fouks B, Brand P, Nguyen HN, Herman J, Camara F, Ence D, Hagen DE, Hoff KJ, Nachweide S, Romoth L, Walden KKO, Guigo R, Stanke M, Narzisi G, Yandell M, Robertson HM, Koeniger N, Chantawannakul P, Schatz MC, Worley KC, Robinson GE, Elsik CG, Rueppell O. The genomic basis of evolutionary differentiation among honey bees. Genome Res 2021; 31:1203-1215. [PMID: 33947700 PMCID: PMC8256857 DOI: 10.1101/gr.272310.120] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/30/2020] [Accepted: 04/22/2021] [Indexed: 02/06/2023]
Abstract
In contrast to the western honey bee, Apis mellifera, other honey bee species have been largely neglected despite their importance and diversity. The genetic basis of the evolutionary diversification of honey bees remains largely unknown. Here, we provide a genome-wide comparison of three honey bee species, each representing one of the three subgenera of honey bees, namely the dwarf (Apis florea), giant (A. dorsata), and cavity-nesting (A. mellifera) honey bees with bumblebees as an outgroup. Our analyses resolve the phylogeny of honey bees with the dwarf honey bees diverging first. We find that evolution of increased eusocial complexity in Apis proceeds via increases in the complexity of gene regulation, which is in agreement with previous studies. However, this process seems to be related to pathways other than transcriptional control. Positive selection patterns across Apis reveal a trade-off between maintaining genome stability and generating genetic diversity, with a rapidly evolving piRNA pathway leading to genomes depleted of transposable elements, and a rapidly evolving DNA repair pathway associated with high recombination rates in all Apis species. Diversification within Apis is accompanied by positive selection in several genes whose putative functions present candidate mechanisms for lineage-specific adaptations, such as migration, immunity, and nesting behavior.
Collapse
Affiliation(s)
- Bertrand Fouks
- Department of Biology, University of North Carolina at Greensboro, Greensboro, North Carolina 27403, USA
- Institute for Evolution and Biodiversity, Molecular Evolution and Bioinformatics, Westfälische Wilhelms-Universität, 48149 Münster, Germany
| | - Philipp Brand
- Department of Evolution and Ecology, Center for Population Biology, University of California, Davis, Davis, California 95161, USA
- Laboratory of Neurophysiology and Behavior, The Rockefeller University, New York, New York 10065, USA
| | - Hung N Nguyen
- MU Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri 65211, USA
| | - Jacob Herman
- Department of Biology, University of North Carolina at Greensboro, Greensboro, North Carolina 27403, USA
| | - Francisco Camara
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08036 Barcelona, Spain
| | - Daniel Ence
- School of Forest Resources and Conservation, University of Florida, Gainesville, Florida 32611, USA
- Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
| | - Darren E Hagen
- Department of Animal and Food Sciences, Oklahoma State University, Stillwater, Oklahoma 74078, USA
| | - Katharina J Hoff
- University of Greifswald, Institute for Mathematics and Computer Science, Bioinformatics Group, 17489 Greifswald, Germany
- University of Greifswald, Center for Functional Genomics of Microbes, 17489 Greifswald, Germany
| | - Stefanie Nachweide
- University of Greifswald, Institute for Mathematics and Computer Science, Bioinformatics Group, 17489 Greifswald, Germany
| | - Lars Romoth
- University of Greifswald, Institute for Mathematics and Computer Science, Bioinformatics Group, 17489 Greifswald, Germany
| | - Kimberly K O Walden
- Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Roderic Guigo
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08036 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
- Mario Stanke
  - University of Greifswald, Institute for Mathematics and Computer Science, Bioinformatics Group, 17489 Greifswald, Germany
  - University of Greifswald, Center for Functional Genomics of Microbes, 17489 Greifswald, Germany
- Mark Yandell
  - Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112, USA
  - Utah Center for Genetic Discovery, University of Utah, Salt Lake City, Utah 84112, USA
- Hugh M Robertson
  - Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Nikolaus Koeniger
  - Department of Behavioral Physiology and Sociobiology (Zoology II), University of Würzburg, 97074 Würzburg, Germany
- Panuwan Chantawannakul
  - Environmental Science Research Center (ESRC) and Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
- Michael C Schatz
  - Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Kim C Worley
  - Department of Molecular and Human Genetics, Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Gene E Robinson
  - Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  - Neuroscience Program, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Christine G Elsik
  - MU Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri 65211, USA
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri 65211, USA
  - Division of Plant Sciences, University of Missouri, Columbia, Missouri 65211, USA
- Olav Rueppell
  - Department of Biology, University of North Carolina at Greensboro, Greensboro, North Carolina 27403, USA
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
10. Chateau A, Davot T, Lafond M. Efficient assembly consensus algorithms for divergent contig sets. Comput Biol Chem 2021; 93:107516. [PMID: 34082320 DOI: 10.1016/j.compbiolchem.2021.107516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2021] [Accepted: 05/12/2021] [Indexed: 11/18/2022]
Abstract
Assembly is a fundamental task in genome sequencing, and many assemblers have been made available in the last decade. Because of the wide range of possible choices, it can be hard to determine which tool or parameter to use for a specific genome sequencing project. In this paper, we propose a consensus approach that takes the best parts of several contigs datasets produced by different methods, and combines them into a better assembly. This amounts to orienting and ordering sets of contigs, which can be viewed as an optimization problem where the aim is to find an alignment of two fragmented strings that maximizes an arbitrary scoring function between matched characters. In this work, we investigate the computational complexity of this problem. We first show that it is NP-hard, even in an alphabet with only two symbols and with all scores being either 0 or 1. On the positive side, we propose an efficient, quadratic time algorithm that achieves approximation factor 3.
Affiliation(s)
- Tom Davot
  - LIRMM - CNRS UMR 5506 Montpellier, France
11. Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 2020; 36:2896-2898. [PMID: 31971576 PMCID: PMC7203741 DOI: 10.1093/bioinformatics/btaa025] [Citation(s) in RCA: 1109] [Impact Index Per Article: 277.3] [Received: 08/13/2019] [Revised: 12/17/2019] [Accepted: 01/19/2020] [Indexed: 01/23/2023]
Abstract
Motivation: Rapid development in long-read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic genomes. However, haplotype divergence in regions of high heterozygosity often results in assemblers creating two copies rather than one copy of a region, leading to breaks in contiguity and compromising downstream steps such as gene annotation. Several tools have been developed to resolve this problem. However, they either focus only on removing contained duplicate regions, also known as haplotigs, or fail to use all the relevant information and hence make errors. Results: Here we present a novel tool, purge_dups, that uses sequence similarity and read depth to automatically identify and remove both haplotigs and heterozygous overlaps. In comparison with current tools, we demonstrate that purge_dups can reduce heterozygous duplication and increase assembly continuity while maintaining completeness of the primary assembly. Moreover, purge_dups is fully automatic and can easily be integrated into assembly pipelines. Availability and implementation: The source code is written in C and is available at https://github.com/dfguan/purge_dups. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dengfeng Guan
  - Department of Computer Science and Technology, Center for Bioinformatics, Harbin Institute of Technology, Harbin 150001, China
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Shane A McCarthy
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Jonathan Wood
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
- Kerstin Howe
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
- Yadong Wang
  - Department of Computer Science and Technology, Center for Bioinformatics, Harbin Institute of Technology, Harbin 150001, China
- Richard Durbin
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK
12. Burgin J, Molitor C, Mohareb F. MapOptics: a light-weight, cross-platform visualization tool for optical mapping alignment. Bioinformatics 2020; 35:2671-2673. [PMID: 30535283 DOI: 10.1093/bioinformatics/bty1013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/16/2018] [Revised: 11/27/2018] [Accepted: 12/06/2018] [Indexed: 12/24/2022]
Abstract
Summary: Bionano optical mapping is a technology that can assist in the final stages of genome assembly by lengthening and ordering scaffolds in a draft assembly by aligning the assembly to a genomic map. However, currently, tools for visualization are limited to use on a Windows operating system or are developed initially for visualizing large-scale structural variation. MapOptics is a lightweight cross-platform tool that enables the user to visualize and interact with the alignment of Bionano optical mapping data and can be used for in-depth exploration of hybrid scaffolding alignments. It provides a fast, simple alternative to the large optical mapping analysis programs currently available for this area of research. Availability and implementation: MapOptics is implemented in Java 1.8 and released under an MIT licence. MapOptics can be downloaded from https://github.com/FadyMohareb/mapoptics and run on any standard desktop computer equipped with a Java Virtual Machine (JVM). Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Josephine Burgin
  - The Bioinformatics Group, Cranfield Soil and Agrifood Institute, School of Water, Energy and Environment, Cranfield University, Bedford MK43 0AL, UK
- Corentin Molitor
  - The Bioinformatics Group, Cranfield Soil and Agrifood Institute, School of Water, Energy and Environment, Cranfield University, Bedford MK43 0AL, UK
- Fady Mohareb
  - The Bioinformatics Group, Cranfield Soil and Agrifood Institute, School of Water, Energy and Environment, Cranfield University, Bedford MK43 0AL, UK
13. Li C, Li X, Liu H, Wang X, Li W, Chen MS, Niu LJ. Chromatin Architectures Are Associated with Response to Dark Treatment in the Oil Crop Sesamum indicum, Based on a High-Quality Genome Assembly. Plant Cell Physiol 2020; 61:978-987. [PMID: 32154879 DOI: 10.1093/pcp/pcaa026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/20/2019] [Accepted: 03/01/2020] [Indexed: 05/21/2023]
Abstract
Eukaryotic chromatin is tightly packed into hierarchical structures, allowing appropriate gene transcription in response to environmental and developmental cues. Here, we provide a chromosome-scale de novo genome assembly of sesame with a total length of 292.3 Mb and a scaffold N50 of 20.5 Mb, containing an estimated 28,406 coding genes, using Pacific Biosciences long reads combined with a genome-wide chromosome conformation capture (Hi-C) approach. Based on this high-quality reference genome, we detected changes in chromatin architectures between normal growth and dark-treated sesame seedlings. Gene expression level was significantly higher in 'A' compartment and topologically associated domain (TAD) boundary regions than in 'B' compartment and TAD interior regions, which is coincident with the enrichment of H4K3me3 modification in these regions. Moreover, differentially expressed genes (DEGs) induced by dark treatment were enriched in the changed TAD-related regions and genomic differential contact regions. Gene Ontology (GO) enrichment analysis of DEGs showed that genes related to 'response to stress' and 'photosynthesis' functional categories were enriched, which corresponds to dark treatment. These results suggested that chromatin organization is associated with gene transcription in response to dark treatment in sesame. Our results will facilitate the understanding of regulatory mechanisms in response to environmental cues in plants.
Affiliation(s)
- Chaoqiong Li
  - College of Life Science and Agronomy, Zhoukou Normal University, Zhoukou, Henan 466001, China
- Xiaoli Li
  - College of Life Science and Agronomy, Zhoukou Normal University, Zhoukou, Henan 466001, China
  - Key Laboratory of Plant Genetics and Molecular Breeding, Zhoukou Normal University, Zhoukou, Henan 466001, China
- Hongzhan Liu
  - College of Life Science and Agronomy, Zhoukou Normal University, Zhoukou, Henan 466001, China
- Xueqin Wang
  - College of Life Science and Agronomy, Zhoukou Normal University, Zhoukou, Henan 466001, China
- Weifeng Li
  - Sesame Experiment Station, Zhoukou Academy of Agricultural Sciences, Zhoukou, Henan 466001, China
- Mao-Sheng Chen
  - CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Menglun, Yunnan 666303, China
- Long-Jian Niu
  - Department of Biology, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
  - Department of Biology, Nankai University, Tianjin 660885, China
14. Exposito-Alonso M, Drost HG, Burbano HA, Weigel D. The Earth BioGenome project: opportunities and challenges for plant genomics and conservation. Plant J 2020; 102:222-229. [PMID: 31788877 DOI: 10.1111/tpj.14631] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 09/12/2019] [Revised: 11/03/2019] [Accepted: 11/18/2019] [Indexed: 05/28/2023]
Abstract
Sequencing them all. That is the ambitious goal of the recently launched Earth BioGenome project (Proceedings of the National Academy of Sciences of the United States of America, 115, 4325-4333), which aims to produce reference genomes for all eukaryotic species within the next decade. In this perspective, we discuss the opportunities of this project with a plant focus, but highlight also potential limitations. This includes the question of how to best capture all plant diversity, as the green taxon is one of the most complex clades in the tree of life, with over 300 000 species. For this, we highlight four key points: (i) the unique biological insights that could be gained from studying plants, (ii) their apparent underrepresentation in sequencing efforts given the number of threatened species, (iii) the necessity of phylogenomic methods that are aware of differences in genome complexity and quality, and (iv) the accounting for within-species genetic diversity and the historical aspect of conservation genetics.
Affiliation(s)
- Hajk-Georg Drost
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076, Tübingen, Germany
  - The Sainsbury Laboratory, University of Cambridge, 47 Bateman Street, CB2 1LR, Cambridge, UK
- Hernán A Burbano
  - Centre for Life's Origins and Evolution, Department of Genetics Evolution and Environment, University College London, London, WC1H 0AG, UK
- Detlef Weigel
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076, Tübingen, Germany
15. Bayega A, Djambazian H, Tsoumani KT, Gregoriou ME, Sagri E, Drosopoulou E, Mavragani-Tsipidou P, Giorda K, Tsiamis G, Bourtzis K, Oikonomopoulos S, Dewar K, Church DM, Papanicolaou A, Mathiopoulos KD, Ragoussis J. De novo assembly of the olive fruit fly (Bactrocera oleae) genome with linked-reads and long-read technologies minimizes gaps and provides exceptional Y chromosome assembly. BMC Genomics 2020; 21:259. [PMID: 32228451 PMCID: PMC7106766 DOI: 10.1186/s12864-020-6672-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/12/2019] [Accepted: 03/13/2020] [Indexed: 12/24/2022]
Abstract
Background: The olive fruit fly, Bactrocera oleae, is the most important pest in the olive fruit agribusiness industry. This is because female flies lay their eggs in the unripe fruits and upon hatching the larvae feed on the fruits thus destroying them. The lack of a high-quality genome and other genomic and transcriptomic data has hindered progress in understanding the fly's biology and proposing alternative control methods to pesticide use. Results: Genomic DNA was sequenced from male and female Demokritos strain flies, maintained in the laboratory for over 45 years. We used short-, mate-pair-, and long-read sequencing technologies to generate a combined male-female genome assembly (GenBank accession GCA_001188975.2). Genomic DNA sequencing from male insects using 10x Genomics linked-reads technology followed by mate-pair and long-read scaffolding and gap-closing generated a highly contiguous 489 Mb genome with a scaffold N50 of 4.69 Mb and L50 of 30 scaffolds (GenBank accession GCA_001188975.4). RNA-seq data generated from 12 tissues and/or developmental stages allowed for genome annotation. Short reads from both males and females and the chromosome quotient method enabled identification of Y-chromosome scaffolds which were extensively validated by PCR. Conclusions: The high-quality genome generated represents a critical tool in olive fruit fly research. We provide an extensive RNA-seq data set, and genome annotation, critical towards gaining an insight into the biology of the olive fruit fly. In addition, elucidation of Y-chromosome sequences will advance our understanding of the Y-chromosome's organization, function and evolution and is poised to provide avenues for sterile insect technique approaches.
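The chromosome quotient idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the quotient is the ratio of female to male read alignments per scaffold, so that Y-derived scaffolds score near zero. The function names, the 0.3 cutoff, and the minimum male read count are hypothetical choices for illustration, not values taken from the paper, and the counts are assumed to be already normalized for sequencing depth.

```python
from typing import Dict, List, Tuple


def chromosome_quotient(female_hits: int, male_hits: int) -> float:
    """Ratio of female to male read alignments for one scaffold."""
    if male_hits == 0:
        # No male coverage at all: the scaffold cannot be called Y-derived.
        return float("inf")
    return female_hits / male_hits


def candidate_y_scaffolds(
    counts: Dict[str, Tuple[int, int]],
    max_cq: float = 0.3,       # hypothetical cutoff, chosen for illustration
    min_male_hits: int = 20,   # hypothetical floor to skip poorly covered scaffolds
) -> List[str]:
    """Return scaffold names whose quotient is at or below max_cq.

    counts maps scaffold name -> (female_hits, male_hits).
    """
    selected = []
    for name, (female_hits, male_hits) in counts.items():
        if male_hits < min_male_hits:
            continue
        if chromosome_quotient(female_hits, male_hits) <= max_cq:
            selected.append(name)
    return sorted(selected)


example_counts = {
    "scaffold_1": (100, 98),  # similar coverage in both sexes
    "scaffold_2": (2, 95),    # almost male-only coverage
    "scaffold_3": (0, 5),     # too few male reads to judge
}
print(candidate_y_scaffolds(example_counts))
```

In this toy input only `scaffold_2` passes both filters. Any candidate found this way would still need independent confirmation, such as the PCR validation the authors describe.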
Affiliation(s)
- Anthony Bayega
  - McGill University and Genome Quebec Innovation Centre, Department of Human Genetics, McGill University, Montreal, Canada
- Haig Djambazian
  - McGill University and Genome Quebec Innovation Centre, Department of Human Genetics, McGill University, Montreal, Canada
- Konstantina T. Tsoumani
  - Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
- Maria-Eleni Gregoriou
  - Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
- Efthimia Sagri
  - Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
- Eleni Drosopoulou
  - Department of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Kristina Giorda
  - Integrated DNA Technologies, Inc., 1710 Commercial Park, Coralville, Iowa, 52241 USA
- George Tsiamis
  - Department of Environmental Engineering, University of Patras, Agrinio, Greece
- Kostas Bourtzis
  - Insect Pest Control Laboratory, Joint FAO/IAEA Division of Nuclear Techniques in Food and Agriculture, Vienna, Austria
- Spyridon Oikonomopoulos
  - McGill University and Genome Quebec Innovation Centre, Department of Human Genetics, McGill University, Montreal, Canada
- Ken Dewar
  - McGill University and Genome Quebec Innovation Centre, Department of Human Genetics, McGill University, Montreal, Canada
- Deanna M. Church
  - Inscripta, Inc., 5500 Central Avenue #220, Boulder, CO 80301 USA
- Alexie Papanicolaou
  - Hawkesbury Institute for the Environment, Western Sydney University, Richmond, NSW 2753 Australia
- Kostas D. Mathiopoulos
  - Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
- Jiannis Ragoussis
  - McGill University and Genome Quebec Innovation Centre, Department of Human Genetics, McGill University, Montreal, Canada
16. Ben Ali A, Luque G, Alba E. An efficient discrete PSO coupled with a fast local search heuristic for the DNA fragment assembly problem. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2019.10.026] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 10/25/2022]
17. Hao Z, Lv D, Ge Y, Shi J, Weijers D, Yu G, Chen J. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput Sci 2020; 6:e251. [PMID: 33816903 PMCID: PMC7924719 DOI: 10.7717/peerj-cs.251] [Citation(s) in RCA: 227] [Impact Index Per Article: 56.8] [Received: 08/29/2019] [Accepted: 12/19/2019] [Indexed: 05/03/2023]
Abstract
Background: Owing to the rapid advances in DNA sequencing technologies, whole genomes from more and more species are becoming available at an increasing pace. For whole-genome analysis, idiograms provide a very popular, intuitive and effective way to map and visualize genome-wide information, such as GC content, gene and repeat density, DNA methylation distribution, genomic synteny, etc. However, most available software programs and web servers are available only for a few model species, such as human, mouse and fly, or have limited application scenarios. As more and more non-model species are sequenced and chromosome-level assemblies become available, tools that can generate idiograms for a broad range of species and visualize more data types are needed to help better understand fundamental genome characteristics. Results: The R package RIdeogram allows users to build high-quality idiograms of any species of interest. It can map continuous and discrete genome-wide data on the idiograms and visualize them in a heat map and track labels, respectively. Conclusion: The visualization of genome-wide data mapping and comparison allows users to quickly establish a clear impression of the chromosomal distribution pattern, thus making RIdeogram a useful tool for any researchers working with omics.
Affiliation(s)
- Zhaodong Hao
  - Key Laboratory of Forest Genetics & Biotechnology of Ministry of Education, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, Jiangsu, China
  - Laboratory of Biochemistry, Wageningen University, Wageningen, Haarlem, Netherlands
- Dekang Lv
  - Institute of Cancer Stem Cell, Dalian Medical University, Dalian, Liaoning, China
- Ying Ge
  - Institute of Cancer Stem Cell, Dalian Medical University, Dalian, Liaoning, China
- Jisen Shi
  - Key Laboratory of Forest Genetics & Biotechnology of Ministry of Education, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, Jiangsu, China
- Dolf Weijers
  - Laboratory of Biochemistry, Wageningen University, Wageningen, Haarlem, Netherlands
- Guangchuang Yu
  - Institute of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China
- Jinhui Chen
  - Key Laboratory of Forest Genetics & Biotechnology of Ministry of Education, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, Jiangsu, China
18. Improvement of the Pacific bluefin tuna (Thunnus orientalis) reference genome and development of male-specific DNA markers. Sci Rep 2019; 9:14450. [PMID: 31595011 PMCID: PMC6783451 DOI: 10.1038/s41598-019-50978-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/28/2019] [Accepted: 09/17/2019] [Indexed: 12/30/2022]
Abstract
The Pacific bluefin tuna, Thunnus orientalis, is a highly migratory species that is widely distributed in the North Pacific Ocean. Like other marine species, T. orientalis has no external sexual dimorphism; thus, identifying sex-specific variants from whole genome sequence data is a useful approach to develop an effective sex identification method. Here, we report an improved draft genome of T. orientalis and male-specific DNA markers. Combining PacBio long reads and Illumina short reads sufficiently improved genome assembly, with a 38-fold increase in scaffold contiguity (to 444 scaffolds) compared to the first published draft genome. Through analysing re-sequence data of 15 males and 16 females, 250 male-specific SNPs were identified from more than 30 million polymorphisms. All male-specific variants were male-heterozygous, suggesting that T. orientalis has a male heterogametic sex-determination system. The largest linkage disequilibrium block (3,174 bp on scaffold_064) contained 51 male-specific variants. PCR primers and a PCR-based sex identification assay were developed using these male-specific variants. The sex of 115 individuals (56 males and 59 females; sex was diagnosed by visual examination of the gonads) was identified with high accuracy using the assay. This easy, accurate, and practical technique facilitates the control of sex ratios in tuna farms. Furthermore, this method could be used to estimate the sex ratio and/or the sex-specific growth rate of natural populations.
19. Costessi A, van den Bogert B, May A, Ver Loren van Themaat E, Roubos JA, Kolkman MAB, Butler D, Pirovano W. Novel sequencing technologies to support industrial biotechnology. FEMS Microbiol Lett 2019; 365:4982775. [PMID: 30010862 DOI: 10.1093/femsle/fny103] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/08/2017] [Accepted: 04/19/2018] [Indexed: 12/11/2022]
Abstract
Industrial biotechnology develops and applies microorganisms for the production of bioproducts and enzymes with applications ranging from food and feed ingredients and processing to bio-based chemicals, biofuels and pharmaceutical products. Next generation DNA sequencing technologies play an increasingly important role in improving and accelerating microbial strain development for existing and novel bio-products via screening, gene and pathway discovery, metabolic engineering and additional optimization and understanding of large-scale manufacturing. In this mini-review, we describe novel DNA sequencing and analysis technologies with a focus on applications to industrial strain development, enzyme discovery and microbial community analysis.
Affiliation(s)
- Adalberto Costessi
  - Next Generation Sequencing Department, BaseClear B.V., Sylviusweg 74, 2333 BE, Leiden, The Netherlands
- Ali May
  - Bioinformatics Department, BaseClear B.V., Sylviusweg 74, 2333 BE, Leiden, The Netherlands
- Johannes A Roubos
  - DSM Biotechnology Center, DSM, Alexander Fleminglaan 1, 2600 MA, Delft, The Netherlands
- Marc A B Kolkman
  - Division of Industrial Biosciences, DuPont, Archimedesweg 30, 2300 AE, Leiden, The Netherlands
- Derek Butler
  - Bianomics Business Unit, BaseClear B.V., Sylviusweg 74, 2333 BE, Leiden, The Netherlands
- Walter Pirovano
  - Bioinformatics Department, BaseClear B.V., Sylviusweg 74, 2333 BE, Leiden, The Netherlands
20. Smits THM. The importance of genome sequence quality to microbial comparative genomics. BMC Genomics 2019; 20:662. [PMID: 31429698 PMCID: PMC6701015 DOI: 10.1186/s12864-019-6014-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 04/13/2019] [Accepted: 08/05/2019] [Indexed: 12/16/2022]
Abstract
The quality of microbial genome sequences has been a concern ever since the emergence of genome sequencing. The quality of the genome assemblies is dependent on the sequencing technology used and the aims for which the sequence was generated. Novel sequencing and bioinformatics technologies are not intrinsically better than the older technologies, although they are generally more efficient. In this correspondence, the importance for comparative genomics of additional manual assembly efforts over autoassembly and careful annotation is emphasized.
Affiliation(s)
- Theo H M Smits
  - Environmental Genomics and Systems Biology Research Group, Institute of Natural Resource Sciences (IUNR), Zurich University of Applied Sciences ZHAW, Wädenswil, Switzerland
21. Sedlazeck FJ, Lee H, Darby CA, Schatz MC. Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat Rev Genet 2019; 19:329-346. [PMID: 29599501 DOI: 10.1038/s41576-018-0003-4] [Citation(s) in RCA: 291] [Impact Index Per Article: 58.2] [Indexed: 01/11/2023]
Abstract
Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.
Affiliation(s)
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Hayan Lee
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Charlotte A Darby
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
22. Wang A, Wang Z, Li Z, Li LM. BAUM: improving genome assembly by adaptive unique mapping and local overlap-layout-consensus approach. Bioinformatics 2019; 34:2019-2028. [PMID: 29346504 DOI: 10.1093/bioinformatics/bty020] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/29/2017] [Accepted: 01/12/2018] [Indexed: 11/13/2022]
Abstract
Motivation: It is highly desirable to assemble genomes of high continuity and consistency at low cost. The current bottleneck of draft genome continuity using the second generation sequencing (SGS) reads is primarily caused by uncertainty among repetitive sequences. Even though the single-molecule real-time sequencing technology is very promising to overcome the uncertainty issue, its relatively high cost and error rate add burden on budget or computation. Many long-read assemblers take the overlap-layout-consensus (OLC) paradigm, which is less sensitive to sequencing errors, heterozygosity and variability of coverage. However, current assemblers of SGS data do not sufficiently take advantage of the OLC approach. Results: Aiming at minimizing uncertainty, the proposed method, BAUM, breaks the whole genome into regions by adaptive unique mapping; then the local OLC is used to assemble each region in parallel. BAUM can (i) perform reference-assisted assembly based on the genome of a close species or (ii) improve the results of existing assemblies that are obtained based on short or long sequencing reads. The tests on two eukaryote genomes, a wild rice Oryza longistaminata and a parrot Melopsittacus undulatus, show that BAUM achieved substantial improvement on genome size and continuity. Besides, BAUM reconstructed a considerable amount of repetitive regions that failed to be assembled by existing short read assemblers. We also propose statistical approaches to control the uncertainty in different steps of BAUM. Availability and implementation: http://www.zhanyuwang.xin/wordpress/index.php/2017/07/21/baum. Supplementary information: Supplementary data are available at Bioinformatics online.
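As a rough illustration of the overlap and layout steps of the OLC paradigm named in the abstract above, the following toy sketch finds exact suffix-prefix overlaps between reads and merges the best-overlapping pair greedily. It is not BAUM's algorithm: it assumes error-free reads on a single strand, uses greedy merging as a stand-in for the layout step, and omits the consensus step entirely. The function names and the `min_len` parameter are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ATGGCG", "GCGTAC", "TACCTT"]))
```

Greedy merging of this kind is easily misled by repeats, which is the uncertainty the abstract identifies as the main obstacle to contiguity.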
Affiliation(s)
- Anqi Wang
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhanyu Wang
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zheng Li
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Lei M Li
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
23. Larsen PA, Matocq MD. Emerging genomic applications in mammalian ecology, evolution, and conservation. J Mammal 2019. [DOI: 10.1093/jmammal/gyy184] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/22/2023]
Affiliation(s)
- Peter A Larsen
  - Department of Veterinary and Biomedical Sciences, University of Minnesota, Saint Paul, MN, USA
- Marjorie D Matocq
  - Department of Natural Resources and Environmental Science; Program in Ecology, Evolution, and Conservation Biology, University of Nevada, Reno, NV, USA
24. Wallberg A, Bunikis I, Pettersson OV, Mosbech MB, Childers AK, Evans JD, Mikheyev AS, Robertson HM, Robinson GE, Webster MT. A hybrid de novo genome assembly of the honeybee, Apis mellifera, with chromosome-length scaffolds. BMC Genomics 2019; 20:275. [PMID: 30961563 PMCID: PMC6454739 DOI: 10.1186/s12864-019-5642-0] [Citation(s) in RCA: 129] [Impact Index Per Article: 25.8] [Received: 07/27/2018] [Accepted: 03/24/2019] [Indexed: 01/27/2023]
Abstract
Background: The ability to generate long sequencing reads and access long-range linkage information is revolutionizing the quality and completeness of genome assemblies. Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera. We first generated contigs based on PacBio sequencing libraries, which were then merged with linked-read 10x Chromium data followed by scaffolding using a BioNano optical genome map and a Hi-C chromatin interaction map, complemented by a genetic linkage map. Results: Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds. The new assembly (Amel_HAv3) is significantly more contiguous and complete than the previous one (Amel_4.5), based mainly on Sanger sequencing reads. N50 of contigs is 120-fold higher (5.381 Mbp compared to 0.053 Mbp) and we anchor > 98% of the sequence to chromosomes. All of the 16 chromosomes are represented as single scaffolds with an average of three sequence gaps per chromosome. The improvements are largely due to the inclusion of repetitive sequence that was unplaced in previous assemblies. In particular, our assembly is highly contiguous across centromeres and telomeres and includes hundreds of AvaI and AluI repeats associated with these features. Conclusions: The improved assembly will be of utility for refining gene models, studying genome function, mapping functional genetic variation, identification of structural variants, and comparative genomics. Electronic supplementary material: The online version of this article (10.1186/s12864-019-5642-0) contains supplementary material, which is available to authorized users.
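Several abstracts in this list report contig or scaffold N50 (and sometimes L50) as contiguity measures. A minimal sketch of how these statistics are commonly computed follows, assuming the usual definition: sort the sequences from longest to shortest and accumulate lengths until at least half of the total assembly length is covered. The function name and the example lengths are illustrative only.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    N50 is the length of the sequence at which the running total, taken from
    longest to shortest, first reaches half of the assembly size. L50 is the
    number of sequences needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("at least one sequence length is required")


# Toy assembly of eight contigs totalling 32 units.
print(n50_l50([2, 2, 2, 3, 3, 4, 8, 8]))
```

For the toy input the two longest contigs already cover half of the total, so the function returns an N50 of 8 and an L50 of 2. A higher N50 together with a lower L50 indicates a more contiguous assembly of the same size.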
Affiliation(s)
- Andreas Wallberg
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ignas Bunikis
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Olga Vinnere Pettersson
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Mai-Britt Mosbech
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Anna K Childers
- USDA-ARS Insect Genetics and Biochemistry Research Unit, Fargo, ND, USA; USDA-ARS Bee Research Lab, Beltsville, MD, USA
- Jay D Evans
- USDA-ARS Bee Research Lab, Beltsville, MD, USA
- Hugh M Robertson
- Department of Entomology and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Gene E Robinson
- Department of Entomology and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Matthew T Webster
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
|
25
|
Talsania K, Mehta M, Raley C, Kriga Y, Gowda S, Grose C, Drew M, Roberts V, Cheng KT, Burkett S, Oeser S, Stephens R, Soppet D, Chen X, Kumar P, German O, Smirnova T, Hautman C, Shetty J, Tran B, Zhao Y, Esposito D. Genome Assembly and Annotation of the Trichoplusia ni Tni-FNL Insect Cell Line Enabled by Long-Read Technologies. Genes (Basel) 2019; 10:genes10020079. [PMID: 30678108 PMCID: PMC6409714 DOI: 10.3390/genes10020079] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 12/17/2018] [Revised: 01/09/2019] [Accepted: 01/14/2019] [Indexed: 12/22/2022]
Abstract
Background: Trichoplusia ni derived cell lines are commonly used to enable recombinant protein expression via baculovirus infection to generate materials approved for clinical use and in clinical trials. In order to develop systems biology and genome engineering tools to improve protein expression in this host, we performed de novo genome assembly of the Trichoplusia ni-derived cell line Tni-FNL. Methods: By integration of PacBio single-molecule sequencing, Bionano optical mapping, and 10X Genomics linked-reads data, we have produced a draft genome assembly of Tni-FNL. Results: Our assembly contains 280 scaffolds, with a N50 scaffold size of 2.3 Mb and a total length of 359 Mb. Annotation of the Tni-FNL genome resulted in 14,101 predicted genes and 93.2% of the predicted proteome contained recognizable protein domains. Ortholog searches within the superorder Holometabola provided further evidence of high accuracy and completeness of the Tni-FNL genome assembly. Conclusions: This first draft Tni-FNL genome assembly was enabled by complementary long-read technologies and represents a high-quality, well-annotated genome that provides novel insight into the complexity of this insect cell line and can serve as a reference for future large-scale genome engineering work in this and other similar recombinant protein production hosts.
Affiliation(s)
- Keyur Talsania
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Monika Mehta
- Cancer Research Technology Program, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Castle Raley
- Cancer Research Technology Program, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Yuliya Kriga
- Cancer Research Technology Program, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Sujatha Gowda
- Cancer Research Technology Program, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Carissa Grose
- NCI RAS Initiative, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Matthew Drew
- NCI RAS Initiative, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Veronica Roberts
- NCI RAS Initiative, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Kwong Tai Cheng
- NCI RAS Initiative, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Sandra Burkett
- Comparative Molecular Cytogenetics Core Facility, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Robert Stephens
- NCI RAS Initiative, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Daniel Soppet
- Cancer Research Technology Program, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Xiongfeng Chen
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Parimal Kumar
- Cancer Research Technology Program, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Oksana German
- Cancer Research Technology Program, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Tatyana Smirnova
- Cancer Research Technology Program, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Christopher Hautman
- Cancer Research Technology Program, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Jyoti Shetty
- Cancer Research Technology Program, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Bao Tran
- Cancer Research Technology Program, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Yongmei Zhao
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
- Dominic Esposito
- NCI RAS Initiative, Frederick National Laboratory for Cancer Research Sponsored by the National Cancer Institute, Frederick, MD 21701, USA.
|
26
|
Cook DE, Valle-Inclan JE, Pajoro A, Rovenich H, Thomma BP, Faino L. Long-Read Annotation: Automated Eukaryotic Genome Annotation Based on Long-Read cDNA Sequencing. Plant Physiology 2019; 179:38-54. [PMID: 30401722 PMCID: PMC6324239 DOI: 10.1104/pp.18.00848] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 07/16/2018] [Accepted: 10/19/2018] [Indexed: 05/16/2023]
Abstract
Single-molecule full-length complementary DNA (cDNA) sequencing can aid genome annotation by revealing transcript structure and alternative splice forms, yet current annotation pipelines do not incorporate such information. Here we present long-read annotation (LoReAn) software, an automated annotation pipeline utilizing short- and long-read cDNA sequencing, protein evidence, and ab initio prediction to generate accurate genome annotations. Based on annotations of two fungal genomes (Verticillium dahliae and Plicaturopsis crispa) and two plant genomes (Arabidopsis [Arabidopsis thaliana] and Oryza sativa), we show that LoReAn outperforms popular annotation pipelines by integrating single-molecule cDNA-sequencing data generated from either the Pacific Biosciences or MinION sequencing platforms, correctly predicting gene structure, and capturing genes missed by other annotation pipelines.
Affiliation(s)
- David E. Cook
- Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
- Jose Espejo Valle-Inclan
- Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
- Alice Pajoro
- Laboratory of Molecular Biology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
- Hanna Rovenich
- Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
- Bart P.H.J. Thomma
- Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
- Luigi Faino
- Laboratory of Phytopathology, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
|
27
|
Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. De novo assembly of haplotype-resolved genomes with trio binning. Nat Biotechnol 2018; 36:nbt.4277. [PMID: 30346939 PMCID: PMC6476705 DOI: 10.1038/nbt.4277] [Citation(s) in RCA: 270] [Impact Index Per Article: 45.0] [Received: 02/25/2018] [Accepted: 09/10/2018] [Indexed: 12/20/2022]
Abstract
Complex allelic variation hampers the assembly of haplotype-resolved sequences from diploid genomes. We developed trio binning, an approach that simplifies haplotype assembly by resolving allelic variation before assembly. In contrast with prior approaches, the effectiveness of our method improved with increasing heterozygosity. Trio binning uses short reads from two parental genomes to first partition long reads from an offspring into haplotype-specific sets. Each haplotype is then assembled independently, resulting in a complete diploid reconstruction. We used trio binning to recover both haplotypes of a diploid human genome and identified complex structural variants missed by alternative approaches. We sequenced an F1 cross between the cattle subspecies Bos taurus taurus and Bos taurus indicus and completely assembled both parental haplotypes with NG50 haplotig sizes of >20 Mb and 99.998% accuracy, surpassing the quality of current cattle reference genomes. We suggest that trio binning improves diploid genome assembly and will facilitate new studies of haplotype variation and inheritance.
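The partitioning step described in this abstract (parental short reads are used to sort offspring long reads into haplotype-specific sets before assembly) can be illustrated with a toy sketch. The version below assumes that binning is done by counting k-mers found in only one parent; the function names, the tiny k, and the sequences are all hypothetical, and this is not the authors' implementation.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental reads."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign one offspring long read to a haplotype bin.

    The read goes to whichever parent contributes more parent-specific
    k-mers; ties (including reads with none) stay unassigned.
    """
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"


# Toy data: the parents differ at a single base (A vs T at position 5).
K = 4  # unrealistically small k, chosen only to keep the example readable
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], K)
for long_read in ["CGTACGTA", "GTTCGTAA", "ACGT"]:
    print(long_read, bin_read(long_read, mat_only, pat_only, K))
```

In this sketch each bin would then be passed to an assembler separately, which mirrors the abstract's point that higher heterozygosity helps: the more the parents differ, the more parent-specific k-mers each long read carries.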
Affiliation(s)
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
- Brian P. Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
- Alexander T. Dilthey
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
- Institute of Medical Microbiology, Heinrich-Heine-University Düsseldorf, Düsseldorf, North Rhine-Westphalia, Germany
- Derek M. Bickhart
- Cell Wall Biology and Utilization Laboratory, ARS USDA, Madison, Wisconsin, USA
- Stefan Hiendleder
- Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy SA, Australia
- Robinson Research Institute, The University of Adelaide, Adelaide SA, Australia
- John L. Williams
- Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy SA, Australia
- Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
|
28
|
Guppy JL, Jones DB, Jerry DR, Wade NM, Raadsma HW, Huerlimann R, Zenger KR. The State of "Omics" Research for Farmed Penaeids: Advances in Research and Impediments to Industry Utilization. Front Genet 2018; 9:282. [PMID: 30123237 PMCID: PMC6085479 DOI: 10.3389/fgene.2018.00282] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 05/14/2018] [Accepted: 07/09/2018] [Indexed: 12/19/2022]
Abstract
Elucidating the underlying genetic drivers of production traits in agricultural and aquaculture species is critical to efforts to maximize farming efficiency. "Omics" based methods (i.e., transcriptomics, genomics, proteomics, and metabolomics) are increasingly being applied to gain unprecedented insight into the biology of many aquaculture species. While the culture of penaeid shrimp has increased markedly, the industry continues to be impeded in many regards by disease, reproductive dysfunction, and a poor understanding of production traits. Extensive effort has been, and continues to be, applied to develop critical genomic resources for many commercially important penaeids. However, the industry application of these genomic resources, and the translation of the knowledge derived from "omics" studies has not yet been completely realized. Integration between the multiple "omics" resources now available (i.e., genome assemblies, transcriptomes, linkage maps, optical maps, and proteomes) will prove critical to unlocking the full utility of these otherwise independently developed and isolated resources. Furthermore, emerging "omics" based techniques are now available to address longstanding issues with completing keystone genome assemblies (e.g., through long-read sequencing), and can provide cost-effective industrial scale genotyping tools (e.g., through low density SNP chips and genotype-by-sequencing) to undertake advanced selective breeding programs (i.e., genomic selection) and powerful genome-wide association studies. In particular, this review highlights the status, utility and suggested path forward for continued development, and improved use of "omics" resources in penaeid aquaculture.
Affiliation(s)
- Jarrod L. Guppy
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- College of Science and Engineering and Centre for Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, QLD, Australia
- David B. Jones
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- College of Science and Engineering and Centre for Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, QLD, Australia
- Dean R. Jerry
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- College of Science and Engineering and Centre for Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, QLD, Australia
- Nicholas M. Wade
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- Aquaculture Program, CSIRO Agriculture & Food, Queensland Bioscience Precinct, St Lucia, QLD, Australia
- Herman W. Raadsma
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- Faculty of Science, Sydney School of Veterinary Science, The University of Sydney, Camden, NSW, Australia
- Roger Huerlimann
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- College of Science and Engineering and Centre for Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, QLD, Australia
- Kyall R. Zenger
- Australian Research Council Industrial Transformation Research Hub for Advanced Prawn Breeding, James Cook University, Townsville, QLD, Australia
- College of Science and Engineering and Centre for Sustainable Tropical Fisheries and Aquaculture, James Cook University, Townsville, QLD, Australia
|
29
|
An D, Li C, Zhou Y, Wu Y, Wang W. Genomes and Transcriptomes of Duckweeds. Front Chem 2018; 6:230. [PMID: 29974050 PMCID: PMC6019479 DOI: 10.3389/fchem.2018.00230] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 01/11/2018] [Accepted: 05/31/2018] [Indexed: 11/23/2022]
Abstract
Duckweeds (Lemnaceae family) are the smallest flowering plants that adapt to the aquatic environment. They are regarded as the promising sustainable feedstock with the characteristics of high starch storage, fast propagation, and global distribution. The duckweed genome size varies 13-fold ranging from 150 Mb in Spirodela polyrhiza to 1,881 Mb in Wolffia arrhiza. With the development of sequencing technology and bioinformatics, five duckweed genomes from Spirodela and Lemna genera are sequenced and assembled. The genome annotations discover that they share similar protein orthologs, whereas the repeat contents could mainly explain the genome size difference. The gene families responsible for cell growth and expansion, lignin biosynthesis, and flowering are greatly contracted. However, the gene family of glutamate synthase has experienced expansion, indicating their significance in ammonia assimilation and nitrogen transport. The transcriptome is comprehensively sequenced for the genera of Spirodela, Landoltia, and Lemna, including various treatments such as abscisic acid, radiation, heavy metal, and starvation. The analysis of the underlying molecular mechanism and the regulatory network would accelerate their applications in the fields of bioenergy and phytoremediation. The comparative genomics has shown that duckweed genomes contain relatively low gene numbers and more contracted gene families, which may be in parallel with their highly reduced morphology with a simple leaf and primary roots. Still, we are waiting for the advancement of the long read sequencing technology to resolve the complex genomes and transcriptomes for unsequenced Wolffiella and Wolffia due to the large genome sizes and the similarity in their polyploidy.
Affiliation(s)
- Dong An
- Department of Plant Sciences, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Changsheng Li
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Yong Zhou
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Yongrui Wu
- National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Shanghai Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Wenqin Wang
- Department of Plant Sciences, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
|
30
|
Campbell CR, Poelstra JW, Yoder AD. What is Speciation Genomics? The roles of ecology, gene flow, and genomic architecture in the formation of species. Biol J Linn Soc Lond 2018. [DOI: 10.1093/biolinnean/bly063] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Indexed: 01/24/2023]
Affiliation(s)
- J W Poelstra
- Department of Biology, Duke University, Durham, NC, USA
- Anne D Yoder
- Department of Biology, Duke University, Durham, NC, USA
|
31
|
Berthelier J, Casse N, Daccord N, Jamilloux V, Saint-Jean B, Carrier G. A transposable element annotation pipeline and expression analysis reveal potentially active elements in the microalga Tisochrysis lutea. BMC Genomics 2018. [PMID: 29783941 DOI: 10.17882/52231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract
BACKGROUND Transposable elements (TEs) are mobile DNA sequences known as drivers of genome evolution. Their impacts have been widely studied in animals, plants and insects, but little is known about them in microalgae. In a previous study, we compared the genetic polymorphisms between strains of the haptophyte microalga Tisochrysis lutea and suggested the involvement of active autonomous TEs in their genome evolution. RESULTS To identify potentially autonomous TEs, we designed a pipeline named PiRATE (Pipeline to Retrieve and Annotate Transposable Elements, download: https://doi.org/10.17882/51795 ), and conducted an accurate TE annotation on a new genome assembly of T. lutea. PiRATE is composed of detection, classification and annotation steps. Its detection step combines multiple, existing analysis packages representing all major approaches for TE detection and its classification step was optimized for microalgal genomes. The efficiency of the detection and classification steps was evaluated with data on the model species Arabidopsis thaliana. PiRATE detected 81% of the TE families of A. thaliana and correctly classified 75% of them. We applied PiRATE to T. lutea genomic data and established that its genome contains 15.89% Class I and 4.95% Class II TEs. In these, 3.79 and 17.05% correspond to potentially autonomous and non-autonomous TEs, respectively. Annotation data was combined with transcriptomic and proteomic data to identify potentially active autonomous TEs. We identified 17 expressed TE families and, among these, a TIR/Mariner and a TIR/hAT family were able to synthesize their transposase. Both these TE families were among the three highest expressed genes in a previous transcriptomic study and are composed of highly similar copies throughout the genome of T. lutea. This sum of evidence reveals that both these TE families could be capable of transposing or triggering the transposition of potential related MITE elements. 
CONCLUSION This manuscript provides an example of a de novo transposable element annotation of a non-model organism characterized by a fragmented genome assembly and belonging to a poorly studied phylum at genomic level. Integration of multi-omics data enabled the discovery of potential mobile TEs and opens the way for new discoveries on the role of these repeated elements in genomic evolution of microalgae.
Affiliation(s)
- Jérémy Berthelier
- IFREMER, Physiology and Biotechnology of Algae Laboratory, rue de l'Ile d'Yeu, 44311, Nantes, France
- Mer Molécules Santé, EA 2160 IUML - FR 3473 CNRS, Le Mans University, Le Mans, France
- Nathalie Casse
- Mer Molécules Santé, EA 2160 IUML - FR 3473 CNRS, Le Mans University, Le Mans, France
- Nicolas Daccord
- Institut de Recherche en Horticulture et Semences, INRA of Angers, AGROCAMPUS-Ouest, SFR4207 QUASAV, Université d'Angers, Angers, France
- Université Bretagne Loire, Angers, France
- Bruno Saint-Jean
- IFREMER, Physiology and Biotechnology of Algae Laboratory, rue de l'Ile d'Yeu, 44311, Nantes, France
- Grégory Carrier
- IFREMER, Physiology and Biotechnology of Algae Laboratory, rue de l'Ile d'Yeu, 44311, Nantes, France
|
32
|
Berthelier J, Casse N, Daccord N, Jamilloux V, Saint-Jean B, Carrier G. A transposable element annotation pipeline and expression analysis reveal potentially active elements in the microalga Tisochrysis lutea. BMC Genomics 2018; 19:378. [PMID: 29783941 PMCID: PMC5963040 DOI: 10.1186/s12864-018-4763-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 12/08/2017] [Accepted: 05/07/2018] [Indexed: 01/01/2023]
Abstract
Background: Transposable elements (TEs) are mobile DNA sequences known as drivers of genome evolution. Their impacts have been widely studied in animals, plants and insects, but little is known about them in microalgae. In a previous study, we compared the genetic polymorphisms between strains of the haptophyte microalga Tisochrysis lutea and suggested the involvement of active autonomous TEs in their genome evolution. Results: To identify potentially autonomous TEs, we designed a pipeline named PiRATE (Pipeline to Retrieve and Annotate Transposable Elements, download: 10.17882/51795), and conducted an accurate TE annotation on a new genome assembly of T. lutea. PiRATE is composed of detection, classification and annotation steps. Its detection step combines multiple, existing analysis packages representing all major approaches for TE detection and its classification step was optimized for microalgal genomes. The efficiency of the detection and classification steps was evaluated with data on the model species Arabidopsis thaliana. PiRATE detected 81% of the TE families of A. thaliana and correctly classified 75% of them. We applied PiRATE to T. lutea genomic data and established that its genome contains 15.89% Class I and 4.95% Class II TEs. In these, 3.79 and 17.05% correspond to potentially autonomous and non-autonomous TEs, respectively. Annotation data was combined with transcriptomic and proteomic data to identify potentially active autonomous TEs. We identified 17 expressed TE families and, among these, a TIR/Mariner and a TIR/hAT family were able to synthesize their transposase. Both these TE families were among the three highest expressed genes in a previous transcriptomic study and are composed of highly similar copies throughout the genome of T. lutea. This sum of evidence reveals that both these TE families could be capable of transposing or triggering the transposition of potential related MITE elements.
Conclusion: This manuscript provides an example of a de novo transposable element annotation of a non-model organism characterized by a fragmented genome assembly and belonging to a poorly studied phylum at genomic level. Integration of multi-omics data enabled the discovery of potential mobile TEs and opens the way for new discoveries on the role of these repeated elements in genomic evolution of microalgae.
Affiliation(s)
- Jérémy Berthelier
- IFREMER, Physiology and Biotechnology of Algae Laboratory, rue de l'Ile d'Yeu, 44311, Nantes, France; Mer Molécules Santé, EA 2160 IUML - FR 3473 CNRS, Le Mans University, Le Mans, France
- Nathalie Casse
- Mer Molécules Santé, EA 2160 IUML - FR 3473 CNRS, Le Mans University, Le Mans, France
- Nicolas Daccord
- Institut de Recherche en Horticulture et Semences, INRA of Angers, AGROCAMPUS-Ouest, SFR4207 QUASAV, Université d'Angers, Angers, France; Université Bretagne Loire, Angers, France
- Bruno Saint-Jean
- IFREMER, Physiology and Biotechnology of Algae Laboratory, rue de l'Ile d'Yeu, 44311, Nantes, France
- Grégory Carrier
- IFREMER, Physiology and Biotechnology of Algae Laboratory, rue de l'Ile d'Yeu, 44311, Nantes, France
|
33
|
Same-Sex Twin Pair Phenotypic Correlations are Consistent with Human Y Chromosome Promoting Phenotypic Heterogeneity. Evol Biol 2018. [DOI: 10.1007/s11692-018-9454-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
|
34
|
Integrity, standards, and QC-related issues with big data in pre-clinical drug discovery. Biochem Pharmacol 2018; 152:84-93. [PMID: 29551586 DOI: 10.1016/j.bcp.2018.03.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/02/2018] [Accepted: 03/13/2018] [Indexed: 11/21/2022]
Abstract
The tremendous expansion of data analytics and public and private big datasets presents an important opportunity for pre-clinical drug discovery and development. In the field of life sciences, the growth of genetic, genomic, transcriptomic and proteomic data is partly driven by a rapid decline in experimental costs as biotechnology improves throughput, scalability, and speed. Yet far too many researchers tend to underestimate the challenges and consequences involving data integrity and quality standards. Given the effect of data integrity on scientific interpretation, these issues have significant implications during preclinical drug development. We describe standardized approaches for maximizing the utility of publicly available or privately generated biological data and address some of the common pitfalls. We also discuss the increasing interest to integrate and interpret cross-platform data. Principles outlined here should serve as a useful broad guide for existing analytical practices and pipelines and as a tool for developing additional insights into therapeutics using big data.
|
35
|
Li C, Lin F, An D, Wang W, Huang R. Genome Sequencing and Assembly by Long Reads in Plants. Genes (Basel) 2017; 9:E6. [PMID: 29283420 PMCID: PMC5793159 DOI: 10.3390/genes9010006] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 11/20/2017] [Revised: 12/18/2017] [Accepted: 12/18/2017] [Indexed: 11/17/2022]
Abstract
Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the application as well as considerations of experimental design in TGS of plant genomes. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that the informative guidance for genome sequencing and assembly by long reads will benefit the initiation of scientists' projects.
Collapse
Affiliation(s)
- Changsheng Li
- College of Agronomy, Shenyang Agricultural University, 120 Dongling Road, Shenyang 110866, China.
| | - Feng Lin
- College of Bioscience and Biotechnology, Shenyang Agricultural University, 120 Dongling Road, Shenyang 110866, China.
| | - Dong An
- School of Agriculture and Biology, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai 200240, China.
| | - Wenqin Wang
- School of Agriculture and Biology, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai 200240, China.
| | - Ruidong Huang
- College of Agronomy, Shenyang Agricultural University, 120 Dongling Road, Shenyang 110866, China.
|
36
|
Gan HM, Lee YP, Austin CM. Nanopore Long-Read Guided Complete Genome Assembly of Hydrogenophaga intermedia, and Genomic Insights into 4-Aminobenzenesulfonate, p-Aminobenzoic Acid and Hydrogen Metabolism in the Genus Hydrogenophaga. Front Microbiol 2017; 8:1880. [PMID: 29046667 PMCID: PMC5632844 DOI: 10.3389/fmicb.2017.01880] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 07/12/2017] [Accepted: 09/14/2017] [Indexed: 11/13/2022]
Abstract
We improved upon the previously reported draft genome of Hydrogenophaga intermedia strain PBC, a 4-aminobenzenesulfonate-degrading bacterium, by supplementing the assembly with Nanopore long reads, which enabled the reconstruction of the genome as a single contig. From the complete genome, major genes responsible for the catabolism of 4-aminobenzenesulfonate in strain PBC are clustered in two distinct genomic regions. Although the catabolic genes for 4-sulfocatechol, the deaminated product of 4-aminobenzenesulfonate, are only found in H. intermedia, the sad operon responsible for the first deamination step of 4-aminobenzenesulfonate is conserved in various Hydrogenophaga strains. The absence of the pabB gene in the complete genome of H. intermedia PBC is consistent with its p-aminobenzoic acid (pABA) auxotrophy, but surprisingly, comparative genomics analysis of 14 Hydrogenophaga genomes indicates that pABA auxotrophy is not an uncommon feature among members of this genus. Of even more interest, several Hydrogenophaga strains do not possess the genomic potential for hydrogen oxidation, calling for a revision to the taxonomic description of Hydrogenophaga as "hydrogen eating bacteria."
Affiliation(s)
- Han M Gan
- Centre for Integrative Ecology, School of Life and Environmental Sciences, Deakin University, Geelong, VIC, Australia; Genomics Facility, Tropical Medicine and Biology Platform, Monash University Malaysia, Bandar Sunway, Malaysia; School of Science, Monash University Malaysia, Bandar Sunway, Malaysia
| | - Yin P Lee
- Genomics Facility, Tropical Medicine and Biology Platform, Monash University Malaysia, Bandar Sunway, Malaysia; School of Science, Monash University Malaysia, Bandar Sunway, Malaysia
| | - Christopher M Austin
- Centre for Integrative Ecology, School of Life and Environmental Sciences, Deakin University, Geelong, VIC, Australia; Genomics Facility, Tropical Medicine and Biology Platform, Monash University Malaysia, Bandar Sunway, Malaysia; School of Science, Monash University Malaysia, Bandar Sunway, Malaysia
|
37
|
Fuentes-Pardo AP, Ruzzante DE. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations. Mol Ecol 2017; 26:5369-5406. [PMID: 28746784 DOI: 10.1111/mec.14264] [Citation(s) in RCA: 158] [Impact Index Per Article: 22.6] [Received: 04/19/2017] [Revised: 06/23/2017] [Accepted: 06/28/2017] [Indexed: 12/14/2022]
Abstract
Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology.
|
38
|
Schmutzer T, Bolger ME, Rudd S, Chen J, Gundlach H, Arend D, Oppermann M, Weise S, Lange M, Spannagl M, Usadel B, Mayer KFX, Scholz U. Bioinformatics in the plant genomic and phenomic domain: The German contribution to resources, services and perspectives. J Biotechnol 2017; 261:37-45. [PMID: 28698099 DOI: 10.1016/j.jbiotec.2017.07.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/14/2017] [Revised: 06/30/2017] [Accepted: 07/04/2017] [Indexed: 10/19/2022]
Abstract
Plant genetic resources are a substantial opportunity for plant breeding, preservation and maintenance of biological diversity. As part of the German Network for Bioinformatics Infrastructure (de.NBI) the German Crop BioGreenformatics Network (GCBN) focuses mainly on crop plants and provides both data and software infrastructure which are tailored to the needs of the plant research community. Our mission and key objectives include: (1) provision of transparent access to germplasm seeds, (2) the delivery of improved workflows for plant gene annotation, and (3) implementation of bioinformatics services that link genotypes and phenotypes. This review introduces the GCBN's spectrum of web-services and integrated data resources that address common research problems in the plant genomics community.
Affiliation(s)
- Thomas Schmutzer
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
| | - Marie E Bolger
- Forschungszentrum Jülich (FZJ), Institute of Bio- and Geosciences (IBG-2) Plant Sciences, Wilhelm-Johnen-Straße, 52425 Jülich, Germany
| | - Stephen Rudd
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
| | - Jinbo Chen
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
| | - Heidrun Gundlach
- Helmholtz Zentrum München (HMGU), Plant Genome and Systems Biology (PGSB), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
| | - Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
| | - Markus Oppermann
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
| | - Stephan Weise
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
| | - Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
| | - Manuel Spannagl
- Helmholtz Zentrum München (HMGU), Plant Genome and Systems Biology (PGSB), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
| | - Björn Usadel
- Forschungszentrum Jülich (FZJ), Institute of Bio- and Geosciences (IBG-2) Plant Sciences, Wilhelm-Johnen-Straße, 52425 Jülich, Germany
| | - Klaus F X Mayer
- Helmholtz Zentrum München (HMGU), Plant Genome and Systems Biology (PGSB), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; School of Life Sciences Weihenstephan, Technical University of Munich, Alte Akademie 8, 85354 Freising, Germany
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany.
|
39
|
Abstract
Humans and other mammals are limited in their natural abilities to regenerate lost body parts. By contrast, many salamanders are highly regenerative and can spontaneously replace lost limbs even as adults. Because salamander limbs are anatomically similar to human limbs, knowing how they regenerate should provide important clues for regenerative medicine. Although interest in understanding the mechanics of this process has never wavered, until recently researchers have been vexed by seemingly impenetrable logistics of working with these creatures at a molecular level. Chief among the problems has been the very large size of salamander genomes, and not a single salamander genome has been fully sequenced to date. Recently the enormous gap in sequence information has been bridged by approaches that leverage mRNA as the starting point. Together with functional experimentation, these data are rapidly enabling researchers to finally uncover the molecular mechanisms underpinning the astonishing biological process of limb regeneration.
Affiliation(s)
- Brian J Haas
- Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Klarman Cell Observatory, 415 Main Street, Cambridge, MA 02142, USA.
| | - Jessica L Whited
- Harvard Medical School, Harvard Stem Cell Institute, and Brigham and Women's Hospital Department of Orthopedic Surgery, 60 Fenwood Road, Boston, MA 02115, USA.
|