1
Raymond BB, Guenzi-Tiberi P, Maréchal E, Quarmby LM. Snow alga Sanguina aurantia as revealed through de novo genome assembly and annotation. G3 (BETHESDA, MD.) 2024; 14:jkae181. [PMID: 39093299 PMCID: PMC11457085 DOI: 10.1093/g3journal/jkae181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 03/18/2024] [Accepted: 07/06/2024] [Indexed: 08/04/2024]
Abstract
To thrive on melting alpine and polar snow, some Chlorophytes produce an abundance of astaxanthin, causing red blooms, often dominated by genus Sanguina. The red cells have not been cultured, but we recently grew a green biciliate conspecific with Sanguina aurantia from a sample of watermelon snow. This culture provided source material for Oxford Nanopore Technology and Illumina sequencing. Our assembly pipeline exemplifies the value of a hybrid long- and short-read approach for the complexities of working with a culture grown from a field sample. Using bioinformatic tools, we separated assembled contigs into 2 genomic pools based on a difference in GC content (57.5 and 55.1%). We present the data as 2 assemblies of S. aurantia variants but explore other possibilities. High-throughput chromatin conformation capture analysis (Hi-C sequencing) was used to scaffold the assemblies into a 96-Mb genome designated as "A" and a 102-Mb genome designated as "B." Both assemblies are highly contiguous: genome A consists of 38 scaffolds with an N50 of 5.4 Mb, while genome B has 50 scaffolds with an N50 of 6.4 Mb. RNA sequencing was used to improve gene annotation.
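The abstract summarizes the assemblies with two standard statistics: GC content, which was used to separate contigs into two pools, and N50. The sketch below shows how these can be computed. It is only an illustration. The 56.3% cut-off is an assumption (the midpoint of the reported 57.5% and 55.1% pool values), and the function names are hypothetical. This is not the authors' pipeline.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def n50(lengths: list[int]) -> int:
    """Length L such that contigs/scaffolds of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases until half is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def split_by_gc(contigs: dict[str, str], threshold: float = 56.3) -> tuple[list[str], list[str]]:
    """Bin contig names into (high-GC, low-GC) pools; the threshold is an assumed cut-off."""
    high, low = [], []
    for name, seq in contigs.items():
        (high if gc_content(seq) >= threshold else low).append(name)
    return high, low


if __name__ == "__main__":
    print(n50([5, 4, 3, 2, 1]))  # cumulative 5, 9 >= 15/2, so N50 = 4
    print(split_by_gc({"ctg1": "GGGCCC", "ctg2": "AATTGC"}))
```

A fixed GC threshold is the simplest possible separation. In practice, pools of this kind are usually defined with additional evidence such as read coverage or Hi-C contacts.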
Affiliation(s)
- Breanna B Raymond
- Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
- Pierre Guenzi-Tiberi
- Laboratoire de Physiologie Cellulaire et Végétale, CNRS, CEA, INRAE, Université Grenoble Alpes, IRIG, CEA Grenoble, 17 Avenue des Martyrs, 38000 Grenoble, France
- Eric Maréchal
- Laboratoire de Physiologie Cellulaire et Végétale, CNRS, CEA, INRAE, Université Grenoble Alpes, IRIG, CEA Grenoble, 17 Avenue des Martyrs, 38000 Grenoble, France
- Lynne M Quarmby
- Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
2
de Ronne M, Légaré G, Belzile F, Boyle B, Torkamaneh D. 3D-GBS: a universal genotyping-by-sequencing approach for genomic selection and other high-throughput low-cost applications in species with small to medium-sized genomes. PLANT METHODS 2023; 19:13. [PMID: 36740716 PMCID: PMC9899395 DOI: 10.1186/s13007-023-00990-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 01/31/2023] [Indexed: 06/18/2023]
Abstract
Despite the increased efficiency of sequencing technologies and the development of reduced-representation sequencing (RRS) approaches that allow high-throughput sequencing (HTS) of multiplexed samples, the per-sample genotyping cost remains the most limiting factor in large-scale studies. For example, in genomic selection (GS), breeders need genome-wide markers to predict the breeding value of large cohorts of progenies, which requires genotyping thousands of candidates. Here, we introduce 3D-GBS, an optimized GBS procedure that provides an ultra-high-throughput, ultra-low-cost genotyping solution for species with small to medium-sized genomes, and we illustrate its use in soybean. Using a combination of three restriction enzymes (PstI/NsiI/MspI), the portion of the genome captured was reduced fourfold compared with a "standard" ApeKI-based protocol, while the number of markers was reduced by only 40%. By better focusing the sequencing effort on a limited set of restriction fragments, fourfold more samples can be genotyped at the same minimal depth of coverage. This GBS protocol also yielded a lower proportion of missing data and a more uniform distribution of SNPs across the genome. Moreover, we investigated the optimal number of reads per sample needed to obtain an adequate number of markers for GS and QTL mapping (500-1000 markers per biparental cross). This optimization reduces sequencing costs by ~92% and ~86% for GS and QTL mapping studies, respectively, compared with previously published work. Overall, 3D-GBS represents a unique and affordable solution for applications that require extremely high-throughput genotyping where cost remains the most limiting factor.
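The claim that capturing a fourfold smaller portion of the genome lets fourfold more samples share a run at the same minimal depth is a simple proportionality. The sketch below makes it concrete under strong simplifying assumptions: reads spread evenly over the captured fragments, and no losses to adapters, duplicates or off-target reads. The function name and all numbers are hypothetical and are not taken from the study.

```python
def samples_per_run(run_reads: int, fragments: int, min_depth: int) -> int:
    """Samples that fit in one run if each needs `fragments * min_depth` reads.

    Assumption: reads are distributed evenly across captured fragments, with
    no loss to adapters, PCR duplicates, or off-target sequence.
    """
    reads_per_sample = fragments * min_depth
    return run_reads // reads_per_sample


if __name__ == "__main__":
    run = 400_000_000  # hypothetical reads per sequencing run
    standard = samples_per_run(run, fragments=200_000, min_depth=5)  # hypothetical
    reduced = samples_per_run(run, fragments=50_000, min_depth=5)    # 4x fewer fragments
    print(standard, reduced, reduced // standard)  # 400 1600 4
```

Holding run output and target depth fixed, the number of multiplexed samples scales inversely with the number of fragments that must be covered. This is why shrinking the captured fraction matters more for per-sample cost than the modest 40% loss in markers.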
Affiliation(s)
- Maxime de Ronne
- Département de Phytologie, Université Laval, Quebec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec, Canada
- Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Quebec, Canada
- Gaétan Légaré
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec, Canada
- François Belzile
- Département de Phytologie, Université Laval, Quebec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec, Canada
- Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Quebec, Canada
- Brian Boyle
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec, Canada
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Quebec, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec, Canada
- Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Quebec, Canada
- Institut intelligence et données (IID), Université Laval, Quebec, Canada
3
Vu HT, Tran N, Nguyen TD, Vu QL, Bui MH, Le MT, Le L. Complete Chloroplast Genome of Paphiopedilum delenatii and Phylogenetic Relationships among Orchidaceae. PLANTS (BASEL, SWITZERLAND) 2020; 9:E61. [PMID: 31906501 PMCID: PMC7020410 DOI: 10.3390/plants9010061] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/23/2019] [Revised: 12/23/2019] [Accepted: 12/25/2019] [Indexed: 02/05/2023]
Abstract
Paphiopedilum delenatii is an orchid native to Vietnam with highly attractive floral traits. Unfortunately, it is now listed as a critically endangered species, with only a few hundred individuals remaining in nature. In this study, we performed next-generation sequencing of P. delenatii and assembled its complete chloroplast genome. The whole chloroplast genome of P. delenatii was 160,955 bp in size, with a GC content of 35.6%, and exhibited the typical quadripartite structure of plastid genomes, with four distinct regions: the large and small single-copy regions and a pair of inverted repeat regions. In total, 130 genes were annotated in the genome: 77 coding genes, 39 tRNA genes, 8 rRNA genes, and 6 pseudogenes. The loss of ndh genes and the variation in inverted repeat (IR) boundaries, together with data on simple sequence repeats (SSRs) and divergent hotspots, provided useful information for identification applications and phylogenetic studies of Paphiopedilum species. Whole chloroplast genomes could be used as an effective super-barcode for species identification or for developing other identification markers, which in turn would support the conservation of Paphiopedilum species.
Affiliation(s)
- Huyen-Trang Vu
- Faculty of Biotechnology, Nguyen Tat Thanh University, District 4, Hochiminh City 72820, Vietnam
- Faculty of Biotechnology, International University-Vietnam National University, Linh Trung Ward, Thu Duc District, Hochiminh City 7000000, Vietnam
- Ngan Tran
- Faculty of Biotechnology, International University-Vietnam National University, Linh Trung Ward, Thu Duc District, Hochiminh City 7000000, Vietnam
- Thanh-Diem Nguyen
- Faculty of Biotechnology, Nguyen Tat Thanh University, District 4, Hochiminh City 72820, Vietnam
- Quoc-Luan Vu
- Tay Nguyen Institute for Scientific Research, Vietnam Academy of Science and Technology, Dalat 670000, Vietnam
- My-Huyen Bui
- Faculty of Biotechnology, Nguyen Tat Thanh University, District 4, Hochiminh City 72820, Vietnam
- Minh-Tri Le
- Faculty of Biotechnology, International University-Vietnam National University, Linh Trung Ward, Thu Duc District, Hochiminh City 7000000, Vietnam
- Ly Le
- Faculty of Biotechnology, International University-Vietnam National University, Linh Trung Ward, Thu Duc District, Hochiminh City 7000000, Vietnam
- Vingroup Big Data Institute, Hai Ba Trung District, Hanoi 100000, Vietnam
4
Cheng CY, Tseng WL, Chang CF, Chang CH, Gau SSF. A Deep Learning Approach for Missing Data Imputation of Rating Scales Assessing Attention-Deficit Hyperactivity Disorder. Front Psychiatry 2020; 11:673. [PMID: 32765316 PMCID: PMC7379397 DOI: 10.3389/fpsyt.2020.00673] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/18/2020] [Accepted: 06/29/2020] [Indexed: 02/03/2023]
Abstract
A variety of tools and methods have been used to measure the behavioral symptoms of attention-deficit/hyperactivity disorder (ADHD). Missing data are a major concern in ADHD behavioral studies. This study used a deep learning method to impute missing data in ADHD rating scales and evaluated the ability of the imputed dataset (i.e., the dataset in which imputed values replace the original missing values) to distinguish youths with ADHD from youths without ADHD. The data were collected from 1220 youths recruited in Northern Taiwan: 799 with an ADHD diagnosis and 421 typically developing (TD) youths without ADHD. Participants were assessed using the Conners' Continuous Performance Test, the Chinese versions of the Conners' Rating Scale-Revised: Short Form for parent and teacher reports, and the Swanson, Nolan, and Pelham, Version IV scale for parent and teacher reports. We used deep learning, with information from the original complete dataset (referred to as the reference dataset), to perform missing data imputation and to generate an imputation order according to the imputation accuracy of each question. We evaluated the effectiveness of imputation by using a support vector machine to classify the ADHD and TD groups in the imputed dataset. The imputed dataset classified ADHD vs. TD with up to 89% accuracy, which did not differ from the classification accuracy (89%) obtained with the reference dataset. Most items related to oppositional behaviors rated by teachers, and to hyperactivity/impulsivity rated by both parents and teachers, showed high accuracy in discriminating ADHD from non-ADHD youths. Our findings support a deep learning solution for missing data imputation that does not introduce bias into the data.
Affiliation(s)
- Chung-Yuan Cheng
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Department of Psychiatry, National Taiwan University Hospital and College of Medicine, Taipei, Taiwan
- Wan-Ling Tseng
- Child Study Center, Yale University School of Medicine, New Haven, CT, United States
- Ching-Fen Chang
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Chuan-Hsiung Chang
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Susan Shur-Fen Gau
- Department of Psychiatry, National Taiwan University Hospital and College of Medicine, Taipei, Taiwan
- Graduate Institute of Brain and Mind Sciences, and Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan