1
Li L, Sun M, Wang J, Wan S. Multi-omics based artificial intelligence for cancer research. Adv Cancer Res 2024; 163:303-356. [PMID: 39271266 DOI: 10.1016/bs.acr.2024.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/15/2024]
Abstract
With significant advances in next generation sequencing technologies, large amounts of multi-omics data, including genomics, epigenomics, transcriptomics, proteomics, and metabolomics, have been accumulated, offering an unprecedented opportunity to explore the heterogeneity and complexity of cancer across various molecular levels and scales. One of the promising aspects of multi-omics lies in its capacity to offer a holistic view of the biological networks and pathways underpinning cancer, facilitating a deeper understanding of its development, progression, and response to treatment. However, the exponential growth of data generated by multi-omics studies presents significant analytical challenges. Processing, analyzing, integrating, and interpreting these multi-omics datasets to extract meaningful insights is an ambitious task that stands at the forefront of current cancer research. The application of artificial intelligence (AI) has emerged as a powerful solution to these challenges, demonstrating exceptional capabilities in deciphering complex patterns and extracting valuable information from large-scale, intricate omics datasets. This review delves into the synergy of AI and multi-omics, highlighting its revolutionary impact on oncology. We dissect how this confluence is reshaping the landscape of cancer research and clinical practice, particularly in the realms of early detection, diagnosis, prognosis, treatment and pathology. Additionally, we elaborate on the latest AI methods for multi-omics integration to provide comprehensive insight into the complex biological mechanisms and inherent heterogeneity of cancer. Finally, we discuss the current challenges of data harmonization, algorithm interpretability, and ethical considerations. Addressing these challenges necessitates multidisciplinary collaboration, paving the way for more precise, personalized, and effective treatments for cancer patients.
Affiliation(s)
- Lusheng Li
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States
- Mengtao Sun
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States
- Jieqiong Wang
- Department of Neurological Sciences, University of Nebraska Medical Center, Omaha, NE, United States
- Shibiao Wan
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States.
2
Wu H, Yang Z, Wang J, Bu Y, Wang Y, Xu K, Li J, Yan C, Liu D, Han Y. Exploring shared therapeutic targets in diabetic cardiomyopathy and diabetic foot ulcers through bioinformatics analysis. Sci Rep 2024; 14:230. [PMID: 38168477 PMCID: PMC10761883 DOI: 10.1038/s41598-023-50954-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 12/28/2023] [Indexed: 01/05/2024] Open
Abstract
Advanced diabetic cardiomyopathy (DCM) is often accompanied by severe peripheral artery disease. For patients with DCM combined with diabetic foot ulcer (DFU), there are currently no good therapeutic targets or drugs. Here, we investigated the underlying network of molecular actions associated with the occurrence of these two complications. The datasets were downloaded from the Gene Expression Omnibus (GEO) database. We performed enrichment and protein-protein interaction analyses, and screened for hub genes. We then constructed transcription factor (TF) and microRNA regulatory networks for the validated hub genes. Finally, drug prediction and molecular docking verification were performed. We identified 299 common differentially expressed genes (DEGs), many of which were involved in inflammation and lipid metabolism. Six DEGs were identified as hub genes (PPARG, JUN, SLC2A1, CD4, SCARB1 and SERPINE1). These six hub genes were associated with inflammation and immune response. We identified 31 common TFs and 2 key miRNAs closely related to the hub genes. Interestingly, our study suggested that fenofibrate, a lipid-lowering medication, holds promise as a potential treatment for DCM combined with DFU due to its stable binding to the identified hub genes. Here, we revealed a network involving a common target for DCM and DFU. Understanding these networks and hub genes is pivotal for advancing our comprehension of the multifaceted complications of diabetes and facilitating the development of future therapeutic interventions.
Affiliation(s)
- Hanlin Wu
- Dalian Medical University, Dalian, 116044, Liaoning Province, China
- State Key Laboratory of Frigid Zone Cardiovascular Diseases, Department of Cardiology and Cardiovascular Research Institute, General Hospital of Northern Theater Command, Wenhua Road 83, Shenyang, 110016, Liaoning Province, China
- Zheming Yang
- State Key Laboratory of Frigid Zone Cardiovascular Diseases, Department of Cardiology and Cardiovascular Research Institute, General Hospital of Northern Theater Command, Wenhua Road 83, Shenyang, 110016, Liaoning Province, China
- Jing Wang
- State Key Laboratory of Frigid Zone Cardiovascular Diseases, Department of Cardiology and Cardiovascular Research Institute, General Hospital of Northern Theater Command, Wenhua Road 83, Shenyang, 110016, Liaoning Province, China
- Yuxin Bu
- State Key Laboratory of Frigid Zone Cardiovascular Diseases, Department of Cardiology and Cardiovascular Research Institute, General Hospital of Northern Theater Command, Wenhua Road 83, Shenyang, 110016, Liaoning Province, China
- Yani Wang
- State Key Laboratory of Frigid Zone Cardiovascular Diseases, Department of Cardiology and Cardiovascular Research Institute, General Hospital of Northern Theater Command, Wenhua Road 83, Shenyang, 110016, Liaoning Province, China
- Kai Xu
- State Key Laboratory of Frigid Zone Cardiovascular Diseases, Department of Cardiology and Cardiovascular Research Institute, General Hospital of Northern Theater Command, Wenhua Road 83, Shenyang, 110016, Liaoning Province, China
- Jing Li
- State Key Laboratory of Frigid Zone Cardiovascular Diseases, Department of Cardiology and Cardiovascular Research Institute, General Hospital of Northern Theater Command, Wenhua Road 83, Shenyang, 110016, Liaoning Province, China
- Chenghui Yan
- State Key Laboratory of Frigid Zone Cardiovascular Diseases, Department of Cardiology and Cardiovascular Research Institute, General Hospital of Northern Theater Command, Wenhua Road 83, Shenyang, 110016, Liaoning Province, China
- Dan Liu
- State Key Laboratory of Frigid Zone Cardiovascular Diseases, Department of Cardiology and Cardiovascular Research Institute, General Hospital of Northern Theater Command, Wenhua Road 83, Shenyang, 110016, Liaoning Province, China.
- Yaling Han
- State Key Laboratory of Frigid Zone Cardiovascular Diseases, Department of Cardiology and Cardiovascular Research Institute, General Hospital of Northern Theater Command, Wenhua Road 83, Shenyang, 110016, Liaoning Province, China.
3
Chen C, Wang J, Pan D, Wang X, Xu Y, Yan J, Wang L, Yang X, Yang M, Liu G. Applications of multi-omics analysis in human diseases. MedComm (Beijing) 2023; 4:e315. [PMID: 37533767 PMCID: PMC10390758 DOI: 10.1002/mco2.315] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 02/22/2023] [Revised: 05/25/2023] [Accepted: 05/31/2023] [Indexed: 08/04/2023] Open
Abstract
Multi-omics usually refers to the combined application of multiple high-throughput screening technologies, represented by genomics, transcriptomics, single-cell transcriptomics, proteomics, metabolomics, spatial transcriptomics, and so on, which play a major role in promoting the study of human diseases. Most current reviews focus on describing the development of multi-omics technologies, data integration, and application to a particular disease; however, few of them provide a comprehensive and systematic introduction to multi-omics. This review outlines the existing technical categories of multi-omics and cautions for experimental design; focuses on integrated analysis methods for multi-omics, especially machine learning and deep learning approaches to multi-omics data integration and the corresponding tools; covers the application of multi-omics in medical research (e.g., cancer, neurodegenerative diseases, aging, and drug target discovery) as well as the corresponding open-source analysis tools and databases; and finally discusses the challenges and future directions of multi-omics integration and application in precision medicine. Single-cell multi-omics and spatial multi-omics, which are important directions for future disease research given the development of high-throughput technologies and data integration algorithms, are also introduced in detail. This review will provide important guidance for researchers, especially those who are just entering multi-omics medical research.
Affiliation(s)
- Chongyang Chen
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Co‐innovation Center of Neurodegeneration, Nantong University, Nantong, China
- Jing Wang
- Shenzhen Key Laboratory of Modern Toxicology, Shenzhen Medical Key Discipline of Health Toxicology (2020–2024), Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Donghui Pan
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Xinyu Wang
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Yuping Xu
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Junjie Yan
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Lizhen Wang
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Xifei Yang
- Shenzhen Key Laboratory of Modern Toxicology, Shenzhen Medical Key Discipline of Health Toxicology (2020–2024), Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Min Yang
- Key Laboratory of Nuclear Medicine, Ministry of Health, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi, China
- Gong‐Ping Liu
- Co‐innovation Center of Neurodegeneration, Nantong University, Nantong, China
- Department of Pathophysiology, School of Basic Medicine, Key Laboratory of Ministry of Education of China and Hubei Province for Neurological Disorders, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
4
Dotolo S, Esposito Abate R, Roma C, Guido D, Preziosi A, Tropea B, Palluzzi F, Giacò L, Normanno N. Bioinformatics: From NGS Data to Biological Complexity in Variant Detection and Oncological Clinical Practice. Biomedicines 2022; 10:biomedicines10092074. [PMID: 36140175 PMCID: PMC9495893 DOI: 10.3390/biomedicines10092074] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/13/2022] [Revised: 08/12/2022] [Accepted: 08/22/2022] [Indexed: 11/22/2022] Open
Abstract
The use of next-generation sequencing (NGS) techniques for variant detection has become increasingly important in clinical research and in clinical practice in oncology. Many cancer patients are currently being treated in clinical practice or in clinical trials with drugs directed against specific genomic alterations. In this scenario, the development of reliable and reproducible bioinformatics tools is essential to derive information on the molecular characteristics of each patient’s tumor from the NGS data. The development of bioinformatics pipelines based on the use of machine learning and statistical methods is even more relevant for the determination of complex biomarkers. In this review, we describe some important technologies, computational algorithms and models that can be applied to NGS data from Whole Genome to Targeted Sequencing, to address the problem of finding complex cancer-associated biomarkers. In addition, we explore the future perspectives and challenges faced by bioinformatics for precision medicine both at a molecular and clinical level, with a focus on an emerging complex biomarker such as homologous recombination deficiency (HRD).
Affiliation(s)
- Serena Dotolo
- Cell Biology and Biotherapy Unit, Istituto Nazionale Tumori—IRCCS—Fondazione G. Pascale, 80131 Naples, Italy
- Riziero Esposito Abate
- Cell Biology and Biotherapy Unit, Istituto Nazionale Tumori—IRCCS—Fondazione G. Pascale, 80131 Naples, Italy
- Cristin Roma
- Cell Biology and Biotherapy Unit, Istituto Nazionale Tumori—IRCCS—Fondazione G. Pascale, 80131 Naples, Italy
- Davide Guido
- Bioinformatics Research Core Facility, Gemelli Science and Technology Park (GSTeP), Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Largo A. Gemelli, 8, 00168 Rome, Italy
- Alessia Preziosi
- Bioinformatics Research Core Facility, Gemelli Science and Technology Park (GSTeP), Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Largo A. Gemelli, 8, 00168 Rome, Italy
- Beatrice Tropea
- Bioinformatics Research Core Facility, Gemelli Science and Technology Park (GSTeP), Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Largo A. Gemelli, 8, 00168 Rome, Italy
- Fernando Palluzzi
- Bioinformatics Research Core Facility, Gemelli Science and Technology Park (GSTeP), Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Largo A. Gemelli, 8, 00168 Rome, Italy
- Luciano Giacò
- Bioinformatics Research Core Facility, Gemelli Science and Technology Park (GSTeP), Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Largo A. Gemelli, 8, 00168 Rome, Italy
- Nicola Normanno
- Cell Biology and Biotherapy Unit, Istituto Nazionale Tumori—IRCCS—Fondazione G. Pascale, 80131 Naples, Italy
5
Alharbi WS, Rashid M. A review of deep learning applications in human genomics using next-generation sequencing data. Hum Genomics 2022; 16:26. [PMID: 35879805 PMCID: PMC9317091 DOI: 10.1186/s40246-022-00396-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/24/2021] [Accepted: 07/12/2022] [Indexed: 12/02/2022] Open
Abstract
Genomics is advancing towards data-driven science. With the advent of high-throughput data-generating technologies in human genomics, we are overwhelmed by the volume of genomic data. To extract knowledge and patterns from these data, artificial intelligence, especially deep learning, has been instrumental. In the current review, we address the development and application of deep learning methods/models in different subareas of human genomics. We assess which areas of genomics are over- and under-charted by deep learning techniques. The deep learning algorithms underlying the genomic tools are discussed briefly in the later part of this review. Finally, we briefly discuss recent applications of deep learning tools in genomics. In conclusion, this review is timely for biotechnology or genomics scientists, guiding them on why, when and how to use deep learning methods to analyse human genomic data.
Affiliation(s)
- Wardah S Alharbi
- Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia
- Mamoon Rashid
- Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia.
6
Bao S, Wang X, Li M, Gao Z, Zheng D, Shen D, Liu L. Potential of Mitochondrial Ribosomal Genes as Cancer Biomarkers Demonstrated by Bioinformatics Results. Front Oncol 2022; 12:835549. [PMID: 35719986 PMCID: PMC9204274 DOI: 10.3389/fonc.2022.835549] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/14/2021] [Accepted: 04/27/2022] [Indexed: 12/15/2022] Open
Abstract
Next-generation sequencing and bioinformatics analyses have clearly revealed the roles of mitochondrial ribosomal genes in cancer development. Mitochondrial ribosomes are composed of three RNA components encoded by mitochondrial DNA and 82 specific protein components encoded by nuclear DNA. They synthesize mitochondrial inner membrane oxidative phosphorylation (OXPHOS)-related proteins and participate in various biological activities via the regulation of energy metabolism and apoptosis. Mitochondrial ribosomal genes are strongly associated with clinical features such as prognosis and foci metastasis in patients with cancer. Accordingly, mitochondrial ribosomes have become an important focus of cancer research. We review recent advances in bioinformatics research that have explored the link between mitochondrial ribosomes and cancer, with a focus on the potential of mitochondrial ribosomal genes as biomarkers in cancer.
Affiliation(s)
- Shunchao Bao
- Department of Radiotherapy, Second Hospital of Jilin University, Changchun, China
- Xinyu Wang
- Department of Breast Surgery, Second Hospital of Jilin University, Changchun, China
- Mo Li
- Department of Radiotherapy, Second Hospital of Jilin University, Changchun, China
- Zhao Gao
- Nuclear Medicine Department, Second Hospital of Jilin University, Changchun, China
- Dongdong Zheng
- Department of Cardiovascular Surgery, Second Hospital of Jilin University, Changchun, China
- Dihan Shen
- Medical Research Center, Second Hospital of Jilin University, Changchun, China
- Linlin Liu
- Department of Radiotherapy, Second Hospital of Jilin University, Changchun, China
7
Grama SB, Liu Z, Li J. Emerging Trends in Genetic Engineering of Microalgae for Commercial Applications. Mar Drugs 2022; 20:285. [PMID: 35621936 PMCID: PMC9143385 DOI: 10.3390/md20050285] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 03/07/2022] [Revised: 04/15/2022] [Accepted: 04/19/2022] [Indexed: 02/04/2023] Open
Abstract
Recently, microalgal biotechnology has received increasing interest for producing valuable, sustainable and environmentally friendly bioproducts. The development of economically viable production processes entails resolving certain limitations of microalgal biotechnology, and fast-evolving genetic engineering technologies have emerged as new tools to overcome these limitations. This review provides a synopsis of recent progress, current trends and emerging approaches in genetic engineering of microalgae for commercial applications, including the production of pharmaceutical proteins, lipids, carotenoids and biohydrogen. Photochemistry improvement in microalgae and CO2 sequestration by microalgae via genetic engineering are also discussed, since these subjects are closely entangled with commercial production of the above-mentioned products. Although genetic engineering of microalgae has proved very effective in boosting production performance under laboratory conditions, only limited success has been achieved in industrial application so far. With genetic engineering technologies advancing rapidly and intensive investigations under way, more bioproducts are expected to be produced by genetically modified microalgae, and many more remain to be explored.
Affiliation(s)
- Samir B. Grama
- Laboratory of Natural Substances, Biomolecules and Biotechnological Applications, University of Oum El Bouaghi, Oum El Bouaghi 04000, Algeria
- Zhiyuan Liu
- College of Marine Sciences, Hainan University, Haikou 570228, China
- Jian Li
- College of Agricultural Sciences, Panzhihua University, Panzhihua 617000, China
8
Monaco A, Pantaleo E, Amoroso N, Lacalamita A, Lo Giudice C, Fonzino A, Fosso B, Picardi E, Tangaro S, Pesole G, Bellotti R. A primer on machine learning techniques for genomic applications. Comput Struct Biotechnol J 2021; 19:4345-4359. [PMID: 34429852 PMCID: PMC8365460 DOI: 10.1016/j.csbj.2021.07.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/07/2021] [Revised: 07/23/2021] [Accepted: 07/23/2021] [Indexed: 11/28/2022] Open
Abstract
High-throughput sequencing technologies have enabled the study of complex biological aspects at single-nucleotide resolution, opening the big data era. The analysis of large volumes of heterogeneous "omic" data, however, requires novel and efficient computational algorithms based on the paradigm of Artificial Intelligence. In the present review, we introduce and describe the most common machine learning methodologies, and more recently deep learning, applied to a variety of genomics tasks, aiming to emphasize capabilities, strengths and limitations in simple and intuitive language. We highlight the power of the machine learning approach in handling big data by means of a real-life example, and underline how the described methods could be relevant in all cases in which large amounts of multimodal genomic data are available.
Affiliation(s)
- Alfonso Monaco
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy
- Ester Pantaleo
- Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari "Aldo Moro", Via G. Amendola 173, 70125 Bari, Italy
- Nicola Amoroso
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy; Dipartimento di Farmacia - Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
- Antonio Lacalamita
- National Institute of Gastroenterology "S. de Bellis", Research Hospital, 70013 Castellana Grotte (Bari), Italy
- Claudio Lo Giudice
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
- Adriano Fonzino
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
- Bruno Fosso
- Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
- Ernesto Picardi
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy; Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
- Sabina Tangaro
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy; Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari "Aldo Moro", Bari, Via G. Amendola 165, 70125 Bari, Italy
- Graziano Pesole
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy; Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
- Roberto Bellotti
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy; Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari "Aldo Moro", Via G. Amendola 173, 70125 Bari, Italy
9
de Vries JJC, Brown JR, Couto N, Beer M, Le Mercier P, Sidorov I, Papa A, Fischer N, Oude Munnink BB, Rodriquez C, Zaheri M, Sayiner A, Hönemann M, Cataluna AP, Carbo EC, Bachofen C, Kubacki J, Schmitz D, Tsioka K, Matamoros S, Höper D, Hernandez M, Puchhammer-Stöckl E, Lebrand A, Huber M, Simmonds P, Claas ECJ, López-Labrador FX. Recommendations for the introduction of metagenomic next-generation sequencing in clinical virology, part II: bioinformatic analysis and reporting. J Clin Virol 2021; 138:104812. [PMID: 33819811 DOI: 10.1016/j.jcv.2021.104812] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 02/09/2021] [Accepted: 03/20/2021] [Indexed: 12/11/2022]
Abstract
Metagenomic next-generation sequencing (mNGS) is an untargeted technique for determination of microbial DNA/RNA sequences in a variety of sample types from patients with infectious syndromes. mNGS is still in its early stages of broader translation into clinical applications. To further support the development, implementation, optimization and standardization of mNGS procedures for virus diagnostics, the European Society for Clinical Virology (ESCV) Network on Next-Generation Sequencing (ENNGS) has been established. The aim of ENNGS is to bring together professionals involved in mNGS for viral diagnostics to share methodologies and experiences, and to develop application guidelines. Following the ENNGS publication "Recommendations for the introduction of mNGS in clinical virology, part I: wet lab procedure" in this journal, the current manuscript aims to provide practical recommendations for the bioinformatic analysis of mNGS data and reporting of results to clinicians.
Affiliation(s)
- Jutte J C de Vries
- Clinical Microbiological Laboratory, department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
- Julianne R Brown
- Microbiology, Virology and Infection Prevention & Control, Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom.
- Natacha Couto
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom.
- Martin Beer
- Friedrich-Loeffler-Institute, Institute of Diagnostic Virology, Greifswald, Germany.
- Igor Sidorov
- Clinical Microbiological Laboratory, department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
- Anna Papa
- Department of Microbiology, Medical School, Aristotle University of Thessaloniki, Greece.
- Nicole Fischer
- University Medical Center Hamburg-Eppendorf, UKE Institute for Medical Microbiology, Virology and Hygiene, Germany.
- Christophe Rodriquez
- Department of Virology, University hospital Henri Mondor, Assistance Public des Hopitaux de Paris, Créteil, France.
- Maryam Zaheri
- Institute of Medical Virology, University of Zurich, Switzerland.
- Arzu Sayiner
- Dokuz Eylul University, Medical Faculty, Department of Medical Microbiology, Izmir, Turkey.
- Mario Hönemann
- Institute of Virology, Leipzig University, Leipzig, Germany.
- Alba Perez Cataluna
- Department of Preservation and Food Safety Technologies, IATA-CSIC, Paterna, Valencia, Spain.
- Ellen C Carbo
- Clinical Microbiological Laboratory, department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
- Jakub Kubacki
- Institute of Virology, University of Zurich, Switzerland.
- Dennis Schmitz
- RIVM National Institute for Public Health and Environment, Bilthoven, the Netherlands.
- Katerina Tsioka
- Department of Microbiology, Medical School, Aristotle University of Thessaloniki, Greece.
- Sébastien Matamoros
- Medical Microbiology and Infection Control, Amsterdam UMC, Amsterdam, the Netherlands.
- Dirk Höper
- Friedrich-Loeffler-Institute, Institute of Diagnostic Virology, Greifswald, Germany.
- Marta Hernandez
- Laboratory of Molecular Biology and Microbiology, Instituto Tecnologico Agrario de Castilla y Leon, Valladolid, Spain.
- Michael Huber
- Institute of Medical Virology, University of Zurich, Switzerland.
- Peter Simmonds
- Nuffield Department of Medicine, University of Oxford, Oxford, UK.
- Eric C J Claas
- Clinical Microbiological Laboratory, department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
- F Xavier López-Labrador
- Virology Laboratory, Genomics and Health Area, Centre for Public Health Research (FISABIO-Public Health), Valencia, Spain; Department of Microbiology, Medical School, University of Valencia, Spain; CIBERESP, Instituto de Salud Carlos III, Madrid, Spain.
10
Sysoev M, Grötzinger SW, Renn D, Eppinger J, Rueping M, Karan R. Bioprospecting of Novel Extremozymes From Prokaryotes-The Advent of Culture-Independent Methods. Front Microbiol 2021; 12:630013. [PMID: 33643258 PMCID: PMC7902512 DOI: 10.3389/fmicb.2021.630013] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 11/16/2020] [Accepted: 01/21/2021] [Indexed: 12/20/2022] Open
Abstract
Extremophiles are remarkable organisms that thrive in the harshest environments on Earth, such as hydrothermal vents, hypersaline lakes and pools, alkaline soda lakes, deserts, cold oceans, and volcanic areas. These organisms have developed several strategies to overcome environmental stress and nutrient limitations. Thus, they are among the best model organisms to study adaptive mechanisms that lead to stress tolerance. Genetic and structural information derived from extremophiles and extremozymes can be used for bioengineering other nontolerant enzymes. Furthermore, extremophiles can be a valuable resource for novel biotechnological and biomedical products due to their biosynthetic properties. However, understanding life under extreme conditions is challenging due to the difficulties of in vitro cultivation and observation since > 99% of organisms cannot be cultivated. Consequently, only a minor percentage of the potential extremophiles on Earth have been discovered and characterized. Herein, we present a review of culture-independent methods, sequence-based metagenomics (SBM), and single amplified genomes (SAGs) for studying enzymes from extremophiles, with a focus on prokaryotic (archaea and bacteria) microorganisms. Additionally, we provide a comprehensive list of extremozymes discovered via metagenomics and SAGs.
Affiliation(s)
- Maksim Sysoev
- KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Stefan W. Grötzinger
- KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Dominik Renn
- KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Jörg Eppinger
- KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Institute for Experimental Molecular Imaging, University Clinic, RWTH Aachen University, Aachen, Germany
- Magnus Rueping
- KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Institute for Experimental Molecular Imaging, University Clinic, RWTH Aachen University, Aachen, Germany
- Ram Karan
- KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
|
11
|
Kotlarz K, Mielczarek M, Suchocki T, Czech B, Guldbrandtsen B, Szyda J. The application of deep learning for the classification of correct and incorrect SNP genotypes from whole-genome DNA sequencing pipelines. J Appl Genet 2020; 61:607-616. [PMID: 32996082 PMCID: PMC7652806 DOI: 10.1007/s13353-020-00586-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/26/2020] [Revised: 09/11/2020] [Accepted: 09/18/2020] [Indexed: 11/18/2022]
Abstract
A downside of next-generation sequencing technology is its high technical error rate. We built a tool that uses array-based genotype information to classify next-generation sequencing-based SNPs into correct and incorrect calls. The deep learning algorithms were implemented via Keras. Several algorithms were tested: (i) the basic, naïve algorithm, (ii) the naïve algorithm modified by pre-imposing different weights on the incorrect and correct SNP classes in calculating the loss metric, and (iii)–(v) the naïve algorithm modified by random re-sampling (with replacement) of the incorrect SNPs to match 30%/60%/100% of the number of correct SNPs. The training data set was composed of data from three bulls and consisted of 2,227,995 correct (97.94%) and 46,920 incorrect SNPs, while the validation data set consisted of data from one bull with 749,506 correct (98.05%) and 14,908 incorrect SNPs. The results showed that for a rare event classification problem, like incorrect SNP detection in NGS data, the most parsimonious naïve model and a model with weighting of the SNP classes provided the best results for the classification of the validation data set. Both classified 19% of truly incorrect SNPs as incorrect and 99% of truly correct SNPs as correct and resulted in an F1 score of 0.21, the highest among the compared algorithms. We conclude that the basic models were less adapted to the specificity of the training data set and thus resulted in better classification of the independent validation data set than the other tested models.
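As a rough worked check of how the figures in this abstract relate to one another, the sketch below recomputes precision and F1 for the "incorrect SNP" class from the rounded recall (19%), the rounded specificity (99%), and the validation class counts quoted above. The function name `f1_from_rates` is illustrative and does not come from the paper; because the inputs are rounded, the result can only approximate the reported F1 score.

```python
def f1_from_rates(n_pos, n_neg, recall, specificity):
    """Return (precision, F1) for the positive class from class counts and rates."""
    tp = recall * n_pos                # positives correctly flagged
    fp = (1.0 - specificity) * n_neg   # negatives wrongly flagged as positive
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, f1


# Validation set from the abstract: 14,908 incorrect SNPs (treated here as the
# positive class) and 749,506 correct SNPs; rates are the rounded values quoted.
precision, f1 = f1_from_rates(14_908, 749_506, recall=0.19, specificity=0.99)
print(f"precision ~ {precision:.2f}, F1 ~ {f1:.2f}")
```

The estimate shows why F1 stays low despite 99% specificity: with roughly 50 correct SNPs for every incorrect one, even a 1% false-positive rate produces more false alarms than true detections.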
Affiliation(s)
- Krzysztof Kotlarz
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland
- Magda Mielczarek
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland; Institute of Animal Breeding, Balice, Poland
- Tomasz Suchocki
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland; Institute of Animal Breeding, Balice, Poland
- Bartosz Czech
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland
- Bernt Guldbrandtsen
- Animal Breeding Group, Department of Animal Sciences, University of Bonn, Bonn, Germany
- Joanna Szyda
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland; Institute of Animal Breeding, Balice, Poland
|
12
|
Potgieter L, Feurtey A, Dutheil JY, Stukenbrock EH. On Variant Discovery in Genomes of Fungal Plant Pathogens. Front Microbiol 2020; 11:626. [PMID: 32373089 PMCID: PMC7176817 DOI: 10.3389/fmicb.2020.00626] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/01/2019] [Accepted: 03/19/2020] [Indexed: 11/13/2022] Open
Abstract
Comparative genome analyses of eukaryotic pathogens including fungi and oomycetes have revealed extensive variability in genome composition and structure. The genomes of individuals from the same population can exhibit different numbers of chromosomes and different organization of chromosomal segments, defining so-called accessory compartments that have been shown to be crucial to pathogenicity in plant-infecting fungi. This high level of structural variation poses a methodological challenge for population genomic analyses. Variant discovery from population sequencing data is typically achieved using established pipelines based on the mapping of short reads to a reference genome. These pipelines have been developed, and extensively used, for eukaryote genomes of both plants and animals, to retrieve single nucleotide polymorphisms and short insertions and deletions. However, they do not permit the inference of large-scale genomic structural variation, as this task typically requires the alignment of complete genome sequences. Here, we compare traditional variant discovery approaches to a pipeline based on de novo genome assembly of short read data followed by whole genome alignment, using simulated data sets with properties mimicking those of fungal pathogen genomes. We show that the latter approach exhibits levels of performance comparable to those of read-mapping based methodologies when used on sequence data with sufficient coverage. We argue that this approach further allows additional types of genomic diversity to be explored, in particular as long-read third-generation sequencing technologies are becoming increasingly available to generate population genomic data.
Affiliation(s)
- Lizel Potgieter
- Environmental Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
- Alice Feurtey
- Environmental Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Julien Y. Dutheil
- Molecular Systems Evolution, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Eva H. Stukenbrock
- Environmental Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
|
13
|
HRCM: An Efficient Hybrid Referential Compression Method for Genomic Big Data. BioMed Research International 2020; 2019:3108950. [PMID: 31915686 PMCID: PMC6930768 DOI: 10.1155/2019/3108950] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/16/2019] [Revised: 09/14/2019] [Accepted: 10/22/2019] [Indexed: 12/22/2022]
Abstract
With the maturity of genome sequencing technology, huge amounts of sequence reads as well as assembled genomes are being generated. With this explosive growth, the storage and transmission of genomic data face enormous challenges. FASTA, as one of the main storage formats for genome sequences, is widely used in the Gene Bank because it eases sequence analysis and gene research and is easy to read. Many compression methods for FASTA genome sequences have been proposed, but they still have room for improvement: for example, compression ratio and speed are not sufficiently high or robust, and memory consumption is not ideal. It is therefore of great significance to improve the efficiency, robustness, and practicability of genomic data compression, in order to further reduce the storage and transmission cost of genomic data and promote the research and development of genomic technology. In this manuscript, a hybrid referential compression method (HRCM) for FASTA genome sequences is proposed. HRCM is a lossless compression method able to compress a single sequence as well as large collections of sequences. It is implemented in three stages: sequence information extraction, sequence information matching, and sequence information encoding. HRCM's performance was evaluated in a large number of experiments, which show that it is superior to the best-known methods in genome batch compression. Moreover, HRCM's memory consumption is relatively low, so it can be deployed on standard PCs.
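To make the general idea of referential compression concrete, here is a minimal sketch that encodes a target sequence as matches against a reference plus literal characters for the differences. It is a toy greedy matcher, not HRCM's actual three-stage method; the function names, the `("M", position, length)` and `("L", text)` operation format, and the seed length `k` are all assumptions made for illustration.

```python
def build_index(reference, k):
    """Map each k-mer of the reference to the positions where it starts."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def compress(target, reference, k=4):
    """Encode target as ('M', ref_pos, length) matches and ('L', text) literals."""
    index = build_index(reference, k)
    ops, literal, i = [], [], 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Try every reference occurrence of the current k-mer and extend it greedily.
        for pos in index.get(target[i:i + k], []):
            length = k
            while (i + length < len(target) and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            if literal:  # flush pending unmatched characters first
                ops.append(("L", "".join(literal)))
                literal = []
            ops.append(("M", best_pos, best_len))
            i += best_len
        else:
            literal.append(target[i])
            i += 1
    if literal:
        ops.append(("L", "".join(literal)))
    return ops


def decompress(ops, reference):
    """Rebuild the target from the operation list and the same reference."""
    parts = []
    for op in ops:
        if op[0] == "M":
            parts.append(reference[op[1]:op[1] + op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


reference = "ACGTACGTTAGCCGATAGGCTA"
target = "ACGTACGTTAGCTGATAGGCTA"  # differs from the reference by one substitution
ops = compress(target, reference)
print(ops)
print(decompress(ops, reference) == target)
```

Because the target differs from the reference at a single position, the operation list stays short; a real compressor would additionally entropy-code the positions, lengths, and literals.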
|
14
|
Khitmoh N, Smanchat S, Tongsima S. Stretch Profile: A pruning technique to accelerate DNA sequence search. Informatics in Medicine Unlocked 2020. [DOI: 10.1016/j.imu.2020.100323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022] Open
|
15
|
Tang Y, Li M, Sun J, Zhang T, Zhang J, Zheng P. TRCMGene: A two-step referential compression method for the efficient storage of genetic data. PLoS One 2018; 13:e0206521. [PMID: 30395579 PMCID: PMC6218042 DOI: 10.1371/journal.pone.0206521] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/21/2018] [Accepted: 10/08/2018] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: The massive quantities of genetic data generated by high-throughput sequencing pose challenges to data storage, transmission and analyses. These problems can be addressed through data compression, which reduces the size of stored data and improves the speed of data transmission. Several options are available for compressing and storing genetic data. However, most of these options either do not provide sufficient compression rates or require a considerable length of time for decompression and loading. RESULTS: Here, we propose TRCMGene, a lossless genetic data compression method that uses a referential compression scheme. TRCMGene introduces a two-step compression method that builds an index structure using K-means and k-nearest neighbours. Evaluation with several real datasets revealed that the compression factor of TRCMGene ranges from 9 to 21. TRCMGene presents a good balance between compression factor and reading time. On average, the reading time of compressed data is 60% of that of uncompressed data. Thus, TRCMGene not only saves disc space but also saves file access time and speeds up data loading. These effects collectively improve genetic data storage and transmission in the current hardware environment and render system upgrades unnecessary. TRCMGene, its user manual and demos can be accessed freely from https://github.com/tangyou79/TRCM. The data mentioned in this manuscript can be downloaded from: https://github.com/tangyou79/TRCM/wiki.
Affiliation(s)
- You Tang
- Electrical and Information Engineering College, JiLin Agricultural Science and Technology University, Jilin, China
- Min Li
- College of Electrical and Information, Northeast Agricultural University, Harbin, China
- Jing Sun
- College of Life Science and Agriculture, Qiqihar University, Qiqihar, China
- Tao Zhang
- College of Electrical and Information, Northeast Agricultural University, Harbin, China
- Jicheng Zhang
- College of Electrical and Information, Northeast Agricultural University, Harbin, China
- * E-mail: (JCZ); (PZ)
- Ping Zheng
- College of Electrical and Information, Northeast Agricultural University, Harbin, China
- * E-mail: (JCZ); (PZ)
|
16
|
Fleming A, Abdalla EA, Maltecca C, Baes CF. Invited review: Reproductive and genomic technologies to optimize breeding strategies for genetic progress in dairy cattle. Arch Anim Breed 2018. [DOI: 10.5194/aab-61-43-2018] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 12/16/2022] Open
Abstract
Dairy cattle breeders have exploited past technological advances in reproduction and genomics. The implementation of such technologies in routine breeding programs has permitted genetic gains in traditional milk production traits as well as, more recently, in low-heritability traits like health and fertility. As demand for dairy products increases, it is important for dairy breeders to optimize the use of available technologies and to consider the many emerging technologies that are currently being investigated in various fields. Here we review a number of technologies that have helped shape dairy breeding programs in the past and present, along with those potentially forthcoming. These tools have materialized in the areas of reproduction, genotyping and sequencing, genetic modification, and epigenetics. Although many of these technologies bring encouraging opportunities for genetic improvement of dairy cattle populations, their applications and benefits need to be weighed against their impacts on economics, genetic diversity, and society.
|
17
|
Koo H, Hakim JA, Morrow CD, Eipers PG, Davila A, Andersen DT, Bej AK. Comparison of two bioinformatics tools used to characterize the microbial diversity and predictive functional attributes of microbial mats from Lake Obersee, Antarctica. J Microbiol Methods 2017; 140:15-22. [PMID: 28655556 PMCID: PMC6108183 DOI: 10.1016/j.mimet.2017.06.017] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 05/08/2017] [Revised: 06/22/2017] [Accepted: 06/23/2017] [Indexed: 01/01/2023]
Abstract
In this study, using NextGen sequencing of the collective 16S rRNA genes obtained from two sets of samples collected from Lake Obersee, Antarctica, we compared and contrasted two bioinformatics tools, PICRUSt and Tax4Fun. We then developed an R script to assess the taxonomic and predictive functional profiles of the microbial communities within the samples. Taxa such as Pseudoxanthomonas, Planctomycetaceae, Cyanobacteria Subsection III, Nitrosomonadaceae, Leptothrix, and Rhodobacter were exclusively identified by Tax4Fun, which uses the SILVA database, whereas PICRUSt, which uses the Greengenes database, uniquely identified Pirellulaceae, Gemmatimonadetes A1-B1, Pseudanabaena, Salinibacterium and Sinobacteraceae. Predictive functional profiling of the microbial communities using Tax4Fun and PICRUSt separately revealed common metabolic capabilities, while also showing specific functional IDs not shared between the two approaches. Combining these functional predictions using a customized R script revealed a more inclusive metabolic profile, including hydrolases, oxidoreductases, and transferases; enzymes involved in carbohydrate and amino acid metabolism; and membrane transport proteins known for nutrient uptake from the surrounding environment. Our results present the first molecular-phylogenetic characterization and predictive functional profiles of the microbial mat communities in Lake Obersee, while demonstrating the efficacy of combining the taxonomic assignment information and functional IDs, using the R script created in this study, for a more streamlined evaluation of predictive functional profiles of microbial communities.
Affiliation(s)
- Hyunmin Koo
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA.
- Joseph A Hakim
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Casey D Morrow
- Cell, Developmental, and Integrative Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Peter G Eipers
- Cell, Developmental, and Integrative Biology, University of Alabama at Birmingham, Birmingham, AL, USA
- Alfonso Davila
- NASA Ames Research Center, MS 245-3, Moffett Field, CA, USA
- Asim K Bej
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA.
|
18
|
Arenas M, Pereira F, Oliveira M, Pinto N, Lopes AM, Gomes V, Carracedo A, Amorim A. Forensic genetics and genomics: Much more than just a human affair. PLoS Genet 2017; 13:e1006960. [PMID: 28934201 PMCID: PMC5608170 DOI: 10.1371/journal.pgen.1006960] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 12/22/2022] Open
Abstract
While traditional forensic genetics has been oriented towards using human DNA in criminal investigation and civil court cases, it currently has a much wider range of application, covering not only legal situations sensu stricto but also, increasingly often, the preemptive avoidance of judicial processes. Despite some difficulties, current forensic genetics is progressively incorporating the analysis of nonhuman genetic material to a greater extent. The analysis of this material (including other animal species, plants, or microorganisms) is now broadly used, providing ancillary evidence in criminalistics in cases such as animal attacks, trafficking of species, bioterrorism and biocrimes, and identification of fraudulent food composition, among many others. Here, we explore how nonhuman forensic genetics is being revolutionized by the increasing variety of genetic markers, the establishment of faster, less error-burdened and cheaper sequencing technologies, and the emergence and improvement of models, methods, and bioinformatics facilities.
Affiliation(s)
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Filipe Pereira
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal
- Interdisciplinary Centre of Marine and Environmental Research (CIIMAR), University of Porto, Porto, Portugal
- Manuela Oliveira
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Faculty of Sciences, University of Porto, Porto, Portugal
- Nadia Pinto
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Centre of Mathematics of the University of Porto, Porto, Portugal
- Alexandra M. Lopes
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Veronica Gomes
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Angel Carracedo
- Institute of Forensic Sciences Luis Concheiro, University of Santiago de Compostela, Santiago de Compostela, Spain
- Genomics Medicine Group, CIBERER, University of Santiago de Compostela, Santiago de Compostela, Spain
- Antonio Amorim
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Faculty of Sciences, University of Porto, Porto, Portugal
|
19
|
You Q, Yi X, Zhang K, Wang C, Ma X, Zhang X, Xu W, Li F, Su Z. Genome-wide comparative analysis of H3K4me3 profiles between diploid and allotetraploid cotton to refine genome annotation. Sci Rep 2017; 7:9098. [PMID: 28831143 PMCID: PMC5567255 DOI: 10.1038/s41598-017-09680-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 02/03/2017] [Accepted: 07/28/2017] [Indexed: 12/28/2022] Open
Abstract
Polyploidy is a common evolutionary occurrence in plants. The recently published genomes of allotetraploid G. hirsutum and its donors G. arboreum and G. raimondii make cotton an accessible polyploid model. This study used chromatin immunoprecipitation with high-throughput sequencing (ChIP-Seq) to investigate the genome-wide distribution of H3K4me3 in G. arboreum and G. hirsutum, and to explore the conservation and variation of genome structures between diploid and allotetraploid cotton. Our results showed that H3K4me3 modifications were associated with active transcription in both cottons. The H3K4me3 histone markers appeared mainly in genic regions and were enriched around the transcription start sites (TSSs) of genes. We integrated the H3K4me3 ChIP-Seq data with RNA-seq and EST data to refine the genic structure annotation. There were 6,773 and 12,773 new transcripts discovered in G. arboreum and G. hirsutum, respectively. Furthermore, co-expression networks were linked with histone modification and modularized in an attempt to explain differential H3K4me3 enrichment correlated with changes in gene transcription during cotton development and evolution. Taken together, we have combined epigenomic and transcriptomic datasets to systematically discover functional genes and compare them between G. arboreum and G. hirsutum, which may be beneficial for studying diploid and allotetraploid plants with large genomes and complicated evolution.
Affiliation(s)
- Qi You
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Xin Yi
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Kang Zhang
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Chunchao Wang
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Xuelian Ma
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Xueyan Zhang
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agriculture Sciences (ICR, CAAS), Anyang, Henan, 455000, China
- Wenying Xu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Fuguang Li
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agriculture Sciences (ICR, CAAS), Anyang, Henan, 455000, China.
- Zhen Su
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China.
|
20
|
Anstead CA, Perry T, Richards S, Korhonen PK, Young ND, Bowles VM, Batterham P, Gasser RB. The Battle Against Flystrike - Past Research and New Prospects Through Genomics. Advances in Parasitology 2017; 98:227-281. [PMID: 28942770 DOI: 10.1016/bs.apar.2017.03.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]
Abstract
Flystrike, or cutaneous myiasis, is caused by blow fly larvae of the genus Lucilia. This disease is a major problem in countries with large sheep populations. In Australia, Lucilia cuprina (Wiedemann, 1830) is the principal fly involved in flystrike. While much research has been conducted on L. cuprina, including physical, chemical, immunological, genetic and biological investigations, the molecular biology of this fly is still poorly understood. The recent sequencing, assembly and annotation of the draft genome and analyses of selected transcriptomes of L. cuprina have given a first global glimpse of its molecular biology and insights into host-fly interactions, insecticide resistance genes and intervention targets. The present article introduces L. cuprina, flystrike and associated issues, details past control efforts and research foci, reviews salient aspects of the L. cuprina genome project and discusses how the new genomic and transcriptomic resources for this fly might accelerate fundamental molecular research of L. cuprina towards developing new methods for the treatment and control of flystrike.
Affiliation(s)
- Trent Perry
- The University of Melbourne, Parkville, VIC, Australia
- Neil D Young
- The University of Melbourne, Parkville, VIC, Australia
|
21
|
Anstead CA, Batterham P, Korhonen PK, Young ND, Hall RS, Bowles VM, Richards S, Scott MJ, Gasser RB. A blow to the fly - Lucilia cuprina draft genome and transcriptome to support advances in biology and biotechnology. Biotechnol Adv 2016; 34:605-620. [DOI: 10.1016/j.biotechadv.2016.02.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 10/20/2015] [Revised: 02/08/2016] [Accepted: 02/20/2016] [Indexed: 02/07/2023]
|
22
|
Recent advances in molecular marker techniques: Insight into QTL mapping, GWAS and genomic selection in plants. ACTA ACUST UNITED AC 2016. [DOI: 10.1007/s12892-015-0037-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 01/30/2023]
|
23
|
Pavesi G. ChIP-Seq Data Analysis to Define Transcriptional Regulatory Networks. Advances in Biochemical Engineering/Biotechnology 2016; 160:1-14. [PMID: 28070596 DOI: 10.1007/10_2016_43] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
Abstract
The first step in the definition of transcriptional regulatory networks is to establish correct relationships between transcription factors (TFs) and their target genes, together with the effect of their regulatory activity (activator or repressor). Fundamental advances in this direction have been made possible by the introduction of experimental techniques such as Chromatin Immunoprecipitation, which, coupled with next-generation sequencing technologies (ChIP-Seq), permit the genome-wide identification of TF binding sites. This chapter provides a survey on how data of this kind are to be processed and integrated with expression and other types of data to infer transcriptional regulatory rules and codes.
Affiliation(s)
- Giulio Pavesi
- Department of Biosciences, University of Milan, Via Celoria 26, 20133, Milan, Italy.
|
24
|
Xie X, Zhou S, Guan J. CoGI: Towards Compressing Genomes as an Image. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:1275-1285. [PMID: 26671800 DOI: 10.1109/tcbb.2015.2430331] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 06/05/2023]
Abstract
Genomic science is now facing an explosive increase of data thanks to the fast development of sequencing technology. This situation poses serious challenges to genomic data storage and transfer. It is desirable to compress data to reduce storage and transfer costs, and thus to boost data distribution and utilization efficiency. Up to now, a number of algorithms and tools have been developed for compressing genomic sequences. Unlike the existing algorithms, most of which treat genomes as one-dimensional text strings and compress them based on dictionaries or probability models, this paper proposes a novel approach called CoGI (the abbreviation of Compressing Genomes as an Image) for genome compression, which transforms the genomic sequences into a two-dimensional binary image (or bitmap), then applies a rectangular partition coding algorithm to compress the binary image. CoGI can be used as either a reference-based compressor or a reference-free compressor. For the former, we develop two entropy-based algorithms to select a proper reference genome. Performance evaluation is conducted on various genomes. Experimental results show that the reference-based CoGI significantly outperforms two state-of-the-art reference-based genome compressors, GReEn and RLZ-opt, in both compression ratio and compression efficiency. It also achieves a comparable compression ratio but two orders of magnitude higher compression efficiency in comparison with XM, a state-of-the-art reference-free genome compressor. Furthermore, our approach performs much better than Gzip, a general-purpose and widely used compressor, in both compression speed and compression ratio. So, CoGI can serve as an effective and practical genome compressor. The source code and other related documents of CoGI are available at: http://admis.fudan.edu.cn/projects/cogi.htm.
|
25
|
Lin MH, Jones DF, Fleming R. Transcriptomic analysis of degraded forensic body fluids. Forensic Sci Int Genet 2015; 17:35-42. [DOI: 10.1016/j.fsigen.2015.03.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 12/22/2014] [Revised: 03/06/2015] [Accepted: 03/10/2015] [Indexed: 10/23/2022]
|
26
|
Zhang S, Bartkowiak L, Nabiswa B, Mishra P, Fann J, Ouellette D, Correia I, Regier D, Liu J. Identifying low-level sequence variants via next generation sequencing to aid stable CHO cell line screening. Biotechnol Prog 2015; 31:1077-85. [DOI: 10.1002/btpr.2119] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 03/12/2015] [Revised: 05/04/2015] [Indexed: 01/07/2023]
Affiliation(s)
- Sheng Zhang
- Process Sciences Cell Culture, Abbvie Bioresearch Center; 100 Research Drive Worcester MA 01605
- Lisa Bartkowiak
- Process Sciences Cell Culture, Abbvie Bioresearch Center; 100 Research Drive Worcester MA 01605
- Bernard Nabiswa
- Process Sciences Cell Culture, Abbvie Bioresearch Center; 100 Research Drive Worcester MA 01605
- Pratibha Mishra
- Process Sciences Cell Culture, Abbvie Bioresearch Center; 100 Research Drive Worcester MA 01605
- John Fann
- Process Sciences Cell Culture, Abbvie Bioresearch Center; 100 Research Drive Worcester MA 01605
- David Ouellette
- Process Sciences Analytics, Abbvie Bioresearch Center; 100 Research Drive Worcester MA 01605
- Ivan Correia
- Process Sciences Analytics, Abbvie Bioresearch Center; 100 Research Drive Worcester MA 01605
- Dean Regier
- Protein Science, Abbvie Bioresearch Center; 100 Research Drive Worcester MA 01605
- Junjian Liu
- Protein Science, Abbvie Bioresearch Center; 100 Research Drive Worcester MA 01605
|
27
|
Mielczarek M, Szyda J. Review of alignment and SNP calling algorithms for next-generation sequencing data. J Appl Genet 2015; 57:71-9. [PMID: 26055432 DOI: 10.1007/s13353-015-0292-7] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2014] [Revised: 02/27/2015] [Accepted: 05/15/2015] [Indexed: 01/21/2023]
Abstract
Application of massively parallel sequencing technology has become one of the most important issues in life sciences. Therefore, it was crucial to develop bioinformatics tools for next-generation sequencing (NGS) data processing. Currently, two of the most significant tasks include alignment to a reference genome and detection of single nucleotide polymorphisms (SNPs). In many types of genomic analyses, great numbers of reads need to be mapped to the reference genome; therefore, selection of the aligner is an essential step in NGS pipelines. Two main algorithms (suffix tries and hash tables) have been introduced for this purpose. Suffix array-based aligners are memory-efficient and work faster than hash-based aligners, but they are less accurate. In contrast, hash table algorithms tend to be slower, but more sensitive. SNP and genotype callers may also be divided into two main approaches: heuristic and probabilistic methods. A variety of software has been developed over the past several years. In this paper, we briefly review the current development of NGS data processing algorithms and present the available software.
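The hash-table strategy mentioned in this abstract can be sketched as a seed-and-verify lookup. This is a toy illustration under simplifying assumptions (a single seed taken from the start of the read, substitutions only, no indels); the function names are hypothetical and do not correspond to any specific aligner.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash table mapping every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k-mer, then verify each candidate position
    by counting mismatches over the full read length."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits
```

Shorter seeds or several seeds per read would raise sensitivity at the cost of more candidate positions to verify, which is consistent with the speed and sensitivity trade-off the abstract describes.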
Affiliation(s)
- M Mielczarek
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kożuchowska 7, 51-631, Wroclaw, Poland
- J Szyda
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kożuchowska 7, 51-631, Wroclaw, Poland
|
28
|
Gene dynamics of toll-like receptor 4 through a population bottleneck in an insular population of water voles (Arvicola amphibius). CONSERV GENET 2015. [DOI: 10.1007/s10592-015-0731-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
29
|
|
30
|
Sampson J, Jacobs K, Yeager M, Chanock S, Chatterjee N. Efficient study design for next generation sequencing. Genet Epidemiol 2015; 35:269-77. [PMID: 21370254 DOI: 10.1002/gepi.20575] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2010] [Revised: 12/24/2010] [Accepted: 01/12/2011] [Indexed: 01/23/2023]
Abstract
Next Generation Sequencing represents a powerful tool for detecting genetic variation associated with human disease. Because of the high cost of this technology, it is critical that we develop efficient study designs that consider the trade-off between the number of subjects (n) and the coverage depth (µ). How we divide our resources between the two can greatly impact study success, particularly in pilot studies. We propose a strategy for selecting the optimal combination of n and µ for studies aimed at detecting rare variants and for studies aimed at detecting associations between rare or uncommon variants and disease. For detecting rare variants, we find the optimal coverage depth to be between 2 and 8 reads when using the likelihood ratio test. For association studies, we find the strategy of sequencing all available subjects to be preferable. In deriving these combinations, we provide a detailed analysis describing the distribution of depth across a genome and the depth needed to identify a minor allele in an individual. The optimal coverage depth depends on the aims of the study, and the chosen depth can have a large impact on study success.
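The trade-off between the number of subjects (n) and coverage depth (µ) can be made concrete with a small calculation. The sketch below is not the model from the paper: it assumes Poisson-distributed depth, a heterozygous site where each read carries the minor allele with probability 1/2, and a hypothetical calling rule that requires at least `min_reads` reads showing the minor allele.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(k + 1))


def prob_detect_minor_allele(mean_depth, min_reads=2):
    """Probability of seeing the minor allele in at least `min_reads` reads.

    Assumption: the count of minor-allele reads at a heterozygous site is
    Poisson with mean mean_depth / 2 (Poisson thinning of the total depth).
    """
    return 1.0 - poisson_cdf(min_reads - 1, mean_depth / 2.0)


def subjects_for_budget(total_depth_budget, mean_depth):
    """Number of subjects affordable when n * mean_depth must not exceed the budget."""
    return int(total_depth_budget // mean_depth)
```

For a fixed budget, raising µ improves per-individual detection but lowers n; tabulating both functions over a range of depths shows where the two effects balance under these assumptions.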
Affiliation(s)
- Joshua Sampson
- Biostatistics Branch, DCEG, National Cancer Institute, Rockville, MD 20852, USA
|
31
|
Xie X, Guan J, Zhou S. Similarity evaluation of DNA sequences based on frequent patterns and entropy. BMC Genomics 2015; 16 Suppl 3:S5. [PMID: 25707937 PMCID: PMC4331808 DOI: 10.1186/1471-2164-16-s3-s5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND DNA sequence analysis is an important research topic in bioinformatics. Evaluating the similarity between sequences, which is crucial for sequence analysis, has attracted much research effort in the last two decades, and a dozen algorithms and tools have been developed. These methods are based on alignment, word frequency and geometric representation respectively, each of which has its advantages and disadvantages. RESULTS In this paper, for effectively computing the similarity between DNA sequences, we introduce a novel method based on frequent patterns and entropy to construct representative vectors of DNA sequences. Experiments are conducted to evaluate the proposed method, which is compared with two recently developed alignment-free methods and the BLASTN tool. When testing on the β-globin genes of 11 species and using the results from MEGA as the baseline, our method achieves higher correlation coefficients than the two alignment-free methods and the BLASTN tool. CONCLUSIONS Our method is not only able to capture fine-granularity information (location and ordering) of DNA sequences via sequence blocking, but is also insensitive to noise and sequence rearrangement because it considers only the maximal frequent patterns. It outperforms major existing methods or tools.
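For orientation, the word-frequency family of alignment-free methods mentioned in this abstract can be sketched in a few lines. This is a generic k-mer example and explicitly not the frequent-pattern method the authors propose; the choice of k = 3 and of cosine similarity is an assumption made for illustration.

```python
import math
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping k-mer (word) in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the word distribution given by `counts`."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Plain word counts of this kind discard where each word occurs in the sequence, which is the limitation the abstract's sequence-blocking step is described as addressing.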
|
32
|
Eastman AW, Yuan ZC. Development and validation of an rDNA operon based primer walking strategy applicable to de novo bacterial genome finishing. Front Microbiol 2015; 5:769. [PMID: 25653642 PMCID: PMC4301005 DOI: 10.3389/fmicb.2014.00769] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2014] [Accepted: 12/16/2014] [Indexed: 01/10/2023] Open
Abstract
Advances in sequencing technology have drastically increased the depth and feasibility of bacterial genome sequencing. However, little information is available that details the specific techniques and procedures employed during genome sequencing despite the large numbers of published genomes. Shotgun approaches employed by second-generation sequencing platforms have necessitated the development of robust bioinformatics tools for in silico assembly, and complete assembly is limited by the presence of repetitive DNA sequences and multi-copy operons. Typically, re-sequencing with multiple platforms and laborious, targeted Sanger sequencing are employed to finish a draft bacterial genome. Here we describe a novel strategy based on the identification and targeted sequencing of repetitive rDNA operons to expedite bacterial genome assembly and finishing. Our strategy was validated by finishing the genome of Paenibacillus polymyxa strain CR1, a bacterium with potential in sustainable agriculture and bio-based processes. An analysis of the 38 contigs contained in the P. polymyxa strain CR1 draft genome revealed 12 repetitive rDNA operons with varied intragenic and flanking regions of variable length, all located at contig boundaries and within contig gaps. These highly similar but not identical rDNA operons were experimentally verified and sequenced simultaneously with multiple, specially designed primer sets. This approach also identified and corrected significant sequence rearrangement generated during the initial in silico assembly of sequencing reads. Our approach reduces the effort associated with blind primer walking for contig assembly, increasing both the speed and feasibility of genome finishing. Our study further reinforces the notion that repetitive DNA elements are major limiting factors for genome finishing. Moreover, we provide a step-by-step workflow for genome finishing, which may guide future bacterial genome finishing projects.
Affiliation(s)
- Alexander W Eastman
- Southern Crop Protection and Food Research Centre, Agriculture and Agri-Food Canada, Government of Canada London, ON, Canada ; Department of Microbiology and Immunology, Schulich School of Medicine and Dentistry, University of Western Ontario London, ON, Canada
- Ze-Chun Yuan
- Southern Crop Protection and Food Research Centre, Agriculture and Agri-Food Canada, Government of Canada London, ON, Canada ; Department of Microbiology and Immunology, Schulich School of Medicine and Dentistry, University of Western Ontario London, ON, Canada
|
33
|
Oleksiewicz U, Tomczak K, Woropaj J, Markowska M, Stępniak P, Shah PK. Computational characterisation of cancer molecular profiles derived using next generation sequencing. Contemp Oncol (Pozn) 2015; 19:A78-91. [PMID: 25691827 PMCID: PMC4322529 DOI: 10.5114/wo.2014.47137] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open
Abstract
Our current understanding of cancer genetics is grounded on the principle that cancer arises from a clone that has accumulated the requisite somatically acquired genetic aberrations, leading to the malignant transformation. It also results in aberrant gene and protein expression. Next generation sequencing (NGS) or deep sequencing platforms are being used to create large catalogues of changes in copy numbers, mutations, structural variations, gene fusions, gene expression, and other types of information for cancer patients. However, inferring different types of biological changes from raw reads generated using the sequencing experiments is algorithmically and computationally challenging. In this article, we outline common steps for the quality control and processing of NGS data. We highlight the importance of accurate and application-specific alignment of these reads and the methodological steps and challenges in obtaining different types of information. We comment on the importance of integrating these data and building infrastructure to analyse them. We also provide exhaustive lists of available software to obtain information and point the readers to articles comparing software for deeper insight in specialised areas. We hope that the article will guide readers in choosing the right tools for analysing oncogenomic datasets.
Affiliation(s)
- Urszula Oleksiewicz
- Laboratory of Gene Therapy, Department of Cancer Immunology, The Greater Poland Cancer Centre, Poznan, Poland ; Department of Cancer Immunology and Diagnostics, Chair of Medical Biotechnology, Poznan University of Medical Sciences, Poznan, Poland ; These authors contributed equally to this paper
- Katarzyna Tomczak
- Laboratory of Gene Therapy, Department of Cancer Immunology, The Greater Poland Cancer Centre, Poznan, Poland ; Department of Cancer Immunology and Diagnostics, Chair of Medical Biotechnology, Poznan University of Medical Sciences, Poznan, Poland ; Postgraduate School of Molecular Medicine, Medical University of Warsaw, Warsaw ; These authors contributed equally to this paper
- Jakub Woropaj
- Poznan University of Economics, Poznań, Poland ; These authors contributed equally to this paper
- Parantu K Shah
- Institute for Applied Cancer Science, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
|
34
|
Jex AR, Littlewood DT, Gasser RB. Sequencing and annotation of mitochondrial genomes from individual parasitic helminths. Methods Mol Biol 2015; 1201:51-63. [PMID: 25388107 DOI: 10.1007/978-1-4939-1438-8_3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Mitochondrial (mt) genomics has significant implications in a range of fundamental areas of parasitology, including evolution, systematics, and population genetics as well as explorations of mt biochemistry, physiology, and function. Mt genomes also provide a rich source of markers to aid molecular epidemiological and ecological studies of key parasites. However, there is still a paucity of information on mt genomes for many metazoan organisms, particularly parasitic helminths, which has often been related to challenges linked to sequencing from tiny amounts of material. The advent of next-generation sequencing (NGS) technologies has paved the way for low-cost, high-throughput mt genomic research, but there have been obstacles, particularly in relation to post-sequencing assembly and analyses of large datasets. In this chapter, we describe protocols for the efficient amplification and sequencing of mt genomes from small portions of individual helminths, and highlight the utility of NGS platforms to expedite mt genomics. In addition, we recommend approaches for manual or semi-automated bioinformatic annotation and analyses to overcome the bioinformatic "bottleneck" to research in this area. Taken together, these approaches have demonstrated applicability to a range of parasites and provide prospects for using complete mt genomic sequence datasets for large-scale molecular systematic and epidemiological studies. In addition, these methods have broader utility and might be readily adapted to a range of other medium-sized molecular regions (i.e., 10-100 kb), including large genomic operons, and other organellar (e.g., plastid) and viral genomes.
Affiliation(s)
- Aaron R Jex
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Corner Flemington Road & Park Drive, Parkville, VIC, 3010, Australia
|
35
|
Abstract
BACKGROUND Next generation sequencing (NGS)-based assays continue to redefine the field of genetic testing. Owing to the complexity of the data, bioinformatics has become a necessary component in any laboratory implementing a clinical NGS test. CONTENT The computational components of an NGS-based work flow can be conceptualized as primary, secondary, and tertiary analytics. Each of these components addresses a necessary step in the transformation of raw data into clinically actionable knowledge. Understanding the basic concepts of these analysis steps is important in assessing and addressing the informatics needs of a molecular diagnostics laboratory. Equally critical is a familiarity with the regulatory requirements addressing the bioinformatics analyses. These and other topics are covered in this review article. SUMMARY Bioinformatics has become an important component in clinical laboratories generating, analyzing, maintaining, and interpreting data from molecular genetics testing. Given the rapid adoption of NGS-based clinical testing, service providers must develop informatics work flows that adhere to the rigor of clinical laboratory standards, yet are flexible to changes as the chemistry and software for analyzing sequencing data mature.
Affiliation(s)
- Gavin R Oliver
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Steven N Hart
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Eric W Klee
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
|
36
|
Baes CF, Dolezal MA, Koltes JE, Bapst B, Fritz-Waters E, Jansen S, Flury C, Signer-Hasler H, Stricker C, Fernando R, Fries R, Moll J, Garrick DJ, Reecy JM, Gredler B. Evaluation of variant identification methods for whole genome sequencing data in dairy cattle. BMC Genomics 2014; 15:948. [PMID: 25361890 PMCID: PMC4289218 DOI: 10.1186/1471-2164-15-948] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2014] [Accepted: 10/14/2014] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Advances in human genomics have allowed unprecedented productivity in terms of algorithms, software, and literature available for translating raw next-generation sequence data into high-quality information. The challenges of variant identification in organisms with lower quality reference genomes are less well documented. We explored the consequences of commonly recommended preparatory steps and the effects of single and multi sample variant identification methods using four publicly available software applications (Platypus, HaplotypeCaller, Samtools and UnifiedGenotyper) on whole genome sequence data of 65 key ancestors of Swiss dairy cattle populations. Accuracy of calling next-generation sequence variants was assessed by comparison to the same loci from medium and high-density single nucleotide variant (SNV) arrays. RESULTS The total number of SNVs identified varied by software and method, with single (multi) sample results ranging from 17.7 to 22.0 (16.9 to 22.0) million variants. Computing time varied considerably between software. Preparatory realignment of insertions and deletions and subsequent base quality score recalibration had only minor effects on the number and quality of SNVs identified by different software, but increased computing time considerably. Average concordance for single (multi) sample results with high-density chip data was 58.3% (87.0%) and average genotype concordance in correctly identified SNVs was 99.2% (99.2%) across software. The average quality of SNVs identified, measured as the ratio of transitions to transversions, was higher using single sample methods than multi sample methods. A consensus approach using results of different software generally provided the highest variant quality in terms of transition/transversion ratio. 
CONCLUSIONS Our findings serve as a reference for variant identification pipeline development in non-human organisms and help assess the implication of preparatory steps in next-generation sequencing pipelines for organisms with incomplete reference genomes (pipeline code is included). Benchmarking this information should prove particularly useful in processing next-generation sequencing data for use in genome-wide association studies and genomic selection.
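Two of the quality measures reported in this abstract, the transition/transversion ratio and genotype concordance against array data, are simple to compute. The sketch below uses hypothetical input shapes (a list of `(ref, alt)` pairs and dictionaries keyed by locus) that are assumptions for illustration and not the format used by the authors' pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """snvs: list of (ref, alt) single-base pairs with ref != alt."""
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


def genotype_concordance(calls, array):
    """Fraction of loci present in both dicts (locus -> genotype) whose genotypes agree.

    Returns None when the two sets share no locus.
    """
    shared = calls.keys() & array.keys()
    if not shared:
        return None
    return sum(calls[locus] == array[locus] for locus in shared) / len(shared)
```

Note that the abstract reports two distinct figures: how many array SNVs were identified at all, and how often the genotype agreed among those identified. `genotype_concordance` corresponds only to the second.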
Affiliation(s)
- Christine F Baes
- Bern University of Applied Sciences, School of Agricultural, Forest and Food Sciences HAFL, Länggasse 85, CH-3052 Zollikofen, Switzerland
|
37
|
Cunha MV, Inácio J, Freimanis G, Fusaro A, Granberg F, Höper D, King DP, Monne I, Orton R, Rosseel T. Next-generation sequencing in veterinary medicine: how can the massive amount of information arising from high-throughput technologies improve diagnosis, control, and management of infectious diseases? Methods Mol Biol 2014; 1247:415-36. [PMID: 25399113 PMCID: PMC7123048 DOI: 10.1007/978-1-4939-2004-4_30] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
The development of high-throughput molecular technologies and associated bioinformatics has dramatically changed the capacities of scientists to produce, handle, and analyze large amounts of genomic, transcriptomic, and proteomic data. A clear example of this step-change is represented by the amount of DNA sequence data that can be now produced using next-generation sequencing (NGS) platforms. Similarly, recent improvements in protein and peptide separation efficiencies and highly accurate mass spectrometry have promoted the identification and quantification of proteins in a given sample. These advancements in biotechnology have increasingly been applied to the study of animal infectious diseases and are beginning to revolutionize the way that biological and evolutionary processes can be studied at the molecular level. Studies have demonstrated the value of NGS technologies for molecular characterization, ranging from metagenomic characterization of unknown pathogens or microbial communities to molecular epidemiology and evolution of viral quasispecies. Moreover, high-throughput technologies now allow detailed studies of host-pathogen interactions at the level of their genomes (genomics), transcriptomes (transcriptomics), or proteomes (proteomics). Ultimately, the interaction between pathogen and host biological networks can be questioned by analytically integrating these levels (integrative OMICS and systems biology). The application of high-throughput biotechnology platforms in these fields and their typical low-cost per information content has revolutionized the resolution with which these processes can now be studied. The aim of this chapter is to provide a current and prospective view on the opportunities and challenges associated with the application of massive parallel sequencing technologies to veterinary medicine, with particular focus on applications that have a potential impact on disease control and management.
Affiliation(s)
- Mónica V. Cunha
- Instituto Nacional de Investigação Agrária e Veterinária, IP and Centro de Biologia Ambiental, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
- João Inácio
- Instituto Nacional de Investigação Agrária e Veterinária, IP, Lisboa, Portugal and School of Pharmacy and Biomolecular Sciences, University of Brighton, Brighton, United Kingdom
|
38
|
Yan B, Wang ZH, Zhu CD, Guo JT, Zhao JL. MicroRNA repertoire for functional genome research in tilapia identified by deep sequencing. Mol Biol Rep 2014; 41:4953-63. [PMID: 24752404 DOI: 10.1007/s11033-014-3361-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2014] [Accepted: 03/31/2014] [Indexed: 12/17/2022]
Abstract
The Nile tilapia (Oreochromis niloticus; Cichlidae) is an economically important species in aquaculture and occupies a prominent position in the aquaculture industry. MicroRNAs (miRNAs) are a class of noncoding RNAs that post-transcriptionally regulate gene expression involved in diverse biological and metabolic processes. To increase the repertoire of miRNAs characterized in tilapia, we used the Illumina/Solexa sequencing technology to sequence a small RNA library using a pooled RNA sample isolated from the different developmental stages of tilapia. Bioinformatic analyses suggest that 197 conserved and 27 novel miRNAs are expressed in tilapia. Sequence alignments indicate that all tested miRNAs and miRNAs* are highly conserved across many species. In addition, we characterized the tissue expression patterns of five miRNAs using real-time quantitative PCR. We found that miR-1/206, miR-7/9, and miR-122 are abundantly expressed in muscle, brain, and liver, respectively, implying a potential role in the regulation of tissue differentiation or the maintenance of tissue identity. Overall, our results expand the number of tilapia miRNAs, and the discovery of miRNAs in the tilapia genome contributes to a better understanding of the role of miRNAs in regulating diverse biological processes.
Affiliation(s)
- Biao Yan
- Key Laboratory of Freshwater Fisheries Germplasm Resource, Ministry of Agriculture, SHOU, Shanghai, 201306, China
|
39
|
Grötzinger SW, Alam I, Ba Alawi W, Bajic VB, Stingl U, Eppinger J. Mining a database of single amplified genomes from Red Sea brine pool extremophiles-improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA). Front Microbiol 2014; 5:134. [PMID: 24778629 PMCID: PMC3985023 DOI: 10.3389/fmicb.2014.00134] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2014] [Accepted: 03/16/2014] [Indexed: 11/13/2022] Open
Abstract
Reliable functional annotation of genomic data is a key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. 
Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
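The "at least two relevant descriptors" rule can be sketched as follows. This is an illustrative reconstruction from the abstract alone, not the authors' scripts: the function names are hypothetical, the PROSITE-to-regex conversion covers only the common pattern syntax (it ignores edge cases such as anchors inside brackets), and hits are counted per gene in the simplest possible way.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE pattern such as '[AG]-x(4)-G-K-[ST]' to a Python regex."""
    regex = pattern.rstrip(".")
    regex = regex.replace("-", "")                         # '-' only separates positions
    regex = regex.replace("x", ".")                        # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")     # {P} = any residue except P
    regex = re.sub(r"\((\d+(?:,\d+)?)\)", r"{\1}", regex)  # (n) or (n,m) = repetition
    return regex.replace("<", "^").replace(">", "$")       # N- and C-terminal anchors


def is_reliable(gene_go_terms, protein_seq, relevant_go_terms, relevant_patterns):
    """Classify a gene as reliable if it carries at least two relevant descriptors,
    counting matching GO terms (profile filter) and matching patterns (pattern filter)."""
    profile_hits = set(gene_go_terms) & set(relevant_go_terms)
    pattern_hits = {p for p in relevant_patterns
                    if re.search(prosite_to_regex(p), protein_seq)}
    return len(profile_hits) + len(pattern_hits) >= 2
```

For example, a gene annotated with the ATP-binding term GO:0005524 whose protein also matches the P-loop pattern `[AG]-x(4)-G-K-[ST]` would pass, while either descriptor alone would not.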
Affiliation(s)
- Stefan W Grötzinger
- Division of Physical Sciences and Engineering, KAUST Catalysis Center, King Abdullah University of Science and Technology Thuwal, Kingdom of Saudi Arabia
- Intikhab Alam
- Division of Biological Sciences and Engineering, Computational Bioscience Research Center, King Abdullah University of Science and Technology Thuwal, Kingdom of Saudi Arabia
- Wail Ba Alawi
- Division of Biological Sciences and Engineering, Computational Bioscience Research Center, King Abdullah University of Science and Technology Thuwal, Kingdom of Saudi Arabia
- Vladimir B Bajic
- Division of Biological Sciences and Engineering, Computational Bioscience Research Center, King Abdullah University of Science and Technology Thuwal, Kingdom of Saudi Arabia
- Ulrich Stingl
- Division of Biological Sciences and Engineering, Red Sea Research Center, King Abdullah University of Science and Technology Thuwal, Kingdom of Saudi Arabia
- Jörg Eppinger
- Division of Physical Sciences and Engineering, KAUST Catalysis Center, King Abdullah University of Science and Technology Thuwal, Kingdom of Saudi Arabia
|
40
|
Barba M, Czosnek H, Hadidi A. Historical perspective, development and applications of next-generation sequencing in plant virology. Viruses 2014; 6:106-36. [PMID: 24399207 PMCID: PMC3917434 DOI: 10.3390/v6010106] [Citation(s) in RCA: 119] [Impact Index Per Article: 11.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2013] [Revised: 12/17/2013] [Accepted: 12/24/2013] [Indexed: 12/27/2022] Open
Abstract
Next-generation high throughput sequencing technologies became available at the onset of the 21st century. They provide a highly efficient, rapid, and low cost DNA sequencing platform beyond the reach of the standard and traditional DNA sequencing technologies developed in the late 1970s. They are continually improved to become faster, more efficient and cheaper. They have been used in many fields of biology since 2004. In 2009, next-generation sequencing (NGS) technologies began to be applied to several areas of plant virology including virus/viroid genome sequencing, discovery and detection, ecology and epidemiology, replication and transcription. Identification and characterization of known and unknown viruses and/or viroids in infected plants are currently among the most successful applications of these technologies. It is expected that NGS will play very significant roles in many research and non-research areas of plant virology.
Affiliation(s)
- Marina Barba
- Consiglio per la ricerca e la Sperimentazione in Agricoltura, Centro di Ricerca per la Patologia Vegetale, Via C. G. Bertero 22, Rome 00156, Italy
- Henryk Czosnek
- Consiglio per la ricerca e la Sperimentazione in Agricoltura, Centro di Ricerca per la Patologia Vegetale, Via C. G. Bertero 22, Rome 00156, Italy
- Ahmed Hadidi
- Consiglio per la ricerca e la Sperimentazione in Agricoltura, Centro di Ricerca per la Patologia Vegetale, Via C. G. Bertero 22, Rome 00156, Italy
|
41
|
Muthuswamy A, Eapen SJ. Research on Plant Pathogenic Fungi in the Genomics Era: From Sequence Analysis to Systems Biology. Fungal Biol 2014. [DOI: 10.1007/978-1-4939-1188-2_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
42
|
Calabrese C, Mangiulli M, Manzari C, Paluscio AM, Caratozzolo MF, Marzano F, Kurelac I, D'Erchia AM, D'Elia D, Licciulli F, Liuni S, Picardi E, Attimonelli M, Gasparre G, Porcelli AM, Pesole G, Sbisà E, Tullo A. A platform independent RNA-Seq protocol for the detection of transcriptome complexity. BMC Genomics 2013; 14:855. [PMID: 24308330 PMCID: PMC4046740 DOI: 10.1186/1471-2164-14-855] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2013] [Accepted: 11/26/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Recent studies have demonstrated an unexpected complexity of transcription in eukaryotes. The majority of the genome is transcribed and only a small fraction of these transcripts is annotated as protein coding genes and their splice variants. Indeed, most transcripts are the result of antisense, overlapping and non-coding RNA expression. In this frame, one of the key aims of high throughput transcriptome sequencing is the detection of all RNA species present in the cell, and the first crucial step for RNA-seq users is the choice of the strategy for cDNA library construction. The protocols developed so far entail using the entire library for a single sequencing run with a specific platform. RESULTS We set up a unique protocol to generate and amplify a strand-specific cDNA library representative of all RNA species that may be implemented with all major platforms currently available on the market (Roche 454, Illumina, ABI/SOLiD). Our method is reproducible, fast, easy to perform and even allows starting from low-input total RNA. Furthermore, we provide a suitable bioinformatics tool for the analysis of the sequences produced following this protocol. CONCLUSION We tested the efficiency of our strategy, showing that our method is platform-independent, thus allowing the simultaneous analysis of the same sample with different NGS technologies, and providing an accurate quantitative and qualitative portrait of complex whole transcriptomes.
Affiliation(s)
- Apollonia Tullo
- Istituto di Tecnologie Biomediche (ITB), Consiglio Nazionale delle Ricerche (CNR), Bari, Italy.
|
43
|
Malaeb L, Le-Clech P, Vrouwenvelder JS, Ayoub GM, Saikaly PE. Do biological-based strategies hold promise to biofouling control in MBRs? Water Res 2013; 47:5447-63. [PMID: 23863390 DOI: 10.1016/j.watres.2013.06.033] [Citation(s) in RCA: 106] [Impact Index Per Article: 9.6] [Received: 03/30/2013] [Revised: 05/21/2013] [Accepted: 06/15/2013] [Indexed: 05/26/2023]
Abstract
Biofouling in membrane bioreactors (MBRs) remains a primary challenge for their wider application, despite the growing acceptance of MBRs worldwide. Research studies on membrane fouling are extensive in the literature, with more than 200 publications on MBR fouling in the last 3 years; yet, improvements in practice on biofouling control and management have been remarkably slow. Commonly applied cleaning methods are only partially effective and membrane replacement often becomes frequent. The reason for the slow advancement in successful control of biofouling is largely attributed to the complex interactions of involved biological compounds and the lack of representative-for-practice experimental approaches to evaluate potential effective control strategies. Biofouling is driven by microorganisms and their associated extra-cellular polymeric substances (EPS) and microbial products. Microorganisms and their products convene together to form matrices that are commonly treated as a black box in conventional control approaches. Biological-based antifouling strategies seem to be a promising constituent of an effective integrated control approach since they target the essence of biofouling problems. However, biological-based strategies are in their developmental phase and several questions should be addressed to set a roadmap for translating existing and new information into sustainable and effective control techniques. This paper investigates membrane biofouling in MBRs from the microbiological perspective to evaluate the potential of biological-based strategies in offering viable control alternatives. Limitations of available control methods highlight the importance of an integrated anti-fouling approach including biological strategies. Successful development of these strategies requires detailed characterization of microorganisms and EPS through the proper selection of analytical tools and assembly of results. 
Existing microbiological/EPS studies reveal a number of implications as well as knowledge gaps, warranting future targeted research. Systematic and representative microbiological studies, complementary utilization of molecular and biofilm characterization tools, standardized experimental methods and validation of successful biological-based antifouling strategies for MBR applications are needed. In addition, linking these studies to relevant operational conditions in MBRs is an essential step towards a better understanding of biofouling and a more effective and directed control strategy.
Affiliation(s)
- Lilian Malaeb
- Water Desalination and Reuse Research Center and Division of Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
|
44
|
Ulahannan D, Kovac MB, Mulholland PJ, Cazier JB, Tomlinson I. Technical and implementation issues in using next-generation sequencing of cancers in clinical practice. Br J Cancer 2013; 109:827-35. [PMID: 23887607 PMCID: PMC3749581 DOI: 10.1038/bjc.2013.416] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Received: 10/23/2012] [Revised: 04/23/2013] [Accepted: 06/27/2013] [Indexed: 12/13/2022]
Abstract
Next-generation sequencing (NGS) of cancer genomes promises to revolutionise oncology, with the ability to design and use targeted drugs, to predict outcome and response, and to classify tumours. It is continually becoming cheaper, faster and more reliable, with the capability to identify rare yet clinically important somatic mutations. Technical challenges include sequencing samples of low quality and/or quantity, reliable identification of structural and copy number variation, and assessment of intratumour heterogeneity. Once these problems are overcome, the use of the data to guide clinical decision making is not straightforward, and there is a risk of premature use of molecular changes to guide patient management in the absence of supporting evidence. Paradoxically, NGS may simply move the bottleneck of personalised medicine from data acquisition to the identification of reliable biomarkers. Standardised cancer NGS data collection on an international scale would be a significant step towards optimising patient care.
Affiliation(s)
- D Ulahannan
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK.
|
45
|
Cheng L, Quek CYJ, Sun X, Bellingham SA, Hill AF. The detection of microRNA associated with Alzheimer's disease in biological fluids using next-generation sequencing technologies. Front Genet 2013; 4:150. [PMID: 23964286 PMCID: PMC3737441 DOI: 10.3389/fgene.2013.00150] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Received: 03/15/2013] [Accepted: 07/21/2013] [Indexed: 02/06/2023]
Abstract
Diagnostic tools for neurodegenerative diseases such as Alzheimer's disease (AD) currently involve subjective neuropsychological testing and specialized brain imaging techniques. While definitive diagnosis requires a pathological brain evaluation at autopsy, neurodegenerative changes are believed to begin years before the clinical presentation of cognitive decline. Therefore, there is an essential need for reliable biomarkers to aid in the early detection of disease in order to implement preventative strategies. microRNAs (miRNA) are small non-coding RNA species that are involved in post-transcriptional gene regulation. Expression levels of miRNAs have potential as diagnostic biomarkers as they are known to circulate and tissue specific profiles can be identified in a number of bodily fluids such as plasma, CSF and urine. Recent developments in deep sequencing technology present a viable approach to develop biomarker discovery pipelines in order to profile miRNA signatures in bodily fluids specific to neurodegenerative diseases. Here we review the potential use of miRNA deep sequencing in biomarker identification from biological fluids and its translation into clinical practice.
Affiliation(s)
- Lesley Cheng
- Department of Biochemistry and Molecular Biology, The University of Melbourne, Melbourne, VIC, Australia; Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Melbourne, VIC, Australia
|
46
|
D'Antonio M, D'Onorio De Meo P, Paoletti D, Elmi B, Pallocca M, Sanna N, Picardi E, Pesole G, Castrignanò T. WEP: a high-performance analysis pipeline for whole-exome data. BMC Bioinformatics 2013; 14 Suppl 7:S11. [PMID: 23815231 PMCID: PMC3633005 DOI: 10.1186/1471-2105-14-s7-s11] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 11/24/2022]
Abstract
Background The advent of massively parallel sequencing technologies (Next Generation Sequencing, NGS) profoundly modified the landscape of human genetics. In particular, Whole Exome Sequencing (WES) is the NGS branch that focuses on the exonic regions of eukaryotic genomes; exomes are ideal for helping us understand high-penetrance allelic variation and its relationship to phenotype. A complete WES analysis involves several steps which need to be suitably designed and arranged into an efficient pipeline. Managing an NGS analysis pipeline and the huge amount of data it produces requires non-trivial IT skills and computational power. Results Our web resource WEP (Whole-Exome sequencing Pipeline web tool) performs a complete WES pipeline and provides easy access, through its interface, to intermediate and final results. The WEP pipeline is composed of several steps: 1) verification of input integrity and quality checks, read trimming and filtering; 2) gapped alignment; 3) BAM conversion, sorting and indexing; 4) duplicates removal; 5) alignment optimization around insertion/deletion (indel) positions; 6) recalibration of quality scores; 7) single nucleotide and deletion/insertion polymorphism (SNP and DIP) variant calling; 8) variant annotation; 9) result storage into custom databases to allow cross-linking and intersections, statistics and much more. In order to overcome the challenge of managing large amounts of data and to maximize the biological information extracted from them, our tool restricts the number of final results by filtering data with customizable thresholds, facilitating the identification of functionally significant variants. Default threshold values, tuned according to the most common literature published in recent years, are also provided when the analysis completes. Conclusions Through our tool a user can perform the whole analysis without knowing the underlying hardware and software architecture, dealing with both paired and single end data. 
The interface provides easy and intuitive access for data submission and a user-friendly web interface for annotated variant visualization. Through WEP, users without IT expertise can access up-to-date and tested WES algorithms, tuned to maximize the quality of called variants while minimizing artifacts and false positives. The web tool is available at the following web address: http://www.caspur.it/wep
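As a rough illustration of the stage ordering and the threshold-based filtering described in this abstract, the following minimal Python sketch lists the nine stages and applies a simple filter to variant records. It is not WEP's implementation: the `Variant` record, its field names, the `filter_variants` function and the default threshold values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Stage names paraphrased from the abstract; WEP's internals are not described there.
WES_STAGES = [
    "input integrity and quality checks, read trimming and filtering",
    "gapped alignment",
    "BAM conversion, sorting and indexing",
    "duplicate removal",
    "alignment optimization around indel positions",
    "quality score recalibration",
    "SNP and DIP variant calling",
    "variant annotation",
    "result storage in custom databases",
]


@dataclass
class Variant:
    # Hypothetical record; field names are illustrative assumptions.
    chrom: str
    pos: int
    quality: float
    depth: int


def filter_variants(variants, min_quality=30.0, min_depth=10):
    """Keep variants meeting both thresholds (illustrative defaults, not WEP's)."""
    return [v for v in variants if v.quality >= min_quality and v.depth >= min_depth]


calls = [
    Variant("chr1", 12345, 55.0, 40),  # passes both thresholds
    Variant("chr1", 67890, 12.0, 35),  # fails the quality threshold
    Variant("chr2", 13579, 48.0, 4),   # fails the depth threshold
]
kept = filter_variants(calls)
```

Exposing the thresholds as parameters mirrors the "customizable thresholds" idea in the abstract: callers can loosen or tighten the filter without touching the earlier stages.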
Affiliation(s)
- Mattia D'Antonio
- Dipartimento di Bioscienze, Biotecnologie e Scienze Farmacologiche, Università degli Studi di Bari, Bari, Italy
|
47
|
Pérez-de-Castro AM, Vilanova S, Cañizares J, Pascual L, Blanca JM, Díez MJ, Prohens J, Picó B. Application of genomic tools in plant breeding. Curr Genomics 2012; 13:179-95. [PMID: 23115520 PMCID: PMC3382273 DOI: 10.2174/138920212800543084] [Citation(s) in RCA: 90] [Impact Index Per Article: 7.5] [Received: 06/17/2011] [Revised: 09/16/2011] [Accepted: 10/11/2011] [Indexed: 02/08/2023]
Abstract
Plant breeding has been very successful in developing improved varieties using conventional tools and methodologies. Nowadays, the availability of genomic tools and resources is leading to a new revolution in plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype, in particular for complex traits. Next Generation Sequencing (NGS) technologies are allowing the mass sequencing of genomes and transcriptomes, which is producing a vast array of genomic information. The analysis of NGS data by means of bioinformatics developments allows the discovery of new genes and regulatory sequences and their positions, and makes available large collections of molecular markers. Genome-wide expression studies provide breeders with an understanding of the molecular basis of complex traits. Genomic approaches include TILLING and EcoTILLING, which make it possible to screen mutant and germplasm collections for allelic variants in target genes. Re-sequencing of genomes is very useful for the genome-wide discovery of markers amenable to high-throughput genotyping platforms, like SSRs and SNPs, or the construction of high density genetic maps. All these tools and resources facilitate the study of genetic diversity, which is important for germplasm management, enhancement and use. Also, they allow the identification of markers linked to genes and QTLs, using a diversity of techniques like bulked segregant analysis (BSA), fine genetic mapping, or association mapping. These new markers are used for marker assisted selection, including marker assisted backcross selection, ‘breeding by design’, or new strategies, like genomic selection. In conclusion, advances in genomics are providing breeders with new tools and methodologies that allow a great leap forward in plant breeding, including the ‘superdomestication’ of crops and the genetic dissection and breeding for complex traits.
Affiliation(s)
- A M Pérez-de-Castro
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana, Universitat Politècnica de València, Camino de Vera 14, 46022 Valencia, Spain
|
48
|
Rubio M, de Horna A, Belles X. MicroRNAs in metamorphic and non-metamorphic transitions in hemimetabolan insect metamorphosis. BMC Genomics 2012; 13:386. [PMID: 22882747 PMCID: PMC3462697 DOI: 10.1186/1471-2164-13-386] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 06/06/2012] [Accepted: 07/26/2012] [Indexed: 11/28/2022]
Abstract
Background Previous work showed that miRNAs play key roles in the regulation of metamorphosis in the hemimetabolan species Blattella germanica. To gain insight into which miRNAs might be important, we have constructed two miRNA libraries, one from the penultimate, pre-metamorphic nymphal instar (N5) and the other from the last, metamorphic nymphal instar (N6). Results High throughput sequencing gave 61 canonical miRNAs present in the N5 and N6 libraries, although in different proportions in each. Comparison of both libraries led to the identification of three and 37 miRNAs significantly more highly expressed in N5 and N6, respectively. Twelve of these 40 miRNAs were then investigated further by qRT-PCR, and results indicated that miR-252-3p was well expressed in N5 but not in N6, whereas let-7-5p, miR-100-5p and miR-125-5p showed the reverse pattern. 20-Hydroxyecdysone (20E) tended to stimulate miRNA expression, whereas juvenile hormone (JH) inhibited the 20E stimulatory effect. Expression of let-7, miR-100 and miR-125 was increased by 20E, which has also been observed in D. melanogaster. The only miRNA that was inhibited by 20E was miR-252-3p. The involvement of let-7, miR-100 and miR-125 in metamorphosis has been demonstrated in other insects. Depletion of miR-252-3p caused growth and developmental delays, which suggests that this miRNA is involved in regulating these processes prior to metamorphosis. Conclusions The comparative analysis of miRNA libraries from pre-metamorphic (N5) and metamorphic stages (N6) of B. germanica proved to be a useful tool to identify miRNAs with roles in hemimetabolan metamorphosis. Three miRNAs emerged as important factors in the metamorphic stage (N6): let-7-5p, miR-100-5p and miR-125-5p, whereas miR-252-3p appears to be important in the pre-metamorphic stage (N5).
Affiliation(s)
- Mercedes Rubio
- Institute of Evolutionary Biology, CSIC-UPF, Passeig Marítim 39, 08003 Barcelona, Spain
|
49
|
Wang Q, Xia J, Jia P, Pao W, Zhao Z. Application of next generation sequencing to human gene fusion detection: computational tools, features and perspectives. Brief Bioinform 2012; 14:506-19. [PMID: 22877769 DOI: 10.1093/bib/bbs044] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Indexed: 01/22/2023]
Abstract
Gene fusions are important genomic events in human cancer because their fusion gene products can drive the development of cancer and thus are potential prognostic tools or therapeutic targets in anti-cancer treatment. Major advancements have been made in computational approaches for fusion gene discovery over the past 3 years due to improvements and widespread applications of high-throughput next generation sequencing (NGS) technologies. To identify fusions from NGS data, existing methods typically leverage the strengths of both sequencing technologies and computational strategies. In this article, we review the NGS and computational features of existing methods for fusion gene detection and suggest directions for future development.
|
50
|
Naidoo N, Pawitan Y, Soong R, Cooper DN, Ku CS. Human genetics and genomics a decade after the release of the draft sequence of the human genome. Hum Genomics 2012; 5:577-622. [PMID: 22155605 PMCID: PMC3525251 DOI: 10.1186/1479-7364-5-6-577] [Citation(s) in RCA: 77] [Impact Index Per Article: 6.4] [Indexed: 02/06/2023]
Abstract
Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.
Affiliation(s)
- Nasheen Naidoo
- Centre for Molecular Epidemiology, Department of Epidemiology and Public Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
|