1
Biswas M, So K, Bertolini TB, Krishnan P, Rana J, Muñoz-Melero M, Syed F, Kumar SRP, Gao H, Xuei X, Terhorst C, Daniell H, Cao S, Herzog RW. Distinct functions and transcriptional signatures in orally induced regulatory T cell populations. Front Immunol 2023; 14:1278184. [PMID: 37954612 PMCID: PMC10637621 DOI: 10.3389/fimmu.2023.1278184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 10/16/2023] [Indexed: 11/14/2023]
Abstract
Oral administration of antigen induces regulatory T cells (Treg) that can not only control local immune responses in the small intestine but also traffic to the central immune system to deliver systemic suppression. Employing murine models of the inherited bleeding disorder hemophilia, we find that oral antigen administration induces three CD4+ Treg subsets, namely FoxP3+LAP-, FoxP3+LAP+, and FoxP3-LAP+. These T cells act in concert to suppress systemic antibody production induced by therapeutic protein administration. Whilst both FoxP3+LAP+ and FoxP3-LAP+ CD4+ T cells express membrane-bound TGF-β (latency-associated peptide, LAP), phenotypic, functional, and single-cell transcriptomic analyses reveal distinct characteristics in the two subsets. As judged by increased IL-2Rα expression and TCR signaling, elevated expression of co-inhibitory receptor molecules, and upregulation of the TGF-β and IL-10 signaling pathways, FoxP3+LAP+ cells are an activated form of FoxP3+LAP- Treg. By contrast, FoxP3-LAP+ cells express low levels of genes involved in TCR signaling or co-stimulation; in these cells, engagement of the AP-1 complex members Jun/Fos and Atf3 is most prominent, consistent with potent IL-10 production. Single-cell transcriptomic analysis further reveals that engagement of the Jun/Fos transcription factors is required for TGF-β expression. This can occur via an Il2ra-dependent process in FoxP3+LAP+ cells or an Il2ra-independent process in FoxP3-LAP+ cells. Surprisingly, both FoxP3+LAP+ and FoxP3-LAP+ cells potently suppress CD4+ conventional T cells and induce FoxP3 expression in them. In this process, FoxP3-LAP+ cells may themselves convert to FoxP3+ Treg. We conclude that orally induced suppression depends on multiple regulatory cell types with complementary and interconnected roles.
Affiliation(s)
- Moanaro Biswas
- Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, United States
- Kaman So
- Department of Biostatistics and Health Data Science and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, United States
- Thais B. Bertolini
- Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, United States
- Preethi Krishnan
- Department of Chemical and Biological Engineering, University of British Columbia, Vancouver, BC, Canada
- Jyoti Rana
- Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, United States
- Maite Muñoz-Melero
- Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, United States
- Farooq Syed
- Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, United States
- Sandeep R. P. Kumar
- Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, United States
- Hongyu Gao
- Center for Medical Genomics, Indiana University School of Medicine, Indianapolis, IN, United States
- Xiaoling Xuei
- Center for Medical Genomics, Indiana University School of Medicine, Indianapolis, IN, United States
- Cox Terhorst
- Division of Immunology, Beth Israel Deaconess Medical Center (BIDMC), Harvard Medical School, Boston, MA, United States
- Henry Daniell
- Department of Basic and Translational Sciences, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Sha Cao
- Department of Biostatistics and Health Data Science and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, United States
- Roland W. Herzog
- Herman B Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN, United States
2
Rather MA, Agarwal D, Bhat TA, Khan IA, Zafar I, Kumar S, Amin A, Sundaray JK, Qadri T. Bioinformatics approaches and big data analytics opportunities in improving fisheries and aquaculture. Int J Biol Macromol 2023; 233:123549. [PMID: 36740117 DOI: 10.1016/j.ijbiomac.2023.123549] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/15/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 02/05/2023]
Abstract
Aquaculture has witnessed an excellent growth rate during the last two decades and offers huge potential to provide nutritional as well as livelihood security. Genomic research has contributed significantly toward the development of beneficial technologies for aquaculture. Existing high-throughput technologies such as next-generation sequencing generate vast amounts of data that require extensive analysis with appropriate tools. Bioinformatics is a rapidly evolving science that integrates gene-based information and computational technology to produce new knowledge for the benefit of aquaculture. Bioinformatics provides new opportunities as well as challenges for information and data processing in new-generation aquaculture. Rapid technical advancements have opened up a world of possibilities for using current genomics to improve aquaculture performance. Understanding the genes that govern economically relevant traits requires a significant amount of additional research. Data sources include next-generation DNA sequencing, protein sequencing, RNA sequencing, gene expression profiles, metabolic pathways, molecular markers, and so on. Appropriate bioinformatics tools have been developed to mine biologically relevant and commercially useful results from these data. The purpose of this scoping review is to present the various classes of bioinformatics tools, with special emphasis on practical translation to the aquaculture industry.
Affiliation(s)
- Mohd Ashraf Rather
- Division of Fish Genetics and Biotechnology, Faculty of Fisheries Ganderbal, Sher-e-Kashmir University of Agricultural Science and Technology, Kashmir, India
- Deepak Agarwal
- Institute of Fisheries Post Graduation Studies OMR Campus, Vaniyanchavadi, Chennai, India
- Irfan Ahamd Khan
- Division of Fish Genetics and Biotechnology, Faculty of Fisheries Ganderbal, Sher-e-Kashmir University of Agricultural Science and Technology, Kashmir, India
- Imran Zafar
- Department of Bioinformatics and Computational Biology, Virtual University Punjab, Pakistan
- Sujit Kumar
- Department of Bioinformatics and Computational Biology, Virtual University Punjab, Pakistan
- Adnan Amin
- Postgraduate Institute of Fisheries Education and Research Kamdhenu University, Gandhinagar-India University of Kurasthra, India; Department of Aquatic Environmental Management, Faculty of Fisheries Rangil-Ganderbel-SKUAST-K, India
- Jitendra Kumar Sundaray
- ICAR-Central Institute of Freshwater Aquaculture, Kausalyaganga, Bhubaneswar, Odisha 751002, India
- Tahiya Qadri
- Division of Food Science and Technology, SKUAST-K, Shalimar, India
3
Ajaykumar A, Yang JJ. Integrative Comparison of Burrows-Wheeler Transform-Based Mapping Algorithm with de Bruijn Graph for Identification of Lung/Liver Cancer-Specific Gene. J Microbiol Biotechnol 2022; 32:149-159. [PMID: 34949753 PMCID: PMC9628837 DOI: 10.4014/jmb.2110.10017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 12/22/2021] [Accepted: 12/23/2021] [Indexed: 12/15/2022]
Abstract
Cancers of the lung and liver are among the top 10 leading causes of cancer death worldwide. Thus, it is essential to identify the genes specifically expressed in these two cancer types to develop new therapeutics. Although abundant messenger RNA (mRNA) sequencing data on these cancers are available owing to advances in next-generation sequencing (NGS) technologies, optimized data processing methods are needed to identify novel cancer-specific genes. Here, we conducted an analytical comparison between Bowtie2, a Burrows-Wheeler transform-based alignment tool, and Kallisto, which adopts pseudoalignment based on a transcriptome de Bruijn graph, using mRNA sequencing data from normal cells and lung/liver cancer tissues. Before the cancer data were analyzed, simulated mRNA sequencing reads were generated, and transcripts with high Transcripts Per Million (TPM) values were compared. mRNA sequencing reads from lung/liver cancer cells were also extracted and quantified. Whereas Kallisto outputs TPM values directly, Bowtie2 provided counts; TPM values were therefore calculated by processing the Sequence Alignment Map (SAM) file in R with the Rsubread package and subsequently in Python. The analysis of the simulated sequencing data revealed that Kallisto detected more transcripts and showed a higher overlap than Bowtie2. The evaluation of the two data processing methods using known lung cancer biomarkers indicates that, in standard settings without any dedicated quality control, Kallisto produces faster and more accurate results than Bowtie2. The same conclusion was confirmed with known liver cancer-specific biomarkers.
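The count-to-TPM conversion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R/Python code: it takes raw per-transcript counts and plain transcript lengths, whereas Kallisto uses effective lengths, so its TPM values will differ somewhat. The function name `counts_to_tpm` and the toy data are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert per-transcript read counts to Transcripts Per Million (TPM).

    counts: read counts per transcript (e.g. from a feature-counting step)
    lengths_bp: transcript lengths in base pairs, same order as counts
    (plain lengths are an assumption; quantifiers often use effective lengths)
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    # Step 1: reads per kilobase, i.e. normalise each count by transcript length
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    # Step 2: rescale so the values in this sample sum to one million
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    return [r / total * 1e6 for r in rpk]


# Hypothetical toy data: three transcripts
tpm = counts_to_tpm([100, 200, 300], [1000, 2000, 1500])
print([round(x, 1) for x in tpm])  # [250000.0, 250000.0, 500000.0]
```

Dividing by length before rescaling is what makes TPM values comparable across transcripts of different lengths, and it guarantees that every sample's values sum to one million.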
Affiliation(s)
- Atul Ajaykumar
- Department of Information, Communication and Electronics Engineering, The Catholic University of Korea, Bucheon 14662, Republic of Korea
- Jung Jin Yang
- Department of Computer Science Engineering, The Catholic University of Korea, Bucheon 14662, Republic of Korea
4
Mi Z, Zhongqiang C, Caiyun J, Yanan L, Jianhua W, Liang L. Circular RNA detection methods: A minireview. Talanta 2022; 238:123066. [PMID: 34808570 DOI: 10.1016/j.talanta.2021.123066] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 09/16/2021] [Revised: 11/11/2021] [Accepted: 11/12/2021] [Indexed: 12/21/2022]
Abstract
Circular RNA (circRNA), a novel type of covalently closed RNA, is implicated in several developmental and metabolic disease processes. CircRNAs exhibit tissue-specific expression, and are stable, abundant, and highly conserved, making them ideal biomarkers for diagnosis and prognosis. Accurate profiling of circRNA, however, is a prerequisite for their clinical application. Traditional methods such as northern blotting, RT-qPCR, and microarray analysis provide useful but limited information. To address these limitations, a number of novel assays have recently emerged, such as droplet digital PCR (ddPCR), isothermal exponential amplification, and rolling circle amplification, which increase the sensitivity and specificity of circRNA detection. Herein, we summarize the advantages and limitations of the new detection methods and discuss the challenges as well as future directions.
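Droplet digital PCR, one of the newer assays listed above, gives absolute quantification by partitioning the reaction into droplets and correcting the fraction of positive droplets for Poisson loading. A minimal sketch of that arithmetic follows, assuming the standard Poisson model. The droplet volume is instrument-specific; the 0.85 nL value and the droplet counts in the example are illustrative assumptions, not data from the review, and the function name is hypothetical.

```python
import math


def ddpcr_copies_per_ul(positive, total, droplet_volume_nl):
    """Estimate target concentration (copies per microlitre) from ddPCR counts.

    Assumes target molecules are Poisson-distributed across droplets, so the
    mean number of copies per droplet is lambda = -ln(1 - p), where p is the
    fraction of positive droplets.
    """
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (all-positive runs saturate)")
    p = positive / total
    lam = -math.log(1.0 - p)                  # mean copies per droplet
    droplet_volume_ul = droplet_volume_nl / 1000.0
    return lam / droplet_volume_ul            # copies per microlitre of reaction


# Hypothetical run: 4,000 of 15,000 droplets positive, 0.85 nL droplets (assumed)
print(round(ddpcr_copies_per_ul(4000, 15000, 0.85), 1))
```

The Poisson correction matters because a positive droplet may hold more than one copy; simply counting positive droplets would underestimate concentration, increasingly so as the positive fraction rises.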
Affiliation(s)
- Zhang Mi
- Department of Pharmacy, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Chen Zhongqiang
- School of Medicine, Jianghan University, Wuhan, 430056, China
- Jiang Caiyun
- Department of Pharmacy, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, 510630, China
- Liu Yanan
- Department of Pharmacy, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Wu Jianhua
- Department of Pharmacy, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
- Liu Liang
- Department of Pharmacy, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
5
Pan B, Ren L, Onuchic V, Guan M, Kusko R, Bruinsma S, Trigg L, Scherer A, Ning B, Zhang C, Glidewell-Kenney C, Xiao C, Donaldson E, Sedlazeck FJ, Schroth G, Yavas G, Grunenwald H, Chen H, Meinholz H, Meehan J, Wang J, Yang J, Foox J, Shang J, Miclaus K, Dong L, Shi L, Mohiyuddin M, Pirooznia M, Gong P, Golshani R, Wolfinger R, Lababidi S, Sahraeian SME, Sherry S, Han T, Chen T, Shi T, Hou W, Ge W, Zou W, Guo W, Bao W, Xiao W, Fan X, Gondo Y, Yu Y, Zhao Y, Su Z, Liu Z, Tong W, Xiao W, Zook JM, Zheng Y, Hong H. Assessing reproducibility of inherited variants detected with short-read whole genome sequencing. Genome Biol 2022; 23:2. [PMID: 34980216 PMCID: PMC8722114 DOI: 10.1186/s13059-021-02569-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 11/22/2021] [Accepted: 12/06/2021] [Indexed: 12/15/2022]
Abstract
BACKGROUND Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. RESULTS To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. CONCLUSIONS Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
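One simple way to express the replicate comparison described above is the fraction of variants, pooled across replicates, that every replicate calls. The sketch below is an illustrative metric under assumed inputs (variants as hypothetical `(chrom, pos, ref, alt)` tuples), not the metric used in the study. Real comparisons normalize and left-align variants first, because the same indel can be reported at different positions.

```python
def is_snv(variant):
    """Variants are (chrom, pos, ref, alt) tuples; SNVs have one-base ref and alt."""
    _, _, ref, alt = variant
    return len(ref) == 1 and len(alt) == 1


def concordance(replicates, keep=lambda v: True):
    """Share of variants (union over replicates) that every replicate called.

    replicates: list of sets of variant tuples, one set per replicate
    keep: optional predicate to restrict the comparison (e.g. SNVs only)
    """
    sets = [{v for v in calls if keep(v)} for calls in replicates]
    union = set().union(*sets)
    if not union:
        return float("nan")
    shared = set.intersection(*sets)
    return len(shared) / len(union)


# Hypothetical triplicate calls; the indel is reported at two different positions
rep1 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "AT", "A")}
rep2 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 52, "AT", "A")}
rep3 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "AT", "A")}
replicates = [rep1, rep2, rep3]
print(concordance(replicates, keep=is_snv))                   # 1.0
print(concordance(replicates, keep=lambda v: not is_snv(v)))  # 0.0
```

In the toy data the two indel records share no exact key, so indel concordance drops to zero even if they describe one event, illustrating how representation differences alone can lower indel concordance when variants are not normalized.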
Affiliation(s)
- Bohu Pan
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Len Trigg
- Real Time Genomics, Hamilton, New Zealand
- Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC, European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Baitang Ning
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Eric Donaldson
- Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, 20993, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Gokhan Yavas
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Joe Meehan
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Jing Wang
- Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100013, China
- Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Jonathan Foox
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA
- Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Lianhua Dong
- Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100013, China
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Mehdi Pirooznia
- Bioinformatics and Computational Biology Laboratory, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA
- Samir Lababidi
- Office of Health Informatics, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, 20993, USA
- Steve Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Tao Han
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Tao Chen
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Tieliu Shi
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Weigong Ge
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Wen Zou
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Wenjing Guo
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Wenjun Bao
- SAS Institute Inc., Cary, NC, 27513, USA
- Wenzhong Xiao
- Stanford Genome Technology Center, Stanford University School of Medicine, Palo Alto, CA, 94305, USA
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Yoichi Gondo
- Department of Molecular Life Sciences, Tokai University School of Medicine, 143 Shimokasuya, Isehara, 259-1193, Japan
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Yongmei Zhao
- CCR-SF Bioinformatics Group, Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21701, USA
- Zhenqiang Su
- Takeda Pharmaceuticals, Cambridge, MA, 02139, USA
- Zhichao Liu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Wenming Xiao
- Division of Molecular Genetics and Pathology, Center for Device and Radiological Health, US Food and Drug Administration, Silver Spring, MD, 20993, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
6
Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, Yang HT, Xue V, Knyazev S, Singer BD, Balliu B, Koslicki D, Skums P, Zelikovsky A, Alkan C, Mutlu O, Mangul S. Technology dictates algorithms: recent developments in read alignment. Genome Biol 2021; 22:249. [PMID: 34446078 PMCID: PMC8390189 DOI: 10.1186/s13059-021-02443-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 10/15/2020] [Accepted: 07/28/2021] [Indexed: 01/08/2023]
Abstract
Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.
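Many widely used short-read aligners, such as BWA and Bowtie2, index the reference with the Burrows-Wheeler transform (BWT) and an FM-index. The toy Python sketch below shows BWT construction and FM-index backward search for exact matches. It builds the BWT by sorting rotations (quadratic, toy inputs only) and omits the suffix-array sampling, compressed occurrence tables, and inexact matching that real aligners need; all names are illustrative.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (quadratic; toy inputs only)."""
    text += "$"  # sentinel: must not occur in text and sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_str):
    """Return C (count of characters smaller than c) and Occ (prefix counts of c)."""
    alphabet = sorted(set(bwt_str))
    c_table, running = {}, 0
    for c in alphabet:
        c_table[c] = running
        running += bwt_str.count(c)
    # occ[c][i] = number of occurrences of c in bwt_str[:i]
    occ = {c: [0] * (len(bwt_str) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_str):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return c_table, occ


def count_exact_matches(pattern, c_table, occ, n):
    """FM-index backward search: count exact occurrences of pattern in the text."""
    lo, hi = 0, n  # half-open range of sorted rotations matching the suffix seen so far
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


reference = "GATTACAGATTACA"  # toy reference sequence
b = bwt(reference)
c_table, occ = build_fm_index(b)
print(count_exact_matches("TTA", c_table, occ, len(b)))  # 2
```

Each pattern character narrows the matching range with two table lookups, so the search cost depends on the pattern length rather than the reference length, which is why this index suits aligning many short reads to a large genome.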
Affiliation(s)
- Mohammed Alser
- Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland
- Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
- Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
- Jeremy Rotman
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Dhrithi Deshpande
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA
- Kodi Taraszka
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Huwenbo Shi
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Pelin Icer Baykal
- Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Harry Taegyun Yang
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Ph.D. Program, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Victor Xue
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Sergey Knyazev
- Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Benjamin D Singer
- Division of Pulmonary and Critical Care Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Department of Biochemistry & Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, USA
- Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Brunilda Balliu
- Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
- David Koslicki
- Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16801, USA
- Biology Department, Pennsylvania State University, University Park, PA, 16801, USA
- The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16801, USA
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 119991, Russia
- Can Alkan
- Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
- Bilkent-Hacettepe Health Sciences and Technologies Program, Ankara, Turkey
- Onur Mutlu
- Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland
- Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
- Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
- Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA
7
Yen S, Johnson JS. Metagenomics: a path to understanding the gut microbiome. Mamm Genome 2021; 32:282-296. [PMID: 34259891 PMCID: PMC8295064 DOI: 10.1007/s00335-021-09889-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/09/2021] [Accepted: 06/28/2021] [Indexed: 12/16/2022]
Abstract
The gut microbiome is a major determinant of host health, yet it is only in the last 2 decades that the advent of next-generation sequencing has enabled it to be studied at a genomic level. Shotgun sequencing is beginning to provide insight into the prokaryotic as well as eukaryotic and viral components of the gut community, revealing not just their taxonomy, but also the functions encoded by their collective metagenome. This revolution in understanding is being driven by continued development of sequencing technologies and in consequence necessitates reciprocal development of computational approaches that can adapt to the evolving nature of sequence datasets. In this review, we provide an overview of current bioinformatic strategies for handling metagenomic sequence data and discuss their strengths and limitations. We then go on to discuss key technological developments that have the potential to once again revolutionise the way we are able to view and hence understand the microbiome.
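Taxonomic assignment of shotgun reads, one of the tasks mentioned above, is commonly done by exact k-mer matching against reference genomes. The sketch below is deliberately simplified and hypothetical: it discards k-mers shared between taxa and assigns each read by majority vote, whereas tools such as Kraken map shared k-mers to their lowest common ancestor in a taxonomy. The reference sequences and taxon names are toy data.

```python
from collections import Counter


def kmers(seq, k):
    """Yield all overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_kmer_db(references, k):
    """Map each k-mer to its source taxon; k-mers found in several taxa are dropped."""
    db, ambiguous = {}, set()
    for taxon, genome in references.items():
        for km in kmers(genome, k):
            if km in db and db[km] != taxon:
                ambiguous.add(km)
            db[km] = taxon
    for km in ambiguous:
        del db[km]
    return db


def classify_read(read, db, k):
    """Assign a read to the taxon with the most matching k-mers, or None if none match."""
    hits = Counter(db[km] for km in kmers(read, k) if km in db)
    if not hits:
        return None
    return hits.most_common(1)[0][0]


# Hypothetical toy references and reads
references = {"taxonA": "ACGTACGTTTGACCA", "taxonB": "GGGCCCTTTAAACCG"}
db = build_kmer_db(references, k=5)
for read in ["ACGTACGTTT", "CCCTTTAAAC", "NNNNNNNN"]:
    print(read, classify_read(read, db, k=5))
```

Counting the assigned reads per taxon then gives a crude relative-abundance profile; production tools additionally correct for genome size and for reads left unclassified.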
Affiliation(s)
- Sandi Yen
- Oxford Centre for Microbiome Studies, Kennedy Institute of Rheumatology, University of Oxford, Roosevelt Drive, Headington, Oxford, OX3 7FY, UK
- Jethro S Johnson
- Oxford Centre for Microbiome Studies, Kennedy Institute of Rheumatology, University of Oxford, Roosevelt Drive, Headington, Oxford, OX3 7FY, UK
8
Robinson T, Harkin J, Shukla P. Hardware Acceleration of Genomics Data Analysis: Challenges and Opportunities. Bioinformatics 2021; 37:1785-1795. [PMID: 34037688 PMCID: PMC8317111 DOI: 10.1093/bioinformatics/btab017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/17/2020] [Revised: 11/03/2020] [Accepted: 05/24/2021] [Indexed: 12/11/2022]
Abstract
The significant decline in the cost of genome sequencing has dramatically changed the typical bioinformatics pipeline for analysing sequencing data. Whereas sequencing itself was traditionally the main challenge, it is now secondary to the computational challenge of genomic data analysis. Short read alignment (SRA) is a ubiquitous step in modern bioinformatics pipelines in genomics and is often regarded as the principal computational bottleneck. Many hardware and software approaches have been proposed to accelerate it. However, previous attempts to increase throughput using many-core processing strategies have enjoyed limited success, mainly due to a dependence on global memory for each computational block. The limited scalability and high energy costs of many-core SRA implementations pose a significant obstacle to sustained acceleration. The Networks-on-Chip (NoC) hardware interconnect mechanism has advanced the scalability of many-core computing systems and, more recently, has shown potential in SRA implementations by efficiently integrating multiple computational blocks, such as pre-alignment filtering and sequence alignment, while minimising memory latency and global memory access. This paper provides a state-of-the-art review of current hardware acceleration strategies for genomic data analysis and identifies the challenges and opportunities of using NoCs as a critical building block in next-generation sequencing (NGS) technologies for advancing the speed of analysis.
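Pre-alignment filtering, mentioned above as one of the blocks that NoC designs integrate, cheaply discards candidate locations that cannot align within an error budget before costly dynamic-programming alignment. As a software illustration only, the sketch below uses the classic q-gram lemma: a read of length m within edit distance e of a reference substring shares at least (m - q + 1) - q*e q-grams with it. The hardware filters surveyed in the paper may use different schemes, and the function names and parameters here are assumptions.

```python
from collections import Counter


def qgram_profile(seq, q):
    """Multiset of all overlapping substrings of length q."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))


def passes_qgram_filter(read, ref_window, q, max_edits):
    """Return False only when the q-gram lemma proves edit distance > max_edits.

    For a read of length m within edit distance e of a reference substring,
    at least (m - q + 1) - q*e q-grams (counted with multiplicity) are shared.
    Candidates that pass still need full alignment to confirm a hit.
    """
    m = len(read)
    threshold = (m - q + 1) - q * max_edits
    if threshold <= 0:
        return True  # at this setting the lemma cannot reject anything
    # Counter '&' keeps the minimum count of each shared q-gram
    shared = sum((qgram_profile(read, q) & qgram_profile(ref_window, q)).values())
    return shared >= threshold


read = "ACGTACGTAC"
print(passes_qgram_filter(read, "ACGTTCGTAC", q=3, max_edits=1))  # True: one substitution
print(passes_qgram_filter(read, "TTTTTTTTTT", q=3, max_edits=1))  # False: rejected early
```

The filter never rejects a true hit as long as `ref_window` covers the whole candidate alignment region (for example, padded by `max_edits` bases on each side), because a longer window can only share more q-grams; locations that pass still require full alignment.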
Affiliation(s)
- Tony Robinson
- School of Computing, Engineering and Intelligent Systems, Ulster University, Magee Campus, Derry/Londonderry, BT48 7JL, UK
- Jim Harkin
- School of Computing, Engineering and Intelligent Systems, Ulster University, Magee Campus, Derry/Londonderry, BT48 7JL, UK
- Priyank Shukla
- Northern Ireland Centre for Stratified Medicine, Biomedical Sciences Research Institute, Ulster University, C-TRIC Building, Altnagelvin Area Hospital, Derry/Londonderry, BT47 6SB, UK
9
Paskov K, Jung JY, Chrisman B, Stockham NT, Washington P, Varma M, Sun MW, Wall DP. Estimating sequencing error rates using families. BioData Min 2021; 14:27. [PMID: 33892748 PMCID: PMC8063364 DOI: 10.1186/s13040-021-00259-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/26/2020] [Accepted: 03/29/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. RESULTS We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method's versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites.
CONCLUSION Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
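As a rough illustration of how a sequencing error can surface as a Mendelian error, the following minimal Python sketch checks parent-child consistency at biallelic sites. It assumes unphased genotypes coded as alternate-allele counts (0, 1, 2) for a single mother, father and child, and all function names are hypothetical. It only flags inconsistent sites; it is not the authors' method and does not perform their extrapolation to genome-wide precision and recall.

```python
from typing import Optional, Sequence, Set, Tuple

# Genotype at a biallelic site, coded as the number of alternate alleles:
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
# None marks a missing call.
Genotype = Optional[int]

def transmissible(gt: int) -> Set[int]:
    """Alleles (0 = ref, 1 = alt) that a parent with genotype `gt` can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[gt]

def is_mendelian_error(mother: Genotype, father: Genotype, child: Genotype) -> bool:
    """True if no combination of one maternal and one paternal allele yields the child's genotype."""
    if mother is None or father is None or child is None:
        return False  # sites with missing calls cannot be assessed
    return all(m + f != child
               for m in transmissible(mother)
               for f in transmissible(father))

def mendelian_error_rate(sites: Sequence[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of fully called (mother, father, child) sites that violate Mendelian inheritance."""
    called = [site for site in sites if None not in site]
    if not called:
        return 0.0
    return sum(is_mendelian_error(*site) for site in called) / len(called)

# Hypothetical toy trio calls: (mother, father, child)
sites = [(0, 0, 0), (0, 0, 1), (1, 1, 2), (2, 2, 1), (1, None, 0)]
print(mendelian_error_rate(sites))  # 0.5: two of the four fully called sites are inconsistent
```

Many genotyping errors still leave the trio Mendelian-consistent (for example, a miscalled heterozygous child of two heterozygous parents), which is why only a fraction of errors can be observed this way.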
Affiliation(s)
- Kelley Paskov
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Jae-Yoon Jung
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA; Department of Pediatrics (Systems Medicine), Stanford University, Stanford, CA, USA
- Brianna Chrisman
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Nate T Stockham
- Department of Neuroscience, Stanford University, Stanford, CA, USA
- Peter Washington
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Maya Varma
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Min Woo Sun
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Dennis P Wall
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA; Department of Pediatrics (Systems Medicine), Stanford University, Stanford, CA, USA
10
Adil A, Kumar V, Jan AT, Asger M. Single-Cell Transcriptomics: Current Methods and Challenges in Data Acquisition and Analysis. Front Neurosci 2021; 15:591122. [PMID: 33967674 PMCID: PMC8100238 DOI: 10.3389/fnins.2021.591122] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 08/03/2020] [Accepted: 03/19/2021] [Indexed: 11/17/2022]
Abstract
Rapid cost drops and advances in next-generation sequencing have made profiling cells at the individual level a conventional practice in scientific laboratories worldwide. Single-cell transcriptomics [single-cell RNA sequencing (SC-RNA-seq)] has immense potential for uncovering novel insights into human biology. The well-known heterogeneity of cells at the individual level can be better studied by single-cell transcriptomics, and proper downstream analysis of these data will provide new insights to the scientific community. However, because of the low amount of starting material, SC-RNA-seq data face various computational challenges, such as normalization, differential gene expression analysis, and dimensionality reduction. Additionally, new methods such as 10× Chromium can profile millions of cells in parallel, generating a considerable amount of data; thus, handling single-cell data is another major challenge. This paper reviews single-cell sequencing methods, library preparation, and data generation. We highlight some of the main computational challenges that need to be addressed through new bioinformatics algorithms and analysis tools. We also frame single-cell transcriptomics data as a big data problem.
Affiliation(s)
- Asif Adil
- Department of Computer Sciences, Baba Ghulam Shah Badshah University, Rajouri, India
- Vijay Kumar
- Department of Biotechnology, Yeungnam University, Gyeongsan, South Korea
- Arif Tasleem Jan
- School of Biosciences and Biotechnology, Baba Ghulam Shah Badshah University, Rajouri, India
- Mohammed Asger
- Department of Computer Sciences, Baba Ghulam Shah Badshah University, Rajouri, India
11
Abstract
RNA silencing plays a critical role in diverse biological processes in plants, including growth, development, and responses to abiotic and biotic stresses. RNA silencing is guided by small non-coding RNAs (sRNAs) 21-24 nucleotides (nt) in length, which are loaded into Argonaute (AGO) proteins to repress the expression of target loci and transcripts through transcriptional or posttranscriptional gene silencing mechanisms. Identification and quantitative characterization of sRNAs are crucial steps toward understanding their biological functions. Here, we developed a step-by-step protocol that details the cloning of sRNA libraries and the corresponding computational analysis of the recovered sRNAs. This protocol can be applied to all kinds of organisms, including Arabidopsis, and is compatible with various high-throughput sequencing technologies such as Illumina HiSeq. We hope that this protocol provides an accurate way to identify and quantify sRNAs in vivo.
Affiliation(s)
- Di Sun
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, USA
- Institute for Plant Genomics and Biotechnology, Texas A&M University, College Station, TX, USA
- Graduate Program for Molecular and Environmental Plant Science, Texas A&M University, College Station, TX, USA
- Zeyang Ma
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, USA
- Institute for Plant Genomics and Biotechnology, Texas A&M University, College Station, TX, USA
- Jiaying Zhu
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, USA
- Institute for Plant Genomics and Biotechnology, Texas A&M University, College Station, TX, USA
- Xiuren Zhang
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, USA
- Institute for Plant Genomics and Biotechnology, Texas A&M University, College Station, TX, USA
12
Zhang G, Zhang Y, Jin J. The Ultrafast and Accurate Mapping Algorithm FANSe3: Mapping a Human Whole-Genome Sequencing Dataset Within 30 Minutes. Phenomics (Cham) 2021; 1:22-30. [PMID: 36939746 PMCID: PMC9584123 DOI: 10.1007/s43657-020-00008-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/16/2020] [Revised: 10/28/2020] [Accepted: 11/10/2020] [Indexed: 11/26/2022]
Abstract
Aligning billions of reads generated by next-generation sequencing (NGS) to reference sequences, termed "mapping", is a time-consuming and computationally intensive step in most NGS applications. A fast, accurate and robust mapping algorithm is therefore highly needed. We developed the FANSe3 mapping algorithm, which can map a 30× human whole-genome sequencing (WGS) dataset within 30 min, a 50× human whole exome sequencing (WES) dataset within 30 s, and a typical mRNA-seq dataset within seconds on a single server node without the need for any hardware acceleration feature. Like its predecessor FANSe2, FANSe3 can keep the error rate as low as 10⁻⁹ in most cases, which is more robust than Burrows-Wheeler transform-based algorithms. Error allowance hardly affected the identification of a driver somatic mutation in clinically relevant WGS data, and FANSe3 provided robust gene expression profiles regardless of the parameter settings and sequencer used. The novel algorithm, designed for high-performance cloud-computing infrastructures, will break the bottleneck of speed and accuracy in NGS data analysis and promote NGS applications in various fields. The FANSe3 algorithm can be downloaded from the website: http://www.chi-biotech.com/fanse3/.
Affiliation(s)
- Gong Zhang
- MOE Key Laboratory of Tumor Molecular Biology and Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, 510632 China
- Chi-Biotech Co. Ltd., Shenzhen, 518000 China
- Jingjie Jin
- MOE Key Laboratory of Tumor Molecular Biology and Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, 510632 China
13
Song Y, Tang W, Li H. Identification of KIF4A and its effect on the progression of lung adenocarcinoma based on the bioinformatics analysis. Biosci Rep 2021; 41:BSR20203973. [PMID: 33398330 PMCID: PMC7823194 DOI: 10.1042/bsr20203973] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/17/2020] [Revised: 12/25/2020] [Accepted: 01/04/2021] [Indexed: 12/23/2022]
Abstract
BACKGROUND Lung adenocarcinoma (LUAD) is the most frequent histological type of lung cancer, and its incidence has shown an upward trend in recent years. Nevertheless, little is known about effective biomarkers for LUAD. METHODS The robust rank aggregation method was used to mine differentially expressed genes (DEGs) from Gene Expression Omnibus (GEO) datasets. The Search Tool for the Retrieval of Interacting Genes (STRING) database was used to extract hub genes from the protein-protein interaction (PPI) network. The expression of the hub genes was validated using expression profiles from the TCGA and Oncomine databases and verified by real-time quantitative PCR (qRT-PCR). Module and survival analyses of the hub genes were performed using Cytoscape and Kaplan-Meier curves. The function of KIF4A as a hub gene was investigated in LUAD cell lines. RESULTS The PPI analysis identified seven DEGs, BIRC5, DLGAP5, CENPF, KIF4A, TOP2A, AURKA, and CCNA2, which were significantly upregulated in the Oncomine and TCGA LUAD datasets and were verified by qRT-PCR in our clinical samples. We performed overall and disease-free survival analyses of the seven hub genes using GEPIA. We further found that the expression levels of CENPF, DLGAP5, and KIF4A were positively correlated with clinical stage. In LUAD cell lines, knocking down KIF4A expression inhibited proliferation and migration and promoted apoptosis. CONCLUSION We have identified new DEGs and functional pathways involved in LUAD. KIF4A, as a hub gene, promoted the progression of LUAD and might represent a potential therapeutic target for molecular cancer therapy.
Affiliation(s)
- Yexun Song
- Department of Otolaryngology-Head Neck Surgery, Xiangya Hospital, Central South University, Changsha 410008, Hunan Province, China
- Wenfang Tang
- Department of Respiratory Medicine, The First Hospital of Changsha, Changsha 410000, Hunan Province, China
- Hui Li
- Department of Respiratory Medicine, The First Hospital of Changsha, Changsha 410000, Hunan Province, China
14
Galise TR, Esposito S, D'Agostino N. Guidelines for Setting Up a mRNA Sequencing Experiment and Best Practices for Bioinformatic Data Analysis. Methods Mol Biol 2021; 2264:137-162. [PMID: 33263908 DOI: 10.1007/978-1-0716-1201-9_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
RNA-sequencing, commonly referred to as RNA-seq, is the most recently developed method for the analysis of transcriptomes. It uses high-throughput next-generation sequencing technologies and has revolutionized our understanding of the complexity and dynamics of whole transcriptomes. In this chapter, we recall the key developments in transcriptome analysis and dissect the different steps of the general workflow that users can follow to design and perform a mRNA-seq experiment and to process mRNA-seq data obtained with Illumina technology. The chapter proposes guidelines for properly completing a mRNA-seq study and provides best-practice recommendations based on recent literature and on the latest developments in technology and algorithms. We also highlight the large number of choices available (especially for bioinformatic data analysis), which can leave the scientist unsure how to proceed. In the last part of the chapter, we discuss the new frontiers of single-cell RNA-seq and isoform sequencing by long-read technology.
Affiliation(s)
- Teresa Rosa Galise
- Department of Agricultural Sciences, University of Naples Federico II, Portici, Italy
- Salvatore Esposito
- CREA Research Centre for Vegetable and Ornamental Crops, Pontecagnano Faiano, Italy
- Nunzio D'Agostino
- Department of Agricultural Sciences, University of Naples Federico II, Portici, Italy
15
Computational Genomics. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
16
Pockrandt C, Alzamel M, Iliopoulos CS, Reinert K. GenMap: ultra-fast computation of genome mappability. Bioinformatics 2020; 36:3687-3692. [PMID: 32246826 PMCID: PMC7320602 DOI: 10.1093/bioinformatics/btaa222] [Citation(s) in RCA: 90] [Impact Index Per Article: 22.5] [Received: 12/14/2019] [Revised: 03/23/2020] [Accepted: 03/31/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION Computing the uniqueness of k-mers for each position of a genome while allowing for up to e mismatches is computationally challenging. However, it is crucial for many biological applications such as the design of guide RNA for CRISPR experiments. More formally, the uniqueness or (k, e)-mappability can be described for every position as the reciprocal value of how often this k-mer occurs approximately in the genome, i.e. with up to e mismatches. RESULTS We present a fast method GenMap to compute the (k, e)-mappability. We extend the mappability algorithm, such that it can also be computed across multiple genomes where a k-mer occurrence is only counted once per genome. This allows for the computation of marker sequences or finding candidates for probe design by identifying approximate k-mers that are unique to a genome or that are present in all genomes. GenMap supports different formats such as binary output, wig and bed files as well as csv files to export the location of all approximate k-mers for each genomic position. AVAILABILITY AND IMPLEMENTATION GenMap can be installed via bioconda. Binaries and C++ source code are available on https://github.com/cpockrandt/genmap.
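The (k, e)-mappability definition above can be made concrete with a deliberately naive Python sketch: for each position, count the k-mers in the sequence that lie within Hamming distance e of the k-mer starting there (including itself) and take the reciprocal. This is a brute-force, quadratic-time illustration for short toy sequences, assuming a single genome, the forward strand only and mismatches (no indels); it is not GenMap's index-based algorithm, and the function names are hypothetical.

```python
from typing import List

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def naive_mappability(genome: str, k: int, e: int) -> List[float]:
    """(k, e)-mappability for each k-mer start position, by brute force.

    Value at position i = 1 / (number of k-mers in `genome`, forward strand only,
    with at most e mismatches to the k-mer starting at i, counting itself).
    """
    n = len(genome) - k + 1
    if k <= 0 or n <= 0:
        return []
    kmers = [genome[i:i + k] for i in range(n)]
    return [1.0 / sum(1 for other in kmers if hamming(query, other) <= e)
            for query in kmers]

print(naive_mappability("ACGT", 2, 0))  # [1.0, 1.0, 1.0]: every 2-mer is unique
print(naive_mappability("AACA", 2, 1))  # [1/3, 1/2, 1/2]: AA is within 1 mismatch of AC and CA
```

A value of 1.0 marks a position whose k-mer is unique even when e mismatches are allowed; smaller values indicate repetitive, ambiguously mappable sequence.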
Affiliation(s)
- Christopher Pockrandt
- Center for Computational Biology, School of Medicine; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science and Mathematics, Freie Universität Berlin; Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Mai Alzamel
- Department of Informatics, King's College London, London, UK; Department of Computer Science, King Saud University, Riyadh, Saudi Arabia
- Knut Reinert
- Department of Computer Science and Mathematics, Freie Universität Berlin; Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
17
Subkhankulova T, Naumenko F, Tolmachov OE, Orlov YL. Novel ChIP-seq simulating program with superior versatility: isChIP. Brief Bioinform 2020; 22:6035271. [PMID: 33320934 DOI: 10.1093/bib/bbaa352] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/11/2020] [Revised: 10/18/2020] [Accepted: 11/03/2020] [Indexed: 12/13/2022]
Abstract
Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is recognized as an extremely powerful tool to study the interaction of numerous transcription factors and other chromatin-associated proteins with DNA. The core problem in optimizing the ChIP-seq protocol and the subsequent computational data analysis is that the 'true' pattern of binding events for a given protein factor is unknown. Computer simulation of the ChIP-seq process based on an 'a priori known binding template' can drastically reduce the number of wet lab experiments and ultimately help achieve radical optimization of the entire processing pipeline. We present a newly developed ChIP-seq simulation algorithm implemented in novel software, in silico ChIP-seq (isChIP). We demonstrate that isChIP closely approximates real ChIP-seq protocols and is able to model data similar to those obtained from experimental sequencing. We validated isChIP using publicly available datasets generated for the well-characterized transcription factors Oct4 and Sox2. Although the software is compatible with Illumina protocols by default, it can also perform simulations for a number of alternative sequencing platforms such as Roche 454, Ion Torrent and SOLiD, and can model ChIP-exo. The versatility of isChIP was demonstrated by modelling a wide range of binding events, including those of transcription factors and chromatin modifiers. We also performed a comparative analysis against a few existing ChIP-seq simulators and showed the fundamental superiority of our model. Because it can utilize known binding templates, isChIP can potentially help investigators choose the most appropriate analytical software by benchmarking available ChIP-seq programs, and optimize the experimental parameters of the ChIP-seq protocol. isChIP software is freely available at https://github.com/fnaumenko/isChIP.
Affiliation(s)
- Yuriy L Orlov
- Digital Health Institute, I.M. Sechenov First Moscow State Medical University (Sechenov University), and Senior Scientist at Agrarian and Technological Institute, Peoples' Friendship University of Russia (RUDN University), Russia
18
Lee N, Park MJ, Song W, Jeon K, Jeong S. Currently Applied Molecular Assays for Identifying ESR1 Mutations in Patients with Advanced Breast Cancer. Int J Mol Sci 2020; 21:ijms21228807. [PMID: 33233830 PMCID: PMC7699999 DOI: 10.3390/ijms21228807] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/27/2020] [Revised: 11/17/2020] [Accepted: 11/19/2020] [Indexed: 12/11/2022]
Abstract
Approximately 70% of breast cancers, the leading cause of cancer-related mortality worldwide, are positive for the estrogen receptor (ER). Treatment of patients with luminal subtypes is mainly based on endocrine therapy. However, ER positivity can be reduced, and ESR1 mutations play an important role in resistance to endocrine therapy, leading to advanced breast cancer. Various methodologies for detecting ESR1 mutations have been developed; the most commonly used are next-generation sequencing (NGS)-based assays (50.0%), followed by droplet digital PCR (ddPCR) (45.5%). Regarding sample type, tissue (50.0%) was used more frequently than plasma (27.3%). However, plasma (46.2%) became the most used sample type in 2016-2019, in contrast to 2012-2015 (22.2%). In 2016-2019, ddPCR (61.5%) was also used more often than NGS (30.8%) and was more popular than it had been in 2012-2015. The easy accessibility, non-invasiveness, and demonstrated usefulness and high sensitivity of ddPCR using plasma have changed these trends. When using these assays, a comprehensive understanding of their principles, advantages, vulnerabilities, and precautions for interpretation is needed. In the future, advanced NGS platforms and modified ddPCR will benefit patients by facilitating efficient treatment decisions based on information regarding ESR1 mutations.
Affiliation(s)
- Nuri Lee
- Department of Laboratory Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea
- Min-Jeong Park
- Department of Laboratory Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea
- Wonkeun Song
- Department of Laboratory Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea
- Kibum Jeon
- Department of Laboratory Medicine, Hangang Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea
- Seri Jeong
- Department of Laboratory Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07440, Korea
- Correspondence: Tel.: +82-845-5305
19
Kanzi AM, San JE, Chimukangara B, Wilkinson E, Fish M, Ramsuran V, de Oliveira T. Next Generation Sequencing and Bioinformatics Analysis of Family Genetic Inheritance. Front Genet 2020; 11:544162. [PMID: 33193618 PMCID: PMC7649788 DOI: 10.3389/fgene.2020.544162] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 03/20/2020] [Accepted: 09/21/2020] [Indexed: 12/29/2022]
Abstract
Mendelian and complex genetic trait diseases continue to burden society both socially and economically. The lack of effective tests has hampered diagnosis; thus, affected individuals lack a proper prognosis. Mendelian diseases are caused by mutations in a single gene, while complex trait diseases are caused by the accumulation of mutations in either linked or unlinked genomic regions. Significant advances have been made in identifying novel disease-associated mutations, especially with the introduction of next-generation and third-generation sequencing. Regardless, some diseases still lack a diagnosis, as most tests rely on SNP genotyping panels developed from population-based genetic analyses. Analysis of family genetic inheritance using whole genomes, whole exomes or gene panels has been shown to be effective in identifying disease-causing mutations. In this review, we discuss next-generation and third-generation sequencing platforms, bioinformatic tools and genetic resources commonly used to analyze family-based genomic data, with a focus on identifying inherited or novel disease-causing mutations. We also highlight the analytical, ethical and regulatory challenges associated with analyzing personal genomes, which constitute the data used for studying family genetic inheritance.
Affiliation(s)
- Aquillah M. Kanzi
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
20
Cacciabue M, Currá A, Carrillo E, König G, Gismondi MI. A beginner's guide for FMDV quasispecies analysis: sub-consensus variant detection and haplotype reconstruction using next-generation sequencing. Brief Bioinform 2020; 21:1766-1775. [PMID: 31697321 PMCID: PMC7110011 DOI: 10.1093/bib/bbz086] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/18/2019] [Revised: 06/18/2019] [Accepted: 06/19/2019] [Indexed: 12/18/2022]
Abstract
Deep sequencing of viral genomes is a powerful tool for studying RNA virus complexity. However, analyzing next-generation sequencing data can be challenging for researchers who have never approached the study of viral quasispecies with this methodology. In this work, we present a suitable and affordable guide to exploring sub-consensus variability and reconstructing viral quasispecies from Illumina sequencing data. The guide includes a complete analysis pipeline along with user-friendly descriptions of software and file formats. In addition, we assessed the feasibility of the proposed workflow by analyzing a set of foot-and-mouth disease viruses (FMDV) with different degrees of variability. This guide introduces the analysis of quasispecies of FMDV and other viruses using this approach.
Affiliation(s)
- Marco Cacciabue
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
- Departamento de Ciencias Básicas, Universidad Nacional de Luján, Luján, Argentina
- Anabella Currá
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
- Departamento de Ciencias Básicas, Universidad Nacional de Luján, Luján, Argentina
- Elisa Carrillo
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
- Guido König
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
- María Inés Gismondi
- Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET), Hurlingham, Argentina
- Departamento de Ciencias Básicas, Universidad Nacional de Luján, Luján, Argentina
21
Yao Z, You FM, N'Diaye A, Knox RE, McCartney C, Hiebert CW, Pozniak C, Xu W. Evaluation of variant calling tools for large plant genome re-sequencing. BMC Bioinformatics 2020; 21:360. [PMID: 32807073 PMCID: PMC7430858 DOI: 10.1186/s12859-020-03704-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/07/2020] [Accepted: 07/28/2020] [Indexed: 12/30/2022]
Abstract
BACKGROUND Discovering single nucleotide polymorphisms (SNPs) from agricultural crop genome sequences has been a widely used strategy for developing genetic markers for several applications, including marker-assisted breeding, population diversity studies of eco-geographical adaptation, and genotyping of crop germplasm collections. Accurately detecting SNPs from large polyploid crop genomes such as wheat is crucial and challenging. A few variant calling methods have been developed, but they show low concordance between their variant calls. A gold-standard variant set generated from a single human individual has been established for evaluating variant calling tools; however, no gold-standard crop variant set is yet available for wheat. The intent of this study was to evaluate seven SNP variant calling tools (FreeBayes, GATK, Platypus, Samtools/mpileup, SNVer, VarScan, VarDict) with the two most popular mapping tools (BWA-mem and Bowtie2) on whole exome capture (WEC) re-sequencing data from allohexaploid wheat. RESULTS We found that the BWA-mem mapping tool had both a higher mapping rate and a higher accuracy rate than Bowtie2. With the same mapping quality (MQ) cutoff, BWA-mem detected more variant bases in mapped reads than Bowtie2. Preprocessing reads with quality trimming or duplicate removal did not significantly affect the final mapping performance in terms of mapped reads. Based on concordance and receiver operating characteristic (ROC) analysis, the Samtools/mpileup variant calling tool with BWA-mem mapping of raw sequence reads outperformed the other combinations tested, followed by FreeBayes and GATK, in terms of specificity and sensitivity. VarDict and VarScan were the poorest-performing variant calling tools on the wheat WEC sequence data.
CONCLUSION The BWA-mem and Samtools/mpileup pipeline, which requires no preprocessing of the raw read data before mapping onto the reference genome, was found to be optimal for SNP calling in complex wheat genome re-sequencing. These results also provide useful guidelines for reliable variant identification from deep sequencing of other large polyploid crop genomes.
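To illustrate the kind of comparison described above, here is a minimal Python sketch of pairwise caller concordance (shared calls over the union of calls, i.e. a Jaccard index) and of precision and sensitivity against a reference variant set. Variants are assumed to be hashable (chromosome, position, ref, alt) tuples, and the function names and toy calls are hypothetical. Specificity is omitted because it needs an explicit set of true non-variant sites, and the study's ROC analysis is not reproduced here.

```python
from typing import Iterable, Tuple

# A variant call keyed by (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]

def concordance(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> float:
    """Shared calls divided by the union of calls (Jaccard index).

    Two empty call sets are treated as fully concordant by convention.
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def precision_sensitivity(calls: Iterable[Variant],
                          truth: Iterable[Variant]) -> Tuple[float, float]:
    """Precision = TP / all calls; sensitivity (recall) = TP / all true variants."""
    called, true = set(calls), set(truth)
    tp = len(called & true)
    precision = tp / len(called) if called else 0.0
    sensitivity = tp / len(true) if true else 0.0
    return precision, sensitivity

# Hypothetical toy calls on wheat chromosomes, for illustration only.
truth = {("1A", 100, "A", "G"), ("1A", 250, "C", "T"), ("2B", 40, "G", "A")}
caller_x = {("1A", 100, "A", "G"), ("1A", 250, "C", "T"), ("3D", 7, "T", "C")}
caller_y = {("1A", 100, "A", "G"), ("2B", 40, "G", "A")}

print(concordance(caller_x, caller_y))         # 0.25: one shared call out of four distinct calls
print(precision_sensitivity(caller_x, truth))  # precision 2/3, sensitivity 2/3
print(precision_sensitivity(caller_y, truth))  # precision 1.0, sensitivity 2/3
```

Matching on exact (chromosome, position, ref, alt) keys is the simplest choice; real comparisons usually normalize variant representation first, since the same indel can be written in more than one way.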
Affiliation(s)
- Zhen Yao
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, 101 Route 100, Morden, Manitoba, R6M 1Y5, Canada
- Frank M You
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, Ontario, K1A 0C6, Canada
- Amidou N'Diaye
- Department of Plant Sciences, University of Saskatchewan, Saskatoon, Saskatchewan, S7N 5A8, Canada
- Ron E Knox
- Swift Current Research and Development Centre, Agriculture and Agri-Food Canada, Box 1030, Swift Current, Saskatchewan, S9H 3X2, Canada
- Curt McCartney
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, 101 Route 100, Morden, Manitoba, R6M 1Y5, Canada
- Colin W Hiebert
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, 101 Route 100, Morden, Manitoba, R6M 1Y5, Canada
- Curtis Pozniak
- Department of Plant Sciences, University of Saskatchewan, Saskatoon, Saskatchewan, S7N 5A8, Canada
- Wayne Xu
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, 101 Route 100, Morden, Manitoba, R6M 1Y5, Canada
22
He X, Chen S, Li R, Han X, He Z, Yuan D, Zhang S, Duan X, Niu B. Comprehensive fundamental somatic variant calling and quality management strategies for human cancer genomes. Brief Bioinform 2020; 22:5854402. [PMID: 32510555 DOI: 10.1093/bib/bbaa083] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/04/2020] [Revised: 04/19/2020] [Accepted: 04/21/2020] [Indexed: 12/21/2022]
Abstract
Next-generation sequencing (NGS) technology has revolutionised human cancer research, particularly via detection of genomic variants with its ultra-high-throughput sequencing and increasing affordability. However, the flood of rich cancer genomics data has created significant challenges in exploring these data and translating them into biological insights. One of the difficulties in cancer genome sequencing is software selection. Currently, multiple tools are widely used to process NGS data in four stages: raw sequence data pre-processing and quality control (QC); sequence alignment; variant calling; and annotation and visualisation. However, the differences between these NGS tools, including their installation, merits, drawbacks and application, have not been fully appreciated. Therefore, a systematic review of the functionality and performance of NGS tools is required to provide cancer researchers with guidance on software and strategy selection. Another challenge is the multidimensional QC of sequencing data, which is essential for a meaningful and successful study because QC not only reports varied sequence data characteristics but also reveals deviations in diverse features. However, monitoring of QC metrics at specific steps, including alignment and variant calling, is neglected in certain pipelines such as the 'Best Practices Workflows' in GATK. In this review, we investigated the most widely used software for the fundamental analysis and QC of cancer genome sequencing data and provided instructions for selecting the most appropriate software and pipelines to ensure precise and efficient conclusions. We further discussed the prospects and new research directions for cancer genomics.
23
Chowdhury HA, Bhattacharyya DK, Kalita JK. Differential Expression Analysis of RNA-seq Reads: Overview, Taxonomy, and Tools. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:566-586. [PMID: 30281477 DOI: 10.1109/tcbb.2018.2873010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]
Abstract
Analysis of RNA-sequence (RNA-seq) data is widely used in transcriptomic studies and it has many applications. We review RNA-seq data analysis from RNA-seq reads to the results of differential expression analysis. In addition, we perform a descriptive comparison of tools used in each step of RNA-seq data analysis along with a discussion of important characteristics of these tools. A taxonomy of tools is also provided. A discussion of issues in quality control and visualization of RNA-seq data is also included along with useful tools. Finally, we provide some guidelines for the RNA-seq data analyst, along with research issues and challenges which should be addressed.
24
Kiselev D, Matsvay A, Abramov I, Dedkov V, Shipulin G, Khafizov K. Current Trends in Diagnostics of Viral Infections of Unknown Etiology. Viruses 2020; 12:E211. [PMID: 32074965 PMCID: PMC7077230 DOI: 10.3390/v12020211] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 12/16/2019] [Revised: 02/10/2020] [Accepted: 02/12/2020] [Indexed: 12/27/2022]
Abstract
Viruses are evolving at an alarming rate, spreading and inconspicuously adapting to cutting-edge therapies. The search for rapid, informative and reliable diagnostic methods is therefore as urgent as ever. Conventional clinical tests (PCR, serology, etc.) are being continually optimized, yet provide very limited data. Could high-throughput sequencing (HTS) become the future gold standard in molecular diagnostics of viral infections? Compared with conventional clinical tests, HTS is universal and more precise at profiling pathogens. Nevertheless, it has not yet been widely accepted as a diagnostic tool, owing primarily to its high cost and the complexity of sample preparation and data analysis. Those obstacles must be tackled to integrate HTS into daily clinical practice. To this end, three objectives must be achieved: (1) designing and assessing universal protocols for library preparation, (2) assembling purpose-specific pipelines, and (3) building computational infrastructure suited to the needs and financial abilities of modern healthcare centers. Data harvested with HTS could not only augment diagnostics and help to choose the correct therapy, but also facilitate research in epidemiology, genetics and virology. This information, in turn, could significantly aid clinicians in battling viral infections.
Affiliation(s)
- Daniel Kiselev
- FSBI “Center of Strategic Planning” of the Ministry of Health, 119435 Moscow, Russia
- I.M. Sechenov First Moscow State Medical University, 119146 Moscow, Russia
- Alina Matsvay
- FSBI “Center of Strategic Planning” of the Ministry of Health, 119435 Moscow, Russia
- Moscow Institute of Physics and Technology, National Research University, 117303 Moscow, Russia
- Ivan Abramov
- FSBI “Center of Strategic Planning” of the Ministry of Health, 119435 Moscow, Russia
- Vladimir Dedkov
- Pasteur Institute, Federal Service on Consumers’ Rights Protection and Human Well-Being Surveillance, 197101 Saint-Petersburg, Russia
- Martsinovsky Institute of Medical Parasitology, Tropical and Vector Borne Diseases, Sechenov First Moscow State Medical University, 119146 Moscow, Russia
- German Shipulin
- FSBI “Center of Strategic Planning” of the Ministry of Health, 119435 Moscow, Russia
- Kamil Khafizov
- FSBI “Center of Strategic Planning” of the Ministry of Health, 119435 Moscow, Russia
- Moscow Institute of Physics and Technology, National Research University, 117303 Moscow, Russia
25
Hernandez-Lopez AA, Alberti C, Mattavelli M. Toward a Dynamic Threshold for Quality Score Distortion in Reference-Based Alignment. J Comput Biol 2020; 27:288-300. [PMID: 31891532 DOI: 10.1089/cmb.2019.0333] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
Abstract
Quality scores, the intrinsically high-entropy metadata attached to sequences, are largely responsible for the substantial size of sequence data files. Yet there is no consensus on a viable reduction of the resolution of the quality score scale, arguably because of collateral side effects. In this article, we leverage the penalty functions of the HISAT2 aligner to rebin the quality score scale in a way that avoids any impact on sequence alignment, identifying along the way a distortion threshold for "safe" quality score representation. We tested our findings on whole-genome and RNA-seq data and contrasted the results with three methods for lossy compression of quality scores.
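To make the idea of alignment-neutral rebinning concrete, the sketch below assumes the quality-aware mismatch penalty documented for the `--mp MX,MN` option shared by Bowtie2 and HISAT2, namely MN + floor((MX - MN) * min(Q, 40) / 40) with defaults MX = 6 and MN = 2, and groups Phred scores that yield the same penalty. This is not the authors' method: it considers only the mismatch term and ignores any other quality-dependent penalties the aligner may apply. Function names are hypothetical.

```python
def mismatch_penalty(q: int, mx: int = 6, mn: int = 2) -> int:
    """Quality-aware mismatch penalty MN + floor((MX - MN) * min(Q, 40) / 40).

    Integer floor division is exact here because all operands are non-negative ints.
    """
    return mn + (mx - mn) * min(q, 40) // 40

def penalty_preserving_bins(max_q: int = 41, mx: int = 6, mn: int = 2) -> dict[int, int]:
    """Map each Phred score 0..max_q to the lowest score that shares its penalty.

    Scores yielding the same penalty are interchangeable for this penalty term,
    so collapsing them leaves mismatch scoring unchanged.
    """
    representative: dict[int, int] = {}
    mapping: dict[int, int] = {}
    for q in range(max_q + 1):
        p = mismatch_penalty(q, mx, mn)
        representative.setdefault(p, q)  # first (lowest) score seen for this penalty
        mapping[q] = representative[p]
    return mapping

bins = penalty_preserving_bins()
print(sorted(set(bins.values())))  # [0, 10, 20, 30, 40]
```

Under these defaults, 42 distinct scores collapse to five bins without changing any mismatch penalty, which is the kind of "safe" distortion the abstract describes.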
Affiliation(s)
- Marco Mattavelli
- École Polytechnique Fédérale de Lausanne, EPFL, Lausanne, Switzerland
26
Pereira R, Oliveira J, Sousa M. Bioinformatics and Computational Tools for Next-Generation Sequencing Analysis in Clinical Genetics. J Clin Med 2020; 9:E132. [PMID: 31947757 PMCID: PMC7019349 DOI: 10.3390/jcm9010132] [Citation(s) in RCA: 90] [Impact Index Per Article: 22.5] [Received: 11/18/2019] [Revised: 12/15/2019] [Accepted: 12/30/2019] [Indexed: 12/13/2022]
Abstract
Clinical genetics plays an important role in the healthcare system by providing a definitive diagnosis for many rare syndromes. It can also influence genetic prevention and disease prognosis, and assist in selecting the best care/treatment options for patients. Next-generation sequencing (NGS) has transformed clinical genetics, making it possible to analyze hundreds of genes at unprecedented speed and at a lower price than conventional Sanger sequencing. Despite the growing literature on NGS in a clinical setting, a gap remains among (bio)informaticians, molecular geneticists and clinicians, and this review aims to fill it by presenting a general overview of NGS technology and workflow. First, we review the current NGS platforms, focusing on the two main platforms, Illumina and Ion Torrent, and discussing the major strengths and weaknesses intrinsic to each. Next, the NGS analytical bioinformatic pipelines are dissected, giving some emphasis to the algorithms commonly used to generate process data and to analyze sequence variants. Finally, the main challenges around NGS bioinformatics are placed in perspective for future developments. Even with the huge achievements made in NGS technology and bioinformatics, further improvements in bioinformatic algorithms are still required to deal with complex and genetically heterogeneous disorders.
Affiliation(s)
- Rute Pereira
- Laboratory of Cell Biology, Department of Microscopy, Institute of Biomedical Sciences Abel Salazar (ICBAS), University of Porto (UP), 4050-313 Porto, Portugal
- Biology and Genetics of Reproduction Unit, Multidisciplinary Unit for Biomedical Research (UMIB), ICBAS-UP, 4050-313 Porto, Portugal
- Jorge Oliveira
- Biology and Genetics of Reproduction Unit, Multidisciplinary Unit for Biomedical Research (UMIB), ICBAS-UP, 4050-313 Porto, Portugal
- UnIGENe and CGPP–Centre for Predictive and Preventive Genetics-Institute for Molecular and Cell Biology (IBMC), i3S-Institute for Research and Innovation in Health-UP, 4200-135 Porto, Portugal
- Mário Sousa
- Laboratory of Cell Biology, Department of Microscopy, Institute of Biomedical Sciences Abel Salazar (ICBAS), University of Porto (UP), 4050-313 Porto, Portugal
- Biology and Genetics of Reproduction Unit, Multidisciplinary Unit for Biomedical Research (UMIB), ICBAS-UP, 4050-313 Porto, Portugal
27
Teissandier A, Servant N, Barillot E, Bourc'his D. Tools and best practices for retrotransposon analysis using high-throughput sequencing data. Mob DNA 2019; 10:52. [PMID: 31890048 PMCID: PMC6935493 DOI: 10.1186/s13100-019-0192-1] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 07/22/2019] [Accepted: 12/04/2019] [Indexed: 12/26/2022]
Abstract
Background Sequencing technologies give access to a precise picture of the molecular mechanisms acting upon genome regulation. One of the biggest technical challenges with sequencing data is mapping millions of reads to a reference genome. This problem is exacerbated for repetitive sequences such as transposable elements, which make up half of the mammalian genome. Reads originating from these regions introduce ambiguities in the mapping step. Therefore, dedicated parameters and algorithms must be considered when transposable element regulation is investigated with sequencing datasets. Results Here, we used simulated reads from the mouse and human genomes to define the best parameters for aligning transposable element-derived reads to a reference genome. We compared the efficiency of the most commonly used aligners and further evaluated how transposable element representation should be estimated using available methods. The mappability of the different transposon families in the mouse and human genomes was calculated, giving insight into their evolution. Conclusions Based on simulated data, we provided recommendations on the alignment and quantification steps to be performed when transposon expression or regulation is studied, and identified the limits in detecting specific young transposon families of the mouse and human genomes. These principles may help the community to adopt standard procedures and raise awareness of the difficulties encountered in the study of transposable elements.
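A simple way to quantify mappability, assumed here rather than taken from the paper, is the fraction of positions whose k-mer occurs exactly once in the genome: reads starting at such positions can be placed unambiguously. The toy sketch below counts forward-strand k-mers only and ignores mismatches and the reverse strand, both of which real mappability tools take into account. Function names and the toy genome are hypothetical.

```python
from collections import Counter

def kmer_counts(genome: str, k: int) -> Counter:
    """Count every forward-strand k-mer in the genome."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

def mappability(genome: str, start: int, end: int, k: int) -> float:
    """Fraction of k-mer start positions in [start, end) whose k-mer is unique genome-wide."""
    counts = kmer_counts(genome, k)
    positions = range(start, min(end, len(genome) - k + 1))
    if not positions:
        return 0.0
    unique = sum(1 for i in positions if counts[genome[i:i + k]] == 1)
    return unique / len(positions)

# Toy genome in which a transposon-like element (ACGTACGG) occurs twice.
genome = "TTTT" + "ACGTACGG" + "CATG" + "ACGTACGG" + "GGCC"
print(mappability(genome, 4, 9, 4))  # k-mers lying wholly inside the first copy: 0.0
print(mappability(genome, 0, 3, 4))  # unique flanking sequence: 1.0
```

The contrast between the two regions mirrors why young, nearly identical transposon copies are hard to detect: none of their internal k-mers can be assigned to a single locus.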
Affiliation(s)
- Aurélie Teissandier
- Institut Curie, PSL Research University, 75005 Paris, France
- INSERM U900, 75005 Paris, France
- MINES ParisTech, PSL Research University, 75005 Paris, France
- INSERM U934, CNRS UMR 3215, 75005 Paris, France
- Nicolas Servant
- Institut Curie, PSL Research University, 75005 Paris, France
- INSERM U900, 75005 Paris, France
- MINES ParisTech, PSL Research University, 75005 Paris, France
- Emmanuel Barillot
- Institut Curie, PSL Research University, 75005 Paris, France
- INSERM U900, 75005 Paris, France
- MINES ParisTech, PSL Research University, 75005 Paris, France
- Deborah Bourc'his
- Institut Curie, PSL Research University, 75005 Paris, France
- INSERM U934, CNRS UMR 3215, 75005 Paris, France
28
Quinn TP, Erb I, Gloor G, Notredame C, Richardson MF, Crowley TM. A field guide for the compositional analysis of any-omics data. Gigascience 2019; 8:giz107. [PMID: 31544212 PMCID: PMC6755255 DOI: 10.1093/gigascience/giz107] [Citation(s) in RCA: 132] [Impact Index Per Article: 26.4] [Received: 02/14/2019] [Revised: 07/10/2019] [Accepted: 08/12/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND Next-generation sequencing (NGS) has made it possible to determine the sequence and relative abundance of all nucleotides in a biological or environmental sample. A cornerstone of NGS is the quantification of RNA or DNA presence as counts. However, these counts are not counts per se: their magnitude is determined arbitrarily by the sequencing depth, not by the input material. Consequently, counts must undergo normalization prior to use. Conventional normalization methods require a set of assumptions: they assume that the majority of features are unchanged and that all environments under study have the same carrying capacity for nucleotide synthesis. These assumptions are often untestable and may not hold when heterogeneous samples are compared. RESULTS Methods developed within the field of compositional data analysis offer a general solution that is assumption-free and valid for all data. Herein, we synthesize the extant literature to provide a concise guide on how to apply compositional data analysis to NGS count data. CONCLUSIONS In highlighting the limitations of total library size, effective library size, and spike-in normalizations, we propose the log-ratio transformation as a general solution to answer the question, "Relative to some important activity of the cell, what is changing?"
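A widely used log-ratio transform is the centered log-ratio (CLR), which expresses each feature relative to the geometric mean of all features in the same sample. The minimal sketch below (hypothetical function name) shows why it removes the arbitrary sequencing-depth scale. How zeros are handled is an assumption: the abstract does not address it, and the pseudocount used here is only one of several options.

```python
import math

def clr(counts: list[float], pseudocount: float = 0.5) -> list[float]:
    """Centered log-ratio: log(x_i) minus the mean of log(x), i.e. log(x_i / geometric mean).

    A pseudocount is added because log-ratios are undefined for zero counts;
    0.5 is an illustrative default, not a recommendation from the paper.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    centre = sum(logs) / len(logs)  # log of the geometric mean
    return [v - centre for v in logs]

# The same composition sequenced at two depths yields the same CLR values,
# because the depth factor cancels in every ratio to the geometric mean.
shallow = [10, 20, 70]
deep = [100, 200, 700]
print([round(v, 3) for v in clr(shallow, pseudocount=0)])
print([round(v, 3) for v in clr(deep, pseudocount=0)])
```

With a non-zero pseudocount the depth invariance is only approximate, which is one reason zero handling deserves care in compositional analyses.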
Affiliation(s)
- Thomas P Quinn
- Bioinformatics Core Research Group, Deakin University, 1 Gheringhap Street, Geelong Victoria 3220, Australia
- Centre for Molecular and Medical Research, Deakin University, 1 Gheringhap Street, Geelong Victoria 3220, Australia
- Ionas Erb
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
- Greg Gloor
- Department of Biochemistry, University of Western Ontario, 1151 Richmond Street, London ON N6A 3K7, Canada
- Cedric Notredame
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
- Mark F Richardson
- Bioinformatics Core Research Group, Deakin University, 1 Gheringhap Street, Geelong Victoria 3220, Australia
- Genomics Centre, School of Life and Environmental Sciences, Deakin University, 1 Gheringhap Street, Geelong Victoria 3220, Australia
- Centre for Integrative Ecology, School of Life and Environmental Sciences, Deakin University, 1 Gheringhap Street, Geelong Victoria 3220, Australia
- Tamsyn M Crowley
- Poultry Hub Australia, University of New England, Elm Avenue, Armidale New South Wales 2351, Australia
29
Suner A. Clustering methods for single-cell RNA-sequencing expression data: performance evaluation with varying sample sizes and cell compositions. Stat Appl Genet Mol Biol 2019; 18:/j/sagmb.2019.18.issue-5/sagmb-2019-0004/sagmb-2019-0004.xml. [PMID: 31646845 DOI: 10.1515/sagmb-2019-0004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
A number of specialized clustering methods have been developed for the accurate analysis of single-cell RNA-sequencing (scRNA-seq) expression data, and several reports have documented the performance of these methods under different conditions. However, to date, no study has systematically evaluated the performance of clustering methods while taking into account the sample size and cell composition of a given scRNA-seq dataset. Herein, a comprehensive performance evaluation of 11 selected scRNA-seq clustering methods was performed using synthetic datasets with known sample sizes and numbers of subpopulations, as well as varying levels of transcriptome complexity. The results indicate that the overall performance of the clustering methods under study is highly dependent on the sample size and complexity of the scRNA-seq dataset. In most cases, clustering performance improved as the number of cells in a given expression dataset increased. The findings of this study also highlight the importance of sample size for the successful detection of rare cell subpopulations with an appropriate clustering tool.
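The abstract does not name its performance measures. The adjusted Rand index (ARI), which compares a clustering with known subpopulation labels while correcting for chance agreement, is one commonly used choice and is assumed here purely for illustration; the sketch computes it from the contingency table of the two labelings.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(truth: list, predicted: list) -> float:
    """Adjusted Rand index between known labels and cluster assignments."""
    if len(truth) != len(predicted) or len(truth) < 2:
        raise ValueError("need two labelings of equal length >= 2")
    n = len(truth)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected agreement under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

truth = [0, 0, 0, 1, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 1, 0, 0, 0]))  # 1.0: same partition, labels swapped
print(adjusted_rand_index(truth, [0, 0, 1, 1, 2, 2]))  # partial agreement, below 1
```

Because ARI ignores label names, a method that recovers the true subpopulations under different cluster IDs still scores 1.0.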
Affiliation(s)
- Aslı Suner
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Ege University, Bornova, İzmir, Turkey
30
Senol Cali D, Kim JS, Ghose S, Alkan C, Mutlu O. Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions. Brief Bioinform 2019; 20:1542-1559. [PMID: 29617724 PMCID: PMC6781587 DOI: 10.1093/bib/bby017] [Citation(s) in RCA: 108] [Impact Index Per Article: 21.6] [Received: 11/20/2017] [Revised: 02/06/2018] [Indexed: 02/06/2023]
Abstract
Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, the high error rates of the technology make it challenging to generate accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they must overcome these high error rates. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. Understanding where current tools perform poorly is important for developing better ones. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) the choice of basecalling tool plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) The read-to-read overlap finding tools GraphMap and Minimap perform similarly in terms of accuracy; however, Minimap uses less memory and is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for a quick initial assembly, and further polishing can be applied on top of it to increase accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool Racon generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability.
We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of the bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of nanopore sequencing technology.
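Read-to-read overlap finders such as Minimap index reads by minimizers: within each window of w consecutive k-mers only the smallest k-mer is kept, and reads that share minimizers become overlap candidates. The sketch below simplifies under stated assumptions: it orders k-mers lexicographically on the forward strand (Minimap uses a hash over canonical k-mers), and it stops at finding shared minimizers, without the chaining step a real overlapper performs. Function names and reads are hypothetical.

```python
def minimizers(seq: str, k: int, w: int) -> set[tuple[str, int]]:
    """(w, k)-minimizers: the smallest k-mer in each window of w consecutive k-mers.

    Returns (k-mer, position) pairs; ties keep the leftmost k-mer in the window.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    result = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        best = min(range(w), key=lambda j: window[j])  # min() returns the first minimum
        result.add((window[best], start + best))
    return result

def shared_minimizer_kmers(a: str, b: str, k: int, w: int) -> set[str]:
    """k-mer strings that are minimizers in both reads: candidate evidence for an overlap."""
    return {m for m, _ in minimizers(a, k, w)} & {m for m, _ in minimizers(b, k, w)}

read_a = "GATTACAGATTACCAGT"
read_b = read_a[8:] + "TTGCA"  # shares the last 9 bases of read_a
print(shared_minimizer_kmers(read_a, read_b, k=3, w=3))
```

Keeping one k-mer per window shrinks the index considerably while guaranteeing that any two reads sharing a window-length exact match also share at least one minimizer, which is part of why minimizer-based overlapping is fast and memory-light.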
Affiliation(s)
- Damla Senol Cali
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Jeremie S Kim
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Computer Science, Systems Group, ETH Zürich, Zürich, Switzerland
- Saugata Ghose
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Can Alkan
- Department of Computer Engineering, Bilkent University, Bilkent, Ankara, Turkey
- Onur Mutlu
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Computer Science, Systems Group, ETH Zürich, Zürich, Switzerland
31
Mangul S, Mosqueiro T, Abdill RJ, Duong D, Mitchell K, Sarwal V, Hill B, Brito J, Littman RJ, Statz B, Lam AKM, Dayama G, Grieneisen L, Martin LS, Flint J, Eskin E, Blekhman R. Challenges and recommendations to improve the installability and archival stability of omics computational tools. PLoS Biol 2019; 17:e3000333. [PMID: 31220077 PMCID: PMC6605654 DOI: 10.1371/journal.pbio.3000333] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Revised: 07/02/2019] [Indexed: 01/07/2023]
Abstract
Developing new software tools for analysis of large-scale biological data is a key component of advancing modern biomedical research. Scientific reproduction of published findings requires running computational tools on data generated by such studies, yet little attention is presently allocated to the installability and archival stability of computational software tools. Scientific journals require data and code sharing, but none currently require authors to guarantee the continuing functionality of newly published tools. We have estimated the archival stability of computational biology software tools by performing an empirical analysis of the internet presence for 36,702 omics software resources published from 2005 to 2017. We found that almost 28% of all resources are currently not accessible through uniform resource locators (URLs) published in the paper they first appeared in. Among the 98 software tools selected for our installability test, 51% were deemed "easy to install," and 28% of the tools failed to be installed at all because of problems in the implementation. Moreover, for papers introducing new software, we found that the number of citations significantly increased when authors provided an easy installation process. We propose for incorporation into journal policy several practical solutions for increasing the widespread installability and archival stability of published bioinformatics software.
Affiliation(s)
- Serghei Mangul
- Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, California, United States of America
- Thiago Mosqueiro
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, California, United States of America
- Richard J. Abdill
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Dat Duong
- Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Keith Mitchell
- Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Varuni Sarwal
- Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
- Brian Hill
- Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Jaqueline Brito
- Institute of Mathematics and Computer Science, University of São Paulo, São Paulo, Brazil
- Russell Jared Littman
- Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Benjamin Statz
- Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Angela Ka-Mei Lam
- Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Gargi Dayama
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Laura Grieneisen
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Lana S. Martin
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, California, United States of America
- Jonathan Flint
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, United States of America
- Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America
- Ran Blekhman
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Department of Ecology, Evolution, and Behavior, University of Minnesota, Minnesota, United States of America
32
Deorowicz S, Debudaj-Grabysz A, Gudyś A, Grabowski S. Whisper: read sorting allows robust mapping of DNA sequencing data. Bioinformatics 2019; 35:2043-2050. [PMID: 30407485 DOI: 10.1093/bioinformatics/bty927] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/28/2017] [Revised: 10/16/2018] [Accepted: 11/06/2018] [Indexed: 11/13/2022]
Abstract
MOTIVATION Mapping reads to a reference genome is often the first step in a sequencing data analysis pipeline. Falling sequencing costs create a need for algorithms able to process ever-increasing amounts of generated data in reasonable time. RESULTS We present Whisper, an accurate, high-performance mapping tool based on the idea of sorting reads and then mapping them against suffix arrays for the reference genome and its reverse complement. Employing task and data parallelism, together with storing temporary data on disk, results in superior time efficiency at reasonable memory requirements. Whisper excels at large NGS read collections, in particular Illumina reads with typical WGS coverage. Experiments with real data indicate that our solution runs in about 15% of the time needed by the well-known BWA-MEM and Bowtie2 tools at comparable accuracy, as validated in a variant calling pipeline. AVAILABILITY AND IMPLEMENTATION Whisper is available for free from https://github.com/refresh-bio/Whisper or http://sun.aei.polsl.pl/REFRESH/Whisper/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
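The core idea can be illustrated with a toy exact-match mapper: build a suffix array over the reference and its reverse complement, sort the reads, and binary-search each one. One benefit of sorting, shown here, is that lexicographically ordered reads have non-decreasing lower bounds in the suffix array, so each search can resume where the previous one ended. This is a hypothetical sketch, not Whisper's implementation, which handles mismatches, parallelism and on-disk temporary data; the naive suffix-array construction is only suitable for tiny inputs.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def suffix_array(text: str) -> list[int]:
    """Naive suffix array (sorts explicit suffixes); acceptable only for toy inputs."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def bound(text: str, sa: list[int], pattern: str, lo: int, strict: bool) -> int:
    """Binary search over the suffix array, starting at index lo.

    strict=False: first suffix >= pattern (lower bound).
    strict=True:  first suffix whose length-m prefix is > pattern (upper bound).
    """
    hi, m = len(sa), len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (strict and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo

def map_sorted_reads(reference: str, reads: list[str]) -> dict[str, list[tuple[str, int]]]:
    """Exact-match reads on both strands; returns read -> [(strand, 0-based forward start)]."""
    n = len(reference)
    text = reference + "$" + reverse_complement(reference)  # '$' blocks cross-strand hits
    sa = suffix_array(text)
    hits: dict[str, list[tuple[str, int]]] = {}
    lower = 0
    for read in sorted(reads):
        # Sorted reads have non-decreasing lower bounds, so each search resumes at the last one.
        lower = bound(text, sa, read, lower, strict=False)
        upper = bound(text, sa, read, lower, strict=True)
        matches = []
        for p in sa[lower:upper]:
            if p < n:
                matches.append(("+", p))
            else:
                j = p - n - 1  # offset within the reverse-complement copy
                matches.append(("-", n - j - len(read)))
        hits[read] = sorted(matches)
    return hits

print(map_sorted_reads("GATTACA", ["TTAC", "GTAA", "CCCC"]))
```

Here "GTAA" is reported on the minus strand at forward position 2, because it is the reverse complement of the forward match "TTAC".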
Affiliation(s)
- Sebastian Deorowicz
- Institute of Informatics, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, Gliwice, PL, Poland
- Agnieszka Debudaj-Grabysz
- Institute of Informatics, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, Gliwice, PL, Poland
- Adam Gudyś
- Institute of Informatics, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, Gliwice, PL, Poland
- Szymon Grabowski
- Institute of Applied Computer Science, Faculty of Electrical, Electronic, Computer and Control Engineering, Lodz University of Technology, Stefanowskiego 18/22, Łódź, PL, Poland
33
Singer J, Irmisch A, Ruscheweyh HJ, Singer F, Toussaint NC, Levesque MP, Stekhoven DJ, Beerenwinkel N. Bioinformatics for precision oncology. Brief Bioinform 2019; 20:778-788. [PMID: 29272324 PMCID: PMC6585151 DOI: 10.1093/bib/bbx143] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 06/06/2017] [Revised: 09/29/2017] [Indexed: 12/13/2022]
Abstract
Molecular profiling of tumor biopsies plays an increasingly important role not only in cancer research, but also in the clinical management of cancer patients. Multi-omics approaches hold the promise of improving diagnostics, prognostics and personalized treatment. To deliver on this promise of precision oncology, appropriate bioinformatics methods for managing, integrating and analyzing large and complex data are necessary. Here, we discuss the specific requirements of bioinformatics methods and software that arise in the setting of clinical oncology, owing to a stricter regulatory environment and the need for rapid, highly reproducible and robust procedures. We describe the workflow of a molecular tumor board and the specific bioinformatics support that it requires, from the primary analysis of raw molecular profiling data to the automatic generation of a clinical report and its delivery to decision-making clinical oncologists. Such workflows have to various degrees been implemented in many clinical trials, as well as in molecular tumor boards at specialized cancer centers and university hospitals worldwide. We review these and more recent efforts to include other high-dimensional multi-omics patient profiles into the tumor board, as well as the state of clinical decision support software to translate molecular findings into treatment recommendations.
Affiliation(s)
- Jochen Singer
- Department of Biosystems Science and Engineering of ETH Zurich in Basel, Switzerland
- Anja Irmisch
- Department of Dermatology at the University of Zurich Hospital in Zurich, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering of ETH Zurich in Basel, Switzerland
34
Gao Y, Zhang J, Zhao F. Circular RNA identification based on multiple seed matching. Brief Bioinform 2019; 19:803-810. [PMID: 28334140 DOI: 10.1093/bib/bbx014] [Citation(s) in RCA: 397] [Impact Index Per Article: 79.4] [Received: 10/25/2016] [Indexed: 11/13/2022]
Abstract
Computational detection methods have been widely used in studies on the biogenesis and function of circular RNAs (circRNAs). However, every existing tool showed weaknesses in certain aspects of circRNA detection. Here, we propose an improved multithreaded detection tool, CIRI2, which uses an adapted maximum likelihood estimation based on multiple seed matching to identify back-spliced junction reads and to filter out false positives derived from repetitive sequences and mapping errors. We established objective assessment criteria based on real data from RNase R-treated samples and systematically compared 10 circRNA detection tools; the comparison demonstrated that CIRI2 outperformed its previous version, CIRI, and all other widely used tools, featuring a remarkably balanced combination of sensitivity, reliability, run time and RAM usage.
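CIRI2's likelihood model is beyond a short example, but the signal it searches for, a back-spliced junction read, can be sketched. The check below is a simplified, hypothetical illustration: it assumes a read has already been split into two locally aligned segments (plus strand only) and flags the pair when the read's 3' segment maps upstream of its 5' segment, the chiastic order that a linear splice cannot produce. It does not implement CIRI2's multiple seed matching or false-positive filtering, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SegmentAlignment:
    """Hypothetical local alignment of one part of a read (0-based, half-open)."""
    read_start: int
    read_end: int
    chrom: str
    strand: str
    ref_start: int
    ref_end: int

def back_splice_span(head: SegmentAlignment, tail: SegmentAlignment,
                     max_gap: int = 2) -> Optional[tuple[str, int, int]]:
    """Return (chrom, circ_start, circ_end) if two '+'-strand segments are in chiastic order.

    head covers the read's 5' part and tail its 3' part. In a linear splice the tail
    maps downstream of the head; across a back-splice junction it maps upstream.
    """
    if head.chrom != tail.chrom or head.strand != "+" or tail.strand != "+":
        return None
    if abs(tail.read_start - head.read_end) > max_gap:  # segments must be adjacent in the read
        return None
    if tail.ref_start < head.ref_start and tail.ref_end <= head.ref_end:
        # The circle runs from where the tail starts to where the head ends.
        return head.chrom, tail.ref_start, head.ref_end
    return None

head = SegmentAlignment(0, 50, "chr1", "+", 1950, 2000)
tail = SegmentAlignment(50, 100, "chr1", "+", 1000, 1050)
print(back_splice_span(head, tail))  # ('chr1', 1000, 2000)
```

Such chiastic alignments can also arise from tandem duplications or mapping errors in repeats, which is why a dedicated filtering step like the one described in the abstract is needed.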
Affiliation(s)
- Yuan Gao
- University of Chinese Academy of Sciences
- Fangqing Zhao
- Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
35
Vukovic K, Gadaleta D, Benfenati E. Methodology of aiQSAR: a group-specific approach to QSAR modelling. J Cheminform 2019; 11:27. [PMID: 30945010 PMCID: PMC6446381 DOI: 10.1186/s13321-019-0350-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/08/2019] [Accepted: 03/25/2019] [Indexed: 12/26/2022]
Abstract
Background Several QSAR methodology developments have shown promise in recent years. These include the consensus approach to generating a model's final prediction, the use of new, advanced machine learning algorithms, and the streamlining, standardization and automation of various QSAR steps. One approach that seems under-explored is the generation, at runtime, of local models specific to individual compounds. This approach was quite likely limited by its computational requirements, but with current increases in processing power and the widespread availability of cluster-computing infrastructure, this limitation is no longer as severe. Results We propose a new QSAR methodology, aiQSAR, whose aim is to generate endpoint predictions directly from the input dataset by building an array of local models, generated at runtime and specific to each compound in the dataset. The local group of each compound is selected on the basis of fingerprint similarities, and the final prediction is calculated by integrating the results of a number of autonomous mathematical models. The method is applicable to regression, binary classification and multi-class classification and was tested on one dataset for each endpoint type: bioconcentration factor (BCF) for regression, the Ames test for binary classification and the Environmental Protection Agency (EPA) acute rat oral toxicity ranking for multi-class classification. As part of this method, the applicability domain of each prediction is assessed through the applicability domain measure, calculated on the basis of the fingerprint similarities in each local group of compounds. Conclusions We outline the methodology for a new QSAR-based predictive tool whose advantages are automation, a group-specific approach to modelling and simplicity of execution. Our aim now is to develop this method into a stand-alone software tool. We hope that eventual adoption of our tool will make QSAR modelling more accessible and transparent.
Our methodology could be used as an initial modelling step, making predictions for new compounds simply by loading the training dataset as input. Predictions could then be further evaluated and refined, either with other tools or by optimizing aiQSAR parameters. Electronic supplementary material: The online version of this article (10.1186/s13321-019-0350-y) contains supplementary material, which is available to authorized users.
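The local-model idea described above can be illustrated with a minimal Python sketch. It assumes binary fingerprints represented as sets of on-bit indices, uses Tanimoto similarity to pick each query compound's k nearest training compounds, and averages the outputs of two toy local models as a consensus. The functions `tanimoto`, `local_group`, `local_mean`, `similarity_weighted` and `predict`, and the use of mean group similarity as an applicability-domain score, are hypothetical illustrations, not the aiQSAR implementation or its model array.

```python
from statistics import mean


def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def local_group(query, training, k):
    """Return the k (fingerprint, value) training pairs most similar to the query."""
    return sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]


def local_mean(group, query):
    """Toy local model 1: plain mean of the group's endpoint values."""
    return mean(value for _, value in group)


def similarity_weighted(group, query):
    """Toy local model 2: endpoint values weighted by similarity to the query."""
    weights = [tanimoto(query, fp) for fp, _ in group]
    total = sum(weights)
    if total == 0:
        return local_mean(group, query)
    return sum(w * value for w, (_, value) in zip(weights, group)) / total


def predict(query, training, k, models):
    """Consensus over models fitted to a per-query local group, plus a hypothetical AD score."""
    group = local_group(query, training, k)  # built at runtime, specific to this query
    consensus = mean(model(group, query) for model in models)
    applicability = mean(tanimoto(query, fp) for fp, _ in group)  # assumed AD measure
    return consensus, applicability


training = [({1, 2, 3}, 1.0), ({1, 2}, 2.0), ({7, 8}, 10.0)]
pred, ad = predict({1, 2, 3}, training, k=2, models=[local_mean, similarity_weighted])
print(round(pred, 3), round(ad, 3))
```

With k=2 the dissimilar compound (value 10.0) is excluded from the local group, so the consensus stays close to the values of the two neighbours (about 1.45, with a mean similarity of about 0.83).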
Affiliation(s)
- Kristijan Vukovic
- Istituto di Ricerche Farmacologiche Mario Negri-IRCCS, Via Mario Negri 2, 20156, Milan, Italy; Jozef Stefan International Postgraduate School, Jamova cesta 39, 1000, Ljubljana, Slovenia.
- Domenico Gadaleta
- Istituto di Ricerche Farmacologiche Mario Negri-IRCCS, Via Mario Negri 2, 20156, Milan, Italy
- Emilio Benfenati
- Istituto di Ricerche Farmacologiche Mario Negri-IRCCS, Via Mario Negri 2, 20156, Milan, Italy
36
Hwang KB, Lee IH, Li H, Won DG, Hernandez-Ferrer C, Negron JA, Kong SW. Comparative analysis of whole-genome sequencing pipelines to minimize false negative findings. Sci Rep 2019; 9:3219. [PMID: 30824715 PMCID: PMC6397176 DOI: 10.1038/s41598-019-39108-2] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 07/10/2018] [Accepted: 01/16/2019] [Indexed: 12/30/2022]
Abstract
Comprehensive and accurate detection of variants from whole-genome sequencing (WGS) is a strong prerequisite for translational genomic medicine; however, low concordance between analytic pipelines is an outstanding challenge. We processed a European and an African WGS sample with 70 analytic pipelines comprising combinations of 7 short-read aligners and 10 variant calling algorithms (VCAs), and observed remarkable differences in the number of variants called by different pipelines (max/min ratio: 1.3-3.4). The similarity between variant call sets was determined more by the VCAs than by the short-read aligners. Notably, reported minor allele frequency had a substantial effect on concordance between pipelines (concordance rate ratio: 0.11-0.92; Wald tests, P < 0.001), resulting in more discordant results for rare and novel variants. We compared the performance of analytic pipelines and pipeline ensembles using gold-standard variant call sets and the catalog of variants from the 1000 Genomes Project. A single pipeline using BWA-MEM and GATK-HaplotypeCaller performed comparably to the pipeline ensembles in the 'callable' regions (~97%) of the human reference genome. While a single pipeline is capable of analyzing common variants in most genomic regions, our findings demonstrate the limitations and challenges of analyzing rare or novel variants, especially in non-European genomes.
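Pipeline concordance of the kind discussed above can be sketched by treating each call set as a set of (chromosome, position, ref, alt) keys. The abstract does not define the concordance rate, so the Jaccard-style ratio used here (shared calls divided by the union of calls), the minor-allele-frequency bins, the toy variants, and the helper names `concordance`, `concordance_by_maf` and `max_min_ratio` are assumptions for illustration only.

```python
def concordance(calls_a, calls_b):
    """Jaccard-style concordance: variants called by both / variants called by either."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def concordance_by_maf(calls_a, calls_b, maf, bins):
    """Concordance computed separately within each minor-allele-frequency bin.

    maf maps a variant key to its reported MAF; variants missing from maf are
    treated as novel (MAF 0.0). bins holds (label, low, high) with low <= MAF < high.
    """
    result = {}
    for label, low, high in bins:
        a = [v for v in calls_a if low <= maf.get(v, 0.0) < high]
        b = [v for v in calls_b if low <= maf.get(v, 0.0) < high]
        result[label] = concordance(a, b)
    return result


def max_min_ratio(call_sets):
    """Ratio between the largest and smallest number of calls across pipelines."""
    sizes = [len(set(calls)) for calls in call_sets]
    return max(sizes) / min(sizes)


# Hypothetical call sets from two pipelines.
v1 = ("chr1", 100, "A", "G")
v2 = ("chr1", 200, "C", "T")
v3 = ("chr2", 50, "G", "A")
v4 = ("chr3", 10, "T", "C")
v5 = ("chr3", 20, "A", "C")
pipeline_a = {v1, v2, v3}
pipeline_b = {v1, v3, v4, v5}
maf = {v1: 0.30, v2: 0.002, v3: 0.25, v5: 0.001}  # v4 is absent, i.e. novel
bins = [("rare/novel", 0.0, 0.05), ("common", 0.05, 1.01)]

print(concordance(pipeline_a, pipeline_b))  # 0.4
print(concordance_by_maf(pipeline_a, pipeline_b, maf, bins))  # {'rare/novel': 0.0, 'common': 1.0}
```

In this toy example the two pipelines agree completely on common variants and not at all on rare or novel ones, which is the pattern the stratified measure is meant to expose.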
Affiliation(s)
- Kyu-Baek Hwang
- School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Korea
- In-Hee Lee
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA
- Honglan Li
- School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Korea
- Dhong-Geon Won
- School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Korea
- Carles Hernandez-Ferrer
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA
- Jose Alberto Negron
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA
- Sek Won Kong
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA.
37
Circular RNA Profiling by Illumina Sequencing via Template-Dependent Multiple Displacement Amplification. Biomed Res Int 2019; 2019:2756516. [PMID: 30834258 PMCID: PMC6369502 DOI: 10.1155/2019/2756516] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/26/2018] [Revised: 12/10/2018] [Accepted: 12/31/2018] [Indexed: 12/12/2022]
Abstract
Circular RNAs (circRNAs) are newly discovered non-coding RNAs with potential roles in disease progression in living organisms. Since their discovery, numerous reports have highlighted the abundance and putative functional roles of circRNAs in every organism examined, such as O. sativa, Arabidopsis, human, and mouse. CircRNAs are generally expressed at lower levels than their linear mRNA counterparts, which is consistent with canonical splicing having a competitive edge over non-canonical splicing. However, existing methods may not be sensitive enough to discover circRNAs expressed at low levels. By combining template-dependent multiple displacement amplification (tdMDA), Illumina sequencing, and bioinformatics tools, we have developed an experimental protocol that was able to detect 1,875 novel and known circRNAs from O. sativa. The same method also revealed, for the first time, 9,242 putative circRNAs in fewer than 40 million reads from Nicotiana benthamiana, whose genome has not been fully annotated. Supported by PCR-based validation and Sanger sequencing of selected circRNAs, our method represents a valuable tool for profiling circRNAs in organisms with or without genome annotation.
38
Wang M, Kong L. pblat: a multithread blat algorithm speeding up aligning sequences to genomes. BMC Bioinformatics 2019; 20:28. [PMID: 30646844 PMCID: PMC6334396 DOI: 10.1186/s12859-019-2597-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 03/27/2018] [Accepted: 01/03/2019] [Indexed: 11/17/2022]
Abstract
Background: blat is a widely used sequence alignment tool. It is especially useful for aligning long sequences and for gapped mapping, which cannot be performed properly by other fast sequence mappers designed for short reads. However, blat is single-threaded, and when it is used to map whole-genome or whole-transcriptome sequences to reference genomes it can take days to finish, making it unsuitable for large-scale sequencing projects and iterative analysis. Here, we present pblat (parallel blat), a parallelized blat algorithm with multithreading and cluster computing support that rapidly fine-maps large-scale DNA/RNA sequences against genomes. Results: The pblat algorithm takes advantage of modern multicore processors and significantly reduces run time as more threads are used. pblat uses almost the same amount of memory as blat, and the results generated by pblat are identical to those generated by blat. The pblat tool is easy to install and runs on Linux and Mac OS systems. In addition, we provide a cluster version of pblat (pblat-cluster) that runs on computing clusters with MPI support. Conclusion: pblat is open source and freely available to non-commercial users. It is easy to install and easy to use. pblat and pblat-cluster should facilitate high-throughput mapping of large-scale genomic and transcript sequences to reference genomes with both high speed and high precision.
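The claim that multithreading speeds up blat without changing its output can be illustrated with a hypothetical Python sketch that assumes query-level parallelism: each query sequence is searched independently and results are collected in input order, so the parallel output matches the serial one. `toy_align` is an exact-match stand-in for a blat search, not blat's algorithm, and none of this is pblat's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_align(name, query, genome):
    """Hypothetical stand-in for a per-query blat search: every exact-match hit."""
    hits = []
    start = genome.find(query)
    while start != -1:
        hits.append((name, start, start + len(query)))
        start = genome.find(query, start + 1)
    return hits


def align_serial(queries, genome):
    """Process queries one after another, like single-threaded blat."""
    return [hit for name, seq in queries for hit in toy_align(name, seq, genome)]


def align_parallel(queries, genome, threads):
    """Search queries concurrently; Executor.map yields results in input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_query = list(pool.map(lambda q: toy_align(q[0], q[1], genome), queries))
    return [hit for hits in per_query for hit in hits]


genome = "ACGTACGTTTACGT"
queries = [("q1", "ACGT"), ("q2", "TTT"), ("q3", "GGG")]
print(align_parallel(queries, genome, threads=3) == align_serial(queries, genome))  # True
```

Because each query is independent and results are merged in input order, the thread count affects only run time, not output. Note that CPython's global interpreter lock limits speed-ups for pure-Python CPU-bound work (a process pool is the usual alternative there), whereas compiled tools can run native threads in parallel.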
Affiliation(s)
- Meng Wang
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, 100871, People's Republic of China
- Lei Kong
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, 100871, People's Republic of China.
39
Sun S, Murray SS. Bioinformatics Basics for High-Throughput Hybridization-Based Targeted DNA Sequencing from FFPE-Derived Tumor Specimens: From Reads to Variants. Methods Mol Biol 2019; 1908:37-48. [PMID: 30649719 DOI: 10.1007/978-1-4939-9004-7_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The use of next-generation sequencing and hybridization-based capture for target enrichment has enabled interrogation of the coding regions of clinically significant cancer genes in tumor specimens, using panels that range from targeted panels of a few to hundreds of genes to whole-exome panels encompassing the coding regions of all genes in the genome. Next-generation sequencing (NGS) technologies produce millions of relatively short sequence segments, or reads. Bioinformatics tools are required to map these reads back to a reference genome using read alignment tools, and then to call variants by identifying differences between the aligned reads and the reference genome, either at single bases (single nucleotide variants, or SNVs) or across multiple bases (insertions and deletions, or indels). In addition to single nucleotide changes and small insertions and deletions, high-level copy gains and losses can also be gleaned from NGS data to call gene amplifications and deletions. Throughout these processes, numerous quality control metrics can be assessed at each step to ensure that the resulting variant calls are accurate and of high quality. In this chapter we review common tools used to generate reads from Illumina-derived sequence data, align reads, and call variants from hybridization-based targeted NGS panel data generated from FFPE-derived tumor DNA specimens, as well as basic quality metrics to assess for each assayed specimen.
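The step of calling single-base differences between aligned reads and the reference can be illustrated with a deliberately naive pileup sketch. It assumes ungapped alignments given as (0-based start, sequence) pairs and calls an SNV wherever depth and alternate-allele fraction pass hypothetical thresholds (`min_depth`, `min_alt_fraction`). Production variant callers also model base and mapping qualities and indels; none of that is attempted here, and `call_snvs` is not any tool's API.

```python
from collections import Counter, defaultdict


def call_snvs(reference, reads, min_depth=4, min_alt_fraction=0.2):
    """Naive SNV caller over ungapped alignments.

    reads: iterable of (start, sequence) pairs with 0-based starts and no indels.
    Returns (position, ref_base, alt_base, depth, alt_fraction) tuples.
    """
    # Build a pileup: base counts observed at each reference position.
    pileup = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        ref_base = reference[pos]
        alts = [(base, n) for base, n in counts.items() if base != ref_base]
        if depth < min_depth or not alts:
            continue  # too little coverage, or every read agrees with the reference
        alt_base, alt_count = max(alts, key=lambda item: item[1])
        fraction = alt_count / depth
        if fraction >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, depth, fraction))
    return calls


reference = "ACGTACGTAC"
reads = [(0, "ACGTACGT"), (0, "ACGAACGT"), (2, "GAACGTAC"), (2, "GTACGTAC")]
print(call_snvs(reference, reads))  # [(3, 'T', 'A', 4, 0.5)]
```

Positions covered by fewer than `min_depth` reads are skipped entirely, which is a crude version of the coverage-based quality control the chapter describes.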
Affiliation(s)
- Shulei Sun
- Center for Advanced Laboratory Medicine, University of California San Diego Health, La Jolla, CA, USA
- Sarah S Murray
- Center for Advanced Laboratory Medicine, University of California San Diego Health, La Jolla, CA, USA.
- Department of Pathology, University of California San Diego, La Jolla, CA, USA.
40
Kruppa J, Jo WK, van der Vries E, Ludlow M, Osterhaus A, Baumgaertner W, Jung K. Virus detection in high-throughput sequencing data without a reference genome of the host. Infect Genet Evol 2018; 66:180-187. [DOI: 10.1016/j.meegid.2018.09.026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/02/2018] [Revised: 09/25/2018] [Accepted: 09/27/2018] [Indexed: 01/19/2023]
41
Jin Y, Zhang L, Ning B, Hong H, Xiao W, Tong W, Tao Y, Ni X, Shi T, Guo Y. Application of genome analysis strategies in the clinical testing for pediatric diseases. Pediatr Investig 2018; 2:72-81. [PMID: 30112248 PMCID: PMC6089540 DOI: 10.1002/ped4.12044] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/18/2022]
Abstract
Next-generation sequencing (NGS) is being used in clinical testing. Government authorities in both China and the United States are overseeing the clinical application of NGS instruments and reagents. In addition, the US Association for Molecular Pathology and the College of American Pathologists have jointly released guidelines to standardize the analysis and interpretation of NGS data in clinical testing. At present, the analysis strategies and pipelines for NGS data used in the clinical detection of pediatric diseases are similar to those used for adult diseases. However, for rare pediatric diseases without linkage to known genetic variants, it is currently difficult to detect the relevant pathogenic genes using NGS technology. It is also challenging to identify novel pathogenic genes for familial pediatric tumors. Characterizing the pathogenic genes associated with these diseases is therefore important for the diagnosis and treatment of rare diseases in children. This article introduces general pipelines for NGS data analysis in disease and describes data analysis strategies for identifying the pathogenic genes of rare pediatric diseases and familial pediatric tumors.
Affiliation(s)
- Yaqiong Jin
- Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, MOE Key Laboratory of Major Diseases in Children, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Li Zhang
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Baitang Ning
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
- Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
- Wenming Xiao
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
- Weida Tong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
- Yiran Tao
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Xin Ni
- Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, MOE Key Laboratory of Major Diseases in Children, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Tieliu Shi
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Yongli Guo
- Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, MOE Key Laboratory of Major Diseases in Children, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
42
Abstract
Single-cell RNA sequencing (scRNA-seq) is currently transforming our understanding of biology, as it is a powerful tool for resolving cellular heterogeneity and molecular networks. Over 50 protocols have been developed in recent years, and data processing and analysis tools are also evolving fast. Here, we review the basic principles underlying the different experimental protocols and how to benchmark them. We also review and compare the essential methods for processing scRNA-seq data, from mapping, filtering, normalization and batch correction to basic differential expression analysis. We hope that this helps readers choose appropriate experimental and computational methods for the research question at hand.
Affiliation(s)
- Christoph Ziegenhain
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
- Beate Vieth
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
- Swati Parekh
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
- Ines Hellmann
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
- Wolfgang Enard
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
43
Noninvasive Prenatal Testing: Comparison of Two Mappers and Influence in the Diagnostic Yield. Biomed Res Int 2018; 2018:9498140. [PMID: 29977923 PMCID: PMC6011118 DOI: 10.1155/2018/9498140] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/17/2018] [Revised: 04/16/2018] [Accepted: 05/07/2018] [Indexed: 11/18/2022]
Abstract
Objective: The aim of this study was to determine whether the use of different mappers for NIPT can considerably change the results. Methods: Peripheral blood was collected from 217 pregnant women: 58 with pathological pregnancies (34 with trisomy 21, 18 with trisomy 18, and 6 with trisomy 13) and 159 with euploid pregnancies. Massively parallel sequencing (MPS) was performed following a modified version of the manufacturer's semiconductor sequencing protocol. The reads obtained were mapped with two different software programs, TMAP and HPG-Aligner, and the results were compared. Results: Using TMAP, 57 pathological samples were correctly detected (sensitivity 98.28%, specificity 93.08%): 33 samples as trisomy 21 (sensitivity 97.06%, specificity 99.45%), 16 as trisomy 18 (sensitivity 88.89%, specificity 93.97%), and 6 as trisomy 13 (sensitivity 100%, specificity 100%). There were 11 false positives, 1 false negative, and 2 incorrectly identified samples. Using HPG-Aligner, all 58 pathological samples were correctly identified (sensitivity 100%, specificity 96.86%): 34 as trisomy 21 (sensitivity 100%, specificity 98.91%), 18 as trisomy 18 (sensitivity 100%, specificity 98.99%), and 6 as trisomy 13 (sensitivity 100%, specificity 99.53%). There were 5 false positives. Conclusion: Different mappers use slightly different algorithms, so using one mapper or another on the same batch file can produce different results.
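The overall TMAP and HPG-Aligner figures above can be reproduced from the reported counts with the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). This short check assumes (the abstract does not say so explicitly) that every false positive was a euploid sample, so TN = 159 - FP; the function names are illustrative only.

```python
PATHOLOGICAL = 58  # 34 trisomy 21 + 18 trisomy 18 + 6 trisomy 13
EUPLOID = 159


def sensitivity(tp, fn):
    """Fraction of truly pathological samples that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly euploid samples that were called euploid."""
    return tn / (tn + fp)


def overall_percentages(detected, false_positives):
    """Overall sensitivity and specificity in %, assuming all false positives are euploid."""
    sens = sensitivity(detected, PATHOLOGICAL - detected)
    spec = specificity(EUPLOID - false_positives, false_positives)
    return round(100 * sens, 2), round(100 * spec, 2)


print("TMAP", overall_percentages(57, 11))        # (98.28, 93.08)
print("HPG-Aligner", overall_percentages(58, 5))  # (100.0, 96.86)
```

Both pairs match the percentages reported in the abstract, which supports reading the false positives as euploid pregnancies flagged as aneuploid.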
44
Taron UH, Lell M, Barlow A, Paijmans JLA. Testing of Alignment Parameters for Ancient Samples: Evaluating and Optimizing Mapping Parameters for Ancient Samples Using the TAPAS Tool. Genes (Basel) 2018. [PMID: 29533977 PMCID: PMC5867878 DOI: 10.3390/genes9030157] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/16/2022]
Abstract
High-throughput sequence data retrieved from ancient or otherwise degraded samples have led to unprecedented insights into the evolutionary history of many species, but the analysis of such sequences also poses specific computational challenges. The most commonly used approach involves mapping sequence reads to a reference genome. However, this process becomes increasingly challenging as the genetic distance between target and reference increases, or when contaminant sequences with high sequence similarity to the target species are present. Evaluating and testing mapping efficiency and stringency are thus paramount for the reliable identification and analysis of ancient sequences. In this paper, we present TAPAS (Testing of Alignment Parameters for Ancient Samples), a computational tool that enables systematic testing of mapping tools for ancient data by simulating sequence data that reflect the properties of an ancient dataset and performing test runs with the mapping software and parameter settings of interest. We showcase TAPAS by using it to assess and improve the mapping strategy for a degraded sample from a banded linsang (Prionodon linsang), for which no closely related reference is currently available. This enables a 1.8-fold increase in the number of mapped reads without sacrificing mapping specificity. The increase in mapped reads effectively reduces the need for additional sequencing, making more economical use of time, resources, and sample material.
Affiliation(s)
- Ulrike H Taron
- Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany.
- Moritz Lell
- Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany.
45
Keel BN, Snelling WM. Comparison of Burrows-Wheeler Transform-Based Mapping Algorithms Used in High-Throughput Whole-Genome Sequencing: Application to Illumina Data for Livestock Genomes. Front Genet 2018. [PMID: 29535759 PMCID: PMC5834436 DOI: 10.3389/fgene.2018.00035] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 01/05/2023]
Abstract
Ongoing developments and cost decreases in next-generation sequencing (NGS) technologies have led to an increase in their application, which has greatly enhanced the fields of genetics and genomics. Mapping sequence reads onto a reference genome is a fundamental step in the analysis of NGS data. Efficient and accurate alignment of reads to the reference genome is very important because it determines the overall quality of downstream analyses. In this study, we evaluate the performance of three Burrows-Wheeler transform-based mappers, BWA, Bowtie2, and HISAT2, in the context of paired-end Illumina whole-genome sequencing of livestock, using simulated sequence data sets with varying read lengths, insert sizes, and levels of genomic coverage, as well as five real data sets. The mappers were evaluated on two criteria: computational resource and time requirements, and robustness of mapping. Our results show that BWA and Bowtie2 tend to be more robust than HISAT2, while HISAT2 was significantly faster and used less memory than both BWA and Bowtie2. We conclude that no single mapper is ideal in all scenarios; rather, the choice of alignment tool should be driven by the application and sequencing technology.
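All three mappers compared above are built on the Burrows-Wheeler transform (BWT). The toy Python sketch below builds a BWT by sorting rotations and counts exact pattern occurrences with FM-index backward search. It is not the implementation used by BWA, Bowtie2 or HISAT2: real indexes use suffix-array construction and precomputed occurrence checkpoints instead of the linear scans here, and they also handle mismatches and gaps.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (quadratic; fine for toy input)."""
    text += "$"  # sentinel that sorts before A, C, G, T and lowercase letters
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """FM-index backward search: number of exact occurrences of pattern in the text."""
    # c_table[ch] = number of characters in the text lexicographically smaller than ch
    c_table, total = {}, 0
    for ch in sorted(set(bwt_string)):
        c_table[ch] = total
        total += bwt_string.count(ch)

    def occ(ch, i):
        # occurrences of ch in bwt_string[:i]; real indexes precompute these
        return bwt_string[:i].count(ch)

    top, bottom = 0, len(bwt_string)  # current range of sorted rotations
    for ch in reversed(pattern):      # extend the match one character leftwards
        if ch not in c_table:
            return 0
        top = c_table[ch] + occ(ch, top)
        bottom = c_table[ch] + occ(ch, bottom)
        if top >= bottom:
            return 0
    return bottom - top


index = bwt("banana")
print(index)                            # annb$aa
print(count_occurrences(index, "ana"))  # 2
```

Backward search narrows a range of sorted rotations one character at a time, so the work per query depends on the pattern length rather than the genome length once the occurrence counts are precomputed; this is what makes BWT-based mappers fast and memory-efficient.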
Affiliation(s)
- Brittney N Keel
- USDA, Agricultural Research Service, U.S. Meat Animal Research Center, Clay Center, NE, United States
- Warren M Snelling
- USDA, Agricultural Research Service, U.S. Meat Animal Research Center, Clay Center, NE, United States
46
Fowler EK, Mohorianu I, Smith DT, Dalmay T, Chapman T. Small RNA populations revealed by blocking rRNA fragments in Drosophila melanogaster reproductive tissues. PLoS One 2018; 13:e0191966. [PMID: 29474379 PMCID: PMC5825024 DOI: 10.1371/journal.pone.0191966] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/11/2017] [Accepted: 01/15/2018] [Indexed: 12/31/2022]
Abstract
RNA interference (RNAi) is a complex and highly conserved regulatory mechanism mediated via small RNAs (sRNAs). Recent technical advances in high-throughput sequencing have enabled increasingly detailed analysis of sRNA abundances and profiles in specific body parts and tissues. This enables investigation of the localized roles of microRNAs (miRNAs) and small interfering RNAs (siRNAs). However, variation in the proportions of non-coding RNAs in the samples being compared can hinder these analyses. Specific tissues may vary significantly in the proportions of fragments of longer non-coding RNAs (such as ribosomal RNA or transfer RNA) present, potentially reflecting tissue-specific differences in biological functions. For example, in Drosophila, some tissues contain a highly abundant 30 nt rRNA fragment (the 2S rRNA) as well as abundant 5’ and 3’ terminal rRNA fragments. These can pose difficulties for the construction of sRNA libraries, as they can swamp the sequencing space and obscure sRNA abundances. Here we address this problem and present a modified “rRNA blocking” protocol for the construction of high-definition (HD) adapter sRNA libraries from D. melanogaster reproductive tissues. The results showed that the 2S rRNAs targeted by blocking oligos were reduced from >80% to <0.01% of total reads. In addition, the use of multiple rRNA blocking oligos to bind the most abundant rRNA fragments allowed us to reveal the underlying sRNA populations at increased resolution. Side-by-side comparisons of sequencing libraries from blocked and non-blocked samples revealed that rRNA blocking did not change the miRNA populations present, but instead increased their abundances. We suggest that this rRNA blocking procedure offers the potential to improve the in-depth analysis of differentially expressed sRNAs within and across different tissues.
Affiliation(s)
- Emily K. Fowler
- School of Biological Sciences, University of East Anglia, Norwich Research Park, United Kingdom
- Irina Mohorianu
- School of Biological Sciences, University of East Anglia, Norwich Research Park, United Kingdom
- School of Computing Sciences, University of East Anglia, Norwich Research Park, United Kingdom
- Damian T. Smith
- School of Biological Sciences, University of East Anglia, Norwich Research Park, United Kingdom
- Tamas Dalmay
- School of Biological Sciences, University of East Anglia, Norwich Research Park, United Kingdom
- Tracey Chapman
- School of Biological Sciences, University of East Anglia, Norwich Research Park, United Kingdom
47
Naumenko FM, Abnizova II, Beka N, Genaev MA, Orlov YL. Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome. BMC Genomics 2018; 19:92. [PMID: 29504893 PMCID: PMC5836841 DOI: 10.1186/s12864-018-4475-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
Background: Using artificial data to evaluate the performance of aligners and peak callers not only improves the accuracy and reliability of the evaluation, but also makes it possible to reduce computational time. One natural way to achieve such a time reduction is single-chromosome mapping. Results: We investigated whether single-chromosome mapping causes any artefacts in aligner performance. In this paper, we compared the accuracy of seven aligners on well-controlled simulated benchmark data sampled both from a single chromosome and from a whole genome. We found that commonly used statistical methods are insufficient to evaluate aligner performance, and applied a novel measure of read density distribution similarity, which revealed artefacts in aligner performance. We also calculated mismatch statistics and constructed mismatch frequency distributions along the read. Conclusions: Generating artificial data by mapping reads simulated from a single chromosome to a reference chromosome is justified as a way of reducing benchmarking time. The proposed quality assessment method can identify inherent shortcomings of aligners that are not detected by conventional statistical methods and that can affect the quality of alignment of real data.
Affiliation(s)
- Fedor M Naumenko
- Novosibirsk State University, Pirogova, 1, Novosibirsk, 630090, Russia.
- Irina I Abnizova
- Wellcome Trust Sanger Institute, Cambridge, UK; Babraham Institute, Cambridge, UK
- Nathan Beka
- University of Hertfordshire, Hertfordshire, UK
- Yuriy L Orlov
- Novosibirsk State University, Pirogova, 1, Novosibirsk, 630090, Russia; Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia; Institute of Marine Biology Researches of RAS, Sevastopol, Russia.
48
Carriço JA, Rossi M, Moran-Gilad J, Van Domselaar G, Ramirez M. A primer on microbial bioinformatics for nonbioinformaticians. Clin Microbiol Infect 2018; 24:342-349. [PMID: 29309933 DOI: 10.1016/j.cmi.2017.12.015] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 06/15/2017] [Revised: 11/13/2017] [Accepted: 12/22/2017] [Indexed: 01/19/2023]
Abstract
BACKGROUND: Presently, the bottleneck in the deployment of high-throughput sequencing technology is the ability to analyse the increasing amount of data produced in a fit-for-purpose manner. The field of microbial bioinformatics is thriving and quickly adapting to technological changes, which makes it difficult for nonbioinformaticians to follow the complexity and increasingly obscure jargon of this field. AIMS: This review is directed towards nonbioinformaticians who wish to gain an understanding of the overall microbial bioinformatic processes, from raw data obtained from sequencers to final outputs. SOURCES: The software and analytical strategies reviewed are based on the personal experience of the authors. CONTENT: The bioinformatic processes for transforming raw reads into actionable information in a clinical and epidemiologic context are explained. We review the advantages and limitations of the two major strategies currently applied: read mapping, which is the comparison with a predefined reference genome, and de novo assembly, which is the unguided assembly of the raw data. Finally, we discuss the main analytical methodologies and the most frequently used freely available software, and their application in the context of bacterial infectious disease management. IMPLICATIONS: High-throughput sequencing technologies are overhauling outbreak investigation and epidemiologic surveillance while creating new challenges due to the amount and complexity of the data generated. Stakeholders will need the continuously evolving field of microbial bioinformatics in order to fully harness the power of these new technologies.
Affiliation(s)
- J A Carriço
- Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal.
- M Rossi
- Department of Food Hygiene and Environmental Health, Faculty of Veterinary Medicine, University of Helsinki, Helsinki, Finland
- J Moran-Gilad
- Department of Health Systems Management, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Public Health Services, Ministry of Health, Jerusalem, Israel; ESCMID Study Group for Genomic and Molecular Diagnostics (ESGMD), Basel, Switzerland
- G Van Domselaar
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington St, Winnipeg, MB, R3E 3R2, Canada; Department of Medical Microbiology and Infectious Diseases, University of Manitoba, 745 Bannatyne Avenue, Winnipeg, MB, R3E 0J9, Canada
- M Ramirez
- Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
49
A review of bioinformatic methods for forensic DNA analyses. Forensic Sci Int Genet 2017; 33:117-128. [PMID: 29247928 DOI: 10.1016/j.fsigen.2017.12.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 09/13/2017] [Revised: 11/30/2017] [Accepted: 12/10/2017] [Indexed: 12/20/2022]
Abstract
Short tandem repeats, single nucleotide polymorphisms, and whole mitochondrial genome sequences are three classes of markers that will play an important role in the future of forensic DNA typing. The arrival of massively parallel sequencing platforms in forensic science reveals previously unseen information, such as the complexity and variability of these markers, along with amounts of data too immense to analyse manually. Along with the sequencing chemistries employed, bioinformatic methods are required to process and interpret this new and extensive data. As more is learnt about the use of these new technologies for forensic applications, efficient and suitable tools are being developed and standardized for each stage of data processing, and faster, more accurate methods that improve on the original approaches have been developed. As forensic laboratories search for the optimal pipeline of tools, sequencer manufacturers have incorporated pipelines into sequencer software to make analyses convenient. This review explores the current state of bioinformatic methods and tools used to analyse forensic markers sequenced on the massively parallel sequencing (MPS) platforms that are currently most widely used.
50
Bradley D, Xu P, Mohorianu II, Whibley A, Field D, Tavares H, Couchman M, Copsey L, Carpenter R, Li M, Li Q, Xue Y, Dalmay T, Coen E. Evolution of flower color pattern through selection on regulatory small RNAs. Science 2017; 358:925-928. [DOI: 10.1126/science.aao3526] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 07/11/2017] [Accepted: 10/06/2017] [Indexed: 12/19/2022]
Abstract
Small RNAs (sRNAs) regulate genes in plants and animals. Here, we show that population-wide differences in color patterns in snapdragon flowers are caused by an inverted duplication that generates sRNAs. The complexity and size of the transcripts indicate that the duplication represents an intermediate on the pathway to microRNA evolution. The sRNAs repress a pigment biosynthesis gene, creating a yellow highlight at the site of pollinator entry. The inverted duplication exhibits steep clines in allele frequency in a natural hybrid zone, showing that the allele is under selection. Thus, regulatory interactions of evolutionarily recent sRNAs can be acted upon by selection and contribute to the evolution of phenotypic diversity.
Affiliation(s)
- Desmond Bradley
- Department of Cell and Developmental Biology, John Innes Centre, Colney Lane, Norwich NR4 7UH, UK
- Ping Xu
- School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
- Irina-Ioana Mohorianu
- School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
- School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
- Annabel Whibley
- Department of Cell and Developmental Biology, John Innes Centre, Colney Lane, Norwich NR4 7UH, UK
- David Field
- Department of Botany and Biodiversity Research, University of Vienna, Faculty of Life Sciences, Rennweg 14, A-1030 Vienna, Austria
- Institute of Science and Technology Austria, Klosterneuburg, Austria
- Hugo Tavares
- Department of Cell and Developmental Biology, John Innes Centre, Colney Lane, Norwich NR4 7UH, UK
- Matthew Couchman
- Department of Cell and Developmental Biology, John Innes Centre, Colney Lane, Norwich NR4 7UH, UK
- Lucy Copsey
- Department of Cell and Developmental Biology, John Innes Centre, Colney Lane, Norwich NR4 7UH, UK
- Rosemary Carpenter
- Department of Cell and Developmental Biology, John Innes Centre, Colney Lane, Norwich NR4 7UH, UK
- Miaomiao Li
- State Key Laboratory of Molecular Developmental Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and National Center for Plant Gene Research, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100190, China
- Qun Li
- State Key Laboratory of Molecular Developmental Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and National Center for Plant Gene Research, Beijing 100101, China
- Yongbiao Xue
- State Key Laboratory of Molecular Developmental Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and National Center for Plant Gene Research, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100190, China
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Tamas Dalmay
- School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
- Enrico Coen
- Department of Cell and Developmental Biology, John Innes Centre, Colney Lane, Norwich NR4 7UH, UK