1. Su Q, Long Y, Gou D, Quan J, Lian Q. Enhancing RNA-seq bias mitigation with the Gaussian self-benchmarking framework: towards unbiased sequencing data. BMC Genomics 2024; 25:904. [PMID: 39350040 PMCID: PMC11441123 DOI: 10.1186/s12864-024-10814-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Accepted: 09/19/2024] [Indexed: 10/04/2024] Open
Abstract
BACKGROUND: RNA sequencing is a vital technique for analyzing RNA behavior in cells, but it often suffers from various biases that distort the data. Traditional methods to address these biases are typically empirical and handle them individually, limiting their effectiveness. Our study introduces the Gaussian Self-Benchmarking (GSB) framework, a novel approach that leverages the natural distribution patterns of guanine (G) and cytosine (C) content in RNA to mitigate multiple biases simultaneously. This method is grounded in a theoretical model, organizing k-mers based on their GC content and applying a Gaussian model for alignment to ensure empirical sequencing data closely match their theoretical distribution.

RESULTS: The GSB framework demonstrated superior performance in mitigating sequencing biases compared to existing methods. Testing with synthetic RNA constructs and real human samples showed that the GSB approach not only addresses individual biases more effectively but also manages co-existing biases jointly. The framework's reliance on accurately pre-determined parameters like mean and standard deviation of GC content distribution allows for a more precise representation of RNA samples. This results in improved accuracy and reliability of RNA sequencing data, enhancing our understanding of RNA behavior in health and disease.

CONCLUSIONS: The GSB framework presents a significant advancement in RNA sequencing analysis by providing a well-validated, multi-bias mitigation strategy. It functions independently from previously identified dataset flaws and sets a new standard for unbiased RNA sequencing results. This development enhances the reliability of RNA studies, broadening the potential for scientific breakthroughs in medicine and biology, particularly in genetic disease research and the development of targeted treatments.
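The idea of grouping k-mers by GC content and comparing the grouped counts with a Gaussian that has a preset mean and standard deviation can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the function names, the per-category scaling scheme, and the example parameters are all assumptions made for the sketch.

```python
import math
from collections import Counter


def gc_count(kmer):
    """Number of G or C bases in a k-mer."""
    return sum(base in "GC" for base in kmer)


def count_kmers(reads, k):
    """Count every overlapping k-mer across the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def gaussian_targets(k, mean, sd, total):
    """Expected total count per GC category (0..k) under a Gaussian
    with a preset mean and sd, scaled so the targets sum to `total`."""
    weights = [math.exp(-0.5 * ((g - mean) / sd) ** 2) for g in range(k + 1)]
    norm = sum(weights)
    return [total * w / norm for w in weights]


def correction_factors(kmer_counts, k, mean, sd):
    """Hypothetical scheme: factor = Gaussian target / observed total,
    computed per GC category; empty categories get a factor of 0.0."""
    observed = [0] * (k + 1)
    for kmer, n in kmer_counts.items():
        observed[gc_count(kmer)] += n
    targets = gaussian_targets(k, mean, sd, sum(observed))
    return [t / o if o else 0.0 for t, o in zip(targets, observed)]
```

Multiplying each k-mer count by the factor of its GC category would pull the aggregated counts onto the assumed Gaussian; how the published framework derives its parameters and applies the correction is described in the paper itself.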
Affiliation(s)
- Qiang Su
  - Faculty of Synthetic Biology, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Shenzhen University of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
  - State Key Laboratory of Chemical Oncogenomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, China.
- Yi Long
  - Institute of Chemical Biology, Shenzhen Bay Laboratory, Shenzhen, China
- Deming Gou
  - Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China
- Junmin Quan
  - State Key Laboratory of Chemical Oncogenomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, China.
- Qizhou Lian
  - Faculty of Synthetic Biology, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Shenzhen University of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
  - Cord Blood Bank, Guangzhou Institute of Eugenics and Perinatology, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China.
  - State Key Laboratory of Pharmaceutical Biotechnology, Department of Medicine, The University of Hong Kong, Hong Kong SAR, China.
2. Su Q, Long Y, Gou D, Quan J, Lian Q. Enhancing RNA-seq analysis by addressing all co-existing biases using a self-benchmarking approach with 2D structural insights. Brief Bioinform 2024; 25:bbae532. [PMID: 39428128 PMCID: PMC11491153 DOI: 10.1093/bib/bbae532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Revised: 07/29/2024] [Accepted: 10/08/2024] [Indexed: 10/22/2024] Open
Abstract
We introduce a groundbreaking approach: the minimum free energy-based Gaussian Self-Benchmarking (MFE-GSB) framework, designed to combat the myriad of biases inherent in RNA-seq data. Central to our methodology is the MFE concept, facilitating the adoption of a Gaussian distribution model tailored to effectively mitigate all co-existing biases within a k-mer counting scheme. The MFE-GSB framework operates on a sophisticated dual-model system, juxtaposing modeling data of uniform k-mer distribution against the real, observed sequencing data characterized by nonuniform k-mer distributions. The framework applies a Gaussian function, guided by the predetermined parameters (mean and SD) derived from modeling data, to fit unknown sequencing data. This dual comparison allows for the accurate prediction of k-mer abundances across MFE categories, enabling simultaneous correction of biases at the single k-mer level. Through validation with both engineered RNA constructs and human tissue RNA samples, its wide-ranging efficacy and applicability are demonstrated.
Affiliation(s)
- Qiang Su
  - Faculty of Synthetic Biology, Shenzhen University of Advanced Technology, Shenzhen Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Nanshan District, Shenzhen, 518055, China
  - State Key Laboratory of Chemical Oncogenomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, 2199 Lishui Avenue, Nanshan District, Shenzhen, 518055, China
- Yi Long
  - Institute of Chemical Biology, Shenzhen Bay Laboratory, Gaoke International Innovation Center A14, Guangqiao Road, Guangming District, Shenzhen, 518132, China
- Deming Gou
  - Shenzhen Key Laboratory of Microbial Genetic Engineering, Vascular Disease Research Center, College of Life Sciences and Oceanography, Shenzhen University, 1066 Xueyuan Street, Nanshan District, Shenzhen, 518055, China
- Junmin Quan
  - State Key Laboratory of Chemical Oncogenomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, 2199 Lishui Avenue, Nanshan District, Shenzhen, 518055, China
- Qizhou Lian
  - Faculty of Synthetic Biology, Shenzhen University of Advanced Technology, Shenzhen Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Nanshan District, Shenzhen, 518055, China
  - Cord Blood Bank, Guangzhou Institute of Eugenics and Perinatology, Guangzhou Women and Children’s Medical Center, Guangzhou Medical University, 9 Jinshui Road, Tianhe District, Guangzhou, 510623, China
  - State Key Laboratory of Pharmaceutical Biotechnology, and Department of Medicine, The University of Hong Kong, 102 Pok Fu Lam Road, Hong Kong SAR, China
3. Cheng X, Goktas MT, Williamson LM, Krzywinski M, Mulder DT, Swanson L, Slind J, Sihvonen J, Chow CR, Carr A, Bosdet I, Tucker T, Young S, Moore R, Mungall KL, Yip S, Jones SJM. Enhancing clinical genomic accuracy with panelGC: a novel metric and tool for quantifying and monitoring GC biases in hybridization capture panel sequencing. Brief Bioinform 2024; 25:bbae442. [PMID: 39256198 PMCID: PMC11387050 DOI: 10.1093/bib/bbae442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 07/26/2024] [Accepted: 08/23/2024] [Indexed: 09/12/2024] Open
Abstract
Accurate assessment of fragment abundance within a genome is crucial in clinical genomics applications such as the analysis of copy number variation (CNV). However, this task is often hindered by biased coverage in regions with varying guanine-cytosine (GC) content. These biases are particularly exacerbated in hybridization capture sequencing due to GC effects on probe hybridization and polymerase chain reaction (PCR) amplification efficiency. Such GC content-associated variations can exert a negative impact on the fidelity of CNV calling within hybridization capture panels. In this report, we present panelGC, a novel metric, to quantify and monitor GC biases in hybridization capture sequencing data. We establish the efficacy of panelGC, demonstrating its proficiency in identifying and flagging potential procedural anomalies, even in situations where instrument and experimental monitoring data may not be readily accessible. Validation using real-world datasets demonstrates that panelGC enhances the quality control and reliability of hybridization capture panel sequencing.
Affiliation(s)
- Xuanjin Cheng
  - Canada's Michael Smith Genome Sciences Centre, 570 W 7th Ave, Vancouver, British Columbia, V5Z 4S6, Canada
- Murathan T Goktas
  - Canada's Michael Smith Genome Sciences Centre, 570 W 7th Ave, Vancouver, British Columbia, V5Z 4S6, Canada
- Laura M Williamson
  - Canada's Michael Smith Genome Sciences Centre, 570 W 7th Ave, Vancouver, British Columbia, V5Z 4S6, Canada
- Martin Krzywinski
  - Canada's Michael Smith Genome Sciences Centre, 570 W 7th Ave, Vancouver, British Columbia, V5Z 4S6, Canada
- David T Mulder
  - Canada's Michael Smith Genome Sciences Centre, 570 W 7th Ave, Vancouver, British Columbia, V5Z 4S6, Canada
- Lucas Swanson
  - Canada's Michael Smith Genome Sciences Centre, 570 W 7th Ave, Vancouver, British Columbia, V5Z 4S6, Canada
- Jill Slind
  - Canada's Michael Smith Genome Sciences Centre, 570 W 7th Ave, Vancouver, British Columbia, V5Z 4S6, Canada
- Jelena Sihvonen
  - Canada's Michael Smith Genome Sciences Centre, 570 W 7th Ave, Vancouver, British Columbia, V5Z 4S6, Canada
- Cynthia R Chow
  - Cancer Genetics and Genomics Laboratory at BC Cancer Agency, 600 W 10th Ave #3305, Vancouver, British Columbia, V5Z 4E6, Canada
- Amy Carr
  - Cancer Genetics and Genomics Laboratory at BC Cancer Agency, 600 W 10th Ave #3305, Vancouver, British Columbia, V5Z 4E6, Canada
- Ian Bosdet
  - Cancer Genetics and Genomics Laboratory at BC Cancer Agency, 600 W 10th Ave #3305, Vancouver, British Columbia, V5Z 4E6, Canada
- Tracy Tucker
  - Cancer Genetics and Genomics Laboratory at BC Cancer Agency, 600 W 10th Ave #3305, Vancouver, British Columbia, V5Z 4E6, Canada
- Sean Young
  - Cancer Genetics and Genomics Laboratory at BC Cancer Agency, 600 W 10th Ave #3305, Vancouver, British Columbia, V5Z 4E6, Canada
- Richard Moore
  - Canada's Michael Smith Genome Sciences Centre, 570 W 7th Ave, Vancouver, British Columbia, V5Z 4S6, Canada
- Karen L Mungall
  - Canada's Michael Smith Genome Sciences Centre, 570 W 7th Ave, Vancouver, British Columbia, V5Z 4S6, Canada
- Stephen Yip
  - Canada's Michael Smith Genome Sciences Centre, 570 W 7th Ave, Vancouver, British Columbia, V5Z 4S6, Canada
- Steven J M Jones
  - Canada's Michael Smith Genome Sciences Centre, 570 W 7th Ave, Vancouver, British Columbia, V5Z 4S6, Canada
4. Pan L, Zheng C, Yang Z, Pawitan Y, Vu TN, Shen X. Hidden Genetic Regulation of Human Complex Traits via Brain Isoforms. Phenomics (Cham, Switzerland) 2023; 3:217-227. [PMID: 37325708 PMCID: PMC10260721 DOI: 10.1007/s43657-023-00100-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 02/15/2023] [Accepted: 02/17/2023] [Indexed: 06/17/2023]
Abstract
Alternative splicing exists in most multi-exonic genes, and exploring these complex alternative splicing events and their resultant isoform expressions is essential. However, RNA sequencing results are conventionally summarized into gene-level expression counts, mainly because reads map ambiguously to multiple highly similar regions. Transcript-level quantification and interpretation are often overlooked, and biological interpretations are often deduced from combined transcript information at the gene level. Here, for the brain, the tissue in which alternative splicing is most variable, we estimate isoform expressions in 1,191 samples collected by the Genotype-Tissue Expression (GTEx) Consortium using a powerful method that we previously developed. We perform genome-wide association scans on the isoform ratios per gene and identify isoform-ratio quantitative trait loci (irQTL), which could not be detected by studying gene-level expressions alone. By analyzing the genetic architecture of the irQTL, we show that isoform ratios regulate educational attainment via multiple tissues including the frontal cortex (BA9), cortex, cervical spinal cord, and hippocampus. These tissues are also associated with different neuro-related traits, including Alzheimer's or dementia, mood swings, sleep duration, alcohol intake, intelligence, and anxiety or depression. Mendelian randomization (MR) analysis revealed 1,139 pairs of isoforms and neuro-related traits with plausible causal relationships, showing much stronger causal effects than on general diseases measured in the UK Biobank (UKB). Our results highlight essential transcript-level biomarkers in the human brain for neuro-related complex traits and diseases, which could be missed by merely investigating overall gene expressions. Supplementary Information: The online version contains supplementary material available at 10.1007/s43657-023-00100-6.
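The isoform ratios that serve as phenotypes in such scans are, in the usual sense, each isoform's share of its gene's total expression. A minimal sketch of that computation is given below; the input layout, the function name, and the choice to skip genes with zero total expression are assumptions for illustration and are not taken from the paper.

```python
def isoform_ratios(expression):
    """Convert {gene: {isoform: expression}} into {gene: {isoform: ratio}}.

    Each ratio is the isoform's expression divided by the summed
    expression of all isoforms of the same gene. Genes whose total
    expression is zero are skipped (an assumption of this sketch),
    because their ratios are undefined.
    """
    ratios = {}
    for gene, isoforms in expression.items():
        total = sum(isoforms.values())
        if total > 0:
            ratios[gene] = {
                name: value / total for name, value in isoforms.items()
            }
    return ratios


# Hypothetical expression values for one sample.
sample = {
    "GENE_A": {"iso1": 30.0, "iso2": 10.0},
    "GENE_B": {"iso1": 0.0},
}
print(isoform_ratios(sample))
```

Two samples with identical gene-level totals can still differ in these ratios, which is why a ratio-based scan can pick up signals that a gene-level scan does not.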
Affiliation(s)
- Lu Pan
  - Biostatistics Group, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510006 China
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, 17177 Sweden
- Chenqing Zheng
  - Biostatistics Group, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510006 China
- Zhijian Yang
  - Biostatistics Group, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510006 China
- Yudi Pawitan
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, 17177 Sweden
- Trung Nghia Vu
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, 17177 Sweden
- Xia Shen
  - Biostatistics Group, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510006 China
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, 17177 Sweden
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, 200433 China
  - Center for Intelligent Medicine Research, Greater Bay Area Institute of Precision Medicine (Guangzhou), Fudan University, Guangzhou, 511458 China
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, EH8 9AG UK
5. Trac QT, Pawitan Y, Mou T, Erkers T, Östling P, Bohlin A, Österroos A, Vesterlund M, Jafari R, Siavelis I, Bäckvall H, Kiviluoto S, Orre LM, Rantalainen M, Lehtiö J, Lehmann S, Kallioniemi O, Vu TN. Prediction model for drug response of acute myeloid leukemia patients. NPJ Precis Oncol 2023; 7:32. [PMID: 36964195 PMCID: PMC10039068 DOI: 10.1038/s41698-023-00374-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 03/13/2023] [Indexed: 03/26/2023] Open
Abstract
Despite some encouraging successes, predicting the therapy response of acute myeloid leukemia (AML) patients remains highly challenging due to tumor heterogeneity. Here we aim to develop and validate MDREAM, a robust ensemble-based prediction model for drug response in AML based on an integration of omics data, including mutations and gene expression, and large-scale drug testing. Briefly, MDREAM is first trained in the BeatAML cohort (n = 278), and then validated in the BeatAML (n = 183) and two external cohorts, including a Swedish AML cohort (n = 45) and a relapsed/refractory acute leukemia cohort (n = 12). The final prediction is based on 122 ensemble models, each corresponding to a drug. A confidence score metric is used to convey the uncertainty of predictions; among predictions with a confidence score >0.75, the validated proportion of good responders is 77%. The Spearman correlations between the predicted and the observed drug response are 0.68 (95% CI: [0.64, 0.68]) in the BeatAML validation set, -0.49 (95% CI: [-0.53, -0.44]) in the Swedish cohort and 0.59 (95% CI: [0.51, 0.67]) in the relapsed/refractory cohort. A web-based implementation of MDREAM is publicly available at https://www.meb.ki.se/shiny/truvu/MDREAM/ .
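The Spearman correlation used to compare predicted and observed drug response is the Pearson correlation of the two rank vectors. The sketch below is a plain-Python illustration of that definition, with ties given their average rank; it is not the MDREAM code, and the function names are chosen here for illustration only.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(predicted, observed):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(observed))
```

Because only ranks enter the calculation, the coefficient measures whether the predictions order the samples the same way as the observations, not whether the values agree in magnitude; a negative value means the two orderings run in opposite directions.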
Affiliation(s)
- Quang Thinh Trac
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Yudi Pawitan
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Tian Mou
  - School of Biomedical Engineering, Shenzhen University, Shenzhen, China
- Tom Erkers
  - Department of Oncology Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
- Päivi Östling
  - Department of Oncology Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
  - Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Anna Bohlin
  - Department of Medicine Huddinge, Karolinska Institutet, Unit for Hematology, Karolinska University Hospital Huddinge, Stockholm, Sweden
- Albin Österroos
  - Department of Medical Sciences, Hematology, Uppsala University Hospital, Uppsala, Sweden
- Mattias Vesterlund
  - Department of Oncology Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
- Rozbeh Jafari
  - Department of Oncology Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
- Ioannis Siavelis
  - Department of Oncology Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
- Helena Bäckvall
  - Department of Oncology Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
- Santeri Kiviluoto
  - Department of Oncology Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
- Lukas M Orre
  - Department of Oncology Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
- Mattias Rantalainen
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Janne Lehtiö
  - Department of Oncology Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
- Sören Lehmann
  - Department of Medicine Huddinge, Karolinska Institutet, Unit for Hematology, Karolinska University Hospital Huddinge, Stockholm, Sweden
  - Department of Medical Sciences, Hematology, Uppsala University Hospital, Uppsala, Sweden
- Olli Kallioniemi
  - Department of Oncology Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
  - Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Trung Nghia Vu
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
6. Deng W, Murugan S, Lindberg J, Chellappa V, Shen X, Pawitan Y, Vu TN. Fusion Gene Detection Using Whole-Exome Sequencing Data in Cancer Patients. Front Genet 2022; 13:820493. [PMID: 35251131 PMCID: PMC8888970 DOI: 10.3389/fgene.2022.820493] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/23/2021] [Accepted: 01/31/2022] [Indexed: 12/13/2022] Open
Abstract
Several fusion genes are directly involved in the initiation and progression of cancers. Numerous bioinformatics tools have been developed to detect fusion events, but they are mainly based on RNA-seq data. Whole-exome sequencing (WES) is a powerful technology that is widely used for disease-related DNA variant detection. In this study, we build a novel analysis pipeline called Fuseq-WES to detect fusion genes at the DNA level based on WES data. The same method also applies to targeted panel sequencing data. We apply the method to real datasets of acute myeloid leukemia (AML) and prostate cancer patients. The results show that two of the main AML fusion genes discovered in RNA-seq data, PML-RARA and CBFB-MYH11, are detected in the WES data in 36% and 63% of the available samples, respectively. For the targeted deep-sequencing of prostate cancer patients, detection of the TMPRSS2-ERG fusion, which is the most frequent chimeric alteration in prostate cancer, is 91% concordant with a manually curated procedure based on four other methods. In summary, the overall results indicate that it is challenging to detect fusion genes in WES data with a standard coverage of ∼15–30x, where fusion candidates discovered in the RNA-seq data are often not detected in the WES data and vice versa. A subsampling study of the prostate data suggests that a coverage of at least 75x is necessary to achieve high accuracy.
Affiliation(s)
- Wenjiang Deng
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Sarath Murugan
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Johan Lindberg
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Venkatesh Chellappa
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Xia Shen
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  - Biostatistics Group, Greater Bay Area Institute of Precision Medicine, Fudan University, Guangzhou, China
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Yudi Pawitan
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  - *Correspondence: Yudi Pawitan; Trung Nghia Vu
- Trung Nghia Vu
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  - *Correspondence: Yudi Pawitan; Trung Nghia Vu
7. Davies P, Jones M, Liu J, Hebenstreit D. Anti-bias training for (sc)RNA-seq: experimental and computational approaches to improve precision. Brief Bioinform 2021; 22:6265204. [PMID: 33959753 PMCID: PMC8574610 DOI: 10.1093/bib/bbab148] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/08/2021] [Revised: 03/10/2021] [Accepted: 03/26/2021] [Indexed: 12/29/2022] Open
Abstract
RNA-seq, including single-cell RNA-seq (scRNA-seq), is plagued by insufficient sensitivity and lack of precision. As a result, the full potential of (sc)RNA-seq is limited. A major factor in this respect is the presence of global bias in most datasets, which affects detection and quantitation of RNA in a length-dependent fashion. In particular, scRNA-seq is affected by technical noise and a high rate of dropouts, where the vast majority of original transcripts are not converted into sequencing reads. We discuss the origins and implications of these biases, bioinformatics approaches to correct for them, and how biases can be exploited to infer characteristics of the sample preparation process, which in turn can be used to improve library preparation.
Affiliation(s)
- Philip Davies
  - Daniel Hebenstreit's Research Group, University of Warwick, CV4 7AL Coventry, UK
- Matt Jones
  - Daniel Hebenstreit's Research Group, University of Warwick, CV4 7AL Coventry, UK
- Juntai Liu
  - Physics Department, University of Warwick, CV4 7AL Coventry, UK
8. Patro R, Salmela L. Algorithms meet sequencing technologies - 10th edition of the RECOMB-Seq workshop. iScience 2021; 24:101956. [PMID: 33437938 PMCID: PMC7788091 DOI: 10.1016/j.isci.2020.101956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
Abstract
DNA and RNA sequencing is a core technology in biological and medical research. The high throughput of these technologies and the consistent development of new experimental assays and biotechnologies demand the continuous development of methods to analyze the resulting data. The RECOMB Satellite Workshop on Massively Parallel Sequencing brings together leading researchers in computational genomics to discuss emerging frontiers in algorithm development for massively parallel sequencing data. The 10th meeting in this series, RECOMB-Seq 2020, was scheduled to be held in Padua, Italy, but due to the ongoing COVID-19 pandemic, the meeting was held virtually instead. The online workshop featured keynote talks by Paola Bonizzoni and Zamin Iqbal, two highlight talks, ten regular talks, and three short talks. Seven of the works presented in the workshop are featured in this edition of iScience, and many of the talks are available online on the RECOMB-Seq 2020 YouTube channel.
Affiliation(s)
- Rob Patro
  - Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, MD, USA
- Leena Salmela
  - Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland