1
Hao W, Rajendran BK, Cui T, Sun J, Zhao Y, Palaniyandi T, Selvam M. Advances in predicting breast cancer driver mutations: Tools for precision oncology (Review). Int J Mol Med 2025; 55:6. [PMID: 39450552 PMCID: PMC11537269 DOI: 10.3892/ijmm.2024.5447] [Received: 06/17/2024] [Accepted: 09/30/2024] [Indexed: 10/26/2024]
Abstract
In the modern era of medicine, prognosis and treatment options for a number of cancer types, including breast cancer, have been improved by the identification of cancer‑specific biomarkers. The availability of high‑throughput sequencing and analysis platforms, the growth of publicly available cancer databases, and molecular and histological profiling facilitate the development of new drugs through a precision medicine approach. However, only a fraction of patients with breast cancer, those with a few actionable mutations, typically benefit from the precision medicine approach. In the present review, current developments in breast cancer driver gene identification, actionable breast cancer mutations, as well as the available therapeutic options, challenges and applications of breast precision oncology are systematically described. Breast cancer driver mutation‑based precision oncology helps to screen key drivers involved in disease development and progression, drug sensitivity and the genes responsible for drug resistance. Advances in precision oncology will provide more targeted therapeutic options for patients with breast cancer, improving disease‑free survival and potentially leading to significant successes in breast cancer treatment in the near future. Identification of driver mutations has allowed new targeted therapeutic approaches in combination with standard chemo‑ and immunotherapies in breast cancer. Developing new driver mutation identification strategies will help to define new therapeutic targets and improve the overall and disease‑free survival of patients with breast cancer through efficient medicine.
Affiliation(s)
- Wenhui Hao
- Xinjiang Key Laboratory of Molecular Biology for Endemic Diseases, School of Basic Medical Sciences, Xinjiang Medical University, Urumqi, Xinjiang 830017, P.R. China
- Barani Kumar Rajendran
- Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT 06510, USA
- Tingting Cui
- Xinjiang Key Laboratory of Molecular Biology for Endemic Diseases, School of Basic Medical Sciences, Xinjiang Medical University, Urumqi, Xinjiang 830017, P.R. China
- Jiayi Sun
- Xinjiang Key Laboratory of Molecular Biology for Endemic Diseases, School of Basic Medical Sciences, Xinjiang Medical University, Urumqi, Xinjiang 830017, P.R. China
- Yingchun Zhao
- Xinjiang Key Laboratory of Molecular Biology for Endemic Diseases, School of Basic Medical Sciences, Xinjiang Medical University, Urumqi, Xinjiang 830017, P.R. China
- Masilamani Selvam
- Department of Biotechnology, Sathyabama Institute of Science and Technology, Chennai 600119, India
2
Lorenzana GP, Figueiró HV, Coutinho LL, Villela PMS, Eizirik E. Comparative assessment of genotyping-by-sequencing and whole-exome sequencing for estimating genetic diversity and geographic structure in small sample sizes: insights from wild jaguar populations. Genetica 2024; 152:133-144. [PMID: 39322785 DOI: 10.1007/s10709-024-00212-5] [Received: 06/06/2024] [Accepted: 09/12/2024] [Indexed: 09/27/2024]
Abstract
Biologists currently have an assortment of high-throughput sequencing techniques allowing the study of population dynamics in increasing detail. The utility of genetic estimates depends on their ability to recover meaningful approximations while filtering out noise produced by artifacts. In this study, we empirically compared the congruence of two reduced representation approaches (genotyping-by-sequencing, GBS, and whole-exome sequencing, WES) in estimating genetic diversity and population structure using SNP markers typed in a small number of wild jaguar (Panthera onca) samples from South America. Due to its targeted nature, WES allowed for a more straightforward reconstruction of loci compared to GBS, facilitating the identification of true polymorphisms across individuals. We therefore used WES-derived metrics as a benchmark against which GBS-derived indicators were compared, adjusting parameters for locus assembly and SNP filtering in the latter. We observed significant variation in SNP call rates across samples in GBS datasets, leading to a recurrent miscalling of heterozygous sites. This issue was further amplified by small sample sizes, ultimately impacting the consistency of summary statistics between genotyping methods. Recognizing that the genetic markers obtained from GBS and WES are intrinsically different due to varying evolutionary pressures, particularly selection, we consider that our empirical comparison offers valuable insights and highlights critical considerations for estimating population genetic attributes using reduced representation datasets. Our results emphasize the critical need for careful evaluation of missing data and stringent filtering to achieve reliable estimates of genetic diversity and differentiation in elusive wildlife species.
Affiliation(s)
- Gustavo P Lorenzana
- Laboratório de Biologia Genômica e Molecular, Escola de Ciências da Saúde e da Vida, PUCRS, Porto Alegre, Brazil.
- School of Forestry, Northern Arizona University, Flagstaff, AZ, USA.
- Henrique V Figueiró
- Laboratório de Biologia Genômica e Molecular, Escola de Ciências da Saúde e da Vida, PUCRS, Porto Alegre, Brazil
- Environmental Genomics Group, Vale Institute of Technology, Belem, Brazil
- Priscilla M S Villela
- Centro de Genômica Funcional, ESALQ-USP, Piracicaba, Brazil
- EcoMol Consultoria e Projetos, Piracicaba, Brazil
- Eduardo Eizirik
- Laboratório de Biologia Genômica e Molecular, Escola de Ciências da Saúde e da Vida, PUCRS, Porto Alegre, Brazil
- Instituto Pró-Carnívoros, Atibaia, Brazil
3
Fixman B, Díaz-Gay M, Qiu C, Margaryan T, Lee B, Chen XS. Validation of the APOBEC3A-mediated RNA Single Base Substitution Signature and Proposal of Novel APOBEC1, APOBEC3B, and APOBEC3G RNA Signatures. J Mol Biol 2024:168854. [PMID: 39510348 DOI: 10.1016/j.jmb.2024.168854] [Received: 09/05/2024] [Revised: 10/30/2024] [Accepted: 10/31/2024] [Indexed: 11/15/2024]
Abstract
Mutational signature analysis gained significant attention for providing critical insights into the underlying mutational processes for various DNA single base substitution (SBS) signatures and their associations with different cancer types. Recently, RNA single base substitution (RNA-SBS) signatures were defined and described by decomposing RNA variants found in non-small cell lung cancer. Through statistical association, they attributed Apolipoprotein B mRNA Editing Enzyme, Catalytic Polypeptide 3A (APOBEC3A) mutagenesis to the RNA-SBS2 signature. Here, we provide the first validation of an RNA-SBS mutational signature by decomposing novel exogenous and endogenous APOBEC3A RNA editing signatures into COSMICv3.4 RNA-SBS reference signatures. Additionally, we have identified novel RNA-SBS signatures for APOBEC1, APOBEC3B, and APOBEC3G.
Affiliation(s)
- Benjamin Fixman
- Molecular and Computational Biology, Departments of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Marcos Díaz-Gay
- Department of Cellular and Molecular Medicine and Department of Bioengineering and Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA
- Connor Qiu
- Molecular and Computational Biology, Departments of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Tamara Margaryan
- Molecular and Computational Biology, Departments of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Brian Lee
- Molecular and Computational Biology, Departments of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Xiaojiang S Chen
- Molecular and Computational Biology, Departments of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA; Genetic, Molecular and Cellular Biology Program, Keck School of Medicine; Norris Comprehensive Cancer Center; Center of Excellence in NanoBiophysics, University of Southern California, Los Angeles, CA 90089, USA.
4
Emül AA, Ergün MA, Ertürk RA, Çinal Ö, Baysan M. VCF observer: a user-friendly software tool for preliminary VCF file analysis and comparison. BMC Bioinformatics 2024; 25:290. [PMID: 39227760 PMCID: PMC11373448 DOI: 10.1186/s12859-024-05860-0] [Received: 04/24/2024] [Accepted: 07/10/2024] [Indexed: 09/05/2024]
Abstract
BACKGROUND Advancements over the past decade in DNA sequencing technology and computing power have created the potential to revolutionize medicine. There has been a marked increase in genetic data available, allowing for the advancement of areas such as personalized medicine. A crucial type of data in this context is genetic variant data which is stored in variant call format (VCF) files. However, the rapid growth in genomics has presented challenges in analyzing and comparing VCF files. RESULTS In response to the limitations of existing tools, this paper introduces a novel web application that provides a user-friendly solution for VCF file analyses and comparisons. The software tool enables researchers and clinicians to perform high-level analysis with ease and enhances productivity. The application's interface allows users to conveniently upload, analyze, and visualize their VCF files using simple drag-and-drop and point-and-click operations. Essential visualizations such as Venn diagrams, clustergrams, and precision-recall plots are provided to users. A key feature of the application is its support for metadata-based file grouping, accomplished through flexible data matrix uploads, streamlining organization and analysis of user-defined categories. Additionally, the application facilitates standardized benchmarking of VCF files by integrating user-provided ground truth regions and variant lists. CONCLUSIONS By providing a user-friendly interface and supporting essential visualizations, this software enhances the accessibility of VCF file analysis and assists researchers and clinicians in their scientific inquiries.
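The benchmarking idea mentioned in this abstract (comparing a VCF call set against a user-provided ground-truth variant list) can be illustrated with a minimal Python sketch. This is not VCF observer's code: the `Variant` tuple, `parse_vcf_lines`, `precision_recall`, and the toy records are all hypothetical, and the sketch assumes variants are already normalized so that exact matching on chromosome, position, REF, and ALT is meaningful.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key: one ALT allele per record."""
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect variants from tab-separated VCF data lines, skipping headers."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, meta-information lines, and the column header
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # split multi-allelic records
            variants.add(Variant(chrom, int(pos), ref, allele))
    return variants


def precision_recall(calls: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Precision and recall of a call set against a ground-truth set."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data for illustration only (not taken from the paper).
truth_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
    "chr2\t300\t.\tG\tA,C",
]
call_lines = [
    "chr1\t100\t.\tA\tG",
    "chr2\t300\t.\tG\tA",
    "chr3\t400\t.\tT\tC",
]

truth = parse_vcf_lines(truth_lines)
calls = parse_vcf_lines(call_lines)
precision, recall = precision_recall(calls, truth)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Exact-tuple matching is the simplest possible comparison; real benchmarking usually also restricts to ground-truth regions and normalizes indel representation first, which this sketch leaves out.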
Affiliation(s)
- Abdullah Asım Emül
- Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
- Health Institutes of Türkiye, Istanbul, Turkey
- Mehmet Arif Ergün
- Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
- Health Institutes of Türkiye, Istanbul, Turkey
- Ömer Çinal
- Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
- Mehmet Baysan
- Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey.
- Health Institutes of Türkiye, Istanbul, Turkey.
5
Bonetti E, Tini G, Mazzarella L. Accuracy of renovo predictions on variants reclassified over time. J Transl Med 2024; 22:713. [PMID: 39085881 PMCID: PMC11293099 DOI: 10.1186/s12967-024-05508-w] [Received: 05/31/2024] [Accepted: 07/14/2024] [Indexed: 08/02/2024]
Abstract
BACKGROUND Interpreting the clinical consequences of genetic variants is the central problem in modern clinical genomics, for both hereditary diseases and oncology. However, clinical validation lags behind the pace of discovery, leading to distressing uncertainty for patients, physicians and researchers. This "interpretation gap" changes over time as evidence accumulates, and variants initially deemed of uncertain significance (VUS) may subsequently be reclassified as pathogenic or benign. We previously developed RENOVO, a random forest-based tool able to predict variant pathogenicity based on publicly available information from GnomAD and dbNSFP, and tested it on variants that have changed their classification status over time. Here, we comprehensively evaluated the accuracy of RENOVO predictions on variants that have been reclassified over the last four years. METHODS We retrieved 16 retrospective instances of the ClinVar database, every 3 months from March 2020 to March 2024, and analyzed time trends of variant classifications. We identified variants that changed their status over time and compared RENOVO predictions generated in 2020 with the actual reclassifications. RESULTS VUS have become the most represented class in ClinVar (44.97% vs. 9.75% (likely) pathogenic and 40.33% (likely) benign). The rate of VUS reclassification is linear and slow compared to the rate of VUS reporting, which is exponential and currently ~30x faster, creating a growing divide between what can be sequenced vs. what can be interpreted. Out of 10,196 VUS variants in January 2020 that have undergone a clinically meaningful reclassification up to March 2024, RENOVO correctly classified 82.6% in 2020. In addition, RENOVO correctly identified the majority of the few variants that switched clinically meaningful classes (e.g., from benign to pathogenic and vice versa). We highlight variant classes and clinically relevant genes for which RENOVO provides particularly accurate estimates, in particular genes characterized by a large prevalence of high- or low-impact variants (e.g., POLE, NOTCH1, FANCM). Suboptimal RENOVO predictions mostly concern genes validated through dedicated consortia (e.g., BRCA1/2), in which RENOVO would anyway have a limited impact. CONCLUSIONS Time trend analysis demonstrates that the current model of variant interpretation cannot keep up with variant discovery. Machine learning-based tools like RENOVO confirm high accuracy that can aid in clinical practice and research.
Affiliation(s)
- Emanuele Bonetti
- Department of Experimental Oncology, European Institute of Oncology, IEO-IRCCS, Milan, 20139, Italy
- Giulia Tini
- Department of Experimental Oncology, European Institute of Oncology, IEO-IRCCS, Milan, 20139, Italy
- Luca Mazzarella
- Department of Experimental Oncology, European Institute of Oncology, IEO-IRCCS, Milan, 20139, Italy.
6
Villani RM, McKenzie ME, Davidson AL, Spurdle AB. Regional-specific calibration enables application of computational evidence for clinical classification of 5' cis-regulatory variants in Mendelian disease. Am J Hum Genet 2024; 111:1301-1315. [PMID: 38815586 PMCID: PMC11267523 DOI: 10.1016/j.ajhg.2024.05.002] [Received: 12/21/2023] [Revised: 05/02/2024] [Accepted: 05/03/2024] [Indexed: 06/01/2024]
Abstract
To date, clinical genetic testing for Mendelian disease variants has focused heavily on exonic coding and intronic gene regions. This multi-step study was undertaken to provide an evidence base for selecting and applying computational approaches for use in clinical classification of 5' cis-regulatory region variants. Curated datasets of clinically reported disease-causing 5' cis-regulatory region variants and variants from matched genomic regions in population controls were used to calibrate six bioinformatic tools as predictors of variant pathogenicity. Likelihood ratio estimates were aligned to code weights following ClinGen recommendations for application of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) classification scheme. Considering code assignment across all reference dataset variants, performance was best for CADD (81.2%) and REMM (81.5%). Optimized thresholds provided moderate evidence toward pathogenicity (CADD, REMM) and moderate (CADD) or supporting (REMM) evidence against pathogenicity. Both sensitivity and specificity of prediction were improved when further categorizing variants based on location in an EPDnew-defined promoter region. Combining predictions (CADD, REMM, and location in a promoter region) increased specificity at the expense of sensitivity. Importantly, the optimal CADD thresholds for assigning ACMG/AMP codes PP3 (≥10) and BP4 (≤8) were vastly different from recommendations for protein-coding variants (PP3 ≥25.3; BP4 ≤22.7); CADD <22.7 would incorrectly assign BP4 for >90% of reported disease-causing cis-regulatory region variants. Our results demonstrate the need to consider a tiered approach and tailored score thresholds to optimize bioinformatic impact prediction for clinical classification of 5' cis-regulatory region variants.
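The threshold contrast reported in this abstract can be made concrete with a short Python sketch. Only the four cut-off values (PP3 ≥10 and BP4 ≤8 for 5' cis-regulatory variants; PP3 ≥25.3 and BP4 ≤22.7 for protein-coding variants) come from the abstract; the `THRESHOLDS` table, the region labels, and the `cadd_evidence_code` function are hypothetical names chosen for illustration, and the sketch ignores evidence strength and the promoter-location tiering the authors also describe.

```python
from typing import Optional

# CADD cut-offs quoted in the abstract; the dictionary layout and region
# labels are assumptions made for this illustration.
THRESHOLDS = {
    "cis_regulatory_5prime": {"PP3": 10.0, "BP4": 8.0},
    "protein_coding": {"PP3": 25.3, "BP4": 22.7},
}


def cadd_evidence_code(score: float, region: str) -> Optional[str]:
    """Return 'PP3', 'BP4', or None (indeterminate) for a CADD score."""
    cutoffs = THRESHOLDS[region]
    if score >= cutoffs["PP3"]:
        return "PP3"  # computational evidence toward pathogenicity
    if score <= cutoffs["BP4"]:
        return "BP4"  # computational evidence against pathogenicity
    return None  # score falls between the two cut-offs


# The same score is read very differently under the two threshold sets.
example_score = 15.0
for region in THRESHOLDS:
    print(region, cadd_evidence_code(example_score, region))
```

A score of 15 meets the regulatory PP3 cut-off but falls under the protein-coding BP4 cut-off, which is the kind of misassignment the abstract warns about when protein-coding thresholds are applied to cis-regulatory variants.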
Affiliation(s)
- Rehan M Villani
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Maddison E McKenzie
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Aimee L Davidson
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Amanda B Spurdle
- Population Health Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia; University of Queensland, Brisbane, Queensland, Australia.
7
Furukawa T, Sakai K, Suzuki T, Tanaka T, Kushiro M, Kusumoto KI. Comparative Genome Analysis of Japanese Field-Isolated Aspergillus for Aflatoxin Productivity and Non-Productivity. J Fungi (Basel) 2024; 10:459. [PMID: 39057344 PMCID: PMC11278155 DOI: 10.3390/jof10070459] [Received: 05/30/2024] [Revised: 06/21/2024] [Accepted: 06/26/2024] [Indexed: 07/28/2024]
Abstract
Aspergillus flavus produces aflatoxin, a carcinogenic fungal toxin that poses a threat to the agricultural and food industries. There is a concern that the distribution of aflatoxin-producing A. flavus is expanding in Japan due to climate change, and it is necessary to understand which types of strains are present. In this study, we sequenced the genomes of four Aspergillus strains isolated from agricultural fields in the Ibaraki prefecture of Japan and identified their genetic variants. Phylogenetic analysis based on single-nucleotide variants revealed that the two aflatoxin-producing strains were closely related to A. flavus NRRL3357, whereas the two non-producing strains were closely related to the RIB40 strain of Aspergillus oryzae, a fungus widely used in the Japanese fermentation industry. A detailed analysis of the variants in the aflatoxin biosynthetic gene cluster showed that the two aflatoxin-producing strains belonged to different morphotype lineages. RT-qPCR results indicated that the expression of aflatoxin biosynthetic genes was consistent with aflatoxin production in the two aflatoxin-producing strains, whereas the two non-producing strains expressed most of the aflatoxin biosynthetic genes, contrary to common knowledge about A. oryzae, suggesting that the lack of aflatoxin production in these strains is attributable to genes outside of the aflatoxin biosynthetic gene cluster.
Affiliation(s)
- Tomohiro Furukawa
- Institute of Food Research, National Agriculture and Food Research Organization (NARO), 2-1-12 Kannondai, Tsukuba 305-8642, Japan
- Kanae Sakai
- Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita 565-0871, Japan
- Tadahiro Suzuki
- Institute of Food Research, National Agriculture and Food Research Organization (NARO), 2-1-12 Kannondai, Tsukuba 305-8642, Japan
- Takumi Tanaka
- Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita 565-0871, Japan
- Masayo Kushiro
- Institute of Food Research, National Agriculture and Food Research Organization (NARO), 2-1-12 Kannondai, Tsukuba 305-8642, Japan
- Ken-Ichi Kusumoto
- Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita 565-0871, Japan
8
Mahmood K, Sarup P, Oertelt L, Jahoor A, Orabi J. Assessing myBaits Target Capture Sequencing Methodology Using Short-Read Sequencing for Variant Detection in Oat Genomics and Breeding. Genes (Basel) 2024; 15:700. [PMID: 38927635 PMCID: PMC11203172 DOI: 10.3390/genes15060700] [Received: 04/29/2024] [Revised: 05/18/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024]
Abstract
The integration of target capture systems with next-generation sequencing has emerged as an efficient tool for exploring specific genetic regions with a high resolution and facilitating the rapid discovery of novel alleles. Despite these advancements, the application of targeted sequencing methodologies, such as the myBaits technology, in polyploid oat species remains relatively unexplored. In this study, we utilized the myBaits target capture method offered by Daicel Arbor Biosciences to detect variants and assess its reliability for variant detection in oat genomics and breeding. Ten oat genotypes were carefully chosen for targeted sequencing, focusing on specific regions on chromosome 2A to detect variants. The selected region harbors 98 genes. Precisely designed baits targeting the genes within these regions were employed for the target capture sequencing. We employed various mappers and variant callers to identify variants. After the identification of variants, we focused on the variants identified via all variant callers to assess the applicability of the myBaits sequencing methodology in oat breeding. In our efforts to validate the identified variants, we focused on two SNPs, one deletion and one insertion identified via all variant callers in the genotypes KF-318 and NOS 819111-70 but absent in the remaining eight genotypes. The Sanger sequencing of targeted SNPs failed to reproduce target capture data obtained through the myBaits technology. Similarly, the validation of deletion and insertion variants via high-resolution melting (HRM) curve analysis also failed to reproduce target capture data, again suggesting limitations in the reliability of the myBaits target capture sequencing using short-read sequencing for variant detection in the oat genome. This study sheds light on the importance of exercising caution when employing the myBaits target capture strategy for variant detection in oats. It provides valuable insights for breeders seeking to advance oat breeding efforts and marker development using myBaits target capture sequencing, emphasizing the significance of methodological sequencing considerations in oat genomics research.
Affiliation(s)
- Khalid Mahmood
- Nordic Seed, Grindsnabevej 25, 8300 Odder, Denmark
- Pernille Sarup
- Nordic Seed, Grindsnabevej 25, 8300 Odder, Denmark
- Lukas Oertelt
- Nordic Seed Germany, Kirchhorster Str. 16, 31688 Nienstädt, Germany
- Ahmed Jahoor
- Nordic Seed, Grindsnabevej 25, 8300 Odder, Denmark
- Nordic Seed Germany, Kirchhorster Str. 16, 31688 Nienstädt, Germany
- Jihad Orabi
- Nordic Seed, Grindsnabevej 25, 8300 Odder, Denmark
9
Rashid M, Rashid R, Gadewal N, Carethers JM, Koi M, Brim H, Ashktorab H. High-throughput sequencing and in-silico analysis confirm pathogenicity of novel MSH3 variants in African American colorectal cancer. Neoplasia 2024; 49:100970. [PMID: 38281411 PMCID: PMC10840101 DOI: 10.1016/j.neo.2024.100970] [Received: 11/05/2023] [Revised: 12/19/2023] [Accepted: 01/08/2024] [Indexed: 01/30/2024]
Abstract
The maintenance of DNA sequence integrity is critical to avoid accumulation of cancer-causing mutations. Inactivation of DNA Mismatch Repair (MMR) genes (e.g., MLH1 and MSH2) is common among many cancers, including colorectal cancer (CRC), and is the driver of classic microsatellite instability (MSI) in tumors. Somatic MSH3 alterations have been linked to a specific form of MSI called elevated microsatellite alterations at selected tetranucleotide repeats (EMAST) that is associated with poor patient prognosis and elevated among African American (AA) rectal cancer patients. Genetic variants of MSH3 and their pathogenicity vary among different populations, such as among AA, which are not well-represented in publicly available databases. Targeted exome sequencing of MSH3 among AA CRC samples, followed by a computational bioinformatic pipeline and molecular dynamics simulation analysis, confirmed six identified MSH3 variants (c.G1237A, c.C2759T, c.G1397A, c.G2926A, c.C3028T, c.G3241A) that corresponded to MSH3 amino-acid changes (p.E413K; p.S466N; p.S920F; p.E976K; p.H1010Y; p.E1081K). All identified MSH3 variants were non-synonymous, novel, and pathogenic, showed loss or gain of hydrogen bonding, ionic bonding, hydrophobic bonding, and disulfide bonding, and had a deleterious effect on the structure of MSH3 protein. Some variants were located within the ATPase site of MSH3, affecting ATP hydrolysis that is critical for MSH3's function. Other variants were in the MSH3-MSH2 interacting domain, important for MSH3's binding to MSH2. Overall, our data suggest that these variants among AA CRC patients affect the function of MSH3, making them pathogenic and likely contributing to the development or advancement of CRC among AA. Further clarifying functional studies will be necessary to fully understand the impact of these variants on MSH3 function and CRC development in AA patients.
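The cDNA and protein changes in this abstract are listed in different orders, so a small worked example may help readers match them. The Python sketch below pairs each coding-DNA label with the protein label whose residue number equals the codon containing that nucleotide. It assumes the `c.` positions count from the first base of the start codon, so that codon number = ceil(position / 3); the function names and regular expressions are hypothetical helpers, not part of the authors' pipeline.

```python
import re
from typing import Dict, List, Optional

# Patterns for the simple substitution labels used in the abstract.
CDNA_PATTERN = re.compile(r"c\.([ACGT])(\d+)([ACGT])")
PROTEIN_PATTERN = re.compile(r"p\.([A-Z])(\d+)([A-Z])")


def codon_number(cdna_pos: int) -> int:
    """1-based codon index containing a 1-based coding position: ceil(pos / 3)."""
    return (cdna_pos + 2) // 3


def pair_by_codon(cdna_labels: List[str], protein_labels: List[str]) -> Dict[str, Optional[str]]:
    """Map each cDNA label to the protein label at the matching residue, if any."""
    residue_to_label: Dict[int, str] = {}
    for label in protein_labels:
        match = PROTEIN_PATTERN.fullmatch(label)
        if match is None:
            raise ValueError(f"unrecognized protein label: {label}")
        residue_to_label[int(match.group(2))] = label

    pairs: Dict[str, Optional[str]] = {}
    for label in cdna_labels:
        match = CDNA_PATTERN.fullmatch(label)
        if match is None:
            raise ValueError(f"unrecognized cDNA label: {label}")
        pairs[label] = residue_to_label.get(codon_number(int(match.group(2))))
    return pairs


# Labels exactly as listed in the abstract.
cdna = ["c.G1237A", "c.C2759T", "c.G1397A", "c.G2926A", "c.C3028T", "c.G3241A"]
protein = ["p.E413K", "p.S466N", "p.S920F", "p.E976K", "p.H1010Y", "p.E1081K"]

for cdna_label, protein_label in pair_by_codon(cdna, protein).items():
    print(cdna_label, "->", protein_label)
```

Under that numbering assumption, every nucleotide position lands in the codon of exactly one listed residue (for example, position 2759 falls in codon 920), so the six labels pair up one-to-one even though the two lists are ordered differently.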
Affiliation(s)
- Mudasir Rashid
- Department of Medicine, Gastroenterology Division, Department of Pathology and Cancer Center, Howard University College of Medicine, Washington, DC 20059, USA
- Rumaisa Rashid
- Department of Medicine, Gastroenterology Division, Department of Pathology and Cancer Center, Howard University College of Medicine, Washington, DC 20059, USA
- Nikhil Gadewal
- Bioinformatics and Computational Biology Facility, Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, MH 410210, India
- John M Carethers
- Division of Gastroenterology and Hepatology, Department of Medicine, UC San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA; Moores Cancer Center, and Herbert Wertheim School of Public Health and Human Longevity Science, UC San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
- Minoru Koi
- Division of Gastroenterology and Hepatology, Department of Medicine, UC San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
- Hassan Brim
- Department of Medicine, Gastroenterology Division, Department of Pathology and Cancer Center, Howard University College of Medicine, Washington, DC 20059, USA
- Hassan Ashktorab
- Department of Medicine, Gastroenterology Division, Department of Pathology and Cancer Center, Howard University College of Medicine, Washington, DC 20059, USA.
10
Wang J, Nakato R. Churros: a Docker-based pipeline for large-scale epigenomic analysis. DNA Res 2024; 31:dsad026. [PMID: 38102723 PMCID: PMC11389749 DOI: 10.1093/dnares/dsad026] [Received: 09/28/2023] [Revised: 11/23/2023] [Accepted: 12/13/2023] [Indexed: 12/17/2023]
Abstract
The epigenome, which reflects the modifications on chromatin or DNA sequences, provides crucial insight into gene expression regulation and cellular activity. With the continuous accumulation of epigenomic datasets such as chromatin immunoprecipitation followed by sequencing (ChIP-seq) data, there is a great demand for a streamlined pipeline to consistently process them, especially for large-dataset comparisons involving hundreds of samples. Here, we present Churros, an end-to-end epigenomic analysis pipeline that is environmentally independent and optimized for handling large-scale data. We successfully demonstrated the effectiveness of Churros by analyzing large-scale ChIP-seq datasets with the hg38 or Telomere-to-Telomere (T2T) human reference genome. We found that applying T2T to the typical analysis workflow has important impacts on read mapping, quality checks, and peak calling. We also introduced a useful feature to study context-specific epigenomic landscapes. Churros will contribute a comprehensive and unified resource for analyzing large-scale epigenomic data.
Affiliation(s)
- Jiankang Wang
- School of Biomedical Sciences, Hunan University, Changsha, Hunan, China
- Institute for Quantitative Biosciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Ryuichiro Nakato
- Institute for Quantitative Biosciences, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
11
Kattoor JJ, Guag J, Nemser SM, Wilkes RP. Development of ion torrent-based targeted next-generation sequencing panel for identification of animal species in pet foods. Res Vet Sci 2024; 167:105117. [PMID: 38160490 DOI: 10.1016/j.rvsc.2023.105117] [Received: 09/15/2023] [Revised: 11/29/2023] [Accepted: 12/18/2023] [Indexed: 01/03/2024]
Abstract
Manufacturers may intentionally or unintentionally incorporate ingredients not specified on the label of canned pet foods. Including any unacknowledged ingredients in a food product is considered food fraud or misbranding. Contamination of pet foods may occur in the processing of the foods, including potential cross-contamination in packaging facilities. Methods available to identify meat species in food products include Sanger sequencing and several next-generation sequencing methods, but they have limitations, including the number of targets analyzed at a time and the method specificity. In this study, we developed a targeted next-generation sequencing panel to detect meat species in canned pet foods using Ion Torrent technology. The panel contains multiple primers targeting mitochondrial genes from as many as 27 animal species, of which 7 major animal species were validated. The meat species targets could be identified from samples spiked with as low as 0.01% w/w of the contaminating meat species in a vegetarian food matrix material. Targeted NGS in the current study enriches multiple species-specific target areas in the mitochondrial genome of the target material, which gives high accuracy in the sequencing results.
Affiliation(s)
- J J Kattoor
- Animal Disease Diagnostic Laboratory, Purdue University, West Lafayette, IN, USA
| | - J Guag
- Center for Veterinary Medicine, Vet-LIRN, Food and Drug Administration, Laurel, MD, USA
| | - S M Nemser
- Center for Veterinary Medicine, Vet-LIRN, Food and Drug Administration, Laurel, MD, USA
| | - R P Wilkes
- Animal Disease Diagnostic Laboratory, Purdue University, West Lafayette, IN, USA.
| |
|
12
|
Cui H, Srinivasan S, Gao Z, Korkin D. The Extent of Edgetic Perturbations in the Human Interactome Caused by Population-Specific Mutations. Biomolecules 2023; 14:40. [PMID: 38254640 PMCID: PMC11154503 DOI: 10.3390/biom14010040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 11/30/2023] [Accepted: 12/03/2023] [Indexed: 01/24/2024] Open
Abstract
Until recently, efforts in population genetics have been focused primarily on people of European ancestry. To attenuate this bias, global population studies, such as the 1000 Genomes Project, have revealed differences in genetic variation across ethnic groups. How many of these differences can be attributed to population-specific traits? To answer this question, the mutation data must be linked with functional outcomes. A new "edgotype" concept has been proposed, which emphasizes the interaction-specific, "edgetic", perturbations caused by mutations in the interacting proteins. In this work, we performed systematic in silico edgetic profiling of ~50,000 non-synonymous SNVs (nsSNVs) from the 1000 Genomes Project by leveraging our semi-supervised learning approach SNP-IN tool on a comprehensive set of over 10,000 protein interaction complexes. We interrogated the functional roles of the variants and their impact on the human interactome and compared the results with the pathogenic variants disrupting PPIs in the same interactome. Our results demonstrated that a considerable number of nsSNVs from healthy populations could rewire the interactome. We also showed that the proteins enriched with interaction-disrupting mutations were associated with diverse functions and had implications in a broad spectrum of diseases. Further analysis indicated that distinct gene edgetic profiles among major populations could shed light on the molecular mechanisms behind the population phenotypic variances. Finally, the network analysis revealed that the disease-associated modules surprisingly harbored a higher density of interaction-disrupting mutations from healthy populations. The variation in the cumulative network damage within these modules could potentially account for the observed disparities in disease susceptibility, which are distinctly specific to certain populations. 
Our work demonstrates the feasibility of a large-scale in silico edgetic study, and reveals insights into the orchestrated play of population-specific mutations in the human interactome.
Affiliation(s)
- Hongzhu Cui
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA;
- Chromatography and Mass Spectrometry Division, Thermo Fisher Scientific, San Jose, CA 95134, USA
| | - Suhas Srinivasan
- Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA;
- Program in Epithelial Biology, Stanford School of Medicine, Stanford, CA 94305, USA
- Center for Personal Dynamic Regulomes, Stanford School of Medicine, Stanford, CA 94305, USA
| | - Ziyang Gao
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA;
| | - Dmitry Korkin
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA;
- Data Science Program, Worcester Polytechnic Institute, Worcester, MA 01609, USA;
- Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA
| |
|
13
|
Majidian S, Agustinho DP, Chin CS, Sedlazeck FJ, Mahmoud M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol 2023; 24:221. [PMID: 37798733 PMCID: PMC10552390 DOI: 10.1186/s13059-023-03061-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/28/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023] Open
Abstract
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and laborious manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
Affiliation(s)
- Sina Majidian
- Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | | | | | - Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
| | - Medhat Mahmoud
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
| |
|
14
|
Budiš J, Krampl W, Kucharík M, Hekel R, Goga A, Sitarčík J, Lichvár M, Smol’ak D, Böhmer M, Baláž A, Ďuriš F, Gazdarica J, Šoltys K, Turňa J, Radvánszky J, Szemes T. SnakeLines: integrated set of computational pipelines for sequencing reads. J Integr Bioinform 2023; 20:jib-2022-0059. [PMID: 37602733 PMCID: PMC10757078 DOI: 10.1515/jib-2022-0059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Accepted: 03/21/2023] [Indexed: 08/22/2023] Open
Abstract
With the rapid growth of massively parallel sequencing technologies, more and more laboratories are utilising sequenced DNA fragments for genomic analyses. Interpretation of sequencing data is, however, strongly dependent on bioinformatics processing, which is often too demanding for clinicians and researchers without a computational background. Another problem is the reproducibility of computational analyses across separate computational centres with inconsistent versions of installed libraries and bioinformatics tools. We propose an easily extensible set of computational pipelines, called SnakeLines, for processing sequencing reads, including mapping, assembly, variant calling, viral identification, transcriptomics, and metagenomics analysis. Individual steps of an analysis, along with methods and their parameters, can be readily modified in a single configuration file. The provided pipelines are embedded in virtual environments that ensure isolation of required resources from the host operating system, rapid deployment, and reproducibility of analysis across different Unix-based platforms. SnakeLines is a powerful framework for the automation of bioinformatics analyses, with emphasis on simple set-up, modification, extensibility, and reproducibility. The framework is already routinely used in various research projects and their applications, especially in the Slovak national surveillance of SARS-CoV-2.
Affiliation(s)
- Jaroslav Budiš
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
| | - Werner Krampl
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| | - Marcel Kucharík
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
| | - Rastislav Hekel
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| | - Adrián Goga
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University, 841 04 Bratislava, Slovakia
| | - Jozef Sitarčík
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
| | - Michal Lichvár
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
| | - Dávid Smol’ak
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| | - Miroslav Böhmer
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| | - Andrej Baláž
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University, 841 04 Bratislava, Slovakia
| | - František Ďuriš
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
| | - Juraj Gazdarica
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
| | - Katarína Šoltys
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| | - Ján Turňa
- Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| | - Ján Radvánszky
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Institute of Clinical and Translational Research, Biomedical Research Center, Slovak Academy of Sciences, 845 05 Bratislava, Slovakia
| | - Tomáš Szemes
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| |
|
15
|
Chen YJ, Wang MW, Qiu YS, Yuan RY, Wang N, Lin X, Chen WJ. Alu Retrotransposition Event in SPAST Gene as a Novel Cause of Hereditary Spastic Paraplegia. Mov Disord 2023; 38:1750-1755. [PMID: 37394769 DOI: 10.1002/mds.29522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 06/05/2023] [Accepted: 06/12/2023] [Indexed: 07/04/2023] Open
Abstract
OBJECTIVES: To diagnose the molecular cause of hereditary spastic paraplegia (HSP) observed in a four-generation family with autosomal dominant inheritance. METHODS: Multiplex ligation-dependent probe amplification (MLPA), whole-exome sequencing (WES), and RNA sequencing (RNA-seq) of peripheral blood leukocytes were performed. Reverse transcription polymerase chain reaction (RT-PCR) and Sanger sequencing were used to characterize target regions of SPAST. RESULTS: A 121-bp AluYb9 insertion with a 30-bp poly-A tail, flanked by 15-bp direct repeats on both sides, was identified at the edge of intron 16 in SPAST and segregated with the disease phenotype. CONCLUSIONS: We identified an intronic AluYb9 insertion that alters splicing in SPAST and causes a pure HSP phenotype; it was not detected by routine WES analysis. Our findings suggest that RNA-seq is worth implementing for cases left undiagnosed by first-line diagnostic approaches. © 2023 International Parkinson and Movement Disorder Society.
Affiliation(s)
- Yi-Jun Chen
- Department of Neurology and Institute of Neurology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Department of Geriatrics, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Department of Geriatrics, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, China
| | - Meng-Wen Wang
- Department of Neurology and Institute of Neurology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
| | - Yu-Sen Qiu
- Department of Neurology and Institute of Neurology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
| | - Ru-Ying Yuan
- Department of Neurology and Institute of Neurology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
| | - Ning Wang
- Department of Neurology and Institute of Neurology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, China
| | - Xiang Lin
- Department of Neurology and Institute of Neurology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, China
| | - Wan-Jin Chen
- Department of Neurology and Institute of Neurology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Fujian Key Laboratory of Molecular Neurology, Fujian Medical University, Fuzhou, China
| |
|
16
|
Kosugi S, Kamatani Y, Harada K, Tomizuka K, Momozawa Y, Morisaki T, Terao C. Detection of trait-associated structural variations using short-read sequencing. Cell Genomics 2023; 3:100328. [PMID: 37388916 PMCID: PMC10300613 DOI: 10.1016/j.xgen.2023.100328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Revised: 02/17/2023] [Accepted: 04/25/2023] [Indexed: 07/01/2023]
Abstract
Genomic structural variation (SV) affects genetic and phenotypic characteristics in diverse organisms, but the lack of reliable methods to detect SV has hindered genetic analysis. We developed a computational algorithm (MOPline) that includes missing call recovery combined with high-confidence SV call selection and genotyping using short-read whole-genome sequencing (WGS) data. Using 3,672 high-coverage WGS datasets, MOPline stably detected ∼16,000 SVs per individual, which is ∼1.7-3.3-fold higher than in previous large-scale projects while exhibiting a comparable level of statistical quality metrics. We imputed SVs from 181,622 Japanese individuals for 42 diseases and 60 quantitative traits. A genome-wide association study with the imputed SVs revealed 41 top-ranked or nearly top-ranked genome-wide significant SVs, including 8 exonic SVs with 5 novel associations and enriched mobile element insertions. This study demonstrates that short-read WGS data can be used to identify rare and common SVs associated with a variety of traits.
Affiliation(s)
- Shunichi Kosugi
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
| | - Yoichiro Kamatani
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba 277-8562, Japan
| | - Katsutoshi Harada
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Kohei Tomizuka
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Yukihide Momozawa
- Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan
| | - Takayuki Morisaki
- Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, 4-6-1, Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan
| | | | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
| |
|
17
|
Bu M, Xu M, Tao S, Cui P, He B. Evaluation of Different SNP Analysis Software and Optimal Mining Process in Tree Species. Life (Basel) 2023; 13:life13051069. [PMID: 37240714 DOI: 10.3390/life13051069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 03/24/2023] [Accepted: 04/11/2023] [Indexed: 05/28/2023] Open
Abstract
Single nucleotide polymorphisms (SNPs) are among the most widely used molecular markers for understanding the relationship between phenotypes and genotypes. SNP calling mainly consists of two steps, read alignment and locus identification based on statistical models, and various software tools have been developed and applied for this purpose. In our study, however, very low agreement (<25%) was found among the prediction results generated by different software, which was much less consistent than expected. To obtain the optimal protocol for SNP mining in tree species, the algorithmic principles of different alignment and SNP mining software were discussed in detail, and the prediction results were further validated using in silico and experimental methods. In addition, hundreds of validated SNPs were provided along with some practical suggestions on program selection and accuracy improvement, and we hope that these results can lay the foundation for subsequent SNP mining analyses.
Affiliation(s)
- Mengjia Bu
- Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Area, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng 475004, China
- Shenzhen Research Institute of Henan University, Shenzhen 518000, China
| | - Mengxuan Xu
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
| | - Shentong Tao
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
| | - Peng Cui
- Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Area, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Bing He
- Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Area, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| |
|
18
|
Gudkov M, Thibaut L, Khushi M, Blue GM, Winlaw DS, Dunwoodie SL, Giannoulatou E. ConanVarvar: a versatile tool for the detection of large syndromic copy number variation from whole-genome sequencing data. BMC Bioinformatics 2023; 24:49. [PMID: 36792982 PMCID: PMC9930243 DOI: 10.1186/s12859-023-05154-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 01/19/2023] [Indexed: 02/17/2023] Open
Abstract
BACKGROUND: A wide range of tools are available for the detection of copy number variants (CNVs) from whole-genome sequencing (WGS) data. However, none of them focus on clinically relevant CNVs, such as those that are associated with known genetic syndromes. Such variants are often large in size, typically 1-5 Mb, but currently available CNV callers have been developed and benchmarked for the discovery of smaller variants. Thus, the ability of these programs to detect tens of real syndromic CNVs remains largely unknown. RESULTS: Here we present ConanVarvar, a tool which implements a complete workflow for the targeted analysis of large germline CNVs from WGS data. ConanVarvar comes with an intuitive R Shiny graphical user interface and annotates identified variants with information about 56 associated syndromic conditions. We benchmarked ConanVarvar and four other programs on a dataset containing real and simulated syndromic CNVs larger than 1 Mb. In comparison to other tools, ConanVarvar reports 10-30 times fewer false-positive variants without compromising sensitivity and is quicker to run, especially on large batches of samples. CONCLUSIONS: ConanVarvar is a useful instrument for primary analysis in disease sequencing studies, where large CNVs could be the cause of disease.
Affiliation(s)
- Mikhail Gudkov
- Victor Chang Cardiac Research Institute, Sydney, NSW 2010 Australia
- School of Biomedical Engineering, The University of Sydney, Sydney, NSW 2006 Australia
- St Vincent’s Clinical Campus, School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Sydney, NSW 2010 Australia
| | - Loïc Thibaut
- Victor Chang Cardiac Research Institute, Sydney, NSW 2010 Australia
- School of Mathematics and Statistics, UNSW Sydney, Sydney, NSW 2052 Australia
| | - Matloob Khushi
- School of Computer Science, The University of Sydney, Sydney, NSW 2006 Australia
| | - Gillian M. Blue
- Sydney Medical School, The University of Sydney, Sydney, NSW 2006 Australia
- Heart Centre for Children, The Children’s Hospital at Westmead, Sydney, NSW 2145 Australia
| | - David S. Winlaw
- Sydney Medical School, The University of Sydney, Sydney, NSW 2006 Australia
- Heart Centre for Children, The Children’s Hospital at Westmead, Sydney, NSW 2145 Australia
| | - Sally L. Dunwoodie
- Victor Chang Cardiac Research Institute, Sydney, NSW 2010 Australia
- St Vincent’s Clinical Campus, School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Sydney, NSW 2010 Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW 2052 Australia
| | - Eleni Giannoulatou
- Victor Chang Cardiac Research Institute, Sydney, NSW, 2010, Australia
- St Vincent's Clinical Campus, School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Sydney, NSW, 2010, Australia
| |
|
19
|
Callea M, Bellacchio E, Cammarata Scalisi F, El Feghaly J, El-Ghandour RK, Avendaño A, Yavuz Y, Diociaiuti A, Digilio MC, DI Stazio M, Novelli A, Oranges T, Filippeschi C, Pisaneschi E, Jilani H, Gigola F, Willoughby CE, Morabito A. Next generation sequencing panel target genes: possible diagnostic tool for ectodermal dysplasia related diseases. Ital J Dermatol Venerol 2023; 158:32-38. [PMID: 36939501 DOI: 10.23736/s2784-8671.23.07540-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2023]
Abstract
BACKGROUND: Ectodermal dysplasias (EDs) are a large and complex group of disorders affecting the ectoderm-derived organs; the clinical and genetic heterogeneity of these conditions makes an accurate diagnosis more challenging. The aim of this study is to demonstrate the clinical utility of a targeted resequencing panel in enhancing the molecular and clinical diagnosis of EDs. Given the recent developments in gene and protein-based therapies for X-linked hypohidrotic ectodermal dysplasia, there is a re-emerging interest in identifying the genetic basis of EDs and the respective phenotypic presentations, with the aim of facilitating potential treatments for affected families. METHODS: We assessed seventeen individuals, from three unrelated families, who presented with diverse phenotypes suggestive of ED. An extensive multidisciplinary clinical evaluation was performed, followed by a targeted exome resequencing panel (including genes that are known to cause EDs). MiSeq™ data software was used; variants with Q-score >30 were accepted. RESULTS: Three different previously reported hemizygous EDA mutations were found in the families. However, a complete genotype-phenotype correlation could not be established in either our patients or the previously reported patients. CONCLUSIONS: Targeted exome resequencing can provide a rapid and accurate diagnosis of EDs, while further contributing to the existing ED genetic data. Moreover, the identification of the disease-causing mutation in an affected family is crucial for proper genetic counseling and the establishment of a genotype-phenotype correlation, which will subsequently provide the affected individuals with a more suitable treatment plan.
Affiliation(s)
- Michele Callea
- Unit of Pediatric Dentistry and Special Dental Care, Meyer Children's Hospital IRCCS, Florence, Italy
| | | | | | - Jinia El Feghaly
- Department of Pediatric Dermatology, University of Rochester, Rochester, MN, USA
| | - Rabab K El-Ghandour
- Department of Pediatric Dentistry, Faculty of Dentistry, Pharos University, Alexandria, Egypt
| | - Andrea Avendaño
- Unit of Genetic Medicine, Department of Childcare Pediatrics, University of Los Andes, Mérida, Venezuela
| | - Yasemine Yavuz
- Department of Restorative Dentistry, Faculty of Dentistry, Harran University, Sanliurfa, Türkiye
| | - Andrea Diociaiuti
- Division of Dermatology, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | - Maria C Digilio
- Division of Genetics and Rare Diseases Research, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | | | - Antonio Novelli
- Division of Genetics and Rare Diseases Research, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | | | | | - Elisa Pisaneschi
- Division of Genetics and Rare Diseases Research, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | - Houweyda Jilani
- Department of Genetics, Mongi Slim Hospital, Marsa, Tunisia
- Faculty of Medicine of Tunis, University of Tunis El Manar, Tunis, Tunisia
| | - Francesca Gigola
- Department of Neurofarba, University of Florence, Florence, Italy
- Department of Pediatric Surgery, Meyer Children's Hospital IRCCS, Florence, Italy
| | | | - Antonino Morabito
- Department of Neurofarba, University of Florence, Florence, Italy
- Department of Pediatric Surgery, Meyer Children's Hospital IRCCS, Florence, Italy
| |
|
20
|
Yu C, Qi X, Yan W, Wu W, Shen B. Next-Generation Sequencing Markup Language (NGSML): A Medium for the Representation and Exchange of NGS Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:576-585. [PMID: 35085089 DOI: 10.1109/tcbb.2022.3144170] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
With the increasing demand for low-cost high-throughput sequencing of large genomes, next-generation sequencing (NGS) technology has developed rapidly. NGS can not only be used in basic scientific research but also in clinical diagnostics and healthcare. Numerous software systems and tools have been developed to analyze NGS data, and various data formats have been produced to accommodate different sequencing equipment providers or analytical software. However, the data interoperability between these tools brings great challenges to researchers. A generic format that could be shared by most of the software and tools in the NGS field would make data interoperability and sharing easier. In this paper, we defined a general XML-based NGS markup language (NGSML) format for the representation and exchange of NGS data. We also developed a user-friendly GUI tool, NGSMLEditor, for presenting, creating, editing, and converting NGSML files. By using NGSML, various types of NGS data can be saved in one unified format. Compared with the unstructured plain text file, a structured data format based on XML technology solves the incompatibility of various NGS data formats. The NGSML specifications are freely available from http://www.sysbio.org.cn/NGSML. NGSMLEditor is open source under GNU GPL and can be downloaded from the website.
|
21
|
Wadapurkar RM, Sivaram A, Vyas R. Computational studies reveal co-occurrence of two mutations in IL7R gene of high-grade serous carcinoma patients. J Biomol Struct Dyn 2022; 40:13310-13324. [PMID: 34657565 DOI: 10.1080/07391102.2021.1987326] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/27/2022]
Abstract
A major cause of mortality in ovarian cancer can be attributed to a lack of specific and sensitive biomarkers for diagnosis and prognosis of the disease. Uncovering the mutations in genes involved in crucial oncogenic pathways is a key step in the discovery and development of novel biomarkers. Whole exome sequencing (WES) is a powerful method for the detection of cancer driver mutations. The present work focuses on identifying functionally damaging mutations in patients with high-grade serous ovarian carcinoma (HGSC) through computational analysis of WES. In this study, WES data of HGSC patients were retrieved from the genomic literature available in the Sequence Read Archive, the variants were identified, and comprehensive structural and functional analysis was performed. Interestingly, I66T and V138I mutations were found to be co-occurring in the IL7R gene in four out of five HGSC patient samples investigated in this study. The V138I mutation was located in the fibronectin type-3 domain and computationally assessed to be causing disruptive effects on the structure and dynamics of the IL7R protein. This mutation was found to be co-occurring with the neutral I66T mutation in the same domain, which compensated for the disruptive effects of the V138I variant. These comprehensive studies point to a hitherto unexplored significant role of the IL7R gene in ovarian carcinoma. It is envisaged that the work will lay the foundation for the development of a novel biomarker with potential application in molecular profiling and in estimation of the disease prognosis. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Rucha M Wadapurkar
  - MIT School of Bioengineering Sciences & Research, MIT-ADT University, Pune, Maharashtra, India
- Aruna Sivaram
  - MIT School of Bioengineering Sciences & Research, MIT-ADT University, Pune, Maharashtra, India
- Renu Vyas
  - MIT School of Bioengineering Sciences & Research, MIT-ADT University, Pune, Maharashtra, India
|
22
|
Coutinho MG, Câmara GB, Barbosa RDM, Fernandes MA. SARS-CoV-2 virus classification based on stacked sparse autoencoder. Comput Struct Biotechnol J 2022; 21:284-298. [PMID: 36530948 PMCID: PMC9742810 DOI: 10.1016/j.csbj.2022.12.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 12/04/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022] Open
Abstract
Since December 2019, the world has been intensely affected by the COVID-19 pandemic, caused by the SARS-CoV-2. In the case of a novel virus identification, the early elucidation of taxonomic classification and origin of the virus genomic sequence is essential for strategic planning, containment, and treatments. Deep learning techniques have been successfully used in many viral classification problems associated with viral infection diagnosis, metagenomics, phylogenetics, and analysis. Considering that motivation, the authors proposed an efficient viral genome classifier for the SARS-CoV-2 using the deep neural network based on the stacked sparse autoencoder (SSAE). For the best performance of the model, we explored the utilization of image representations of the complete genome sequences as the SSAE input to provide a classification of the SARS-CoV-2. For that, a dataset based on k-mers image representation was applied. We performed four experiments to provide different levels of taxonomic classification of the SARS-CoV-2. The SSAE technique provided great performance results in all experiments, achieving classification accuracy between 92% and 100% for the validation set and between 98.9% and 100% when the SARS-CoV-2 samples were applied for the test set. In this work, samples of the SARS-CoV-2 were not used during the training process, only during subsequent tests, in which the model was able to infer the correct classification of the samples in the vast majority of cases. This indicates that our model can be adapted to classify other emerging viruses. Finally, the results indicated the applicability of this deep learning technique in genome classification problems.
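The abstract does not spell out how the k-mer image representation is built. As a minimal sketch of one common construction, and not necessarily the authors' encoding, the Python function below counts overlapping k-mers and lays their normalized frequencies out on a 2^k x 2^k grid. The function name `kmer_count_image`, the lexicographic cell ordering, and the decision to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product


def kmer_count_image(sequence, k=3):
    """Return a 2**k x 2**k grid of normalized k-mer frequencies.

    Hypothetical helper for illustration only: cells are filled in
    lexicographic k-mer order (AAA, AAC, ...), which is an assumption
    and not a layout taken from the paper.
    """
    alphabet = "ACGT"
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)

    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows with ambiguous bases such as N
            counts[index[kmer]] += 1

    side = 2 ** k  # 4**k cells fit exactly into a 2**k x 2**k square
    total = sum(counts) or 1  # avoid division by zero on short inputs
    return [
        [counts[row * side + col] / total for col in range(side)]
        for row in range(side)
    ]


image = kmer_count_image("ACGTACGT", k=2)
```

A grid like this can be passed to an image-style classifier as a single-channel input; larger k gives a finer but sparser image.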
Affiliation(s)
- Maria G.F. Coutinho
  - Laboratory of Machine Learning and Intelligent Instrumentation, IMD/nPITI, Federal University of Rio Grande do Norte, Natal, Brazil
- Gabriel B.M. Câmara
  - Laboratory of Machine Learning and Intelligent Instrumentation, IMD/nPITI, Federal University of Rio Grande do Norte, Natal, Brazil
- Raquel de M. Barbosa
  - Department of Pharmacy and Pharmaceutical Technology, University of Granada, 18071 Granada, Spain
- Marcelo A.C. Fernandes
  - Laboratory of Machine Learning and Intelligent Instrumentation, IMD/nPITI, Federal University of Rio Grande do Norte, Natal, Brazil
  - Department of Computer and Automation Engineering, Federal University of Rio Grande do Norte, Natal, Brazil
|
23
|
Exploration of Tools for the Interpretation of Human Non-Coding Variants. Int J Mol Sci 2022; 23:ijms232112977. [PMID: 36361767 PMCID: PMC9654743 DOI: 10.3390/ijms232112977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 10/17/2022] [Accepted: 10/23/2022] [Indexed: 02/01/2023] Open
Abstract
The advent of Whole Genome Sequencing (WGS) broadened the genetic variation detection range, revealing the presence of variants even in non-coding regions of the genome, which would have been missed using targeted approaches. One of the most challenging issues in WGS analysis regards the interpretation of annotated variants. This review focuses on tools suitable for the functional annotation of variants falling into non-coding regions. It couples the description of non-coding genomic areas with the results and performance of existing tools for a functional interpretation of the effect of variants in these regions. Tools were tested in a controlled genomic scenario, representing the ground-truth and allowing us to determine software performance.
|
24
|
Talenti A, Powell J, Wragg D, Chepkwony M, Fisch A, Ferreira BR, Mercadante MEZ, Santos IM, Ezeasor CK, Obishakin ET, Muhanguzi D, Amanyire W, Silwamba I, Muma JB, Mainda G, Kelly RF, Toye P, Connelley T, Prendergast J. Optical mapping compendium of structural variants across global cattle breeds. Sci Data 2022; 9:618. [PMID: 36229544 PMCID: PMC9561109 DOI: 10.1038/s41597-022-01684-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 09/04/2022] [Indexed: 11/30/2022] Open
Abstract
Structural variants (SV) have been linked to important bovine disease phenotypes, but due to the difficulty of their accurate detection with standard sequencing approaches, their role in shaping important traits across cattle breeds is largely unexplored. Optical mapping is an alternative approach for mapping SVs that has been shown to have higher sensitivity than DNA sequencing approaches. The aim of this project was to use optical mapping to develop a high-quality database of structural variation across cattle breeds from different geographical regions, to enable further study of SVs in cattle. To do this we generated 100X Bionano optical mapping data for 18 cattle of nine different ancestries, three continents and both cattle sub-species. In total we identified 13,457 SVs, of which 1,200 putatively overlap coding regions. This resource provides a high-quality set of optical mapping-based SV calls that can be used across studies, from validating DNA sequencing-based SV calls to prioritising candidate functional variants in genetic association studies and expanding our understanding of the role of SVs in cattle evolution.
Measurement(s): Optical Mapping
Technology Type(s): Optical Mapping
Factor Type(s): Structural variants
Sample Characteristic - Organism: Bos taurus
Sample Characteristic - Location: United Kingdom, Kenya, Zambia, Uganda, Brazil, Nigeria
Affiliation(s)
- A Talenti
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, United Kingdom
- J Powell
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, United Kingdom
- D Wragg
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, United Kingdom
  - Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, UK
- M Chepkwony
  - The International Livestock Research Institute, PO Box 30709, Nairobi, Kenya
  - Centre for Tropical Livestock Genetics and Health, ILRI Kenya, Nairobi, 30709-00100, Kenya
- A Fisch
  - Ribeirão Preto College of Nursing, University of Sao Paulo, Ribeirão Preto, SP, Brazil
- B R Ferreira
  - Ribeirão Preto College of Nursing, University of Sao Paulo, Ribeirão Preto, SP, Brazil
- M E Z Mercadante
  - Institute of Animal Science, Agriculture Department of São Paulo Government, Sertãozinho, SP, 14.174-000, Brazil
- I M Santos
  - Ribeirão Preto School of Medicine, University of São Paulo, Ribeirão Preto, SP, 14049-900, Brazil
- C K Ezeasor
  - Department of Veterinary Pathology and Microbiology, University of Nigeria, Nsukka, Enugu State, Nigeria
- E T Obishakin
  - Biotechnology Division, National Veterinary Research Institute, Vom, Plateau State, Nigeria
  - Biomedical Research Centre, Ghent University Global Campus, Songdo, Incheon, South Korea
- D Muhanguzi
  - School of Biosecurity, Biotechnology and Laboratory Sciences (SBLS), College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, P.O Box 7062, Kampala, Uganda
- W Amanyire
  - School of Biosecurity, Biotechnology and Laboratory Sciences (SBLS), College of Veterinary Medicine, Animal Resources and Biosecurity, Makerere University, P.O Box 7062, Kampala, Uganda
- I Silwamba
  - Department of Disease Control, School of Veterinary Medicine, University of Zambia, P.O BOX 32379, Lusaka, Zambia
  - Department of Laboratory and Diagnostics, Livestock Services Cooperative Society, P.O. BOX 32025, Lusaka, Zambia
- J B Muma
  - Department of Disease Control, School of Veterinary Medicine, University of Zambia, P.O BOX 32379, Lusaka, Zambia
- G Mainda
  - Department of Veterinary Services, Ministry of Fisheries and Livestock, Central Veterinary Research Institute, P.O. Box 33980, Lusaka, Zambia
- R F Kelly
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, United Kingdom
  - Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, UK
- P Toye
  - The International Livestock Research Institute, PO Box 30709, Nairobi, Kenya
- T Connelley
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, United Kingdom
  - Centre for Tropical Livestock Genetics and Health, Easter Bush, Midlothian, EH25 9RG, UK
- J Prendergast
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, United Kingdom
  - Centre for Tropical Livestock Genetics and Health, Easter Bush, Midlothian, EH25 9RG, UK
|
25
|
Kang W, Tong Y, Zhang W, Jian M, Zhang A, Ren G, Fan H, Yang J. Computational Biology Predicts the Efficacy of Tumor Immune Checkpoint Blockade. BIOMED RESEARCH INTERNATIONAL 2022; 2022:6087751. [PMID: 36212709 PMCID: PMC9534640 DOI: 10.1155/2022/6087751] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/26/2022] [Revised: 08/09/2022] [Accepted: 09/16/2022] [Indexed: 12/02/2022]
Abstract
Tumor immunotherapy has been considered one of the most promising approaches to cancer treatment in recent years. Immune checkpoint blockade (ICB) can activate immune cells to destroy tumors by relieving the inhibition that tumor cells exert on immune cells. In silico prediction of the ICB response is an important step toward achieving effective and personalized cancer immunotherapy. Although immune checkpoint inhibitors have shown exciting clinical effects in the treatment of many types of tumors, clinical problems remain in practical application, such as low response rates and large differences between individuals. How to predict the efficacy of immune checkpoint inhibitors for individual tumor patients based on specific biomarkers and computational models is one of the key issues in tumor immunotherapy. In this work, the biomarkers and in silico calculation methods relevant to the efficacy of tumor immune checkpoint inhibitors are comprehensively summarized at five levels: the genome, transcription, epigenetics, microbial taxonomy, and the immune cell infiltration profile.
Affiliation(s)
- Wenyi Kang
  - Department of Oncology, The First Affiliated Hospital of Yangtze University, Jingzhou, 434000 Hubei, China
- Yao Tong
  - School of Medicine, Wuhan University of Science and Technology, Wuhan, China 430061
- Weijia Zhang
  - Department of Oncology, The First Affiliated Hospital of Yangtze University, Jingzhou, 434000 Hubei, China
- Mengru Jian
  - Department of Oncology, The First Affiliated Hospital of Yangtze University, Jingzhou, 434000 Hubei, China
- Anqi Zhang
  - Department of Oncology, The First Affiliated Hospital of Yangtze University, Jingzhou, 434000 Hubei, China
- Guoqing Ren
  - Department of Laboratory Medicine, Chuzhou Maternal and Child Health Care and Family Planning Service Center, Chuzhou 239000, China
- Hao Fan
  - Huanggang Central Hospital of Yangtze University, Huanggang 43800, China
- Jiyuan Yang
  - Department of Oncology, The First Affiliated Hospital of Yangtze University, Jingzhou, 434000 Hubei, China
|
26
|
Sangeet S, Khan A. Exploratory Data Analysis of Genomic Sequence of Variants of SARS-CoV-2 Reveals Sequence Divergence and Mutational Localization. Bioinform Biol Insights 2022; 16:11779322221126294. [PMID: 36157509 PMCID: PMC9500253 DOI: 10.1177/11779322221126294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 08/28/2022] [Indexed: 11/23/2022] Open
Abstract
Whole genome sequencing has progressed rapidly in recent years, notably with the sequencing of SARS-CoV-2 genomes, making it a more reliable clinical tool for public health surveillance. This development has resulted in the production of a large amount of genomic data used for various types of genomic exploration. However, without a proper standard protocol, using genomic data to analyze biological phenomena such as mutation and evolution carries the risk of relying on an unvalidated data set. This could lead to irregular data being generated, along with a high risk of altered analysis. Thus, the current study lays the foundation for a preprocessing pipeline that uses data analysis to check a genomic data set for accuracy. We have used the recent example of SARS-CoV-2 to demonstrate a process flow that can be utilized for various kinds of biological exploration, such as understanding mutational events, evolutionary divergence, and speciation. Our analysis reveals a significant amount of sequence divergence in the gamma variant as compared with the reference genome, thereby making the variant less infective and deadly. Moreover, we found regions in the genomic sequence that are more prone to mutational localization, thereby altering the structural integrity of the virus and resulting in a more reliable molecular viral mechanism. We believe that the current work will help with an initial check of the genomic data, followed by the biological assessment of the process flow, which will be beneficial for variant analysis and the study of mutational uprising.
Affiliation(s)
- Satyam Sangeet
  - Department of Biological Science and Engineering, Maulana Azad National Institute of Technology, Bhopal, India
- Arshad Khan
  - Department of Biological Science and Engineering, Maulana Azad National Institute of Technology, Bhopal, India
|
27
|
Zhang J, Xu M, Zou X, Chen J. Structural and functional characteristics of soil microbial community in a Pinus massoniana forest at different elevations. PeerJ 2022; 10:e13504. [PMID: 35860041 PMCID: PMC9290995 DOI: 10.7717/peerj.13504] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/17/2022] [Accepted: 05/05/2022] [Indexed: 01/17/2023] Open
Abstract
Shifts in forest soil microbial communities over altitudinal gradients have long been attracting scientific interest. The distribution patterns of different soil microbial communities along altitudinal gradients in subtropical mountain forest ecosystems remain unclear. To better understand the changes in soil microbial communities along an altitude gradient, we used Illumina MiSeq metagenome sequencing technology to survey the soil microbial communities in a Pinus massoniana forest at four elevations (Mp1000, Mp1200, Mp1400, Mp1600) and in a tea garden in Guizhou Leigong Mountain in Southwestern China. We observed that the richness of bacteria, fungi, and viruses in the soil microbial community changed in a unimodal pattern with increasing elevation while that of Archaea first increased significantly, then decreased, and finally increased again. Euryarchaeota and Thaumarchaeota were the predominant Archaea, Proteobacteria and Acidobacteria were the predominant bacterial groups, Ascomycota and Basidiomycota were the predominant fungal groups, and Myoviridae, Podoviridae, and Siphoviridae were the predominant virus groups. Amino acid transport and metabolism, energy production and conversion, signal transduction mechanisms, and DNA replication, restructuring and repair were the predominant categories as per NOG function gene-annotation. Carbohydrate metabolism, global and overview map, amino acid metabolism, and energy metabolism were predominant categories in the KEGG pathways. Glycosyl transferase and glycoside hydrolase were predominant categories among carbohydrate enzyme-functional genes. Cluster, redundancy, and co-occurring network analyses showed obvious differences in the composition, structure, and function of different soil microbial communities along the altitudinal gradient studied. 
Our findings indicate that the different soil microbial communities along the altitudinal gradient have different distribution patterns, which may provide a better understanding of the mechanisms that determine microbial life in a mid-subtropical mountain forest ecosystem.
Affiliation(s)
- Jian Zhang
  - The Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), Guizhou University, Guiyang, China
  - Institute of Fungal Resources, Institute of Edible Fungus, College of Life Sciences, Guizhou University, Guiyang, China
- Ming Xu
  - The Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), Guizhou University, Guiyang, China
  - Institute of Fungal Resources, Institute of Edible Fungus, College of Life Sciences, Guizhou University, Guiyang, China
- Xiao Zou
  - The Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), Guizhou University, Guiyang, China
  - Institute of Fungal Resources, Institute of Edible Fungus, College of Life Sciences, Guizhou University, Guiyang, China
- Jin Chen
  - Institute of Fungal Resources, Institute of Edible Fungus, College of Life Sciences, Guizhou University, Guiyang, China
|
28
|
Dunn GP, Sherpa N, Manyanga J, Johanns TM. Considerations for personalized neoantigen vaccination in Malignant glioma. Adv Drug Deliv Rev 2022; 186:114312. [PMID: 35487282 DOI: 10.1016/j.addr.2022.114312] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/26/2021] [Revised: 04/12/2022] [Accepted: 04/21/2022] [Indexed: 12/11/2022]
Abstract
Malignant gliomas are the most commonly diagnosed primary brain cancer and still carry a poor prognosis despite aggressive multimodal management. Despite continued advances in immunotherapy for other cancer types, there remain no FDA-approved immunotherapies for cancers such as glioblastoma. Of the many approaches being explored, cancer vaccine programs are undergoing a renaissance due to technological advances and the personalized nature of their contemporary design. Neoantigen vaccines are a form of immunotherapy involving the use of DNA, mRNA, and proteins derived from non-synonymous mutations identified in patient tumor tissue samples to stimulate tumor-specific T-cell reactivity, leading to enhanced tumor targeting. In the last several years, the study of neoantigens as a therapeutic target has increased, with the routine workflow implementation of comprehensive next generation sequencing and in silico peptide binding prediction algorithms. Several neoantigen vaccine platforms are being evaluated in clinical trials for malignancies including melanoma, pancreatic cancer, breast cancer, lung cancer, and glioblastoma, among others. In this review, we will review the concept of neoantigen discovery using cancer immunogenomics approaches in glioblastoma and explore the disease-specific issues being addressed in the design of effective personalized cancer vaccine strategies.
Affiliation(s)
- Gavin P Dunn
  - Department of Neurosurgery, Massachusetts General Hospital, Boston, MA, United States
- Ngima Sherpa
  - Department of Neurosurgery, Massachusetts General Hospital, Boston, MA, United States
- Jimmy Manyanga
  - Department of Neurological Surgery, Washington University School of Medicine, St Louis, MO, United States
- Tanner M Johanns
  - Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, United States
  - The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO, United States
|
29
|
Guo Y, Li B, Li M, Zhu H, Yang Q, Liu X, Qu L, Fan L, Wang T. Efficient marker-assisted breeding for clubroot resistance in elite Pol-CMS rapeseed varieties by updating the PbBa8.1 locus. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2022; 42:41. [PMID: 37313506 PMCID: PMC10248692 DOI: 10.1007/s11032-022-01305-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 06/03/2022] [Indexed: 06/15/2023]
Abstract
Clubroot disease poses a severe threat to rapeseed (Brassica napus) production worldwide and has recently been spreading across China at an unprecedented pace. Breeding and cultivation of resistant varieties constitute a promising and environment-friendly approach to mitigating this threat. In this study, the clubroot resistance locus PbBa8.1 was successfully transferred into SC4, a shared paternal line of three elite varieties, in five generations by marker-assisted backcross breeding. Kompetitive allele specific PCR (KASP) markers of the clubroot resistance gene PbBa8.1 and its linked high erucic acid gene (FAE1) were designed and applied for foreground selection, and 1,000 single-nucleotide polymorphisms (SNPs) were selected and used for background selection. This breeding strategy produced recombinants with the highest recovery ratio of the recurrent parent genome (> 95%) at BC2F2 while breaking the linkage with FAE1 during the selection. An updated version of the paternal line (SC4R) was generated at BC2F3, showing significantly improved clubroot resistance at the seedling stage via artificial inoculation, comparable to that of the donor parent. Field trials of the three elite varieties and their updated versions in five environments indicated similar agronomic appearance and final yield. The introduced breeding strategy precisely pyramids the PbBa8.1 and FAE1 loci with the assistance of technical markers in a shorter period and could be applied to other desirable traits for directional improvement in the future. Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-022-01305-9.
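For context on the > 95% recovery figure: under the textbook expectation for backcrossing without background selection, the average recurrent-parent share of the genome after n backcrosses is 1 - (1/2)^(n+1), which is 87.5% at BC2. Exceeding 95% at that stage is therefore consistent with the SNP-based background selection the abstract describes. The hypothetical helper below (its name is an assumption, not taken from the paper) only evaluates this formula; it does not model linkage drag or the selection step itself.

```python
def expected_recurrent_fraction(n_backcrosses):
    """Expected recurrent-parent genome share after n backcrosses.

    Textbook expectation without marker-based background selection:
    the F1 (n = 0) carries 1/2, and each backcross halves the
    remaining donor share on average.
    """
    return 1 - 0.5 ** (n_backcrosses + 1)


# Expected shares for F1, BC1, BC2 and BC3 under this simple model.
expected = [expected_recurrent_fraction(n) for n in range(4)]
```

Selfing generations such as F2 or F3 do not change this average, which is why the comparison is made against the BC2 value.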
Affiliation(s)
- Yiming Guo
  - Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, 410125 China
  - Hunan Engineering and Technology Research Center of Hybrid Rapeseed, Changsha, 410125 China
- Bao Li
  - Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, 410125 China
  - Hunan Engineering and Technology Research Center of Hybrid Rapeseed, Changsha, 410125 China
- Mei Li
  - Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, 410125 China
  - Hunan Engineering and Technology Research Center of Hybrid Rapeseed, Changsha, 410125 China
- Hongjian Zhu
  - Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128 China
- Qian Yang
  - Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, 410125 China
  - Hunan Engineering and Technology Research Center of Hybrid Rapeseed, Changsha, 410125 China
- Xinhong Liu
  - Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, 410125 China
  - Hunan Engineering and Technology Research Center of Hybrid Rapeseed, Changsha, 410125 China
- Liang Qu
  - Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, 410125 China
  - Hunan Engineering and Technology Research Center of Hybrid Rapeseed, Changsha, 410125 China
- Lianyi Fan
  - Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, 410125 China
  - Hunan Engineering and Technology Research Center of Hybrid Rapeseed, Changsha, 410125 China
- Tonghua Wang
  - Crop Research Institute, Hunan Academy of Agricultural Sciences, Changsha, 410125 China
  - Hunan Engineering and Technology Research Center of Hybrid Rapeseed, Changsha, 410125 China
|
30
|
Kumar S U, Balasundaram A, Cathryn R H, Varghese RP, R S, R G, Younes S, Zayed H, Doss C GP. Whole-exome sequencing analysis of NSCLC reveals the pathogenic missense variants from cancer-associated genes. Comput Biol Med 2022; 148:105701. [PMID: 35753820 DOI: 10.1016/j.compbiomed.2022.105701] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/04/2022] [Revised: 05/17/2022] [Accepted: 06/04/2022] [Indexed: 11/16/2022]
Abstract
BACKGROUND: Non-small-cell lung cancer (NSCLC) is the most common type of lung cancer. NSCLC accounts for 84% of all lung cancer cases. In recent years, advances in pathway understanding, methods for discovering novel genetic biomarkers, and new drugs designed to inhibit the signaling cascades have enabled clinicians to personalize therapy for NSCLC. OBJECTIVES: The primary aim of this study is to identify the genes associated with NSCLC that harbor pathogenic variants that could be causative for NSCLC. The second aim is to investigate their roles in different pathways that lead to NSCLC. METHODS: We examined exome-sequencing datasets from 54 NSCLC patients to characterize the variants associated with NSCLC. RESULTS: Our findings revealed that 17 variants in 14 genes were considered highly pathogenic, including CDKN2A, ERBB2, FOXP1, IDH1, JAK3, KMT2D, K-Ras, MSH3, MSH6, POLE, RNF43, TCF7L2, TP53, and TSC1. Gene set enrichment analysis revealed the involvement of transmembrane receptor protein tyrosine kinase activity, protein binding, ATP binding, phosphatidylinositol-4,5-bisphosphate 3-kinase, and Ras guanyl-nucleotide exchange factor activity. Pathway analysis of these genes yielded different cancer-related pathways, including colorectal, prostate, endometrial, pancreatic, PI3K-Akt signaling pathways, and signaling pathways regulating pluripotency of stem cells. Module 1 from protein-protein interactions (PPIs) identified genes that harbor pathogenic SNPs. Three of the most deleterious SNPs are ERBB2 (rs1196929947), K-Ras (rs121913529), and POLE (rs751425952). Interestingly, one patient has a pathogenic K-Ras variant (rs121913529) co-occurring with the missense variant (rs752054698) in the TSC1 gene. CONCLUSION: This study maps highly pathogenic variants associated with NSCLC and investigates their contributions to the pathogenesis of NSCLC. This study sheds light on the potential applications of precision medicine in patients with NSCLC.
Affiliation(s)
- Udhaya Kumar S
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- Ambritha Balasundaram
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- Hephzibah Cathryn R
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- Rinku Polachirakkal Varghese
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- Siva R
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- Gnanasambandan R
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- Salma Younes
  - Department of Biomedical Sciences, College of Health and Sciences, Qatar University, QU Health, Doha, 2713, Qatar
- Hatem Zayed
  - Department of Biomedical Sciences, College of Health and Sciences, Qatar University, QU Health, Doha, 2713, Qatar
- George Priya Doss C
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
|
31
|
Mani DR, Krug K, Zhang B, Satpathy S, Clauser KR, Ding L, Ellis M, Gillette MA, Carr SA. Cancer proteogenomics: current impact and future prospects. Nat Rev Cancer 2022; 22:298-313. [PMID: 35236940 DOI: 10.1038/s41568-022-00446-5] [Citation(s) in RCA: 76] [Impact Index Per Article: 38.0] [Accepted: 01/21/2022] [Indexed: 02/07/2023]
Abstract
Genomic analyses in cancer have been enormously impactful, leading to the identification of driver mutations and development of targeted therapies. But the functions of the vast majority of somatic mutations and copy number variants in tumours remain unknown, and the causes of resistance to targeted therapies and methods to overcome them are poorly defined. Recent improvements in mass spectrometry-based proteomics now enable direct examination of the consequences of genomic aberrations, providing deep and quantitative characterization of tumour tissues. Integration of proteins and their post-translational modifications with genomic, epigenomic and transcriptomic data constitutes the new field of proteogenomics, and is already leading to new biological and diagnostic knowledge with the potential to improve our understanding of malignant transformation and therapeutic outcomes. In this Review we describe recent developments in proteogenomics and key findings from the proteogenomic analysis of a wide range of cancers. Considerations relevant to the selection and use of samples for proteogenomics and the current technologies used to generate, analyse and integrate proteomic with genomic data are described. Applications of proteogenomics in translational studies and immuno-oncology are rapidly emerging, and the prospect for their full integration into therapeutic trials and clinical care seems bright.
Affiliation(s)
- D R Mani
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Karsten Krug
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA
- Shankha Satpathy
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Karl R Clauser
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Li Ding
  - Department of Medicine and Genetics, Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Matthew Ellis
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA
- Michael A Gillette
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
  - Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA, USA
- Steven A Carr
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
|
32
|
Späth GF, Bussotti G. GIP: an open-source computational pipeline for mapping genomic instability from protists to cancer cells. Nucleic Acids Res 2022; 50:e36. [PMID: 34928370 PMCID: PMC8989552 DOI: 10.1093/nar/gkab1237] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/08/2021] [Revised: 11/01/2021] [Accepted: 12/03/2021] [Indexed: 11/25/2022] Open
Abstract
Genome instability has been recognized as a key driver for microbial and cancer adaptation and thus plays a central role in many diseases. Genome instability encompasses different types of genomic alterations, yet most available genome analysis software is limited to just one type of mutation. To overcome this limitation and better understand the role of genetic changes in enhancing pathogenicity we established GIP, a novel, powerful bioinformatic pipeline for comparative genome analysis. Here, we show its application to whole genome sequencing datasets of Leishmania, Plasmodium, Candida and cancer. Applying GIP to available data sets validated our pipeline and demonstrated the power of our tool to drive biological discovery. Applied to Plasmodium vivax genomes, our pipeline uncovered the convergent amplification of erythrocyte binding proteins and identified a nullisomic strain. Re-analyzing genomes of drug adapted Candida albicans strains revealed correlated copy number variations of functionally related genes, strongly supporting a mechanism of epistatic adaptation through interacting gene-dosage changes. Our results illustrate how GIP can be used for the identification of aneuploidy, gene copy number variations, changes in nucleic acid sequences, and chromosomal rearrangements. Altogether, GIP can shed light on the genetic bases of cell adaptation and drive disease biomarker discovery.
Affiliation(s)
- Gerald F Späth
  - Institut Pasteur, Université de Paris, INSERM U1201, Unité de Parasitologie moléculaire et Signalisation, Paris, France
- Giovanni Bussotti
  - Institut Pasteur, Université de Paris, INSERM U1201, Unité de Parasitologie moléculaire et Signalisation, Paris, France
  - Institut Pasteur, Université de Paris, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
|
33
|
Urbanek-Trzeciak MO, Kozlowski P, Galka-Marciniak P. miRMut: Annotation of mutations in miRNA genes from human whole-exome or whole-genome sequencing. STAR Protoc 2022; 3:101023. [PMID: 34977675 PMCID: PMC8686061 DOI: 10.1016/j.xpro.2021.101023] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/16/2022] Open
Abstract
Here, we present the miRMut protocol to annotate mutations found in miRNA genes based on whole-exome sequencing (WES) or whole-genome sequencing (WGS) results. The pipeline assigns mutation characteristics, including miRNA gene IDs (miRBase and MirGeneDB), mutation localization within the miRNA precursor structure, potential RNA-binding motif disruption, the ascription of mutation according to Human Genome Variation Society (HGVS) nomenclature, and miRNA gene characteristics, such as miRNA gene confidence and miRNA arm balance. The pipeline includes creating tabular and graphical summaries. For complete details on the use and execution of this protocol, please refer to Urbanek-Trzeciak et al. (2020).
Affiliation(s)
- Martyna O. Urbanek-Trzeciak
  - Department of Molecular Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Piotr Kozlowski
  - Department of Molecular Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Paulina Galka-Marciniak
  - Department of Molecular Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
|
34
|
Liu J, Shen Q, Bao H. Comparison of seven SNP calling pipelines for the next-generation sequencing data of chickens. PLoS One 2022; 17:e0262574. [PMID: 35100292 PMCID: PMC8803190 DOI: 10.1371/journal.pone.0262574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2021] [Accepted: 12/29/2021] [Indexed: 11/18/2022] Open
Abstract
Single nucleotide polymorphisms (SNPs) are widely used in genome-wide association studies and population genetics analyses. Next-generation sequencing (NGS) has become convenient, and many SNP-calling pipelines have been developed for human NGS data. However, there is a knowledge gap in selecting the appropriate SNP-calling pipeline to handle high-throughput NGS data. To fill this gap, we studied and compared seven SNP calling pipelines, which include 16GT, genome analysis toolkit (GATK), Bcftools-single (Bcftools single sample mode), Bcftools-multiple (Bcftools multiple sample mode), VarScan2-single (VarScan2 single sample mode), VarScan2-multiple (VarScan2 multiple sample mode) and Freebayes pipelines, using 96 NGS datasets with depth gradients of approximately 5X, 10X, 20X, 30X, 40X, and 50X coverage from 16 Rhode Island Red chickens. The sixteen chickens were also genotyped with a 50K SNP array, and the sensitivity and specificity of each pipeline were assessed by comparison to the results of SNP arrays. For each pipeline, except Freebayes, the number of detected SNPs increased as the input read depth increased. In comparison with other pipelines, 16GT, followed by Bcftools-multiple, obtained the most SNPs when the input coverage exceeded 10X, and Bcftools-multiple obtained the most when the input was 5X and 10X. The sensitivity and specificity of each pipeline increased with increasing input. Bcftools-multiple had the highest sensitivity numerically when the input ranged from 5X to 30X, and 16GT showed the highest sensitivity when the input was 40X and 50X. Bcftools-multiple also had the highest specificity, followed by GATK, at almost all input levels. For most calling pipelines, there were no obvious changes in SNP numbers, sensitivities or specificities beyond 20X.
In conclusion, (1) if only SNPs are to be detected, the sequencing depth does not need to exceed 20X; (2) Bcftools-multiple may be the best choice for detecting SNPs from chicken NGS data, but for a single sample or a sequencing depth greater than 20X, 16GT is recommended. Our findings provide a reference for researchers to select suitable pipelines to obtain SNPs from the NGS data of chickens or nonhuman animals.
Affiliation(s)
- Jing Liu
  - National Engineering Laboratory for Animal Breeding, Beijing Key Laboratory for Animal Genetic Improvement, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Qingmiao Shen
  - National Engineering Laboratory for Animal Breeding, Beijing Key Laboratory for Animal Genetic Improvement, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Haigang Bao
  - National Engineering Laboratory for Animal Breeding, Beijing Key Laboratory for Animal Genetic Improvement, College of Animal Science and Technology, China Agricultural University, Beijing, China
35
|
Sequence Fusion Algorithm of Tumor Gene Sequencing and Alignment Based on Machine Learning. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2021:9444194. [PMID: 35003249 PMCID: PMC8741399 DOI: 10.1155/2021/9444194] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/03/2021] [Revised: 11/24/2021] [Accepted: 12/10/2021] [Indexed: 11/18/2022]
Abstract
With the rapid development of high-throughput DNA testing technology, a strong correlation between DNA sequence variation and human diseases has become apparent, and detecting variation in DNA sequences has become an active research topic. DNA sequence variation is relatively rare, and establishing a DNA sequence sparse matrix that can quickly detect and infer fusion variation points has become an important task in tumor gene testing. Because current alignment software and mutation detection software give different results for the same sample, there are discrepancies between the results of sequence alignment and those of mutation detection. In this paper, SNP and InDel detection methods based on machine learning and sparse matrix detection are proposed and compared with VarScan 2, Genome Analysis Toolkit (GATK), BCFtools, and FreeBayes. In SNP and InDel detection with intelligent reasoning, the experimental results show that detection accuracy and recall improve as the depth increases. The reasoning fusion method proposed in this paper has certain advantages in alignment and in SNP and InDel discovery, and performs well in tumor gene detection.
|
36
|
Landry KK, Seward DJ, Dragon JA, Slavik M, Xu K, McKinnon WC, Colello L, Sweasy J, Wallace SS, Cuke M, Wood ME. Investigation of discordant sibling pairs from hereditary breast cancer families and analysis of a rare PMS1 variant. Cancer Genet 2021; 260-261:30-36. [PMID: 34852986 DOI: 10.1016/j.cancergen.2021.11.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2021] [Revised: 10/12/2021] [Accepted: 11/11/2021] [Indexed: 11/02/2022]
Abstract
BACKGROUND It is likely that additional genes for hereditary breast cancer can be identified using a discordant sib pair design. Using this design we identified individuals harboring a rare PMS1 c.605G>A variant previously predicted to result in loss of function. OBJECTIVES A family-based design and predictive algorithms were used to prioritize candidate variants possibly associated with an increased risk of hereditary breast cancer. Functional analyses were performed for one of the candidate variants, PMS1 c.605G>A. METHODS 1) 14 discordant sister-pairs from hereditary breast cancer families were identified. 2) Whole exome sequencing was performed and candidate risk variants identified. 3) A rare PMS1 variant was identified in 2 unrelated affected sisters but in no unaffected siblings. 4) Functional analysis of this variant was carried out using targeted mRNA sequencing. RESULTS Genotype-phenotype correlation did not demonstrate tracking of the variant with cancer in the family. Functional analysis revealed no difference in exon 6 incorporation, which was validated by analyzing PMS1 allele specific expression. CONCLUSIONS The PMS1 c.605G>A variant did not segregate with disease, and there was no variant-dependent impact on PMS1 exon 6 splicing, supporting the conclusion that this variant is likely benign. Functional analyses are imperative to understanding the clinical significance of predictive algorithms.
Affiliation(s)
- K K Landry
  - Department of Medicine Hematology-Oncology, UVM Medical Center, Burlington, VT, USA
- D J Seward
  - Department of Pathology and Laboratory Medicine, UVM Larner College of Medicine, Burlington, VT, USA
- J A Dragon
  - Department of Microbiology and Molecular Genetics, UVM Larner College of Medicine, Burlington, VT, USA
- M Slavik
  - Department of Microbiology and Molecular Genetics, UVM Larner College of Medicine, Burlington, VT, USA
- K Xu
  - Department of Pathology and Laboratory Medicine, UVM Larner College of Medicine, Burlington, VT, USA
- W C McKinnon
  - Department of Medicine Hematology-Oncology, UVM Medical Center, Burlington, VT, USA
- L Colello
  - Department of Medicine Hematology-Oncology, UVM Medical Center, Burlington, VT, USA
- J Sweasy
  - Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ, USA
- S S Wallace
  - Department of Microbiology and Molecular Genetics, UVM Larner College of Medicine, Burlington, VT, USA
- M Cuke
  - Department of Medicine Hematology-Oncology, UVM Medical Center, Burlington, VT, USA
- M E Wood
  - Department of Medicine Hematology-Oncology, UVM Medical Center, Burlington, VT, USA
|
37
|
Tangaro MA, Mandreoli P, Chiara M, Donvito G, Antonacci M, Parisi A, Bianco A, Romano A, Bianchi DM, Cangelosi D, Uva P, Molineris I, Nosi V, Calogero RA, Alessandri L, Pedrini E, Mordenti M, Bonetti E, Sangiorgi L, Pesole G, Zambelli F. Laniakea@ReCaS: exploring the potential of customisable Galaxy on-demand instances as a cloud-based service. BMC Bioinformatics 2021; 22:544. [PMID: 34749633 PMCID: PMC8574934 DOI: 10.1186/s12859-021-04401-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/12/2021] [Accepted: 09/24/2021] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Improving the availability and usability of data and analytical tools is a critical precondition for further advancing modern biological and biomedical research. For instance, one of the many ramifications of the COVID-19 global pandemic has been to make even more evident the importance of having bioinformatics tools and data readily actionable by researchers through convenient access points and supported by adequate IT infrastructures. One of the most successful efforts in improving the availability and usability of bioinformatics tools and data is represented by the Galaxy workflow manager and its thriving community. In 2020 we introduced Laniakea, a software platform conceived to streamline the configuration and deployment of "on-demand" Galaxy instances over the cloud. By facilitating the set-up and configuration of Galaxy web servers, Laniakea provides researchers with a powerful and highly customisable platform for executing complex bioinformatics analyses. The system can be accessed through a dedicated and user-friendly web interface that allows the Galaxy web server's initial configuration and deployment. RESULTS "Laniakea@ReCaS", the first instance of a Laniakea-based service, is managed by ELIXIR-IT and was officially launched in February 2020, after about one year of development and testing that involved several users. Researchers can request access to Laniakea@ReCaS through an open-ended call for use-cases. Ten project proposals have been accepted since then, totalling 18 Galaxy on-demand virtual servers that employ ~ 100 CPUs, ~ 250 GB of RAM and ~ 5 TB of storage and serve several different communities and purposes. Herein, we present eight use cases demonstrating the versatility of the platform. 
CONCLUSIONS During this first year of activity, the Laniakea-based service emerged as a flexible platform that facilitated the rapid development of bioinformatics tools, the efficient delivery of training activities, and the provision of public bioinformatics services in different settings, including food safety and clinical research. Laniakea@ReCaS provides a proof of concept of how enabling access to appropriate, reliable IT resources and ready-to-use bioinformatics tools can considerably streamline researchers' work.
Affiliation(s)
- Marco Antonio Tangaro
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy
  - National Institute for Nuclear Physics (INFN), Section of Bari, Via Orabona 4, 70126, Bari, Italy
- Pietro Mandreoli
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy
  - Department of Biosciences, University of Milan, Via Celoria 26, 20133, Milano, Italy
- Matteo Chiara
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy
  - Department of Biosciences, University of Milan, Via Celoria 26, 20133, Milano, Italy
- Giacinto Donvito
  - National Institute for Nuclear Physics (INFN), Section of Bari, Via Orabona 4, 70126, Bari, Italy
- Marica Antonacci
  - National Institute for Nuclear Physics (INFN), Section of Bari, Via Orabona 4, 70126, Bari, Italy
- Antonio Parisi
  - Istituto Zooprofilattico Sperimentale Della Puglia e Della Basilicata, Via Manfredonia 20, 71121, Foggia, Italy
- Angelica Bianco
  - Istituto Zooprofilattico Sperimentale Della Puglia e Della Basilicata, Via Manfredonia 20, 71121, Foggia, Italy
- Angelo Romano
  - National Reference Laboratory for Coagulase-Positive Staphylococci Including Staphylococcus Aureus, Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
- Daniela Manila Bianchi
  - National Reference Laboratory for Coagulase-Positive Staphylococci Including Staphylococcus Aureus, Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
- Davide Cangelosi
  - Clinical Bioinformatics Unit, Scientific Direction, IRCCS Istituto Giannina Gaslini, Via Gerolamo Gaslini 5, 16147, Genova, Italy
- Paolo Uva
  - Clinical Bioinformatics Unit, Scientific Direction, IRCCS Istituto Giannina Gaslini, Via Gerolamo Gaslini 5, 16147, Genova, Italy
  - Italian Institute of Technology, Via Morego 30, 16163, Genova, Italy
- Ivan Molineris
  - Department of Life Science and System Biology, University of Turin, Via Accademia Albertina, 13-1023, Turin, Italy
- Vladimir Nosi
  - Department of Computer Science, University of Turin, Via Pessinetto 12, 10049, Turin, Italy
- Raffaele A Calogero
  - Department of Molecular Biotechnology and Health Sciences, Via Nizza 52, 10126, Turin, Italy
- Luca Alessandri
  - Department of Molecular Biotechnology and Health Sciences, Via Nizza 52, 10126, Turin, Italy
- Elena Pedrini
  - Department of Rare Skeletal Disorders, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136, Bologna, Italy
- Marina Mordenti
  - Department of Rare Skeletal Disorders, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136, Bologna, Italy
- Emanuele Bonetti
  - Department of Rare Skeletal Disorders, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136, Bologna, Italy
  - Department of Experimental Oncology, European Institute of Oncology, Via Adamello 16, 20139, Milan, Italy
- Luca Sangiorgi
  - Department of Rare Skeletal Disorders, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136, Bologna, Italy
- Graziano Pesole
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy
  - Department of Biosciences, Biotechnologies and Biopharmaceutics, University of Bari, Via Orabona 4, 70126, Bari, Italy
- Federico Zambelli
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy
  - Department of Biosciences, University of Milan, Via Celoria 26, 20133, Milano, Italy
|
38
|
Della Coletta R, Lavell AA, Garvin DF. A Homolog of the Arabidopsis TIME FOR COFFEE Gene Is Involved in Nonhost Resistance to Wheat Stem Rust in Brachypodium distachyon. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2021; 34:1298-1306. [PMID: 34340534 DOI: 10.1094/mpmi-06-21-0137-r] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Plants resist infection by pathogens using both preexisting barriers and inducible defense responses. Inducible responses are governed in a complex manner by various hormone signaling pathways. The relative contribution of hormone signaling pathways to nonhost resistance to pathogens is not well understood. In this study, we examined the molecular basis of disrupted nonhost resistance to the fungal species Puccinia graminis, which causes stem rust of wheat, in an induced mutant of the model grass Brachypodium distachyon. Through bioinformatic analysis, a 1-bp deletion in the mutant genotype was identified that introduces a premature stop codon in the gene Bradi1g24100, which is a homolog of the Arabidopsis thaliana gene TIME FOR COFFEE (TIC). In Arabidopsis, TIC is central to the regulation of the circadian clock and plays a crucial role in jasmonate signaling by attenuating levels of the transcription factor protein MYC2, and its mutational disruption results in enhanced susceptibility to the hemibiotroph Pseudomonas syringae. Our similar finding for an obligate biotroph suggests that the biochemical role of TIC in mediating disease resistance to biotrophs is conserved in grasses, and that the correct modulation of jasmonate signaling during infection by Puccinia graminis may be essential for nonhost resistance to wheat stem rust in B. distachyon. Copyright © 2021 The Author(s). This is an open access article distributed under the CC BY 4.0 International license.
Affiliation(s)
- Rafael Della Coletta
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, U.S.A
  - CAPES Foundation, Ministry of Education of Brazil, Brasilia, DF, Brazil
- Anastasiya A Lavell
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, U.S.A
- David F Garvin
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, U.S.A
  - Plant Science Research Unit, United States Department of Agriculture-Agricultural Research Service, St. Paul, MN 55108, U.S.A
|
39
|
Bathke J, Lühken G. OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow. BMC Bioinformatics 2021; 22:402. [PMID: 34388963 PMCID: PMC8361789 DOI: 10.1186/s12859-021-04317-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/10/2021] [Accepted: 08/04/2021] [Indexed: 12/30/2022] Open
Abstract
Background The advent of next generation sequencing has opened new avenues for basic and applied research. One application is the discovery of sequence variants causative of a phenotypic trait or a disease pathology. The computational task of detecting and annotating sequence differences between a target dataset and a reference genome is known as "variant calling". Typically, this task is computationally involved, often combining a complex chain of linked software tools. A major player in this field is the Genome Analysis Toolkit (GATK). The "GATK Best Practices" is a commonly referenced recipe for variant calling. However, current computational recommendations on variant calling predominantly focus on human sequencing data and ignore the ever-changing demands of high-throughput sequencing developments. Furthermore, frequent updates to such recommendations run counter to the goal of offering a standard workflow and hamper reproducibility over time. Results A workflow for automated detection of single nucleotide polymorphisms and insertion-deletions offers a wide range of applications in sequence annotation of model and non-model organisms. The introduced workflow builds on the GATK Best Practices, while enabling reproducibility over time and offering an open, generalized computational architecture. The workflow achieves parallelized data evaluation and maximizes performance of individual computational tasks. Optimized Java garbage collection and heap size settings for the GATK applications SortSam, MarkDuplicates, HaplotypeCaller, and GatherVcfs effectively cut the overall analysis time in half. Conclusions The demand for variant calling, efficient computational processing, and standardized workflows is growing. The Open source Variant calling workFlow (OVarFlow) offers automation and reproducibility for a computationally optimized variant calling task. By reducing usage of computational resources, the workflow removes previously existing entry barriers to the variant calling field and enables standardized variant calling.
Affiliation(s)
- Jochen Bathke
  - Institute of Animal Breeding and Genetics, Justus Liebig University Gießen, Ludwigstraße 21, 35390, Gießen, Germany
- Gesine Lühken
  - Institute of Animal Breeding and Genetics, Justus Liebig University Gießen, Ludwigstraße 21, 35390, Gießen, Germany
|
40
|
Ahmed Z, Renart EG, Zeeshan S. Genomics pipelines to investigate susceptibility in whole genome and exome sequenced data for variant discovery, annotation, prediction and genotyping. PeerJ 2021; 9:e11724. [PMID: 34395068 PMCID: PMC8320519 DOI: 10.7717/peerj.11724] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/29/2021] [Accepted: 06/14/2021] [Indexed: 12/12/2022] Open
Abstract
Over the last few decades, genomics has been moving toward an ambitious future and has been changing our views about conducting biomedical research, studying diseases, and understanding diversity in our society across the human species. Whole genome and whole exome sequencing (WGS/WES) are two of the most popular next-generation sequencing (NGS) methodologies currently used to detect genetic variations of clinical significance. Investigating WGS/WES data for variant discovery and genotyping relies on a combination of different data analytic applications. Although several bioinformatics applications have been developed, and many of them are published and freely available, finding and interpreting genetic variants in a timely manner remain challenging tasks for diagnostic laboratories and clinicians. In this study, we are interested in understanding, evaluating, and reporting the current state of the solutions available to process NGS data of variable lengths and types for the identification of variants, alleles, and haplotypes. Within this scope, we consulted high-quality peer-reviewed literature published in the last 10 years. We focused on the standalone and networked bioinformatics applications proposed to efficiently process WGS and WES data and to support downstream analysis for gene-variant discovery, annotation, prediction, and interpretation. We discuss our findings in this manuscript, which include but are not limited to the set of operations, workflow, data handling, involved tools, technologies and algorithms, and limitations of the assessed applications.
Affiliation(s)
- Zeeshan Ahmed
  - Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
  - Department of Medicine, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
- Eduard Gibert Renart
  - Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
- Saman Zeeshan
  - Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
|
41
|
Ortiz-Aguirre JP, Velandia-Vargas EA, Rodríguez-Bohorquez OM, Amaya-Ramírez D, Bernal-Estévez D, Parra-López CA. Inmunoterapia personalizada contra el cáncer basada en neoantígenos. Revisión de la literatura [Personalized cancer immunotherapy based on neoantigens: a literature review]. REVISTA DE LA FACULTAD DE MEDICINA 2021. [DOI: 10.15446/revfacmed.v69n3.81633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
Abstract
Introduction. Advances in cancer immunotherapy, together with the clinical response of patients who have received this type of therapy, have made it the fourth pillar of cancer treatment.
Objective. To briefly describe the biological rationale of personalized neoantigen-based cancer immunotherapy, the current prospects for its development, and some clinical results of this therapy.
Materials and methods. A literature search was conducted in PubMed, Scopus and EBSCO using the following search strategy: type of articles: original experimental studies, clinical trials, and narrative and systematic reviews on methods for identifying mutations arising in tumors and on cancer immunotherapy strategies using neoantigen-based vaccines; study population: humans and animal models; publication period: January 1989 to December 2019; language: English and Spanish; search terms: “Immunotherapy”, “Neoplasms”, “Mutation” and “Cancer Vaccines”.
Results. The initial search yielded 1344 records; after removing duplicates (n=176), 780 were excluded after reading their title and abstract, and the full text of 338 was assessed to verify which met the inclusion criteria; 73 studies were finally selected for full analysis. All retrieved articles were published in English and were conducted mainly in the USA (43.83%) and Germany (23.65%). Of the original studies (n=43), 20 were conducted in humans only, 9 in animals only, 2 in both models, and 12 used in silico methodology.
Conclusion. Personalized cancer immunotherapy with vaccines based on tumor neoantigens is emerging strongly as a new alternative for treating cancer. However, for it to be implemented properly, it must be used in combination with conventional treatments, more knowledge must be generated to help clarify the immunobiology of cancer, and the costs associated with its production must be reduced.
|
42
|
Zanti M, Michailidou K, Loizidou MA, Machattou C, Pirpa P, Christodoulou K, Spyrou GM, Kyriacou K, Hadjisavvas A. Performance evaluation of pipelines for mapping, variant calling and interval padding, for the analysis of NGS germline panels. BMC Bioinformatics 2021; 22:218. [PMID: 33910496 PMCID: PMC8080428 DOI: 10.1186/s12859-021-04144-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2021] [Accepted: 04/15/2021] [Indexed: 11/10/2022] Open
Abstract
Background Next-generation sequencing (NGS) represents a significant advancement in clinical genetics. However, its use creates several technical, data interpretation and management challenges. It is essential to follow a consistent data analysis pipeline to achieve the highest possible accuracy and avoid false variant calls. Herein, we aimed to compare the performance of twenty-eight combinations of NGS data analysis pipeline compartments, including short-read mapping (BWA-MEM, Bowtie2, Stampy), variant calling (GATK-HaplotypeCaller, GATK-UnifiedGenotyper, SAMtools) and interval padding (null, 50 bp, 100 bp) methods, along with a commercially available pipeline (BWA Enrichment, Illumina®). Fourteen germline DNA samples from breast cancer patients were sequenced using a targeted NGS panel approach and subjected to data analysis. Results We highlight that interval padding is required for the accurate detection of intronic variants including spliceogenic pathogenic variants (PVs). In addition, using nearly default parameters, the BWA Enrichment algorithm failed to detect these spliceogenic PVs and a missense PV in the TP53 gene. We also recommend the BWA-MEM algorithm for sequence alignment, whereas variant calling should be performed using a combination of variant calling algorithms: GATK-HaplotypeCaller and SAMtools for the accurate detection of insertions/deletions and GATK-UnifiedGenotyper for the efficient detection of single nucleotide variant calls. Conclusions These findings have important implications for the identification of clinically actionable variants through panel testing in a clinical laboratory setting, where dedicated bioinformatics personnel might not always be available. The results also reveal the necessity of improving the existing tools and, at the same time, developing new pipelines to generate more reliable and more consistent data.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04144-1.
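The role of interval padding described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `pad_intervals` and `is_covered`, the coordinates, and the example variant are all hypothetical, and intervals are assumed to follow the BED convention (0-based, half-open). The sketch only shows why a variant a few bases into an intron is missed when the target interval is not padded.

```python
from typing import List, Tuple

# (chrom, start, end); assumed 0-based, half-open, as in BED files
Interval = Tuple[str, int, int]


def pad_intervals(intervals: List[Interval], padding: int) -> List[Interval]:
    """Extend each interval by `padding` bp on both sides and merge overlaps."""
    padded = sorted(
        (chrom, max(0, start - padding), end + padding)
        for chrom, start, end in intervals
    )
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def is_covered(intervals: List[Interval], chrom: str, pos: int) -> bool:
    """Return True if the 0-based position lies inside any interval."""
    return any(c == chrom and s <= pos < e for c, s, e in intervals)


# Hypothetical exon target and a hypothetical intronic variant 20 bp past its end.
targets = [("chr17", 1000, 1200)]
variant = ("chr17", 1219)

for pad in (0, 50, 100):
    print(pad, is_covered(pad_intervals(targets, pad), *variant))
# prints:
# 0 False
# 50 True
# 100 True
```

The sketch does not clip padded intervals at the chromosome end, which a production tool would need to do using the reference contig lengths.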
Affiliation(s)
- Maria Zanti
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus; Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus; Bioinformatics Department, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
- Kyriaki Michailidou
  - Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus; Biostatistics Unit, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
- Maria A Loizidou
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus; Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus
- Christina Machattou
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
- Panagiota Pirpa
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
- Kyproula Christodoulou
  - Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus; Neurogenetics Department, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
- George M Spyrou
  - Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus; Bioinformatics Department, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
- Kyriacos Kyriacou
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus; Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus
- Andreas Hadjisavvas
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus; Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus
43
Chen Z, Tang D, Ni J, Li P, Wang L, Zhou J, Li C, Lan H, Li L, Liu J. Development of genic KASP SNP markers from RNA-Seq data for map-based cloning and marker-assisted selection in maize. BMC Plant Biol 2021; 21:157. [PMID: 33771110 PMCID: PMC8004444 DOI: 10.1186/s12870-021-02932-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/01/2021] [Accepted: 03/10/2021] [Indexed: 05/25/2023]
Abstract
Background: Maize is one of the most important field crops in the world. Most of the key agronomic traits, including yield traits and plant architecture traits, are quantitative. Fine mapping of genes/quantitative trait loci (QTL) influencing a key trait is essential for marker-assisted selection (MAS) in maize breeding. However, SNP markers with high density and high polymorphism are lacking, especially kompetitive allele specific PCR (KASP) SNP markers that can be used for automatic genotyping. To date, a large volume of sequencing data has been produced by next-generation sequencing technology, which provides a good pool of SNP loci for the development of SNP markers. In this study, we carried out a multi-step screening method to identify KASP SNP markers based on the RNA-Seq data sets of 368 maize inbred lines. Results: A total of 2,948,985 SNPs were identified in the high-throughput RNA-Seq data sets, with an average density of 1.4 SNP/kb. Of these, 71,311 KASP SNP markers (average density of 34 KASP SNP/Mb) were developed based on strict criteria (unique genomic region, bi-allelic, polymorphism information content (PIC) value ≥0.4, and conserved primer sequences) and were mapped to 16,161 genes. These 16,161 genes were annotated to 52 gene ontology (GO) terms, including most primary and secondary metabolic pathways. Subsequently, 50 KASP SNP markers with PIC values ranging from 0.14 to 0.5 in the 368 RNA-Seq data sets and with polymorphism between the maize inbred lines 1212 and B73 in in silico analysis were selected to experimentally validate the accuracy and polymorphism of the SNPs; 46 SNPs (92.00%) showed polymorphism between inbred lines 1212 and B73. Moreover, these 46 polymorphic SNPs were used to genotype another 20 maize inbred lines, with all 46 SNPs showing polymorphism among them; the PIC value of each SNP ranged from 0.11 to 0.50, with an average of 0.35. The results suggested that the KASP SNP markers developed in this study were accurate and polymorphic. Conclusions: These high-density polymorphic KASP SNP markers will be a valuable resource for map-based cloning of QTL/genes and marker-assisted selection in maize. Furthermore, the method used to develop SNP markers in maize can also be applied to other species.
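The PIC filter mentioned in this abstract can be made concrete with a short sketch. The abstract does not say which PIC formula was used, so two common definitions are shown side by side as an assumption: the simplified form 1 − Σp_i² and the form of Botstein et al. (1980), which subtracts an extra cross term. For a bi-allelic marker the Botstein form cannot exceed 0.375, so the reported values of up to 0.50 appear consistent with the simplified form. The function names are hypothetical.

```python
from typing import Sequence


def gene_diversity(freqs: Sequence[float]) -> float:
    """Simplified PIC: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs: Sequence[float]) -> float:
    """Botstein et al. (1980) PIC: 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return gene_diversity(freqs) - cross


# A bi-allelic SNP with two equally frequent alleles gives the maximum of each form.
balanced = [0.5, 0.5]
print(gene_diversity(balanced))  # 0.5
print(pic_botstein(balanced))    # 0.375
```

Under the simplified form, a threshold of PIC ≥0.4 for a bi-allelic SNP corresponds to a minor allele frequency of roughly 0.28 or higher, since 2pq = 0.4 at p ≈ 0.276.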
Affiliation(s)
- Zhengjie Chen
  - Maize Research Institute, Sichuan Agricultural University, 211 Huiming Road, Wenjiang District, Chengdu City, 611000 Sichuan China
  - Industrial Crop Research Institute, Sichuan Academy of Agricultural Science, No.159 Huajin Avenue, Qingbaijiang District, Chengdu City, 610300 Sichuan China
- Dengguo Tang
  - Maize Research Institute, Sichuan Agricultural University, 211 Huiming Road, Wenjiang District, Chengdu City, 611000 Sichuan China
- Jixing Ni
  - Maize Research Institute, Sichuan Agricultural University, 211 Huiming Road, Wenjiang District, Chengdu City, 611000 Sichuan China
- Peng Li
  - Maize Research Institute, Sichuan Agricultural University, 211 Huiming Road, Wenjiang District, Chengdu City, 611000 Sichuan China
- Le Wang
  - Maize Research Institute, Sichuan Agricultural University, 211 Huiming Road, Wenjiang District, Chengdu City, 611000 Sichuan China
- Jinhong Zhou
  - Maize Research Institute, Sichuan Agricultural University, 211 Huiming Road, Wenjiang District, Chengdu City, 611000 Sichuan China
- Chenyang Li
  - Maize Research Institute, Sichuan Agricultural University, 211 Huiming Road, Wenjiang District, Chengdu City, 611000 Sichuan China
- Hai Lan
  - Maize Research Institute, Sichuan Agricultural University, 211 Huiming Road, Wenjiang District, Chengdu City, 611000 Sichuan China
- Lujiang Li
  - Maize Research Institute, Sichuan Agricultural University, 211 Huiming Road, Wenjiang District, Chengdu City, 611000 Sichuan China
- Jian Liu
  - Maize Research Institute, Sichuan Agricultural University, 211 Huiming Road, Wenjiang District, Chengdu City, 611000 Sichuan China
44
van Belzen IAEM, Schönhuth A, Kemmeren P, Hehir-Kwa JY. Structural variant detection in cancer genomes: computational challenges and perspectives for precision oncology. NPJ Precis Oncol 2021; 5:15. [PMID: 33654267 PMCID: PMC7925608 DOI: 10.1038/s41698-021-00155-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 08/17/2020] [Accepted: 01/12/2021] [Indexed: 01/31/2023]
Abstract
Cancer is generally characterized by acquired genomic aberrations in a broad spectrum of types and sizes, ranging from single nucleotide variants to structural variants (SVs). At least 30% of cancers have a known pathogenic SV used in diagnosis or treatment stratification. However, research into the role of SVs in cancer has been limited due to difficulties in detection. Biological and computational challenges confound SV detection in cancer samples, including intratumor heterogeneity, polyploidy, and distinguishing tumor-specific SVs from germline and somatic variants present in healthy cells. Classification of tumor-specific SVs is challenging due to inconsistencies in detected breakpoints, derived variant types and biological complexity of some rearrangements. Full-spectrum SV detection with high recall and precision requires integration of multiple algorithms and sequencing technologies to rescue variants that are difficult to resolve through individual methods. Here, we explore current strategies for integrating SV callsets and for enabling the use of tumor-specific SVs in precision oncology.
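One simple way to integrate SV callsets, given here only as an illustrative sketch and not as a method taken from the review, is to cluster calls of the same type whose breakpoints agree within a tolerance and to keep clusters supported by several callers. The `SVCall` record, the caller names, the 100 bp tolerance and the two-caller threshold are all assumptions; only intra-chromosomal events are handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SVCall:
    caller: str   # name of the tool that produced the call (hypothetical here)
    chrom: str
    start: int
    end: int
    svtype: str   # e.g. "DEL", "DUP", "INV"


def same_event(a: SVCall, b: SVCall, tolerance: int) -> bool:
    """Two calls match if type and chromosome agree and both breakpoints are close."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= tolerance
        and abs(a.end - b.end) <= tolerance
    )


def consensus_calls(
    calls: List[SVCall], tolerance: int = 100, min_callers: int = 2
) -> List[List[SVCall]]:
    """Greedily cluster calls against each cluster's first member,
    then keep clusters reported by at least `min_callers` distinct callers."""
    clusters: List[List[SVCall]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        for cluster in clusters:
            if same_event(cluster[0], call, tolerance):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [c for c in clusters if len({x.caller for x in c}) >= min_callers]


calls = [
    SVCall("caller_a", "chr1", 10_000, 15_000, "DEL"),
    SVCall("caller_b", "chr1", 10_040, 14_980, "DEL"),
    SVCall("caller_a", "chr2", 500, 900, "DUP"),
]
supported = consensus_calls(calls)
print(len(supported))  # 1
```

Requiring agreement between callers trades recall for precision; taking the union of callsets would do the opposite, which is why the tolerance and support threshold are left as parameters.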
Affiliation(s)
- Alexander Schönhuth
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Patrick Kemmeren
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
- Jayne Y Hehir-Kwa
  - Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
45
Li Z, Fang S, Zhang R, Yu L, Zhang J, Bu D, Sun L, Zhao Y, Li J. VarBen. J Mol Diagn 2021; 23:285-299. [DOI: 10.1016/j.jmoldx.2020.11.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/20/2019] [Revised: 10/06/2020] [Accepted: 11/17/2020] [Indexed: 02/08/2023]
46
Detecting Causal Variants in Mendelian Disorders Using Whole-Genome Sequencing. Methods Mol Biol 2021; 2243:1-25. [PMID: 33606250 DOI: 10.1007/978-1-0716-1103-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
Increasingly affordable sequencing technologies are revolutionizing the field of genomic medicine. It is now feasible to interrogate all major classes of variation in an individual across the entire genome for less than US$1000. While the generation of patient sequence information using these technologies has become routine, the analysis and interpretation of these data remain the greatest obstacle to widespread clinical implementation. This chapter summarizes the steps to identify, annotate, and prioritize variant information required for clinical report generation. We discuss methods to detect each variant class and describe strategies to increase the likelihood of detecting causal variant(s) in Mendelian disease. Lastly, we describe a sample workflow for synthesizing large amounts of genetic information into concise clinical reports.
47
Sysoev M, Grötzinger SW, Renn D, Eppinger J, Rueping M, Karan R. Bioprospecting of Novel Extremozymes From Prokaryotes-The Advent of Culture-Independent Methods. Front Microbiol 2021; 12:630013. [PMID: 33643258 PMCID: PMC7902512 DOI: 10.3389/fmicb.2021.630013] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 11/16/2020] [Accepted: 01/21/2021] [Indexed: 12/20/2022]
Abstract
Extremophiles are remarkable organisms that thrive in the harshest environments on Earth, such as hydrothermal vents, hypersaline lakes and pools, alkaline soda lakes, deserts, cold oceans, and volcanic areas. These organisms have developed several strategies to overcome environmental stress and nutrient limitations. Thus, they are among the best model organisms to study adaptive mechanisms that lead to stress tolerance. Genetic and structural information derived from extremophiles and extremozymes can be used for bioengineering other nontolerant enzymes. Furthermore, extremophiles can be a valuable resource for novel biotechnological and biomedical products due to their biosynthetic properties. However, understanding life under extreme conditions is challenging due to the difficulties of in vitro cultivation and observation since > 99% of organisms cannot be cultivated. Consequently, only a minor percentage of the potential extremophiles on Earth have been discovered and characterized. Herein, we present a review of culture-independent methods, sequence-based metagenomics (SBM), and single amplified genomes (SAGs) for studying enzymes from extremophiles, with a focus on prokaryotic (archaea and bacteria) microorganisms. Additionally, we provide a comprehensive list of extremozymes discovered via metagenomics and SAGs.
Affiliation(s)
- Maksim Sysoev
  - KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Stefan W. Grötzinger
  - KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Dominik Renn
  - KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Jörg Eppinger
  - KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Institute for Experimental Molecular Imaging, University Clinic, RWTH Aachen University, Aachen, Germany
- Magnus Rueping
  - KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Institute for Experimental Molecular Imaging, University Clinic, RWTH Aachen University, Aachen, Germany
- Ram Karan
  - KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
48
Valiente-Mullor C, Beamud B, Ansari I, Francés-Cuesta C, García-González N, Mejía L, Ruiz-Hueso P, González-Candelas F. One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads. PLoS Comput Biol 2021; 17:e1008678. [PMID: 33503026 PMCID: PMC7870062 DOI: 10.1371/journal.pcbi.1008678] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 04/27/2020] [Revised: 02/08/2021] [Accepted: 01/05/2021] [Indexed: 12/17/2022]
Abstract
Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were evaluated depending on the mapping reference. The choice of different reference genomes proved to have an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications such as including/excluding isolates of particular clades and the estimation of genetic distances. These findings suggest that the single reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended.
Mapping consists in the alignment of reads (i.e., DNA fragments) obtained through high-throughput genome sequencing to a previously assembled reference sequence. It is a common practice in genomic studies to use a single reference for mapping, usually the ‘reference genome’ of a species, a high-quality assembly. However, the selection of an optimal reference is hindered by intrinsic intra-species genetic variability, particularly in bacteria. It is known that genetic differences between the reference genome and the read sequences may produce incorrect alignments during mapping. Eventually, these errors could lead to misidentification of variants and biased reconstruction of phylogenetic trees (which reflect ancestry between different bacterial lineages). To our knowledge, this is the first work to systematically examine the effect of different references for mapping on the inference of tree topology as well as the impact on recombination and natural selection inferences. Furthermore, the novelty of this work relies on a procedure that guarantees that we are evaluating only the effect of the reference. This effect has proved to be pervasive in the five bacterial species that we have studied and, in some cases, alterations in phylogenetic trees could lead to incorrect epidemiological inferences. Hence, the use of different reference genomes may be prescriptive to assess the potential biases of mapping.
Affiliation(s)
- Carlos Valiente-Mullor
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Beatriz Beamud
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
  - * E-mail: (BB); (FG-C)
- Iván Ansari
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Carlos Francés-Cuesta
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Neris García-González
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Lorena Mejía
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
  - Instituto de Microbiología, Colegio de Ciencias Biológicas y Ambientales, Universidad San Francisco de Quito, Quito, Ecuador
- Paula Ruiz-Hueso
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Fernando González-Candelas
  - Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
  - CIBER in Epidemiology and Public Health, Valencia, Spain
  - * E-mail: (BB); (FG-C)
49
Chen Y, Yan W, Xie Z, Guo W, Lu D, Lv Z, Zhang X. Comparative analysis of target gene exon sequencing by cognitive technology using a next generation sequencing platform in patients with lung cancer. Mol Clin Oncol 2021; 14:36. [PMID: 33414916 PMCID: PMC7783722 DOI: 10.3892/mco.2020.2198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/31/2020] [Accepted: 12/09/2020] [Indexed: 11/06/2022]
Abstract
Next generation sequencing (NGS) technology is an increasingly important clinical tool for therapeutic decision-making. However, interpretation of NGS data presents challenges at the point of care, due to limitations in understanding the clinical importance of gene variants and efficiently translating results into actionable information for the clinician. The present study compared two approaches for annotating and reporting actionable genes and gene mutations from tumor samples: the traditional approach of manual curation, annotation and reporting by an experienced molecular tumor bioinformatician; and a cloud-based cognitive technology, with the goal of detecting gene mutations of potential significance in Chinese patients with lung cancer. Data from 285 gene-targeted exon sequencing previously conducted on 115 patient tissue samples between 2014 and 2016 and subsequently manually annotated and evaluated by the Guangdong Lung Cancer Institute (GLCI) research team were analyzed by the Watson for Genomics (WfG) cognitive genomics technology. A comparative analysis of the annotation results of the two methods was conducted to identify quantitative and qualitative differences in the mutations generated. The complete congruence rate of annotation results between WfG analysis and the GLCI bioinformatician was 43.48%. In 65 (56.52%) samples, WfG analysis identified and interpreted, on average, 1.54 more mutation sites in each sample than the manual GLCI review. These mutation sites were located on 27 genes, including EP300, ARID1A, STK11 and DNMT3A. Mutations in the EP300 gene were most prevalent, and were present in 30.77% of samples. The Tumor Mutation Burden (TMB) interpreted by WfG analysis (1.82) was significantly higher than the TMB (0.73) interpreted by GLCI review. Compared with manual curation by a bioinformatician, WfG analysis provided comprehensive insights and additional genetic alterations to inform clinical therapeutic strategies for patients with lung cancer. These findings suggest the valuable role of cognitive computing in increasing efficiency in the comprehensive detection and interpretation of genetic alterations, which may inform opportunities for targeted cancer therapies.
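The percentages in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes, without confirmation from the source, that the 65 samples with additional WfG calls and the fully congruent samples together make up all 115 samples, and that the EP300 percentage uses the 65 discordant samples as its denominator (20 of 65 reproduces 30.77%, whereas no whole number of the 115 samples does).

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as reported in the abstract."""
    return round(100 * part / whole, 2)


total_samples = 115
samples_with_extra_calls = 65
# Assumption: the remaining samples are the fully congruent ones.
congruent_samples = total_samples - samples_with_extra_calls

print(pct(congruent_samples, total_samples))         # 43.48
print(pct(samples_with_extra_calls, total_samples))  # 56.52
print(pct(20, samples_with_extra_calls))             # 30.77

# Rough number of additional mutation sites implied by the reported average.
print(round(1.54 * samples_with_extra_calls))        # 100
```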
Affiliation(s)
- Yu Chen
  - Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510080, P.R. China; Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, The Second School of Clinical Medicine, Southern Medical University, Guangzhou, Guangdong 510280, P.R. China
- Wenqing Yan
  - Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510080, P.R. China; Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, The Second School of Clinical Medicine, Southern Medical University, Guangzhou, Guangdong 510280, P.R. China
- Zhi Xie
  - Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510080, P.R. China; Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, The Second School of Clinical Medicine, Southern Medical University, Guangzhou, Guangdong 510280, P.R. China
- Weibang Guo
  - Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510080, P.R. China; Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, The Second School of Clinical Medicine, Southern Medical University, Guangzhou, Guangdong 510280, P.R. China
- Danxia Lu
  - Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510080, P.R. China; Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, The Second School of Clinical Medicine, Southern Medical University, Guangzhou, Guangdong 510280, P.R. China
- Zhiyi Lv
  - Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510080, P.R. China; Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, The Second School of Clinical Medicine, Southern Medical University, Guangzhou, Guangdong 510280, P.R. China
- Xuchao Zhang
  - Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, School of Medicine, South China University of Technology, Guangzhou, Guangdong 510080, P.R. China; Guangdong Provincial Key Laboratory of Translational Medicine in Lung Cancer, Medical Research Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, The Second School of Clinical Medicine, Southern Medical University, Guangzhou, Guangdong 510280, P.R. China
50
Chiara M, Mandreoli P, Tangaro MA, D'Erchia AM, Sorrentino S, Forleo C, Horner DS, Zambelli F, Pesole G. VINYL: Variant prIoritizatioN by survivaL analysis. Bioinformatics 2020; 36:5590-5599. [PMID: 33367501 DOI: 10.1093/bioinformatics/btaa1067] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/28/2020] [Revised: 10/31/2020] [Accepted: 12/14/2020] [Indexed: 11/12/2022]
Abstract
Motivation: Clinical applications of genome re-sequencing technologies typically generate large amounts of data that need to be carefully annotated and interpreted to identify genetic variants potentially associated with pathological conditions. In this context, accurate and reproducible methods for the functional annotation and prioritization of genetic variants are of fundamental importance. Results: In this paper, we present VINYL, a flexible and fully automated system for the functional annotation and prioritization of genetic variants. Extensive analyses of both real and simulated datasets suggest that VINYL can identify clinically relevant genetic variants in a more accurate manner compared to equivalent state-of-the-art methods, allowing a more rapid and effective prioritization of genetic variants in different experimental settings. As such, we believe that VINYL can establish itself as a valuable tool to assist healthcare operators and researchers in clinical genomics investigations. Availability: VINYL is available at http://beaconlab.it/VINYL and https://github.com/matteo14c/VINYL. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matteo Chiara
  - Department of Biosciences, University of Milan, Milan, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
- Marco Antonio Tangaro
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
- Anna Maria D'Erchia
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy; Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari "Aldo Moro", Bari, Italy
- Sandro Sorrentino
  - Cardiology Unit, Department of Emergency and Organ Transplantation, University of Bari "Aldo Moro", Bari, Italy
- Cinzia Forleo
  - Cardiology Unit, Department of Emergency and Organ Transplantation, University of Bari "Aldo Moro", Bari, Italy
- David S Horner
  - Department of Biosciences, University of Milan, Milan, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
- Federico Zambelli
  - Department of Biosciences, University of Milan, Milan, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
- Graziano Pesole
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy; Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari "Aldo Moro", Bari, Italy