1. Tandon S, Sharma M, Kasar P, Kala A. A cloud-based precision oncology framework for whole genome sequence analysis. Comput Biol Chem 2024; 110:108062. [PMID: 38554501; DOI: 10.1016/j.compbiolchem.2024.108062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 03/05/2024] [Accepted: 03/25/2024] [Indexed: 04/01/2024]
Abstract
Cancer is a widespread disease with a high mortality rate worldwide. Early detection and correct precision treatment, a major concern for cancer patients, can change this scenario. By analyzing a patient's genome, clinicians can identify the best-suited treatments, which treat the patient effectively while minimizing the chances of side effects. We have therefore developed a fast, robust, and efficient precision oncology framework based on whole genome sequencing of the individual's DNA. The platform performs the entire genomic analysis, from quality assessment of the input file to variant annotation and functional prediction, followed by a degree of interpretation. This analysis supports molecular profiling of tumors to identify targetable alterations. The framework takes a FASTQ or BAM file as input and produces two reports: a primary report, which contains the patient's details and a summary of the analysis, and a secondary, detailed report of the many results obtained from the analysis, such as base changes, codon changes, amino acid changes, tumor mutational burden (TMB) analysis, microsatellite instability (MSI) analysis, variant frequencies with their effects and impacts, and affected biomarkers. The framework can be used for cancer treatment guidance, identification and validation of novel biomarkers, oncology research and development, genomic analysis, and gene manipulation.
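The abstract lists TMB analysis among the report contents without saying how it is computed. A common definition is the number of qualifying somatic mutations per megabase of sequence actually interrogated; the minimal Python sketch below assumes that definition, and the function name and example numbers are hypothetical rather than taken from the framework.

```python
def tumor_mutational_burden(n_somatic_mutations: int, callable_bases: int) -> float:
    """Return TMB in mutations per megabase (Mb) of callable sequence.

    Assumption: TMB = qualifying somatic mutations / callable region size in Mb.
    Which mutations qualify (coding only, non-synonymous only, minimum allele
    frequency, ...) differs between pipelines and is left to the caller.
    """
    if n_somatic_mutations < 0:
        raise ValueError("mutation count cannot be negative")
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_somatic_mutations / (callable_bases / 1_000_000)


# Hypothetical example: 300 qualifying mutations over 30 Mb of callable sequence.
print(tumor_mutational_burden(300, 30_000_000))  # 10.0
```

The denominator (panel footprint, callable exome or callable genome) and the mutation filters have to be chosen together; otherwise TMB values are not comparable across assays.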
Affiliation(s)
- Saloni Tandon
  - Celebal Technologies Private Limited, 7th Floor Corporate tower, JLN Marg, Near Jawahar Circle, Malviya Nagar, Jaipur, Rajasthan 302017, India
- Medha Sharma
  - Celebal Technologies Private Limited, 7th Floor Corporate tower, JLN Marg, Near Jawahar Circle, Malviya Nagar, Jaipur, Rajasthan 302017, India
- Pratik Kasar
  - Celebal Technologies Private Limited, 7th Floor Corporate tower, JLN Marg, Near Jawahar Circle, Malviya Nagar, Jaipur, Rajasthan 302017, India
- Anirudh Kala
  - Celebal Technologies Private Limited, 7th Floor Corporate tower, JLN Marg, Near Jawahar Circle, Malviya Nagar, Jaipur, Rajasthan 302017, India
2. Maphosa MN, Steenkamp ET, Kanzi AM, van Wyk S, De Vos L, Santana QC, Duong TA, Wingfield BD. Intra-Species Genomic Variation in the Pine Pathogen Fusarium circinatum. J Fungi (Basel) 2022; 8:jof8070657. [PMID: 35887414; PMCID: PMC9316270; DOI: 10.3390/jof8070657] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/28/2022] [Revised: 06/02/2022] [Accepted: 06/08/2022] [Indexed: 12/10/2022]
Abstract
Fusarium circinatum is an important global pathogen of pine trees. Genome plasticity has been observed in different isolates of the fungus, but no genome comparisons are available. To address this gap, we sequenced five isolates of F. circinatum and assembled them to chromosome level. These genomes were analysed together with the previously published genomes of F. circinatum isolates FSP34 and KS17. Multi-sample variant calling identified a total of 461,683 micro variants (SNPs and small indels) and 1828 macro structural variants, of which 1717 were copy number variants and 111 were inversions. Variant density was higher in the sub-telomeric regions of chromosomes. Variant annotation revealed that genes involved in transcription, transport and metabolism, as well as genes encoding transmembrane proteins, were overrepresented among the gene sets affected by high-impact variants. We present a core genome representing the genomic elements conserved in all isolates and a non-redundant pangenome representing all genomic elements. Whole genome alignments showed that, on average, 93% of the genomic elements were present in all isolates. The results reveal that some genomic elements are not conserved among the isolates and that some variants have high impact. The described genome-scale variation will help to inform novel disease management strategies against the pathogen.
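Multi-sample variant calling of the kind described produces records that are then split into SNPs and small indels and summarized along each chromosome, which is how a pattern such as higher sub-telomeric density becomes visible. A minimal Python sketch, assuming VCF-style REF/ALT strings (indels sharing an anchor base) and fixed-size windows; the records, chromosome length and window size are hypothetical.

```python
from collections import Counter


def classify_small_variant(ref: str, alt: str) -> str:
    """Classify a VCF-style REF/ALT pair as 'SNP', 'MNP', 'insertion' or 'deletion'.

    Simplification: any length difference is treated as an indel, so complex
    events that also substitute bases are not separated out.
    """
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "insertion" if len(alt) > len(ref) else "deletion"


def variant_density(positions, chrom_length: int, window: int):
    """Count variants per fixed-size window along one chromosome (1-based positions)."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        counts[(pos - 1) // window] += 1
    return counts


# Hypothetical records on one chromosome: (position, REF, ALT)
records = [(120, "A", "G"), (950, "AT", "A"), (15_000, "C", "CTT"), (49_900, "G", "T")]
print(Counter(classify_small_variant(r, a) for _, r, a in records))
print(variant_density([p for p, _, _ in records], chrom_length=50_000, window=10_000))  # [2, 1, 0, 0, 1]
```

Plotting the window counts against position, with the chromosome ends at either side, is one simple way to compare sub-telomeric and central densities.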
3. Ghorbani A, Samarfard S, Jajarmi M, Bagheri M, Karbanowicz TP, Afsharifar A, Eskandari MH, Niazi A, Izadpanah K. Highlight of potential impact of new viral genotypes of SARS-CoV-2 on vaccines and anti-viral therapeutics. Gene Rep 2022; 26:101537. [PMID: 35128175; PMCID: PMC8808475; DOI: 10.1016/j.genrep.2022.101537] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/27/2021] [Revised: 11/10/2021] [Accepted: 12/02/2021] [Indexed: 12/23/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causal agent of the coronavirus disease 2019 (COVID-19) pandemic, has infected millions of people globally. Genetic variation and selective pressures lead to the accumulation of single nucleotide polymorphisms (SNPs) within the viral genome that may affect virulence, transmission rate, viral recognition, and the efficacy of prophylactic and interventional measures. To address these concerns at the genomic level, we assessed the phylogeny and SNPs of the SARS-CoV-2 mutant population collected to date in Iran in relation to globally reported variants. Phylogenetic analysis of mutant strains revealed the occurrence of the variants known as B.1.1.7 (Alpha), B.1.525 (Eta), and B.1.617 (Delta), which appear to have arisen independently in Iran. SNP analysis of the Iranian sequences revealed that the mutations were predominantly located within the S protein-coding region, with most SNPs localizing to the S1 subunit. Seventeen S1-localizing SNPs occurred in the receptor-binding domain (RBD) that interacts with ACE2 of the host cell. Importantly, many of these SNPs are predicted to influence the binding of antibodies and anti-viral therapeutics, indicating that the adaptive host response appears to impose a selective pressure that drives the evolution of the virus in this closed population by enhancing virulence. The SNPs detected within these mutant cohorts are discussed with respect to current prophylactic measures and therapeutic interventions.
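Localizing SNPs to the S1 subunit or the RBD amounts to comparing each amino-acid position with domain boundaries. The sketch below assumes commonly used approximate spike annotations (published boundaries differ by a few residues); the input substitutions are well-known spike changes chosen only to illustrate the lookup, not the list reported for the Iranian sequences.

```python
import re

# Approximate SARS-CoV-2 spike (S) domain boundaries (1-based amino-acid residues).
# Assumption: commonly used annotations; exact boundaries vary between sources.
SPIKE_DOMAINS = [
    ("S1", 13, 685),
    ("NTD", 14, 305),
    ("RBD", 319, 541),
    ("S2", 686, 1273),
]

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z*])")


def parse_substitution(label: str):
    """Split a protein substitution such as 'N501Y' into ('N', 501, 'Y')."""
    match = SUBSTITUTION.fullmatch(label)
    if match is None:
        raise ValueError(f"not a simple substitution: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def domains_for(label: str):
    """Return the names of all annotated spike domains containing the substitution."""
    _, pos, _ = parse_substitution(label)
    return [name for name, start, end in SPIKE_DOMAINS if start <= pos <= end]


for mutation in ["N501Y", "E484K", "L452R", "D614G", "P681H"]:
    print(mutation, domains_for(mutation))
```

Because the RBD is nested inside S1, a position can belong to several domains, which is why the lookup returns a list rather than a single label.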
Key Words
- ACE2, Angiotensin-converting enzyme 2
- Antiviral drugs
- Bioinformatics
- CSSE, Center for Systems Science and Engineering
- E, Envelope
- FP, Fusion peptide
- HR1, Heptad repeat 1
- HR2, Heptad repeat 2
- IC, Intracellular domain
- JHU, Johns Hopkins University
- M, Membrane
- Mutation detection
- N, Nucleocapsid
- NAG, N-acetylglucosamine
- NSP, Non-structural proteins
- NTD, N-terminal domain
- Phylogenetic analysis
- RBD, Receptor-binding domain
- S, Spike glycoprotein
- SARS-CoV-2
- SARS-CoV-2, Severe acute respiratory syndrome coronavirus 2
- SD1, Subdomain 1
- SD2, Subdomain 2
- SNP, Single nucleotide polymorphism
- SP, Structural proteins
- TM, Transmembrane region
- UTRs, Untranslated regions
- Viral vaccines
Affiliation(s)
- Abozar Ghorbani
  - Plant Virology Research Centre, College of Agriculture, Shiraz University, Shiraz, Iran
- Samira Samarfard
  - Berrimah Veterinary Laboratory, Department of Primary Industry and Resources, Berrimah, NT 0828 Australia
- Maziar Jajarmi
  - Department of Pathobiology, Faculty of Veterinary Medicine, Shahid Bahonar University of Kerman, Kerman, Iran
- Mahboube Bagheri
  - Department of Food Science and Technology, Bardsir Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran
- Alireza Afsharifar
  - Plant Virology Research Centre, College of Agriculture, Shiraz University, Shiraz, Iran
- Mohammad Hadi Eskandari
  - Department of Food Science and Technology, College of Agriculture, Shiraz University, Shiraz, Iran
- Ali Niazi
  - Institute of Biotechnology, College of Agriculture, Shiraz University, Shiraz, Iran
4. Oh JH, Lee YJ, Byeon EJ, Kang BC, Kyeoung DS, Kim CK. Whole-genome resequencing and transcriptomic analysis of genes regulating anthocyanin biosynthesis in black rice plants. 3 Biotech 2018; 8:115. [PMID: 29430376; PMCID: PMC5801106; DOI: 10.1007/s13205-018-1140-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/21/2017] [Accepted: 01/29/2018] [Indexed: 12/11/2022]
Abstract
Anthocyanins are involved in many diverse functions in rice, but their benefits have yet to be clearly demonstrated. Our objective in this study was to identify anthocyanin-related genes in black rice plants, which we did using a combination of whole-genome resequencing, RNA sequencing (RNA-seq), microarray experiments, and reverse-transcriptase polymerase chain reaction (RT-PCR). Using multi-layer screening of 30 rice accessions, we identified 172,922 single-nucleotide polymorphisms (SNPs) and 1276 differentially expressed genes that appear to be related to anthocyanin biosynthesis. Using intensive selective-sweep analysis, we identified 18 putative genes from the 172,922 SNPs. These 18 SNP-derived candidate genes were not significantly correlated with the RNA-seq expression pattern or with other well-known anthocyanin biosynthesis/metabolism genes. Using RNA-seq transcriptome analysis, we also identified nine putative genes from the 1276 differentially expressed genes. In addition, we identified four phylogenetic groups from these nine candidate genes and 51 pathway-network genes. Finally, we verified nine anthocyanin-related genes using a newly designed microarray and semi-quantitative RT-PCR. We suggest that these nine genes may be involved in the regulation of anthocyanin biosynthesis and/or metabolism.
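The abstract does not state how the 1276 differentially expressed genes were selected. As an assumption-laden illustration only, a fold-change screen can be sketched as below; a real RNA-seq analysis would also test significance across replicates and correct for multiple testing (for example with DESeq2 or edgeR), which this sketch omits. Gene names, expression values and the 2-fold (|log2FC| >= 1) threshold are hypothetical.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2((treated + pc) / (control + pc)); the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def candidate_degs(expression, min_abs_lfc: float = 1.0):
    """Return gene IDs whose |log2 fold change| meets the threshold.

    `expression` maps gene ID -> (black_rice_value, control_value), e.g. normalized TPM.
    """
    return [gene for gene, (treated, control) in expression.items()
            if abs(log2_fold_change(treated, control)) >= min_abs_lfc]


# Hypothetical normalized expression values
expr = {"geneA": (63.0, 15.0), "geneB": (10.0, 9.0), "geneC": (0.0, 7.0)}
print(candidate_degs(expr))  # ['geneA', 'geneC']
```

Both up- and down-regulated genes pass because the absolute fold change is compared; geneC, absent in the black rice sample, shows up with a log2 fold change of -3.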
Affiliation(s)
- Jae-Hyeon Oh
  - Genomics Division, National Institute of Agricultural Sciences, Jeonju, 54874 Korea
- Ye-Ji Lee
  - Department of Environmental Resources, Sangmyung University, Cheonan, 31066 Korea
- Eun-Ju Byeon
  - Department of Crop Science and Biotechnology, Chonbuk National University, Jeonju, 54896 Korea
- Byeong-Chul Kang
  - Codes Division, Insilicogen Inc., Suwon, 16954 Gyeonggi-do Korea
- Dong-Soo Kyeoung
  - Codes Division, Insilicogen Inc., Suwon, 16954 Gyeonggi-do Korea
- Chang-Kug Kim
  - Genomics Division, National Institute of Agricultural Sciences, Jeonju, 54874 Korea
5. Tammone MN, Pardiñas UFJ, Lacey EA. Contrasting patterns of Holocene genetic variation in two parapatric species of Ctenomys from Northern Patagonia, Argentina. Biol J Linn Soc Lond 2017. [DOI: 10.1093/biolinnean/blx118] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022]
Affiliation(s)
- Mauro N Tammone
  - Instituto de Diversidad y Evolución Austral (IDEAus-CONICET), Argentina
  - Programa de Estudios Aplicados a la Conservación del Parque Nacional Nahuel Huapi (CENAC-PNNH, CONICET), Argentina
- Eileen A Lacey
  - Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, USA
6. Ram H, Kumar A, Thomas L, Singh VP. In silico Approach to Study Adaptive Divergence in Nucleotide Composition of the 16S rRNA Gene Among Bacteria Thriving Under Different Temperature Regimes. J Comput Biol 2014; 21:753-9. [DOI: 10.1089/cmb.2014.0116] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Affiliation(s)
- Hari Ram
  - Applied Microbiology and Biotechnology Laboratory, Department of Botany, University of Delhi, Delhi, India
- Alok Kumar
  - Applied Microbiology and Biotechnology Laboratory, Department of Botany, University of Delhi, Delhi, India
- Lebin Thomas
  - Applied Microbiology and Biotechnology Laboratory, Department of Botany, University of Delhi, Delhi, India
- Ved Pal Singh
  - Applied Microbiology and Biotechnology Laboratory, Department of Botany, University of Delhi, Delhi, India
7. Bamidele O, Van As P, Elferink MG. Molecular characterization of the leptin receptor gene as a candidate gene in the pulmonary hypertension syndrome in broiler chickens. Pak J Biol Sci 2013; 15:1187-90. [PMID: 23755410; DOI: 10.3923/pjbs.2012.1187.1190] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
The leptin receptor gene (LEPR) is a candidate gene for understanding the genetic basis of pulmonary hypertension syndrome (PHS) in broilers. Identifying and evaluating genetic polymorphisms in LEPR may link traits such as body weight (BW) and total ventricle weight (TV) to the development of PHS. In this study, primers were designed in exons and in upstream and downstream sequences to identify mutations in LEPR in four broilers selected with respect to PHS-related traits. About 77% of the 11,820 bp of the LEPR gene covered by the primers were sequenced. No mutations associating the traits with the occurrence of PHS were found among the chickens. However, 42 single nucleotide polymorphisms (SNPs) and four indels were found between the reference sequence of the red jungle fowl and the experimental population. Ten of these mutations had not previously been reported in the LEPR genomic and transcript sequences (NP_989654.1, ENSGALT00000018009). These 10 mutations comprise six intronic SNPs, two indels and two non-synonymous SNPs. The two new non-synonymous SNPs, G301A and A1637G, led to the amino acid changes A89T and N534S, respectively.
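Whether the reported nucleotide changes can produce the reported amino-acid changes can be checked against the standard genetic code. The abstract does not state the coordinate reference for G301A and A1637G, so the sketch below only assumes that position 1 falls on a first codon position (codon numbers are not recomputed), and because the actual codons are not given it tries every alanine and asparagine codon. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built from the TCAG ordering
# in which the first codon base varies slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_phase(position: int) -> int:
    """Codon position (1, 2 or 3) of a 1-based coordinate whose position 1 starts a codon."""
    return (position - 1) % 3 + 1


def mutate_codon(codon: str, phase: int, ref: str, alt: str) -> str:
    """Replace base `ref` with `alt` at codon position `phase` (1-based)."""
    if codon[phase - 1] != ref:
        raise ValueError(f"expected {ref} at codon position {phase} of {codon}")
    return codon[:phase - 1] + alt + codon[phase:]


# G301A falls on a first codon position: every alanine codon (GCN) becomes ACN.
for third in "TCAG":
    before = "GC" + third
    after = mutate_codon(before, codon_phase(301), "G", "A")
    print(before, CODON_TABLE[before], "->", after, CODON_TABLE[after])

# A1637G falls on a second codon position: both asparagine codons (AAY) become AGY.
for third in "TC":
    before = "AA" + third
    after = mutate_codon(before, codon_phase(1637), "A", "G")
    print(before, CODON_TABLE[before], "->", after, CODON_TABLE[after])
```

Every line prints A -> T for the first change and N -> S for the second, so under this assumption both reported amino-acid changes are consistent with the stated nucleotide changes.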
Affiliation(s)
- O Bamidele
  - Animal Breeding and Genomics Centre, Wageningen University, The Netherlands
8. Abnizova I, Leonard S, Skelly T, Brown A, Jackson D, Gourtovaia M, Qi G, Te Boekhorst R, Faruque N, Lewis K, Cox T. Analysis of context-dependent errors for Illumina sequencing. J Bioinform Comput Biol 2012; 10:1241005. [PMID: 22809341; DOI: 10.1142/s0219720012410053] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
The new generation of short-read sequencing technologies requires reliable measures of data quality. Such measures are especially important for variant calling: in SNP calling in particular, a large number of false-positive SNPs (FP-SNPs) may be obtained, so putative SNPs must be distinguished from sequencing or other errors. We found that, to identify an FP-SNP, not only the probability of a sequencing error (i.e. the quality value) is important, but also the conditional probability of "correcting" this error (the probability of the "second-best call", conditional on that of the first call). Surprisingly, around 80% of mismatches can be "corrected" with this second call. Another way to reduce the FP-SNP rate is to retrieve DNA motifs that appear to be prone to sequencing errors and to attach a corresponding conditional quality value to these motifs. We have developed several measures to distinguish between sequencing errors and candidate SNPs, based on a base call's nucleotide context and its mismatch type. In addition, we propose a simple method to correct the majority of mismatches, based on the conditional probability of their second-best intensity call, and we attach to each mismatch a corresponding second-call confidence (quality value) of being corrected.
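The "second-best call" idea can be illustrated with per-cycle channel intensities: when the brightest channel disagrees with the reference, check whether the next-brightest one matches it. This is a minimal sketch under a stated simplification, not the authors' procedure; the intensity values are hypothetical, and converting intensities into calibrated probabilities on real Illumina data is considerably more involved.

```python
BASES = "ACGT"


def best_two_calls(intensities: dict):
    """Return (first_call, second_call, p_second) for one base-calling cycle.

    `intensities` maps each of A, C, G, T to a non-negative channel intensity.
    Simplifying assumption (not the authors' model): the probability of the
    second call, conditional on the first call being rejected, is its share of
    the intensity that remains after removing the first call.
    """
    ranked = sorted(BASES, key=lambda base: intensities[base], reverse=True)
    first, second = ranked[0], ranked[1]
    remaining = sum(intensities[b] for b in BASES) - intensities[first]
    p_second = intensities[second] / remaining if remaining > 0 else 0.0
    return first, second, p_second


def correctable_fraction(mismatches):
    """Fraction of mismatches whose second-best call equals the reference base.

    `mismatches` holds (reference_base, intensities) pairs for calls whose
    first-best base disagrees with the reference.
    """
    if not mismatches:
        return 0.0
    hits = sum(1 for ref, inten in mismatches if best_two_calls(inten)[1] == ref)
    return hits / len(mismatches)


# Hypothetical mismatching cycles: (reference base, channel intensities)
mismatches = [
    ("A", {"A": 40, "C": 55, "G": 3, "T": 2}),   # called C, second call A: correctable
    ("G", {"A": 5, "C": 2, "G": 30, "T": 63}),   # called T, second call G: correctable
    ("C", {"A": 10, "C": 5, "G": 70, "T": 15}),  # called G, second call T: not correctable
]
print(correctable_fraction(mismatches))
```

In this toy set two of the three mismatches are recovered by the second call; the abstract's figure of around 80% refers to the authors' real data, not to anything computed here.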
Affiliation(s)
- Irina Abnizova
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
9. Vanneste K, Van de Peer Y, Maere S. Inference of genome duplications from age distributions revisited. Mol Biol Evol 2012; 30:177-90. [PMID: 22936721; DOI: 10.1093/molbev/mss214] [Citation(s) in RCA: 112] [Impact Index Per Article: 9.3] [Indexed: 12/20/2022]
Abstract
Whole-genome duplications (WGDs), thought to facilitate evolutionary innovations and adaptations, have been uncovered in many phylogenetic lineages. WGDs are frequently inferred from duplicate age distributions, where they manifest themselves as peaks against a small-scale duplication background. However, the interpretation of duplicate age distributions is complicated by the use of K(S), the number of synonymous substitutions per synonymous site, as a proxy for the age of paralogs. Two particular concerns are (i) the stochastic nature of synonymous substitutions, which leads to increasing uncertainty in K(S) with increasing age since duplication, and (ii) K(S) saturation, caused by the inability of evolutionary models to fully correct for multiple substitutions at the same site. K(S) stochasticity is expected to erode the signal of older WGDs, whereas K(S) saturation may lead to artificial peaks in the distribution. Here, we investigate the consequences of these effects on K(S)-based age distributions and WGD inference by simulating the evolution of duplicated sequences according to predefined real age distributions and re-estimating the corresponding K(S) distributions. We show that, although K(S) estimates can be used for WGD inference far beyond the commonly accepted K(S) threshold of 1, K(S) saturation effects can cause artificial peaks at higher ages. Moreover, K(S) stochasticity and saturation may lead to confounded peaks encompassing multiple WGD events and/or saturation artifacts. We argue that K(S) effects need to be properly accounted for when inferring WGDs from age distributions and that failing to do so could lead to false inferences.
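Both concerns, the growing stochastic spread of K(S) with age and saturation, can be seen in a deliberately simplified simulation. This is a sketch under stated assumptions, not the authors' codon-level simulation: sites are independent, evolve under the Jukes-Cantor (JC69) model and stand in for synonymous sites, and the site count, distances and seed are arbitrary choices.

```python
import math
import random
import statistics


def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from the observed proportion p of differing sites.

    Returns math.inf at or beyond saturation (p >= 0.75), where the correction is undefined.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def expected_p(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def simulate_estimates(d: float, n_sites: int, reps: int, seed: int = 0):
    """Re-estimate d from `reps` simulated sequence pairs with `n_sites` independent sites."""
    rng = random.Random(seed)
    p_true = expected_p(d)
    estimates = []
    for _ in range(reps):
        diffs = sum(rng.random() < p_true for _ in range(n_sites))
        estimates.append(jc_distance(diffs / n_sites))
    return estimates


def summarize(estimates):
    """Return (mean, standard deviation, saturated fraction); mean and sd use finite estimates only."""
    finite = [e for e in estimates if math.isfinite(e)]
    saturated = 1.0 - len(finite) / len(estimates)
    return statistics.mean(finite), statistics.stdev(finite), saturated


for d in (0.5, 1.0, 2.0, 3.0):
    mean, sd, sat = summarize(simulate_estimates(d, n_sites=500, reps=1000, seed=42))
    print(f"true d={d:.1f}  mean estimate={mean:.2f}  sd={sd:.2f}  saturated={sat:.1%}")
```

With a fixed number of sites, the spread of the estimates widens as the true distance grows, and once the observed proportion of differing sites approaches 0.75 a growing share of replicates cannot be corrected at all, while those just below the limit are highly variable. More sites narrow the spread, so the usable K(S) range depends on the data.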
Affiliation(s)
- Kevin Vanneste
  - Department of Plant Systems Biology, VIB, Ghent, Belgium
10. Phylogenetic relationships among the Caribbean members of the Cliona viridis complex (Porifera, Demospongiae, Hadromerida) using nuclear and mitochondrial DNA sequences. Mol Phylogenet Evol 2012; 64:271-84. [DOI: 10.1016/j.ympev.2012.03.021] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 08/26/2011] [Revised: 03/31/2012] [Accepted: 03/31/2012] [Indexed: 11/22/2022]
11. Bioinformatic analysis of fruit-specific expressed sequence tag libraries of Diospyros kaki Thunb.: view at the transcriptome at different developmental stages. 3 Biotech 2011; 1:35-45. [PMID: 22558534; PMCID: PMC3339603; DOI: 10.1007/s13205-011-0005-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/30/2011] [Accepted: 03/21/2011] [Indexed: 11/06/2022]
Abstract
We present a systematic analysis of Diospyros kaki expressed sequence tags (ESTs) generated from developmental stage-specific libraries. A total of 2,529 tentative unigenes were identified in the MF library, whereas the OYF library contained 3,775 tentative unigenes. In the MF library, 325 EST-simple sequence repeats (EST-SSRs) were detected in 296 putative unigenes, an occurrence of 11.7% with a frequency of 1 SSR/3.16 kb, whereas the OYF library had 407 EST-SSRs in 352 putative unigenes, an EST-SSR occurrence of 10.8% with a frequency of 1 SSR/2.92 kb. We observed a higher frequency of SNPs and indels in the OYF library (20.94 SNPs/indels per 100 bp) than in the MF library (0.74 SNPs/indels per 100 bp). A combined homology and secondary-structure analysis identified a potential miRNA precursor, an ortholog of miR159, and potential miR159 targets in the development-specific ESTs of D. kaki.
12. Seligmann H, Krishnan NM, Rao BJ. Possible multiple origins of replication in primate mitochondria: Alternative role of tRNA sequences. J Theor Biol 2006; 241:321-32. [PMID: 16430924; DOI: 10.1016/j.jtbi.2005.11.035] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.7] [Received: 09/12/2005] [Revised: 11/29/2005] [Accepted: 11/30/2005] [Indexed: 11/20/2022]
Abstract
DNA replication in vertebrate mitochondria is usually directional, leaving different portions of the genome single-stranded for different periods of time. During this time, mutations resulting from deamination of cytosine to thymine and of adenine to guanine accumulate on the heavy strand. Therefore, T/C and G/A ratios increase along mitochondrial genomes in proportion to the time spent single-stranded during replication. Such trends exist at third codon positions for base ratios averaged across genes in individual genomes, as well as for gene-specific and site-specific substitution frequencies estimated using phylogenetic methods. We use multiple regressions to test whether each of the 12 tRNA clusters in 19 primate mitochondrial genomes could function as an alternative origin of light-strand replication (OL). We provide a general algorithm for calculating the time a given site spends single-stranded for any location of the site and of OL. For codon positions 1, 2, and 3, respectively, 23%, 9% and 35% of tRNA gene clusters have significant (p < 0.05) deamination gradients originating from them. The strength of the deamination gradient originating from tRNA gene clusters varies among species and, for five clusters, correlates with the tendency of the tRNA genes in each cluster to form secondary structures that resemble the structure of OL. This is notably true at all codon positions for tRNA-Lys, which, in the absence of nuclear regulation, forms secondary structures resembling the hairpin structure of OL. For two tRNA gene clusters, correlations were statistically significant but opposite to the direction expected from the known unidirectional replication, which is possibly compatible with bidirectional replication. A few substitutions in tRNA sequences can be neutral at the level of cloverleaf structure and function, yet significantly alter the capacity to form OL-like structures, causing sudden evolution of genome-wide nucleotide contents.
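The deamination signal described above is read from T/C and G/A base ratios at third codon positions. The sketch below only computes those ratios for one coding sequence; regressing them on the time each site spends single-stranded, which is the authors' algorithm, is not reproduced. The input sequence is hypothetical, and which mitochondrial strand the ratios refer to is left to the caller.

```python
import math
from collections import Counter


def third_position_ratios(cds: str):
    """Return (T/C, G/A) ratios of bases at third codon positions.

    The sequence is read in frame from its first base and a trailing partial
    codon is ignored. Bases are counted on the strand supplied, so the caller
    must decide which strand (e.g. heavy or light) the ratios refer to.
    A zero denominator gives math.inf, or math.nan when both counts are zero.
    """
    cds = cds.upper()
    thirds = Counter(cds[i + 2] for i in range(0, len(cds) - 2, 3))

    def ratio(num: str, den: str) -> float:
        if thirds[den] == 0:
            return math.nan if thirds[num] == 0 else math.inf
        return thirds[num] / thirds[den]

    return ratio("T", "C"), ratio("G", "A")


# Hypothetical coding sequence: codons ATG TTT CCC GGA AAT
print(third_position_ratios("ATGTTTCCCGGAAAT"))  # (2.0, 1.0)
```

Computing these ratios gene by gene and ordering the genes by genomic position gives the along-genome trend that the regressions in the study relate to time spent single-stranded.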
Affiliation(s)
- Hervé Seligmann
  - Department of Evolution, Systematics and Ecology, The Hebrew University of Jerusalem, 91904, Israel