1
An O, Tan KT, Li Y, Li J, Wu CS, Zhang B, Chen L, Yang H. CSI NGS Portal: An Online Platform for Automated NGS Data Analysis and Sharing. Int J Mol Sci 2020; 21:ijms21113828. [PMID: 32481589 PMCID: PMC7312552 DOI: 10.3390/ijms21113828] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/15/2020] [Revised: 05/24/2020] [Accepted: 05/26/2020] [Indexed: 12/18/2022]
Abstract
Next-generation sequencing (NGS) has become a widely used technology in biomedical research for understanding the role of the molecular genetics of cells in health and disease. A variety of computational tools have been developed to analyse the rapidly growing volume of NGS data, but these tools often require bioinformatics skills, tedious work and a significant amount of time. To streamline data processing and bridge the gap between biologists and bioinformaticians, we developed CSI NGS Portal, an online platform that gathers established bioinformatics pipelines to provide fully automated NGS data analysis and sharing through a user-friendly website. The portal currently provides 16 standard pipelines for analysing data from DNA, RNA, smallRNA, ChIP, RIP, 4C, SHAPE, circRNA, eCLIP, Bisulfite and scRNA sequencing, and can be readily expanded with new pipelines. Users can upload raw data in FASTQ format and submit jobs in a few clicks, and the results are accessible to them via the portal to view, download or share in real time. Depending on the pipeline, the output can be used directly as the final report or as input for other tools. Overall, CSI NGS Portal helps researchers rapidly analyse their NGS data and share results with colleagues without the aid of a bioinformatician. The portal is freely available at: https://csibioinfo.nus.edu.sg/csingsportal.
Affiliation(s)
- Omer An
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Correspondence: Tel.: +65-8452-1766 (O.A.); +65-6601-1533 (H.Y.)
- Kar-Tong Tan
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Ying Li
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Jia Li
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Chan-Shuo Wu
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Bin Zhang
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Leilei Chen
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Department of Anatomy, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117594, Singapore
- Henry Yang
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Correspondence: Tel.: +65-8452-1766 (O.A.); +65-6601-1533 (H.Y.)
2
Ruiz S, Chandakkar P, Zhao H, Papoin J, Chatterjee PK, Christen E, Metz CN, Blanc L, Campagne F, Marambaud P. Tacrolimus rescues the signaling and gene expression signature of endothelial ALK1 loss-of-function and improves HHT vascular pathology. Hum Mol Genet 2017; 26:4786-4798. [PMID: 28973643 PMCID: PMC5886173 DOI: 10.1093/hmg/ddx358] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 05/31/2017] [Revised: 08/09/2017] [Accepted: 09/11/2017] [Indexed: 01/02/2023]
Abstract
Hereditary hemorrhagic telangiectasia (HHT) is a highly debilitating and life-threatening genetic vascular disorder arising from endothelial cell (EC) proliferation and hypervascularization, for which no cure exists. Because HHT is caused by loss-of-function mutations in bone morphogenetic protein 9 (BMP9)-ALK1-Smad1/5/8 signaling, interventions aimed at activating this pathway are of therapeutic value. We interrogated the whole transcriptome of human umbilical vein ECs (HUVECs) and found that ALK1 signaling inhibition was associated with a specific pro-angiogenic gene expression signature, which included a significant elevation of DLL4 expression. By screening the NIH clinical collections of FDA-approved drugs, we identified tacrolimus (FK-506) as the most potent activator of ALK1 signaling in BMP9-challenged C2C12 reporter cells. In HUVECs, tacrolimus activated Smad1/5/8 and opposed the pro-angiogenic gene expression signature associated with ALK1 loss-of-function, notably by reducing Dll4 expression. In these cells, tacrolimus also inhibited Akt and p38 stimulation by vascular endothelial growth factor, a major driver of angiogenesis. In the BMP9/10-immunodepleted postnatal retina, a mouse model of HHT vascular pathology, tacrolimus activated endothelial Smad1/5/8 and prevented the Dll4 overexpression and hypervascularization associated with this model. Finally, tacrolimus stimulated Smad1/5/8 signaling in C2C12 cells expressing BMP9-unresponsive ALK1 HHT mutants and in HHT patient blood outgrowth ECs. Tacrolimus repurposing therefore has therapeutic potential in HHT.
Affiliation(s)
- Santiago Ruiz
- Litwin-Zucker Research Center for the Study of Alzheimer's Disease
- Haitian Zhao
- Litwin-Zucker Research Center for the Study of Alzheimer's Disease
- Prodyot K Chatterjee
- Center for Biomedical Science, The Feinstein Institute for Medical Research, Manhasset, NY 11030, USA
- Erica Christen
- Litwin-Zucker Research Center for the Study of Alzheimer's Disease
- Christine N Metz
- Center for Biomedical Science, The Feinstein Institute for Medical Research, Manhasset, NY 11030, USA
- Hofstra Northwell School of Medicine, Hempstead, NY 11549, USA
- Lionel Blanc
- Center for Autoimmune and Musculoskeletal Disorders
- Hofstra Northwell School of Medicine, Hempstead, NY 11549, USA
- Fabien Campagne
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine
- Department of Physiology and Biophysics, The Weill Cornell Medical College, New York, NY 10021, USA
- Philippe Marambaud
- Litwin-Zucker Research Center for the Study of Alzheimer's Disease
- Hofstra Northwell School of Medicine, Hempstead, NY 11549, USA
3
A mouse model of hereditary hemorrhagic telangiectasia generated by transmammary-delivered immunoblocking of BMP9 and BMP10. Sci Rep 2016; 6:37366. [PMID: 27874028 PMCID: PMC5118799 DOI: 10.1038/srep37366] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 04/07/2016] [Accepted: 10/27/2016] [Indexed: 01/03/2023]
Abstract
Hereditary hemorrhagic telangiectasia (HHT) is a potentially life-threatening genetic vascular disorder caused by loss-of-function mutations in the genes encoding activin receptor-like kinase 1 (ALK1), endoglin, Smad4, and bone morphogenetic protein 9 (BMP9). Injections of mouse neonates with BMP9/10 blocking antibodies lead to HHT-like vascular defects in the postnatal retinal angiogenesis model. Mothers and their newborns share the same immunity through the transfer of maternal antibodies during lactation. Here, we investigated whether the transmammary delivery route could improve the ease and consistency of administering anti-BMP9/10 antibodies in the postnatal retinal angiogenesis model. We found that anti-BMP9/10 antibodies, when intraperitoneally injected into lactating dams, are efficiently transferred into the blood circulation of lactationally-exposed neonatal pups. Strikingly, pups receiving anti-BMP9/10 antibodies via lactation displayed consistent and robust vascular pathology in the retina, which included hypervascularization and defects in arteriovenous specification, as well as the presence of multiple and massive arteriovenous malformations. Furthermore, RNA-Seq analyses of neonatal retinas identified an increase in the key pro-angiogenic factor, angiopoietin-2, as the most significant change in gene expression triggered by the transmammary delivery of anti-BMP9/10 antibodies. Transmammary-delivered BMP9/10 immunoblocking in the mouse neonatal retina is therefore a practical, noninvasive, reliable, and robust model of HHT vascular pathology.
4
Mesnard L, Muthukumar T, Burbach M, Li C, Shang H, Dadhania D, Lee JR, Sharma VK, Xiang J, Suberbielle C, Carmagnat M, Ouali N, Rondeau E, Friedewald JJ, Abecassis MM, Suthanthiran M, Campagne F. Exome Sequencing and Prediction of Long-Term Kidney Allograft Function. PLoS Comput Biol 2016; 12:e1005088. [PMID: 27684477 PMCID: PMC5042552 DOI: 10.1371/journal.pcbi.1005088] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 03/21/2016] [Accepted: 07/26/2016] [Indexed: 11/18/2022]
Abstract
Current strategies to improve graft outcome following kidney transplantation consider information at the human leukocyte antigen (HLA) loci. Cell surface antigens other than HLA may also serve as stimuli and targets for the anti-allograft immune response and influence long-term graft outcomes. We therefore performed exome sequencing of DNA from kidney graft recipients and their living donors and estimated all possible cell surface antigen mismatches for a given donor/recipient pair by computing the number of amino acid mismatches in trans-membrane proteins. We designated this tally the allogenomics mismatch score (AMS). We examined the association between the AMS and post-transplant estimated glomerular filtration rate (eGFR) using mixed models, considering transplants from three independent cohorts (a total of 53 donor-recipient pairs, 106 exomes, and 239 eGFR measurements). We found that the AMS has a significant effect on eGFR (mixed model, effect size across the entire range of the score: -19.4 [-37.7, -1.1], P = 0.0042, χ² = 8.1919, d.f. = 1) that is independent of HLA-A, B, DR matching, donor age, and time post-transplantation. The AMS effect is consistent across the three independent cohorts studied and similar to the strong effect size of donor age. Taken together, these results show that the AMS, a novel tool to quantify amino acid mismatches in trans-membrane proteins in individual donor/recipient pairs, is a strong, robust predictor of long-term graft function in kidney transplant recipients.
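The mismatch tally described in this abstract can be illustrated with a short Python sketch. It assumes a simplified, hypothetical input format in which each individual's genotype maps a (protein, residue position) pair to the set of amino acids carried on their two alleles, and it counts each donor amino acid that is absent from the recipient at the same position, restricted to trans-membrane proteins. The counting direction (donor residues the recipient lacks) and the names `allogenomics_mismatch_score` and `Genotype` are illustrative assumptions, not the authors' published implementation.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: (protein_id, residue_position) -> set of amino
# acids (one-letter codes) observed across the individual's two alleles.
Genotype = Dict[Tuple[str, int], Set[str]]


def allogenomics_mismatch_score(donor: Genotype, recipient: Genotype,
                                transmembrane: Set[str]) -> int:
    """Simplified AMS sketch (assumed definition): count donor amino acids
    absent from the recipient at each shared site in trans-membrane proteins."""
    score = 0
    for (protein, pos), donor_aas in donor.items():
        if protein not in transmembrane:
            continue  # only cell-surface (trans-membrane) proteins contribute
        recipient_aas = recipient.get((protein, pos))
        if recipient_aas is None:
            continue  # site not genotyped in the recipient; skip it
        # each donor residue not carried by the recipient counts once
        score += len(donor_aas - recipient_aas)
    return score


# Toy genotypes; only P1 is treated as a trans-membrane protein here.
donor = {("P1", 10): {"A", "V"}, ("P1", 20): {"L"}, ("P2", 5): {"G", "S"}}
recipient = {("P1", 10): {"A"}, ("P1", 20): {"L"}, ("P2", 5): {"G"}}
print(allogenomics_mismatch_score(donor, recipient, {"P1"}))  # 1: only V at P1:10
```

Because this sketch scores in one direction, swapping donor and recipient can give a different value; in the toy example the recipient carries no residue that the donor lacks, so the reverse score is 0.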
Affiliation(s)
- Laurent Mesnard
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, United States of America; Department of Physiology and Biophysics, The Weill Cornell Medical College, New York, New York, United States of America
- INSERM UMR1155 et Service des Urgences Néphrologiques et Transplantation Rénale, APHP, Hôpital Tenon, Paris, France
- Sorbonne Universités, UPMC Université Paris 06, Paris, France
- Thangamani Muthukumar
- Division of Nephrology and Hypertension, Weill Cornell Medical College, New York, New York, United States of America
- Department of Transplantation Medicine, New York Presbyterian Hospital, New York, New York, United States of America
- Maren Burbach
- Division of Nephrology and Hypertension, Weill Cornell Medical College, New York, New York, United States of America
- Carol Li
- Division of Nephrology and Hypertension, Weill Cornell Medical College, New York, New York, United States of America
- Huimin Shang
- Genomics Core Facility, Weill Cornell Medical College, New York, New York, United States of America
- Darshana Dadhania
- Division of Nephrology and Hypertension, Weill Cornell Medical College, New York, New York, United States of America
- Department of Transplantation Medicine, New York Presbyterian Hospital, New York, New York, United States of America
- John R. Lee
- Division of Nephrology and Hypertension, Weill Cornell Medical College, New York, New York, United States of America
- Department of Transplantation Medicine, New York Presbyterian Hospital, New York, New York, United States of America
- Vijay K. Sharma
- Division of Nephrology and Hypertension, Weill Cornell Medical College, New York, New York, United States of America
- Jenny Xiang
- Genomics Core Facility, Weill Cornell Medical College, New York, New York, United States of America
- Nacera Ouali
- INSERM UMR1155 et Service des Urgences Néphrologiques et Transplantation Rénale, APHP, Hôpital Tenon, Paris, France
- Eric Rondeau
- INSERM UMR1155 et Service des Urgences Néphrologiques et Transplantation Rénale, APHP, Hôpital Tenon, Paris, France
- Sorbonne Universités, UPMC Université Paris 06, Paris, France
- John J. Friedewald
- Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Michael M. Abecassis
- Comprehensive Transplant Center, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Manikkam Suthanthiran
- Division of Nephrology and Hypertension, Weill Cornell Medical College, New York, New York, United States of America
- Department of Transplantation Medicine, New York Presbyterian Hospital, New York, New York, United States of America
- Fabien Campagne
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, United States of America; Department of Physiology and Biophysics, The Weill Cornell Medical College, New York, New York, United States of America
5
Garrett-Bakelman FE, Sheridan CK, Kacmarczyk TJ, Ishii J, Betel D, Alonso A, Mason CE, Figueroa ME, Melnick AM. Enhanced reduced representation bisulfite sequencing for assessment of DNA methylation at base pair resolution. J Vis Exp 2015:e52246. [PMID: 25742437 PMCID: PMC4354670 DOI: 10.3791/52246] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Indexed: 01/03/2023]
Abstract
DNA methylation patterns are studied extensively in normal and diseased tissues. A variety of methods have been established to interrogate cytosine methylation patterns in cells. Reduced representation bisulfite sequencing, a reduced-representation alternative to whole-genome bisulfite sequencing, was developed to detect quantitative, base-pair-resolution cytosine methylation patterns at GC-rich genomic loci. This is accomplished by combining restriction enzyme digestion with bisulfite conversion. Enhanced Reduced Representation Bisulfite Sequencing (ERRBS) increases the number of biologically relevant genomic loci covered and has been used to profile cytosine methylation in DNA from human, mouse and other organisms. ERRBS begins with restriction enzyme digestion of DNA to generate low-molecular-weight fragments for use in library preparation. These fragments are subjected to standard library construction for next-generation sequencing. Bisulfite conversion of unmethylated cytosines prior to the final amplification step allows quantitative, base-resolution measurement of cytosine methylation levels at covered genomic loci. The protocol can be completed within four days. Despite low complexity in the first three bases sequenced, ERRBS libraries yield high-quality data when a designated sequencing control lane is used. Mapping and bioinformatics analysis are then performed, yielding data that can be easily integrated with a variety of genome-wide platforms. Because ERRBS requires only small quantities of input material, it is feasible for processing human clinical samples and applicable to a range of research applications. The accompanying video demonstrates the critical steps of the ERRBS protocol.
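To make the quantitative readout concrete, the following Python sketch computes a per-cytosine methylation level from the bases observed at one CpG position, using the common ratio C / (C + T) for forward-strand reads (reverse-strand reads would use G and A instead). The function name `methylation_level` and the default minimum coverage of 10 reads are illustrative assumptions, not parameters taken from the ERRBS protocol, and the sketch ignores complications such as incomplete bisulfite conversion or C-to-T SNPs.

```python
from typing import Iterable, Optional


def methylation_level(read_bases: Iterable[str],
                      min_coverage: int = 10) -> Optional[float]:
    """Estimate the methylated fraction at one CpG cytosine (forward-strand reads).

    After bisulfite conversion and amplification, an unmethylated C is read as T,
    while a methylated C is protected and still read as C.
    """
    methylated = unmethylated = 0
    for base in read_bases:
        b = base.upper()
        if b == "C":
            methylated += 1    # protected from conversion: methylated
        elif b == "T":
            unmethylated += 1  # converted: unmethylated
        # bases other than C/T (e.g. A or G from sequencing errors) are ignored
    depth = methylated + unmethylated
    if depth < min_coverage:
        return None  # too few informative reads for a reliable estimate
    return methylated / depth


# Bases observed at one CpG position across overlapping reads.
print(methylation_level("CCCCCCCTTT"))  # 0.7
print(methylation_level("CCT"))         # None (below the coverage threshold)
```

Returning `None` rather than a number for low-coverage sites keeps poorly covered loci from being mistaken for fully methylated or unmethylated ones downstream.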
Affiliation(s)
- Doron Betel
- Department of Medicine, Weill Cornell Medical College; Institute for Computational Biomedicine, Weill Cornell Medical College
- Alicia Alonso
- Department of Medicine, Weill Cornell Medical College
- Ari M Melnick
- Department of Medicine, Weill Cornell Medical College
6
Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D. Methods of integrating data to uncover genotype-phenotype interactions. Nat Rev Genet 2015; 16:85-97. [PMID: 25582081 DOI: 10.1038/nrg3868] [Citation(s) in RCA: 542] [Impact Index Per Article: 60.2] [Indexed: 12/11/2022]
Abstract
Recent technological advances have expanded the breadth of available omic data, from whole-genome sequencing data, to extensive transcriptomic, methylomic and metabolomic data. A key goal of analyses of these data is the identification of effective models that predict phenotypic traits and outcomes, elucidating important biomarkers and generating important insights into the genetic underpinnings of the heritability of complex traits. There is still a need for powerful and advanced analysis strategies to fully harness the utility of these comprehensive high-throughput data, identifying true associations and reducing the number of false associations. In this Review, we explore the emerging approaches for data integration - including meta-dimensional and multi-staged analyses - which aim to deepen our understanding of the role of genetics and genomics in complex outcomes. With the use and further development of these approaches, an improved understanding of the relationship between genomic variation and human phenotypes may be revealed.
Affiliation(s)
- Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Emily R Holzinger
- National Human Genome Research Institute, Inherited Disease Research Branch, Baltimore, Maryland 21224, USA
- Ruowang Li
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Sarah A Pendergrass
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Dokyoon Kim
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
7
Marcinkiewicz KM, Gudas LJ. Altered histone mark deposition and DNA methylation at homeobox genes in human oral squamous cell carcinoma. J Cell Physiol 2014; 229:1405-16. [PMID: 24519855 DOI: 10.1002/jcp.24577] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 11/26/2013] [Accepted: 01/16/2014] [Indexed: 01/03/2023]
Abstract
We recently reported a role for polycomb repressive complex 2 (PRC2) and PRC2-mediated trimethylation of histone 3 lysine 27 (H3K27me3) in the regulation of homeobox (HOX) gene transcript levels in human oral keratinocytes (OKF6-TERT1R) and tongue squamous cell carcinoma (SCC) cells (Marcinkiewicz and Gudas, 2013, Exp Cell Res). Here, we assessed both the levels of various histone modifications at a subset of homeobox genes and genome-wide DNA methylation patterns in OKF6-TERT1R and SCC-9 cells by using ERRBS (enhanced reduced representation bisulfite sequencing). We detected the H3K9me3 mark at HOXB7, HOXC10, HOXC13, and HOXD8 at higher levels in OKF6-TERT1R than in SCC-9 cells; conversely, at IRX1 and SIX2 the H3K9me3 levels were higher in SCC-9 than in OKF6-TERT1R. The H3K79me3 mark was detectable only at IRX1 in OKF6-TERT1R and at IRX4 in SCC-9 cells. The levels of H3K4me3 and H3K36me3 marks correlated with the transcript levels of the assessed homeobox genes in both OKF6-TERT1R and SCC-9. We detected generally lower CpG methylation levels in SCC-9 cells at annotated genomic regions that were differentially methylated between OKF6-TERT1R and SCC-9 cells; however, some genomic regions, including the HOX gene clusters, showed higher DNA methylation in SCC-9 than in OKF6-TERT1R. Thus, both altered histone modification patterns and changes in DNA methylation are associated with dysregulation of homeobox gene expression in human oral cavity SCC cells, and this dysregulation potentially plays a role in the neoplastic phenotype of oral keratinocytes.
Affiliation(s)
- Katarzyna M Marcinkiewicz
- Department of Pharmacology, Weill Cornell Medical College and Weill Graduate School of Biomedical Sciences of Cornell University, New York, New York
8
Adusumalli S, Mohd Omar MF, Soong R, Benoukraf T. Methodological aspects of whole-genome bisulfite sequencing analysis. Brief Bioinform 2014; 16:369-79. [DOI: 10.1093/bib/bbu016] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Received: 02/24/2014] [Accepted: 04/17/2014] [Indexed: 12/17/2022]
9
Pollock A, Bian S, Zhang C, Chen Z, Sun T. Growth of the developing cerebral cortex is controlled by microRNA-7 through the p53 pathway. Cell Rep 2014; 7:1184-96. [PMID: 24813889 DOI: 10.1016/j.celrep.2014.04.003] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Received: 07/12/2013] [Revised: 02/18/2014] [Accepted: 04/02/2014] [Indexed: 11/25/2022]
Abstract
Proper growth of the mammalian cerebral cortex is crucial for normal brain functions and is controlled by precise gene-expression regulation. Here, we show that microRNA-7 (miR-7) is highly expressed in cortical neural progenitors and describe miR-7 sponge transgenic mice in which miR-7-silencing activity is specifically knocked down in the embryonic cortex. Blocking miR-7 function causes microcephaly-like brain defects due to reduced intermediate progenitor (IP) production and apoptosis. Upregulation of miR-7 target genes, including those implicated in the p53 pathway, such as Ak1 and Cdkn1a (p21), is responsible for abnormalities in neural progenitors. Furthermore, ectopic expression of Ak1 or p21 and specific blockade of miR-7 binding sites in target genes using protectors in vivo induce similarly reduced IP production. Using conditional miRNA sponge transgenic approaches, we uncovered an unexpected role for miR-7 in cortical growth through its interactions with genes in the p53 pathway.
Affiliation(s)
- Andrew Pollock
- Department of Cell and Developmental Biology, Weill Medical College of Cornell University, 1300 York Avenue, Box 60, New York, NY 10065, USA
- Shan Bian
- Department of Cell and Developmental Biology, Weill Medical College of Cornell University, 1300 York Avenue, Box 60, New York, NY 10065, USA
- Chao Zhang
- Department of Medicine and Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, NY 10065, USA
- Zhengming Chen
- Division of Biostatistics and Epidemiology, Department of Public Health, Weill Medical College of Cornell University, New York, NY 10065, USA
- Tao Sun
- Department of Cell and Developmental Biology, Weill Medical College of Cornell University, 1300 York Avenue, Box 60, New York, NY 10065, USA; School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.
10
Li P, Jiang X, Wang S, Kim J, Xiong H, Ohno-Machado L. HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads. J Am Med Inform Assoc 2014; 21:363-73. [PMID: 24368726 PMCID: PMC3932469 DOI: 10.1136/amiajnl-2013-002147] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 07/01/2013] [Revised: 11/27/2013] [Accepted: 12/03/2013] [Indexed: 11/07/2022]
Abstract
BACKGROUND AND OBJECTIVE: Short-read sequencing is becoming the standard of practice for the study of structural variants associated with disease. However, with the growth of sequence data largely surpassing reasonable storage capacity, the biomedical community is challenged with the management, transfer, archiving, and storage of sequence data. METHODS: We developed Hierarchical mUlti-reference Genome cOmpression (HUGO), a novel compression algorithm for aligned reads in the sorted Sequence Alignment/Map (SAM) format. We first aligned short reads against a reference genome and stored exactly mapped reads for compression. Inexactly mapped or unmapped reads were realigned against different reference genomes using an adaptive scheme that gradually shortens the read length. For base quality values, we offer both lossy and lossless compression mechanisms. The lossy mechanism uses k-means clustering, where a user can adjust the balance between decompression quality and compression rate. Lossless compression is obtained by setting k (the number of clusters) to the number of distinct quality values. RESULTS: The proposed method produced a compression ratio in the range 0.5-0.65, which corresponds to 35-50% storage savings on the experimental datasets. The proposed approach achieved 15% more storage savings than CRAM and a compression ratio comparable to that of Samcomp (CRAM and Samcomp are two state-of-the-art genome compression algorithms). The software is freely available at https://sourceforge.net/projects/hierachicaldnac/ under the General Public License (GPL). LIMITATION: Our method requires multiple reference genomes and prolongs execution time because of the additional alignments. CONCLUSIONS: The proposed multi-reference-based compression algorithm for aligned reads outperforms existing single-reference-based algorithms.
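The lossy/lossless trade-off for base quality values can be sketched in Python as follows. This is not HUGO's code: the abstract does not specify the initialization, the distance, or how cluster indices are encoded, so the deterministic evenly spaced initialization, the frequency-weighted 1-D Lloyd iteration, and the function names `kmeans_1d`, `compress` and `decompress` are illustrative assumptions. The sketch only shows why setting k to the number of distinct quality values makes the scheme lossless: every value then becomes its own centroid.

```python
from collections import Counter
from typing import List, Tuple


def kmeans_1d(values: List[int], k: int, iters: int = 50) -> List[float]:
    """Frequency-weighted 1-D Lloyd's k-means over the distinct quality values."""
    if not values or k < 1:
        raise ValueError("need at least one value and k >= 1")
    counts = Counter(values)
    distinct = sorted(counts)
    n = len(distinct)
    k = min(k, n)
    # Deterministic init: k centroids spread evenly over the sorted distinct values.
    idx = [0] if k == 1 else [i * (n - 1) // (k - 1) for i in range(k)]
    centroids = [float(distinct[i]) for i in idx]
    for _ in range(iters):
        # Assignment step: each distinct value joins its nearest centroid.
        clusters: List[List[int]] = [[] for _ in centroids]
        for v in distinct:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: frequency-weighted mean of each non-empty cluster.
        updated = []
        for j, members in enumerate(clusters):
            if members:
                total = sum(counts[v] for v in members)
                updated.append(sum(v * counts[v] for v in members) / total)
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids


def compress(qualities: List[int], k: int) -> Tuple[List[int], List[float]]:
    """Replace each quality score by the index of its nearest centroid."""
    centroids = kmeans_1d(qualities, k)
    codes = [min(range(len(centroids)), key=lambda j: abs(q - centroids[j]))
             for q in qualities]
    return codes, centroids


def decompress(codes: List[int], centroids: List[float]) -> List[int]:
    """Map each code back to its (rounded) centroid value."""
    return [round(centroids[c]) for c in codes]


quals = [2, 2, 30, 31, 32, 40, 40, 41]  # toy Phred scores, 6 distinct values
print(decompress(*compress(quals, 6)) == quals)  # True: k = #distinct is lossless
print(decompress(*compress(quals, 2)))  # [2, 2, 36, 36, 36, 36, 36, 36]
```

A complete compressor would also entropy-code the cluster indices; with fewer clusters the indices carry less information and need fewer bits, which is where the compression-rate versus decompression-quality trade-off comes from.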
Affiliation(s)
- Pinghao Li
- EE Department, Shanghai Jiaotong University, Shanghai, China
- Xiaoqian Jiang
- Division of Biomedical Informatics, University of California–San Diego, La Jolla, California, USA
- Shuang Wang
- Division of Biomedical Informatics, University of California–San Diego, La Jolla, California, USA
- Jihoon Kim
- Division of Biomedical Informatics, University of California–San Diego, La Jolla, California, USA
- Hongkai Xiong
- EE Department, Shanghai Jiaotong University, Shanghai, China
- Lucila Ohno-Machado
- Division of Biomedical Informatics, University of California–San Diego, La Jolla, California, USA
11
Simi M, Campagne F. Composable languages for bioinformatics: the NYoSh experiment. PeerJ 2014; 2:e241. [PMID: 24482760 PMCID: PMC3898313 DOI: 10.7717/peerj.241] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/26/2013] [Accepted: 12/17/2013] [Indexed: 11/20/2022]
Abstract
Language workbenches (LWBs) are software engineering tools that help domain experts develop solutions to various classes of problems. Some of these tools focus on non-technical users and provide languages to help organize knowledge, while other workbenches provide means to create new programming languages. A key advantage of language workbenches is that they support the seamless composition of independently developed languages. This capability is useful when developing programs that can benefit from different levels of abstraction. We reasoned that language workbenches could be useful for developing bioinformatics software solutions. To evaluate the potential of language workbenches in bioinformatics, we tested a prominent workbench by developing an alternative to shell scripting. To illustrate what LWBs and language composition can bring to bioinformatics, we report on our design and development of NYoSh (Not Your ordinary Shell). NYoSh was implemented as a collection of languages that can be composed to write programs as expressive and concise as shell scripts. This manuscript offers a concrete illustration of the advantages and current minor drawbacks of using the MPS LWB. For instance, we found that we could implement an environment-aware editor for NYoSh that can assist programmers when developing scripts for specific execution environments. This editor further provides semantic error detection and can be compiled interactively with an automatic build and deployment system. In contrast to shell scripts, NYoSh scripts can be written in a modern development environment that supports context-dependent intentions, and they can be extended seamlessly by end-users with new abstractions and language constructs. We further illustrate language extension and composition with LWBs by presenting a tight integration of NYoSh scripts with the GobyWeb system.
The NYoSh Workbench prototype, which implements a fully featured integrated development environment for NYoSh, is distributed at http://nyosh.campagnelab.org.
Affiliation(s)
- Manuele Simi
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, The Weill Cornell Medical College, New York, NY, United States of America
- Fabien Campagne
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, The Weill Cornell Medical College, New York, NY, United States of America
12
Giancarlo R, Rombo SE, Utro F. Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies. Brief Bioinform 2013; 15:390-406. [DOI: 10.1093/bib/bbt088] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Indexed: 01/15/2023]
13
Abstract
Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state-of-the-art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% of the size of a BAM file with perfect data fidelity. Compared to the previous state of the art in compression, these methods reduce dataset size by more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated into a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays.
14
Kim HS, Mendiratta S, Kim J, Pecot CV, Larsen JE, Zubovych I, Seo BY, Kim J, Eskiocak B, Chung H, McMillan E, Wu S, De Brabander J, Komurov K, Toombs JE, Wei S, Peyton M, Williams N, Gazdar AF, Posner BA, Brekken RA, Sood AK, Deberardinis RJ, Roth MG, Minna JD, White MA. Systematic identification of molecular subtype-selective vulnerabilities in non-small-cell lung cancer. Cell 2013; 155:552-66. [PMID: 24243015 DOI: 10.1016/j.cell.2013.09.041] [Citation(s) in RCA: 139] [Impact Index Per Article: 12.6] [Received: 06/30/2013] [Revised: 08/15/2013] [Accepted: 08/30/2013] [Indexed: 01/27/2023]
Abstract
Context-specific molecular vulnerabilities that arise during tumor evolution represent an attractive intervention target class. However, the frequency and diversity of somatic lesions detected among lung tumors can confound efforts to identify these targets. To confront this challenge, we have applied parallel screening of chemical and genetic perturbations within a panel of molecularly annotated NSCLC lines to identify intervention opportunities tightly linked to molecular response indicators predictive of target sensitivity. Anchoring this analysis on a matched tumor/normal cell model from a lung adenocarcinoma patient identified three distinct target/response-indicator pairings that are represented with significant frequencies (6%-16%) in the patient population. These include NLRP3 mutation/inflammasome activation-dependent FLIP addiction, co-occurring KRAS and LKB1 mutation-driven COPI addiction, and selective sensitivity to a synthetic indolotriazine that is specified by a seven-gene expression signature. Target efficacies were validated in vivo, and mechanism-of-action studies informed generalizable principles underpinning cancer cell biology.
Affiliation(s)
- Hyun Seok Kim
- Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
15
Marcinkiewicz KM, Gudas LJ. Altered epigenetic regulation of homeobox genes in human oral squamous cell carcinoma cells. Exp Cell Res 2013; 320:128-43. [PMID: 24076275 DOI: 10.1016/j.yexcr.2013.09.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 07/02/2013] [Revised: 09/11/2013] [Accepted: 09/17/2013] [Indexed: 12/18/2022]
Abstract
To gain insight into oral squamous cell carcinogenesis, we performed deep sequencing (RNAseq) of non-tumorigenic human OKF6-TERT1R and tumorigenic SCC-9 cells. Numerous homeobox genes are differentially expressed between OKF6-TERT1R and SCC-9 cells. Data from Oncomine, a cancer microarray database, also show that homeobox (HOX) genes are dysregulated in oral SCC patients. The activity of Polycomb repressive complexes (PRC), which causes epigenetic modifications, and retinoic acid (RA) signaling can control HOX gene transcription. HOXB7, HOXC10, HOXC13, and HOXD8 transcripts are higher in SCC-9 than in OKF6-TERT1R cells; using ChIP (chromatin immunoprecipitation) we detected PRC2 protein SUZ12 and the epigenetic H3K27me3 mark on histone H3 at these genes in OKF6-TERT1R, but not in SCC-9 cells. In contrast, IRX1, IRX4, SIX2 and TSHZ3 transcripts are lower in SCC-9 than in OKF6-TERT1R cells. We detected SUZ12 and the H3K27me3 mark at these genes in SCC-9, but not in OKF6-TERT1R cells. SUZ12 depletion increased HOXB7, HOXC10, HOXC13, and HOXD8 transcript levels and decreased the proliferation of OKF6-TERT1R cells. Transcriptional responses to RA are attenuated in SCC-9 versus OKF6-TERT1R cells. SUZ12 and H3K27me3 levels were not altered by RA at these HOX genes in SCC-9 and OKF6-TERT1R cells. We conclude that altered activity of PRC2 is associated with dysregulation of homeobox gene expression in human SCC cells, and that this dysregulation potentially plays a role in the neoplastic transformation of oral keratinocytes.
Affiliation(s)
- Katarzyna M Marcinkiewicz
- Department of Pharmacology, Weill Cornell Medical College of Cornell University, 1300 York Avenue, New York, NY 10065, USA; Department of Pharmacology, Weill Cornell Graduate School of Medical Sciences of Cornell University, 1300 York Avenue, New York, NY 10065, USA
16
Novikov N, Evans T. Tmem88a mediates GATA-dependent specification of cardiomyocyte progenitors by restricting WNT signaling. Development 2013; 140:3787-98. [PMID: 23903195 DOI: 10.1242/dev.093567] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 01/02/2023]
Abstract
Biphasic control of WNT signaling is essential during cardiogenesis, but how the pathway switches from promoting cardiac mesoderm to restricting cardiomyocyte progenitor fate is unknown. We identified genes expressed in lateral mesoderm that are dysregulated in zebrafish when both gata5 and gata6 are depleted, causing a block to cardiomyocyte specification. This screen identified tmem88a, which is expressed in the early cardiac progenitor field and was previously implicated in WNT modulation by overexpression studies. Depletion of tmem88a results in a profound cardiomyopathy, secondary to impaired cardiomyocyte specification. In tmem88a morphants, activation of the WNT pathway exacerbates the cardiomyocyte deficiency, whereas WNT inhibition rescues progenitor cells and cardiogenesis. We conclude that specification of cardiac fate downstream of gata5/6 involves activation of the tmem88a gene to constrain WNT signaling and expand the number of cardiac progenitors. Tmem88a is a novel component of the regulatory mechanism controlling the second phase of biphasic WNT activity essential for embryonic cardiogenesis.
Affiliation(s)
- Natasha Novikov
- Department of Surgery, Weill Cornell Medical College, Cornell University, 1300 York Ave., LC-708, New York, NY, USA
17
Fu L, Wang G, Shevchuk MM, Nanus DM, Gudas LJ. Activation of HIF2α in kidney proximal tubule cells causes abnormal glycogen deposition but not tumorigenesis. Cancer Res 2013; 73:2916-25. [PMID: 23447580 DOI: 10.1158/0008-5472.can-12-3983] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 12/30/2022]
Abstract
Renal cell carcinoma (RCC) is the most common primary cancer arising from the kidney in adults, with clear cell renal cell carcinoma (ccRCC) representing approximately 75% of all RCCs. Increased expression of hypoxia-inducible factors 1α (HIF1α) and 2α (HIF2α) has been suggested as a pivotal step in ccRCC carcinogenesis, but this has not been thoroughly tested. Here, we report that expression of a constitutively activated form of HIF2α (carrying the P405A, P530A, and N851A substitutions, designated HIF2αM3) in the proximal tubules of mice is not sufficient by itself to promote ccRCC, nor does it enhance HIF1αM3 oncogenesis when coexpressed with constitutively active HIF1αM3. Neoplastic transformation was not detected in kidneys of mice up to 33 months of age, nor was increased expression of Ki67 (MKI67), γH2AX (H2AFX), or CD70 observed. Furthermore, the genome-wide transcriptome of the transgenic kidneys does not resemble that of human ccRCC. We conclude that a constitutively active HIF2α is not sufficient to cause neoplastic transformation of proximal tubules, arguing against the idea that HIF2α activation is critical for ccRCC tumorigenesis.
Affiliation(s)
- Leiping Fu
- Department of Pharmacology and Pathology, Weill Cornell Medical College, Cornell University, New York, New York 10065, USA