1
Wang F, Li X, Li M, Liu W, Lu L, Li Y, Chen X, Yang S, Liu T, Cheng W, Weng L, Wang H, Lu D, Yao Q, Wang Y, Wu J, Wittkop T, Faham M, Zhou H, Hu H, Jin H, Hu Z, Ma D, Cheng X. Ultra-short cell-free DNA fragments enhance cancer early detection in a multi-analyte blood test combining mutation, protein and fragmentomics. Clin Chem Lab Med 2024; 62:168-177. [PMID: 37678194 DOI: 10.1515/cclm-2023-0541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 08/21/2023] [Indexed: 09/09/2023]
Abstract
OBJECTIVES Cancer morbidity and mortality can be reduced if the cancer is detected early. Cell-free DNA (cfDNA) fragmentomics has emerged as a novel epigenetic biomarker for early cancer detection; however, it is still in its infancy and requires technical improvement. We sought to apply a single-strand DNA sequencing technology to measure genetic and fragmentomic features of cfDNA and to evaluate its performance in detecting multiple cancers.
METHODS Blood samples from 364 patients with six cancer types (colorectal, esophageal, gastric, liver, lung, and ovarian cancers) and from 675 healthy individuals were included in this study. Circulating tumor DNA (ctDNA) mutations, cfDNA fragmentomic features and a set of protein biomarkers were assayed. Sensitivity and specificity were reported by cancer type and stage.
RESULTS Circular Ligation Amplification and sequencing (CLAmp-seq), a single-strand DNA sequencing technology, yielded a larger population of ultra-short fragments (<100 bp) than double-strand DNA preparation protocols and revealed a more significant size difference between cancer and healthy cfDNA fragments (25.84 bp vs. 16.05 bp). Analysis of the subnucleosomal peaks in ultra-short cfDNA fragments indicates that these peaks are regulatory element "footprints" and correlate with gene expression and cancer stage. At 98 % specificity, a prediction model using ctDNA mutations alone showed an overall sensitivity of 46 %; sensitivity reached 60 % when protein was added and increased further to 66 % when fragmentomics was also integrated. Greater improvements were observed for samples representing earlier cancer stages than for later ones.
CONCLUSIONS These results suggest synergistic properties of protein, genetic and fragmentomics features in the identification of early-stage cancers.
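The abstract above reports sensitivity at a fixed 98 % specificity. A minimal sketch of how such a figure is commonly computed from classifier scores follows; the function names and the toy scores are illustrative assumptions, not the authors' pipeline or data.

```python
import math
from typing import Sequence


def threshold_at_specificity(healthy_scores: Sequence[float], specificity: float) -> float:
    """Return a cutoff such that at least `specificity` of healthy scores are <= cutoff.

    Hypothetical helper: samples scoring strictly above the cutoff are called positive.
    """
    ordered = sorted(healthy_scores)
    # Number of healthy samples that must be called negative (rounded to avoid float noise).
    n_negative = math.ceil(round(specificity * len(ordered), 9))
    return ordered[n_negative - 1]


def sensitivity(cancer_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of cancer samples scoring strictly above the cutoff."""
    return sum(score > cutoff for score in cancer_scores) / len(cancer_scores)


if __name__ == "__main__":
    # Toy data only: 100 evenly spaced healthy scores and four cancer scores.
    healthy = [i / 100 for i in range(100)]
    cancer = [0.5, 0.99, 0.975, 0.2]
    cutoff = threshold_at_specificity(healthy, 0.98)
    print("cutoff:", cutoff, "sensitivity:", sensitivity(cancer, cutoff))
```

Fixing the cutoff on the healthy cohort first, and only then scoring the cancer cohort, is what makes sensitivities from different models (mutation only, plus protein, plus fragmentomics) comparable at the same specificity.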
Affiliation(s)
- Fenfen Wang
  - Gynecological Oncology Department, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, P.R. China
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, P.R. China
  - Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Hangzhou, P.R. China
- Xinxing Li
  - Department of Gastrointestinal Surgery, Tongji Hospital Medical College of Tongji University, Shanghai, P.R. China
- Mengxing Li
  - Department of Thoracic Surgery, Changhai Hospital, Second Military Medical University, Shanghai, P.R. China
- Wendi Liu
  - Department of Hepatobiliary Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, P.R. China
- Lingjia Lu
  - Gynecological Oncology Department, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, P.R. China
- Yang Li
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, P.R. China
  - Zhejiang Provincial Key Laboratory of Traditional Chinese Medicine for Reproductive Health Research, Hangzhou, P.R. China
  - Women's Reproductive Health Key Laboratory of Zhejiang Province, Hangzhou, P.R. China
- Xiaojing Chen
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, P.R. China
  - Zhejiang Provincial Key Laboratory of Traditional Chinese Medicine for Reproductive Health Research, Hangzhou, P.R. China
  - Women's Reproductive Health Key Laboratory of Zhejiang Province, Hangzhou, P.R. China
- Siqi Yang
  - Women's Reproductive Health Key Laboratory of Zhejiang Province, Hangzhou, P.R. China
- Tao Liu
  - Department of Thoracic Surgery, Changhai Hospital, Second Military Medical University, Shanghai, P.R. China
- Wen Cheng
  - Department of Thoracic Surgery, Changhai Hospital, Second Military Medical University, Shanghai, P.R. China
- Li Weng
  - Department of Research and Development, AccuraGen Inc., San Jose, CA, USA
- Hongyan Wang
  - Department of Research and Development, Shanghai Yunsheng Medical Laboratory Co., Ltd., Shanghai, P.R. China
- Dongsheng Lu
  - Department of Bioinformatics, Shanghai Yunsheng Medical Laboratory Co., Ltd., Shanghai, P.R. China
- Qianqian Yao
  - Department of Medical Science, Shanghai Yunsheng Medical Laboratory Co., Ltd., Shanghai, P.R. China
- Yingyu Wang
  - Department of Bioinformatics, AccuraGen Inc., San Jose, CA, USA
- Johnny Wu
  - Department of Bioinformatics, AccuraGen Inc., San Jose, CA, USA
- Tobias Wittkop
  - Department of Bioinformatics, AccuraGen Inc., San Jose, CA, USA
- Huabang Zhou
  - Department of Hepatobiliary Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, P.R. China
- Heping Hu
  - Department of Hepatobiliary Medicine, Shanghai Eastern Hepatobiliary Surgery Hospital, Shanghai, P.R. China
- Hai Jin
  - Department of Thoracic Surgery, Shanghai Changhai Hospital, Shanghai, P.R. China
- Zhiqian Hu
  - Department of Gastrointestinal Surgery, Tongji Hospital Medical College of Tongji University, Shanghai, P.R. China
  - Department of General Surgery, Changzheng Hospital Naval Medical University, Shanghai, P.R. China
- Ding Ma
  - Department of Obstetrics and Gynaecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, P.R. China
- Xiaodong Cheng
  - Gynecological Oncology Department, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, P.R. China
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, P.R. China
  - Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Hangzhou, P.R. China
  - Zhejiang Provincial Key Laboratory of Traditional Chinese Medicine for Reproductive Health Research, Hangzhou, P.R. China
2
Li X, Liu T, Bacchiocchi A, Li M, Cheng W, Wittkop T, Mendez F, Wang Y, Tang P, Yao Q, Bosenberg MW, Sznol M, Yan Q, Faham M, Weng L, Halaban R, Jin H, Hu Z. Ultra-sensitive molecular residual disease detection through whole genome sequencing with single-read error correction. medRxiv 2024:2024.01.13.24301070. [PMID: 38260271 PMCID: PMC10802755 DOI: 10.1101/2024.01.13.24301070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
While whole genome sequencing (WGS) of cell-free DNA (cfDNA) holds enormous promise for molecular residual disease (MRD) detection, its performance is limited by the WGS error rate. Here we introduce AccuScan, an efficient cfDNA WGS technology that enables genome-wide error correction at the single-read level, achieving an error rate of 4.2×10⁻⁷, which is about two orders of magnitude lower than that of a read-centric de-noising method. When applied to MRD detection, AccuScan demonstrated analytical sensitivity down to a circulating tumor allele fraction of 10⁻⁶ at 99% sample-level specificity. In colorectal cancer, AccuScan showed 90% landmark sensitivity for predicting relapse. It also showed robust MRD performance in esophageal cancer using samples collected as early as 1 week after surgery, and predictive value for immunotherapy monitoring in melanoma patients. Overall, AccuScan provides a highly accurate WGS solution for MRD, enabling circulating tumor DNA detection in the parts-per-million range without high sample input or personalized reagents.
One Sentence Summary AccuScan showed a remarkably low limit of detection with a short turnaround time, low sample requirement and a simple workflow for MRD detection.
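To see why the per-base error rate matters at a tumor allele fraction of 10⁻⁶, the rough arithmetic below compares expected tumor-derived reads with expected erroneous reads. The number of tracked sites (50,000) and the depth (30x) are hypothetical values chosen only for illustration, and the model ignores details such as which base an error produces; it is a sketch, not the AccuScan statistical model.

```python
def expected_counts(allele_fraction: float, error_rate: float,
                    n_sites: int, depth: int) -> tuple[float, float]:
    """Expected mutant-supporting reads from tumor DNA vs. from sequencing error.

    Simplified assumption: every read covering a tracked site is an independent
    observation, so there are n_sites * depth observations in total.
    """
    observations = n_sites * depth
    signal = allele_fraction * observations  # true tumor-derived mutant reads
    noise = error_rate * observations        # reads altered by sequencing error
    return signal, noise


if __name__ == "__main__":
    # Hypothetical assay: 50,000 tracked sites at 30x depth.
    for label, rate in [("error rate 4.2e-7", 4.2e-7),
                        ("100x higher error rate", 4.2e-5)]:
        signal, noise = expected_counts(1e-6, rate, n_sites=50_000, depth=30)
        print(f"{label}: signal ~{signal:.2f} reads, noise ~{noise:.2f} reads")
```

Under these assumed numbers the expected signal and noise are of the same order at the lower error rate, whereas at an error rate two orders of magnitude higher the noise would exceed the signal many times over.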
3
Wu J, Li XX, Zhou H, Liu W, Wang F, Yao Q, Lu D, Lu H, Wang H, Wittkop T, Tang P, Hu H, Cheng X, Hu Z, Faham M, Weng L. Early cancer detection using multi-omic approach including epigenetic signal from ultra-small fragments. J Clin Oncol 2020. [DOI: 10.1200/jco.2020.38.15_suppl.e13561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
e13561
Background: Genetic and epigenetic signals from plasma cell-free DNA (cfDNA), as well as proteins, have been shown to detect cancer early. We previously developed Circular Ligation and Amplification (CLAmp-seq) to detect mutations very sensitively and demonstrated higher performance than molecular barcode-based methods. The same single-strand-based technology yielded a much larger proportion of small fragments (<100 bp) than traditional methods, which typically rely on double-stranded ligation, allowing us to investigate the epigenetic signature of cancer in ultra-short fragments. We sought to measure the performance of a multi-omics approach combining these genetic and epigenetic changes with proteins in detecting colorectal cancer (CRC), ovarian cancer (OC) and hepatocellular carcinoma (HCC).
Methods: A healthy sample and a late-stage cancer sample were assessed by whole genome sequencing (WGS) using CLAmp-seq. We then analyzed cfDNA from plasma samples of 731 individuals, including 69 CRC, 57 HCC and 49 OC patients and 556 age-matched healthy individuals. Among the diseased samples, the numbers for stages I-IV were 49, 39, 71, and 16, respectively. CLAmp-seq WGS was performed on 58 healthy and 66 cancer samples to discover a cancer epigenetic signature. In addition, all samples were analyzed with a panel of proteins and a CLAmp-seq targeted panel that includes known mutation sites.
Results: With CLAmp-seq, 33% of fragments in the late-stage cancer sample were smaller than 100 bp, compared with 15% in the healthy sample and <1% in the late-stage sample prepared with a double-stranded library prep. In addition, the difference in fragment size between late-stage cancer and healthy was 29 bp using CLAmp-seq and 12 bp using a traditional double-stranded prep. This focused our attention on detecting a cancer-specific epigenetic signature in the small fragments using CLAmp-seq. Using data from whole genome analysis, the epigenetic signature alone achieved 50% sensitivity at 97% specificity. Combined with mutations and proteins, at 97% specificity we obtained sensitivities of 50%, 88%, 88%, and 100% in stages I, II, III, and IV, respectively. At the same 97% specificity we obtained sensitivities of 73%, 100%, and 85% in CRC, OC, and HCC, respectively.
Conclusions: We have demonstrated that CLAmp-seq detects small fragments that are enriched in cancer. We have found a predictive epigenetic signature in these small fragments. When combined with mutations and proteins, we obtained a performance of 80% sensitivity at 97% specificity.
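The short-fragment proportions quoted in this abstract (for example 33% vs. 15% below 100 bp) are simple summary statistics over fragment lengths. A minimal sketch of that calculation, using a hypothetical helper and made-up lengths rather than any data from the study:

```python
from typing import Sequence


def short_fragment_fraction(lengths_bp: Sequence[int], cutoff_bp: int = 100) -> float:
    """Fraction of cfDNA fragments strictly shorter than `cutoff_bp`."""
    if not lengths_bp:
        raise ValueError("at least one fragment length is required")
    return sum(length < cutoff_bp for length in lengths_bp) / len(lengths_bp)


if __name__ == "__main__":
    # Made-up fragment lengths (bp) for illustration only.
    toy_lengths = [60, 80, 120, 167, 170, 150]
    print(f"{short_fragment_fraction(toy_lengths):.0%} of fragments are < 100 bp")
```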
Affiliation(s)
- Xin-Xing Li
  - Changzheng Hospital, the Second Military Medical University, Shanghai, China
- Huabang Zhou
  - Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, China
- Wendi Liu
  - Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, China
- Fenfen Wang
  - Women’s Reproductive Health Key Laboratory of Zhejiang Province, Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Qianqian Yao
  - Shanghai AccuraGen Biotechnology Co., Ltd., Shanghai, China
- Dongsheng Lu
  - Shanghai AccuraGen Biotechnology Co., Ltd., Shanghai, China
- Huiqi Lu
  - Shanghai AccuraGen Biotechnology Co., Ltd., Shanghai, China
- Hongyan Wang
  - Shanghai AccuraGen Biotechnology Co., Ltd., Shanghai, China
- Heping Hu
  - Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai, China
- Xiaodong Cheng
  - Women’s Reproductive Health Key Laboratory of Zhejiang Province, Women’s Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Zhiqian Hu
  - Changzheng Hospital, the Second Military Medical University, Shanghai, China
- Li Weng
  - AccuraGen Inc., Menlo Park, CA
4
Lou J, Wang L, Weng L, Chen X, Li M, Guo Q, Yu W, Meng Q, Wang H, Wittkop T, Zhao G, Fahem M, Lin S. P1.09-13 Detection of Actionable Mutations in Plasma cfDNA Samples From NSCLC Patients Using a Novel Amplicon-Based Firefly NGS Assay. J Thorac Oncol 2018. [DOI: 10.1016/j.jtho.2018.08.789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
5
Wang L, Weng L, Chen X, Li M, Guo Q, Yu W, Wittkop T, Wang H, Fahem M, Lin S, Zhao GQ, Lou J. Abstract 938: Detection of actionable mutations in plasma cfDNA samples from patients with non-small cell lung carcinoma using a novel amplicon-based Firefly NGS assay. Cancer Res 2018. [DOI: 10.1158/1538-7445.am2018-938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Background: Detection of EGFR, KRAS and BRAF mutations can help guide cancer treatment for non-small cell lung cancer (NSCLC) patients. To identify an easy to use, accurate, multiplex molecular diagnostic assay, we evaluated the performance of a novel next-generation sequencing (NGS)-based cell-free DNA (cfDNA) assay, Firefly assay, which employs a concatemer-based noise suppression mechanism with an amplicon workflow.
Methods: Performance of amplicon based Firefly assay, with a panel covering EGFR, BRAF, and KRAS mutations designed for targeted therapy selection of NSCLC was first evaluated using a cfDNA reference standard and blank control samples. This panel was then used to analyze plasma cfDNA samples from 134 NSCLC cancer patients and 50 non-cancerous controls, and results were compared with tumor tissue ARMS and cfDNA ddPCR results.
Results: The Firefly assay demonstrated superior sensitivity and specificity, with a median detection rate of 100% at an allele frequency of 0.1% for 20 ng of cfDNA and zero false positives in all blank control samples. In cfDNA from plasma collected before treatment, EGFR mutation detection by the Firefly assay was 94% concordant with tumor tissue ARMS. The Firefly assay demonstrated strong per-variant detection-rate concordance (98%) and allele frequency concordance (R² = 0.95) when compared with cfDNA ddPCR results.
Conclusions: The amplicon based Firefly assay offers multiplex capacity, de novo variant detection, high sensitivity and specificity. Thus, Firefly assay is a kitable NGS solution for cfDNA analysis, which can help guide targeted therapy selection, drug resistance detection, and disease monitoring in NSCLC and other cancer patients.
Citation Format: Lin Wang, Li Weng, Xiao Chen, Min Li, Qiaomei Guo, Wenjun Yu, Tobias Wittkop, Hongyan Wang, Malek Fahem, Shengrong Lin, Grace Q. Zhao, Jiatao Lou. Detection of actionable mutations in plasma cfDNA samples from patients with non-small cell lung carcinoma using a novel amplicon-based Firefly NGS assay [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 938.
Affiliation(s)
- Lin Wang
  - Shanghai Thoracic Hospital, Shanghai, China
- Xiao Chen
  - Jilin University Hospital, Jilin, China
- Min Li
  - Zhongshan Hospital, Guangzhou, China
- Qiaomei Guo
  - Shanghai Thoracic Hospital, Shanghai, China
- Wenjun Yu
  - Shanghai Thoracic Hospital, Shanghai, China
- Jiatao Lou
  - Shanghai Chest Hospital, Shanghai, China
6
Klinger M, Taniguchi R, Hu J, Hayes T, Wittkop T, Asbury T, Moorhead M, Emerson R, Sherwood A, Robins H, Faham M. A scalable multiplex assay enabling assessment of T cell receptor specificity to hundreds of self- and pathogen-derived antigens. The Journal of Immunology 2016. [DOI: 10.4049/jimmunol.196.supp.209.4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
Abstract
Monitoring antigen-specific T cells is critical for the study of immune responses and development of biomarkers and immunotherapeutics. We previously developed and validated a novel multiplex assay (MIRA, or Multiplexed Identification of T cell Receptor Antigen specificity) that combines conventional immune monitoring techniques and TCR repertoire sequencing to assess T cell specificity to large numbers of query antigens. MIRA is a sensitive assay enabling detection of antigen-specific TCR clonotypes well below the limit of detection of conventional immune monitoring assays including flow cytometry and ELISPOT. Here we report the results from a scaled-up version of the assay using 270 different query peptide antigens (159 self- and 111 pathogen-derived). We identified >500 TCR clonotypes at frequencies as low as 1 per million T cells that were specific to 41 query antigens from 6 healthy HLA-A*02-positive individuals. Most of the antigen-specific TCRs identified recognized one of 27 different peptides derived from a variety of pathogens including CMV, EBV, Flu, Rotavirus, HSV, mTB, WNV and HIV. A subset of antigen-specific TCRs recognized one of 14 different peptides derived from self including MART1, RCC, BCL-2, MAGE, STEAP1, KLK4, CAMEL and MOG. These data support the notion that escape and survival of self antigen-specific T cells occurs without causing overt autoimmunity in healthy individuals. We show here that MIRA can be used to assess TCR specificities to hundreds of query antigens simultaneously. The assay is highly scalable and can be easily modified to accommodate thousands of additional query antigens. This technology may be used to monitor T cell specificity to antigens relevant to infection, autoimmunity and cancer.
7
Bivol A, Wittkop T, Davis D, Mooney SD. Genome and proteome annotation using automatically recognized concepts and functional networks. AMIA Jt Summits Transl Sci Proc 2013; 2013:26. [PMID: 24303290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Many tools have been developed for prediction of the function or disease association of genes and proteins, and this continues to be a highly active area of bioinformatics research. Typically, these methods predict which concepts should be annotated to genes or proteins, using terms from ontologies such as Gene Ontology (GO), largely overlooking other ontologies that are available. Here, we set out to broadly evaluate novel, automatically retrieved, gene-term annotations and identify those concepts of publicly available ontologies that can be predicted using a generalized tool for prediction of annotations. We identified terms that perform better than expected by chance using randomly generated gene sets and show that both manually curated terms in GO and automatically recognized terms can be used to develop reasonable predictive models. In all, we characterize terms in over 250 ontologies and identify more than 127,000 statistically significant terms that can be predicted on human genes.
Affiliation(s)
- Adrian Bivol
  - Buck Institute for Research on Aging, Novato, CA
8
Wittkop T, TerAvest E, Evani US, Fleisch KM, Berman AE, Powell C, Shah NH, Mooney SD. STOP using just GO: a multi-ontology hypothesis generation tool for high throughput experimentation. BMC Bioinformatics 2013; 14:53. [PMID: 23409969 PMCID: PMC3635999 DOI: 10.1186/1471-2105-14-53] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 01/26/2012] [Accepted: 01/28/2013] [Indexed: 12/21/2022] Open
Abstract
Background Gene Ontology (GO) enrichment analysis remains one of the most common methods for hypothesis generation from high throughput datasets. However, we believe that researchers strive to test other hypotheses that fall outside of GO. Here, we developed and evaluated a tool for hypothesis generation from gene or protein lists using ontological concepts present in manually curated text that describes those genes and proteins.
Results We have developed the method Statistical Tracking of Ontological Phrases (STOP), which expands the realm of testable hypotheses in gene set enrichment analyses by integrating automated annotations of genes to terms from over 200 biomedical ontologies. While the automated annotations are not as precise as manually curated terms, we find that the additional enriched concepts have value when coupled with traditional enrichment analyses using curated terms.
Conclusion Multiple ontologies have been developed for gene and protein annotation. By using a dataset of both manually curated GO terms and automatically recognized concepts from curated text, we can expand the realm of hypotheses that can be discovered. The web application STOP is available at http://mooneygroup.org/stop/.
9
Radivojac P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, Graim K, Funk C, Verspoor K, Ben-Hur A, Pandey G, Yunes JM, Talwalkar AS, Repo S, Souza ML, Piovesan D, Casadio R, Wang Z, Cheng J, Fang H, Gough J, Koskinen P, Törönen P, Nokso-Koivisto J, Holm L, Cozzetto D, Buchan DWA, Bryson K, Jones DT, Limaye B, Inamdar H, Datta A, Manjari SK, Joshi R, Chitale M, Kihara D, Lisewski AM, Erdin S, Venner E, Lichtarge O, Rentzsch R, Yang H, Romero AE, Bhat P, Paccanaro A, Hamp T, Kaßner R, Seemayer S, Vicedo E, Schaefer C, Achten D, Auer F, Boehm A, Braun T, Hecht M, Heron M, Hönigschmid P, Hopf TA, Kaufmann S, Kiening M, Krompass D, Landerer C, Mahlich Y, Roos M, Björne J, Salakoski T, Wong A, Shatkay H, Gatzmann F, Sommer I, Wass MN, Sternberg MJE, Škunca N, Supek F, Bošnjak M, Panov P, Džeroski S, Šmuc T, Kourmpetis YAI, van Dijk ADJ, ter Braak CJF, Zhou Y, Gong Q, Dong X, Tian W, Falda M, Fontana P, Lavezzo E, Di Camillo B, Toppo S, Lan L, Djuric N, Guo Y, Vucetic S, Bairoch A, Linial M, Babbitt PC, Brenner SE, Orengo C, Rost B, Mooney SD, Friedberg I. A large-scale evaluation of computational protein function prediction. Nat Methods 2013; 10:221-7. [PMID: 23353650 PMCID: PMC3584181 DOI: 10.1038/nmeth.2340] [Citation(s) in RCA: 564] [Impact Index Per Article: 51.3] [Received: 04/02/2012] [Accepted: 12/10/2012] [Indexed: 01/03/2023]
Abstract
A report on the results of the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
Affiliation(s)
- Predrag Radivojac
  - School of Informatics and Computing, Indiana University, Bloomington, Indiana, USA
10
Röttger R, Kalaghatgi P, Sun P, Soares SDC, Azevedo V, Wittkop T, Baumbach J. Density parameter estimation for finding clusters of homologous proteins--tracing actinobacterial pathogenicity lifestyles. Bioinformatics 2012; 29:215-22. [PMID: 23142964 DOI: 10.1093/bioinformatics/bts653] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract
MOTIVATION Homology detection is a long-standing challenge in computational biology. To tackle this problem, all-versus-all BLAST results are typically coupled with data partitioning approaches, resulting in clusters of putative homologous proteins. One of the main problems, however, has been widely neglected: all clustering tools need a density parameter that adjusts the number and size of the clusters. This parameter is crucial but hard to estimate without gold standard data at hand. Developing a gold standard, however, is a difficult and time-consuming task. A reliable method for detecting clusters of homologous proteins across a large set of species would open opportunities for better understanding the genetic repertoire of bacteria with different lifestyles.
RESULTS Our main contribution is a method for identifying a suitable and robust density parameter for protein homology detection without a given gold standard. To this end, we study the core genome of 89 actinobacteria. This allows us to incorporate background knowledge, i.e. the assumption that a set of evolutionarily closely related species should share a comparably high number of evolutionarily conserved proteins (emerging from phylum-specific housekeeping genes). We apply our strategy to find genes/proteins that are specific for certain actinobacterial lifestyles, i.e. different types of pathogenicity. The whole study was performed with transitivity clustering, as it requires only a single intuitive density parameter and has been shown to be well suited to the task of protein sequence clustering. Note, however, that the presented strategy does not depend on our clustering method and can easily be adapted to other clustering approaches.
AVAILABILITY All results are publicly available at http://transclust.mmci.uni-saarland.de/actino_core/ or as Supplementary Material of this article.
CONTACT roettger@mpi-inf.mpg.de
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
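The role of a density parameter can be illustrated with a deliberately simplified stand-in: pairs whose similarity reaches a threshold are linked, and clusters are the connected components of the resulting graph. This is not the Transitivity Clustering algorithm used in the study (which solves a weighted cluster-editing problem); the function name, protein labels and similarity scores below are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def clusters_at_threshold(items: Iterable[str],
                          similarity: Dict[Tuple[str, str], float],
                          threshold: float) -> List[List[str]]:
    """Group items into connected components of the graph whose edges are
    the pairs with similarity >= threshold (simple union-find)."""
    parent = {item: item for item in items}

    def find(x: str) -> str:
        # Follow parent links to the component root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical pairwise similarity scores between five proteins.
    proteins = ["p1", "p2", "p3", "p4", "p5"]
    scores = {("p1", "p2"): 90.0, ("p2", "p3"): 40.0,
              ("p3", "p4"): 85.0, ("p4", "p5"): 20.0}
    for t in (30.0, 50.0, 95.0):
        print(t, clusters_at_threshold(proteins, scores, t))
```

Raising the threshold splits the toy data from two clusters into three and then into five singletons, which is the sensitivity to the density parameter that the abstract describes.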
Affiliation(s)
- Richard Röttger
  - Max Planck Institute for Informatics, Saarland University, 66123 Saarbrücken, Germany.
11
Abstract
High-throughput biological experiments commonly result in a list of genes or proteins of interest. In order to understand the observed changes of the genes and to generate new hypotheses, one needs to understand the functions and roles of the genes and how those functions relate to the experimental conditions. Typically, statistical tests are performed in order to detect enriched Gene Ontology categories or pathways, i.e. the categories are observed in the genes of interest more often than is expected by chance. Depending on the number of genes and the complexity and quantity of functions in which they are involved, such an analysis can easily result in hundreds of enriched terms. To this end we developed DEFOG, a web-based application that facilitates the functional analysis of gene sets by hierarchically organizing the genes into functionally related modules. Our computational pipeline utilizes three powerful tools to achieve this goal: (1) GeneMANIA creates a functional consensus network of the genes of interest based on gene-list-specific data fusion of hundreds of genomic networks from publicly available sources; (2) Transitivity Clustering organizes those genes into a clear hierarchy of functionally related groups, and (3) Ontologizer performs a Gene Ontology enrichment analysis on the resulting gene clusters. DEFOG integrates this computational pipeline within an easy-to-use web interface, thus allowing for a novel visual analysis of gene sets that aids in the discovery of potentially important biological mechanisms and facilitates the creation of new hypotheses. DEFOG is available at http://www.mooneygroup.org/defog.
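The enrichment step described above asks whether a category appears among the genes of interest more often than expected by chance. One common way to quantify this is a one-sided hypergeometric test, sketched below with the standard library only. This is a generic illustration with made-up counts; it is not claimed to be the specific test configured in DEFOG or Ontologizer.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the list of interest, k: genes of interest annotated with the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Made-up counts: 40 of 20,000 background genes carry the term,
    # and 6 of the 100 genes of interest do.
    p = enrichment_p_value(k=6, n=100, K=40, N=20_000)
    print(f"p = {p:.3g}")
```

Because one such test is run per category, hundreds of terms can come out as enriched for a large gene list, which is the interpretability problem the hierarchical grouping in DEFOG is meant to ease.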
Affiliation(s)
- Tobias Wittkop
  - Buck Institute for Research on Aging, 8001 Redwood Blvd., Novato, CA 94945, USA.
12
Wittkop T, Rahmann S, Röttger R, Böcker S, Baumbach J. Extension and Robustness of Transitivity Clustering for Protein–Protein Interaction Network Analysis. Internet Mathematics 2011. [DOI: 10.1080/15427951.2011.604559] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/15/2022]
13
Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE. clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 2011; 12:436. [PMID: 22070249 PMCID: PMC3262844 DOI: 10.1186/1471-2105-12-436] [Citation(s) in RCA: 389] [Impact Index Per Article: 29.9] [Received: 08/08/2011] [Accepted: 11/09/2011] [Indexed: 12/02/2022] Open
Abstract
Background In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. 
Conclusions The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager.
Affiliation(s)
- John H Morris
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA.
|
14
|
Wittkop T, Rahmann S, Baumbach J. Efficient Online Transcription Factor Binding Site Adjustment by Integrating Transitive Graph Projection with MoRAine 2.0. J Integr Bioinform 2010. [DOI: 10.1515/jib-2010-117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022] [Open Access]
Abstract
We investigated the problem of imprecisely determined prokaryotic transcription factor (TF) binding sites (TFBSs). We found that the identification and reinvestigation of questionable binding motifs may result in improved models of these motifs. Subsequent model-based predictions of gene regulatory interactions may be performed with increased accuracy when the TFBS annotation underlying these models has been re-adjusted. We present MoRAine 2.0, a significantly improved version of MoRAine. It can automatically identify cases of unfavorable TFBS strand annotations and imprecisely determined TFBS positions. With release 2.0, we close the gap between reasonable running time and high accuracy. Furthermore, it requires only minimal input from the user: (1) the input TFBS sequences and (2) the length of the flanking sequences. Conclusions: MoRAine 2.0 is an easy-to-use, integrated, and publicly available web tool for the re-annotation of questionable TFBSs. It can be used online or downloaded as a stand-alone version from http://moraine.cebitec.uni-bielefeld.de.
|
15
|
Wittkop T, Rahmann S, Baumbach J. Efficient online transcription factor binding site adjustment by integrating transitive graph projection with MoRAine 2.0. J Integr Bioinform 2010; 7:461. [PMID: 20375458 DOI: 10.2390/biecoll-jib-2010-117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2010] [Revised: 02/17/2010] [Accepted: 03/25/2010] [Indexed: 05/29/2023] [Open Access]
Abstract
We investigated the problem of imprecisely determined prokaryotic transcription factor (TF) binding sites (TFBSs). We found that the identification and reinvestigation of questionable binding motifs may result in improved models of these motifs. Subsequent model-based predictions of gene regulatory interactions may be performed with increased accuracy when the TFBS annotation underlying these models has been re-adjusted. We present MoRAine 2.0, a significantly improved version of MoRAine. It can automatically identify cases of unfavorable TFBS strand annotations and imprecisely determined TFBS positions. With release 2.0, we close the gap between reasonable running time and high accuracy. Furthermore, it requires only minimal input from the user: (1) the input TFBS sequences and (2) the length of the flanking sequences. CONCLUSIONS MoRAine 2.0 is an easy-to-use, integrated, and publicly available web tool for the re-annotation of questionable TFBSs. It can be used online or downloaded as a stand-alone version from http://moraine.cebitec.uni-bielefeld.de.
Affiliation(s)
- Tobias Wittkop
- Genome Informatics, Bielefeld University, Bielefeld, Germany.
|
16
|
Baumbach J, Wittkop T, Kleindt CK, Tauch A. Integrated analysis and reconstruction of microbial transcriptional gene regulatory networks using CoryneRegNet. Nat Protoc 2009; 4:992-1005. [DOI: 10.1038/nprot.2009.81] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/09/2022]
|
17
|
Baumbach J, Wittkop T, Weile J, Kohl T, Rahmann S. MoRAine - A web server for fast computational transcription factor binding motif re-annotation. J Integr Bioinform 2008. [DOI: 10.1515/jib-2008-91] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022] [Open Access]
Abstract
Background: A precise experimental identification of transcription factor binding motifs (TFBMs), accurate to a single base pair, is time-consuming and difficult. For several databases, TFBM annotations are extracted from the literature and stored 5ʹ → 3ʹ relative to the target gene. Mixing the two possible orientations of a motif results in poor information content of subsequently computed position frequency matrices (PFMs) and sequence logos. Since these PFMs are used to predict further TFBMs, we address the question if the TFBMs underlying a PFM can be re-annotated automatically to improve both the information content of the PFM and subsequent classification performance. Results: We present MoRAine, an algorithm that re-annotates transcription factor binding motifs. Each motif with experimental evidence underlying a PFM is compared against each other such motif. The goal is to re-annotate TFBMs by possibly switching their strands and shifting them a few positions in order to maximize the information content of the resulting adjusted PFM. We present two heuristic strategies to perform this optimization and subsequently show that MoRAine significantly improves the corresponding sequence logos. Furthermore, we justify the method by evaluating specificity, sensitivity, true positive, and false positive rates of PFM-based TFBM predictions for E. coli using the original database motifs and the MoRAine-adjusted motifs. The classification performance is considerably increased if MoRAine is used as a preprocessing step. Conclusions: MoRAine is integrated into a publicly available web server and can be used online or downloaded as a stand-alone version from http://moraine.cebitec.uni-bielefeld.de.
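The objective named in the abstract above, maximizing the information content of a PFM by switching motif strands, can be illustrated with a small sketch. This is not the MoRAine implementation and not either of the two heuristics from the paper: the function names, the uniform background (2 bits per column, no small-sample correction), and the simple greedy loop are all assumptions made for illustration, and position shifting is left out.

```python
import math
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(site):
    """Return the same site as read from the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(site))


def information_content(sites):
    """Total information content (bits) of equal-length DNA sites.

    Assumes a uniform background, so each column contributes
    2 bits minus its Shannon entropy.
    """
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total


def greedy_strand_adjust(sites):
    """Hypothetical greedy heuristic: flip one site at a time whenever
    the flip raises the information content, until nothing improves."""
    current = list(sites)
    best = information_content(current)
    improved = True
    while improved:
        improved = False
        for i, site in enumerate(current):
            candidate = current[:i] + [reverse_complement(site)] + current[i + 1:]
            score = information_content(candidate)
            if score > best + 1e-12:  # accept strict improvements only
                current, best, improved = candidate, score, True
    return current, best


# Third site is stored on the opposite strand from the first two.
sites = ["AACGT", "AACGT", "ACGTT"]
adjusted, score = greedy_strand_adjust(sites)
```

In this toy input the third site is the reverse complement of the other two, so a single flip makes all columns uniform and raises the total from roughly 7.2 bits to the 10-bit maximum for five columns.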
|
18
|
Baumbach J, Wittkop T, Rademacher K, Rahmann S, Brinkrolf K, Tauch A. CoryneRegNet 3.0—An interactive systems biology platform for the analysis of gene regulatory networks in corynebacteria and Escherichia coli. J Biotechnol 2007; 129:279-89. [PMID: 17229482 DOI: 10.1016/j.jbiotec.2006.12.012] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 08/30/2006] [Revised: 11/22/2006] [Accepted: 12/04/2006] [Indexed: 11/30/2022]
Abstract
CoryneRegNet is an ontology-based data warehouse for the reconstruction and visualization of transcriptional regulatory interactions in prokaryotes. To extend the biological content of CoryneRegNet, we added comprehensive data on transcriptional regulations in the model organism Escherichia coli K-12, originally deposited in the international reference database RegulonDB. The enhanced web interface of CoryneRegNet offers several types of search options. The results of a search are displayed in a table-based style and include a visualization of the genetic organization of the respective gene region. Information on DNA binding sites of transcriptional regulators is depicted by sequence logos. The results can also be displayed by several layouters implemented in the graphical user interface GraphVis, allowing, for instance, the visualization of genome-wide network reconstructions and the homology-based inter-species comparison of reconstructed gene regulatory networks. In an application example, we compare the composition of the gene regulatory networks involved in the SOS response of E. coli and Corynebacterium glutamicum. CoryneRegNet is available at the following URL: http://www.cebitec.uni-bielefeld.de/groups/gi/software/coryneregnet/.
Affiliation(s)
- Jan Baumbach
- Algorithms and Statistics for Systems Biology Group, Genominformatik, Technische Fakultät, Universität Bielefeld, Universitätsstrasse 25, D-33615 Bielefeld, Germany
|
19
|
Rahmann S, Wittkop T, Baumbach J, Martin M, Truss A, Böcker S. Exact and heuristic algorithms for weighted cluster editing. Comput Syst Bioinformatics Conf 2007; 6:391-401. [PMID: 17951842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Clustering objects according to given similarity or distance values is a ubiquitous problem in computational biology with diverse applications, e.g., in defining families of orthologous genes, or in the analysis of microarray experiments. While there exists a plenitude of methods, many of them produce clusterings that can be further improved. "Cleaning up" initial clusterings can be formalized as projecting a graph on the space of transitive graphs; it is also known as the cluster editing or cluster partitioning problem in the literature. In contrast to previous work on cluster editing, we allow arbitrary weights on the similarity graph. To solve the so-defined weighted transitive graph projection problem, we present (1) the first exact fixed-parameter algorithm, (2) a polynomial-time greedy algorithm that returns the optimal result on a well-defined subset of "close-to-transitive" graphs and works heuristically on other graphs, and (3) a fast heuristic that uses ideas similar to those from the Fruchterman-Reingold graph layout algorithm. We compare quality and running times of these algorithms on both artificial graphs and protein similarity graphs derived from the 66 organisms of the COG dataset.
Affiliation(s)
- Sven Rahmann
- Computational Methods for Emerging Technologies group, Genome Informatics, Technische Fakultät, Bielefeld University, D-33594 Bielefeld, Germany.
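
The weighted cluster editing problem from the abstract above can be made concrete with a tiny sketch. It assumes one common formulation that the abstract does not spell out: a positive pair weight means an edge exists and costs its weight to delete, while a negative weight means no edge and costs its absolute value to insert. The brute-force search over all partitions is only a stand-in for illustration; it is not the fixed-parameter, greedy, or layout-based algorithm from the paper, and all names are hypothetical.

```python
from itertools import combinations


def editing_cost(weights, clusters):
    """Cost of editing the similarity graph into the given disjoint cliques.

    weights maps frozenset({u, v}) to a similarity score; pairs that are
    missing from the mapping are treated as weight 0 (free either way).
    """
    label = {node: i for i, cluster in enumerate(clusters) for node in cluster}
    cost = 0.0
    for u, v in combinations(sorted(label), 2):
        w = weights.get(frozenset((u, v)), 0.0)
        same_cluster = label[u] == label[v]
        if w > 0 and not same_cluster:
            cost += w    # delete an existing edge between clusters
        elif w < 0 and same_cluster:
            cost += -w   # insert a missing edge inside a cluster
    return cost


def partitions(nodes):
    """Yield every set partition of a list of nodes (exponentially many)."""
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for part in partitions(rest):
        yield [[first]] + part
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def best_clustering(nodes, weights):
    """Exhaustive search; feasible only for a handful of nodes."""
    return min(partitions(list(nodes)), key=lambda p: editing_cost(weights, p))


# Toy similarity graph on four nodes (made-up weights).
weights = {
    frozenset("ab"): 3.0, frozenset("bc"): 2.0, frozenset("cd"): 4.0,
    frozenset("ac"): -1.0, frozenset("ad"): -2.0, frozenset("bd"): -2.0,
}
best = best_clustering("abcd", weights)
```

For these made-up weights the cheapest transitive graph splits the nodes into {a, b} and {c, d}, paying only for the deletion of the b-c edge.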
|
20
|
Baumbach J, Brinkrolf K, Wittkop T, Tauch A, Rahmann S. CoryneRegNet 2: An Integrative Bioinformatics Approach for Reconstruction and Comparison of Transcriptional Regulatory Networks in Prokaryotes. J Integr Bioinform 2006. [DOI: 10.1515/jib-2006-24] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022] [Open Access]
Abstract
CoryneRegNet is an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks. Initially, it was designed to provide methods for the analysis and visualization of the gene regulatory network of Corynebacterium glutamicum. Now we integrated the genomes and transcriptional interactions of three other corynebacteria, C. diphtheriae, C. efficiens, and C. jeikeium into CoryneRegNet; providing comparative analysis and visualization with GraphVis. We also integrated the high-performance PSSM search tool PoSSuM search to detect potential transcription factor binding sites within and across species. As an application, we reconstruct in silico the regulatory network of the iron metabolism regulator DtxR in the four corynebacteria. CoryneRegNet is freely accessible at https://www.cebitec.uni-bielefeld.de/groups/gi/software/coryneregnet/. The final slash (/) is mandatory. In order to use the GraphVis feature, Java (at least version 1.4.2) is required.
Affiliation(s)
- Jan Baumbach
- (1) Algorithms and Statistics for Systems Biology group, Genome Informatics, Technische Fakultät, Universität Bielefeld, D-33594 Bielefeld, Germany
- (2) International NRW Graduate School in Bioinformatics and Genome Research, CeBiTec, Universität Bielefeld, D-33594 Bielefeld, Germany
- Karina Brinkrolf
- (3) Institute for Genome Research, CeBiTec, Universität Bielefeld, D-33594 Bielefeld, Germany
- (2) International NRW Graduate School in Bioinformatics and Genome Research, CeBiTec, Universität Bielefeld, D-33594 Bielefeld, Germany
- Tobias Wittkop
- (4) Algorithms and Statistics for Systems Biology group, Genome Informatics, Technische Fakultät, Universität Bielefeld, D-33594 Bielefeld, Germany
- Andreas Tauch
- (3) Institute for Genome Research, CeBiTec, Universität Bielefeld, D-33594 Bielefeld, Germany
- (2) International NRW Graduate School in Bioinformatics and Genome Research, CeBiTec, Universität Bielefeld, D-33594 Bielefeld, Germany
- Sven Rahmann
- (1) Algorithms and Statistics for Systems Biology group, Genome Informatics, Technische Fakultät, Universität Bielefeld, D-33594 Bielefeld, Germany
- (2) International NRW Graduate School in Bioinformatics and Genome Research, CeBiTec, Universität Bielefeld, D-33594 Bielefeld, Germany
|