1
|
Abstract 139: cfDNA methylation profiling distinguishes lineage-specific hematologic malignancies. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Introduction: Hematologic (heme) malignancies and their precursor conditions are highly prevalent. They are also diverse in biology, clinical presentation, and outcomes, underlining the importance of differentiating them. Previously, we demonstrated that a blood-based targeted methylation assay detected multiple cancer types across stages. Here, we examined test performance on various heme cancers, identifying specific methylation signatures.
Methods: From the second substudy (training set) of the Circulating Cell-free Genome Atlas (CCGA) study (NCT02889978), we evaluated 325 participants from 17 different heme disease subtypes and 3,211 non-cancer controls enrolled without a cancer diagnosis. A cross-validated mutual information-based algorithm was used to identify features that discriminated heme subtypes. The resulting feature distribution was visualized using uniform manifold approximation and projection (UMAP) dimensionality reduction on held-out data. In cross-validation with feature selection, we then trained a multinomial classifier to distinguish among the major heme cancers and non-cancer, and correlated the model's class probabilities with positions in feature space.
Results: Dimensionality reduction and visualization of input features demonstrated that heme malignancies separated into five major clusters reflecting developmental lineages and disease ontogeny: myeloid, circulating lymphomas, Hodgkin lymphomas, non-Hodgkin lymphomas, and plasma cell neoplasms. The position of samples within each heme cluster correlated with the cancer signal strength. At 99.4% specificity [95% CI: 99.1, 99.7], heme cancer detection was 74.5% [69.4, 79.1] overall, 67.7% [41.1, 87.8] for myeloid, 77.9% [66.3, 86.9] for circulating lymphomas, 90.7% [73.2, 98.4] for Hodgkin lymphomas, 68.6% [60.4, 76.1] for other non-Hodgkin lymphomas, and 78.8% [67.0, 87.9] for plasma cell neoplasms. Of 18 non-cancer participants who were classified as having heme cancers, 4 were predicted as myeloid, 6 as circulating lymphoid, and 8 as other non-Hodgkin lymphoid neoplasms (<1% false positive rate).
Conclusion: Methylation features of cfDNA in patients with heme malignancies delineated five major clusters that reflected disease ontogeny and heme lineage. Lineage-specific signals followed a gradient suggestive of variation in disease-related methylation or tumor DNA shedding. These findings contribute to the understanding of biological signals that arise from various heme conditions. Because most cfDNA generally arises from blood lineages, this knowledge will guide further efforts toward removing interfering biological signals from cfDNA-based cancer detection assays and achieving even more sensitive detection of multiple cancer types.
Citation Format: Qinwen Liu, Rita Shaknovich, Xiaoji Chen, Zhao Dong, M. C. Maher, Samuel Gross, Alexander P. Fields, Jan Schellenberger, Kathryn N. Kurtzman, Eric T. Fung, Anne-Renee Hartman, Earl Hubbell, Arash Jamshidi, Alexander M. Aravanis, Oliver Venn. cfDNA methylation profiling distinguishes lineage-specific hematologic malignancies [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 139.
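The specificity quoted in the Results of this abstract can be checked from the counts it reports (18 false positives among 3,211 non-cancer controls). The sketch below is only an illustration: the abstract does not say how its confidence intervals were computed, so the Wilson score interval used here is an assumption and its bounds may differ slightly from the published ones. The function name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

non_cancer = 3211       # non-cancer controls reported in Methods
false_positives = 18    # controls classified as heme cancer in Results
true_negatives = non_cancer - false_positives

# Specificity is the fraction of non-cancer controls correctly called negative.
specificity = true_negatives / non_cancer
low, high = wilson_interval(true_negatives, non_cancer)
print(f"specificity = {specificity:.1%}, approx. 95% CI = [{low:.2%}, {high:.2%}]")
```

The point estimate rounds to the 99.4% stated in the abstract. An exact (Clopper-Pearson) interval would be a reasonable alternative if matching the published bounds matters.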
|
2
|
Genome-wide cell-free DNA (cfDNA) methylation signatures and effect on tissue of origin (TOO) performance. J Clin Oncol 2019. [DOI: 10.1200/jco.2019.37.15_suppl.3049] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 11/20/2022]
Abstract
Abstract 3049. Background: For multi-cancer detection using cfDNA, TOO determination is critical to enable safe and efficient diagnostic follow-up. Previous array-based studies captured < 2% of genomic CpGs. Here, we report genome-wide fragment-level methylation patterns across 811 cancer cell methylomes representing 21 tumor types (97% of SEER cancer incidence), and define effects of this methylation database on TOO prediction within a machine learning framework.
Methods: Genomic DNA from 655 formalin-fixed paraffin-embedded (FFPE) tumor tissues and 156 isolated cells from tumors was subjected to a prototype 30x whole-genome bisulfite sequencing (WGBS) assay, as previously reported in the Circulating Cell-free Genome Atlas (CCGA) study (NCT02889978). Two independent TOO models, one with and one without the methylation database, were fitted on training samples; each was used to predict on the test set. A WGBS classifier was used to detect cancer at 98% specificity; reported TOO results reflect percent agreement between predicted and true TOO among those detected cancers (166 cases: 81 stage I-III, 69 stage IV, 16 non-informative).
Results: Genome-wide methylation data generated from this database allowed fragment-level analysis and coverage of ~30 million CpGs across the genome (~60-fold greater than array-based approaches). Incorrect TOO assignments fell from 20% to 13%, a 35% relative reduction, after incorporating methylation database information into TOO classification. Improvement was observed across all cancer types and was consistent in early-stage cancers (stage I-III). Respective performances (without vs with the methylation database) in breast cancer (n = 23) were 87% vs 96%; in lung cancer (n = 32) were 85% vs 88%; in hepatobiliary cancer (n = 10) were 70% vs 90%; and in pancreatic cancer (n = 17) were 94% vs 100%. Results using an optimized approach informed by these results in a large cohort of CCGA participants will be reported.
Conclusions: Incorporating data from a large methylation database improved TOO performance in multiple cancer types. This supports feasibility of this methylation-based approach as an early cancer detection test across cancer types. Clinical trial information: NCT02889978.
|
3
|
Abstract
Pathways are a universal paradigm for functionally describing cellular processes. Even though advances in high-throughput data generation have transformed biology, the core of our biological understanding, and hence data interpretation, is still predicated on human-defined pathways. Here, we introduce an unbiased pathway structure for genome-scale metabolic networks, defined based on principles of parsimony, that does not mimic canonical human-defined textbook pathways. Instead, these minimal pathways better describe multiple independent pathway-associated biomolecular interaction datasets, suggesting a functional organization for metabolism based on parsimonious use of cellular components. We use the inherent predictive capability of these pathways to experimentally discover novel transcriptional regulatory interactions in Escherichia coli metabolism for three transcription factors, effectively doubling the known regulatory roles for Nac and MntR. This study suggests an underlying and fundamental principle in the evolutionary selection of pathway structures; namely, that pathways may be minimal, independent, and segregated.
|
4
|
Predicting outcomes of steady-state ¹³C isotope tracing experiments using Monte Carlo sampling. BMC Systems Biology 2012; 6:9. [PMID: 22289253 PMCID: PMC3323462 DOI: 10.1186/1752-0509-6-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 08/04/2011] [Accepted: 01/30/2012] [Indexed: 01/04/2023]
Abstract
BACKGROUND: Carbon-13 (13C) analysis is a commonly used method for estimating reaction rates in biochemical networks. The choice of carbon labeling pattern is an important consideration when designing these experiments. We present a novel Monte Carlo algorithm for finding the optimal substrate input label for a particular experimental objective (flux or flux ratio). Unlike previous work, this method does not require assumption of the flux distribution beforehand.
RESULTS: Using a large E. coli isotopomer model, different commercially available substrate labeling patterns were tested computationally for their ability to determine reaction fluxes. The choice of optimal labeled substrate was found to be dependent upon the desired experimental objective. Many commercially available labels are predicted to be outperformed by complex labeling patterns. Based on Monte Carlo sampling, the dimensionality of experimental data was found to be considerably less than anticipated, suggesting that the effectiveness of 13C experiments for determining reaction fluxes across a large-scale metabolic network is less than previously believed.
CONCLUSIONS: While 13C analysis is a useful tool in systems biology, high redundancy in measurements limits the information that can be obtained from each experiment. It is, however, possible to compute potential limitations before an experiment is run and predict whether, and to what degree, the rate of each reaction can be resolved.
|
5
|
Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat Protoc 2011; 6:1290-307. [PMID: 21886097 DOI: 10.1038/nprot.2011.308] [Citation(s) in RCA: 980] [Impact Index Per Article: 75.4] [Indexed: 12/31/2022]
Abstract
Over the past decade, a growing community of researchers has emerged around the use of constraint-based reconstruction and analysis (COBRA) methods to simulate, analyze and predict a variety of metabolic phenotypes using genome-scale models. The COBRA Toolbox, a MATLAB package for implementing COBRA methods, was presented earlier. Here we present a substantial update of this in silico toolbox. Version 2.0 of the COBRA Toolbox expands the scope of computations by including in silico analysis methods developed since its original release. New functions include (i) network gap filling, (ii) 13C analysis, (iii) metabolic engineering, (iv) omics-guided analysis and (v) visualization. As with the first version, the COBRA Toolbox reads and writes systems biology markup language-formatted models. In version 2.0, we improved performance, usability and the level of documentation. A suite of test scripts can now be used to learn the core functionality of the toolbox and validate results. This toolbox lowers the barrier of entry to use powerful COBRA methods.
|
6
|
|
7
|
|
8
|
Elimination of thermodynamically infeasible loops in steady-state metabolic models. Biophys J 2011; 100:544-553. [PMID: 21281568 PMCID: PMC3030201 DOI: 10.1016/j.bpj.2010.12.3707] [Citation(s) in RCA: 135] [Impact Index Per Article: 10.4] [Received: 06/10/2010] [Revised: 11/16/2010] [Accepted: 12/10/2010] [Indexed: 01/03/2023]
Abstract
The constraint-based reconstruction and analysis (COBRA) framework has been widely used to study steady-state flux solutions in genome-scale metabolic networks. One shortcoming of current COBRA methods is the possible violation of the loop law in the computed steady-state flux solutions. The loop law is analogous to Kirchhoff's second law for electric circuits, and states that at steady state there can be no net flux around a closed network cycle. Although the consequences of the loop law have been known for years, it has been computationally difficult to work with. Therefore, the resulting loop-law constraints have been overlooked. Here, we present a general mixed integer programming approach called loopless COBRA (ll-COBRA), which can be used to eliminate all steady-state flux solutions that are incompatible with the loop law. We apply this approach to improve flux predictions on three common COBRA methods: flux balance analysis, flux variability analysis, and Monte Carlo sampling of the flux space. Moreover, we demonstrate that the imposition of loop-law constraints with ll-COBRA improves the consistency of simulation results with experimental data. This method provides an additional constraint for many COBRA methods, enabling the acquisition of more realistic simulation results.
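The loop law described in this abstract can be illustrated on a toy network. The sketch below is not the ll-COBRA mixed integer program; it only shows, for a hypothetical three-reaction cycle, that a circulating flux passes the steady-state mass balance while no assignment of metabolite potentials can make every reaction in the cycle run downhill. The network, the matrix `S`, and all variable names are assumptions made for illustration.

```python
import random

# Hypothetical three-metabolite cycle: R1: A -> B, R2: B -> C, R3: C -> A.
# Rows are metabolites (A, B, C); columns are reactions (R1, R2, R3).
S = [
    [-1,  0,  1],   # A
    [ 1, -1,  0],   # B
    [ 0,  1, -1],   # C
]

def mat_vec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]

# A flux running around the cycle satisfies the steady-state condition S v = 0
# even though nothing enters or leaves the network.
v_loop = [1.0, 1.0, 1.0]
steady_state = all(abs(x) < 1e-12 for x in mat_vec(S, v_loop))

# Assign arbitrary potentials to the metabolites. The driving force of
# reaction i is the potential change along it: column i of S dotted with mu.
random.seed(0)
mu = [random.uniform(-10, 10) for _ in S]
delta = [sum(S[j][i] * mu[j] for j in range(3)) for i in range(3)]

# Around a closed cycle the potential changes cancel, so they cannot all be
# negative; a net flux in one direction through all three reactions is ruled out.
cycle_sum = sum(delta)
all_downhill = all(d < 0 for d in delta)
print(steady_state, abs(cycle_sum) < 1e-9, all_downhill)
```

This mirrors the analogy to Kirchhoff's second law drawn in the abstract: mass balance alone admits the loop, and an additional constraint is needed to exclude it.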
|
9
|
Functional characterization of alternate optimal solutions of Escherichia coli's transcriptional and translational machinery. Biophys J 2010; 98:2072-81. [PMID: 20483314 DOI: 10.1016/j.bpj.2010.01.060] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 07/08/2009] [Revised: 01/08/2010] [Accepted: 01/22/2010] [Indexed: 12/24/2022]
Abstract
The constraint-based reconstruction and analysis approach has recently been extended to describe Escherichia coli's transcriptional and translational machinery. Here, we introduce the concept of reaction coupling to represent the dependency between protein synthesis and utilization. These coupling constraints lead to a significant contraction of the feasible set of steady-state fluxes. The subset of alternate optimal solutions (AOS) consistent with maximal ribosome production was calculated. The majority of transcriptional and translational reactions were active for all of these AOS, showing that the network has a low degree of redundancy. Furthermore, all calculated AOS contained the qualitative expression of at least 92% of the known essential genes. Principal component analysis of AOS demonstrated that energy currencies (ATP, GTP, and phosphate) dominate the network's capability to produce ribosomes. Additionally, we identified regulatory control points of the network, which include the transcription reactions of sigma70 (RpoD) as well as that of a degradosome component (Rne) and of tRNA charging (ValS). These reactions contribute significant variance among AOS. These results show that constraint-based modeling can be applied to gain insight into the systemic properties of E. coli's transcriptional and translational machinery.
|
10
|
Model-driven evaluation of the production potential for growth-coupled products of Escherichia coli. Metab Eng 2010; 12:173-86. [PMID: 19840862 PMCID: PMC3125152 DOI: 10.1016/j.ymben.2009.10.003] [Citation(s) in RCA: 165] [Impact Index Per Article: 11.8] [Received: 03/17/2009] [Accepted: 10/12/2009] [Indexed: 12/11/2022]
Abstract
Integrated approaches utilizing in silico analyses will be necessary to successfully advance the field of metabolic engineering. Here, we present an integrated approach through a systematic model-driven evaluation of the production potential for the bacterial production organism Escherichia coli to produce multiple native products from different representative feedstocks through coupling metabolite production to growth rate. Designs were examined for 11 unique central metabolism and amino acid targets from three different substrates under aerobic and anaerobic conditions. Optimal strain designs were reported for designs which possess maximum yield, substrate-specific productivity, and strength of growth-coupling for up to 10 reaction eliminations (knockouts). In total, growth-coupled designs could be identified for 36 out of the total 54 conditions tested, corresponding to eight out of the 11 targets. There were 17 different substrate/target pairs for which over 80% of the theoretical maximum potential could be achieved. The developed method introduces a new concept of objective function tilting for strain design. This study provides specific metabolic interventions (strain designs) for production strains that can be experimentally implemented, characterizes the potential for E. coli to produce native compounds, and outlines a strain design pipeline that can be utilized to design production strains for additional organisms.
|
11
|
BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics 2010; 11:213. [PMID: 20426874 PMCID: PMC2874806 DOI: 10.1186/1471-2105-11-213] [Citation(s) in RCA: 356] [Impact Index Per Article: 25.4] [Received: 03/04/2009] [Accepted: 04/29/2010] [Indexed: 12/28/2022]
Abstract
BACKGROUND: Genome-scale metabolic reconstructions under the Constraint Based Reconstruction and Analysis (COBRA) framework are valuable tools for analyzing the metabolic capabilities of organisms and interpreting experimental data. As the number of such reconstructions and analysis methods increases, there is a greater need for data uniformity and ease of distribution and use.
DESCRIPTION: We describe BiGG, a knowledgebase of Biochemically, Genetically and Genomically structured genome-scale metabolic network reconstructions. BiGG integrates several published genome-scale metabolic networks into one resource with standard nomenclature which allows components to be compared across different organisms. BiGG can be used to browse model content, visualize metabolic pathway maps, and export SBML files of the models for further analysis by external software packages. Users may follow links from BiGG to several external databases to obtain additional information on genes, proteins, reactions, metabolites and citations of interest.
CONCLUSIONS: BiGG addresses a need in the systems biology community to have access to high quality curated metabolic models and reconstructions. It is freely available for academic use at http://bigg.ucsd.edu.
|
12
|
Abstract
Genome-scale metabolic network reconstructions in microorganisms have been formulated and studied for about 8 years. The constraint-based approach has shown great promise in analyzing the systemic properties of these network reconstructions. Notably, constraint-based models have been used successfully to predict the phenotypic effects of knock-outs and for metabolic engineering. The inherent uncertainty in both parameters and variables of large-scale models is significant and is well suited to study by Monte Carlo sampling of the solution space. These techniques have been applied extensively to the reaction rate (flux) space of networks, with more recent work focusing on dynamic/kinetic properties. Monte Carlo sampling as an analysis tool has many advantages, including the ability to work with missing data, the ability to apply post-processing techniques, and the ability to quantify uncertainty and to optimize experiments to reduce uncertainty. We present an overview of this emerging area of research in systems biology.
|
13
|
Uniform sampling of steady-state flux spaces: means to design experiments and to interpret enzymopathies. Biophys J 2005; 87:2172-86. [PMID: 15454420 PMCID: PMC1304643 DOI: 10.1529/biophysj.104.043000] [Citation(s) in RCA: 93] [Impact Index Per Article: 4.9] [Indexed: 11/18/2022]
Abstract
Reconstruction of genome-scale metabolic networks is now possible using multiple different data types. Constraint-based modeling is an approach to interrogate capabilities of reconstructed networks by constraining possible cellular behavior through the imposition of physicochemical laws. As a result, a steady-state flux space is defined that contains all possible functional states of the network. Uniform random sampling of the steady-state flux space allows for the unbiased appraisal of its contents. Monte Carlo sampling of the steady-state flux space of the reconstructed human red blood cell metabolic network under simulated physiologic conditions yielded the following key results: (1) probability distributions for the values of individual metabolic fluxes showed a wide variety of shapes that could not have been inferred without computation; (2) pairwise correlation coefficients were calculated between all fluxes, determining the level of independence between the measurement of any two fluxes, and identifying highly correlated reaction sets; and (3) the network-wide effects of the change in one (or a few) variables (i.e., a simulated enzymopathy or fixing a flux range based on measurements) were computed. Mathematical models provide the most compact and informative representation of a hypothesis of how a cell works. Thus, understanding model predictions clearly is vital to driving forward the iterative model-building procedure that is at the heart of systems biology. Taken together, the Monte Carlo sampling procedure provides a broadening of the constraint-based approach by allowing for the unbiased and detailed assessment of the impact of the applied physicochemical constraints on a reconstructed network.
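Uniform sampling of a steady-state flux space, as described in this abstract, can be sketched on a toy network. The abstract does not name the sampling algorithm, so the hit-and-run random walk below is an assumption, chosen because it is a common way to sample a convex polytope. The three-reaction network, its bounds, and the function name `hit_and_run` are hypothetical.

```python
import math
import random

# Hypothetical toy network: uptake v1 feeds metabolite M, drained by v2 and v3.
# Steady state requires v1 - v2 - v3 = 0, so (v2, v3) parameterize the flux space.
# Inequalities A x <= b with x = (v2, v3): v2 >= 0, v3 >= 0, v2 + v3 <= 10.
A = [(-1.0, 0.0), (0.0, -1.0), (1.0, 1.0)]
b = [0.0, 0.0, 10.0]

def hit_and_run(start, n_samples, rng):
    """Draw points from {x : A x <= b} with the hit-and-run random walk."""
    x = list(start)
    samples = []
    for _ in range(n_samples):
        # Pick a uniformly random direction on the unit circle.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        d = (math.cos(angle), math.sin(angle))
        # Find the segment x + t*d that stays inside every constraint.
        t_min, t_max = -math.inf, math.inf
        for (a0, a1), bi in zip(A, b):
            slope = a0 * d[0] + a1 * d[1]
            slack = bi - (a0 * x[0] + a1 * x[1])
            if slope > 1e-12:
                t_max = min(t_max, slack / slope)
            elif slope < -1e-12:
                t_min = max(t_min, slack / slope)
        # Move to a uniformly chosen point on that segment.
        t = rng.uniform(t_min, t_max)
        x = [x[0] + t * d[0], x[1] + t * d[1]]
        samples.append((x[0] + x[1], x[0], x[1]))  # (v1, v2, v3)
    return samples

rng = random.Random(0)
samples = hit_and_run(start=(1.0, 1.0), n_samples=5000, rng=rng)
mean_v2 = sum(s[1] for s in samples) / len(samples)
print(f"mean v2 over {len(samples)} samples: {mean_v2:.2f}")
```

From such samples one can build per-flux histograms and pairwise correlations, which is the kind of analysis the abstract reports for the red blood cell network. Consecutive hit-and-run samples are correlated, so in practice the chain is usually thinned or run longer before drawing conclusions.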
|