1
Lin Y, Qian F, Shen L, Chen F, Chen J, Shen B. Computer-aided biomarker discovery for precision medicine: data resources, models and applications. Brief Bioinform 2020; 20:952-975. [PMID: 29194464 DOI: 10.1093/bib/bbx158] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 09/02/2017] [Revised: 10/17/2017] [Indexed: 12/21/2022]
Abstract
Biomarkers are a class of measurable and evaluable indicators with the potential to predict disease initiation and progression. In contrast to disease-associated factors, biomarkers hold the promise of capturing the changeable signatures of biological states. With methodological advances, computer-aided biomarker discovery has become a burgeoning paradigm in biomedical science. In recent years, 'big data' have accumulated for the systematic investigation of complex biological phenomena and have promoted the flourishing of computational methods for systems-level biomarker screening. Compared with routine wet-lab experiments, bioinformatics approaches are more efficient at decoding disease pathogenesis under a holistic framework, which is conducive to identifying biomarkers ranging from single molecules to molecular networks for disease diagnosis, prognosis and therapy. In this review, the concepts and characteristics of typical biomarker types (e.g. single molecular biomarkers, module/network biomarkers and cross-level biomarkers) are explained under the guidance of systems biology. Then, publicly available data resources, together with some well-constructed biomarker databases and knowledge bases, are introduced. Biomarker identification models using mathematical, network and machine learning theories are discussed in turn. Based on network substructural and functional evidence, a novel bioinformatics model is particularly highlighted for microRNA biomarker discovery. This article aims to give deep insights into the advantages and challenges of current computational approaches for biomarker detection, and to point the way toward precision medicine and nationwide healthcare.
Affiliation(s)
- Yuxin Lin
  - Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Fuliang Qian
  - Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Li Shen
  - Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Feifei Chen
  - Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Jiajia Chen
  - School of Chemistry, Biology and Material Engineering, Suzhou University of Science and Technology, China
- Bairong Shen
  - Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China

2
Lin Y, Chen J, Shen B. Interactions Between Genetics, Lifestyle, and Environmental Factors for Healthcare. Advances in Experimental Medicine and Biology 2017; 1005:167-191. [PMID: 28916933 DOI: 10.1007/978-981-10-5717-5_8] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 12/11/2022]
Abstract
The occurrence and progression of diseases are strongly associated with a combination of genetic, lifestyle, and environmental factors. Understanding the interplay between genetic and nongenetic components provides deep insights into disease pathogenesis and promotes personalized healthcare strategies. Recently, the paradigm of systems medicine, which integrates biomedical data and knowledge at multidimensional levels, has come to be considered an optimal approach to disease management and clinical decision-making in the era of precision medicine. In this chapter, epigenetically mediated genetics-lifestyle-environment interactions within specific diseases and different ethnic groups are systematically discussed, and data sources, computational models, and translational platforms for systems medicine research are presented in turn. Moreover, feasible suggestions on precision healthcare and healthy longevity are proposed based on a comprehensive review of current studies.
Affiliation(s)
- Yuxin Lin
  - Center for Systems Biology, Soochow University, No.1 Shizi Street, Suzhou, Jiangsu, 215006, China
- Jiajia Chen
  - School of Chemistry, Biology and Materials Engineering, Suzhou University of Science and Technology, No.1 Kerui Road, Suzhou, Jiangsu, 215011, China
- Bairong Shen
  - Center for Systems Biology, Soochow University, No.1 Shizi Street, Suzhou, Jiangsu, 215006, China
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu, 215163, China
  - Medical College of Guizhou University, Guiyang, 550025, China

3
Brito I, Hupé P, Neuvial P, Barillot E. Stability-based comparison of class discovery methods for DNA copy number profiles. PLoS One 2013; 8:e81458. [PMID: 24339933 PMCID: PMC3855312 DOI: 10.1371/journal.pone.0081458] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/15/2010] [Accepted: 10/22/2013] [Indexed: 11/19/2022]
Abstract
MOTIVATION Array-CGH can be used to determine DNA copy number, imbalances in which are a fundamental factor in the genesis and progression of tumors. The discovery of classes with similar patterns of array-CGH profiles therefore adds to our understanding of cancer and the treatment of patients. Various input data representations for array-CGH, dissimilarity measures between tumor samples and clustering algorithms may be used for this purpose. The choice between procedures is often difficult. An evaluation procedure is therefore required to select the best class discovery method (combination of one input data representation, one dissimilarity measure and one clustering algorithm) for array-CGH. Robustness of the resulting classes is a common requirement, but no stability-based comparison of class discovery methods for array-CGH profiles has ever been reported. RESULTS We applied several class discovery methods and evaluated the stability of their solutions with a modified version of Bertoni's [Formula: see text]-based test [1]. Our version relaxes the independence assumption required by Bertoni's original [Formula: see text]-based test. We conclude that Minimal Regions of alteration (a concept introduced by [2]) for input data representation, sim [3] or agree [4] for dissimilarity measure, and the use of average group distance in the clustering algorithm produce the most robust classes of array-CGH profiles. AVAILABILITY The software is available from http://bioinfo.curie.fr/projects/cgh-clustering. It has also been partly integrated into "Visualization and analysis of array-CGH" (VAMP) [5]. The data sets used are publicly available from ACTuDB [6].
Affiliation(s)
- Isabel Brito
  - Institut Curie, Paris, France
  - INSERM, U900, Paris, France
  - Mines ParisTech, Fontainebleau, France
- Philippe Hupé
  - Institut Curie, Paris, France
  - INSERM, U900, Paris, France
  - Mines ParisTech, Fontainebleau, France
  - CNRS UMR144, Paris, France
- Pierre Neuvial
  - Laboratoire Statistique & Génome, Université d'Évry Val d'Essonne, UMR CNRS 8071-USC INRA, Évry, France
- Emmanuel Barillot
  - Institut Curie, Paris, France
  - INSERM, U900, Paris, France
  - Mines ParisTech, Fontainebleau, France

4
RASOnD-a comprehensive resource and search tool for RAS superfamily oncogenes from various species. BMC Genomics 2011; 12:341. [PMID: 21729256 PMCID: PMC3141677 DOI: 10.1186/1471-2164-12-341] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/01/2011] [Accepted: 07/05/2011] [Indexed: 12/30/2022]
Abstract
Background The Ras superfamily plays an important role in the control of cell signalling and division. Mutations in the Ras genes convert them into active oncogenes. The Ras oncogenes form a major thrust of global cancer research as they are involved in the development and progression of tumors. This has resulted in the exponential growth of data on Ras superfamily across different public databases and in literature. However, no dedicated public resource is currently available for data mining and analysis on this family. The present database was developed to facilitate straightforward accession, retrieval and analysis of information available on Ras oncogenes from one particular site. Description We have developed the RAS Oncogene Database (RASOnD) as a comprehensive knowledgebase that provides integrated and curated information on a single platform for oncogenes of Ras superfamily. RASOnD encompasses exhaustive genomics and proteomics data existing across diverse publicly accessible databases. This resource presently includes overall 199,046 entries from 101 different species. It provides a search tool to generate information about their nucleotide and amino acid sequences, single nucleotide polymorphisms, chromosome positions, orthologies, motifs, structures, related pathways and associated diseases. We have implemented a number of user-friendly search interfaces and sequence analysis tools. At present the user can (i) browse the data (ii) search any field through a simple or advance search interface and (iii) perform a BLAST search and subsequently CLUSTALW multiple sequence alignment by selecting sequences of Ras oncogenes. The Generic gene browser, GBrowse, JMOL for structural visualization and TREEVIEW for phylograms have been integrated for clear perception of retrieved data. External links to related databases have been included in RASOnD. Conclusions This database is a resource and search tool dedicated to Ras oncogenes. 
It is useful to cancer biologists and cell molecular biologists as a ready source for research, identification and elucidation of the role of these oncogenes. The data generated can be used for understanding the relationship between the Ras oncogenes and their association with cancer. The database, updated monthly, is freely accessible online at http://202.141.47.181/rasond/ and http://www.aiims.edu/RAS.html.

5
Ostrovnaya I, Olshen AB, Seshan VE, Orlow I, Albertson DG, Begg CB. A metastasis or a second independent cancer? Evaluating the clonal origin of tumors using array copy number data. Stat Med 2011; 29:1608-21. [PMID: 20205270 DOI: 10.1002/sim.3866] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 01/27/2023]
Abstract
When a cancer patient develops a new tumor it is necessary to determine if it is a recurrence (metastasis) of the original cancer, or an entirely new occurrence of the disease. This is accomplished by assessing the histo-pathology of the lesions. However, there are many clinical scenarios in which this pathological diagnosis is difficult. Since each tumor is characterized by a distinct pattern of somatic mutations, a more definitive diagnosis is possible in principle in these difficult clinical scenarios by comparing the two patterns. In this article we develop and evaluate a statistical strategy for this comparison when the data are derived from array copy number data, designed to identify all of the somatic allelic gains and losses across the genome. First a segmentation algorithm is used to estimate the regions of allelic gain and loss. The correlation in these patterns between the two tumors is assessed, and this is complemented with more precise quantitative comparisons of each plausibly clonal mutation within individual chromosome arms. The results are combined to determine a likelihood ratio to distinguish clonal tumor pairs (metastases) from independent second primaries. Our data analyses show that in many cases a strong clonal signal emerges. Sensitivity analyses show that most of the diagnoses are robust when the data are of high quality.
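As a rough illustration of the correlation step mentioned in this abstract, the sketch below codes each chromosome arm as loss (-1), neutral (0) or gain (+1) in two tumors and computes the Pearson correlation between the two call vectors. The arm-level calls and the function name `pattern_correlation` are hypothetical, and this is not the authors' likelihood-ratio procedure, only a minimal way of comparing two copy-number patterns.

```python
from math import sqrt


def pattern_correlation(calls_a, calls_b):
    """Pearson correlation between two equal-length vectors of copy-number calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call vectors must be non-empty and of equal length")
    n = len(calls_a)
    mean_a = sum(calls_a) / n
    mean_b = sum(calls_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(calls_a, calls_b))
    var_a = sum((a - mean_a) ** 2 for a in calls_a)
    var_b = sum((b - mean_b) ** 2 for b in calls_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / sqrt(var_a * var_b)


# Hypothetical arm-level calls: -1 = loss, 0 = neutral, +1 = gain
tumor_1 = [1, 0, -1, 0, 1, 0, 0, -1]
tumor_2 = [1, 0, -1, 0, 1, 0, 0, 0]
print(round(pattern_correlation(tumor_1, tumor_2), 3))
```

A value near 1 would suggest similar gain/loss patterns; on its own it says nothing about statistical significance, which is what the likelihood ratio in the paper is meant to address.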

6
Genome-wide analysis of cutaneous T-cell lymphomas identifies three clinically relevant classes. J Invest Dermatol 2010; 130:1707-18. [PMID: 20130593 DOI: 10.1038/jid.2010.8] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.4] [Indexed: 01/07/2023]
Abstract
This study was undertaken to identify recurrent genetic alterations of the three main types of cutaneous T-cell lymphomas (CTCLs): mycosis fungoides (MF), Sézary syndrome (SS), and cutaneous anaplastic large-cell lymphoma (CALCL). Using array-based comparative genomic hybridization, the molecular cytogenetic profiles of 72 samples obtained from 58 patients with CTCL corresponding to 24 transformed MF (T-MF), 16 SS, and 18 CALCLs were determined. T-MF was characterized by gains of 1q25-31, 7p22-11.2, 7q21, 7q31, and 17q12, and losses of 9p21, 10p11.2, and 10q26. SS exhibited gains of 8q23-24.3 and 17q23-24, as well as losses of 9p21, 10p12-11.2, 10q22-24, 10q25-26, and 17p13-q11.1. Finally, CALCL exhibited 6q27 and 13q34 losses. Such imbalances were statistically associated with one CTCL subtype. Unsupervised hierarchical clustering defined three categories of clinical relevance: (1) CALCL apart from epidermotropic-CTCL, (2) an SS-only category, and (3) a mixed category with T-MF and SS cases, with both primary and secondary SS cases. In rare cases, the genetic classification did not correspond to the inclusion diagnosis, possibly reflecting the association of two diseases in the same patient or initial misdiagnosis according to follow-up. Finally, different samples in the same patient clustered together, showing reproducibility of such a classifier.

7
CNV Workshop: an integrated platform for high-throughput copy number variation discovery and clinical diagnostics. BMC Bioinformatics 2010; 11:74. [PMID: 20132550 PMCID: PMC2827374 DOI: 10.1186/1471-2105-11-74] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 09/01/2009] [Accepted: 02/04/2010] [Indexed: 01/13/2023]
Abstract
Background Recent studies have shown that copy number variations (CNVs) are frequent in higher eukaryotes and associated with a substantial portion of inherited and acquired risk for various human diseases. The increasing availability of high-resolution genome surveillance platforms provides opportunity for rapidly assessing research and clinical samples for CNV content, as well as for determining the potential pathogenicity of identified variants. However, few informatics tools for accurate and efficient CNV detection and assessment currently exist. Results We developed a suite of software tools and resources (CNV Workshop) for automated, genome-wide CNV detection from a variety of SNP array platforms. CNV Workshop includes three major components: detection, annotation, and presentation of structural variants from genome array data. CNV detection utilizes a robust and genotype-specific extension of the Circular Binary Segmentation algorithm, and the use of additional detection algorithms is supported. Predicted CNVs are captured in a MySQL database that supports cohort-based projects and incorporates a secure user authentication layer and user/admin roles. To assist with determination of pathogenicity, detected CNVs are also annotated automatically for gene content, known disease loci, and gene-based literature references. Results are easily queried, sorted, filtered, and visualized via a web-based presentation layer that includes a GBrowse-based graphical representation of CNV content and relevant public data, integration with the UCSC Genome Browser, and tabular displays of genomic attributes for each CNV. Conclusions To our knowledge, CNV Workshop represents the first cohesive and convenient platform for detection, annotation, and assessment of the biological and clinical significance of structural variants. 
CNV Workshop has been successfully utilized for assessment of genomic variation in healthy individuals and disease cohorts and is an ideal platform for coordinating multiple associated projects. Availability and Implementation Available on the web at: http://sourceforge.net/projects/cnv
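To make the idea of segmenting array data more concrete, here is a minimal sketch of a single change-point search: it tries every split of a probe-ordered sequence of log ratios and keeps the one that minimizes the within-segment squared error. This is a deliberately simplified stand-in, not Circular Binary Segmentation and not the genotype-specific extension used by CNV Workshop; the function name `best_split` and the sample values are hypothetical.

```python
def best_split(values):
    """Return (index, cost) for the split values[:index] / values[index:]
    that minimizes the summed squared deviation from each segment's mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to split")

    def sse(segment):
        # Sum of squared errors around the segment mean
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    best_index, best_cost = None, float("inf")
    for k in range(1, n):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index, best_cost


# Hypothetical log2 ratios: four neutral probes followed by four gained probes
log_ratios = [0.02, -0.05, 0.01, 0.03, 0.55, 0.61, 0.58, 0.60]
index, cost = best_split(log_ratios)
print(index)
```

A practical segmenter would apply such a search recursively and accept a split only if it passes a significance test; both steps are omitted here for brevity.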

8
Abstract
The analysis of cancer genomes has benefited from the advances in technology that enable data to be generated on an unprecedented scale, describing a tumour genome's sequence and composition at increasingly high resolution and reducing cost. This progress is likely to increase further over the coming years as next-generation sequencing approaches are applied to the study of cancer genomes, in tandem with large-scale efforts such as the Cancer Genome Atlas and recently announced International Cancer Genome Consortium efforts to complement those already established such as the Sanger Institute Cancer Genome Project. This presents challenges for the cancer researcher and the research community in general, in terms of analysing the data generated in one's own projects and also in coordinating and interrogating data that are publicly available. This review aims to provide a brief overview of some of the main informatics resources currently available and their use, and some of the informatics approaches that may be applied in the study of cancer genomes.
Affiliation(s)
- Ian P Barrett
  - Cancer Bioscience, AstraZeneca, Macclesfield, Cheshire, UK

9
Abba MC, Lacunza E, Nunez MI, Colussi A, Isla-Larrain M, Segal-Eiras A, Croce MV, Aldaz CM. Rhomboid domain containing 2 (RHBDD2): a novel cancer-related gene over-expressed in breast cancer. Biochim Biophys Acta Mol Basis Dis 2009; 1792:988-97. [PMID: 19616622 DOI: 10.1016/j.bbadis.2009.07.006] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 03/16/2009] [Revised: 07/03/2009] [Accepted: 07/07/2009] [Indexed: 01/14/2023]
Abstract
In the course of breast cancer global gene expression studies, we identified an uncharacterized gene known as RHBDD2 (Rhomboid domain containing 2) to be markedly over-expressed in primary tumors from patients with recurrent disease. In this study, we identified RHBDD2 mRNA and protein expression significantly elevated in breast carcinomas compared with normal breast samples as analyzed by SAGE (n=46) and immunohistochemistry (n=213). Interestingly, specimens displaying RHBDD2 over-expression were predominantly advanced stage III breast carcinomas (p=0.001). Western-blot, RT-PCR and cDNA sequencing analyses allowed us to identify two RHBDD2 alternatively spliced mRNA isoforms expressed in breast cancer cell lines. We further investigated the occurrence and frequency of gene amplification and over-expression affecting RHBDD2 in 131 breast samples. RHBDD2 gene amplification was detected in 21% of 98 invasive breast carcinomas analyzed. However, no RHBDD2 amplification was detected in normal breast tissues (n=17) or breast benign lesions (n=16) (p=0.014). Interestingly, siRNA-mediated silencing of RHBDD2 expression results in a decrease of MCF7 breast cancer cells proliferation compared with the corresponding controls (p=0.001). In addition, analysis of publicly available gene expression data showed a strong association between high RHBDD2 expression and decreased overall survival (p=0.0023), relapse-free survival (p=0.0013), and metastasis-free interval (p=0.006) in patients with primary ER-negative breast carcinomas. In conclusion, our findings suggest that RHBDD2 over-expression behaves as an indicator of poor prognosis and may play a role facilitating breast cancer progression.
Affiliation(s)
- M C Abba
  - Centro de Investigaciones Inmunológicas Básicas y Aplicadas (CINIBA), Facultad de Ciencias Médicas, Universidad Nacional de La Plata, Argentina

10
Bollet MA, Servant N, Neuvial P, Decraene C, Lebigot I, Meyniel JP, De Rycke Y, Savignoni A, Rigaill G, Hupé P, Fourquet A, Sigal-Zafrani B, Barillot E, Thiery JP. High-resolution mapping of DNA breakpoints to define true recurrences among ipsilateral breast cancers. J Natl Cancer Inst 2007; 100:48-58. [PMID: 18159071 DOI: 10.1093/jnci/djm266] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Indexed: 01/05/2023]
Abstract
BACKGROUND To distinguish new primary breast cancers from true recurrences, pangenomic analyses of DNA copy number alterations (CNAs) using single-nucleotide polymorphism arrays have proven useful. METHODS The pangenomic profiles of 22 pairs of primary breast carcinoma (ductal or lobular) and ipsilateral breast cancers from the same patients were analyzed. Hierarchical clustering was performed using CNAs and DNA breakpoint information. A partial identity score developed using DNA breakpoint information was used to quantify partial identities between two tumors. The nature of ipsilateral breast cancers (true recurrence vs new primary tumor) as defined using the clustering methods and the partial identity score was compared with that based on clinical characteristics. Metastasis-free survival was compared among patients with primary tumors and true recurrences as defined using the partial identity score and by clinical characteristics. All statistical tests were two-sided. RESULTS All methods agreed on the nature of ipsilateral breast cancers for 14 pairs of samples. For five pairs, the clinical definition disagreed with both clustering methods. For three pairs, the two clustering methods were discordant and the one using DNA breakpoints agreed with the clinical definition. The partial identity score confirmed the nature of ipsilateral breast cancers as defined by clustering of DNA breakpoints in 21 of 22 pairs. 
The difference in metastasis-free survival of patients with new primary tumors and those with true recurrences was not statistically significant when tumors were defined based on clinical and histologic characteristics (5-year metastasis-free survival: 76%, 95% confidence interval [CI] = 52% to 100% for new primary tumors and 38%, 95% CI = 17% to 83% for true recurrences; P = .18; new primary tumor vs true recurrence, hazard ratio = 2.8, 95% CI = 0.6 to 13.7), but the difference was statistically significant when tumors were defined using the partial identity score (5-year metastasis-free survival: 100% for new primary tumors and 29%, 95% CI = 11% to 78% for true recurrences; P = .01). CONCLUSIONS DNA breakpoint information more often agreed with the clinical determination than CNAs in this population. The partial identity score, which was calculated based on DNA breakpoints, allows statistical discrimination between new primary tumors and true recurrences that could outperform the clinical determination in terms of prognosis.
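The abstract does not define the partial identity score, so the sketch below uses a plain Jaccard index over breakpoint sets as a stand-in, simply to show how shared breakpoints between a primary tumor and an ipsilateral tumor might be quantified. The function name `breakpoint_jaccard`, the (chromosome arm, probe index) encoding and the example breakpoints are all hypothetical.

```python
def breakpoint_jaccard(breaks_a, breaks_b):
    """Fraction of distinct breakpoints that are shared by both tumors."""
    set_a, set_b = set(breaks_a), set(breaks_b)
    union = set_a | set_b
    if not union:
        return 0.0  # neither tumor has any breakpoint
    return len(set_a & set_b) / len(union)


# Hypothetical breakpoints encoded as (chromosome arm, probe index)
primary = [("1q", 120), ("8p", 45), ("17q", 300)]
ipsilateral = [("1q", 120), ("17q", 300), ("11q", 77)]
print(breakpoint_jaccard(primary, ipsilateral))
```

A score close to 1 would be more consistent with a true recurrence and a score close to 0 with a new primary tumor, although any threshold would have to be calibrated on real data.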
Affiliation(s)
- Marc A Bollet
  - Département d'oncologie radiothérapie, Institut Curie, 26, rue d'Ulm, 75248 Paris cedex 05, France