1
|
Li T, Li X, Luo W, Cai G. Combined classification and source apportionment analysis for trace elements in western Philippine Sea sediments. The Science of the Total Environment 2019; 675:408-419. [PMID: 31030147 DOI: 10.1016/j.scitotenv.2019.04.236] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/27/2019] [Revised: 04/14/2019] [Accepted: 04/15/2019] [Indexed: 06/09/2023]
Abstract
Trace elements have been widely used for classification (of variables and of samples) and source apportionment studies, but the comparison and combination of the two is uncommon in previous works. In this paper, the grouping of trace elements, clustering of samples, and source identification were merged for an integrated understanding of the origin and distribution of trace elements in western Philippine Sea sediments. The grouping and clustering studies were implemented by a nonlinear clustering method called a self-organizing map (SOM), and the source identification was accomplished by a nontraditional factor analysis method called positive matrix factorization (PMF). Through visualization and clustering techniques, the SOM simultaneously classified a database of 26 trace elements into four groups of trace elements and five clusters of samples. Each sample cluster occupies a certain geographic area and is characterized by high concentrations of trace elements that are classified within one or two groups. Five potential sources were identified by PMF, representing the land mass of Taiwan Island, anthropogenic emissions from Taiwan, nutrient exportation from the South China Sea, mineral attachment in the deep ocean, and biogenetic components and riverine inputs from the Luzon Islands. The spatial distributions of the sample clusters are comparable to the ranges of high contributions from the five sources distinguished by PMF. This conclusion was further supported by displaying the PMF outputs on the SOM plane. Furthermore, a corresponding relationship was observed between every factor profile and every trace element group. Our work tests the consistency of the classification (of the trace elements and of the samples) and source identification and improves the application of multiperspective methodology in environmental studies.
Affiliation(s)
- Tao Li
  - Guangzhou Marine Geological Survey, China Geological Survey, Guangzhou 510760, People's Republic of China
- Xuejie Li
  - Guangzhou Marine Geological Survey, China Geological Survey, Guangzhou 510760, People's Republic of China
- Weidong Luo
  - Guangzhou Marine Geological Survey, China Geological Survey, Guangzhou 510760, People's Republic of China
- Guanqiang Cai
  - Guangzhou Marine Geological Survey, China Geological Survey, Guangzhou 510760, People's Republic of China
|
2
|
Wen JX, Li XQ, Chang Y. Signature Gene Identification of Cancer Occurrence and Pattern Recognition. J Comput Biol 2018; 25:907-916. [PMID: 29957033 DOI: 10.1089/cmb.2017.0261] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
This study aimed to identify signature genes for the pathogenesis of cancer, providing theoretical support for the prevention and early diagnosis of cancer. A pattern recognition method was used to analyze genome-wide gene expression data collected from The Cancer Genome Atlas (TCGA) database. For transcription in seven cancers (invasive breast carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, colon adenocarcinoma, renal clear-cell carcinoma, thyroid carcinoma, and hepatocellular carcinoma), the signature genes were selected by a combination of statistical methods, such as correlation, t-test, and confidence interval. With an artificial neural network model, the accuracy can be as high as 98% for the TCGA data and as high as 92% for independent Gene Expression Omnibus (GEO) data; the recognition accuracy for stage I is more than 95%, which is higher than in the previous study. The genes common to five cancers, PID1 and SPTBN2, were obtained from the signature genes of the seven cancers. At the same time, we obtained three common pathways of cancer by using Kyoto Encyclopedia of Genes and Genomes pathway analysis. A functional analysis of the pathways shows their close relationship at the level of gene regulation, which indicates that the identified signature genes play an important role in the pathogenesis of cancer and are very important for understanding the pathogenesis of cancer and its early diagnosis.
Affiliation(s)
- Jian-Xin Wen
  - College of Life Science and Bioengineering, Beijing University of Technology, Beijing, P.R. China
- Xiao-Qin Li
  - College of Life Science and Bioengineering, Beijing University of Technology, Beijing, P.R. China
- Yu Chang
  - College of Life Science and Bioengineering, Beijing University of Technology, Beijing, P.R. China
|
3
|
Documenting and predicting topic changes in Computers in Biology and Medicine: A bibliometric keyword analysis from 1990 to 2017. INFORMATICS IN MEDICINE UNLOCKED 2018. [DOI: 10.1016/j.imu.2018.03.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/18/2022] Open
|
4
|
Li B, Tian BB, Zhang XL, Zhang XP. Locally linear representation Fisher criterion based tumor gene expressive data classification. Comput Biol Med 2014; 53:48-54. [DOI: 10.1016/j.compbiomed.2014.07.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/30/2014] [Revised: 07/18/2014] [Accepted: 07/22/2014] [Indexed: 10/25/2022]
|
5
|
Multi-stage filtering for improving confidence level and determining dominant clusters in clustering algorithms of gene expression data. Comput Biol Med 2013; 43:1120-33. [DOI: 10.1016/j.compbiomed.2013.05.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/12/2010] [Revised: 05/14/2013] [Accepted: 05/15/2013] [Indexed: 02/02/2023]
|
6
|
Abstract
Applications of clustering algorithms in biomedical research are ubiquitous, with typical examples including gene expression data analysis, genomic sequence analysis, biomedical document mining, and MRI image analysis. However, due to the diversity of cluster analysis, the differing terminologies, goals, and assumptions underlying different clustering algorithms can be daunting. Thus, determining the right match between clustering algorithms and biomedical applications has become particularly important. This paper is presented to provide biomedical researchers with an overview of the status quo of clustering algorithms, to illustrate examples of biomedical applications based on cluster analysis, and to help biomedical researchers select the most suitable clustering algorithms for their own applications.
Affiliation(s)
- Rui Xu
  - Industrial Artificial Intelligence Laboratory, GE Global Research Center, Niskayuna, NY 12309, USA
|
7
|
Chattopadhyay M, Dan PK, Mazumdar S. Application of visual clustering properties of self organizing map in machine–part cell formation. Appl Soft Comput 2012. [DOI: 10.1016/j.asoc.2011.11.004] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Indexed: 11/30/2022]
|
8
|
Zhang J, Zheng CH, Liu JX, Wang HQ. Discovering the transcriptional modules using microarray data by penalized matrix decomposition. Comput Biol Med 2011; 41:1041-50. [PMID: 22001074 DOI: 10.1016/j.compbiomed.2011.09.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/20/2011] [Revised: 08/30/2011] [Accepted: 09/12/2011] [Indexed: 11/25/2022]
Abstract
Uncovering transcriptional modules with context-specific cellular activities or functions is important for understanding biological networks, deciphering regulatory mechanisms, and identifying biomarkers. In this paper, we propose to use penalized matrix decomposition (PMD) to discover transcriptional modules from microarray data. With a sparsity constraint on the decomposition factors, metagenes can be extracted from the gene expression data, and they capture well the intrinsic patterns of genes with similar functions. Meanwhile, the PMD factors of each gene are good indicators of the cluster it belongs to. Compared with traditional methods, our method can cluster genes that have similar functions but dissimilar expression profiles. It can also assign a gene to several different modules. Moreover, the clustering results of our method are stable, and more biologically relevant transcriptional modules can be discovered. Experimental results on two public datasets show that the proposed PMD-based method is promising for discovering transcriptional modules.
Affiliation(s)
- Jun Zhang
  - College of Electrical Engineering and Automation, Anhui University, Hefei, Anhui, China
|
9
|
Kong W, Mou X, Hu X. Exploring matrix factorization techniques for significant genes identification of Alzheimer's disease microarray gene expression data. BMC Bioinformatics 2011; 12 Suppl 5:S7. [PMID: 21989140 PMCID: PMC3203370 DOI: 10.1186/1471-2105-12-s5-s7] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The wide use of high-throughput DNA microarray technology provides an increasingly detailed view of the human transcriptome, from hundreds to thousands of genes. Although biomedical researchers typically design microarray experiments to explore specific biological contexts, the relationships between genes are hard to identify because the data are complex, noisy, and high-dimensional, and analyses are often hindered by low statistical power. The main challenge now is to extract valuable biological information from the colossal amount of data to gain insight into biological processes and the mechanisms of human disease. Overcoming this challenge requires mathematical and computational methods that are versatile enough to capture the underlying biological features and simple enough to be applied efficiently to large datasets.
METHODS: Unsupervised machine learning approaches provide new and efficient analysis of gene expression profiles. In our study, two unsupervised knowledge-based matrix factorization methods, independent component analysis (ICA) and nonnegative matrix factorization (NMF), are integrated to identify significant genes and related pathways in a microarray gene expression dataset of Alzheimer's disease (AD). The advantage of these two approaches is that they can be used as biclustering methods, by which genes and conditions are clustered simultaneously. Furthermore, they can group genes into different categories for identifying related diagnostic pathways and regulatory networks. The difference between the two methods is that ICA assumes statistical independence of the expression modes, while NMF needs positivity constraints to generate localized gene expression profiles.
RESULTS: In our work, we applied the FastICA and non-smooth NMF methods, respectively, to DNA microarray gene expression data of Alzheimer's disease. The simulation results show that both methods can clearly distinguish severe AD samples from control samples, and the biological analysis of the identified significant genes and their related pathways demonstrated that these genes play a prominent role in AD and relate the activation patterns to AD phenotypes. The combination of the two methods was validated as efficient.
CONCLUSIONS: Unsupervised matrix factorization methods provide efficient tools to analyze high-throughput microarray datasets. Because different unsupervised approaches explore correlations in the high-dimensional data space and identify relevant subspaces based on different hypotheses, integrating these methods to explore the underlying biological information in microarray datasets is an efficient approach. By combining the significant genes identified by both ICA and NMF, the biological analysis proves highly efficient for elucidating the molecular taxonomy of Alzheimer's disease and enables better experimental design to further identify potential pathways and therapeutic targets of AD.
Affiliation(s)
- Wei Kong
  - Information Engineering College, Shanghai Maritime University, Haigang Ave., Shanghai, 201306, P R China
|
10
|
Wu T, Jia L, Du R, Tao X, Chen J, Cheng B. Genome-wide analysis reveals the active roles of keratinocytes in oral mucosal adaptive immune response. Exp Biol Med (Maywood) 2011; 236:832-43. [PMID: 21676921 DOI: 10.1258/ebm.2011.010307] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/19/2023] Open
Abstract
To elucidate the roles of oral keratinocytes in the adaptive immune response of oral mucosa, global gene expression analysis was performed using microarrays together with integrated computational methods, including hierarchical clustering, biological process Gene Ontology analysis, Kyoto Encyclopedia of Genes and Genomes pathway analysis, self-organizing maps (SOMs), and biological association network (BAN) analysis. Raw data from microarray experiments were uploaded to the Gene Expression Omnibus Database, http://www.ncbi.nlm.nih.gov/geo/ (GEO accession GSE28035). We identified 666 differentially expressed genes in the early stage (48 h) and 993 in the late stage (96 h) of the oral mucosal adaptive immune response. The analysis revealed that oral keratinocytes exerted diverse biological functions in different stages of the immune response. Specifically, at 48 h the differentially expressed genes encompassed an array of biological ontology associated with immune response, such as antigen processing and presentation, and positive regulation of T-cell-mediated cytotoxicity. Several pathways that have been reported to be critical in inflammation, including the mitogen-activated protein kinase pathway, were activated. Furthermore, after BAN construction, some putative hub genes and networks, such as interleukin-1α and its subnetwork, were recognized. Taken together, these results give substantial evidence to support the active roles of keratinocytes in the oral mucosal adaptive immune response.
Affiliation(s)
- Tong Wu
  - Department of Oral Medicine, Guanghua School of Stomatology, Sun Yat-Sen University, Guangzhou, China
|
11
|
Zhang L, Zheng Y, Li D, Zhong Y. Self-organizing map of gene regulatory networks for cell phenotypes during reprogramming. Comput Biol Chem 2011; 35:211-7. [PMID: 21864790 DOI: 10.1016/j.compbiolchem.2011.05.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/29/2011] [Revised: 05/04/2011] [Accepted: 05/04/2011] [Indexed: 10/18/2022]
Abstract
Induced pluripotent cells (iPSCs) are derived from somatic cells by reprogramming their genetic profiles. Such a process requires coordinated dynamic expression of hundreds of genes and proteins. Because both deterministic and stochastic elements control the reprogramming process, it is difficult to capture the status of the gene regulatory network in reprogramming cells. In this study, we applied self-organizing maps (SOMs) to complex gene expression data from different pluripotent cells, including partially and fully reprogrammed iPSCs, embryonic stem cells (ESCs), and adult stem cells from different tissues. We showed that our SOMs correlate well with the previously reported PluriNet of stem cells and that they serve as pictorial diagrams that can reflect the intrinsic status of cells.
Affiliation(s)
- Leping Zhang
  - School of Life Sciences, Fudan University, Shanghai, People's Republic of China
|
12
|
Darling EM, Guilak F. A neural network model for cell classification based on single-cell biomechanical properties. Tissue Eng Part A 2009; 14:1507-15. [PMID: 18620486 DOI: 10.1089/ten.tea.2008.0180] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 02/05/2023] Open
Abstract
The potential success of tissue engineering or other cell-based therapies is dependent on factors such as the purity and homogeneity of the source cell populations. The ability to enrich cell harvests for specific phenotypes can have significant effects on the overall success of such therapies. While most techniques for cell sorting or enrichment have relied on cell surface markers, recent studies have shown that single-cell mechanical properties can serve as identifying markers of phenotype. In this study, a neural network modeling approach was developed to classify mesenchymal-derived primary and stem cells based on their biomechanical properties. Cell sorting was simulated using previously published data characterizing the mechanical properties of several different cell types as measured by atomic force microscopy. Neural networks were trained using combined data sets, with the resultant groupings analyzed for their purity, efficiency, and enrichment. Heterogeneous populations of zonal chondrocytes, chondrosarcoma cells, and mesenchymal-lineage cells, respectively, could all be classified into enriched subpopulations. Additionally, adult stem cells (adipose-derived or bone marrow-derived) separated disproportionately into nodes associated with the three primary mesenchymal lineages examined. These findings suggest that mathematical approaches such as neural network modeling, in combination with novel measures of cell properties, may provide a means of classifying and eventually sorting mixed populations of cells that are otherwise difficult to identify using more established techniques. In this respect, the identification of biomechanically based cell properties that increase the percentage of stem cells capable of differentiating into predictable lineages may improve the overall success of cell-based therapies.
Affiliation(s)
- Eric M Darling
  - Department of Surgery, Duke University Medical Center, Durham, North Carolina, USA
|
13
|
Tsigelny I, Kouznetsova V, Sweeney DE, Wu W, Bush KT, Nigam SK. Analysis of metagene portraits reveals distinct transitions during kidney organogenesis. Sci Signal 2008; 1:ra16. [PMID: 19066399 PMCID: PMC3016920 DOI: 10.1126/scisignal.1163630] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 12/14/2022]
Abstract
Organogenesis is a multistage process, but it has been difficult, by conventional analysis, to separate stages and identify points of transition in developmentally complex organs or define genetic pathways that regulate pattern formation. We performed a detailed time-series examination of global gene expression during kidney development and then represented the resulting data as self-organizing maps (SOMs), which reduced more than 30,000 genes to 650 metagenes. Further clustering of these maps identified potential stages of development and suggested points of stability and transition during kidney organogenesis that are not obvious from either standard morphological analyses or conventional microarray clustering algorithms. We also performed entropy calculations of SOMs generated for each day of development and found correlations with morphometric parameters and expression of candidate genes that may help in orchestrating the transitions between stages of kidney development, as well as macro- and micropatterning of the organ.
Affiliation(s)
- Igor Tsigelny
  - Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093–0505, USA
  - San Diego Supercomputer Center, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0505, USA
- Valentina Kouznetsova
  - Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA
- Derina E. Sweeney
  - Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA
- Wei Wu
  - Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA
- Kevin T. Bush
  - Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA
- Sanjay K. Nigam
  - Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA
  - Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA
  - John and Rebecca Moores UCSD Cancer Center, School of Medicine, University of California, San Diego, La Jolla, CA 92093–0693, USA
|
14
|
|