1
Abstract
Background:
The evolutionary history of organisms can be described by phylogenetic
trees. We need to compare the topologies of rooted phylogenetic trees when researching the
evolution of a given set of species.
Objective:
Several metrics have been proposed to measure the dissimilarity between rooted
phylogenetic trees, and they are defined in different ways.
Methods:
This paper analyzes these metrics both through their definitions and through experiments
comparing the distance values they compute.
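The difference between a rooted and an unrooted comparison can be made concrete with a small sketch. It assumes common textbook definitions, not necessarily the paper's: the cluster metric counts clusters (the leaf set below a node) that occur in one tree but not the other, and the partition metric, in the Robinson-Foulds style, counts non-trivial bipartitions after the root is ignored. The nested-tuple encoding and all function names are hypothetical. Any normalization the paper applies is not reproduced, and the equivalent metric is not sketched.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a rooted tree is a nested tuple whose leaves are strings,
# e.g. (("a", "b"), "c") is the tree in which a and b are siblings.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect every cluster: the set of leaves below each node (leaves included)."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def cluster_distance(t1: Tree, t2: Tree) -> int:
    """Rooted comparison: size of the symmetric difference of the cluster sets."""
    return len(clusters(t1) ^ clusters(t2))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of the tree viewed as unrooted. Each split {C, L - C}
    is stored as the side that does not contain a fixed reference leaf."""
    cl = clusters(tree)
    all_leaves = max(cl, key=len)  # the root cluster holds every leaf
    ref = min(all_leaves)
    result: Set[FrozenSet[str]] = set()
    for c in cl:
        side = all_leaves - c if ref in c else c
        # Keep only splits with at least two leaves on each side.
        if len(side) >= 2 and len(all_leaves) - len(side) >= 2:
            result.add(side)
    return result


def partition_distance(t1: Tree, t2: Tree) -> int:
    """Unrooted comparison: size of the symmetric difference of the split sets."""
    return len(splits(t1) ^ splits(t2))
```

For example, the balanced tree `(("a", "b"), ("c", "d"))` and the caterpillar `((("a", "b"), "c"), "d")` share the same unrooted topology but differ in root position, so `cluster_distance` returns 2 while `partition_distance` returns 0.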
Results:
The experimental results show that the distances computed by the cluster metric, the
partition metric, and the equivalent metric fit a Gaussian distribution well, and that the equivalent
metric describes the differences between trees better than the others.
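One quick way to judge how well a set of tree distances follows a Gaussian is sketched below. It assumes the fit is a normal distribution with the sample mean and standard deviation (Python's `statistics.NormalDist`), with fit quality summarized by the Kolmogorov-Smirnov statistic. This illustrates the idea only and is not the fitting procedure used in the paper; `fit_gaussian` is a hypothetical name.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def fit_gaussian(distances: Sequence[float]) -> Tuple[NormalDist, float]:
    """Fit a normal distribution (sample mean and standard deviation) and return it
    with the Kolmogorov-Smirnov statistic, i.e. the largest gap between the
    empirical CDF and the fitted CDF. Needs at least two distinct values."""
    fitted = NormalDist.from_samples(distances)
    xs = sorted(distances)
    n = len(xs)
    # The empirical CDF jumps from i/n to (i+1)/n at the i-th sorted value.
    ks = max(
        max((i + 1) / n - fitted.cdf(x), fitted.cdf(x) - i / n)
        for i, x in enumerate(xs)
    )
    return fitted, ks
```

Because the parameters are estimated from the same data, the statistic serves here as a descriptive measure of fit only. Standard KS p-values do not apply in that case.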
Conclusion:
The paper also presents CDRPT (Computing Distance for Rooted Phylogenetic Trees),
a web server that computes distances between trees online. CDRPT can also be used offline by
installing its application package on Windows, which makes it convenient for researchers to use.
The home page of CDRPT is http://bioinformatics.imu.edu.cn/tree/.
Affiliation(s)
- Juan Wang
- School of Computer Science, Inner Mongolia University, Hohhot, China
- Xinyue Qi
- School of Computer Science, Inner Mongolia University, Hohhot, China
- Bo Cui
- School of Computer Science, Inner Mongolia University, Hohhot, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
2
Li J, Chang M, Gao Q, Song X, Gao Z. Lung Cancer Classification and Gene Selection by Combining Affinity Propagation Clustering and Sparse Group Lasso. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017103557]
Abstract
Background:
Cancer is a serious threat to human health. Diagnosing cancer through gene expression
analysis is an active topic in cancer research.
Objective:
The study aimed to accurately identify the type of lung cancer and to discover pathogenic
genes.
Methods:
In this study, Affinity Propagation (AP) clustering with a similarity score was applied
to each type of lung cancer and to normal lung. After the genes were grouped, sparse group lasso
was used to construct four binary classifiers, and a voting strategy was used to integrate them.
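The gene-grouping step relies on affinity propagation, the message-passing scheme of Frey and Dueck in which points exchange "responsibility" and "availability" messages until some emerge as exemplars. The minimal pure-Python sketch below assumes, purely for illustration, a negative squared distance between 1-D values as the similarity, the median similarity as the preference, and fixed damping and iteration counts. The abstract does not specify the similarity score the authors used, and in the study the items being grouped are genes rather than 1-D values. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def similarity_matrix(points: Sequence[float]) -> List[List[float]]:
    """Assumed similarity: negative squared distance. The diagonal (preference)
    is set to the median off-diagonal similarity, a common default."""
    n = len(points)
    S = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]
    pref = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = pref
    return S


def affinity_propagation(S: List[List[float]], damping: float = 0.5,
                         iterations: int = 200) -> List[int]:
    """Return, for each point, the index of the exemplar it is assigned to."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            top = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != top)
            for k in range(n):
                competitor = second if k == top else vals[top]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # a(k, k) = sum of positive r(i', k), i' != k
        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]
            for i in range(n):
                new = support if i == k else min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        raise ValueError("no exemplars emerged; try more iterations or a higher preference")
    # Each non-exemplar joins the exemplar it is most similar to.
    return [i if i in exemplars else max(exemplars, key=lambda e: S[i][e])
            for i in range(n)]
```

The number of clusters is not fixed in advance; it emerges from the preference values, which is one reason AP is attractive when the number of gene groups is unknown.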
Results:
Among 73 gene groups, this study identified six that may be associated with different lung
cancer subtypes, together with three possible key pathogenic genes: KRAS, BRAF and VDR.
Furthermore, the method achieved improved classification accuracy on the minority classes
SQ and COID compared with four other methods.
Conclusion:
We propose AP clustering based sparse group lasso (AP-SGL), which provides
an alternative approach to simultaneous diagnosis and gene selection for lung cancer.
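Sparse group lasso pairs a group-level penalty, which can drop an entire gene group, with an l1 penalty, which can drop individual genes inside a retained group. This pairing is what allows classification and gene selection at the same time. The sketch below shows the proximal step that proximal-gradient solvers commonly use for this penalty. It assumes the widely used form `lam1 * ||b||_1 + lam2 * sum_g sqrt(p_g) * ||b_g||_2`, where `p_g` is the size of group `g`. The function names and penalty weights are assumptions, and the abstract does not state the authors' exact formulation or solver.

```python
import math
from typing import List, Sequence


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward zero by t (the proximal map of t * |x|)."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sgl_prox(v: Sequence[float], groups: Sequence[Sequence[int]],
             lam1: float, lam2: float, step: float = 1.0) -> List[float]:
    """Proximal operator of step * (lam1 * ||b||_1 + lam2 * sum_g sqrt(p_g) * ||b_g||_2).

    Assumes the groups are disjoint and together cover every coefficient."""
    out = list(v)
    for idx in groups:
        # l1 part: soft-threshold each coefficient (within-group sparsity).
        z = [soft_threshold(v[i], step * lam1) for i in idx]
        norm = math.sqrt(sum(x * x for x in z))
        # Group part: shrink the whole group, weighted by sqrt(group size).
        shrink = step * lam2 * math.sqrt(len(idx))
        scale = max(0.0, 1.0 - shrink / norm) if norm > 0 else 0.0
        for i, zi in zip(idx, z):
            out[i] = scale * zi  # scale == 0 removes the entire group
    return out
```

With `lam2 = 0` the step reduces to plain lasso soft-thresholding. With `lam1 = 0` it reduces to group lasso, where a group survives only if its norm exceeds its weighted threshold.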
Affiliation(s)
- Juntao Li
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Mingming Chang
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Qinghui Gao
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Xuekun Song
- School of Information Technology, Henan University of Chinese Medicine, Zhengzhou, 450046, China
- Zhiyu Gao
- School of Information Technology, Henan University of Chinese Medicine, Zhengzhou, 450046, China
3
McClure RS, Wendler JP, Adkins JN, Swanstrom J, Baric R, Kaiser BLD, Oxford KL, Waters KM, McDermott JE. Unified feature association networks through integration of transcriptomic and proteomic data. PLoS Comput Biol 2019; 15:e1007241. [PMID: 31527878 PMCID: PMC6748406 DOI: 10.1371/journal.pcbi.1007241] [Received: 08/20/2018] [Accepted: 07/02/2019]
Abstract
High-throughput multi-omics studies and the corresponding network analyses of multi-omic data have rapidly expanded their impact over the last 10 years. As biological features of different types (e.g. transcripts, proteins, metabolites) interact within cellular systems, the greatest amount of knowledge can be gained from networks that incorporate multiple types of -omic data. However, biological and technical sources of variation diminish the ability to detect cross-type associations, yielding networks dominated by communities composed of nodes of the same type. We describe here network-building methods that can maximize edges between nodes of different data types, leading to integrated networks: networks with a large number of edges linking nodes of different -omic types (transcripts, proteins, lipids, etc.). We systematically rank several network inference methods and demonstrate that, in many cases, a random forest method, GENIE3, produces the most integrated networks. This increase in integration does not come at the cost of accuracy, as GENIE3 produces networks of approximately the same quality as the other network inference methods tested here. Using GENIE3, we also infer networks representing antibody-mediated Dengue virus cell invasion and receptor-mediated Dengue virus invasion. A number of functional pathways showed centrality differences between the two networks, including genes responding to both GM-CSF and IL-4, which had a higher centrality value in the antibody-mediated than in the receptor-mediated Dengue network. Because a biological system involves the interplay of many different types of molecules, incorporating multiple data types into networks will improve their use as models of biological systems. The methods explored here are some of the first to specifically highlight and address the challenges of how such multi-omic networks can be assembled and how the greatest number of interactions can be inferred from different data types.
The resulting networks can lead to the discovery of new host response patterns and interactions during viral infection, generate new hypotheses of pathogenic mechanisms and confirm mechanisms of disease.
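One simple way to quantify how "integrated" a network is, in the sense used above, is the fraction of its edges that join nodes of different -omic types. The sketch assumes a ranked edge list with importance weights, such as the ranked links a network inference method like GENIE3 produces, and builds a network from the top-k edges. The score, the top-k cut-off, and the function names are hypothetical illustrations. GENIE3 itself is not called, and the paper's own ranking criterion may differ.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def top_k_edges(weighted: Iterable[Tuple[str, str, float]], k: int) -> List[Edge]:
    """Keep the k highest-weighted edges (e.g. importance scores from an inference method)."""
    ranked = sorted(weighted, key=lambda e: e[2], reverse=True)
    return [(u, v) for u, v, _ in ranked[:k]]


def cross_type_fraction(edges: Iterable[Edge], node_type: Dict[str, str]) -> float:
    """Hypothetical integration score: fraction of edges whose endpoints
    belong to different -omic types (e.g. transcript vs. protein)."""
    edge_list = list(edges)
    if not edge_list:
        return 0.0
    cross = sum(1 for u, v in edge_list if node_type[u] != node_type[v])
    return cross / len(edge_list)
```

Comparing this score across inference methods at the same k is one way to rank them by integration. It says nothing about accuracy, which has to be assessed separately.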
Affiliation(s)
- Ryan S. McClure
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, United States of America
- Jason P. Wendler
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, United States of America
- Joshua N. Adkins
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, United States of America
- Jesica Swanstrom
- Department of Microbiology and Immunology, School of Medicine, University of North Carolina, Chapel Hill, Chapel Hill, NC, United States of America
- Ralph Baric
- Department of Microbiology and Immunology, School of Medicine, University of North Carolina, Chapel Hill, Chapel Hill, NC, United States of America
- Brooke L. Deatherage Kaiser
- Signatures Science and Technology Division, Pacific Northwest National Laboratory, Richland, WA, United States of America
- Kristie L. Oxford
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, United States of America
- Katrina M. Waters
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, United States of America
- Jason E. McDermott
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, United States of America
- Department of Molecular Microbiology and Immunology, Oregon Health & Science University, Portland, OR, United States of America