1
Li Y, Ma B, Hua K, Gong H, He R, Luo R, Bi D, Zhou R, Langford PR, Jin H. PPNet: Identifying Functional Association Networks by Phylogenetic Profiling of Prokaryotic Genomes. Microbiol Spectr 2023; 11:e0387122. [PMID: 36602356 PMCID: PMC9927313 DOI: 10.1128/spectrum.03871-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 12/01/2022] [Indexed: 01/06/2023]
Abstract
Identification of microbial functional association networks allows interpretation of biological phenomena and a greater understanding of the molecular basis of pathogenicity, and also underpins the formulation of control measures. Here, we describe PPNet, a tool that uses genome information and analysis of phylogenetic profiles with binary similarity and distance measures to derive large-scale bacterial gene association networks of a single species. As an exemplar, we have derived a functional association network in the pig pathogen Streptococcus suis using 81 binary similarity and dissimilarity measures; the network demonstrates excellent performance based on the area under the receiver operating characteristic (AUROC), the area under the precision-recall (AUPR), and a derived overall scoring method. Selected network associations were validated experimentally by using bacterial two-hybrid experiments. We conclude that PPNet, which is publicly available (https://github.com/liyangjie/PPNet), can be used to construct microbial association networks from easily acquired genome-scale data. IMPORTANCE This study developed PPNet, the first tool that can be used to infer large-scale bacterial functional association networks of a single species. PPNet includes a method for assigning the uniqueness of a bacterial strain using the average nucleotide identity and the average nucleotide coverage. PPNet collected 81 binary similarity and distance measures for phylogenetic profiling and then evaluated and divided them into four groups. PPNet can effectively capture gene networks that are functionally related to phenotype from publicly available prokaryotic genomes, as well as provide valuable results for downstream analysis and experimental testing.
Affiliation(s)
- Yangjie Li
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
  - Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Bin Ma
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
  - Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Kexin Hua
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
  - Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Huimin Gong
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
  - Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Rongrong He
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
  - Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Rui Luo
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
  - Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Dingren Bi
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
  - Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Rui Zhou
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
  - Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Paul R. Langford
  - Section of Paediatric Infectious Disease, Imperial College London, St Mary’s Campus, London, United Kingdom
- Hui Jin
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
  - Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
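
The phylogenetic profiling idea in the PPNet abstract can be illustrated with a minimal sketch. It is not PPNet's implementation: the gene names, the toy presence/absence profiles, the choice of the Jaccard measure (one of many binary measures), and the 0.7 threshold are all assumptions made for illustration.

```python
from itertools import combinations

def contingency(x, y):
    """Count co-occurrence patterns (a, b, c, d) for two binary profiles."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # present in both
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # only in x
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # only in y
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # absent in both
    return a, b, c, d

def jaccard(x, y):
    """Jaccard similarity a / (a + b + c); defined as 0.0 for all-zero pairs."""
    a, b, c, _ = contingency(x, y)
    total = a + b + c
    return a / total if total else 0.0

# Hypothetical profiles: presence (1) or absence (0) of each gene in six genomes.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 0],
}

def association_edges(profiles, threshold=0.7):
    """Return (gene1, gene2, score) for gene pairs whose similarity meets the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        score = jaccard(profiles[g1], profiles[g2])
        if score >= threshold:
            edges.append((g1, g2, score))
    return edges

edges = association_edges(profiles)
print(edges)
```

Genes that tend to be gained and lost together across genomes receive a high score and are linked in the network; any other binary measure built from the same four counts could be substituted for `jaccard`.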
2
Comparative Analysis of Binary Similarity Measures for Compound Identification in Mass Spectrometry-Based Metabolomics. Metabolites 2022; 12:metabo12080694. [PMID: 35893261 PMCID: PMC9394311 DOI: 10.3390/metabo12080694] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/09/2022] [Revised: 07/22/2022] [Accepted: 07/26/2022] [Indexed: 02/01/2023]
Abstract
Compound identification is a critical step in untargeted metabolomics. Its most important procedure is to calculate the similarity between experimental mass spectra and either predicted mass spectra or mass spectra in a mass spectral library. Unlike continuous similarity measures, binary similarity measures have not been assessed for their performance in compound identification, even though the well-known Jaccard similarity measure has been widely used without proper evaluation. The objective of this study is thus to evaluate the performance of binary similarity measures for compound identification in untargeted metabolomics. Fifteen binary similarity measures, including the well-known Jaccard, Dice, Sokal–Sneath, Cosine, and Simpson measures, were selected to assess their performance in compound identification using both electron ionization (EI) and electrospray ionization (ESI) mass spectra. Our theoretical evaluations show that the accuracy of compound identification was exactly the same among the Jaccard, Dice, 3W-Jaccard, Sokal–Sneath, and Kulczynski measures, between the Cosine and Hellinger measures, and between the McConnaughey and Driver–Kroeber measures; these equivalences were confirmed in practice using mass spectral libraries. From the mass spectrum-based evaluation, we observed that the best-performing similarity measures were the McConnaughey and Driver–Kroeber measures for EI mass spectra and the Cosine and Hellinger measures for ESI mass spectra. The most robust similarity measure was the Fager–McGowan measure, which was the second-best-performing measure for both EI and ESI mass spectra.
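
One way to see why some measures yield identical identification accuracy is that they are monotonic functions of one another, so they rank library candidates in the same order. The sketch below shows this for the Jaccard and Dice measures, where Dice = 2J / (1 + J). The query peaks and library entries are hypothetical, and spectra are reduced to sets of m/z values purely for illustration.

```python
def counts(query, ref):
    """Return (a, b, c): shared peaks, peaks only in query, peaks only in ref."""
    return len(query & ref), len(query - ref), len(ref - query)

def jaccard(a, b, c):
    return a / (a + b + c)

def dice(a, b, c):
    return 2 * a / (2 * a + b + c)

# Hypothetical binarized spectra: each set holds the m/z values where a peak is present.
query = {41, 43, 57, 71, 85}
library = {
    "compound_1": {41, 43, 57, 71, 99},
    "compound_2": {41, 57, 113},
    "compound_3": {43, 57, 71, 85},
}

def rank(measure):
    """Order library entries from most to least similar to the query."""
    return sorted(
        library,
        key=lambda name: measure(*counts(query, library[name])),
        reverse=True,
    )

print(rank(jaccard))
print(rank(dice))
```

Because the top-ranked candidate is the same under both measures for any query, identification accuracy cannot differ between them, even though the numerical scores do.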
4
A comparison of 71 binary similarity coefficients: The effect of base rates. PLoS One 2021; 16:e0247751. [PMID: 33826612 PMCID: PMC8026075 DOI: 10.1371/journal.pone.0247751] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/09/2020] [Accepted: 02/13/2021] [Indexed: 11/23/2022]
Abstract
There are many psychological applications that require collapsing the information in a two-mode (e.g., respondents-by-attributes) binary matrix into a one-mode (e.g., attributes-by-attributes) similarity matrix. This process requires the selection of a measure of similarity between binary attributes. A vast number of binary similarity coefficients have been proposed in fields such as biology, geology, and ecology. Although previous studies have reported cluster analyses of binary similarity coefficients, there has been little exploration of how cluster memberships are affected by the base rates (percentage of ones) for the binary attributes. We conducted a simulation experiment that compared two-cluster K-median partitions of 71 binary similarity coefficients based on their pairwise correlations obtained under 15 different base-rate configurations. The results reveal that some subsets of coefficients consistently group together regardless of the base rates. However, there are other subsets of coefficients that group together for some base rates, but not for others.
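
The collapse from a two-mode matrix to a one-mode similarity matrix can be sketched as follows. The response data are hypothetical, and the simple matching coefficient is used only as one example of a binary similarity coefficient; the study's simulation design is not reproduced here.

```python
def column(matrix, j):
    """Extract attribute j as a list of 0/1 values across respondents."""
    return [row[j] for row in matrix]

def simple_matching(x, y):
    """Proportion of respondents on whom two attributes agree (both 1 or both 0)."""
    return sum(1 for i, j in zip(x, y) if i == j) / len(x)

def one_mode(matrix, measure):
    """Build the attributes-by-attributes similarity matrix."""
    cols = [column(matrix, j) for j in range(len(matrix[0]))]
    return [[measure(p, q) for q in cols] for p in cols]

def base_rates(matrix):
    """Proportion of ones for each attribute."""
    n = len(matrix)
    return [sum(column(matrix, j)) / n for j in range(len(matrix[0]))]

# Hypothetical two-mode data: four respondents (rows) by three attributes (columns).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
]

similarity = one_mode(responses, simple_matching)
rates = base_rates(responses)
print(similarity)
print(rates)
```

Passing a different function as `measure` swaps in another coefficient, which makes it straightforward to compare how coefficients behave as the base rates of the attributes change.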
5
Predicting Forest Cover in Distinct Ecosystems: The Potential of Multi-Source Sentinel-1 and -2 Data Fusion. Remote Sensing 2020. [DOI: 10.3390/rs12020302] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 01/12/2023]
Abstract
The fusion of microwave and optical data sets is expected to provide great potential for the derivation of forest cover around the globe. As Sentinel-1 and Sentinel-2 are now both operating in twin mode, they can provide an unprecedented data source to build dense spatial and temporal high-resolution time series across a variety of wavelengths. This study investigates (i) the ability of the individual sensors and (ii) their joint potential to delineate forest cover for study sites in two highly varied landscapes located in Germany (temperate dense mixed forests) and South Africa (open savanna woody vegetation and forest plantations). We used multi-temporal Sentinel-1 and single time steps of Sentinel-2 data in combination to derive accurate forest/non-forest (FNF) information via machine-learning classifiers. The forest classification accuracies were 90.9% and 93.2% for South Africa and Thuringia, respectively, estimated using autocorrelation-corrected spatial cross-validation (CV) for the fused data set. Sentinel-1-only classifications provided the lowest overall accuracy of 87.5%, while Sentinel-2-based classifications led to higher accuracies of 91.9%. Sentinel-2 short-wave infrared (SWIR) channels, biophysical parameters (Leaf Area Index (LAI) and Fraction of Absorbed Photosynthetically Active Radiation (FAPAR)), and the lower spectrum of the Sentinel-1 synthetic aperture radar (SAR) time series were found to be most distinctive in the detection of forest cover. In contrast to homogeneous forest sites, Sentinel-1 time series information improved forest cover predictions in open savanna-like environments with heterogeneous regional features. The presented approach proved to be robust and displayed the benefit of fusing optical and SAR data at high spatial resolution.