1
Pipes L, Nielsen R. AncestralClust: clustering of divergent nucleotide sequences by ancestral sequence reconstruction using phylogenetic trees. Bioinformatics 2022; 38:663-670. [PMID: 34668516 PMCID: PMC8756197 DOI: 10.1093/bioinformatics/btab723] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/03/2021] [Revised: 09/30/2021] [Accepted: 10/15/2021] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION: Clustering is a fundamental task in the analysis of nucleotide sequences. Despite the exponential increase in the size of sequence databases of homologous genes, few methods exist to cluster divergent sequences. Traditional clustering methods have mostly focused on optimizing high-speed clustering of highly similar sequences. We develop a phylogenetic clustering method that infers ancestral sequences for a set of initial clusters and then uses a greedy algorithm to cluster sequences.
RESULTS: We describe a clustering program, AncestralClust, which is developed for clustering divergent sequences. We compare this method with other state-of-the-art clustering methods using datasets of homologous sequences from different species. We show that, in divergent datasets, AncestralClust has higher accuracy and more even cluster sizes than current popular methods.
AVAILABILITY AND IMPLEMENTATION: AncestralClust is an Open Source program available at https://github.com/lpipes/ancestralclust.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lenore Pipes
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94707, USA
- Rasmus Nielsen
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94707, USA
  - Department of Statistics, University of California-Berkeley, Berkeley, CA 94707, USA
  - Globe Institute, University of Copenhagen, 1350 København K, Copenhagen, Denmark
2
Tsai NC, Hsu TS, Kuo SC, Kao CT, Hung TH, Lin DG, Yeh CS, Chu CC, Lin JS, Lin HH, Ko CY, Chang TH, Su JC, Lin YCJ. Large-scale data analysis for robotic yeast one-hybrid platforms and multi-disciplinary studies using GateMultiplex. BMC Biol 2021; 19:214. [PMID: 34560855 PMCID: PMC8461970 DOI: 10.1186/s12915-021-01140-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2021] [Accepted: 09/03/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Yeast one-hybrid (Y1H) is a common technique for identifying DNA-protein interactions, and robotic platforms have been developed for high-throughput analyses to unravel the gene regulatory networks in many organisms. Use of these high-throughput techniques has led to the generation of increasingly large datasets, and several software packages have been developed to analyze such data. We previously established the currently most efficient Y1H system, meiosis-directed Y1H; however, the available software tools were not designed to process the additional parameters that meiosis-directed Y1H suggests for avoiding false positives, and they required programming skills to operate.
RESULTS: We developed a new tool named GateMultiplex with high computing performance using C++. GateMultiplex incorporates a graphical user interface (GUI), which allows operation without any programming skills. Flexible parameter options were designed for multiple experimental purposes to enable the application of GateMultiplex even beyond Y1H platforms. We further demonstrated data analysis in three other fields using GateMultiplex: the identification of lead compounds in preclinical cancer drug discovery, crop line selection in precision agriculture, and ocean pollution detection from deep-sea fishery.
CONCLUSIONS: The user-friendly GUI, fast C++ computing speed, flexible parameter setting, and broad applicability of GateMultiplex facilitate large-scale data analysis in life science fields.
Affiliation(s)
- Ni-Chiao Tsai
  - Department of Life Science and Institute of Plant Biology, College of Life Science, National Taiwan University, Taipei, 10617, Taiwan
- Tzu-Shu Hsu
  - Department of Pharmacy, National Yang Ming Chiao Tung University, Taipei, 11221, Taiwan
- Shang-Che Kuo
  - Department of Pharmacy, National Yang Ming Chiao Tung University, Taipei, 11221, Taiwan
  - Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, 10617, Taiwan
- Chung-Ting Kao
  - Department of Life Science and Institute of Plant Biology, College of Life Science, National Taiwan University, Taipei, 10617, Taiwan
- Tzu-Huan Hung
  - Biotechnology Division, Taiwan Agricultural Research Institute, Taichung, 41362, Taiwan
- Da-Gin Lin
  - Biotechnology Division, Taiwan Agricultural Research Institute, Taichung, 41362, Taiwan
- Chung-Shu Yeh
  - Genomics Research Center, Academia Sinica, Taipei, 11529, Taiwan
- Chia-Chen Chu
  - Department of Life Science and Institute of Plant Biology, College of Life Science, National Taiwan University, Taipei, 10617, Taiwan
- Jeng-Shane Lin
  - Department of Life Sciences, National Chung Hsing University, Taichung, 40227, Taiwan
- Hsin-Hung Lin
  - Department of Horticulture and Biotechnology, Chinese Culture University, Taipei, 11114, Taiwan
- Chia-Ying Ko
  - Department of Life Sciences and Institute of Fisheries Science, National Taiwan University, Taipei, 10617, Taiwan
- Tien-Hsien Chang
  - Genomics Research Center, Academia Sinica, Taipei, 11529, Taiwan
- Jung-Chen Su
  - Department of Pharmacy, National Yang Ming Chiao Tung University, Taipei, 11221, Taiwan
- Ying-Chung Jimmy Lin
  - Department of Life Science and Institute of Plant Biology, College of Life Science, National Taiwan University, Taipei, 10617, Taiwan
  - Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, 10617, Taiwan
3
Charr JC, Garavito A, Guyeux C, Crouzillat D, Descombes P, Fournier C, Ly SN, Raharimalala EN, Rakotomalala JJ, Stoffelen P, Janssens S, Hamon P, Guyot R. Complex evolutionary history of coffees revealed by full plastid genomes and 28,800 nuclear SNP analyses, with particular emphasis on Coffea canephora (Robusta coffee). Mol Phylogenet Evol 2020; 151:106906. [PMID: 32653553 DOI: 10.1016/j.ympev.2020.106906] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/02/2020] [Revised: 06/17/2020] [Accepted: 07/06/2020] [Indexed: 11/16/2022]
Abstract
For decades, coffees were associated with the genus Coffea. In 2011, the closely related genus Psilanthus was subsumed into Coffea. However, results obtained in 2017, based on 28,800 nuclear SNPs, indicated that there is no substantial phylogenetic support for this incorporation. In addition, a recent study of 16 plastid full-genome sequences highlighted an incongruous placement of Coffea canephora (Robusta coffee) between maternal and nuclear trees. In this study, similar global features of the plastid genomes of Psilanthus and Coffea are observed. In agreement with morphological and physiological traits, the nuclear phylogenetic tree clearly separates Psilanthus from Coffea (with the exception of C. rhamnifolia, which is closer to Psilanthus than to Coffea). In contrast, the maternal molecular tree was incongruent with both morphological and nuclear differentiation, with four main clades observed, two of which include both Psilanthus and Coffea species, and two with either Psilanthus or Coffea species. Interestingly, Coffea and Psilanthus taxa sampled in West and Central Africa are members of the same group. Several mechanisms, such as the retention of ancestral polymorphisms due to incomplete lineage sorting, hybridization leading to homoploidy (without chromosome doubling), and alloploidy (for C. arabica), are involved in the evolutionary history of the coffee species. Although its populations share similar morphological characteristics, the genetic relationships within C. canephora show that some populations are well differentiated and genetically isolated. Given the position of its closely related species, we may also consider C. canephora to be undergoing a long process of speciation with an intermediate step of (sub-)speciation.
Affiliation(s)
- Jean-Claude Charr
  - Femto-ST Institute, UMR 6174 CNRS, Université de Bourgogne Franche-Comté, France.
- Andrea Garavito
  - Departamento de Ciencias biológicas, Facultad de Ciencias Exactas y Naturales, Universidad de Caldas, Manizales, Colombia
- Christophe Guyeux
  - Femto-ST Institute, UMR 6174 CNRS, Université de Bourgogne Franche-Comté, France.
- Serigne N Ly
  - Institut de Recherche pour le Développement, UMR DIADE, CIRAD, Université de Montpellier, France.
- Piet Stoffelen
  - Meise Botanic Garden, Nieuwelaan 38, BE-1860 Meise, Belgium.
- Steven Janssens
  - Meise Botanic Garden, Nieuwelaan 38, BE-1860 Meise, Belgium.
- Perla Hamon
  - Institut de Recherche pour le Développement, UMR DIADE, CIRAD, Université de Montpellier, France.
- Romain Guyot
  - Institut de Recherche pour le Développement, UMR DIADE, CIRAD, Université de Montpellier, France; Department of Electronics and Automatization, Universidad Autónoma de Manizales, Manizales, Colombia.