1
Abstract
Background: Recent advances in genome sequencing technologies and the falling cost of high-throughput sequencing continue to produce a deluge of data for downstream analyses. Evolutionary biologists, among others, often use genomic data to uncover phenotypic diversity and adaptive evolution in protein-coding genes, which requires multiple sequence alignments (MSA) and phylogenetic trees (PT) to be estimated with optimal results. However, preparing an initial dataset of multiple sequence file(s) (MSF), and the steps that follow, can be challenging with extensive amounts of data. A tool is therefore needed that removes potential sources of error and automates the time-consuming steps of a typical workflow, while providing high-throughput and optimal MSA and PT estimation.
Results: We introduce LMAP_S (Lightweight Multigene Alignment and Phylogeny eStimation), a user-friendly command-line and interactive package designed to handle an improved alignment and phylogeny estimation workflow: MSF preparation, MSA estimation, outlier detection, refinement, consensus, phylogeny estimation, comparison and editing. File and directory organization, execution and manipulation of information are automated, with minimal manual user intervention. LMAP_S was developed for the workstation multi-core environment and provides a unique advantage for processing multiple datasets. Our software proved to be efficient throughout the workflow, including the handling of more than 20 datasets, with no set limit on their number.
Conclusions: We have developed a simple and versatile LMAP_S package that enables researchers to effectively estimate MSAs and PTs for multiple datasets in a high-throughput fashion. LMAP_S integrates more than 25 software tools, providing more than 65 algorithm choices overall, distributed across five stages. At minimum, one FASTA file is required within a single input directory. To our knowledge, no other software combines MSA and phylogeny estimation with as many alternatives and provides the means to find optimal MSAs and phylogenies. Moreover, a case study comparing methodologies highlighted the usefulness of our software. LMAP_S has been developed as an open-source package, allowing its integration into more complex open-source bioinformatics pipelines. The LMAP_S package is released under the GPLv3 license and is freely available at https://lmap-s.sourceforge.io/.
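As a rough illustration of the minimum input stated in the abstract (a single input directory holding at least one FASTA file), the following Python sketch checks such a directory before a run. It is not part of LMAP_S; the function names and the accepted file extensions are assumptions made for this example.

```python
from pathlib import Path

# Assumed extensions; the abstract only says "FASTA file".
FASTA_EXTENSIONS = {".fasta", ".fas", ".fa"}


def count_records(path: Path) -> int:
    """Count FASTA records, i.e. header lines starting with '>'."""
    with path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def find_fasta_files(input_dir: str) -> dict[str, int]:
    """Map each FASTA file name in input_dir to its number of sequences.

    Raises ValueError when the directory holds no FASTA file, mirroring
    the stated minimum of one FASTA file per input directory.
    """
    directory = Path(input_dir)
    found = {
        path.name: count_records(path)
        for path in sorted(directory.iterdir())
        if path.is_file() and path.suffix.lower() in FASTA_EXTENSIONS
    }
    if not found:
        raise ValueError(f"no FASTA file found in {directory}")
    return found
```

Each key in the returned dictionary could then be treated as one dataset in a multi-dataset run.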
Affiliation(s)
- Emanuel Maldonado
- CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal
- Agostinho Antunes
- CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Porto, Portugal; Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
2
Xiang N, Li Q, Shi Z, Zhou C, Jiang F, Han Y, Ni Z. Low-cost multi-core inertial microfluidic centrifuge for high-throughput cell concentration. Electrophoresis 2019; 41:875-882. [PMID: 31705675 DOI: 10.1002/elps.201900385] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/08/2019] [Revised: 10/25/2019] [Accepted: 11/05/2019] [Indexed: 12/12/2022]
Abstract
We developed a low-cost multi-core inertial microfluidic centrifuge (IM-centrifuge) to achieve a continuous-flow cell/particle concentration at a throughput of up to 20 mL/min. To lower the cost of our IM-centrifuge, we clamped a disposable multilayer film-based inertial microfluidic (MFIM) chip with two reusable plastic housings. The key MFIM chip was fabricated in low-cost materials by stacking different polymer-film channel layers and double-sided tape. To increase processing throughput, multiplexing spiral inertial microfluidic channels were integrated within an all-in-one MFIM chip, and a novel sample distribution strategy was employed to equally distribute the sample into each channel layer. Then, we characterized the focusing performance in the MFIM chip over a wide flow-rate range. The experimental results showed that our IM-centrifuge was able to focus various-sized particles/cells to achieve volume reduction. The sample distribution strategy also effectively ensured identical focusing and concentration performances in different cores. Finally, our IM-centrifuge was successfully applied to concentrate microalgae cells with irregular shapes and highly polydisperse sizes. Thus, our IM-centrifuge holds the potential to be employed as a low-cost, high-throughput centrifuge for disposable use in low-resource settings.
Affiliation(s)
- Nan Xiang
- School of Mechanical Engineering, and Jiangsu Key Laboratory for Design and Manufacture of Micro-Nano Biomedical Instruments, Southeast University, Nanjing, P. R. China
- Qiao Li
- School of Mechanical Engineering, and Jiangsu Key Laboratory for Design and Manufacture of Micro-Nano Biomedical Instruments, Southeast University, Nanjing, P. R. China
- Zhiguo Shi
- School of Mechanical Engineering, and Jiangsu Key Laboratory for Design and Manufacture of Micro-Nano Biomedical Instruments, Southeast University, Nanjing, P. R. China
- Chenguang Zhou
- School of Mechanical Engineering, and Jiangsu Key Laboratory for Design and Manufacture of Micro-Nano Biomedical Instruments, Southeast University, Nanjing, P. R. China
- Fengtao Jiang
- School of Mechanical Engineering, and Jiangsu Key Laboratory for Design and Manufacture of Micro-Nano Biomedical Instruments, Southeast University, Nanjing, P. R. China
- Yu Han
- School of Mechanical Engineering, and Jiangsu Key Laboratory for Design and Manufacture of Micro-Nano Biomedical Instruments, Southeast University, Nanjing, P. R. China
- Zhonghua Ni
- School of Mechanical Engineering, and Jiangsu Key Laboratory for Design and Manufacture of Micro-Nano Biomedical Instruments, Southeast University, Nanjing, P. R. China
3
Jain AN, Cleves AE, Gao Q, Wang X, Liu Y, Sherer EC, Reibarkh MY. Complex macrocycle exploration: parallel, heuristic, and constraint-based conformer generation using ForceGen. J Comput Aided Mol Des 2019; 33:531-58. [PMID: 31054028 DOI: 10.1007/s10822-019-00203-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 02/12/2019] [Accepted: 04/20/2019] [Indexed: 11/24/2022]
Abstract
ForceGen is a template-free, non-stochastic approach for 2D to 3D structure generation and conformational elaboration for small molecules, including both non-macrocycles and macrocycles. For conformational search of non-macrocycles, ForceGen is both faster and more accurate than the best of all tested methods on a very large, independently curated benchmark of 2859 PDB ligands. In this study, the primary results are on macrocycles, including results for 431 unique examples from four separate benchmarks. These include complex peptide and peptide-like cases that can form networks of internal hydrogen bonds. By making use of new physical movements (“flips” of near-linear sub-cycles and explicit formation of hydrogen bonds), ForceGen exhibited statistically significantly better performance for overall RMS deviation from experimental coordinates than all other approaches. The algorithmic approach offers natural parallelization across multiple computing-cores. On a modest multi-core workstation, for all but the most complex macrocycles, median wall-clock times were generally under a minute in fast search mode and under 2 min using thorough search. On the most complex cases (roughly cyclic decapeptides and larger) explicit exploration of likely hydrogen bonding networks yielded marked improvements, but with calculation times increasing to several minutes and in some cases to roughly an hour for fast search. In complex cases, utilization of NMR data to constrain conformational search produces accurate conformational ensembles representative of solution state macrocycle behavior. On macrocycles of typical complexity (up to 21 rotatable macrocyclic and exocyclic bonds), design-focused macrocycle optimization can be practically supported by computational chemistry at interactive time-scales, with conformational ensemble accuracy equaling what is seen with non-macrocyclic ligands. For more complex macrocycles, inclusion of sparse biophysical data is a helpful adjunct to computation.
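The accuracy measure named in this abstract, RMS deviation from experimental coordinates, can be illustrated with a minimal Python sketch. It assumes both conformers list the same atoms in the same order and are already superimposed; it performs no alignment or symmetry handling, and it is not ForceGen's own evaluation code.

```python
import math
from typing import Sequence

Coordinate = Sequence[float]  # (x, y, z)


def rmsd(predicted: Sequence[Coordinate], reference: Sequence[Coordinate]) -> float:
    """Root-mean-square deviation between two matched coordinate lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum squared differences over every atom and every axis.
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(predicted))
```

A conformer displaced by exactly one unit along a single axis for every atom yields an RMSD of 1.0 under this definition.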
4
Maldonado E, Almeida D, Escalona T, Khan I, Vasconcelos V, Antunes A. LMAP: Lightweight Multigene Analyses in PAML. BMC Bioinformatics 2016; 17:354. [PMID: 27597435 PMCID: PMC5011788 DOI: 10.1186/s12859-016-1204-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/17/2016] [Accepted: 08/24/2016] [Indexed: 12/22/2022]
Abstract
Background: Uncovering how phenotypic diversity arises and is maintained in nature has long been a major interest of evolutionary biologists. Recent advances in genome sequencing technologies have remarkably increased the efficiency of pinpointing genes involved in the adaptive evolution of phenotypes. The reliability of such findings is most often examined with statistical and computational methods using Maximum Likelihood codon-based models (i.e., site, branch, branch-site and clade models), such as those available in codeml from the Phylogenetic Analysis by Maximum Likelihood (PAML) package. While these models represent a well-defined workflow for documenting adaptive evolution, in practice they can be challenging for researchers with vast amounts of data, because multiple types of relevant codon-based datasets are generated, making the overall process tedious, error-prone and time-consuming.
Results: We introduce LMAP (Lightweight Multigene Analyses in PAML), a user-friendly command-line and interactive package designed to handle the codeml workflow, namely directory organization, execution, and the gathering and organization of results for Likelihood Ratio Test estimations, with minimal manual user intervention. LMAP was developed for the workstation multi-core environment and provides a unique advantage for processing one, several or all codeml codon-based models for multiple datasets at a time. Our software proved efficient throughout the codeml workflow, including, but not limited to, simultaneously handling more than 20 datasets.
Conclusions: We have developed a simple and versatile LMAP package, with outstanding performance, that enables researchers to analyze multiple different codon-based datasets in a high-throughput fashion. At minimum, two file types are required within a single input directory: one for the multiple sequence alignment and another for the phylogenetic tree. To our knowledge, no other software combines all codeml codon substitution models of adaptive evolution. LMAP has been developed as an open-source package, allowing its integration into more complex open-source bioinformatics pipelines. The LMAP package is released under the GPLv3 license and is freely available at http://lmapaml.sourceforge.net/.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1204-5) contains supplementary material, which is available to authorized users.
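The Likelihood Ratio Test mentioned in this abstract compares a null and an alternative nested model through the statistic 2(lnL1 - lnL0), which is usually referred to a chi-square distribution with degrees of freedom equal to the difference in free parameters. The Python sketch below shows that arithmetic using only the standard library. The function names and the log-likelihood values in the usage note are illustrative assumptions, not LMAP or codeml output.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1/2, y) for odd df or Q(1, y) for even df, then apply
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) up to a = df / 2.
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Return (statistic, p_value) for nested models differing by df parameters."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf(statistic, df)
```

For example, `likelihood_ratio_test(-1000.0, -997.0, 2)` gives a statistic of 6.0 and a p-value of exp(-3), about 0.0498.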
Affiliation(s)
- Emanuel Maldonado
- CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal
- Daniela Almeida
- CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal; Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Tibisay Escalona
- CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal; Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Imran Khan
- CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal; Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Vitor Vasconcelos
- CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal; Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Agostinho Antunes
- CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal; Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
5
Abstract
Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster and multi-core (virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists when choosing the best computing platform and set-up for their future large virtual screening experiments.
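Point (1) of this abstract, an additional level of parallelization on a multi-core machine, can be sketched as running several docking processes side by side, each limited to a few cores. The Python sketch below only plans and builds the commands. The helper names, file paths and the `run` callable are assumptions for illustration, and although `--receptor`, `--ligand`, `--config`, `--out`, `--cpu` and `--seed` are AutoDock Vina command-line options, the exact invocation should be checked against the installed version. Passing a fixed seed reflects point (2), which notes that recording the seed is necessary but not sufficient for reproducibility.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def plan_workers(total_cores: int, cpus_per_job: int) -> int:
    """Number of docking jobs that can run side by side."""
    if total_cores < 1 or cpus_per_job < 1:
        raise ValueError("core counts must be positive")
    return max(1, total_cores // cpus_per_job)


def build_command(receptor: str, ligand: str, config: str,
                  cpus_per_job: int, seed: int) -> list[str]:
    """Build one Vina command line; paths are hypothetical placeholders."""
    out = ligand.rsplit(".", 1)[0] + "_out.pdbqt"
    return ["vina", "--receptor", receptor, "--ligand", ligand,
            "--config", config, "--out", out,
            "--cpu", str(cpus_per_job), "--seed", str(seed)]


def screen(ligands: Sequence[str], receptor: str, config: str,
           total_cores: int, cpus_per_job: int, seed: int,
           run: Callable[[list[str]], int]) -> list[int]:
    """Run one command per ligand, several at a time; return exit codes.

    Threads suffice here because each job is an external process started
    by `run`, e.g. lambda cmd: subprocess.run(cmd).returncode.
    """
    commands = [build_command(receptor, lig, config, cpus_per_job, seed)
                for lig in ligands]
    workers = plan_workers(total_cores, cpus_per_job)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))
```

Keeping `run` injectable lets the planning logic be exercised without a docking binary installed. On a machine with 8 cores and 2 cores per job, `plan_workers(8, 2)` schedules four concurrent docking processes.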
Affiliation(s)
- Mohammad Mahdi Jaghoori
- Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands
- Boris Bleijlevens
- Department of Medical Biochemistry, Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands
- Silvia D. Olabarriaga
- Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands