Reference Citation Analysis

Cited by Other Article(s)

Zheng Z, Zhu M, Zhang J, Liu X, Hou L, Liu W, Yuan S, Luo C, Yao X, Liu J, Yang Y. A sequence-aware merger of genomic structural variations at population scale. Nat Commun 2024;15:960. [PMID: 38307885 PMCID: PMC10837428 DOI: 10.1038/s41467-024-45244-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 01/18/2024] [Indexed: 02/04/2024] Open

Niu M, Wang C, Chen Y, Zou Q, Xu L. Identification, characterization and expression analysis of circRNA encoded by SARS-CoV-1 and SARS-CoV-2. Brief Bioinform 2024;25:bbad537. [PMID: 38279648 PMCID: PMC10818166 DOI: 10.1093/bib/bbad537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/12/2023] [Accepted: 12/22/2023] [Indexed: 01/28/2024] Open

Zulfiqar H, Guo Z, Ahmad RM, Ahmed Z, Cai P, Chen X, Zhang Y, Lin H, Shi Z. Deep-STP: a deep learning-based approach to predict snake toxin proteins by using word embeddings. Front Med (Lausanne) 2024;10:1291352. [PMID: 38298505 PMCID: PMC10829051 DOI: 10.3389/fmed.2023.1291352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Accepted: 12/26/2023] [Indexed: 02/02/2024] Open

Pang Y, Liu B. DisoFLAG: accurate prediction of protein intrinsic disorder and its functions using graph-based interaction protein language model. BMC Biol 2024;22:3. [PMID: 38166858 PMCID: PMC10762911 DOI: 10.1186/s12915-023-01803-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 12/15/2023] [Indexed: 01/05/2024] Open

Abstract

Intrinsically disordered proteins and regions (IDPs/IDRs) are functionally important proteins and regions that exist as highly dynamic conformations under natural physiological conditions. IDPs/IDRs exhibit a broad range of molecular functions, and their functions involve binding interactions with partners and remaining native structural flexibility. The rapid increase in the number of proteins in sequence databases and the diversity of disordered functions challenge existing computational methods for predicting protein intrinsic disorder and disordered functions. A disordered region interacts with different partners to perform multiple functions, and these disordered functions exhibit different dependencies and correlations. In this study, we introduce DisoFLAG, a computational method that leverages a graph-based interaction protein language model (GiPLM) for jointly predicting disorder and its multiple potential functions. GiPLM integrates protein semantic information based on pre-trained protein language models into graph-based interaction units to enhance the correlation of the semantic representation of multiple disordered functions. The DisoFLAG predictor takes amino acid sequences as the only inputs and provides predictions of intrinsic disorder and six disordered functions for proteins, including protein-binding, DNA-binding, RNA-binding, ion-binding, lipid-binding, and flexible linker. We evaluated the predictive performance of DisoFLAG following the Critical Assessment of protein Intrinsic Disorder (CAID) experiments, and the results demonstrated that DisoFLAG offers accurate and comprehensive predictions of disordered functions, extending the current coverage of computationally predicted disordered function categories. The standalone package and web server of DisoFLAG have been established to provide accurate prediction tools for intrinsic disorders and their associated functions.

Zhang P, Liu H, Wei Y, Zhai Y, Tian Q, Zou Q. FMAlign2: a novel fast multiple nucleotide sequence alignment method for ultralong datasets. Bioinformatics 2024;40:btae014. [PMID: 38200554 PMCID: PMC10809904 DOI: 10.1093/bioinformatics/btae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 12/27/2023] [Accepted: 01/09/2024] [Indexed: 01/12/2024] Open

Mahmoud MAB. Classification of DNA Sequence Based on a Non-gradient Algorithm: Pseudoinverse Learners. Methods Mol Biol 2024;2744:359-373. [PMID: 38683331 DOI: 10.1007/978-1-0716-3581-0_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]

Chen J, Chao J, Liu H, Yang F, Zou Q, Tang F. WMSA 2: a multiple DNA/RNA sequence alignment tool implemented with accurate progressive mode and a fast win-win mode combining the center star and progressive strategies. Brief Bioinform 2023:7169135. [PMID: 37200156 DOI: 10.1093/bib/bbad190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 04/09/2023] [Accepted: 04/26/2023] [Indexed: 05/20/2023] Open

Wei Y, Zou Q, Tang F, Yu L. WMSA: a novel method for multiple sequence alignment of DNA sequences. Bioinformatics 2022;38:5019-5025. [PMID: 36179076 DOI: 10.1093/bioinformatics/btac658] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/11/2022] [Revised: 08/30/2022] [Accepted: 09/29/2022] [Indexed: 12/24/2022] Open

Zhang Y, Zhang Q, Liu Y, Lin M, Ding C. Multiple Sequence Alignment based on deep Q Network with negative feedback policy. Comput Biol Chem 2022;101:107780. [DOI: 10.1016/j.compbiolchem.2022.107780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2022] [Revised: 09/27/2022] [Accepted: 10/18/2022] [Indexed: 11/28/2022]

Tang F, Chao J, Wei Y, Yang F, Zhai Y, Xu L, Zou Q. HAlign 3: fast multiple alignment of ultra-large numbers of similar DNA/RNA sequences. Mol Biol Evol 2022;39:6653123. [PMID: 35915051 PMCID: PMC9372455 DOI: 10.1093/molbev/msac166] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 11/21/2022] Open

Karabacak M, Ozkara BB, Mordag S, Bisdas S. Deep learning for prediction of isocitrate dehydrogenase mutation in gliomas: a critical approach, systematic review and meta-analysis of the diagnostic test performance using a Bayesian approach. Quant Imaging Med Surg 2022;12:4033-4046. [PMID: 35919062 PMCID: PMC9338374 DOI: 10.21037/qims-22-34] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2022] [Accepted: 05/25/2022] [Indexed: 11/08/2022]

Abstract

Background

Conventionally, identifying isocitrate dehydrogenase (IDH) mutation in gliomas is based on histopathological analysis of tissue specimens acquired via stereotactic biopsy or definitive resection. Accurate pre-treatment prediction of IDH mutation status using magnetic resonance imaging (MRI) can guide clinical decision-making. We aim to evaluate the diagnostic performance of deep learning (DL) to determine IDH mutation status in gliomas.

Methods

A systematic search of Cochrane Library, Web of Science, Medline, and Scopus was conducted to identify relevant publications until August 1, 2021. Articles were included if all the following criteria were met: (I) patients with histopathologically confirmed World Health Organization (WHO) grade II, III, or IV gliomas; (II) histopathological examination with the IDH mutation; (III) DL was used to predict the IDH mutation status; (IV) sufficient data for reconstruction of confusion matrices in terms of the diagnostic performance of the DL algorithms; and (V) original research articles. Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) and Checklist for Artificial Intelligence in Medical Imaging (CLAIM) were used to assess the studies' quality. Bayes' theorem was used to calculate the posttest probability.

Results

Four studies with a total of 1,295 patients were included. In the training set, the pooled sensitivity, specificity, and area under the summary receiver operating characteristic (SROC) curve were 93.9%, 90.9% and 0.958, respectively. In the validation set, the pooled sensitivity, specificity, and area under the SROC curve were 90.8%, 85.5% and 0.939, respectively. With a known pretest probability of 80.2%, the Bayes theorem yielded a posttest probability of 97.6% and 96.0% for a positive test and 27.0% and 30.6% for a negative test for training sets and validation sets, respectively.
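The posttest probabilities above follow from Bayes' theorem in its odds form: pretest odds multiplied by a likelihood ratio give posttest odds. The sketch below is a minimal illustration of that calculation, not the authors' code. The function name `posttest_probability` is hypothetical, and the likelihood ratios are derived here directly from the pooled sensitivity and specificity, which is an assumption. The study may have used separately pooled likelihood ratios, so the outputs need not match every reported figure exactly.

```python
def posttest_probability(pretest, sensitivity, specificity, positive=True):
    """Posttest probability of disease via Bayes' theorem in odds form.

    Assumption: the likelihood ratio is computed from the given
    sensitivity and specificity rather than pooled independently.
    """
    if positive:
        # LR+ = P(test positive | disease) / P(test positive | no disease)
        likelihood_ratio = sensitivity / (1.0 - specificity)
    else:
        # LR- = P(test negative | disease) / P(test negative | no disease)
        likelihood_ratio = (1.0 - sensitivity) / specificity
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Training-set values quoted in the abstract: pretest 80.2%,
# pooled sensitivity 93.9%, pooled specificity 90.9%.
train_positive = posttest_probability(0.802, 0.939, 0.909, positive=True)
train_negative = posttest_probability(0.802, 0.939, 0.909, positive=False)
print(f"positive test: {train_positive:.3f}")
print(f"negative test: {train_negative:.3f}")
```

A positive result raises the probability above the pretest value and a negative result lowers it, which is the behavior the abstract describes for both the training and validation sets.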

Discussion

This is the first meta-analysis that summarizes the diagnostic performance of DL in predicting IDH mutation status in gliomas via Bayes' theorem. DL algorithms demonstrate excellent diagnostic performance in predicting IDH mutation in gliomas. Radiomic features associated with IDH mutation and its underlying pathophysiology, extracted from advanced MRI, may improve prediction probability. However, more studies are required to optimize and increase its reliability. Limitations include obtaining some data via email and the lack of statistics for the training and test sets.

Pang Y, Liu B. SelfAT-Fold: Protein Fold Recognition Based on Residue-Based and Motif-Based Self-Attention Networks. IEEE/ACM Trans Comput Biol Bioinform 2022;19:1861-1869. [PMID: 33090951 DOI: 10.1109/tcbb.2020.3031888] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/11/2023]

Chao J, Tang F, Xu L. Developments in Algorithms for Sequence Alignment: A Review. Biomolecules 2022;12:biom12040546. [PMID: 35454135 PMCID: PMC9024764 DOI: 10.3390/biom12040546] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/11/2022] [Revised: 03/29/2022] [Accepted: 03/31/2022] [Indexed: 01/27/2023] Open

Zhang H, Zou Q, Ju Y, Song C, Chen D. Distance-based support vector machine to predict DNA N6-methyladenine modification. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220404145517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]

Wang Z, Tan J, Long Y, Liu Y, Lei W, Cai J, Yang Y, Liu Z. SaAlign: Multiple DNA/RNA Sequence Alignment and Phylogenetic Tree Construction Tool for Ultra-large Datasets and Ultra-long Sequences Based on Suffix Array. Comput Struct Biotechnol J 2022;20:1487-1493. [PMID: 35422971 PMCID: PMC8976100 DOI: 10.1016/j.csbj.2022.03.018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/05/2021] [Revised: 03/09/2022] [Accepted: 03/19/2022] [Indexed: 11/03/2022] Open

Chen J, Li F, Wang M, Li J, Marquez-Lago TT, Leier A, Revote J, Li S, Liu Q, Song J. BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing Data. Front Big Data 2022;4:727216. [PMID: 35118375 PMCID: PMC8805145 DOI: 10.3389/fdata.2021.727216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Accepted: 12/13/2021] [Indexed: 11/22/2022] Open

Abstract

Background

Simple Sequence Repeats (SSRs) are short tandem repeats of nucleotide sequences. It has been shown that SSRs are associated with human diseases and are of medical relevance. Accordingly, a variety of computational methods have been proposed to mine SSRs from genomes. Conventional methods rely on a high-quality complete genome to identify SSRs. However, a sequenced genome often misses several highly repetitive regions. Moreover, many non-model species lack complete genomes. With the recent advances in next-generation sequencing (NGS) techniques, large-scale sequence reads for any species can be rapidly generated. In this context, a number of methods have been proposed to identify thousands of SSR loci within large amounts of reads for non-model species. Because the most commonly used NGS platforms on the market (e.g., the Illumina platform) generally provide short paired-end reads, merging overlapping paired-end reads has become a common step prior to the identification of SSR loci. This poses a big data analysis challenge for traditional stand-alone tools that merge short read pairs and identify SSRs from large-scale data.
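To make the notion of a tandem repeat concrete, here is a minimal regex-based sketch that scans a single DNA string for SSRs. It is only an illustration of the definition above. It is not the algorithm used by PERF, FLASH, or BigFiRSt, and the function name `find_ssrs` and its thresholds (`min_unit`, `max_unit`, `min_repeats`) are hypothetical choices made for this example.

```python
import re
from typing import List, Tuple


def find_ssrs(seq: str, min_unit: int = 1, max_unit: int = 6,
              min_repeats: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, unit, repeat_count) for each tandem repeat found.

    Illustrative assumption: an SSR is a unit of min_unit..max_unit
    bases repeated at least min_repeats times in a row.
    """
    seq = seq.upper()
    # The lazy quantifier tries the shortest candidate unit first; the
    # backreference \1 then requires further consecutive copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        repeat_count = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeat_count))
    return hits


# A dinucleotide repeat (AC repeated four times) embedded in flanking sequence.
print(find_ssrs("TTGACACACACGG"))
```

A single-machine scan like this is what becomes a bottleneck on very large read sets, which is the scaling problem the abstract goes on to address.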

Results

In this study, we present a new Hadoop-based software program, termed BigFiRSt, to address this problem using cutting-edge big data technology. BigFiRSt consists of two major modules, BigFLASH and BigPERF, implemented based on two state-of-the-art stand-alone tools, FLASH and PERF, respectively. BigFLASH and BigPERF address the problem of merging short read pairs and mining SSRs in the big data manner, respectively. Comprehensive benchmarking experiments show that BigFiRSt can dramatically reduce the execution times of fast read pairs merging and SSRs mining from very large-scale DNA sequence data.

Conclusions

The excellent performance of BigFiRSt is mainly attributable to its use of Big Data Hadoop technology to merge read pairs and mine SSRs with parallel and distributed computing on clusters. We anticipate BigFiRSt will be a valuable tool in the coming biological Big Data era.

Affiliation(s)

Jinxiang Chen: Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
Fuyi Li: Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia; Monash Centre for Data Science, Monash University, Melbourne, VIC, Australia; Department of Microbiology and Immunity, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC, Australia
Miao Wang: Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
Junlong Li: Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
Tatiana T. Marquez-Lago: Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States; Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
André Leier: Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States; Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
Jerico Revote: Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
Shuqin Li: Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
Quanzhong Liu: Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China
Jiangning Song: Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia; Monash Centre for Data Science, Monash University, Melbourne, VIC, Australia (*Correspondence: Jiangning Song)

Wang Y, Wu J, Yan J, Guo M, Xu L, Hou L, Zou Q. Comparative genome analysis of plant ascomycete fungal pathogens with different lifestyles reveals distinctive virulence strategies. BMC Genomics 2022;23:34. [PMID: 34996360 PMCID: PMC8740420 DOI: 10.1186/s12864-021-08165-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/14/2021] [Accepted: 11/10/2021] [Indexed: 12/21/2022] Open

Yan K, Lv H, Guo Y, Wen J, Liu B. TP-MV: Therapeutic protein prediction by multi-view learning. Curr Bioinform 2021. [DOI: 10.2174/1574893617666211220153429] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/22/2022]

Liu H, Zou Q, Xu Y. A novel fast multiple nucleotide sequence alignment method based on FM-index. Brief Bioinform 2021;23:6458932. [PMID: 34893794 DOI: 10.1093/bib/bbab519] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/20/2021] [Revised: 10/19/2021] [Accepted: 11/14/2021] [Indexed: 11/13/2022] Open

Yan K, Wen J, Xu Y, Liu B. Protein Fold Recognition Based on Auto-Weighted Multi-View Graph Embedding Learning Model. IEEE/ACM Trans Comput Biol Bioinform 2021;18:2682-2691. [PMID: 32356759 DOI: 10.1109/tcbb.2020.2991268] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]

Yan K, Wen J, Liu JX, Xu Y, Liu B. Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores. IEEE/ACM Trans Comput Biol Bioinform 2021;18:2008-2016. [PMID: 31940548 DOI: 10.1109/tcbb.2020.2966450] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 06/10/2023]

Lu D, Zhang Y, Zhang L, Wang H, Weng W, Li L, Cai H. Methods of privacy-preserving genomic sequencing data alignments. Brief Bioinform 2021;22:6279828. [PMID: 34021302 DOI: 10.1093/bib/bbab151] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/05/2020] [Revised: 03/10/2021] [Accepted: 03/30/2021] [Indexed: 11/14/2022] Open

Zou Y, Zhu Y, Li Y, Wu FX, Wang J. Parallel computing for genome sequence processing. Brief Bioinform 2021;22:6210355. [PMID: 33822883 DOI: 10.1093/bib/bbab070] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/04/2020] [Revised: 01/26/2021] [Accepted: 02/10/2021] [Indexed: 01/08/2023] Open

Shi H, Shi H, Xu S. Efficient Multiple Sequences Alignment Algorithm Generation via Components Assembly Under PAR Framework. Front Genet 2021;11:628175. [PMID: 33613626 PMCID: PMC7890700 DOI: 10.3389/fgene.2020.628175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2020] [Accepted: 12/29/2020] [Indexed: 11/25/2022] Open

Chen L, Li J, Chang M. Cancer Diagnosis and Disease Gene Identification via Statistical Machine Learning. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200207094947] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/22/2022]

Li F, Luo M, Zhou W, Li J, Jin X, Xu Z, Juan L, Zhang Z, Li Y, Liu R, Li Y, Xu C, Ma K, Cao H, Wang J, Wang P, Bu Z, Jiang Q. Single cell RNA and immune repertoire profiling of COVID-19 patients reveal novel neutralizing antibody. Protein Cell 2020;12:751-755. [PMID: 33237441 PMCID: PMC7686823 DOI: 10.1007/s13238-020-00807-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Accepted: 10/29/2020] [Indexed: 11/27/2022] Open

Affiliation(s)

Fang Li: State Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin, 150001, China
Meng Luo: School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
Wenyang Zhou: School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
Jinliang Li: Harbin Sixth Hospital, Harbin, 150000, China
Xiyun Jin: School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
Zhaochun Xu: School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
Liran Juan: School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
Zheng Zhang: Harbin Sixth Hospital, Harbin, 150000, China
Yuou Li: Harbin Sixth Hospital, Harbin, 150000, China
Renqiang Liu: State Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin, 150001, China
Yiqun Li: School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
Chang Xu: School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
Kexin Ma: School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
Huimin Cao: School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
Jingwei Wang: Harbin Sixth Hospital, Harbin, 150000, China
Pingping Wang: School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
Zhigao Bu: State Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin, 150001, China
Qinghua Jiang: School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China; Key Laboratory of Biological Big Data (Harbin Institute of Technology), Ministry of Education, Harbin, 150001, China

DNA sequence classification based on MLP with PILAE algorithm. Soft Comput 2020. [DOI: 10.1007/s00500-020-05429-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]

Zhan Q, Fu Y, Jiang Q, Liu B, Peng J, Wang Y. SpliVert: A Protein Multiple Sequence Alignment Refinement Method Based on Splitting-Splicing Vertically. Protein Pept Lett 2020;27:295-302. [PMID: 31385760 DOI: 10.2174/0929866526666190806143959] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/06/2019] [Revised: 04/26/2019] [Accepted: 06/14/2019] [Indexed: 11/22/2022]

Abstract

BACKGROUND

Multiple Sequence Alignment (MSA) is a fundamental task in bioinformatics and is required for many biological analysis tasks. The more accurate the alignments are, the more credible the downstream analyses. Most protein MSA algorithms refine an alignment by dividing it horizontally into two groups and then realigning the two groups. However, this strategy does not consider that different regions of the sequences have different degrees of conservation; this can lead to incorrect residue-residue or residue-gap pairs that the strategy cannot correct.

OBJECTIVE

In this article, our motivation is to develop a novel refinement method based on splitting-splicing vertically.

METHODS

Here, we present a novel refinement method based on splitting-splicing vertically, called SpliVert. For an alignment, we split it vertically into 3 parts, remove the gap characters in the middle, realign the middle part alone, and splice the realigned middle parts with the other two initial pieces to obtain a refined alignment. In the realign procedure of our method, the aligner will only focus on a certain part, ignoring the disturbance of the other parts, which could help fix the incorrect pairs.
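The split, realign, and splice steps described above can be sketched in a few lines. This is a hedged illustration of the idea only, not the SpliVert implementation. The function names `splivert_refine` and `pad_realign` are hypothetical, the split points are passed in by hand, and `pad_realign` is a trivial stand-in that pads with gaps where a real pipeline would call an actual MSA tool.

```python
from typing import Callable, List


def pad_realign(seqs: List[str]) -> List[str]:
    """Placeholder aligner (assumption): left-justify and pad with gaps.

    This is NOT a real aligner; it only keeps the sketch runnable.
    """
    width = max((len(s) for s in seqs), default=0)
    return [s.ljust(width, "-") for s in seqs]


def splivert_refine(alignment: List[str], start: int, end: int,
                    realign: Callable[[List[str]], List[str]] = pad_realign
                    ) -> List[str]:
    """Split an alignment vertically at columns [start, end), realign the
    gap-free middle block alone, and splice it back between the flanks."""
    if not alignment:
        return []
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows of an alignment must have equal length")
    if not 0 <= start <= end <= n_cols:
        raise ValueError("split points must satisfy 0 <= start <= end <= width")

    left = [row[:start] for row in alignment]
    # Remove gap characters so the middle part is realigned from scratch.
    middle = [row[start:end].replace("-", "") for row in alignment]
    right = [row[end:] for row in alignment]

    new_middle = realign(middle)
    return [l + m + r for l, m, r in zip(left, new_middle, right)]


initial = ["AC-GT-A", "ACG-TTA", "A--GTTA"]
for row in splivert_refine(initial, start=1, end=5):
    print(row)
```

Passing the aligner as a callable keeps the vertical splitting logic independent of whichever alignment tool performs the realignment, which mirrors how the abstract describes applying the strategy on top of existing MSA tools.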

RESULTS

We tested our refinement strategy for 2 leading MSA tools on 3 standard benchmarks, according to the commonly used average SP (and TC) score. The results show that given appropriate proportions to split the initial alignment, the average scores are increased comparably or slightly after using our method. We also compared the alignments refined by our method with alignments directly refined by the original alignment tools. The results suggest that using our SpliVert method to refine alignments can also outperform direct use of the original alignment tools.

CONCLUSION

The results reveal that splitting vertically and realigning part of the alignment is a good strategy for the refinement of protein multiple sequence alignments.

Naznooshsadat E, Elham P, Ali SZ. FAME: fast and memory efficient multiple sequences alignment tool through compatible chain of roots. Bioinformatics 2020;36:3662-3668. [PMID: 32170927 DOI: 10.1093/bioinformatics/btaa175] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/25/2019] [Revised: 02/10/2020] [Accepted: 03/12/2020] [Indexed: 11/13/2022] Open

Libin PJK, Deforche K, Abecasis AB, Theys K. VIRULIGN: fast codon-correct alignment and annotation of viral genomes. Bioinformatics 2020;35:1763-1765. [PMID: 30295730 PMCID: PMC6513156 DOI: 10.1093/bioinformatics/bty851] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 05/03/2018] [Revised: 09/24/2018] [Accepted: 10/05/2018] [Indexed: 12/11/2022] Open

Fang C, Jia Y, Hu L, Lu Y, Wang H. IMPContact: An Interhelical Residue Contact Prediction Method. Biomed Res Int 2020;2020:4569037. [PMID: 32309431 PMCID: PMC7140131 DOI: 10.1155/2020/4569037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2020] [Accepted: 03/09/2020] [Indexed: 11/17/2022]

Shi H, Zhang X. Component-Based Design and Assembly of Heuristic Multiple Sequence Alignment Algorithms. Front Genet 2020;11:105. [PMID: 32174970 PMCID: PMC7056898 DOI: 10.3389/fgene.2020.00105] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/13/2019] [Accepted: 01/29/2020] [Indexed: 11/24/2022] Open

Mrozek D, Kwiendacz J, Malysiak-Mrozek B. Protein Construction-Based Data Partitioning Scheme for Alignment of Protein Macromolecular Structures Through Distributed Querying in Federated Databases. IEEE Trans Nanobioscience 2020;19:102-116. [DOI: 10.1109/tnb.2019.2930494] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]

Al-Eitan L, Saadeh H, Alnaamneh A, Darabseh S, Al-Sarhan N, Alzihlif M, Hakooz N, Ivanova E, Kelsey G, Dajani R. The genetic landscape of Arab Population, Chechens and Circassians subpopulations from Jordan through HV1 and HV2 regions of mtDNA. Gene 2019;729:144314. [PMID: 31884104 DOI: 10.1016/j.gene.2019.144314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/21/2019] [Revised: 12/18/2019] [Accepted: 12/20/2019] [Indexed: 11/29/2022]

Deng L, Zhong G, Liu C, Luo J, Liu H. MADOKA: an ultra-fast approach for large-scale protein structure similarity searching. BMC Bioinformatics 2019;20:662. [PMID: 31870277 PMCID: PMC6929402 DOI: 10.1186/s12859-019-3235-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/11/2019] [Accepted: 11/14/2019] [Indexed: 01/22/2023] Open

Liu B, Zhu Y, Yan K. Fold-LTR-TCP: protein fold recognition based on triadic closure principle. Brief Bioinform 2019;21:2185-2193. [DOI: 10.1093/bib/bbz139] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 09/03/2019] [Revised: 10/01/2019] [Accepted: 10/09/2019] [Indexed: 11/13/2022] Open

Shi L, Wang Z. Computational Strategies for Scalable Genomics Analysis. Genes (Basel) 2019;10:E1017. [PMID: 31817630 PMCID: PMC6947637 DOI: 10.3390/genes10121017] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/10/2019] [Revised: 12/01/2019] [Accepted: 12/03/2019] [Indexed: 12/14/2022] Open

Li CC, Liu B. MotifCNN-fold: protein fold recognition based on fold-specific features extracted by motif-based convolutional neural networks. Brief Bioinform 2019;21:2133-2141. [PMID: 31774907 DOI: 10.1093/bib/bbz133] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 07/19/2019] [Revised: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 12/31/2022] Open

Zhang W, Lin W, Zhang D, Wang S, Shi J, Niu Y. Recent Advances in the Machine Learning-Based Drug-Target Interaction Prediction. Curr Drug Metab 2019;20:194-202. [PMID: 30129407 DOI: 10.2174/1389200219666180821094047] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 10/16/2017] [Revised: 01/18/2018] [Accepted: 03/19/2018] [Indexed: 12/28/2022]

Bai N, Tang S, Yu C, Fu H, Wang C, Chen X. GMSA: A Data Sharing System for Multiple Sequence Alignment Across Multiple Users. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190111160101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]

Abstract

Background

In recent years, the rapid growth of biological datasets in bioinformatics has made the computation of Multiple Sequence Alignment (MSA) extremely slow. Using the GPU to accelerate MSA has been shown to be an effective approach. Moreover, many bioinformatics researchers and institutes now set up a shared server so that remote users can submit MSA jobs via provided web pages or tools.

Objective

Because different MSA jobs submitted by users often process similar datasets, users have an opportunity to share computation results with each other, which avoids redundant computation and thereby reduces the overall computing time. Furthermore, on heterogeneous CPU/GPU platforms, many existing applications assign their computation to GPU devices only, which wastes CPU resources. Co-run computation can increase the utilization of computing resources on both CPUs and GPUs by dispatching workloads onto them simultaneously.

Methods

In this paper, we propose an efficient MSA system called GMSA for multiple users on shared heterogeneous CPU/GPU platforms. To accelerate the computation of jobs from multiple users, GMSA uses data sharing, because different MSA jobs often have a percentage of the same data and tasks. We also propose a scheduling strategy based on the similarity in datasets or tasks between MSA jobs. Furthermore, a co-run computation model is adopted to make full use of both CPUs and GPUs.

Results

We use four protein datasets that were redesigned according to different degrees of similarity. We compare GMSA with ClustalW and CUDA-ClustalW in multi-user scenarios. Experimental results showed that GMSA can achieve a speedup of up to 32X.

Conclusion

GMSA is a system designed for accelerating the computation of MSA jobs with shared input datasets on heterogeneous CPU/GPU platforms. In this system, a strategy was proposed and implemented to find the common datasets among jobs submitted by multiple users, and a scheduling algorithm is presented based on it. To utilize the overall resources of both CPU and GPU, GMSA employs the co-run computation model. Results showed that it can speed up the total computation of jobs efficiently.

Liu B, Li S. ProtDet-CCH: Protein Remote Homology Detection by Combining Long Short-Term Memory and Ranking Methods. IEEE/ACM Trans Comput Biol Bioinform 2019;16:1203-1210. [PMID: 29993950 DOI: 10.1109/tcbb.2018.2789880] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 06/08/2023]

Zhang J, Zhang Y, Ma Z. In silico Prediction of Human Secretory Proteins in Plasma Based on Discrete Firefly Optimization and Application to Cancer Biomarkers Identification. Front Genet 2019;10:542. [PMID: 31244885 PMCID: PMC6563772 DOI: 10.3389/fgene.2019.00542] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/04/2019] [Accepted: 05/21/2019] [Indexed: 12/20/2022] Open

Vineetha V, Biji CL, Nair AS. SPARK-MSNA: Efficient algorithm on Apache Spark for aligning multiple similar DNA/RNA sequences with supervised learning. Sci Rep 2019;9:6631. [PMID: 31036850 PMCID: PMC6488671 DOI: 10.1038/s41598-019-42966-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/04/2018] [Accepted: 04/12/2019] [Indexed: 11/09/2022] Open

Liu L, Wang H. The Recent Applications and Developments of Bioinformatics and Omics Technologies in Traditional Chinese Medicine. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190102125403] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/18/2022]

PVTree: A Sequential Pattern Mining Method for Alignment Independent Phylogeny Reconstruction. Genes (Basel) 2019;10:genes10020073. [PMID: 30678245 PMCID: PMC6410268 DOI: 10.3390/genes10020073] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/27/2018] [Revised: 01/04/2019] [Accepted: 01/14/2019] [Indexed: 11/21/2022] Open

Wang Y, Wu H, Cai Y. A benchmark study of sequence alignment methods for protein clustering. BMC Bioinformatics 2018;19:529. [PMID: 30598070 PMCID: PMC6311937 DOI: 10.1186/s12859-018-2524-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022] Open

Prediction of GluN2B-CT_1290-1310/DAPK1 Interaction by Protein-Peptide Docking and Molecular Dynamics Simulation. Molecules 2018;23:molecules23113018. [PMID: 30463177 PMCID: PMC6278559 DOI: 10.3390/molecules23113018] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 08/20/2018] [Revised: 11/04/2018] [Accepted: 11/06/2018] [Indexed: 02/08/2023] Open

Zou Q, Lin G, Jiang X, Liu X, Zeng X. Sequence clustering in bioinformatics: an empirical study. Brief Bioinform 2018;21:1-10. [PMID: 30239587 DOI: 10.1093/bib/bby090] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 07/03/2018] [Revised: 08/18/2018] [Accepted: 08/18/2018] [Indexed: 12/13/2022] Open

Guo R, Zhao Y, Zou Q, Fang X, Peng S. Bioinformatics applications on Apache Spark. Gigascience 2018;7:5067872. [PMID: 30101283 PMCID: PMC6113509 DOI: 10.1093/gigascience/giy098] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 04/11/2018] [Accepted: 07/28/2018] [Indexed: 11/13/2022] Open

Liang S, Zhang R, Liang D, Song T, Ai T, Xia C, Xia L, Wang Y. Multimodal 3D DenseNet for IDH Genotype Prediction in Gliomas. Genes (Basel) 2018;9:E382. [PMID: 30061525 PMCID: PMC6115744 DOI: 10.3390/genes9080382] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 05/26/2018] [Revised: 07/06/2018] [Accepted: 07/16/2018] [Indexed: 01/08/2023] Open