1
Lin B, Luo X, Liu Y, Jin X. A comprehensive review and comparison of existing computational methods for protein function prediction. Brief Bioinform 2024; 25:bbae289. [PMID: 39003530 PMCID: PMC11246557 DOI: 10.1093/bib/bbae289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 05/18/2024] [Indexed: 07/15/2024]
Abstract
Protein function prediction is critical for understanding cellular physiological and biochemical processes, and it opens up new possibilities for advances in fields such as disease research and drug discovery. Over the past decades, with the exponential growth of protein sequence data, many computational methods for predicting protein function have been proposed, which calls for a systematic review and comparison of these methods. In this study, we divide these methods into four categories: sequence-based methods, 3D structure-based methods, protein-protein interaction (PPI) network-based methods and hybrid information-based methods. We then discuss their advantages and disadvantages and comprehensively evaluate and compare their performance. Finally, we discuss the challenges and opportunities in this field.
Affiliation(s)
- Baohui Lin
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, Guangdong 518118, China
- Xiaoling Luo
- Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Shenzhen, Guangdong, China
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518061, China
- Yumeng Liu
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, Guangdong 518118, China
- Xiaopeng Jin
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, Guangdong 518118, China

2
Guerreiro R, Bonthala VS, Schlüter U, Hoang NV, Triesch S, Schranz ME, Weber APM, Stich B. A genomic panel for studying C3-C4 intermediate photosynthesis in the Brassiceae tribe. Plant Cell Environ 2023; 46:3611-3627. [PMID: 37431820 DOI: 10.1111/pce.14662] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/02/2023] [Revised: 05/18/2023] [Accepted: 06/23/2023] [Indexed: 07/12/2023]
Abstract
Research on C4 and C3-C4 photosynthesis has attracted significant attention because understanding the genetic underpinnings of these traits will support the introduction of their characteristics into commercially relevant crop species. We used a panel of 19 taxa of 18 Brassiceae species with different photosynthesis characteristics (C3 and C3-C4) with the following objectives: (i) create draft genome assemblies and annotations, (ii) quantify orthology levels using synteny maps between all pairs of taxa, (iii) describe the phylogenetic relatedness across all the species, and (iv) track the evolution of C3-C4 intermediate photosynthesis in the Brassiceae tribe. Our results indicate that the draft de novo genome assemblies are of high quality and cover at least 90% of the gene space. With these assemblies, we more than doubled the sampling depth of genomes in the Brassiceae tribe, which comprises commercially important as well as biologically interesting species. The gene annotation generated high-quality gene models, and for most genes extensive upstream sequences are available for all taxa, which makes it possible to explore variants in regulatory sequences. The genome-based phylogenetic tree of the Brassiceae contained two main clades and indicated that C3-C4 intermediate photosynthesis evolved five times independently. Furthermore, our study provides the first genomic support for the hypothesis that Diplotaxis muralis is a natural hybrid of D. tenuifolia and D. viminea. Altogether, the de novo genome assemblies and annotations reported in this study are a valuable resource for research on the evolution of C3-C4 intermediate photosynthesis.
Affiliation(s)
- Ricardo Guerreiro
- Institute of Quantitative Genetics and Genomics of Plants, Faculty of Mathematics and Natural Sciences, Heinrich Heine University, Düsseldorf, Germany
- Venkata Suresh Bonthala
- Institute of Quantitative Genetics and Genomics of Plants, Faculty of Mathematics and Natural Sciences, Heinrich Heine University, Düsseldorf, Germany
- Urte Schlüter
- Institute of Plant Biochemistry, Faculty of Mathematics and Natural Sciences, Heinrich Heine University, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- Nam V Hoang
- Biosystematics Group, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands
- Sebastian Triesch
- Institute of Plant Biochemistry, Faculty of Mathematics and Natural Sciences, Heinrich Heine University, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- M Eric Schranz
- Biosystematics Group, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands
- Andreas P M Weber
- Institute of Plant Biochemistry, Faculty of Mathematics and Natural Sciences, Heinrich Heine University, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- Benjamin Stich
- Institute of Quantitative Genetics and Genomics of Plants, Faculty of Mathematics and Natural Sciences, Heinrich Heine University, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- Max Planck Institute for Plant Breeding Research, Köln, Germany

3
Sharma L, Deepak A, Ranjan A, Krishnasamy G. A novel hybrid CNN and BiGRU-Attention based deep learning model for protein function prediction. Stat Appl Genet Mol Biol 2023; 22:sagmb-2022-0057. [PMID: 37658681 DOI: 10.1515/sagmb-2022-0057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 04/20/2023] [Indexed: 09/03/2023]
Abstract
Proteins are the building blocks of all living things, and protein function must be ascertained if the molecular mechanisms of life are to be understood. While CNNs are good at capturing short-range relationships, GRUs and LSTMs can capture long-range dependencies. Our work is motivated by a hybrid approach that combines the complementary benefits of these deep learning models. Protein language models, which use attention networks to extract meaningful information and build representations of proteins, have seen tremendous success in processing protein sequences in recent years. In this paper, we propose a hybrid CNN + BiGRU-Attention model with protein language model embeddings that effectively combines the output of the CNN with the output of the BiGRU-Attention for predicting protein functions. We evaluated the performance of the proposed hybrid model on human and yeast datasets. Compared with the state-of-the-art model SDN2GO, the proposed model improves Fmax on the human dataset by 1.9% for cellular component prediction, 3.8% for molecular function prediction and 0.6% for biological process prediction, and on the yeast dataset by 2.4%, 5.2% and 1.2% for the same three tasks, respectively.
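The improvements above are reported in Fmax. As an illustration only, the sketch below computes a protein-centric Fmax in the style commonly used in CAFA evaluations: precision is averaged over proteins with at least one prediction at or above the threshold, recall over all benchmark proteins, and the best F1 over a grid of thresholds is kept. The function name, the 0.01 threshold grid and the dictionary input format are assumptions, and the paper's own evaluation script may differ (for example, in how GO ancestors are propagated).

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Protein-centric Fmax (CAFA-style sketch).

    predictions: protein -> {GO term: score in [0, 1]}
    truth:       protein -> non-empty set of true GO terms
    """
    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        precisions = []
        recalls = []
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & true_terms)
            # Recall is averaged over every benchmark protein.
            recalls.append(tp / len(true_terms))
            # Precision only counts proteins with at least one prediction >= t.
            if predicted:
                precisions.append(tp / len(predicted))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best
```

For instance, with two proteins where one has a single correct high-scoring term and the other has one of two true terms predicted, the best threshold discards a low-scoring false positive and Fmax reaches 6/7.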
Affiliation(s)
- Lavkush Sharma
- Department of Computer Science and Engineering, National Institute of Technology Patna, Patna, Bihar, India
- Akshay Deepak
- Department of Computer Science and Engineering, National Institute of Technology Patna, Patna, Bihar, India
- Ashish Ranjan
- Department of Computer Science and Engineering, ITER, Siksha 'O' Anusandhan University (Deemed to be University), Bhubaneswar, Odisha, India

4
Song J, Kuan PF. A systematic assessment of cell type deconvolution algorithms for DNA methylation data. Brief Bioinform 2022; 23:bbac449. [PMID: 36242584 PMCID: PMC9947552 DOI: 10.1093/bib/bbac449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 08/11/2022] [Accepted: 09/20/2022] [Indexed: 12/14/2022]
Abstract
We performed a systematic assessment of computational deconvolution methods, which play an important role in estimating cell type proportions from bulk methylation data. The proposed framework, methylDeConv (available as an R package), integrates several deconvolution methods for methylation profiles (Illumina HumanMethylation450 and MethylationEPIC arrays) and offers different cell-type-specific CpG selection strategies to construct an extended reference library that incorporates the main immune cell subsets, epithelial cells and cell-free DNAs. We compared the performance of different deconvolution algorithms via simulations and benchmark datasets, and further investigated the associations of the estimated cell type proportions with cancer therapy in breast cancer and with subtypes in melanoma methylation case studies. Our results indicate that deconvolution based on the extended reference library is critical for obtaining accurate estimates of cell proportions in non-blood tissues.
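To make the idea of reference-based deconvolution concrete, the sketch below models a bulk beta-value profile as a non-negative mixture of reference cell-type profiles over the selected CpGs and solves it with non-negative least squares (`scipy.optimize.nnls`), then rescales the weights to sum to one. This is a generic formulation chosen for illustration; it is not claimed to be one of the algorithms methylDeConv wraps, and the function name and array layout are assumptions. CpG selection, which the abstract highlights, is assumed to have been done already.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve(reference: np.ndarray, bulk: np.ndarray) -> np.ndarray:
    """Estimate cell type proportions from one bulk methylation profile.

    reference: (n_cpgs, n_cell_types) beta values of each reference cell type
    bulk:      (n_cpgs,) beta values of the bulk sample at the same CpGs
    Returns non-negative proportions that sum to 1.
    """
    # Solve min ||reference @ w - bulk||_2 subject to w >= 0.
    weights, _residual_norm = nnls(reference, bulk)
    total = weights.sum()
    if total == 0:
        raise ValueError("all estimated weights are zero")
    # Rescale so the estimates can be read as proportions.
    return weights / total
```

Mixing a small hypothetical reference with known proportions and passing the result back through `deconvolve` recovers those proportions, which is the kind of simulation check the abstract describes at a much larger scale.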
Affiliation(s)
- Junyan Song
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY
- Pei-Fen Kuan
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY

5
Gnilopyat S, DePietro PJ, Parry TK, McLaughlin WA. The Pharmacorank Search Tool for the Retrieval of Prioritized Protein Drug Targets and Drug Repositioning Candidates According to Selected Diseases. Biomolecules 2022; 12:1559. [PMID: 36358909 PMCID: PMC9687941 DOI: 10.3390/biom12111559] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/27/2022] [Revised: 10/19/2022] [Accepted: 10/22/2022] [Indexed: 08/13/2023]
Abstract
We present the Pharmacorank search tool as an objective means of obtaining prioritized protein drug targets and their associated medications for user-selected diseases. This tool could be used to obtain prioritized protein targets for the creation of novel medications or to predict novel indications for existing medications. To prioritize the proteins associated with each disease, a gene similarity profiling method based on protein functions is implemented. The priority scores of the proteins were found to correlate well with the likelihood that the associated medications are clinically relevant to the disease's treatment. When protein priority scores are plotted against the percentage of protein targets known to bind medications currently indicated to treat the disease (which we termed the pertinency score), a strong correlation is observed: the correlation coefficient was 0.9978 using a weighted second-order polynomial fit. Because this highly predictive fit was made using a broad range of diseases, we were able to identify a general threshold for the pertinency score as a starting point for considering drug repositioning candidates. Several repositioning candidates are described for proteins with high predicted pertinency scores, and these provide illustrative examples of the tool's applications. We also describe focused reviews of repositioning candidates for Alzheimer's disease. The tool's URL, https://protein.som.geisinger.edu/Pharmacorank/, provides an open online interface for interactive use, along with a site for programmatic access.
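The abstract summarizes the priority-versus-pertinency relationship with a weighted second-order polynomial fit. The sketch below shows one way to perform such a fit with NumPy's `polyfit` (whose `w` argument scales each unsquared residual) and to report a correlation coefficient as the square root of the weighted R². The weighting scheme used by the authors is not stated here, so the `weights` argument and this definition of the correlation coefficient are assumptions, and the function name is hypothetical.

```python
import numpy as np


def weighted_quadratic_fit(x, y, weights):
    """Fit y ~ c2*x**2 + c1*x + c0 by weighted least squares.

    numpy.polyfit multiplies each unsquared residual by its weight, so the
    weights act like 1/sigma. Returns the coefficients (highest power first)
    and a correlation coefficient taken as the square root of weighted R^2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    coeffs = np.polyfit(x, y, deg=2, w=w)
    fitted = np.polyval(coeffs, x)
    w2 = w ** 2  # polyfit minimises sum((w * residual) ** 2)
    y_mean = np.sum(w2 * y) / np.sum(w2)
    ss_res = np.sum(w2 * (y - fitted) ** 2)
    ss_tot = np.sum(w2 * (y - y_mean) ** 2)
    # Guard against tiny negative values from floating-point rounding.
    r_squared = max(0.0, 1.0 - ss_res / ss_tot)
    return coeffs, float(np.sqrt(r_squared))
```

Here `x` would hold priority scores and `y` the corresponding pertinency scores; the fitted curve could then be used to read off a pertinency threshold.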
Affiliation(s)
- William A. McLaughlin
- Department of Medical Education, Geisinger Commonwealth School of Medicine, 525 Pine Street, Scranton, PA 18509, USA

6
Liu M, Liu P, Chang Y, Xu B, Wang N, Qin L, Zheng J, Liu Y, Wu L, Yan H. Genome-wide DNA methylation profiles and small noncoding RNA signatures in sperm with a high DNA fragmentation index. J Assist Reprod Genet 2022; 39:2255-2274. [PMID: 36190595 PMCID: PMC9596664 DOI: 10.1007/s10815-022-02618-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 09/07/2022] [Indexed: 11/28/2022]
Abstract
BACKGROUND A growing number of studies have reported that sperm DNA fragmentation (SDF) is associated with male infertility. However, no studies have compared genome-wide DNA methylation profiles and sncRNA signatures between sperm with high and low sperm DNA fragmentation indices (DFIs). METHODS Whole-genome bisulfite sequencing (WGBS) was performed on sperm samples from a weak group (DFI ≥ 30%, n = 6) and normal group (DFI ≤ 15%, n = 7). Small noncoding RNA (sncRNA) deep sequencing was conducted for sperm samples from the weak (DFI ≥ 30%, n = 13) and normal (DFI ≤ 15%, n = 17) groups. RESULTS A total of 4939 differentially methylated regions (DMRs) were identified in the weak group sperm samples relative to normal group sperm samples, with 2072 (41.95%) of them located in promoter regions. The percentages of hypermethylated DMRs were higher than those of hypomethylated DMRs in all seven examined gene annotation groups. Hypermethylated DMRs were significantly enriched in terms associated with neurons and microtubules. Compared with the normal group, the global DNA methylation level of the weak group sperm showed a downward trend, with lower correlation for methylation in the weak group sperm; therefore, the chromosomes of high-DFI sperm may be loose. On average, 40.5% of sncRNAs were annotated as rsRNAs, 19.3% as tsRNAs, 10.4% as yRNAs, and 7.1% as miRNAs. A total of 27 miRNAs, 151 tsRNAs, and 70 rsRNAs were differentially expressed between the two groups of sperm samples. Finally, 7 sncRNAs were identified as candidate sperm quality biomarkers, and the target genes of the differentially expressed miRNAs are involved in nervous system development. CONCLUSION Our findings suggest that genome-wide DNA methylation profiles and sncRNA signatures are significantly altered in high-DFI sperm. Our study provides potential biomarkers for sperm quality.
Affiliation(s)
- Minghua Liu
- Reproductive Medical Center, Changhai Hospital, Naval Medical University, Shanghai, China
- Peiru Liu
- MOE Key Laboratory of Metabolism and Molecular Medicine, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences and Zhongshan Hospital, Fudan University, Shanghai, China
- Yunjian Chang
- State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- Beiying Xu
- State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- Nengzhuang Wang
- Reproductive Medical Center, Changhai Hospital, Naval Medical University, Shanghai, China
- Lina Qin
- Reproductive Medical Center, Changhai Hospital, Naval Medical University, Shanghai, China
- Jufen Zheng
- Reproductive Medical Center, Changhai Hospital, Naval Medical University, Shanghai, China
- Yun Liu
- MOE Key Laboratory of Metabolism and Molecular Medicine, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences and Zhongshan Hospital, Fudan University, Shanghai, China
- Ligang Wu
- State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- Hongli Yan
- Reproductive Medical Center, Changhai Hospital, Naval Medical University, Shanghai, China

7
Yang CX, Yang YW, Mou Q, Chen L, Wang C, Du ZQ. Proteomic changes induced by ascorbic acid treatment on porcine immature Sertoli cells. Theriogenology 2022; 188:13-21. [DOI: 10.1016/j.theriogenology.2022.05.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/21/2022] [Revised: 04/24/2022] [Accepted: 05/13/2022] [Indexed: 01/08/2023]

8
Jia B, Xiang D, Shao Q, Hong Q, Quan G, Wu G. Proteomic Exploration of Porcine Oocytes During Meiotic Maturation in vitro Using an Accurate TMT-Based Quantitative Approach. Front Vet Sci 2022; 8:792869. [PMID: 35198619 PMCID: PMC8859466 DOI: 10.3389/fvets.2021.792869] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/11/2021] [Accepted: 12/20/2021] [Indexed: 01/19/2023]
Abstract
Dynamic changes in protein expression are well known to be required for oocyte meiotic maturation. Although proteomic analysis has been performed on porcine oocytes during in vitro maturation, complete data are still lacking because of the technical limitations at the time. Here, a novel tandem mass tag (TMT)-based quantitative approach was used to compare the proteomic profiles of porcine immature and in vitro matured oocytes. Our results showed 763 proteins with significant differences in abundance: 450 over-expressed and 313 under-expressed. GO and KEGG analyses revealed multiple regulatory mechanisms of oocyte nuclear and cytoplasmic maturation, such as spindle and chromosome configurations, cytoskeletal reconstruction, epigenetic modifications, energy metabolism and signal transduction, among others. In addition, 12 proteins identified with high-confidence peptides and related to oocyte maturation were quantified by parallel reaction monitoring to validate the reliability of the TMT results. In conclusion, we provide a detailed proteomics dataset that enriches the understanding of the molecular characteristics underlying porcine oocyte maturation in vitro.
Affiliation(s)
- Baoyu Jia
- Key Laboratory of Animal Gene Editing and Animal Cloning in Yunnan Province, College of Veterinary Medicine, Yunnan Agricultural University, Kunming, China
- Decai Xiang
- Yunnan Provincial Genebank of Livestock and Poultry Genetic Resources, Yunnan Provincial Engineering Laboratory of Animal Genetic Resource Conservation and Germplasm Enhancement, Yunnan Animal Science and Veterinary Institute, Kunming, China
- Qingyong Shao
- Yunnan Provincial Genebank of Livestock and Poultry Genetic Resources, Yunnan Provincial Engineering Laboratory of Animal Genetic Resource Conservation and Germplasm Enhancement, Yunnan Animal Science and Veterinary Institute, Kunming, China
- Qionghua Hong
- Yunnan Provincial Genebank of Livestock and Poultry Genetic Resources, Yunnan Provincial Engineering Laboratory of Animal Genetic Resource Conservation and Germplasm Enhancement, Yunnan Animal Science and Veterinary Institute, Kunming, China
- Guobo Quan
- Yunnan Provincial Genebank of Livestock and Poultry Genetic Resources, Yunnan Provincial Engineering Laboratory of Animal Genetic Resource Conservation and Germplasm Enhancement, Yunnan Animal Science and Veterinary Institute, Kunming, China
- *Correspondence: Guobo Quan
- Guoquan Wu
- Yunnan Provincial Genebank of Livestock and Poultry Genetic Resources, Yunnan Provincial Engineering Laboratory of Animal Genetic Resource Conservation and Germplasm Enhancement, Yunnan Animal Science and Veterinary Institute, Kunming, China

9
Zhou J, Xiong W, Wang Y, Guan J. Protein Function Prediction Based on PPI Networks: Network Reconstruction vs Edge Enrichment. Front Genet 2022; 12:758131. [PMID: 34970299 PMCID: PMC8712557 DOI: 10.3389/fgene.2021.758131] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/13/2021] [Accepted: 11/11/2021] [Indexed: 01/21/2023]
Abstract
Over the past decades, massive amounts of protein-protein interaction (PPI) data have accumulated thanks to advances in high-throughput technologies, but PPI data quality issues (noise or incompleteness) still affect the accuracy of protein function prediction based on PPI networks. Although numerous studies have reported that two main strategies, network reconstruction and edge enrichment, are effective at boosting prediction performance, comparative studies of the performance differences between them are still lacking. Motivated by this question, this study first uses three protein similarity metrics (local, global and sequence) for network reconstruction and edge enrichment in PPI networks, and then evaluates the performance of network reconstruction, edge enrichment and the original networks on two real PPI datasets. The experimental results demonstrate that edge enrichment works better than both network reconstruction and the original networks. Moreover, for edge enrichment of PPI networks, sequence similarity outperforms both local and global similarity. In summary, our study can help biologists select suitable pre-processing schemes and achieve better protein function prediction with PPI networks.
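Edge enrichment, as described above, keeps the observed interactions and adds new edges between proteins judged similar. The sketch below illustrates that idea under stated assumptions: the default score is the Jaccard index of two proteins' neighbourhoods (one possible local, topology-based similarity, not necessarily the paper's definition), a single global threshold decides which pairs are added, and all names are hypothetical. A sequence-similarity function, which the study found worked best for enrichment, could be passed in through the `similarity` parameter.

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Set, Tuple

Edge = Tuple[str, str]
Similarity = Callable[[str, str], float]


def adjacency(edges: Set[Edge]) -> Dict[str, Set[str]]:
    """Build an undirected neighbour map from an edge set."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def enrich_edges(edges: Set[Edge], threshold: float,
                 similarity: Optional[Similarity] = None) -> Set[Edge]:
    """Keep every original PPI edge and add non-adjacent protein pairs
    whose similarity is at least `threshold`.

    By default the similarity is the Jaccard index of the two proteins'
    neighbourhoods; any other pairwise score can be supplied instead.
    """
    adj = adjacency(edges)

    def jaccard(u: str, v: str) -> float:
        union = adj[u] | adj[v]
        return len(adj[u] & adj[v]) / len(union) if union else 0.0

    score = similarity or jaccard
    # Store edges as sorted tuples so (u, v) and (v, u) are the same edge.
    enriched: Set[Edge] = {(min(u, v), max(u, v)) for u, v in edges}
    for u, v in combinations(sorted(adj), 2):
        if (u, v) not in enriched and score(u, v) >= threshold:
            enriched.add((u, v))
    return enriched
```

Unlike reconstruction, this procedure never deletes an observed interaction, which matches the distinction the abstract draws between the two strategies.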
Affiliation(s)
- Jiaogen Zhou
- Jiangsu Provincial Engineering Research Center for Intelligent Monitoring and Ecological Management of Pond and Reservoir Water Environment, Huaiyin Normal University, Huai'an, China
- Wei Xiong
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai, China
- Yang Wang
- Department of Computer Science and Technology, Tongji University, Shanghai, China
- Jihong Guan
- Department of Computer Science and Technology, Tongji University, Shanghai, China

10
Huang T, Suen D. Iron insufficiency in floral buds impairs pollen development by disrupting tapetum function. Plant J 2021; 108:244-267. [PMID: 34310779 PMCID: PMC9292431 DOI: 10.1111/tpj.15438] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2020] [Revised: 06/25/2021] [Accepted: 07/20/2021] [Indexed: 06/13/2023]
Abstract
Reduction of crop yield due to iron (Fe) deficiency has always been a concern in agriculture. How Fe insufficiency in floral buds affects pollen development remains unexplored. Here, plants transferred to Fe-deficient medium at the reproductive stage had reduced floral Fe content and viable pollen and showed a defective pollen outer wall, all restored by supplying floral buds with Fe. A comparison of differentially expressed genes (DEGs) in Fe-deficient leaves, roots, and anthers suggested that changes in several cellular processes were unique to anthers, including increased lipid degradation. Co-expression analysis revealed that ABORTED MICROSPORES (AMS), DEFECTIVE IN TAPETAL DEVELOPMENT AND FUNCTION1, and BASIC HELIX-LOOP-HELIX 089/091/010 encode key upstream transcription factors of Fe deficiency-responsive DEGs involved in tapetum function and development, including tapetal ROS homeostasis, programmed cell death, and pollen outer wall formation-related lipid metabolism. Analysis of RESPIRATORY-BURST OXIDASE HOMOLOG E (RBOHE) gain- and loss-of-function under Fe deficiency indicated that RBOHE- and Fe-dependent regulation cooperatively control anther reactive oxygen species levels and pollen development. Since DEGs in Fe-deficient anthers were not significantly enriched in genes related to mitochondrial function, the changes in mitochondrial status under Fe deficiency, including respiration activity, density, and morphology, were probably because the Fe amount was insufficient to maintain proper mitochondrial protein function in anthers. To sum up, Fe deficiency in anthers may affect Fe-dependent protein function and impact upstream transcription factors and their downstream genes, resulting in extensively impaired tapetum function and pollen development.
Affiliation(s)
- Tzu‐Hsiang Huang
- Agricultural Biotechnology Research Center, Academia Sinica, Taipei 11529, Taiwan
- Molecular and Biological Agricultural Sciences Program, Taiwan International Graduate Program, Academia Sinica and National Chung‐Hsing University, Taipei 11529, Taiwan
- Graduate Institute of Biotechnology, National Chung‐Hsing University, Taichung 40227, Taiwan
- Der‐Fen Suen
- Agricultural Biotechnology Research Center, Academia Sinica, Taipei 11529, Taiwan
- Molecular and Biological Agricultural Sciences Program, Taiwan International Graduate Program, Academia Sinica and National Chung‐Hsing University, Taipei 11529, Taiwan
- Biotechnology Center, National Chung‐Hsing University, Taichung 40227, Taiwan

11
A machine learning framework for predicting drug-drug interactions. Sci Rep 2021; 11:17619. [PMID: 34475500 PMCID: PMC8413337 DOI: 10.1038/s41598-021-97193-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/07/2021] [Accepted: 08/18/2021] [Indexed: 11/25/2022]
Abstract
Understanding drug–drug interactions is an essential step in reducing the risk of adverse drug events before clinical drug co-prescription. Existing methods, which commonly integrate heterogeneous data to increase model performance, often suffer from high model complexity. As such, elucidating the molecular mechanisms underlying drug–drug interactions while preserving rational biological interpretability is a challenging task in computational modeling for drug discovery. In this study, we investigate drug–drug interactions via the associations between the genes that two drugs target. For this purpose, we propose a simple drug target profile representation to depict drugs and drug pairs, from which an l2-regularized logistic regression model is built to predict drug–drug interactions. Furthermore, we define several statistical metrics in the context of human protein–protein interaction networks and signaling pathways to measure the interaction intensity, interaction efficacy and action range between two drugs. Large-scale empirical studies, including both cross-validation and an independent test, show that the proposed drug target profile-based machine learning framework outperforms existing data integration-based methods. The proposed statistical metrics show that two drugs readily interact when they target common genes, when their target genes are connected via short paths in protein–protein interaction networks, or when their target genes are located in signaling pathways that have cross-talk. The unravelled mechanisms could provide biological insights into potential adverse reactions of co-prescribed drugs.
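The pipeline above (target profiles for drugs and drug pairs, then an l2-regularized logistic regression) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pair representation as the element-wise sum of the two binary profiles is an assumed rule chosen so that the pair (a, b) and the pair (b, a) get identical features, and the model is fitted with plain batch gradient descent in NumPy to keep the example dependency-light. In practice, scikit-learn's `LogisticRegression` with `penalty="l2"` would be the usual choice. All function names are hypothetical.

```python
import numpy as np


def target_profile(targets, gene_index):
    """Binary vector with 1 at each gene the drug targets."""
    profile = np.zeros(len(gene_index))
    for gene in targets:
        profile[gene_index[gene]] = 1.0
    return profile


def pair_features(targets_a, targets_b, gene_index):
    """Symmetric drug-pair vector: element-wise sum of the two profiles
    (an assumed combination rule, not taken from the paper)."""
    return target_profile(targets_a, gene_index) + target_profile(targets_b, gene_index)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_l2_logistic(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.
    The bias term is not penalised."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        error = sigmoid(X @ w + b) - y
        w -= lr * (X.T @ error / n + lam * w)
        b -= lr * error.mean()
    return w, b
```

On a toy dataset where interacting pairs share a target gene, the learned weight on that shared gene is positive, which mirrors the abstract's observation that drugs targeting common genes tend to interact.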

12
Hobbs ET, Goralski SM, Mitchell A, Simpson A, Leka D, Kotey E, Sekira M, Munro JB, Nadendla S, Jackson R, Gonzalez-Aguirre A, Krallinger M, Giglio M, Erill I. ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts. Front Res Metr Anal 2021; 6:674205. [PMID: 34327299 PMCID: PMC8313968 DOI: 10.3389/frma.2021.674205] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/28/2021] [Accepted: 06/28/2021] [Indexed: 11/20/2022]
Abstract
Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statements from biomedical journal articles that discuss findings of interest. For determining reliability of the statements, curators need the evidence used by the authors to support their assertions. It is important to annotate the evidence directly used by authors to qualify their findings rather than simply annotating mentions of experimental methods without the context of what findings they support. Text mining tools require tuning and adaptation to achieve accurate performance. Many annotated corpora exist to enable developing and tuning text mining tools; however, none currently provides annotations of evidence based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available, biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology.
Affiliation(s)
- Elizabeth T Hobbs
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Stephen M Goralski
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Ashley Mitchell
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Andrew Simpson
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Dorjan Leka
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Emmanuel Kotey
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Matt Sekira
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- James B Munro
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States
- Suvarna Nadendla
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States
- Rebecca Jackson
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States
- Martin Krallinger
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Centro Nacional de Investigaciones Oncológicas (CNIO), Madrid, Spain
- Michelle Giglio
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States
- Ivan Erill
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States

13
Databases for Protein-Protein Interactions. Methods Mol Biol 2021; 2361:229-248. [PMID: 34236665 DOI: 10.1007/978-1-0716-1641-3_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
Abstract
Protein-protein interaction networks have a crucial role in biological processes. Proteins perform multiple functions in forming physical and functional interactions in cellular systems. Information concerning an enormous number of protein interactions in a wide range of species has accumulated and has been integrated into various resources for molecular biology and systems biology. This chapter provides a review of the representative databases and the major computational methods used for protein-protein interactions.

14
Nguyen QH, Le DH. Similarity Calculation, Enrichment Analysis, and Ontology Visualization of Biomedical Ontologies using UFO. Curr Protoc 2021; 1:e115. [PMID: 33900688 DOI: 10.1002/cpz1.115] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]
Abstract
Biomedical ontologies, which have grown rapidly in recent years, have been reported to be useful in various applications. In this article, we propose two main protocols, term-related and entity-related, covering the three most common ontology analyses: similarity calculation, enrichment analysis, and ontology visualization, each of which can be performed by separate methods. Many previously developed tools implementing these methods run on different platforms and implement only a limited number of similarity calculation and enrichment analysis methods for a specific type of biomedical ontology, even though the methods are applicable to any type. Moreover, each method has distinct advantages depending on the application, so the more methods a tool offers, the better the decisions users can make. The protocol here implements all of the above analyses using a popular tool called UFO. UFO is a Cytoscape app that unifies most of the semantic similarity measures for between-term and between-entity similarity calculation for biomedical ontologies in OBO format; it can calculate the similarity between two sets of entities, weight imported entity networks, and generate functional similarity networks. The complete protocol can be performed in 30 min and is designed for use by biologists with no prior bioinformatics training. © 2021 Wiley Periodicals LLC. Basic Protocol: Running UFO using a list of input Gene Ontology, Disease Ontology, or Human Phenotype Ontology data.
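As a rough illustration of what a between-term semantic similarity measure computes, the sketch below implements Resnik's information-content measure, one classic measure of this kind. It runs on a hypothetical toy ontology with made-up annotation counts. It is not UFO's code, and every term name and count is an assumption for illustration. Information content is IC(t) = -log p(t), where p(t) is the fraction of annotations made to t or to any of its descendants. The similarity of two terms is the IC of their most informative common ancestor.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents (a DAG).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic": ["root"],
    "protein_binding": ["binding"],
    "dna_binding": ["binding"],
    "kinase": ["catalytic"],
}

# Hypothetical counts of entities annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "binding": 2, "catalytic": 1,
                 "protein_binding": 4, "dna_binding": 2, "kinase": 1}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t) = log(total / count(t)), counting descendants too."""
    cumulative = {t: 0 for t in PARENTS}
    for term, n in DIRECT_COUNTS.items():
        # Propagate each direct annotation up to every ancestor (true-path rule).
        for anc in ancestors(term):
            cumulative[anc] += n
    total = cumulative["root"]
    # Terms with no annotations at all get no IC value in this sketch.
    return {t: math.log(total / c) for t, c in cumulative.items() if c > 0}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[t] for t in common)


ic = information_content()
print(round(resnik("protein_binding", "dna_binding", ic), 3))  # 0.223
print(resnik("protein_binding", "kinase", ic))  # 0.0 (only the root is shared)
```

Two terms that share only the root score 0. Terms whose common ancestor is rarely annotated score higher, which is why IC-based measures reward specific shared annotations.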
Affiliation(s)
- Quang-Huy Nguyen
- Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
- Duc-Hau Le
- Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
- School of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam

15
Cui LL, Zhou CX, Han B, Wang SS, Li SY, Xie SC, Zhou DH. Urine proteomics for profiling of mouse toxoplasmosis using liquid chromatography tandem mass spectrometry analysis. Parasit Vectors 2021; 14:211. [PMID: 33879238 PMCID: PMC8056516 DOI: 10.1186/s13071-021-04713-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/30/2020] [Accepted: 03/31/2021] [Indexed: 12/19/2022] [Open access]
Abstract
BACKGROUND Toxoplasma gondii is an obligate intracellular parasite that causes toxoplasmosis. Urine is an easily obtained clinical sample that has been widely used for diagnostic purposes. However, changes in the urinary proteome during T. gondii infection have never been investigated. METHODS Twenty-four-hour urine samples were obtained from BALB/c mice with acute infection [11 days post infection (DPI)], mice with chronic infection (35 DPI) and healthy controls, and were analyzed by label-free liquid chromatography tandem mass spectrometry. RESULTS We identified a total of 13,414 peptides from 1802 proteins, of which 169 and 47 proteins were significantly differentially expressed in the acute and chronic infection phases, respectively. Clustering analysis revealed clear differences in proteome profiles among all groups. Gene Ontology analysis showed that a large number of the differentially expressed proteins (DEPs) detected in acute infection were associated with binding activity and single-organism processes. KEGG pathway enrichment analysis showed that most of these DEPs were involved in disease-related and metabolic pathways. CONCLUSIONS Our findings reveal global reprogramming of the urine proteome following T. gondii infection, and the data obtained in this study will enhance our understanding of host responses to T. gondii infection and lead to the identification of new diagnostic biomarkers.
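GO and KEGG enrichment analyses ask whether a term is over-represented among the differentially expressed proteins relative to a background set. The abstract does not state which statistic was used. A one-sided hypergeometric test is one common choice, and the sketch below implements it with the standard library only. The function name and parameter names are illustrative assumptions.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided enrichment p-value P(X >= k) with X ~ Hypergeometric.

    N: size of the background set (e.g., all identified proteins)
    K: background members annotated to the term
    n: size of the study set (e.g., differentially expressed proteins)
    k: study-set members annotated to the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum P(X = i) for i = k .. min(n, K); comb() returns 0 for impossible draws.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```

For example, `hypergeom_enrichment_p(k=12, n=169, K=40, N=1802)` would test a hypothetical term annotated to 40 of the 1802 identified proteins and to 12 of the 169 acute-phase DEPs. The term counts here are invented. When many terms are tested, the resulting p-values are usually corrected for multiple testing, for example with the Benjamini-Hochberg procedure.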
Affiliation(s)
- Lin-Lin Cui
- Key Laboratory of Fujian-Taiwan Animal Pathogen Biology, College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
- Chun-Xue Zhou
- Department of Pathogen Biology, School of Basic Medical Sciences, Cheeloo College of Medicine, Shandong University, Jinan, Shandong Province, 250012, People's Republic of China
- Bing Han
- Department of Pathogen Biology, School of Basic Medical Sciences, Cheeloo College of Medicine, Shandong University, Jinan, Shandong Province, 250012, People's Republic of China
- Sha-Sha Wang
- College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi, 712100, People's Republic of China
- Si-Ying Li
- Department of Pathogen Biology, School of Basic Medical Sciences, Cheeloo College of Medicine, Shandong University, Jinan, Shandong Province, 250012, People's Republic of China
- Shi-Chen Xie
- State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu Province, 730046, People's Republic of China
- Dong-Hui Zhou
- Key Laboratory of Fujian-Taiwan Animal Pathogen Biology, College of Animal Sciences (College of Bee Science), Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China

16
Wattanapornprom W, Thammarongtham C, Hongsthong A, Lertampaiporn S. Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization. Life (Basel) 2021; 11:life11040293. [PMID: 33808227 PMCID: PMC8066735 DOI: 10.3390/life11040293] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/29/2021] [Revised: 03/16/2021] [Accepted: 03/25/2021] [Indexed: 12/17/2022] [Open access]
Abstract
The accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy for protein subcellular localization prediction in plants based on multiple classifiers, to improve prediction results in terms of both accuracy and reliability. The prediction of plant protein subcellular localization is challenging because the underlying problem is not only multiclass but also multilabel. Generally, plant proteins can be found in 10–14 locations/compartments. The number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is generally much greater than that in other compartments (vacuole, peroxisome, Golgi, and cell wall), so the data are usually imbalanced. We therefore propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suitable for each type of protein localization, forming a total of 479 feature spaces. Feature selection methods were then used to reduce these features to smaller, informative subsets. The reduced feature subset was used to train three different individual models, whose outputs we combined by average voting to return the final probability prediction. Based on the voting probability, the method can predict both single- and multilabel subcellular localizations. Experimental results on the testing dataset indicated that the proposed ensemble method achieved an overall accuracy of 84.58% for 11 compartments.
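The average-voting step can be sketched as follows. Each classifier returns a probability per compartment, and the probabilities are averaged across classifiers. Every compartment whose averaged probability reaches a threshold is then reported, which yields either a single-label or a multilabel prediction. The 0.5 threshold, the fallback to the top compartment, and the compartment names and probabilities in the example are assumptions, not details taken from the paper.

```python
from typing import Dict, List


def average_vote(prob_lists: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each compartment's probability across all classifiers."""
    if not prob_lists:
        raise ValueError("need at least one classifier output")
    compartments = prob_lists[0].keys()
    return {c: sum(p[c] for p in prob_lists) / len(prob_lists) for c in compartments}


def predict_locations(prob_lists: List[Dict[str, float]], threshold: float = 0.5) -> List[str]:
    """Return all compartments whose averaged probability meets the threshold.

    If none does, fall back to the single most probable compartment so that
    every protein receives at least one label.
    """
    avg = average_vote(prob_lists)
    labels = [c for c, p in avg.items() if p >= threshold]
    if not labels:
        labels = [max(avg, key=avg.get)]
    return sorted(labels)


# Hypothetical outputs of three heterogeneous classifiers for one protein.
outputs = [
    {"nucleus": 0.9, "cytoplasm": 0.7, "golgi": 0.1},
    {"nucleus": 0.7, "cytoplasm": 0.6, "golgi": 0.2},
    {"nucleus": 0.8, "cytoplasm": 0.5, "golgi": 0.0},
]
print(predict_locations(outputs))  # ['cytoplasm', 'nucleus']
```

Averaging probabilities ("soft" voting) keeps each model's confidence. In contrast, majority voting over hard labels would discard it and could not naturally return several compartments for one protein.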
Affiliation(s)
- Warin Wattanapornprom
- Applied Computer Science Program, Department of Mathematics, Faculty of Science, King Mongkut’s University of Technology Thonburi, Bangkok 10140, Thailand
- Chinae Thammarongtham
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
- Apiradee Hongsthong
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
- Supatcha Lertampaiporn
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand

17
Zajac N, Zoller S, Seppälä K, Moi D, Dessimoz C, Jokela J, Hartikainen H, Glover N. Gene Duplication and Gain in the Trematode Atriophallophorus winterbourni Contributes to Adaptation to Parasitism. Genome Biol Evol 2021; 13:evab010. [PMID: 33484570 PMCID: PMC7936022 DOI: 10.1093/gbe/evab010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 01/10/2021] [Indexed: 01/10/2023] [Open access]
Abstract
Gene duplications and novel genes have been shown to play a major role in helminth adaptation to a parasitic lifestyle because they provide the novelty necessary for adaptation to a changing environment, such as living in multiple hosts. Here we present the de novo sequenced and annotated genome of the parasitic trematode Atriophallophorus winterbourni and its comparative genomic analysis with other major parasitic trematodes. First, we reconstructed the species phylogeny and dated the split of A. winterbourni from the Opisthorchiata suborder to approximately 237.4 Ma (±120.4 Myr). We then addressed the question of which expanded gene families and gained genes are potentially involved in adaptation to parasitism. To do this, we used hierarchical orthologous groups to reconstruct three ancestral genomes on the phylogeny leading to A. winterbourni and performed a GO (Gene Ontology) enrichment analysis of the gene composition of each ancestral genome, allowing us to characterize the subsequent genomic changes. Of the 11,499 genes in the A. winterbourni genome, as many as 24% have arisen through duplication events since the speciation of A. winterbourni from the Opisthorchiata, and as many as 31.9% appear to be novel, that is, newly acquired. We found 13 gene families in A. winterbourni with more than ten genes arising through these recent duplications, all of which have functions potentially related to host behavioral manipulation, host tissue penetration, and hiding from host immunity through antigen presentation. We identified several families with genes evolving under positive selection. Our results provide a valuable resource for future studies on the genomic basis of adaptation to parasitism and point to specific candidate genes putatively involved in antagonistic host-parasite adaptation.
Affiliation(s)
- Natalia Zajac
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
- ETH Zurich, Department of Environmental Systems Science, Institute of Integrative Biology, Zurich, Switzerland
- Stefan Zoller
- ETH Zurich, Department of Environmental Systems Science, Institute of Integrative Biology, Zurich, Switzerland
- Katri Seppälä
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
- Research Department for Limnology, University of Innsbruck, Mondsee, Austria
- David Moi
- Department of Computational Biology, University of Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Center for Integrative Genomics, Lausanne, Switzerland
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Center for Integrative Genomics, Lausanne, Switzerland
- Centre for Life’s Origins and Evolution, Department of Genetics Evolution and Environment, University College London, United Kingdom
- Department of Computer Science, University College London, United Kingdom
- Jukka Jokela
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
- ETH Zurich, Department of Environmental Systems Science, Institute of Integrative Biology, Zurich, Switzerland
- Hanna Hartikainen
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
- ETH Zurich, Department of Environmental Systems Science, Institute of Integrative Biology, Zurich, Switzerland
- School of Life Sciences, University of Nottingham, University Park, United Kingdom
- Natasha Glover
- Department of Computational Biology, University of Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Center for Integrative Genomics, Lausanne, Switzerland

18
Jia B, Xiang D, Fu X, Shao Q, Hong Q, Quan G, Wu G. Proteomic Changes of Porcine Oocytes After Vitrification and Subsequent in vitro Maturation: A Tandem Mass Tag-Based Quantitative Analysis. Front Cell Dev Biol 2020; 8:614577. [PMID: 33425922 PMCID: PMC7785821 DOI: 10.3389/fcell.2020.614577] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/06/2020] [Accepted: 11/24/2020] [Indexed: 01/01/2023] [Open access]
Abstract
Cryopreservation of immature germinal vesicle (GV) oocytes is a promising strategy in pigs but still results in reduced oocyte quality due to inevitable cryodamage. Recently, more attention has been paid to the molecular changes of oocytes after vitrification, but alterations at the proteome level remain elusive. The aim of this study was therefore to decipher the proteomic characteristics of porcine GV oocytes following vitrification and in vitro maturation (IVM) using a tandem mass tag (TMT)-based quantitative approach and bioinformatics analysis. A total of 4,499 proteins were identified, of which 153 differed significantly: 94 were up-regulated and 59 down-regulated in the vitrified oocytes. Functional classification and enrichment analyses revealed that many of these proteins were involved in metabolism, signal transduction, response to stimulus, immune response, and complement and coagulation cascades, among others. Moreover, parallel reaction monitoring validated the reliability of the TMT data through quantitative analysis of 10 candidate proteins. In conclusion, our results provide a novel proteomic perspective for understanding the change in quality of vitrified porcine GV oocytes after IVM.
Affiliation(s)
- Baoyu Jia
- College of Veterinary Medicine, Yunnan Agricultural University, Kunming, China
- Decai Xiang
- Yunnan Provincial Engineering Laboratory of Animal Genetic Resource Conservation and Germplasm Enhancement, Yunnan Animal Science and Veterinary Institute, Kunming, China
- Xiangwei Fu
- College of Animal Science and Technology, China Agricultural University, Beijing, China
- Qingyong Shao
- Yunnan Provincial Engineering Laboratory of Animal Genetic Resource Conservation and Germplasm Enhancement, Yunnan Animal Science and Veterinary Institute, Kunming, China
- Qionghua Hong
- Yunnan Provincial Engineering Laboratory of Animal Genetic Resource Conservation and Germplasm Enhancement, Yunnan Animal Science and Veterinary Institute, Kunming, China
- Guobo Quan
- Yunnan Provincial Engineering Laboratory of Animal Genetic Resource Conservation and Germplasm Enhancement, Yunnan Animal Science and Veterinary Institute, Kunming, China
- Guoquan Wu
- Yunnan Provincial Engineering Laboratory of Animal Genetic Resource Conservation and Germplasm Enhancement, Yunnan Animal Science and Veterinary Institute, Kunming, China

19
Zhong X, Rajapakse JC. Graph embeddings on gene ontology annotations for protein-protein interaction prediction. BMC Bioinformatics 2020; 21:560. [PMID: 33323115 PMCID: PMC7739483 DOI: 10.1186/s12859-020-03816-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/05/2020] [Accepted: 10/13/2020] [Indexed: 01/15/2023] [Open access]
Abstract
Background Protein–protein interaction (PPI) prediction is an important task for many bioinformatics applications, such as predicting protein functions, gene-disease associations and disease-drug associations. However, much previous PPI prediction research does not consider the missing and spurious interactions inherent in PPI networks. To address these two issues, we define two corresponding tasks, namely missing PPI prediction and spurious PPI prediction, and propose a method that employs graph embeddings to learn vector representations from constructed Gene Ontology Annotation (GOA) graphs and then uses the embedded vectors to perform the two tasks. Our method leverages information from both term–term relations among GO terms and term–protein annotations between GO terms and proteins, and it preserves both local and global structural properties of the GO annotation graph. Results We compare our method with methods based on information content (IC) and with one method based on word embeddings, in experiments on three PPI datasets from the STRING database. Experimental results demonstrate that our method is more effective than the compared methods. Conclusion Our experimental results demonstrate the effectiveness of using graph embeddings to learn vector representations from undirected GOA graphs for the missing and spurious PPI prediction tasks we define.
Affiliation(s)
- Xiaoshi Zhong
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China.
- Jagath C Rajapakse
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, Singapore

20
Li T, Lei L, Bhattacharyya S, Van den Berge K, Sarkar P, Bickel PJ, Levina E. Hierarchical Community Detection by Recursive Partitioning. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1833888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
Affiliation(s)
- Tianxi Li
- Department of Statistics, University of Virginia, Charlottesville, VA
- Lihua Lei
- Department of Statistics, Stanford University, Stanford, CA
- Koen Van den Berge
- Department of Statistics, University of California, Berkeley, Berkeley, CA
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Purnamrita Sarkar
- Department of Statistics and Data Sciences, University of Texas at Austin, Austin, TX
- Peter J. Bickel
- Department of Statistics, University of California, Berkeley, Berkeley, CA

21
Celebi R, Rebelo Moreira J, Hassan AA, Ayyar S, Ridder L, Kuhn T, Dumontier M. Towards FAIR protocols and workflows: the OpenPREDICT use case. PeerJ Comput Sci 2020; 6:e281. [PMID: 33816932 PMCID: PMC7924452 DOI: 10.7717/peerj-cs.281] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/13/2019] [Accepted: 06/18/2020] [Indexed: 06/12/2023]
Abstract
It is essential for the advancement of science that researchers share, reuse and reproduce each other's workflows and protocols. The FAIR principles are a set of guidelines that aim to maximize the value and usefulness of research data, and they emphasize the importance of making digital objects findable and reusable by others. How to apply these principles not just to data but also to the workflows and protocols that consume and produce them is still under debate and poses a number of challenges. In this paper we describe a two-fold approach that simultaneously applies the FAIR principles to scientific workflows and to the data involved. We apply and evaluate our approach on the PREDICT workflow, a highly cited drug repurposing workflow. This includes FAIRification of the involved datasets, as well as the use of semantic technologies to represent and store data about detailed versions of the general protocol, the concrete workflow instructions, and their execution traces. We propose a semantic model that addresses these specific requirements, which we evaluated by answering competency questions. This semantic model consists of classes and relations from a number of existing ontologies, including Workflow4ever, PROV, EDAM, and BPMN. This then allowed us to formulate and answer new kinds of competency questions. Our evaluation shows the high degree to which our FAIRified OpenPREDICT workflow now adheres to the FAIR principles, as well as the practicality and usefulness of being able to answer our new competency questions.
Affiliation(s)
- Remzi Celebi
- Institute of Data Science, Maastricht University, Maastricht, Netherlands
- Ahmed A. Hassan
- Pharmacology & Personalised Medicine, Maastricht University, Maastricht, Netherlands
- Sandeep Ayyar
- Medical Informatics, Stanford University, Palo Alto, CA, United States of America
- Lars Ridder
- Netherlands eScience Center, Amsterdam, Netherlands
- Tobias Kuhn
- Computer Science, VU University Amsterdam, Amsterdam, Netherlands
- Michel Dumontier
- Institute of Data Science, Maastricht University, Maastricht, Netherlands

22
Zhan H, Song L, Kamran A, Han F, Li B, Zhou Z, Liu T, Shen L, Li Y, Wang F, Yang J. Comprehensive Proteomic Analysis of Lysine Ubiquitination in Seedling Leaves of Nicotiana tabacum. ACS Omega 2020; 5:20122-20133. [PMID: 32832766 PMCID: PMC7439365 DOI: 10.1021/acsomega.0c01741] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/16/2020] [Accepted: 07/23/2020] [Indexed: 05/04/2023]
Abstract
Lysine ubiquitination, a widely studied posttranslational modification, plays vital roles in various biological processes in eukaryotic cells. Although several studies have examined the plant ubiquitylome, no such research has been performed in tobacco, a model plant for molecular biology. Here, we comprehensively analyzed lysine ubiquitination in tobacco (Nicotiana tabacum) using LC-MS/MS along with highly sensitive immunoaffinity purification. In total, 964 lysine-ubiquitinated (Kub) sites were identified in 572 proteins. Extensive bioinformatics analyses revealed that these proteins are distributed across various cellular locations, including the cytoplasm, chloroplast, nucleus, and plasma membrane. Notably, 25% of the Kub proteins were located in the chloroplast, 21 of which were enzymes involved in important pathways, namely photosynthesis and carbon fixation. Western blot analysis indicated that TMV infection can cause changes in ubiquitination levels. This is the first comprehensive proteomic analysis of lysine ubiquitination in tobacco, illustrating the vital role of ubiquitination in various physiological and biochemical processes and representing a valuable addition to the existing landscape of lysine ubiquitination.
Affiliation(s)
- Huaixu Zhan
- Key Laboratory of Tobacco Pest Monitoring, Controlling & Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao 266101, China
- Graduate School of Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Liyun Song
- Key Laboratory of Tobacco Pest Monitoring, Controlling & Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao 266101, China
- Ali Kamran
- Key Laboratory of Tobacco Pest Monitoring, Controlling & Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao 266101, China
- Fei Han
- State Tobacco Monopoly Administration, Beijing 100045, China
- Bin Li
- Sichuan Tobacco Company, Chengdu 610017, China
- Zhicheng Zhou
- Hunan Tobacco Science Institute, Changsha 410004, China
- Tianbo Liu
- Hunan Tobacco Science Institute, Changsha 410004, China
- Lili Shen
- Key Laboratory of Tobacco Pest Monitoring, Controlling & Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao 266101, China
- Ying Li
- Key Laboratory of Tobacco Pest Monitoring, Controlling & Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao 266101, China
- Fenglong Wang
- Key Laboratory of Tobacco Pest Monitoring, Controlling & Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao 266101, China
- wangfenglong@caas.cn
- Jinguang Yang
- Key Laboratory of Tobacco Pest Monitoring, Controlling & Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao 266101, China
- Tel.: +86-532-88703236

23
Mishra SK, Muthye V, Kandoi G. Computational Methods for Predicting Functions at the mRNA Isoform Level. Int J Mol Sci 2020; 21:ijms21165686. [PMID: 32784445 PMCID: PMC7460821 DOI: 10.3390/ijms21165686] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/13/2020] [Revised: 08/05/2020] [Accepted: 08/06/2020] [Indexed: 11/16/2022] [Open access]
Abstract
Multiple mRNA isoforms of the same gene are produced via alternative splicing, a biological mechanism that regulates protein diversity while maintaining genome size. Alternatively spliced mRNA isoforms of the same gene may have very similar sequences, yet they can have significantly different effects on cellular function and regulation. The products of alternative splicing have important and diverse functional roles, for example in the response to environmental stress, in the regulation of gene expression, and in human heritable and plant diseases. The mRNA isoforms of the same gene can have dramatically different functions. Despite the functional importance of mRNA isoforms, very little has been done to annotate their functions. In recent years, however, several computational methods aimed at predicting biological functions at the mRNA isoform level have been developed. These methods use a wide array of proteogenomic data to build machine learning-based mRNA isoform function prediction tools. In this review, we discuss the computational methods developed for predicting biological function at the individual mRNA isoform level.

24
Wu J, Elsheikha HM, Tu Y, Getachew A, Zhou H, Zhou C, Xu S. Significant transcriptional changes in mature daughter Varroa destructor mites during infestation of different developmental stages of honeybees. Pest Manag Sci 2020; 76:2736-2745. [PMID: 32187435 DOI: 10.1002/ps.5821] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/09/2019] [Revised: 02/03/2020] [Accepted: 03/18/2020] [Indexed: 06/10/2023]
Abstract
BACKGROUND Varroa destructor is considered a major cause of honeybee (Apis mellifera) colony losses worldwide. Although V. destructor mites show a preference for certain honeybee life-cycle stages, the mechanism underlying host finding and preference remains largely unknown. RESULTS Using a de novo transcriptome assembly strategy, we sequenced the transcriptome of mature daughter V. destructor mites during infestation of different honeybee stages (brood cells, newly emerged bees and adult bees). A total of 132 779 unigenes were obtained, with an average length of 2745 bp and an N50 of 5706 bp. About 63.1% of the transcriptome could be annotated based on sequence homology to proteins of the predatory mite Metaseiulus occidentalis. Expression analysis revealed that mature daughter mites had distinct transcriptome profiles after infesting different honeybee stages, and that most of the differentially expressed genes (DEGs) in mites infesting adult honeybees were down-regulated compared with those in mites infesting sealed brood cells. Gene Ontology and KEGG pathway enrichment analyses showed that a large number of DEGs were involved in cellular and metabolic processes, suggesting that Varroa mites undergo metabolic adjustment to accommodate the cellular, molecular and/or immune responses of the honeybees. Interestingly, in mites infesting adult honeybees, some DEGs involved in neurotransmitter biosynthesis and transport were identified, and their expression levels were validated by quantitative polymerase chain reaction (qPCR). CONCLUSION These results provide evidence for transcriptional reprogramming in mature daughter Varroa mites during infestation of honeybees, which may be relevant to understanding the mechanism underpinning the adaptation and preference behavior of these mites for honeybees. © 2020 Society of Chemical Industry.
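The N50 quoted above is a standard assembly statistic. It is the length L such that sequences of length at least L together cover at least half of the total assembled length. A minimal sketch of the calculation follows. The toy lengths are hypothetical and unrelated to the study's data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (bp)."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(lengths))                 # 8
print(sum(lengths) / len(lengths))  # 6.0 (mean length, for comparison)
```

N50 weights long sequences more heavily, so it typically exceeds the mean length. The figures reported here show the same pattern: a 5706 bp N50 against a 2745 bp average.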
Affiliation(s)
- Jiangli Wu
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture and Rural Affairs, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, P. R. China
- Hany M Elsheikha
- Faculty of Medicine and Health Sciences, School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Loughborough, UK
- Yangyang Tu
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture and Rural Affairs, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, P. R. China
- Awraris Getachew
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture and Rural Affairs, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, P. R. China
- Huaiyu Zhou
- Department of Pathogenic Biology, Shandong University School of Basic Medicine, Jinan, P. R. China
- Chunxue Zhou
- Department of Pathogenic Biology, Shandong University School of Basic Medicine, Jinan, P. R. China
- Shufa Xu
- Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture and Rural Affairs, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, P. R. China

25
Le DH. UFO: A tool for unifying biomedical ontology-based semantic similarity calculation, enrichment analysis and visualization. PLoS One 2020; 15:e0235670. [PMID: 32645039 PMCID: PMC7347127 DOI: 10.1371/journal.pone.0235670] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/17/2019] [Accepted: 06/22/2020] [Indexed: 02/06/2023] [Open access]
Abstract
Background Biomedical ontologies have been growing quickly and have proven useful in many biomedical applications. Important applications of these data include estimating the functional similarity between ontology terms and between annotated biomedical entities, and analyzing enrichment for a set of biomedical entities. Many semantic similarity calculation and enrichment analysis methods have been proposed for such applications, and a number of tools implementing them have been developed on different platforms. However, each of these tools implements only a small number of the semantic similarity calculation and enrichment analysis methods, and only for a certain type of biomedical ontology, even though the methods can be applied to all types of biomedical ontologies. More importantly, different methods can be dominant in different applications; thus, users have more choice when more methods are implemented in a tool, and more functions would facilitate their work with ontologies. Results In this study, we developed a Cytoscape app, named UFO, which unifies most of the semantic similarity measures for between-term and between-entity similarity calculation for all types of biomedical ontologies in OBO format. Based on the similarity calculation, UFO can calculate the similarity between two sets of entities, weight imported entity networks, and generate functional similarity networks. Besides, it can perform enrichment analysis of a set of entities by different methods. Moreover, UFO can visualize structural relationships between ontology terms, annotation relationships between entities and terms, and functional similarity between entities.
Finally, we demonstrated the ability of UFO through case studies on finding the best semantic similarity measures for assessing the similarity between human disease phenotypes, constructing biomedical entity functional similarity networks for predicting disease-associated biomarkers, and performing enrichment analysis on a set of similar phenotypes. Conclusions Taken together, UFO is expected to be a tool with which biomedical ontologies can be exploited for various biomedical applications. Availability UFO is distributed as a Cytoscape app and can be downloaded freely from the Cytoscape App Store (http://apps.cytoscape.org/apps/ufo) for non-commercial use.
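Tools of this kind typically offer Resnik's information-content measure for between-term similarity. For between-entity similarity, a common approach is best-match averaging (BMA) of term similarities. The sketch below shows both on a hypothetical five-term ontology. It is a toy stand-in written for illustration, not UFO's implementation or its Cytoscape interface. Term names, entity names and annotations are all made up.

```python
import math

# Hypothetical toy ontology: each term maps to its parent terms (a DAG).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations):
    """IC(t) = log(N / n_t), where n_t counts entities annotated to t or a descendant.

    Terms with no annotations are omitted, so similarity is only defined
    for terms that are actually used.
    """
    counts = {term: 0 for term in PARENTS}
    for terms in annotations.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)  # true-path rule: propagate to ancestors
        for term in covered:
            counts[term] += 1
    n = len(annotations)
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic):
    """IC of the most informative common ancestor of two terms."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))


def bma(terms1, terms2, ic):
    """Best-match average of term similarities between two annotated entities."""
    s1 = sum(max(resnik(a, b, ic) for b in terms2) for a in terms1) / len(terms1)
    s2 = sum(max(resnik(a, b, ic) for a in terms1) for b in terms2) / len(terms2)
    return (s1 + s2) / 2


annotations = {"e1": ["A1"], "e2": ["A2"], "e3": ["B"], "e4": ["A1"]}
ic = information_content(annotations)
print(round(resnik("A1", "A2", ic), 3))  # 0.288, i.e. IC("A") = log(4/3)
```

The root term receives IC 0 because every entity is annotated to it. Two terms that share only the root therefore score 0.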
Affiliation(s)
- Duc-Hau Le
- Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
- School of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
26
Abstract
MOTIVATION With the ever-increasing number and diversity of sequenced species, the challenge of characterizing genes with functional information is even more important. In most species, this characterization relies almost entirely on automated electronic methods. As such, it is critical to benchmark the various methods. The Critical Assessment of protein Function Annotation algorithms (CAFA) series of community experiments provides the most comprehensive benchmark, with a time-delayed analysis leveraging newly curated, experimentally supported annotations. However, the definition of a false positive in CAFA has not fully accounted for the open world assumption (OWA), leading to a systematic underestimation of precision. The main reason for this limitation is the relative paucity of negative experimental annotations. RESULTS This article introduces a new, OWA-compliant benchmark based on a balanced test set of positive and negative annotations. The negative annotations are derived from expert-curated annotations of protein families on phylogenetic trees. This approach results in a large increase in the average information content of negative annotations. The benchmark has been tested using the naïve and BLAST baseline methods, as well as two orthology-based methods. This new benchmark could complement existing ones in future CAFA experiments. AVAILABILITY AND IMPLEMENTATION All data, as well as the code used for analysis, are available from https://lab.dessimoz.org/20_not. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
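The precision issue under the open world assumption can be made concrete with a simplified sketch. It uses hypothetical term IDs and is not the benchmark's evaluation code. A closed-world count treats every predicted term without a positive annotation as a false positive. When explicit negative annotations exist, only predictions that contradict a known negative count as false, and the remaining unannotated predictions are treated as unknown.

```python
def closed_world_precision(predicted, positives):
    """Every prediction lacking a positive annotation counts as a false positive."""
    if not predicted:
        return 0.0
    return len(predicted & positives) / len(predicted)


def negative_aware_precision(predicted, positives, negatives):
    """Only predictions hitting an explicit negative annotation count as false positives.

    Predictions with neither a positive nor a negative annotation are
    left out of the calculation as unknown.
    """
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    return tp / (tp + fp) if tp + fp else 0.0


predicted = {"t1", "t2", "t3", "t4"}  # hypothetical predicted terms for one protein
positives = {"t1", "t2"}
negatives = {"t3"}
print(closed_world_precision(predicted, positives))               # 0.5
print(negative_aware_precision(predicted, positives, negatives))  # 0.6666666666666666
```

The gap between the two numbers comes from "t4", which has no annotation either way. This illustrates how a closed-world count can underestimate precision.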
Affiliation(s)
- Alex Warwick Vesztrocy
- Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Christophe Dessimoz
- Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Department of Computer Science, University College London, London, WC1E 6BT, UK
- Centre for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
27
Bouziane H, Chouarfia A. Use of Chou's 5-steps rule to predict the subcellular localization of gram-negative and gram-positive bacterial proteins by multi-label learning based on gene ontology annotation and profile alignment. J Integr Bioinform 2020; 18:51-79. [PMID: 32598314 PMCID: PMC8035964 DOI: 10.1515/jib-2019-0091] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/08/2019] [Accepted: 04/08/2020] [Indexed: 12/31/2022]
Abstract
To date, many proteins generated by large-scale genome sequencing projects are still uncharacterized and subject to intensive investigation by both experimental and computational means. Knowledge of protein subcellular localization (SCL) is of key importance for elucidating protein function. However, SCL prediction remains a challenging task, especially for multiple-site proteins, which are known to shuttle between cell compartments to perform their proper biological functions, and for proteins that lack significant homology to proteins of known subcellular location. Owing to their low cost and reasonable accuracy, machine learning-based methods have gained much attention in this context, given the availability of a plethora of biological databases and annotated proteins for analysis and benchmarking. Various predictive models have been proposed to tackle the SCL problem using different protein sequence features pertaining to subcellular localization; however, the overwhelming majority of them focus on single localization and cover a very limited number of cellular locations. Prediction has basically been based on sorting signals, amino acid composition, and homology. To improve prediction quality, current efforts focus on knowledge extracted from annotation databases, such as protein-protein interactions and Gene Ontology (GO) functional domain annotation, which has recently become widely adopted and essential information for learning systems. To address this problem, in the present study, we treated the SCL prediction task as a multi-label learning problem and attempted to label both single-site and multiple-site unannotated bacterial protein sequences by mining protein homology relationships using both the GO terms of protein homologs and PSI-BLAST profiles.
Experiments using 5-fold cross-validation tests on the benchmark datasets showed a significant improvement in the results obtained by the proposed consensus multi-label prediction model, which discriminates six compartments for Gram-negative and five compartments for Gram-positive bacterial proteins.
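Treating localization as a multi-label problem means a protein may be assigned several compartments at once. The sketch below shows a generic binary-relevance decision rule. It assumes per-compartment scores in [0, 1] from some upstream model. The scores, the 0.5 threshold and the function name are hypothetical, and this is not the authors' consensus model.

```python
def predict_locations(scores, threshold=0.5):
    """Return every compartment whose score reaches the threshold.

    Falls back to the single best-scoring compartment so that each
    protein receives at least one location.
    """
    labels = {loc for loc, s in scores.items() if s >= threshold}
    if not labels:
        labels = {max(scores, key=scores.get)}
    return labels


print(sorted(predict_locations({"cytoplasm": 0.8, "inner membrane": 0.6, "periplasm": 0.1})))
# ['cytoplasm', 'inner membrane']
```

A single-label predictor would keep only "cytoplasm" here. The multi-label rule keeps both compartments that clear the threshold, which is what allows multiple-site proteins to be represented.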
Affiliation(s)
- Hafida Bouziane
- Département d’Informatique, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf, USTO-MB BP 1505, El M’Naouer, 31000, Oran, Algeria
- Abdallah Chouarfia
- Département d’Informatique, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf, USTO-MB BP 1505, El M’Naouer, 31000, Oran, Algeria
28
Shaw D, Chen H, Jiang T. DeepIsoFun: a deep domain adaptation approach to predict isoform functions. Bioinformatics 2020; 35:2535-2544. [PMID: 30535380 DOI: 10.1093/bioinformatics/bty1017] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/29/2018] [Revised: 11/07/2018] [Accepted: 12/08/2018] [Indexed: 11/14/2022]
Abstract
MOTIVATION Isoforms are mRNAs produced from the same gene locus by alternative splicing and may have different functions. Although gene functions have been studied extensively, little is known about the specific functions of isoforms. Recently, some computational approaches based on multiple instance learning have been proposed to predict isoform functions from annotated gene functions and expression data, but their performance is far from desirable, primarily due to the lack of labeled training data. To improve performance on this problem, we propose a novel deep learning method, DeepIsoFun, that combines multiple instance learning with domain adaptation. The latter technique helps to transfer knowledge of gene functions to the prediction of isoform functions and provides additional labeled training data. Our model is trained on a deep neural network architecture so that it can adapt to the different expression distributions associated with different gene ontology terms. RESULTS We evaluated the performance of DeepIsoFun on three expression datasets of human and mouse collected from SRA studies at different times. On each dataset, DeepIsoFun performed significantly better than the existing methods. In terms of area under the receiver operating characteristic curve, our method achieved at least a 26% improvement, and in terms of area under the precision-recall curve, at least a 10% improvement, over the state-of-the-art methods. In addition, we studied the divergence of the functions predicted by our method for isoforms from the same gene and the overall correlation between expression similarity and the similarity of predicted functions. AVAILABILITY AND IMPLEMENTATION https://github.com/dls03/DeepIsoFun/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
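Multiple instance learning rests on a simple assumption. A bag (here, a gene) is positive for a GO term if at least one of its instances (isoforms) is positive. One common way to express this is to max-pool instance scores into a bag score. The sketch below uses hypothetical isoform names and scores. It illustrates only the MIL assumption, not DeepIsoFun's network or its domain-adaptation component.

```python
def gene_prediction(isoform_scores, threshold=0.5):
    """Max-pool isoform scores for one GO term into a gene-level prediction.

    Returns (is_positive, best_isoform) so that the isoform most likely
    to carry the function can be identified.
    """
    best = max(isoform_scores, key=isoform_scores.get)
    return isoform_scores[best] >= threshold, best


scores = {"iso-1": 0.12, "iso-2": 0.91, "iso-3": 0.40}  # hypothetical isoform scores
print(gene_prediction(scores))  # (True, 'iso-2')
```

Because only gene-level labels are observed during training, the max-pooling step lets a gene-level label supervise isoform scores without requiring isoform-level annotations.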
Affiliation(s)
- Dipan Shaw
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
- Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
- Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
- Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
29
Lock A, Rutherford K, Harris MA, Hayles J, Oliver SG, Bähler J, Wood V. PomBase 2018: user-driven reimplementation of the fission yeast database provides rapid and intuitive access to diverse, interconnected information. Nucleic Acids Res 2020; 47:D821-D827. [PMID: 30321395 PMCID: PMC6324063 DOI: 10.1093/nar/gky961] [Citation(s) in RCA: 115] [Impact Index Per Article: 28.8] [Received: 09/06/2018] [Accepted: 10/11/2018] [Indexed: 12/16/2022]
Abstract
PomBase (www.pombase.org), the model organism database for the fission yeast Schizosaccharomyces pombe, has undergone a complete redevelopment, resulting in a more fully integrated, better-performing service. The new infrastructure supports daily data updates as well as fast, efficient querying and smoother navigation within and between pages. New pages for publications and genotypes provide routes to all data curated from a single source and to all phenotypes associated with a specific genotype, respectively. For ontology-based annotations, improved displays balance comprehensive data coverage with ease of use. The default view now uses ontology structure to provide a concise, non-redundant summary that can be expanded to reveal underlying details and metadata. The phenotype annotation display also offers filtering options to allow users to focus on specific areas of interest. An instance of the JBrowse genome browser has been integrated, facilitating the loading of, and intuitive access to, genome-scale datasets. Taken together, the new data and pages, along with improvements in annotation display and querying, allow users to probe connections among different types of data to form a comprehensive view of fission yeast biology. The new PomBase implementation also provides a rich set of modular, reusable tools that can be deployed to create new, or enhance existing, organism-specific databases.
Affiliation(s)
- Antonia Lock
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Kim Rutherford
- Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Midori A Harris
- Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Jacqueline Hayles
- Cell Cycle Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Stephen G Oliver
- Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Jürg Bähler
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Valerie Wood
- Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
30
Cai Y, Wang J, Deng L. SDN2GO: An Integrated Deep Learning Model for Protein Function Prediction. Front Bioeng Biotechnol 2020; 8:391. [PMID: 32411695 PMCID: PMC7201018 DOI: 10.3389/fbioe.2020.00391] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/13/2020] [Accepted: 04/07/2020] [Indexed: 02/01/2023]
Abstract
The assignment of function to proteins at a large scale is essential for understanding the molecular mechanisms of life. However, only a very small percentage of the more than 179 million proteins in UniProtKB have Gene Ontology (GO) annotations supported by experimental evidence. In this paper, we propose an integrated deep-learning-based classification model, named SDN2GO, to predict protein functions. SDN2GO applies convolutional neural networks to learn and extract features from sequences, protein domains, and known PPI networks, and then utilizes a weight classifier to integrate these features and achieve accurate predictions of GO terms. We constructed the training set and the independent test set according to the time-delayed principle of the Critical Assessment of Function Annotation (CAFA) and compared our method with two highly competitive methods and the classic BLAST method on the independent test set. The results show that our method outperforms the others on each sub-ontology of GO. We also investigated the performance of using protein domain information. Borrowing from natural language processing (NLP), we processed domain information and pre-trained a deep learning sub-model to extract comprehensive features of domains. The experimental results demonstrate that the domain features we obtained substantially improved the performance of our model. Our deep learning models, together with the data pre-processing scripts, are publicly available as open-source software at https://github.com/Charrick/SDN2GO.
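The integration step, in which sequence, domain and PPI sub-models feed one final classifier, can be pictured as a weighted combination of per-term probabilities. In this sketch the weights are fixed hypothetical numbers and the term IDs are placeholders. In SDN2GO the integration is learned, so treat the sketch only as an illustration of late fusion, not as the model's actual classifier.

```python
def fuse_predictions(sub_model_probs, weights):
    """Weighted average of per-GO-term probabilities from several sub-models.

    sub_model_probs: {model_name: {go_term: probability}}
    weights: {model_name: non-negative weight}; a term missing from a
    sub-model's output is treated as probability 0.0 for that model.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    terms = set().union(*(p.keys() for p in sub_model_probs.values()))
    return {
        term: sum(weights[m] * probs.get(term, 0.0) for m, probs in sub_model_probs.items())
        / total_weight
        for term in terms
    }


sub_models = {  # hypothetical sub-model outputs for one protein
    "sequence": {"GO:A": 0.6, "GO:B": 0.2},
    "domain": {"GO:A": 0.8},
    "ppi": {"GO:A": 0.4, "GO:B": 0.6},
}
fused = fuse_predictions(sub_models, {"sequence": 1.0, "domain": 1.0, "ppi": 2.0})
print(round(fused["GO:A"], 2))  # 0.55
```

A learned integrator would replace the fixed weights with parameters fitted on training data, and could be nonlinear.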
Affiliation(s)
- Yideng Cai
- School of Computer Science and Engineering, Central South University, Changsha, China
- Jiacheng Wang
- School of Computer Science and Engineering, Central South University, Changsha, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha, China
- School of Software, Xinjiang University, Urumqi, China
31
Nam UH, Kim JO, Kim JH. De novo transcriptome sequencing and analysis of Anisakis pegreffii (Nematoda: Anisakidae) third-stage and fourth stage larvae. J Nematol 2020; 52:1-16. [PMID: 32298057 PMCID: PMC7266050 DOI: 10.21307/jofnem-2020-041] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/26/2019] [Indexed: 01/07/2023]
Abstract
Anisakis pegreffii is known as one of the causes of a fish-borne zoonosis, anisakidosis. Despite its significant public health and food hygiene impacts, little is known about the pathogenesis and genetic background of this parasite, at least partly due to the lack of genome and transcriptome information. In this study, RNA-seq and de novo assembly were conducted to obtain transcriptome profiles of A. pegreffii third- and fourth-stage larvae. The third-stage larvae (APL3) were collected from chub mackerel, and the fourth-stage larvae (APL4) were obtained by in vitro culture. In total, 47,243 and 43,660 unigenes were expressed in the APL3 and APL4 transcriptomes, respectively. Of these, 18,753 were known and 28,490 were novel for APL3, while 18,996 were known and 24,664 were novel for APL4. The most abundantly expressed genes in APL3 were mitochondrial enzymes (COI, COII, COIII) and polyubiquitins (UBB, UBIQP_XENLA). Collagen-related genes (col-145, col-34, col-138, Bm1_54705, col-40) were the most abundantly expressed in APL4. Mitochondrial enzyme genes (COIII, COI) were also highly expressed in APL4. Among the transcripts, 614 were up-regulated in APL3, while 1,309 were up-regulated in APL4. Several protease and protein biosynthesis-related genes were highly expressed in APL3, all of which are thought to be crucial for invading host tissues. Collagen synthesis-related genes were highly expressed in APL4, reflecting the active biosynthesis of collagens that occurs during the moulting of APL4. Of these differentially expressed genes, several (SI, nas-13, EF-TSMT, SFXN2, dhs-27) were validated as highly transcribed in APL3, and others (col-40, F09E10.7, pept-1, col-34, VIT) as highly transcribed in APL4. The biological roles of these genes in vivo will be deciphered when reference genome sequences become available, together with in vitro experiments.
Affiliation(s)
- U-Hwa Nam
- Department of Marine Bioscience, College of Life Science, Gangneung-Wonju National University, Gangneung, 25457, Korea
- Jong-Oh Kim
- Institute of Marine Biotechnology, Pukyong National University, Busan, 48513, Korea
- Jeong-Ho Kim
- Department of Marine Bioscience, College of Life Science, Gangneung-Wonju National University, Gangneung, 25457, Korea
32
Mei S, Zhang K. In silico unravelling pathogen-host signaling cross-talks via pathogen mimicry and human protein-protein interaction networks. Comput Struct Biotechnol J 2019; 18:100-113. [PMID: 31956393 PMCID: PMC6956678 DOI: 10.1016/j.csbj.2019.12.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/26/2019] [Revised: 12/07/2019] [Accepted: 12/14/2019] [Indexed: 01/08/2023]
Abstract
Pathogen-host protein interactions are fundamental for pathogens to manipulate host signaling pathways and subvert host immune defense. For most pathogens, very few or no experimental studies have been conducted to investigate their signaling cross-talks with the host. In this study, we propose a computational framework to validate the biological assumption that human protein-protein interaction (PPI) networks alone are sufficient to infer pathogen-host PPIs via pathogen functional mimicry. Pathogen functional mimicry assumes that a pathogen functionally mimics and substitutes for host counterpart proteins in order to get involved in or hijack host cellular processes. Through pathogen functional mimicry defined via gene ontology (GO) semantic similarity, we first use the known human PPIs as templates to infer pathogen-host PPIs, and these PPIs are then used as training data to build an l2-regularized logistic regression model for novel pathogen-host PPI prediction. Independent tests on experimental data from human immunodeficiency virus and Francisella tularensis validate the effectiveness of the proposed pathogen functional mimicry technique. Performance comparisons also show that the proposed technique outperforms existing pathogen sequence mimicry approaches and transfer learning methods. The proposed framework provides a new avenue for studying experimentally less-studied pathogens in the worst-case scenario in which very few or no experimental pathogen-host PPIs are available. As two case studies, we apply the proposed framework to Salmonella typhimurium and human respiratory syncytial virus to reconstruct the pathogen-host PPI networks and further investigate the interference of these two pathogens with human immune signaling and the transcription regulatory system.
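The template step can be sketched as follows. Suppose a pathogen protein is functionally similar to a human protein, meaning its GO semantic similarity is at or above a cut-off. Then each known human partner of that human protein becomes a candidate partner of the pathogen protein. The protein names, similarity scores and the 0.7 threshold below are hypothetical. The sketch covers only the template inference. The l2-regularized logistic regression that the paper trains on these inferred pairs is not reproduced here.

```python
def infer_pathogen_host_ppis(human_ppis, similarity, threshold=0.7):
    """Transfer human PPIs to pathogen proteins via functional mimicry.

    human_ppis: iterable of (human_a, human_b) interacting pairs
    similarity: {(pathogen_protein, human_protein): GO semantic similarity}
    Returns a set of (pathogen_protein, human_partner) candidate pairs.
    """
    partners = {}
    for a, b in human_ppis:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    inferred = set()
    for (pathogen, human), score in similarity.items():
        if score >= threshold:  # pathogen protein is taken to mimic this human protein
            for partner in partners.get(human, ()):
                inferred.add((pathogen, partner))
    return inferred


human_ppis = [("H1", "H2"), ("H1", "H3"), ("H4", "H5")]  # hypothetical human PPIs
similarity = {("P1", "H1"): 0.9, ("P1", "H4"): 0.2}      # hypothetical similarities
print(sorted(infer_pathogen_host_ppis(human_ppis, similarity)))
# [('P1', 'H2'), ('P1', 'H3')]
```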
Affiliation(s)
- Suyu Mei
- Software College, Shenyang Normal University, Shenyang 110034, China
- Kun Zhang
- Bioinformatics Core of Xavier RCMI Center for Cancer Research, Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
33
Lan J, Zhang R, Yu H, Wang J, Xue W, Chen J, Lin S, Wang Y, Xie Z, Jiang S. Quantitative Proteomic Analysis Uncovers the Mediation of Endoplasmic Reticulum Stress-Induced Autophagy in DHAV-1-Infected DEF Cells. Int J Mol Sci 2019; 20:ijms20246160. [PMID: 31817666 PMCID: PMC6940786 DOI: 10.3390/ijms20246160] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/16/2019] [Revised: 12/03/2019] [Accepted: 12/04/2019] [Indexed: 12/11/2022]
Abstract
Autophagy is a tightly regulated catabolic process that is activated in cells in response to stress signals. Despite extensive study, the interplay between duck hepatitis A virus type 1 (DHAV-1) and the autophagy of host cells is not clear. In this study, we applied proteomics analysis to investigate the interaction mechanism between DHAV-1 and duck embryo fibroblast (DEF) cells. In total, 507 differentially expressed proteins (DEPs) were identified, with 171 upregulated and 336 downregulated. The expression levels of heat shock proteins (Hsps), response-to-stimulus proteins and zinc finger proteins (ZFPs) were significantly increased, while those of ribosomal proteins declined. Bioinformatics analysis indicated that DEPs were mainly involved in the “response to stimulus”, the “defense response to virus”, and the “phagosome pathway”. Furthermore, Western blot results showed that the conversion of microtubule-associated protein 1 light chain 3-I (LC3-I) to its lipidated form, LC3-II, increased, and that the conversion rate decreased when DEF cells were treated with 4-phenylbutyrate (4-PBA). These findings indicate that DHAV-1 infection can cause endoplasmic reticulum (ER) stress-induced autophagy in DEF cells, and that ER stress is an important regulatory factor in the activation of autophagy. Our data provide a new clue regarding the host cell response to DHAV-1 and identify proteins involved in the DHAV-1 infection process or the ER stress-induced autophagy process.
Affiliation(s)
- Jingjing Lan
- College of Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, Taian 271000, China
- Ruihua Zhang
- College of Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, Taian 271000, China
- Honglei Yu
- College of Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, Taian 271000, China
- Jingyu Wang
- College of Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, Taian 271000, China
- Wenxiang Xue
- College of Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, Taian 271000, China
- Junhao Chen
- College of Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- College of Public Health and Management, Weifang Medical University, Weifang 261042, China
- Shaoli Lin
- Molecular Virology Laboratory, VA-MD College of Veterinary Medicine and Maryland Pathogen Research Institute, University of Maryland, College Park, MD 20742, USA
- Yu Wang
- Department of Basic Medical Sciences, Taishan Medical College, Taian 271000, China
- Zhijing Xie
- College of Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, Taian 271000, China
- Shijin Jiang
- College of Veterinary Medicine, Shandong Agricultural University, Taian 271000, China
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, Taian 271000, China
- Correspondence: Tel.: +86-538-8245799
34
Zheng N, Wang K, Zhan W, Deng L. Targeting Virus-host Protein Interactions: Feature Extraction and Machine Learning Approaches. Curr Drug Metab 2019; 20:177-184. [PMID: 30156155 DOI: 10.2174/1389200219666180829121038] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 01/19/2018] [Revised: 05/21/2018] [Accepted: 08/02/2018] [Indexed: 01/15/2023]
Abstract
BACKGROUND Targeting critical virus-host protein-protein interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in the computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. METHODS In this review, a variety of computational methods for virus-host PPI prediction are surveyed. These methods are categorized by the features they utilize and by the machine learning algorithms they employ, both classical and novel. RESULTS We describe the pivotal and representative features extracted from relevant sources of biological data, mainly including sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on state-of-the-art machine learning algorithms that are used to build binary prediction models for the classification of virus-host protein pairs and discuss their abilities, weaknesses and future directions. CONCLUSION The findings of this review confirm the importance of computational methods for finding potential protein-protein interactions between viruses and their hosts. Although there has been significant progress in the prediction of virus-host PPIs in recent years, there is still much room for improvement.
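Amino acid composition is one simple and widely used sequence signature. A virus-host protein pair can be represented by concatenating the two proteins' composition vectors before a binary classifier is trained on the pairs. The sketch below is a minimal illustration with toy sequences. It is one example of a sequence feature, not the review's specific feature set. Published pipelines typically use richer descriptors.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aa_composition(sequence):
    """Fraction of each of the 20 standard amino acids in a protein sequence.

    Non-standard letters (e.g. X, U) are ignored.
    """
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not seq:
        raise ValueError("sequence contains no standard amino acids")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def pair_features(virus_seq, host_seq):
    """40-dimensional feature vector for a virus-host protein pair."""
    return aa_composition(virus_seq) + aa_composition(host_seq)


features = pair_features("MKV", "AAAC")  # toy sequences
print(len(features))  # 40
```

Concatenating in a fixed virus-then-host order keeps the two roles distinguishable, which matters because virus-host pairs are asymmetric.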
Affiliation(s)
- Nantao Zheng
- School of Software, Central South University, Changsha, 410075, China
- Kairou Wang
- School of Software, Central South University, Changsha, 410075, China
- Weihua Zhan
- School of Electronics and Computer Science, Zhejiang Wanli University, Ningbo 315100, China
- Lei Deng
- School of Software, Central South University, Changsha, 410075, China
- Shanghai Key Lab of Intelligent Information Processing, Shanghai 200433, China
35
Hu L, Yuan X, Liu X, Xiong S, Luo X. Efficiently Detecting Protein Complexes from Protein Interaction Networks via Alternating Direction Method of Multipliers. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1922-1935. [PMID: 29994334 DOI: 10.1109/tcbb.2018.2844256] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/08/2023]
Abstract
Protein complexes are crucial to improving our understanding of the mechanisms employed by proteins. Various computational algorithms have thus been proposed to detect protein complexes from protein interaction networks. However, given the massive protein interactome data obtained by high-throughput technologies, existing algorithms, especially those that additionally consider biological information about proteins, either have low efficiency or suffer from limited effectiveness. To address this issue, this work proposes to detect protein complexes from a protein interaction network with both high efficiency and high effectiveness. To do so, the original detection task is first formulated as an optimization problem according to the intuitive properties of protein complexes. The framework of the alternating direction method of multipliers (ADMM) is then applied to decompose this optimization problem into several subtasks, which can be solved separately and in parallel. An algorithm implementing this solution is then developed. Experimental results on five large protein interaction networks demonstrated that our algorithm outperformed state-of-the-art protein complex detection algorithms in terms of both effectiveness and efficiency. Moreover, as the number of parallel processes increases, one can expect even higher computational efficiency for the proposed algorithm with no compromise on effectiveness.
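The decomposition pattern that ADMM provides can be shown on a much simpler problem than complex detection: consensus averaging. Each of n workers holds a local value a_i, and all must agree on a shared z that minimizes the sum of (z - a_i)^2 / 2. The x-updates are independent per worker and could run in parallel. They are followed by a global z-update and a dual update. This toy objective illustrates the technique only and is not the authors' formulation for protein complexes.

```python
def consensus_admm(local_values, rho=1.0, iterations=200):
    """Scaled-form consensus ADMM for: minimize sum_i (x_i - a_i)^2 / 2  s.t.  x_i = z."""
    n = len(local_values)
    u = [0.0] * n  # scaled dual variables, one per worker
    z = 0.0
    for _ in range(iterations):
        # Local subproblems: closed-form argmin of (x - a_i)^2/2 + rho/2 * (x - z + u_i)^2.
        # These n updates do not depend on each other and could be run in parallel.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(local_values, u)]
        # Global step: average the local estimates plus their scaled duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual update on the consensus constraint x_i = z.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z


print(round(consensus_admm([1.0, 2.0, 3.0, 4.0]), 6))  # 2.5
```

For this quadratic objective the consensus value converges to the mean of the local values. In the real method each subtask is a piece of the complex-detection objective rather than a single quadratic term.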
36
A Computational Framework for Predicting Direct Contacts and Substructures within Protein Complexes. Biomolecules 2019; 9:biom9110656. [PMID: 31717703 PMCID: PMC6921016 DOI: 10.3390/biom9110656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2019] [Revised: 10/20/2019] [Accepted: 10/23/2019] [Indexed: 11/17/2022]
Abstract
Understanding the physical arrangement of subunits within protein complexes potentially provides valuable clues about how the subunits work together and how the complexes function. Most recent research focuses on identifying protein complexes as a whole and seldom studies the inner structures within complexes. In this study, we propose a computational framework to predict direct contacts and substructures within protein complexes. In this framework, we first train a supervised l2-regularized logistic regression model to learn the patterns of direct and indirect interactions within complexes, from which physical subunit interaction networks are predicted. Then, to infer substructures within complexes, we apply a graph clustering method (i.e., maximum modularity clustering (MMC)) and a gene ontology (GO) semantic similarity-based functional clustering to partially and fully connected networks, respectively. Computational results show that the proposed framework achieves fairly good performance in cross-validation and independent tests for detecting direct contacts between subunits. Functional analyses further demonstrate the rationality of partitioning the subunits into substructures via the MMC algorithm and functional clustering.
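Maximum modularity clustering searches for the partition that maximizes Newman's modularity, Q = sum over communities c of [L_c / m - (d_c / 2m)^2]. Here m is the number of edges, L_c is the number of edges inside community c, and d_c is the total degree of c's nodes. The sketch below only scores a given partition of an unweighted graph without self-loops. The graph is a toy example with hypothetical node labels. The search over partitions that MMC performs is not shown.

```python
def modularity(edges, communities):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops
    communities: list of node sets that together cover every node
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for community in communities:
        internal = sum(1 for u, v in edges if u in community and v in community)
        total_degree = sum(degree.get(node, 0) for node in community)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


# Two triangles joined by a single bridge edge (toy subunit network).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(round(modularity(edges, [{0, 1, 2}, {3, 4, 5}]), 3))  # 0.357
```

Splitting the network at the bridge scores Q = 5/14. Putting every node into one community scores 0, so a modularity-maximizing search would prefer the two-substructure split.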
37
Mei S, Zhang K. Neglog: Homology-Based Negative Data Sampling Method for Genome-Scale Reconstruction of Human Protein-Protein Interaction Networks. Int J Mol Sci 2019; 20:ijms20205075. [PMID: 31614890 PMCID: PMC6829266 DOI: 10.3390/ijms20205075] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/27/2019] [Accepted: 10/11/2019] [Indexed: 12/11/2022]
Abstract
Rapid reconstruction of genome-scale protein-protein interaction (PPI) networks is instrumental in understanding cellular processes, disease pathogenesis and drug reactions. However, the lack of experimentally verified negative data (i.e., pairs of proteins that do not interact) is still a major issue that needs to be properly addressed in computational modeling. In this study, we take advantage of the very limited experimentally verified negative data from Negatome to infer more negative data for computational modeling. We assume that the paralogs or orthologs of two non-interacting proteins also do not interact with high probability. We term this assumption "Neglog"; it is to some extent supported by paralogous/orthologous structure conservation. To reduce the risk of bias toward the negative data from Negatome, we combine Neglog with less biased random sampling according to a certain ratio to construct training data. L2-regularized logistic regression is used as the base classifier to counteract noise and train on a large dataset. Computational results show that the proposed Neglog method outperforms the pure random sampling method, with sound biological interpretability. In addition, we find that independent testing on negative data, which is usually neglected by existing studies, is indispensable for bias control. Lastly, we use the Neglog method to validate the PPIs in STRING, and the validation results are supported by gene ontology (GO) enrichment analyses.
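The Neglog assumption lends itself to a short sketch: each known non-interacting pair is propagated to the pairs formed by the two proteins' homologs, and the result is then topped up with randomly sampled pairs. This is a hypothetical illustration, not the authors' code: the function names, the homolog dictionary, the toy protein IDs and the mixing rule (adding `ratio` times as many random pairs as inferred ones) are all assumptions, since the abstract does not state the ratio used.

```python
import random
from itertools import product
from typing import Dict, FrozenSet, Iterable, Sequence, Set, Tuple

Pair = FrozenSet[str]  # an unordered protein pair


def homolog_negatives(negatome: Iterable[Tuple[str, str]],
                      homologs: Dict[str, Set[str]],
                      positives: Set[Pair]) -> Set[Pair]:
    """Propagate each known non-interacting pair (a, b) to every pair formed by
    a or one of its homologs and b or one of its homologs, skipping self-pairs
    and pairs already known to interact."""
    inferred: Set[Pair] = set()
    for a, b in negatome:
        group_a = {a} | homologs.get(a, set())
        group_b = {b} | homologs.get(b, set())
        for u, v in product(group_a, group_b):
            pair = frozenset((u, v))
            if u != v and pair not in positives:
                inferred.add(pair)
    return inferred


def mix_negatives(inferred: Set[Pair], proteins: Sequence[str],
                  positives: Set[Pair], ratio: float = 1.0,
                  seed: int = 0) -> Set[Pair]:
    """Add about ratio * len(inferred) randomly sampled pairs that are neither
    known positives nor already selected (hypothetical mixing rule)."""
    rng = random.Random(seed)
    negatives = set(inferred)
    n_random = int(round(ratio * len(inferred)))
    added, attempts = 0, 0
    max_attempts = 100 * n_random + 100  # bounds the loop on small protein sets
    pool = list(proteins)
    while added < n_random and attempts < max_attempts:
        attempts += 1
        u, v = rng.sample(pool, 2)  # two distinct proteins
        pair = frozenset((u, v))
        if pair not in positives and pair not in negatives:
            negatives.add(pair)
            added += 1
    return negatives


if __name__ == "__main__":
    known_positive = {frozenset(("A1", "B1"))}
    neg = homolog_negatives([("A", "B")], {"A": {"A1"}, "B": {"B1"}}, known_positive)
    mixed = mix_negatives(neg, ["A", "A1", "B", "B1", "C", "D"], known_positive)
    print(sorted(tuple(sorted(p)) for p in mixed))
```

Note that the propagation step deliberately drops any inferred pair that is already a known interaction, so homology never overrides experimental positive evidence in this sketch.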
Affiliation(s)
- Suyu Mei
- Software College, Shenyang Normal University, Shenyang 110034, China.
- Kun Zhang
- Bioinformatics facility of Xavier NIH RCMI Cancer Research Center, Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA.
38
Liberti J, Görner J, Welch M, Dosselli R, Schiøtt M, Ogawa Y, Castleden I, Hemmi JM, Baer-Imhoof B, Boomsma JJ, Baer B. Seminal fluid compromises visual perception in honeybee queens reducing their survival during additional mating flights. eLife 2019; 8:45009. [PMID: 31500699 PMCID: PMC6739865 DOI: 10.7554/elife.45009] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/13/2019] [Accepted: 08/05/2019] [Indexed: 12/18/2022]
Abstract
Queens of social insects make all mate-choice decisions on a single day, except in honeybees, whose queens can conduct mating flights for several days even when already inseminated by a number of drones. Honeybees therefore appear to have a unique, evolutionarily derived form of sexual conflict: a queen's decision to pursue risky additional mating flights is driven by later-life fitness gains from genetically more diverse worker offspring but reduces the paternity shares of the drones she has already mated with. We used artificial insemination, RNA sequencing and electroretinography to show that seminal fluid induces a decline in queen vision by perturbing the phototransduction pathway within 24–48 hr. Follow-up field trials revealed that queens receiving seminal fluid flew two days earlier than sister queens inseminated with saline, and failed more often to return. These findings are consistent with seminal fluid components manipulating queen eyesight to reduce queen promiscuity across mating flights.
For social insects like honeybees, it is beneficial if their queens mate with many males, because genetic diversity can protect the hive against parasites. Early in life, a honeybee queen has a short period in which she can fly out to mate with males before returning to the hive with all the sperm she needs for a lifetime. Queens that have mated on their first flight may embark on additional mating flights over a few consecutive days to further increase genetic variability in their offspring. This is problematic for a male that has already mated, because the more males that inseminate the queen, the fewer offspring will carry his specific genes. This results in sexual conflict between males and queens over the number of mating flights. In many animals, males manipulate females using molecules in seminal fluid to reduce the chances of the female mating again, and honeybee males may use a similar strategy.
Previous studies revealed that insemination alters the activity of genes related to vision in a honeybee queen's brain. This could be one way for the males to prevent queens from embarking on additional mating flights. Now, Liberti et al. find support for this idea by showing that seminal fluid can indeed trigger changes in the activity of vision-related genes in the brains of honeybee queens, which in turn reduce a queen's opportunity to complete additional mating flights. Queens inseminated with seminal fluid were less responsive to light than queens exposed to saline instead. Electronic tracking devices affixed to queens showed that the seminal fluid-exposed queens left for mating flights sooner but were more likely to get lost and not return to their hives than the saline-exposed queens. The experiments support the idea of a sexual arms race in honeybees. Males use seminal fluid to cause rapidly deteriorating vision in queens, reducing their likelihood of leaving the hive to mate again and of finding males when they do fly again. The queens try to counteract these effects by leaving for mating flights sooner, thereby increasing offspring genetic diversity and the success of their colonies. Further studies will be needed to find out how the honeybee sexual arms race varies across seasons, bee races and geographic ranges. Such information will be useful for honeybee breeding programs, which rely on queen mating success and hive genetic diversity to ensure hive health.
Affiliation(s)
- Joanito Liberti
- Centre for Social Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Julia Görner
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Australia
- Mat Welch
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Australia
- Ryan Dosselli
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Australia; Centre for Evolutionary Biology, School of Biological Sciences, The University of Western Australia, Crawley, Australia
- Morten Schiøtt
- Centre for Social Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Yuri Ogawa
- School of Animal Biology and UWA Oceans Institute, The University of Western Australia, Crawley, Australia
- Ian Castleden
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Australia
- Jan M Hemmi
- School of Animal Biology and UWA Oceans Institute, The University of Western Australia, Crawley, Australia
- Barbara Baer-Imhoof
- Centre for Integrative Bee Research (CIBER), Department of Entomology, University of California, Riverside, Riverside, United States
- Jacobus J Boomsma
- Centre for Social Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Boris Baer
- Centre for Integrative Bee Research (CIBER), Department of Entomology, University of California, Riverside, Riverside, United States
39
A Multi-Label Learning Framework for Drug Repurposing. Pharmaceutics 2019; 11:pharmaceutics11090466. [PMID: 31505805 PMCID: PMC6781509 DOI: 10.3390/pharmaceutics11090466] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/27/2019] [Revised: 08/22/2019] [Accepted: 09/05/2019] [Indexed: 01/10/2023]
Abstract
Drug repurposing plays an important role in screening old drugs for new therapeutic efficacy. Existing methods commonly treat the prediction of drug-target interactions as a binary classification problem, which necessarily requires a large number of randomly sampled drug-target pairs, accounting for over 50% of the entire training dataset. Such a large number of negative examples that do not come from experimental observations inevitably decreases the credibility of predictions. In this study, we propose a multi-label learning framework to find new uses for old drugs and discover new drugs for known target genes. In the framework, each drug is treated as a class label and its target genes are treated as the class-specific training data to train a supervised learning model of l2-regularized logistic regression. As such, the inter-drug associations are explicitly modelled in the framework, and all the class-specific training data come from experimental observations. In addition, the data constraint is less demanding; for instance, the chemical substructures of a drug are no longer needed, and novel target genes are inferred only from the underlying patterns of the known genes targeted by the drug. Stratified multi-label cross-validation shows that 84.9% of known target genes have at least one drug correctly recognized, and the proposed framework correctly recognizes 86.73% of the independent test drug-target interactions (DTIs) from DrugBank. These results show that the proposed framework could generalize well in the large drug/class space without information on drug chemical structures or target protein structures. Furthermore, we use the trained model to predict new drugs for the known target genes, identify new genes for the old drugs, and infer new associations between old drugs and new disease phenotypes via the OMIM database.
Gene ontology (GO) enrichment analyses and the disease associations reported in recent literature provide supporting evidence for the computational results, which potentially shed light on new clinical therapies for new and/or old disease phenotypes.
40
Hinderer EW, Flight RM, Dubey R, MacLeod JN, Moseley HNB. Advances in gene ontology utilization improve statistical power of annotation enrichment. PLoS One 2019; 14:e0220728. [PMID: 31415589 PMCID: PMC6695228 DOI: 10.1371/journal.pone.0220728] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/01/2019] [Accepted: 07/22/2019] [Indexed: 01/08/2023]
Abstract
Gene-annotation enrichment is a common method for utilizing ontology-based annotations in gene- and gene-product-centric knowledgebases. Effective utilization of these annotations requires inferring semantic linkages by tracing paths through edges in the ontological graph, referred to as relations. However, some relations are semantically problematic with respect to scope, necessitating their omission or modification lest erroneous term mappings occur. To address these issues, we created the Gene Ontology Categorization Suite, or GOcats, a novel tool that organizes the Gene Ontology into subgraphs representing user-defined concepts, while ensuring that all appropriate relations are congruent with respect to scoping semantics. Here, we demonstrate the improvements in annotation enrichment obtained by re-interpreting edges that would otherwise be omitted by traditional ancestor path-tracing methods. Specifically, we show that GOcats' unique handling of relations improves enrichment over conventional methods in the analysis of two different gene-expression datasets: a breast cancer microarray dataset and several horse cartilage development RNAseq datasets. With the breast cancer microarray dataset, we observed significant improvement (one-sided binomial test p-value = 1.86E-25) in 182 of 217 significantly enriched GO terms identified by the conventional path traversal method when GOcats' path traversal was used. We also found new significantly enriched terms using GOcats, whose biological relevance has been experimentally demonstrated elsewhere. Likewise, on the horse RNAseq datasets, we observed a significant improvement in GO term enrichment when using GOcats' path traversal, with one-sided binomial test p-values ranging from 1.32E-03 to 2.58E-44.
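The conventional ancestor path-tracing that GOcats is compared against amounts to propagating each gene's annotations up the ontology graph through a chosen set of relation types. A minimal sketch of that baseline shows why the choice of followed relations changes which genes count toward a term. This is not GOcats' own algorithm, which additionally re-interprets relation scope; the toy ontology, term names and gene names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology: child term -> list of (parent term, relation type).
Edges = Dict[str, List[Tuple[str, str]]]


def ancestors(term: str, edges: Edges, allowed: Set[str]) -> Set[str]:
    """Return all terms reachable from `term` by following only edges whose
    relation type is in `allowed` (breadth-first search over the DAG)."""
    seen: Set[str] = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent, relation in edges.get(current, []):
            if relation in allowed and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def propagate(annotations: Dict[str, Set[str]], edges: Edges,
              allowed: Set[str]) -> Dict[str, Set[str]]:
    """Map each term to the genes annotated to it directly or to any descendant
    reachable through the allowed relation types."""
    term_to_genes: Dict[str, Set[str]] = {}
    for gene, terms in annotations.items():
        for t in terms:
            for target in {t} | ancestors(t, edges, allowed):
                term_to_genes.setdefault(target, set()).add(gene)
    return term_to_genes


if __name__ == "__main__":
    edges: Edges = {
        "T_child": [("T_mid", "is_a")],
        "T_mid": [("T_root", "part_of")],
        "T_other": [("T_mid", "regulates")],
    }
    annotations = {"g1": {"T_child"}, "g2": {"T_other"}}
    print(propagate(annotations, edges, {"is_a", "part_of"})["T_root"])
    print(propagate(annotations, edges, {"is_a", "part_of", "regulates"})["T_root"])
```

With only `is_a` and `part_of` followed, gene `g2` never reaches `T_root`; admitting the extra relation adds it, which in turn changes the gene counts that an enrichment test would see for that term.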
Affiliation(s)
- Eugene W. Hinderer
- Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, KY, United States of America
- Robert M. Flight
- Markey Cancer Center, University of Kentucky, Lexington, KY, United States of America
- Rashmi Dubey
- Maxwell H. Gluck Equine Research Center, University of Kentucky, Lexington, KY, United States of America
- Department of Veterinary Science, University of Kentucky, Lexington, KY, United States of America
- James N. MacLeod
- Maxwell H. Gluck Equine Research Center, University of Kentucky, Lexington, KY, United States of America
- Department of Veterinary Science, University of Kentucky, Lexington, KY, United States of America
- Hunter N. B. Moseley
- Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, KY, United States of America
- Markey Cancer Center, University of Kentucky, Lexington, KY, United States of America
- Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, United States of America
41
Cho DH, Park CI. RNA-seq data for olive flounder (Paralichthys olivaceus) according to water temperature. Data Brief 2019; 25:104384. [PMID: 31489357 PMCID: PMC6717212 DOI: 10.1016/j.dib.2019.104384] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/02/2019] [Revised: 08/02/2019] [Accepted: 08/02/2019] [Indexed: 11/24/2022]
Abstract
We provide raw data from a transcriptomic analysis of olive flounder in response to changes in water temperature. At the time of this analysis, the olive flounder genome was not yet available in China, and there were no related reference sequences. Therefore, the transcriptome was assembled de novo, reconstructing full nucleotide sequences from the sequence information of the sequenced reads. The functions of the expressed genes, based on Gene Ontology analysis, are also categorized and presented.
Affiliation(s)
- Dong-Hee Cho
- Institute of Marine Industry, College of Marine Science, Gyeongsang National University, 455, Tongyeong 650-160, Republic of Korea
- Chan-Il Park
- Institute of Marine Industry, College of Marine Science, Gyeongsang National University, 455, Tongyeong 650-160, Republic of Korea
42
Morris BJ, Willcox BJ, Donlon TA. Genetic and epigenetic regulation of human aging and longevity. Biochim Biophys Acta Mol Basis Dis 2019; 1865:1718-1744. [PMID: 31109447 PMCID: PMC7295568 DOI: 10.1016/j.bbadis.2018.08.039] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Received: 05/28/2018] [Revised: 08/02/2018] [Accepted: 08/28/2018] [Indexed: 02/06/2023]
Abstract
Here we summarize the latest data on genetic and epigenetic contributions to human aging and longevity. Whereas environmental and lifestyle factors are important at younger ages, the contribution of genetics appears more important in reaching extreme old age. Genome-wide studies have implicated ~57 gene loci in lifespan. Epigenomic changes during aging profoundly affect cellular function and stress resistance. Dysregulation of transcriptional and chromatin networks is likely a crucial component of aging. Large-scale bioinformatic analyses have revealed the involvement of numerous interaction networks. As the young, well-differentiated cell replicates into eventual senescence, there is drift in the highly regulated chromatin marks towards an entropic middle ground between repressed and active, such that genes that were previously inactive "leak". There is a breakdown in chromatin connectivity such that topologically associated domains and their insulators weaken, and well-defined blocks of constitutive heterochromatin give way to generalized senescence-associated heterochromatin foci. Together, these phenomena contribute to aging.
Affiliation(s)
- Brian J Morris
- Basic & Clinical Genomics Laboratory, School of Medical Sciences and Bosch Institute, University of Sydney, New South Wales 2006, Australia; Honolulu Heart Program (HHP)/Honolulu-Asia Aging Study (HAAS), Department of Research, Kuakini Medical Center, Honolulu, HI 96817, United States; Department of Geriatric Medicine, John A. Burns School of Medicine, University of Hawaii, Kuakini Medical Center Campus, Honolulu, HI 96813, United States.
- Bradley J Willcox
- Honolulu Heart Program (HHP)/Honolulu-Asia Aging Study (HAAS), Department of Research, Kuakini Medical Center, Honolulu, HI 96817, United States; Department of Geriatric Medicine, John A. Burns School of Medicine, University of Hawaii, Kuakini Medical Center Campus, Honolulu, HI 96813, United States.
- Timothy A Donlon
- Honolulu Heart Program (HHP)/Honolulu-Asia Aging Study (HAAS), Department of Research, Kuakini Medical Center, Honolulu, HI 96817, United States; Departments of Cell & Molecular Biology and Pathology, John A. Burns School of Medicine, University of Hawaii, Honolulu, HI 96813, United States.
43
Zhang F, Zhang Y, Lv X, Xu B, Zhang H, Yan J, Li H, Wu L. Evolution of an X-Linked miRNA Family Predominantly Expressed in Mammalian Male Germ Cells. Mol Biol Evol 2019; 36:663-678. [PMID: 30649414 PMCID: PMC6445303 DOI: 10.1093/molbev/msz001] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]
Abstract
MicroRNAs (miRNAs) are important posttranscriptional regulators of gene expression. However, comprehensive expression profiles of miRNAs during mammalian spermatogenesis are lacking. Herein, we sequenced small RNAs in highly purified mouse spermatogenic cells at different stages. We found that a family of X-linked miRNAs named spermatogenesis-related miRNAs (spermiRs) is predominantly expressed in the early meiotic phases and has a conserved testis-specific high expression pattern in different mammals. We identified one spermiR homolog in opossum; this homolog might originate from THER1, a retrotransposon that is active in marsupials but extinct in current placental mammals. SpermiRs have expanded rapidly with mammalian evolution and have diverged into two clades, spermiR-L and spermiR-R, which are likely to have been generated at least in part by tandem duplication mediated by flanking retrotransposable elements. Notably, despite having undergone highly frequent lineage-specific duplication events, the sequences encoding all spermiR family members are strictly located between two protein-coding genes, Slitrk2 and Fmr1. Moreover, spermiR-Ls and spermiR-Rs have evolved different expression patterns during spermatogenesis in different mammals. Intriguingly, the seed sequences of spermiRs, which are critical for the recognition of target genes, are highly divergent within and among mammals, whereas spermiR target genes largely overlap. When miR-741, the most highly expressed spermiR, is knocked out in cultured mouse spermatogonial stem cells (SSCs), another spermiR, miR-465a-5p, is dramatically upregulated and becomes the most abundant miRNA. Notably, miR-741−/− SSCs grow normally, and the genome-wide expression levels of mRNAs remain unchanged. All these observations indicate functional compensation between spermiR family members and strong coevolution between spermiRs and their targets.
Affiliation(s)
- Fengjuan Zhang
- State Key Laboratory of Molecular Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China; Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
- Ying Zhang
- State Key Laboratory of Molecular Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China; Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
- Xiaolong Lv
- State Key Laboratory of Molecular Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China; Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
- Beiying Xu
- State Key Laboratory of Molecular Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China; Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
- Hongdao Zhang
- State Key Laboratory of Molecular Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China; Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
- Jun Yan
- Institute of Neuroscience, State Key Laboratory of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Haipeng Li
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Ligang Wu
- State Key Laboratory of Molecular Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China; Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
44
Global Ubiquitome Profiling Revealed the Roles of Ubiquitinated Proteins in Metabolic Pathways of Tea Leaves in Responding to Drought Stress. Sci Rep 2019; 9:4286. [PMID: 30862833 PMCID: PMC6414630 DOI: 10.1038/s41598-019-41041-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 10/02/2018] [Accepted: 02/26/2019] [Indexed: 01/07/2023]
Abstract
Drought stress often affects the expression of genes and proteins in tea plants. However, the global profile of ubiquitinated (Kub) proteins in tea plants remains unexplored. Here, we profiled the ubiquitome of tea leaves under drought stress using antibody-based affinity enrichment coupled with LC-MS/MS analysis. In total, 1,409 lysine Kub sites in 781 proteins were identified, of which 14 sites in 12 proteins were up-regulated and 123 sites in 91 proteins were down-regulated under drought stress. The identified Kub proteins were mainly located in the cytosol (31%), chloroplast (27%) and nucleus (19%). Moreover, five conserved motifs (EKub, EXXXKub, KubD, KubE and KubA) were extracted. Several Kub sites in ubiquitin-mediated proteolysis-related proteins, including RGLG2, UBC36, UEV1D, RPN10 and PSMC2, might affect protein degradation and DNA repair. Many Kub proteins related to catechin biosynthesis, including PAL, CHS, CHI and F3H, were positively correlated with each other owing to their co-expression and co-localization. Furthermore, some Kub proteins involved in carbohydrate and amino acid metabolism, including FBPase, FBA and GAD1, might promote sucrose, fructose and GABA accumulation in tea leaves under drought stress. Our study provides a preliminary global profile of Kub proteins in metabolic pathways and an important resource for further study of the functions of Kub proteins in tea plants.
45
Chatterjee B, Thakur SS. Single-Run Mass Spectrometry Analysis Provides Deep Insight into E. coli Proteome. J Am Soc Mass Spectrom 2018; 29:2394-2401. [PMID: 30259409 DOI: 10.1007/s13361-018-2066-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/16/2018] [Revised: 09/02/2018] [Accepted: 09/06/2018] [Indexed: 06/08/2023]
Abstract
Single-run mass spectrometry has enabled the detection and quantification of E. coli proteins. A total of 2068 proteins, quantified by the intensity-based absolute quantification (iBAQ) procedure of Schwanhäusser et al. (Nature. 473, 337-342, 2011), were obtained with a single enzyme (trypsin), without pre-fractionation, from quadruplicate long liquid chromatography runs coupled with high-resolution linear trap quadrupole (LTQ)-Orbitrap Velos mass spectrometry. A single 12-h run covers almost 98% of the proteins identified by the quadruplicate LC-MS/MS runs of the E. coli proteome and is therefore almost equivalent to them. These quantified proteins represent about 52% of the total proteins encoded in the E. coli genome according to the UniProt database. The quantified proteins covered almost all of the proteins in folate biosynthesis. A remarkably large part of the Gene Ontology (GO) annotations (Barrell et al.: Nucleic Acids Res. 37, D396-D403, 2009; Ashburner et al.: Nat. Genet. 25, 25-29, 2000), signaling pathways and protein-protein interactions were covered. Protein-protein interaction networks for several important biological processes (cell cycle, DNA repair, ion transport, ubiquinone biosynthetic process, pseudouridine synthesis, peptidoglycan biosynthetic process, RNA processing and translation) were generated with the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (Jensen et al.: Nucleic Acids Res 37, D412-D416, 2009). Therefore, a 12-h liquid chromatography gradient is appropriate for reaching the saturation point of protein detection in a single LC-MS/MS run.
Affiliation(s)
- Bhaswati Chatterjee
- National Institute of Pharmaceutical Education and Research (NIPER), NIPER-Hyderabad, (Dept. of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Govt. of India), Balanagar, Hyderabad, Telangana, 500 037, India.
- Suman S Thakur
- Centre for Cellular and Molecular Biology, Proteomics and Cell Signaling, Lab E409, Uppal Road, Hyderabad, 500007, India.
46
Kim S, Wyckoff J, Morris AT, Succop A, Avery A, Duncan GE, Jazwinski SM. DNA methylation associated with healthy aging of elderly twins. GeroScience 2018; 40:469-484. [PMID: 30136078 PMCID: PMC6294724 DOI: 10.1007/s11357-018-0040-0] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 05/25/2018] [Accepted: 08/09/2018] [Indexed: 12/21/2022]
Abstract
Variation in healthy aging and lifespan is ascribed more to various non-genetic factors than to inherited genetic determinants, and a major goal in aging research is to reveal the epigenetic basis of aging. One approach to this goal is to find genomic sites or regions where DNA methylation correlates with biological age. Using health data from 134 elderly twins, we calculated a frailty index as a quantitative indicator of biological age, and by applying the Infinium HumanMethylation450K BeadChip technology to their leukocyte DNA samples, we obtained quantitative DNA methylation data on genome-wide CpG sites. We analyzed the health and epigenome data using two independent associative approaches: a parametric regression-based approach and a non-parametric machine learning approach followed by Gene Ontology (GO) analysis. Our results indicate that DNA methylation at CpG sites in the promoter region of PCDHGA3 is associated with biological age. PCDHGA3 belongs to the clustered protocadherin genes, which are all located in a single locus on human chromosome 5. Previous studies of the clustered protocadherin genes showed that (1) DNA methylation is associated with age or age-related phenotypes; (2) DNA methylation can modulate gene expression; (3) dysregulated gene expression is associated with various pathologies; and (4) DNA methylation patterns at this locus are associated with adverse lifetime experiences. All these observations suggest that DNA methylation at the clustered protocadherin genes, including PCDHGA3, is a key mediator of healthy aging.
Affiliation(s)
- Sangkyu Kim
- Tulane Center for Aging and Department of Medicine, Tulane University Health Sciences Center, New Orleans, LA, 70112, USA.
- Jennifer Wyckoff
- Tulane Center for Aging and Department of Medicine, Tulane University Health Sciences Center, New Orleans, LA, 70112, USA
- Anne-T Morris
- Virginia Commonwealth University, Mid-Atlantic Twin Registry, Richmond, VA, USA
- Ally Avery
- University of Washington Twin Registry, Seattle, WA, USA
- Washington State Twin Registry, Washington State University - Health Sciences Spokane, Spokane, WA, USA
- Glen E Duncan
- University of Washington Twin Registry, Seattle, WA, USA
- Washington State Twin Registry, Washington State University - Health Sciences Spokane, Spokane, WA, USA
- S Michal Jazwinski
- Tulane Center for Aging and Department of Medicine, Tulane University Health Sciences Center, New Orleans, LA, 70112, USA
47
Kim JH, Kim JO, Jeon CH, Nam UH, Subramaniyam S, Yoo SI, Park JH. Comparative transcriptome analyses of the third and fourth stage larvae of Anisakis simplex (Nematoda: Anisakidae). Mol Biochem Parasitol 2018; 226:24-33. [DOI: 10.1016/j.molbiopara.2018.10.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/08/2017] [Revised: 09/16/2018] [Accepted: 10/22/2018] [Indexed: 01/02/2023]
48
Venkatesan A, Tagny Ngompe G, Hassouni NE, Chentli I, Guignon V, Jonquet C, Ruiz M, Larmande P. Agronomic Linked Data (AgroLD): A knowledge-based system to enable integrative biology in agronomy. PLoS One 2018; 13:e0198270. [PMID: 30500839 PMCID: PMC6269127 DOI: 10.1371/journal.pone.0198270] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/14/2018] [Accepted: 09/03/2018] [Indexed: 12/22/2022]
Abstract
Recent advances in high-throughput technologies have resulted in a tremendous increase in the amount of omics data produced in plant science. This increase, together with the heterogeneity and variability of the data, makes adopting an integrative research approach a major challenge. There is an urgent need to effectively integrate and assimilate complementary datasets to understand the biological system as a whole. The Semantic Web offers technologies for integrating heterogeneous data and transforming them into explicit knowledge through ontologies. We have developed Agronomic Linked Data (AgroLD, www.agrold.org), a knowledge-based system that relies on Semantic Web technologies and exploits standard domain ontologies to integrate data about plant species of high interest to the plant science community, e.g., rice, wheat and Arabidopsis. We present some integration results of the project, which initially focused on genomics, proteomics and phenomics. AgroLD is now an RDF (Resource Description Framework) knowledge base of 100M triples, created by annotating and integrating more than 50 datasets from 10 data sources (such as Gramene.org and TropGeneDB) with 10 ontologies (such as the Gene Ontology and Plant Trait Ontology). Our evaluation results show that users appreciate the multiple query modes, which support different use cases. AgroLD's objective is to offer a domain-specific knowledge platform for solving complex biological and agronomical questions about the involvement of genes/proteins in, for instance, plant disease resistance or high-yield traits. We expect the resolution of these questions to facilitate the formulation of new scientific hypotheses to be validated with a knowledge-oriented approach.
Affiliation(s)
- Aravind Venkatesan
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
- Gildas Tagny Ngompe
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
- Nordine El Hassouni
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- UMR AGAP, CIRAD, Montpellier, France
- South Green Bioinformatics Platform, Montpellier, France
- Imene Chentli
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
- Valentin Guignon
- South Green Bioinformatics Platform, Montpellier, France
- Bioversity International, Montpellier, France
- Clement Jonquet
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
- Manuel Ruiz
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- UMR AGAP, CIRAD, Montpellier, France
- South Green Bioinformatics Platform, Montpellier, France
- AGAP, Univ. of Montpellier, CIRAD, INRA, INRIA, SupAgro, Montpellier, France
- Pierre Larmande
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
- South Green Bioinformatics Platform, Montpellier, France
- DIADE, IRD, Univ. of Montpellier, Montpellier, France
49
Warwick Vesztrocy A, Dessimoz C, Redestig H. Prioritising candidate genes causing QTL using hierarchical orthologous groups. Bioinformatics 2018; 34:i612-i619. [PMID: 30423067 PMCID: PMC6129274 DOI: 10.1093/bioinformatics/bty615] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
Motivation: A key goal in plant biotechnology applications is the identification of genes associated with particular phenotypic traits (for example: yield, fruit size, root length). Quantitative Trait Loci (QTL) studies identify genomic regions associated with a trait of interest. However, to infer potential causal genes in these regions, each of which can contain hundreds of genes, these data are usually intersected with prior functional knowledge of the genes. This process is laborious, particularly if the experiment is performed in a non-model species, and the statistical significance of the inferred candidates is typically unknown.
Results: This paper introduces QTLSearch, a method and software tool to search for candidate causal genes in QTL studies by combining Gene Ontology annotations across many species, leveraging hierarchical orthologous groups. The usefulness of this approach is demonstrated by re-analysing two metabolic QTL studies: one in Arabidopsis thaliana, the other in Oryza sativa subsp. indica. Even after controlling for statistical significance, QTLSearch inferred potential causal genes for more QTL than BLAST-based functional propagation against UniProtKB/Swiss-Prot, and for more QTL than the original studies did.
Availability and implementation: QTLSearch is distributed under the LGPLv3 license. It is available to install from the Python Package Index (as qtlsearch), with the source available from https://bitbucket.org/alex-warwickvesztrocy/qtlsearch.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alex Warwick Vesztrocy
- Department of Genetics, Evolution and Environment, University College London, London, UK
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Christophe Dessimoz
- Department of Genetics, Evolution and Environment, University College London, London, UK
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Department of Computer Science, University College London, London, UK
- Centre for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
50
Thapa K, Wu KC, Sarma A, Grund EM, Szeto A, Mendez AJ, Gesta S, Vishnudas VK, Narain NR, Sarangarajan R. Dysregulation of the calcium handling protein, CCDC47, is associated with diabetic cardiomyopathy. Cell Biosci 2018; 8:45. [PMID: 30140426 PMCID: PMC6098598 DOI: 10.1186/s13578-018-0244-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/06/2018] [Accepted: 08/11/2018] [Indexed: 11/10/2022]
Abstract
Background: Diabetes mellitus is associated with an increased risk of diabetic cardiomyopathy (DCM), a risk that cannot be attributed to comorbid vascular diseases. Although dysregulated calcium handling is a key hallmark of cardiomyopathy, studies to date have been inconsistent regarding the types of alterations involved. In this study, human cardiomyocytes were exposed to an environmental nutritional perturbation of high glucose, fatty acids and l-carnitine to model DCM, and iTRAQ-coupled LC–MS/MS proteomic analysis was used to capture proteins affected by the perturbation. The captured proteins were then compared with proteins currently annotated in the cardiovascular disease (CVD) Gene Ontology (GO) database to identify proteins not previously described as being related to CVD. Subsequently, GO analysis for calcium-regulating proteins and endoplasmic/sarcoplasmic reticulum (ER/SR)-associated proteins was carried out.
Results: Here, we identified CCDC47 (calumin) as a unique calcium-regulating protein altered in our in vitro nutritional perturbation model. The cellular and functional role of CCDC47 was then assessed in rat cardiomyocytes. In rat H9C2 myocytes, overexpression of CCDC47 increased ionomycin-induced calcium release and reuptake. Of interest, in a diet-induced obese (DIO) rat model of DCM, CCDC47 mRNA expression was increased in both the atrium and the ventricle of the heart, but CCDC47 protein expression was significantly increased only in the atrium of DIO rats compared with lean control rats. Notably, no changes in ANP, BNP or β-MHC were observed between DIO rats and lean control rats.
Conclusions: Together, our in vitro and in vivo studies demonstrate that CCDC47 is a unique calcium-regulating protein that is associated with early-onset hypertrophic cardiomyopathy.
Electronic supplementary material: The online version of this article (10.1186/s13578-018-0244-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Khampaseuth Thapa
- Berg, LLC, 500 Old Connecticut Path, Bldg B (3rd Floor), Framingham, MA 01701 USA
- Kai Connie Wu
- Berg, LLC, 500 Old Connecticut Path, Bldg B (3rd Floor), Framingham, MA 01701 USA
- Aishwarya Sarma
- Berg, LLC, 500 Old Connecticut Path, Bldg B (3rd Floor), Framingham, MA 01701 USA
- Eric M Grund
- Berg, LLC, 500 Old Connecticut Path, Bldg B (3rd Floor), Framingham, MA 01701 USA
- Angela Szeto
- Diabetes Research Institute, University of Miami Miller School of Medicine, Miami, FL 33136 USA
- Armando J Mendez
- Diabetes Research Institute, University of Miami Miller School of Medicine, Miami, FL 33136 USA
- Stephane Gesta
- Berg, LLC, 500 Old Connecticut Path, Bldg B (3rd Floor), Framingham, MA 01701 USA
- Vivek K Vishnudas
- Berg, LLC, 500 Old Connecticut Path, Bldg B (3rd Floor), Framingham, MA 01701 USA
- Niven R Narain
- Berg, LLC, 500 Old Connecticut Path, Bldg B (3rd Floor), Framingham, MA 01701 USA