1
Zhang ZM, Zhao JP, Wei PJ, Zheng CH. iPromoter-CLA: Identifying promoters and their strength by deep capsule networks with bidirectional long short-term memory. Computer Methods and Programs in Biomedicine 2022; 226:107087. [PMID: 36099675 DOI: 10.1016/j.cmpb.2022.107087] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/24/2022] [Revised: 05/14/2022] [Accepted: 08/23/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVE: A promoter is a DNA fragment with a specific sequence that regulates transcription. Promoters are located upstream of the transcription start site and initiate downstream gene expression. So far, promoter identification has mainly relied on biological experiments, which are labor-intensive. Computational methods have become a more efficient way to identify and classify promoters. METHODS: In this study, we proposed a new hybrid model combining a capsule network and a recurrent neural network to identify promoters and predict their strength. First, we used one-hot encoding to represent the DNA sequence. Second, we used three one-dimensional convolutional layers, a one-dimensional convolutional capsule layer and a digit capsule layer to learn local features. Third, a bidirectional long short-term memory network was used to extract global features. Finally, we adopted a self-attention mechanism to increase the contribution of relatively important features, which further enhances the performance of the model. RESULTS: Our model attains cross-validation accuracies of 86% and 73.46% in prokaryotic promoter recognition and promoter strength prediction, respectively, outperforming existing approaches in both the first-layer promoter identification and the second-layer promoter strength prediction. CONCLUSIONS: Our model not only combines a convolutional neural network with capsule layers but also uses a self-attention mechanism to better capture hidden sequence features. Thus, we hope that our model can be widely applied to other components.
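The one-hot encoding step mentioned in the abstract can be illustrated with a minimal sketch. The alphabet order (A, C, G, T), the all-zero row for unknown bases, and the helper name `one_hot_encode` are assumptions made here for illustration; the abstract does not specify them.

```python
# Minimal sketch of one-hot encoding a DNA sequence.
# ALPHABET order and the handling of unknown bases are assumptions,
# not details taken from the paper.
ALPHABET = "ACGT"

def one_hot_encode(sequence):
    """Return one 4-element row per nucleotide; unknown bases become all zeros."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows

encoded = one_hot_encode("ACGTN")
```

Each row has exactly one non-zero entry for a known base, so a sequence of length L becomes an L x 4 matrix that a one-dimensional convolutional layer could consume.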
Affiliation(s)
- Zhi-Min Zhang
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Jian-Ping Zhao
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Pi-Jing Wei
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, China
- Chun-Hou Zheng
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China; School of Artificial Intelligence, Anhui University, Hefei, China
2
Yu L, Ju B, Ren S. HLGNN-MDA: Heuristic Learning Based on Graph Neural Networks for miRNA-Disease Association Prediction. Int J Mol Sci 2022; 23:13155. [PMID: 36361945 PMCID: PMC9657597 DOI: 10.3390/ijms232113155] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/16/2022] [Revised: 10/23/2022] [Accepted: 10/26/2022] [Indexed: 01/12/2024] Open
Abstract
Identifying disease-related miRNAs can improve the understanding of complex diseases. However, experimentally finding the association between miRNAs and diseases is expensive in terms of time and resources. The computational screening of reliable miRNA-disease associations has thus become a necessary tool to guide biological experiments. "Similar miRNAs will be associated with the same disease" is the assumption on which most current miRNA-disease association prediction methods rely; however, biased prior knowledge, and incomplete and inaccurate miRNA similarity data and disease similarity data limit the performance of the model. Here, we propose heuristic learning based on graph neural networks to predict microRNA-disease associations (HLGNN-MDA). We learn the local graph topology features of the predicted miRNA-disease node pairs using graph neural networks. In particular, our improvements to the graph convolution layer of the graph neural network enable it to learn information among homogeneous nodes and among heterogeneous nodes. We illustrate the performance of HLGNN-MDA by performing tenfold cross-validation against excellent baseline models. The results show that we have promising performance in multiple metrics. We also focus on the role of the improvements to the graph convolution layer in the model. The case studies are supported by evidence on breast cancer, hepatocellular carcinoma and renal cell carcinoma. Given the above, the experiments demonstrate that HLGNN-MDA can serve as a reliable method to identify novel miRNA-disease associations.
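The tenfold cross-validation protocol mentioned in the abstract can be sketched as follows. This is a generic illustration only: the shuffling seed, the round-robin fold assignment, and the function name `k_fold_indices` are assumptions, and the paper's actual sampling of miRNA-disease pairs may differ.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]     # round-robin assignment to k folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(25, k=10))
```

Every sample appears in exactly one test fold, so each known association is held out once while the model is trained on the remaining folds.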
Affiliation(s)
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi’an 710071, China
3
Dou L, Zhou W, Zhang L, Xu L, Han K. Accurate identification of RNA D modification using multiple features. RNA Biol 2021; 18:2236-2246. [PMID: 33729104 PMCID: PMC8632091 DOI: 10.1080/15476286.2021.1898160] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/21/2020] [Revised: 02/13/2021] [Accepted: 02/23/2021] [Indexed: 10/21/2022] Open
Abstract
As one of the common post-transcriptional modifications in tRNAs, dihydrouridine (D) has prominent effects on regulating the flexibility of tRNA as well as on cancerous diseases. Because sequencing techniques for detecting D modification are expensive and time-consuming, precise computational tools can greatly advance research on molecular mechanisms and medical development. We proposed a novel predictor, called iRNAD_XGBoost, to identify potential D sites using multiple RNA sequence representations. In this method, the class imbalance problem is addressed with the hybrid sampling method SMOTEENN, and the top 30 features selected by XGBoost are used to construct the model. The optimized model showed high Sn and Sp values of 97.13% and 97.38%, respectively, in the jackknife test. In the independent test, these two metrics reached 91.67% and 94.74%, respectively. Compared with the iRNAD method, this model showed high generalizability and consistent prediction efficiency for positive and negative samples, yielding satisfactory MCC scores of 0.94 and 0.86, respectively. It is inferred that the chemical property and nucleotide density features (CPND), electron-ion interaction pseudopotential (EIIP and PseEIIP) as well as dinucleotide composition (DNC) are crucial to the recognition of D modification. The proposed predictor is a promising tool to help experimental biologists investigate molecular functions.
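The Sn, Sp and MCC values quoted in the abstract are standard binary-classification metrics. A minimal sketch of how they are computed from a confusion matrix is given below; the counts used in the example are hypothetical and are not taken from the paper.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity: recall on positives
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity: recall on negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc

# Hypothetical counts, for illustration only.
sn, sp, mcc = binary_metrics(tp=90, fn=10, tn=80, fp=20)
```

Unlike Sn or Sp alone, MCC uses all four counts, which is why it is often reported when positive and negative classes are imbalanced.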
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, Guangdong, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, Guangdong, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, Guangdong, China
- Ke Han
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, Heilongjiang, China
4
Malenica N, Dunić JA, Vukadinović L, Cesar V, Šimić D. Genetic Approaches to Enhance Multiple Stress Tolerance in Maize. Genes (Basel) 2021; 12:1760. [PMID: 34828366 PMCID: PMC8617808 DOI: 10.3390/genes12111760] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/11/2021] [Revised: 10/27/2021] [Accepted: 11/03/2021] [Indexed: 12/29/2022] Open
Abstract
The effects of multiple stresses on plant physiology and gene expression have been intensively studied lately, primarily in model plants such as Arabidopsis, where the simultaneous effects of six stressors have been documented. In maize, double and triple stress responses are receiving more attention, such as simultaneous drought and heat or heavy metal exposure, or drought in combination with insect and fungal infestation. To keep up with these challenges, maize natural variation and genetic engineering are exploited. On one hand, quantitative trait loci (QTL) associated with multiple-stress tolerance are being identified by molecular breeding and genome-wide association studies (GWAS), which could then be used in future breeding programs for more resilient maize varieties. On the other hand, transgenic approaches in maize have already resulted in the creation of many commercial double or triple stress resistant varieties, predominantly weed-tolerant/insect-resistant and, additionally, also drought-resistant varieties. It is expected that first generation gene-editing techniques, as well as recently developed base and prime editing applications, in combination with the routine haploid induction in maize, will pave the way to pyramiding more stress tolerant alleles in elite lines/varieties on time.
Affiliation(s)
- Nenad Malenica
- Division of Molecular Biology, Faculty of Science, University of Zagreb, Horvatovac 102a, 10000 Zagreb, Croatia
- Jasenka Antunović Dunić
- Department of Biology, Josip Juraj Strossmayer University, Cara Hadrijana 8/A, 31000 Osijek, Croatia
- Lovro Vukadinović
- Agricultural Institute Osijek, Južno Predgrađe 17, 31000 Osijek, Croatia
- Vera Cesar
- Department of Biology, Josip Juraj Strossmayer University, Cara Hadrijana 8/A, 31000 Osijek, Croatia
- Faculty of Dental Medicine and Health, Josip Juraj Strossmayer University of Osijek, Crkvena 21, 31000 Osijek, Croatia
- Domagoj Šimić
- Agricultural Institute Osijek, Južno Predgrađe 17, 31000 Osijek, Croatia
- Centre of Excellence for Biodiversity and Molecular Plant Breeding (CroP-BioDiv), Svetošimunska 25, 10000 Zagreb, Croatia
- Correspondence: Tel.: +385-31-515-521
5
Liu T, Chen J, Zhang Q, Hippe K, Hunt C, Le T, Cao R, Tang H. The Development of Machine Learning Methods in discriminating Secretory Proteins of Malaria Parasite. Curr Med Chem 2021; 29:807-821. [PMID: 34636289 DOI: 10.2174/0929867328666211005140625] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2021] [Revised: 07/28/2021] [Accepted: 08/15/2021] [Indexed: 11/22/2022]
Abstract
Malaria caused by Plasmodium falciparum is one of the major infectious diseases in the world. It is essential to exploit an effective method to predict secretory proteins of malaria parasites to develop effective cures and treatment. Biochemical assays can provide details for accurate identification of the secretory proteins, but these methods are expensive and time-consuming. In this paper, we summarized the machine learning-based identification algorithms and compared the construction strategies between different computational methods. Also, we discussed the use of machine learning to improve the ability of algorithms to identify proteins secreted by malaria parasites.
Affiliation(s)
- Ting Liu
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Jiamao Chen
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Qian Zhang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Kyle Hippe
- Department of Computer Science, Pacific Lutheran University, United States
- Cassandra Hunt
- Department of Computer Science, Pacific Lutheran University, United States
- Thu Le
- Department of Computer Science, Pacific Lutheran University, United States
- Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, United States
- Hua Tang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
6
Min X, Lu F, Li C. Sequence-Based Deep Learning Frameworks on Enhancer-Promoter Interactions Prediction. Curr Pharm Des 2021; 27:1847-1855. [PMID: 33234095 DOI: 10.2174/1381612826666201124112710] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/15/2020] [Revised: 07/29/2020] [Accepted: 08/06/2020] [Indexed: 11/22/2022]
Abstract
Enhancer-promoter interactions (EPIs) in the human genome are of great significance to transcriptional regulation, which tightly controls gene expression. Identification of EPIs can help us better decipher gene regulation and understand disease mechanisms. However, experimental methods to identify EPIs are constrained by funds, time, and manpower, while computational methods using DNA sequences and genomic features are viable alternatives. Deep learning methods have shown promising prospects in classification and have been used to identify EPIs. In this survey, we specifically focus on sequence-based deep learning methods and conduct a comprehensive review of the literature. First, we briefly introduce existing sequence-based frameworks for EPI prediction and their technical details. After that, we elaborate on the datasets, pre-processing methods, and evaluation strategies. Finally, we conclude with the challenges these methods face and suggest several future opportunities. We hope this review will provide a useful reference for further studies on enhancer-promoter interactions.
Affiliation(s)
- Xiaoping Min
- School of Informatics, Xiamen University, Xiamen 361005, China
- Fengqing Lu
- School of Informatics, Xiamen University, Xiamen 361005, China
- Chunyan Li
- Graduate School, Yunnan Minzu University, Kunming 650504, China
7
Zenda T, Liu S, Dong A, Duan H. Advances in Cereal Crop Genomics for Resilience under Climate Change. Life (Basel) 2021; 11:502. [PMID: 34072447 PMCID: PMC8228855 DOI: 10.3390/life11060502] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/12/2021] [Revised: 05/21/2021] [Accepted: 05/25/2021] [Indexed: 12/12/2022] Open
Abstract
Adapting to climate change, providing sufficient human food and nutritional needs, and securing sufficient energy supplies will call for a radical transformation from the current conventional adaptation approaches to more broad-based and transformative alternatives. This entails diversifying the agricultural system and boosting productivity of major cereal crops through development of climate-resilient cultivars that can sustainably maintain higher yields under climate change conditions, expanding our focus to crop wild relatives, and better exploitation of underutilized crop species. This is facilitated by the recent developments in plant genomics, such as advances in genome sequencing, assembly, and annotation, as well as gene editing technologies, which have increased the availability of high-quality reference genomes for various model and non-model plant species. This has necessitated genomics-assisted breeding of crops, including underutilized species, consequently broadening genetic variation of the available germplasm; improving the discovery of novel alleles controlling important agronomic traits; and enhancing creation of new crop cultivars with improved tolerance to biotic and abiotic stresses and superior nutritive quality. Here, therefore, we summarize these recent developments in plant genomics and their application, with particular reference to cereal crops (including underutilized species). Particularly, we discuss genome sequencing approaches, quantitative trait loci (QTL) mapping and genome-wide association (GWAS) studies, directed mutagenesis, plant non-coding RNAs, precise gene editing technologies such as CRISPR-Cas9, and complementation of crop genotyping by crop phenotyping. We then conclude by providing an outlook that, as we step into the future, high-throughput phenotyping, pan-genomics, transposable elements analysis, and machine learning hold much promise for crop improvements related to climate resilience and nutritional superiority.
Affiliation(s)
- Tinashe Zenda
- State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071001, China
- North China Key Laboratory for Crop Germplasm Resources of the Education Ministry, Hebei Agricultural University, Baoding 071001, China
- Department of Crop Genetics and Breeding, College of Agronomy, Hebei Agricultural University, Baoding 071001, China
- Department of Crop Science, Faculty of Agriculture and Environmental Science, Bindura University of Science Education, Bindura P. Bag 1020, Zimbabwe
- Songtao Liu
- State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071001, China
- North China Key Laboratory for Crop Germplasm Resources of the Education Ministry, Hebei Agricultural University, Baoding 071001, China
- Department of Crop Genetics and Breeding, College of Agronomy, Hebei Agricultural University, Baoding 071001, China
- Anyi Dong
- State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071001, China
- North China Key Laboratory for Crop Germplasm Resources of the Education Ministry, Hebei Agricultural University, Baoding 071001, China
- Department of Crop Genetics and Breeding, College of Agronomy, Hebei Agricultural University, Baoding 071001, China
- Huijun Duan
- State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071001, China
- North China Key Laboratory for Crop Germplasm Resources of the Education Ministry, Hebei Agricultural University, Baoding 071001, China
- Department of Crop Genetics and Breeding, College of Agronomy, Hebei Agricultural University, Baoding 071001, China
8
Mores A, Borrelli GM, Laidò G, Petruzzino G, Pecchioni N, Amoroso LGM, Desiderio F, Mazzucotelli E, Mastrangelo AM, Marone D. Genomic Approaches to Identify Molecular Bases of Crop Resistance to Diseases and to Develop Future Breeding Strategies. Int J Mol Sci 2021; 22:5423. [PMID: 34063853 PMCID: PMC8196592 DOI: 10.3390/ijms22115423] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/24/2021] [Revised: 04/30/2021] [Accepted: 05/15/2021] [Indexed: 12/16/2022] Open
Abstract
Plant diseases are responsible for substantial crop losses each year and affect food security and agricultural sustainability. The improvement of crop resistance to pathogens through breeding represents an environmentally sound method for managing disease and minimizing these losses. The challenge is to breed varieties with a stable and broad-spectrum resistance. Different approaches, from markers to recent genomic and 'post-genomic era' technologies, will be reviewed in order to contribute to a better understanding of the complexity of host-pathogen interactions and genes, including those with small phenotypic effects and mechanisms that underlie resistance. An efficient combination of these approaches is herein proposed as the basis to develop a successful breeding strategy to obtain resistant crop varieties that yield higher in increasing disease scenarios.
Affiliation(s)
- Antonia Mores
- Council for Agricultural Research and Economics, Research Centre for Cereal and Industrial Crops, S.S. 673, Km 25,200, 71122 Foggia, Italy
- Grazia Maria Borrelli
- Council for Agricultural Research and Economics, Research Centre for Cereal and Industrial Crops, S.S. 673, Km 25,200, 71122 Foggia, Italy
- Giovanni Laidò
- Council for Agricultural Research and Economics, Research Centre for Cereal and Industrial Crops, S.S. 673, Km 25,200, 71122 Foggia, Italy
- Giuseppe Petruzzino
- Council for Agricultural Research and Economics, Research Centre for Cereal and Industrial Crops, S.S. 673, Km 25,200, 71122 Foggia, Italy
- Nicola Pecchioni
- Council for Agricultural Research and Economics, Research Centre for Cereal and Industrial Crops, S.S. 673, Km 25,200, 71122 Foggia, Italy
- Francesca Desiderio
- Council for Agricultural Research and Economics, Genomics and Bioinformatics Research Center, Via San Protaso 302, 29017 Fiorenzuola d’Arda, Italy
- Elisabetta Mazzucotelli
- Council for Agricultural Research and Economics, Genomics and Bioinformatics Research Center, Via San Protaso 302, 29017 Fiorenzuola d’Arda, Italy
- Anna Maria Mastrangelo
- Council for Agricultural Research and Economics, Research Centre for Cereal and Industrial Crops, S.S. 673, Km 25,200, 71122 Foggia, Italy
- Daniela Marone
- Council for Agricultural Research and Economics, Research Centre for Cereal and Industrial Crops, S.S. 673, Km 25,200, 71122 Foggia, Italy
9
Shang Y, Gao L, Zou Q, Yu L. Prediction of drug-target interactions based on multi-layer network representation learning. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.068] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 02/06/2023]
10
iPTT(2L)-CNN: A Two-Layer Predictor for Identifying Promoters and Their Types in Plant Genomes by Convolutional Neural Network. Computational and Mathematical Methods in Medicine 2021; 2021:6636350. [PMID: 33488763 PMCID: PMC7803414 DOI: 10.1155/2021/6636350] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/24/2020] [Revised: 12/13/2020] [Accepted: 12/16/2020] [Indexed: 11/18/2022]
Abstract
A promoter is a short DNA sequence near the start codon, responsible for initiating transcription of a specific gene in the genome. The accurate recognition of promoters is of great significance for a better understanding of transcriptional regulation. Because of their importance in biological transcriptional regulation, there is an urgent need to develop in silico tools to identify promoters and their types quickly and accurately. A number of prediction methods have been developed in this regard; however, almost all of them were used only for identifying promoters and their strength or sigma types. Because the TATA box region in TATA promoters influences posttranscriptional processes, in the current study, we developed a two-layer predictor called iPTT(2L)-CNN by using a convolutional neural network (CNN) for identifying TATA and TATA-less promoters. The first layer identifies whether a given DNA sequence is a promoter or a nonpromoter. The second layer identifies whether a recognized promoter is a TATA promoter or not. The 5-fold cross-validation and independent testing results demonstrate that the constructed predictor is promising for identifying promoters and classifying TATA and TATA-less promoters. Furthermore, to make it easier for most experimental scientists to get the results they need, a user-friendly web server has been established at http://www.jci-bioinfo.cn/iPPT(2L)-CNN.
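The two-layer decision logic described in the abstract can be sketched as a simple cascade. The two classifiers below are toy placeholders (a length check and a motif search) standing in for the trained CNNs; they, and the function names, are assumptions for illustration and do not reflect how the published predictor works internally.

```python
def two_layer_predict(sequence, is_promoter, is_tata):
    """Layer 1 decides promoter vs. non-promoter; layer 2 runs only on promoters."""
    if not is_promoter(sequence):
        return "non-promoter"
    return "TATA promoter" if is_tata(sequence) else "TATA-less promoter"

# Toy stand-ins for the two trained models (hypothetical rules).
def toy_is_promoter(sequence):
    return len(sequence) >= 10

def toy_is_tata(sequence):
    return "TATAAA" in sequence.upper()

label = two_layer_predict("GGGTATAAAGGGCC", toy_is_promoter, toy_is_tata)
```

Passing the classifiers as arguments keeps the cascade independent of the model type, so either layer can be swapped without changing the control flow.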
11
Wang J, Wu B, Kohnen MV, Lin D, Yang C, Wang X, Qiang A, Liu W, Kang J, Li H, Shen J, Yao T, Su J, Li B, Gu L. Classification of Rice Yield Using UAV-Based Hyperspectral Imagery and Lodging Feature. Plant Phenomics (Washington, D.C.) 2021; 2021:9765952. [PMID: 33851136 PMCID: PMC8028843 DOI: 10.34133/2021/9765952] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/07/2020] [Accepted: 03/10/2021] [Indexed: 05/09/2023]
Abstract
High-yield rice cultivation is an effective way to address the increasing food demand worldwide. Correct classification of high-yield rice is a key step of breeding. However, manual measurements within breeding programs are time consuming and have high cost and low throughput, which limit the application in large-scale field phenotyping. In this study, we developed an accurate large-scale approach and presented the potential usage of hyperspectral data for rice yield measurement using the XGBoost algorithm to speed up the rice breeding process for many breeders. In total, 13 japonica rice lines in regional trials in northern China were divided into different categories according to the manual measurement of yield. Using an Unmanned Aerial Vehicle (UAV) platform equipped with a hyperspectral camera to capture images over multiple time series, a rice yield classification model based on the XGBoost algorithm was proposed. Four comparison experiments were carried out through the intraline test and the interline test considering lodging characteristics at the midmature stage or not. The result revealed that the degree of lodging in the midmature stage was an important feature affecting the classification accuracy of rice. Thus, we developed a low-cost, high-throughput phenotyping and nondestructive method by combining UAV-based hyperspectral measurements and machine learning for estimation of rice yield to improve rice breeding efficiency.
Affiliation(s)
- Jian Wang
- Institute of Crop Sciences, Ningxia Academy of Agriculture and Forestry Science, Yinchuan, Ningxia 750105, China
- Bizhi Wu
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- State Key Laboratory of Marine Environmental Science, Xiamen University, China
- Markus V. Kohnen
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Daqi Lin
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Changcai Yang
- Digital Fujian Institute of Big Data for Agriculture and Forestry, Key Laboratory of Smart Agriculture and Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Xiaowei Wang
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Ailing Qiang
- Institute of Crop Sciences, Ningxia Academy of Agriculture and Forestry Science, Yinchuan, Ningxia 750105, China
- Wei Liu
- Institute of Crop Sciences, Ningxia Academy of Agriculture and Forestry Science, Yinchuan, Ningxia 750105, China
- Jianbin Kang
- Seed Workstations of the Ningxia Hui Autonomous Region, Yinchuan, Ningxia 750004, China
- Hua Li
- Digital Fujian Institute of Big Data for Agriculture and Forestry, Key Laboratory of Smart Agriculture and Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Jing Shen
- Seed Workstations of the Ningxia Hui Autonomous Region, Yinchuan, Ningxia 750004, China
- Tianhao Yao
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Jun Su
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Bangyu Li
- Aerospace Information Research Center, Institute of Automation, Chinese Academic Science, Beijing 100190, China
- Lianfeng Gu
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, China
12
Zhu Y, Li F, Xiang D, Akutsu T, Song J, Jia C. Computational identification of eukaryotic promoters based on cascaded deep capsule neural networks. Brief Bioinform 2020; 22:5998831. [PMID: 33227813 DOI: 10.1093/bib/bbaa299] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 08/24/2020] [Revised: 10/01/2020] [Accepted: 10/07/2020] [Indexed: 12/26/2022] Open
Abstract
A promoter is a region in the DNA sequence that defines where the transcription of a gene by RNA polymerase initiates, which is typically located proximal to the transcription start site (TSS). How to correctly identify the gene TSS and the core promoter is essential for our understanding of the transcriptional regulation of genes. As a complement to conventional experimental methods, computational techniques with easy-to-use platforms as essential bioinformatics tools can be effectively applied to annotate the functions and physiological roles of promoters. In this work, we propose a deep learning-based method termed Depicter (Deep learning for predicting promoter), for identifying three specific types of promoters, i.e. promoter sequences with the TATA-box (TATA model), promoter sequences without the TATA-box (non-TATA model), and indistinguishable promoters (TATA and non-TATA model). Depicter is developed based on an up-to-date, species-specific dataset which includes Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana promoters. A convolutional neural network coupled with capsule layers is proposed to train and optimize the prediction model of Depicter. Extensive benchmarking and independent tests demonstrate that Depicter achieves an improved predictive performance compared with several state-of-the-art methods. The webserver of Depicter is implemented and freely accessible at https://depicter.erc.monash.edu/.
Affiliation(s)
- Yan Zhu
- School of Science, Dalian Maritime University, China
- Fuyi Li
- Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
- Cangzhi Jia
- College of Science, Dalian Maritime University
13
Sequence based prediction of pattern recognition receptors by using feature selection technique. Int J Biol Macromol 2020; 162:931-934. [DOI: 10.1016/j.ijbiomac.2020.06.234] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/22/2020] [Revised: 06/23/2020] [Accepted: 06/24/2020] [Indexed: 01/04/2023]
14
Min X, Ye C, Liu X, Zeng X. Predicting enhancer-promoter interactions by deep learning and matching heuristic. Brief Bioinform 2020; 22:5937174. [PMID: 33096548 DOI: 10.1093/bib/bbaa254] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 07/13/2020] [Revised: 09/08/2020] [Accepted: 09/08/2020] [Indexed: 12/11/2022] Open
Abstract
Enhancer-promoter interactions (EPIs) play an important role in transcriptional regulation. Recently, machine learning-based methods have been widely used in the genome-scale identification of EPIs due to their promising predictive performance. In this paper, we propose a novel method, termed EPI-DLMH, for predicting EPIs using DNA sequences only. EPI-DLMH consists of three major steps. First, a two-layer convolutional neural network is used to learn local features, and a bidirectional gated recurrent unit network is used to capture long-range dependencies on the sequences of promoters and enhancers. Second, an attention mechanism is used for focusing on relatively important features. Finally, a matching heuristic mechanism is introduced for the exploration of the interaction between enhancers and promoters. We use benchmark datasets in evaluating and comparing the proposed method with existing methods. Comparative results show that our model is superior to currently existing models in multiple cell lines. Specifically, we found that the matching heuristic mechanism introduced into the proposed model mainly contributes to the improvement of performance in terms of overall accuracy. Additionally, compared with existing models, our model is more efficient with regard to computational speed.
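The last two steps described above, attention over sequence features and a matching heuristic between the enhancer and promoter representations, can be sketched in plain Python. The sketch assumes the form of matching heuristic common in the sentence-matching literature (concatenating the two vectors with their element-wise difference and product). The abstract does not spell out the exact formulation used by EPI-DLMH, and in a trained model the attention scores would be learned rather than passed in. The function names `attention_pool` and `matching_heuristic` are hypothetical.

```python
import math


def attention_pool(hidden_states, scores):
    """Softmax-normalise the scores and return the weighted sum of hidden states.

    hidden_states: list of equal-length feature vectors, one per sequence position.
    scores: one unnormalised relevance score per position (assumed given here).
    """
    max_score = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(hidden_states[0])
    return [sum(w * h[i] for w, h in zip(weights, hidden_states)) for i in range(dim)]


def matching_heuristic(enhancer_vec, promoter_vec):
    """Combine two equal-length vectors as [e; p; e - p; e * p] (assumed form)."""
    if len(enhancer_vec) != len(promoter_vec):
        raise ValueError("feature vectors must have the same length")
    diff = [e - p for e, p in zip(enhancer_vec, promoter_vec)]
    prod = [e * p for e, p in zip(enhancer_vec, promoter_vec)]
    return list(enhancer_vec) + list(promoter_vec) + diff + prod
```

The combined vector would then be passed to a classifier. The difference and product terms give that classifier explicit access to how the two representations relate, rather than leaving it to infer this from a plain concatenation.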
Affiliation(s)
- Xiaoping Min
  - Department of Computer Science, Xiamen University, Xiamen, China
- Congmin Ye
  - Department of Computer Science, Xiamen University, Xiamen, China
- Xiangrong Liu
  - Department of Computer Science, Xiamen University, Xiamen, China
- Xiangxiang Zeng
  - School of Information Science and Engineering, Hunan University, Changsha, China
|
15
|
Chen T, Wang X, Chu Y, Wang Y, Jiang M, Wei DQ, Xiong Y. T4SE-XGB: Interpretable Sequence-Based Prediction of Type IV Secreted Effectors Using eXtreme Gradient Boosting Algorithm. Front Microbiol 2020; 11:580382. [PMID: 33072049 PMCID: PMC7541839 DOI: 10.3389/fmicb.2020.580382] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/06/2020] [Accepted: 08/21/2020] [Indexed: 12/19/2022]
Abstract
Type IV secreted effectors (T4SEs) can be translocated into the cytosol of host cells via the type IV secretion system (T4SS) and cause diseases. However, experimental approaches to identify T4SEs are time- and resource-consuming, and the existing computational tools based on machine learning techniques have some obvious limitations such as the lack of interpretability in the prediction models. In this study, we proposed a new model, T4SE-XGB, which uses the eXtreme gradient boosting (XGBoost) algorithm for accurate identification of type IV effectors using optimal features derived from protein sequences. After trying 20 different types of features, the best performance under 5-fold cross validation, in comparison with other machine learning methods, was achieved when all features were fed into XGBoost. Then, the ReliefF algorithm was adopted to get the optimal feature set on our dataset, which further improved the model performance. T4SE-XGB exhibited the highest predictive performance on the independent test set and outperformed other published prediction tools. Furthermore, the SHAP method was used to interpret the contribution of features to model predictions. The identification of key features can contribute to improved understanding of multifactorial contributors to host-pathogen interactions and bacterial pathogenesis. In addition to type IV effector prediction, we believe that the proposed framework can provide instructive guidance for similar studies to construct prediction methods on related biological problems. The data and source code of this study can be freely accessed at https://github.com/CT001002/T4SE-XGB.
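The 5-fold cross validation mentioned in the abstract can be sketched as an index splitter. This is a generic illustration and is not taken from the T4SE-XGB repository. The function name `k_fold_indices`, the fixed seed and the unstratified shuffling are assumptions; an imbalanced dataset would normally call for stratified folds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Every fold except the i-th contributes to the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each sample appears in exactly one test fold, so averaging a metric over the five folds uses every sample once for evaluation.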
Affiliation(s)
- Tianhang Chen
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Xiangeng Wang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Yanyi Chu
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Peng Cheng Laboratory, Shenzhen, China
- Yanjing Wang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Mingming Jiang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Peng Cheng Laboratory, Shenzhen, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
|
16
|
Liu Z, Zhang Y, Han X, Li C, Yang X, Gao J, Xie G, Du N. Identifying Cancer-Related lncRNAs Based on a Convolutional Neural Network. Front Cell Dev Biol 2020; 8:637. [PMID: 32850792 PMCID: PMC7432192 DOI: 10.3389/fcell.2020.00637] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/04/2020] [Accepted: 06/24/2020] [Indexed: 12/15/2022]
Abstract
Millions of people are suffering from cancers, but accurate early diagnosis and effective treatment remain difficult. In recent years, long non-coding RNAs (lncRNAs) have been proven to play an important role in diseases, especially cancers. These lncRNAs execute their functions by regulating gene expression. Therefore, identifying lncRNAs which are related to cancers could help researchers gain a deeper understanding of cancer mechanisms and help them find treatment options. A large number of relationships between lncRNAs and cancers have been verified by biological experiments, which gives us a chance to use computational methods to identify cancer-related lncRNAs. In this paper, we applied the convolutional neural network (CNN) to identify cancer-related lncRNAs from lncRNAs' target genes and their tissue expression specificity. Since lncRNAs regulate target gene expression and have been reported to have tissue expression specificity, their target genes and expression in different tissues were used as features of lncRNAs. Then, the deep belief network (DBN) was used to encode the features of lncRNAs in an unsupervised manner. Finally, CNN was used to predict cancer-related lncRNAs based on known relationships between lncRNAs and cancers. For each type of cancer, we built a CNN model to predict its related lncRNAs. We identified more related lncRNAs for 41 kinds of cancers. Ten-fold cross-validation was used to evaluate the performance of our method. The results showed that our method is better than several previous methods, with an area under the curve (AUC) of 0.81 and an area under the precision–recall curve (AUPR) of 0.79. To verify the accuracy of our results, case studies have been done.
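The AUC figure reported above has a simple probabilistic reading: it is the chance that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows. The function name `roc_auc` is hypothetical and the quadratic loop is chosen for clarity rather than speed, so it says nothing about how the authors computed their values.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for positive (e.g. cancer-related), 0 for negative.
    scores: predicted scores, higher meaning more likely positive.
    Tied pairs count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Under this reading, a value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of positives from negatives.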
Affiliation(s)
- Zihao Liu
  - Department of Oncology, Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing, China; Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China
- Ying Zhang
  - Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Xudong Han
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chenxi Li
  - Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China
- Xuhui Yang
  - Department of Oncology, Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing, China
- Jie Gao
  - Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China
- Ganfeng Xie
  - Department of Oncology, Southwest Hospital, Army Medical University, Chongqing, China
- Nan Du
  - Department of Oncology, Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing, China; Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China
|
17
|
Wang H, Cimen E, Singh N, Buckler E. Deep learning for plant genomics and crop improvement. CURRENT OPINION IN PLANT BIOLOGY 2020; 54:34-41. [PMID: 31986354 DOI: 10.1016/j.pbi.2019.12.010] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Received: 08/01/2019] [Revised: 11/28/2019] [Accepted: 12/18/2019] [Indexed: 05/26/2023]
Abstract
Our era has witnessed tremendous advances in plant genomics, characterized by an explosion of high-throughput techniques to identify multi-dimensional genome-wide molecular phenotypes at low costs. More importantly, genomics is not merely acquiring molecular phenotypes, but also leveraging powerful data mining tools to predict and explain them. In recent years, deep learning has been found extremely effective in these tasks. This review highlights two prominent questions at the intersection of genomics and deep learning: 1) how can the flow of information from genomic DNA sequences to molecular phenotypes be modeled; 2) how can we identify functional variants in natural populations using deep learning models? Additionally, we discuss the possibility of unleashing the power of deep learning in synthetic biology to create novel genomic elements with desirable functions. Taken together, we propose a central role of deep learning in future plant genomics research and crop genetic improvement.
Affiliation(s)
- Hai Wang
  - National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, Joint Laboratory for International Cooperation in Crop Molecular Breeding, China Agricultural University, Beijing 100193, China; Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA; Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Emre Cimen
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA; Computational Intelligence and Optimization Laboratory, Industrial Engineering Department, Eskisehir Technical University, Eskisehir 26000, Turkey
- Nisha Singh
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA; ICAR-National Institute for Plant Biotechnology, New Delhi 110012, India
- Edward Buckler
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA; United States Department of Agriculture, Agricultural Research Service, Ithaca, NY 14853, USA
|
18
|
Affiliation(s)
- Youhuang Bai
  - Department of Bioinformatics, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Ziding Zhang
  - National Demonstration Center for Experimental Biological Sciences Education, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Ming Chen
  - Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
|