1
Ali M, Shah D, Qazi S, Khan IA, Abrar M, Zahir S. An effective deep learning-based approach for splice site identification in gene expression. Sci Prog 2024; 107:368504241266588. [PMID: 39051530 PMCID: PMC11273556 DOI: 10.1177/00368504241266588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
A crucial stage in eukaryotic gene expression is mRNA splicing, carried out by a protein assembly known as the spliceosome. This step contributes significantly to generating a correctly functioning final gene product. Because eukaryotic genes are interrupted by non-coding introns, splicing removes the introns and joins the exons to create a functional mRNA molecule. However, accurately locating splice sites with molecular biology techniques and other biological approaches is complex and time-consuming. This paper presents a precise and reliable computer-aided diagnosis (CAD) technique for the rapid and correct identification of splice site sequences. The proposed deep learning-based framework uses long short-term memory (LSTM) to extract distinct patterns from RNA sequences, enabling rapid and accurate point mutation sequence mapping. The network uses one-hot encoding to find sequential patterns that effectively identify splice sites. A thorough ablation study of traditional machine learning, one-dimensional convolutional neural network (1D-CNN), and recurrent neural network (RNN) models was conducted. The proposed LSTM network outperformed existing state-of-the-art approaches, improving accuracy by 3% and 2% on the acceptor and donor site datasets, respectively.
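The abstract states only that the network takes one-hot encoded sequences as input. A minimal Python sketch of that encoding follows, assuming the alphabet order A, C, G, T, treating U as T for RNA input, and mapping ambiguous bases such as N to an all-zero row; the helper name `one_hot` and these conventions are illustrative assumptions, not taken from the paper.

```python
from typing import List

# Assumed channel order; U (RNA) is folded into the T channel.
ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Return a len(seq) x 4 matrix with a single 1 per recognised base.

    Bases outside ACGT/U (e.g. N) become an all-zero row.
    """
    matrix = []
    for base in seq.upper().replace("U", "T"):
        row = [0, 0, 0, 0]
        if base in ALPHABET:
            row[ALPHABET.index(base)] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for row in one_hot("AGGU"):
        print(row)
```

Each row of the resulting matrix would then be one time step of the LSTM input, so a window of L nucleotides becomes an L x 4 sequence.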
Affiliation(s)
- Mohsin Ali
- Department of Computer Science, Bacha Khan University, Charsadda, KP, Pakistan
- Dilawar Shah
- Department of Computer Science, Bacha Khan University, Charsadda, KP, Pakistan
- Shahid Qazi
- Department of Computer Science, Bacha Khan University, Charsadda, KP, Pakistan
- Izaz Ahmad Khan
- Department of Computer Science, Bacha Khan University, Charsadda, KP, Pakistan
- Mohammad Abrar
- Faculty of Computer Science, Arab Open University, Muscat, Oman, Sultanate of Oman
- Sana Zahir
- Institute of Computer Sciences and Information Technology, The University of Agriculture Peshawar, Peshawar, KP, Pakistan
2
Cho E, Cho S, Kim M, Ediriweera TK, Seo D, Lee SS, Cha J, Jin D, Kim YK, Lee JH. Single nucleotide polymorphism marker combinations for classifying Yeonsan Ogye chicken using a machine learning approach. J Anim Sci Technol 2022; 64:830-841. [PMID: 36287747 PMCID: PMC9574617 DOI: 10.5187/jast.2022.e64] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 07/15/2022] [Accepted: 08/01/2022] [Indexed: 11/27/2022]
Abstract
Genetic analysis has great potential as a tool for differentiating between species and breeds of livestock. In this study, the optimal combinations of single nucleotide polymorphism (SNP) markers for discriminating the Yeonsan Ogye chicken (Gallus gallus domesticus) breed were identified using high-density 600K SNP array data. In 3,904 individuals from 198 chicken breeds, SNP markers specific to the target population were discovered through a case-control genome-wide association study (GWAS) and then filtered based on linkage disequilibrium blocks. Significant SNP markers were selected by feature selection with two machine learning algorithms: Random Forest (RF) and AdaBoost (AB). Using this machine learning approach, the optimal combinations of 38 (RF) and 43 (AB) SNP markers for the Yeonsan Ogye chicken population achieved 100% accuracy. Hence, the GWAS and machine learning models used in this study can be used efficiently to identify the optimal combination of markers for discriminating target populations with multiple SNP markers.
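The abstract does not say which association test the case-control GWAS used. As an illustration only, the Python sketch below applies the standard allelic chi-square test (a 2 x 2 table of allele counts, 1 degree of freedom) to a single SNP; the function name and the example counts are hypothetical, and the later Random Forest and AdaBoost feature-selection steps are not shown.

```python
import math
from typing import Tuple


def allelic_chi_square(case_a: int, case_b: int,
                       ctrl_a: int, ctrl_b: int) -> Tuple[float, float]:
    """Allelic case-control association test for one SNP.

    case_a / case_b: counts of alleles A and B in the target (case) population.
    ctrl_a / ctrl_b: counts of alleles A and B in the other breeds (controls).
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    Assumes every row and column total is non-zero.
    """
    observed = [[case_a, case_b], [ctrl_a, ctrl_b]]
    total = case_a + case_b + ctrl_a + ctrl_b
    row_totals = [case_a + case_b, ctrl_a + ctrl_b]
    col_totals = [case_a + ctrl_a, case_b + ctrl_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # A 1-df chi-square variable is a squared standard normal,
    # so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    stat, p = allelic_chi_square(30, 70, 60, 40)  # hypothetical allele counts
    print(f"chi2 = {stat:.3f}, p = {p:.2e}")
```

Ranking SNPs by such p-values is one way to produce a candidate list that a feature-selection step could then prune.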
Affiliation(s)
- Eunjin Cho
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
- Sunghyun Cho
- Research and Development Center, Insilicogen Inc., Yongin 19654, Korea
- Minjun Kim
- Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea
- Dongwon Seo
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea; Research Institute TNT Research Company, Jeonju 54810, Korea
- Jihye Cha
- Animal Genome & Bioinformatics, National Institute of Animal Science, Rural Development Administration, Wanju 55365, Korea
- Daehyeok Jin
- Animal Genetic Resources Research Center, National Institute of Animal Science, Rural Development Administration, Hamyang 50000, Korea
- Young-Kuk Kim
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea
- Jun Heon Lee
- Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea; Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea. Corresponding author: Jun Heon Lee, Department of Bio-AI Convergence, Chungnam National University, Daejeon 34134, Korea. Tel: +82-42-821-5779, E-mail:
3
Li X, Cai C, Zheng H, Zhu H. Recognizing strawberry appearance quality using different combinations of deep feature and classifiers. J Food Process Eng 2022. [DOI: 10.1111/jfpe.13982] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Affiliation(s)
- Xuchen Li
- School of Optoelectronic Engineering, Xi'an Technological University, Xi'an, China
- Changlong Cai
- School of Optoelectronic Engineering, Xi'an Technological University, Xi'an, China
- Hao Zheng
- School of Optoelectronic Engineering, Xi'an Technological University, Xi'an, China
- Hongfei Zhu
- School of Computer Science and Technology, Tiangong University, Tianjin, China
4
Kim HS, Kim KB, Lee JH, Jung JJ, Kim YJ, Kim SP, Choi MH, Yi JH, Chung SC. Mid-Air Tactile Sensations Evoked by Laser-Induced Plasma: A Neurophysiological Study. Front Neurosci 2021; 15:733423. [PMID: 34658771 PMCID: PMC8517193 DOI: 10.3389/fnins.2021.733423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Accepted: 09/06/2021] [Indexed: 11/22/2022]
Abstract
This study demonstrates the feasibility of mid-air haptic stimulation over a long distance using the plasma effect induced by a laser. We hypothesize that the stress wave generated by laser-induced plasma in the air can propagate through the air, reach nearby human skin, and evoke a tactile sensation. To validate this hypothesis, we investigated somatosensory responses in the human brain to laser plasma stimuli by analyzing the electroencephalography (EEG) of 14 participants. Three types of stimuli were delivered to the index finger: a plasma stimulus induced by the laser, a mechanical stimulus transferred through a Styrofoam stick, and a sham stimulus that presented only the sound of the plasma and mechanical stimuli simultaneously. The event-related desynchronization/synchronization (ERD/S) of sensorimotor rhythms (SMRs) in the EEG was analyzed. Every participant verbally reported feeling a soft tap on the finger in response to the laser stimulus, but not to the sham stimulus. The EEG spectrogram evoked by laser stimulation was similar to that evoked by mechanical stimulation: alpha ERD and beta ERS were present over the sensorimotor area in response to both laser and mechanical stimuli. A decoding analysis revealed that classification error was higher when discriminating ERD/S patterns between laser and mechanical stimuli than when discriminating laser from sham or mechanical from sham stimuli. Our neurophysiological results confirm that tactile sensation can be evoked by the plasma effect induced by a laser in the air, which may provide a method for mid-air haptic stimulation.
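ERD/S is commonly quantified as the percentage change in band power during a post-stimulus window relative to a pre-stimulus reference window, so negative values indicate desynchronization (ERD) and positive values synchronization (ERS). The Python sketch below follows that common convention, assuming the input samples are already band-pass filtered to the band of interest (for example alpha or beta); the function names are hypothetical and the paper's exact time-frequency pipeline is not reproduced.

```python
from statistics import mean
from typing import Sequence


def band_power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of an already band-pass-filtered signal."""
    return mean(x * x for x in samples)


def erd_ers_percent(reference: Sequence[float], event: Sequence[float]) -> float:
    """Percentage power change (A - R) / R * 100.

    R: band power in the pre-stimulus reference window.
    A: band power in the post-stimulus window.
    Negative -> ERD (power decrease); positive -> ERS (power increase).
    """
    r = band_power(reference)
    a = band_power(event)
    return (a - r) / r * 100.0


if __name__ == "__main__":
    baseline = [2.0, -2.0, 2.0, -2.0]    # hypothetical alpha-band samples
    after_tap = [1.0, -1.0, 1.0, -1.0]   # smaller amplitude after the stimulus
    print(erd_ers_percent(baseline, after_tap))  # -75.0, i.e. an ERD
```

In practice this value would be averaged over trials and computed per channel and frequency band.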
Affiliation(s)
- Hyung-Sik Kim
- Department of Biomedical Engineering, BK21 Plus Research Institute of Biomedical Engineering, School of ICT Convergence Engineering, College of Science and Technology, Konkuk University, Chungju-si, South Korea
- Kyu Beom Kim
- Department of Biomedical Engineering, BK21 Plus Research Institute of Biomedical Engineering, School of ICT Convergence Engineering, College of Science and Technology, Konkuk University, Chungju-si, South Korea
- Je-Hyeop Lee
- Department of Biomedical Engineering, BK21 Plus Research Institute of Biomedical Engineering, School of ICT Convergence Engineering, College of Science and Technology, Konkuk University, Chungju-si, South Korea
- Jin-Ju Jung
- Department of Biomedical Engineering, BK21 Plus Research Institute of Biomedical Engineering, School of ICT Convergence Engineering, College of Science and Technology, Konkuk University, Chungju-si, South Korea
- Ye-Jin Kim
- Department of Biomedical Engineering, BK21 Plus Research Institute of Biomedical Engineering, School of ICT Convergence Engineering, College of Science and Technology, Konkuk University, Chungju-si, South Korea
- Sung-Phil Kim
- Department of Biomedical Engineering, Ulsan National Institute of Science and Technology, Ulsan, South Korea
- Mi-Hyun Choi
- Department of Biomedical Engineering, BK21 Plus Research Institute of Biomedical Engineering, School of ICT Convergence Engineering, College of Science and Technology, Konkuk University, Chungju-si, South Korea
- Jeong-Han Yi
- Department of Biomedical Engineering, BK21 Plus Research Institute of Biomedical Engineering, School of ICT Convergence Engineering, College of Science and Technology, Konkuk University, Chungju-si, South Korea
- Soon-Cheol Chung
- Department of Biomedical Engineering, BK21 Plus Research Institute of Biomedical Engineering, School of ICT Convergence Engineering, College of Science and Technology, Konkuk University, Chungju-si, South Korea
5
Wani MA, Garg P, Roy KK. Machine learning-enabled predictive modeling to precisely identify the antimicrobial peptides. Med Biol Eng Comput 2021; 59:2397-2408. [PMID: 34632545 DOI: 10.1007/s11517-021-02443-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/10/2020] [Accepted: 09/14/2021] [Indexed: 10/20/2022]
Abstract
The ubiquitous antimicrobial peptides (AMPs), with a broad range of antimicrobial activities, hold great promise for combating multidrug-resistant infections. In this study, using a large and diverse set of AMPs (2638) and non-AMPs (3700), we explored a variety of machine learning classifiers for building in silico AMP prediction models, including Random Forest (RF), k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB), Quadratic Discriminant Analysis (QDA), and ensemble learning. Among the models generated, the RF classifier-based model performed best in both internal validation [Accuracy: 91.40%, Precision: 89.37%, Sensitivity: 90.05%, and Specificity: 92.36%] and external validation [Accuracy: 89.43%, Precision: 88.92%, Sensitivity: 85.21%, and Specificity: 92.43%]. In addition, the RF classifier-based model correctly predicted the known AMPs and non-AMPs that had been set aside as an additional external validation set. The performance assessment revealed three features, namely ChargeD2001, PAAC12 (pseudo amino acid composition), and polarity T13, that are likely to play vital roles in the antimicrobial activity of AMPs. The developed RF-based classification model may further be useful for designing and predicting novel potential AMPs.
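The four metrics reported above all derive from the binary confusion matrix, with AMPs as the positive class. The Python sketch below shows the standard definitions; the function name and the counts in the example are hypothetical and are not the paper's results.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary metrics from confusion-matrix counts (AMP = positive class)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),      # predicted AMPs that are real AMPs
        "sensitivity": tp / (tp + fn),    # real AMPs that were recovered (recall)
        "specificity": tn / (tn + fp),    # real non-AMPs correctly rejected
    }


if __name__ == "__main__":
    # Hypothetical test set: 100 AMPs and 100 non-AMPs.
    for name, value in classification_metrics(tp=90, fp=20, tn=80, fn=10).items():
        print(f"{name}: {value:.2%}")
```

Precision depends on the class balance of the test set, whereas sensitivity and specificity do not, which is why the abstract reports all four.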
Affiliation(s)
- Mushtaq Ahmad Wani
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Kolkata, 700054, West Bengal, India
- Prabha Garg
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Mohali, 160062, Punjab, India
- Kuldeep K Roy
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Kolkata, 700054, West Bengal, India; Department of Pharmaceutical Sciences, School of Health Sciences, University of Petroleum and Energy Studies (UPES), P.O. Bidholi, Dehradun, 248007, Uttarakhand, India
6
Comparison of Dengue Predictive Models Developed Using Artificial Neural Network and Discriminant Analysis with Small Dataset. Appl Sci (Basel) 2021. [DOI: 10.3390/app11030943] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/21/2022]
Abstract
In Indonesia, dengue has become a hyperendemic disease. Dengue has three clinical phases: the febrile, critical, and recovery phases. Many patients have died in the critical phase because of a lack of proper and timely treatment. We therefore developed models that predict the severity level of dengue from patients' laboratory test results using an Artificial Neural Network (ANN) and Discriminant Analysis (DA). The models were developed with a very small dataset. ANN models using logistic and hyperbolic tangent activation functions, with 70% of the data used for training, yielded the highest accuracy (90.91%), sensitivity (91.11%), and specificity (95.51%); this is the model proposed in this research. The proposed model will help physicians predict the severity level of dengue patients before they enter the critical phase, making it easier to treat patients early so that fatal cases and deaths can be avoided.
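The abstract names the two activation functions and the 70% training share but not the network architecture, the input features, or how the split was drawn. The Python sketch below shows the logistic and hyperbolic tangent activations inside a single artificial neuron, plus a random 70/30 split; the helper names, the weights, and the use of a simple shuffled split (rather than, say, a stratified one) are assumptions for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def logistic(z: float) -> float:
    """Logistic (sigmoid) activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z: float) -> float:
    """Hyperbolic tangent activation, output in (-1, 1)."""
    return math.tanh(z)


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float,
           activation: Callable[[float], float]) -> float:
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def train_test_split(records: Sequence[T], train_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and cut it at train_fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = train_test_split(list(range(10)))
    print(len(train), len(test))
    print(neuron([1.0, 2.0], [0.5, -0.25], 0.0, logistic))
    print(neuron([1.0, 2.0], [0.5, -0.25], 0.0, tanh))
```

With a very small dataset, a stratified split that preserves the proportion of each severity class would be a reasonable alternative to the plain shuffle shown here.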
7
Zhang MQ. A personal journey on cracking the genomic codes. Quant Biol 2021. [DOI: 10.15302/j-qb-021-0245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
8
Zhang Y, Liu X, MacLeod J, Liu J. Discerning novel splice junctions derived from RNA-seq alignment: a deep learning approach. BMC Genomics 2018; 19:971. [PMID: 30591034 PMCID: PMC6307148 DOI: 10.1186/s12864-018-5350-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/16/2017] [Accepted: 12/03/2018] [Indexed: 11/10/2022]
Abstract
Background: Exon splicing is a regulated cellular process in the transcription of protein-coding genes. Technological advances and cost reductions in RNA sequencing have made quantitative and qualitative assessments of the transcriptome both possible and widely available. RNA-seq provides unprecedented resolution for identifying gene structures and resolving the diversity of splicing variants. However, currently available ab initio aligners are vulnerable to spurious alignments caused by random sequence matches and discordance between the sample and the reference genome. As a consequence, a substantial number of false-positive exon junction predictions are introduced, which further confound downstream analyses of splice variant discovery and abundance estimation.
Results: In this work, we present a deep learning-based splice junction sequence classifier, named DeepSplice, which uses convolutional neural networks to classify candidate splice junctions. We show that (I) DeepSplice outperforms state-of-the-art methods for splice site classification on the popular benchmark dataset HS3D, (II) DeepSplice achieves high accuracy for splice junction classification with GENCODE annotation, and (III) applying DeepSplice to the putative splice junctions generated by Rail-RNA alignment of 21,504 human RNA-seq datasets substantially reduces 43 million candidates to around 3 million highly confident novel splice junctions.
Conclusions: We have implemented a model that is inferred from the sequences of annotated exon junctions and can then classify splice junctions derived from primary RNA-seq data. The model's performance was evaluated and compared through comprehensive benchmarking and testing, indicating reliable performance and broad usability for classifying novel splice junctions derived from RNA-seq alignment.
Electronic supplementary material: The online version of this article (10.1186/s12864-018-5350-1) contains supplementary material, which is available to authorized users.
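DeepSplice classifies junctions with convolutional neural networks, but the abstract does not give the architecture. The Python sketch below shows only the core operation such a network applies to a one-hot encoded sequence: one convolution filter slid along the sequence, a ReLU, and global max pooling. The filter weights are hand-set for illustration (they reward the canonical GT dinucleotide at the 5' end of an intron) and are hypothetical; a trained network would learn many filters from data.

```python
from typing import List

ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """len(seq) x 4 matrix; bases outside ACGT become all-zero rows."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq.upper()]


def conv1d_relu_maxpool(x: List[List[float]], kernel: List[List[float]], bias: float) -> float:
    """Slide one (width x 4) filter along x, apply ReLU, return the global maximum."""
    width = len(kernel)
    activations = []
    for start in range(len(x) - width + 1):
        z = bias
        for offset in range(width):
            for channel in range(4):
                z += kernel[offset][channel] * x[start + offset][channel]
        activations.append(max(0.0, z))  # ReLU
    return max(activations) if activations else 0.0  # global max pooling


# Hypothetical hand-set filter (columns A, C, G, T) that responds to "GT".
GT_FILTER = [
    [-1.0, -1.0, 1.0, -1.0],  # first position prefers G
    [-1.0, -1.0, -1.0, 1.0],  # second position prefers T
]

if __name__ == "__main__":
    print(conv1d_relu_maxpool(one_hot("AAGTAA"), GT_FILTER, bias=-1.0))
    print(conv1d_relu_maxpool(one_hot("AAAAAA"), GT_FILTER, bias=-1.0))
```

In a full model, the pooled outputs of many such filters would feed further layers and a final classifier that scores the candidate junction.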
Affiliation(s)
- Yi Zhang
- Department of Computer Science, University of Kentucky, Lexington, KY, 40506, USA
- Xinan Liu
- Department of Computer Science, University of Kentucky, Lexington, KY, 40506, USA
- James MacLeod
- Department of Veterinary Science, University of Kentucky, Lexington, KY, 40506, USA
- Jinze Liu
- Department of Computer Science, University of Kentucky, Lexington, KY, 40506, USA
9
Liu G, Liu GJ, Tan JX, Lin H. DNA physical properties outperform sequence compositional information in classifying nucleosome-enriched and -depleted regions. Genomics 2018; 111:1167-1175. [PMID: 30055231 DOI: 10.1016/j.ygeno.2018.07.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/05/2018] [Revised: 07/07/2018] [Accepted: 07/15/2018] [Indexed: 12/15/2022]
Abstract
The nucleosome is the fundamental structural unit of eukaryotic chromatin and plays an essential role in the epigenetic regulation of cellular processes such as DNA replication, recombination, and transcription. Hence, it is important to identify nucleosome positions in the genome. Our previous model based on DNA deformation energy, which used a set of DNA physical descriptors, performed well in predicting nucleosome dyad positions and occupancy. In this study, we established a machine learning model for predicting nucleosome occupancy to further verify the physical descriptors. The results showed that (1) our model outperformed several models based on sequence compositional information, indicating a stronger dependence of nucleosome positioning on DNA physical properties; (2) nucleosome-enriched and -depleted regions have distinct features in terms of DNA physical descriptors such as sequence-dependent flexibility and equilibrium structure parameters; and (3) gene transcription start sites and termination sites can be well characterized by the distribution patterns of the physical descriptors, indicating a regulatory role of DNA physical properties in gene transcription. In addition, we developed a web server for the model, which is freely accessible at http://lin-group.cn/server/iNuc-force/.
Affiliation(s)
- Guoqing Liu
- The School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou 014010, China
- Guo-Jun Liu
- School of Natural Sciences and Mathematics, Ural Federal University, Ekaterinburg 620000, Russia
- Jiu-Xin Tan
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
10
Chowdhury B, Garai A, Garai G. An optimized approach for annotation of large eukaryotic genomic sequences using genetic algorithm. BMC Bioinformatics 2017; 18:460. [PMID: 29065853 PMCID: PMC5655831 DOI: 10.1186/s12859-017-1874-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/14/2017] [Accepted: 10/17/2017] [Indexed: 01/06/2023]
Abstract
BACKGROUND: Detecting important functional and/or structural elements and identifying their positions in a large eukaryotic genomic sequence is an active research area. The gene is an important functional and structural unit of DNA, so gene prediction is essential for detailed genome annotation.
RESULTS: In this paper, we propose a new gene prediction technique based on a Genetic Algorithm (GA) to determine the optimal positions of the exons of a gene in a chromosome or genome. Correctly identifying coding and non-coding regions is difficult and computationally demanding. The proposed genetic-based method, named Gene Prediction with Genetic Algorithm (GPGA), mitigates this problem by searching for only one exon at a time instead of all exons together with their introns. This representation has a significant advantage: it breaks the entire gene-finding problem into a number of smaller sub-problems, thereby reducing computational complexity. We tested the performance of GPGA on existing benchmark datasets and compared the results with well-known and relevant techniques; the comparison shows that the proposed method performs better than or comparably to them. We also used GPGA to annotate human chromosome 21 (HS21) using cross-species comparisons with mouse orthologs.
CONCLUSION: GPGA predicted true genes with better accuracy than other well-known approaches.
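The abstract describes GPGA's key idea, a GA that searches for one exon at a time, but not its chromosome encoding, operators, or fitness function. The toy Python sketch below illustrates that idea under loudly stated assumptions: each individual is one candidate exon interval `[start, end)`, and the fitness function `splice_signal_fitness` is a hypothetical score that merely rewards an `AG` immediately before the exon and a `GT` immediately after it. It is not the GPGA fitness, and all names and parameters are illustrative.

```python
import random
from typing import Callable, List, Tuple

Interval = Tuple[int, int]  # candidate exon as a half-open range [start, end)


def splice_signal_fitness(seq: str, iv: Interval) -> float:
    """Hypothetical fitness: +1 for an AG just before the exon, +1 for a GT just after it."""
    start, end = iv
    score = 0.0
    if start >= 2 and seq[start - 2:start] == "AG":
        score += 1.0
    if seq[end:end + 2] == "GT":
        score += 1.0
    return score


def random_interval(rng: random.Random, n: int, min_len: int) -> Interval:
    """Random interval of length >= min_len in a sequence of length n (requires n > min_len)."""
    start = rng.randrange(0, n - min_len)
    end = rng.randrange(start + min_len, n + 1)
    return (start, end)


def crossover(a: Interval, b: Interval, min_len: int) -> Interval:
    """Take the start from one parent and the end from the other, if still valid."""
    child = (a[0], b[1])
    return child if child[1] - child[0] >= min_len else a


def mutate(rng: random.Random, iv: Interval, n: int, min_len: int, step: int = 3) -> Interval:
    """Shift one boundary by up to `step` positions while keeping the interval valid."""
    start, end = iv
    if rng.random() < 0.5:
        start = min(max(0, start + rng.randint(-step, step)), end - min_len)
    else:
        end = max(min(n, end + rng.randint(-step, step)), start + min_len)
    return (start, end)


def evolve_one_exon(seq: str,
                    fitness: Callable[[str, Interval], float],
                    pop_size: int = 30,
                    generations: int = 40,
                    min_len: int = 6,
                    seed: int = 0) -> Tuple[Interval, List[float]]:
    """Search for a single exon; return the best interval and the best fitness per generation."""
    rng = random.Random(seed)
    n = len(seq)
    population = [random_interval(rng, n, min_len) for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        population.sort(key=lambda iv: fitness(seq, iv), reverse=True)
        history.append(fitness(seq, population[0]))
        children = population[: pop_size // 5]   # elitism: keep the top 20% unchanged
        parents = population[: pop_size // 2]    # parents are drawn from the top half
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(rng, crossover(a, b, min_len), n, min_len))
        population = children
    best = max(population, key=lambda iv: fitness(seq, iv))
    return best, history


if __name__ == "__main__":
    toy_seq = "CCAGATGGCCAAGTCC"  # hypothetical sequence with one AG ... GT pair
    best, history = evolve_one_exon(toy_seq, splice_signal_fitness)
    print(best, splice_signal_fitness(toy_seq, best), history[-1])
```

Because the elite is copied unchanged, the best fitness never decreases between generations. Re-running the search on the sequence that remains after each accepted exon would mirror the one-exon-at-a-time decomposition described above.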
Affiliation(s)
- Biswanath Chowdhury
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, 700009 WB, India
- Arnav Garai
- Unit of Energy, Utilities, Communications and Services, Infosys Technologies Ltd., Bhubaneswar, 751024 Odisha, India
- Gautam Garai
- Computational Sciences Division, Saha Institute of Nuclear Physics, Kolkata, 700064 WB, India
11
Quality Monitoring for Laser Welding Based on High-Speed Photography and Support Vector Machine. Appl Sci (Basel) 2017. [DOI: 10.3390/app7030299] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/17/2022]
12
He H, Lin D, Zhang J, Wang Y, Deng HW. Biostatistics, Data Mining and Computational Modeling. Translational Bioinformatics 2016. [DOI: 10.1007/978-94-017-7543-4_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]
13
Zhang X, Shen Z, Zhang G, Shen Y, Chen M, Zhao J, Wu R. Short Exon Detection via Wavelet Transform Modulus Maxima. PLoS One 2016; 11:e0163088. [PMID: 27635656 PMCID: PMC5026382 DOI: 10.1371/journal.pone.0163088] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/03/2016] [Accepted: 09/04/2016] [Indexed: 02/05/2023]
Abstract
The detection of short exons is a challenging open problem in bioinformatics. Because existing model-independent methods cannot reliably detect short exons, we developed a model-independent method based on singularity detection with wavelet transform modulus maxima for detecting short coding sequences (exons) in eukaryotic DNA sequences. In our method, the local maxima capture and characterize the singularities of short exons, yielding significant patterns that are rarely observed with traditional methods. To obtain information about the singularities that distinguish the exon signal from the background noise, the noise level is estimated by filtering the genomic sequence through a notch filter. Meanwhile, a fast method based on a piecewise cubic Hermite interpolating polynomial is applied to reconstruct the wavelet coefficients, improving computational efficiency. In addition, the output measure of a paired-numerical representation, calculated in both the forward and reverse directions, is used to incorporate a useful DNA structural property. The performance of our approach and of other techniques is evaluated on two benchmark data sets. Experimental results demonstrate that the proposed method outperforms all assessed model-independent methods for detecting short exons in terms of the evaluation metrics.
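The abstract outlines the method without formulas. The Python sketch below illustrates only the central idea of modulus maxima, in a deliberately simplified form: the sequence is mapped to numbers with a G/C indicator (an assumption; the paper uses its own paired-numerical representation), transformed at a single scale with a Haar wavelet (the paper's wavelet, scales, notch-filter noise estimate, and Hermite interpolation are not reproduced), and local maxima of the absolute wavelet coefficients are reported as candidate boundaries. All names are hypothetical.

```python
from typing import List, Sequence


def gc_indicator(seq: str) -> List[int]:
    """Assumed numeric mapping for illustration: G/C -> 1, anything else -> 0."""
    return [1 if base in "GC" else 0 for base in seq.upper()]


def haar_transform(signal: Sequence[float], half: int) -> List[float]:
    """Single-scale Haar wavelet coefficients: (sum of right half) - (sum of left half)."""
    coeffs = []
    for i in range(len(signal) - 2 * half + 1):
        left = sum(signal[i:i + half])
        right = sum(signal[i + half:i + 2 * half])
        coeffs.append(right - left)
    return coeffs


def modulus_maxima(coeffs: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Indices where |W| is a local maximum that exceeds the threshold."""
    peaks = []
    for i in range(1, len(coeffs) - 1):
        m = abs(coeffs[i])
        if m > threshold and m > abs(coeffs[i - 1]) and m >= abs(coeffs[i + 1]):
            peaks.append(i)
    return peaks


def candidate_boundaries(seq: str, half: int = 2, threshold: float = 0.0) -> List[int]:
    """Sequence positions at the centre of each Haar window with a modulus maximum."""
    coeffs = haar_transform(gc_indicator(seq), half)
    return [i + half for i in modulus_maxima(coeffs, threshold)]


if __name__ == "__main__":
    print(candidate_boundaries("AAAAGGGGAAAA"))  # [4, 8]: the edges of the G/C block
```

A full WTMM analysis would repeat this over many scales and follow the maxima lines across scales, which is what makes short, sharp signals distinguishable from noise.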
Affiliation(s)
- Xiaolei Zhang
- Shantou University Medical College, Shantou, P.R. China
- Zhiwei Shen
- Department of Radiology, Second Affiliated Hospital of Shantou University Medical College, Shantou, P.R. China
- Guishan Zhang
- College of Engineering, Shantou University, Shantou, P.R. China
- Yuanyu Shen
- Department of Radiology, Second Affiliated Hospital of Shantou University Medical College, Shantou, P.R. China
- Miaomiao Chen
- Department of Radiology, Second Affiliated Hospital of Shantou University Medical College, Shantou, P.R. China
- Jiaxiang Zhao
- College of Electronic Information and Optical Engineering, Nankai University, Tianjin, P.R. China
- * E-mail: (JXZ); (RHW)
- Renhua Wu
- Department of Radiology, Second Affiliated Hospital of Shantou University Medical College, Shantou, P.R. China
- * E-mail: (JXZ); (RHW)
14
Bashir S, Qamar U, Khan FH. IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework. J Biomed Inform 2015; 59:185-200. [PMID: 26703093 DOI: 10.1016/j.jbi.2015.12.001] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 08/05/2015] [Revised: 11/01/2015] [Accepted: 12/06/2015] [Indexed: 11/30/2022]
Abstract
Accuracy plays a vital role in the medical field because it concerns the lives of individuals. Extensive research has been conducted on disease classification and prediction using machine learning techniques. However, there is no agreement on which classifier produces the best results: a specific classifier may outperform others on one dataset, while another classifier performs better on a different dataset. Classifier ensembles have proved to be an effective way to improve classification accuracy. In this research, we present an ensemble framework with multi-layer classification using enhanced bagging and optimized weighting. The proposed model, called "HM-BagMoov", overcomes conventional performance bottlenecks by using an ensemble of seven heterogeneous classifiers. The framework is evaluated on five heart disease datasets, four breast cancer datasets, two diabetes datasets, two liver disease datasets, and one hepatitis dataset obtained from public repositories. The analysis of the results shows that the ensemble framework achieved the highest accuracy, sensitivity, and F-measure compared with the individual classifiers for all diseases. In addition, the ensemble framework achieved the highest accuracy compared with state-of-the-art techniques. An application named "IntelliHealth" has also been developed based on the proposed model; it may be used by hospitals and doctors for diagnostic advice.
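The abstract does not spell out HM-BagMoov's weighting scheme or layer structure. The Python sketch below shows the two generic ingredients it names, bagging (bootstrap resampling of the training set) and weighted voting across classifiers, assuming for illustration that each classifier's weight is its validation accuracy; the function names, labels, and numbers are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_sample(data: Sequence[T], rng: random.Random) -> List[T]:
    """Bagging step: draw len(data) items with replacement."""
    return [rng.choice(data) for _ in range(len(data))]


def weighted_vote(predictions: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Return the label with the largest summed weight (ties go to the label seen first)."""
    if len(predictions) != len(weights):
        raise ValueError("need one weight per prediction")
    totals: Dict[Hashable, float] = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Three hypothetical classifiers; weights are assumed validation accuracies.
    votes = ["disease", "healthy", "healthy"]
    print(weighted_vote(votes, [0.9, 0.6, 0.5]))
    print(weighted_vote(votes, [0.9, 0.4, 0.4]))
    print(bootstrap_sample(["p1", "p2", "p3", "p4"], random.Random(0)))
```

Weighting lets one strong classifier outvote several weaker ones that agree, which plain majority voting cannot do.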
Affiliation(s)
- Saba Bashir
- Computer Engineering Department, College of Electrical and Mechanical Engineering, National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
- Usman Qamar
- Computer Engineering Department, College of Electrical and Mechanical Engineering, National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
- Farhan Hassan Khan
- Computer Engineering Department, College of Electrical and Mechanical Engineering, National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
15
Using weighted features to predict recombination hotspots in Saccharomyces cerevisiae. J Theor Biol 2015; 382:15-22. [DOI: 10.1016/j.jtbi.2015.06.030] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/17/2015] [Revised: 06/04/2015] [Accepted: 06/20/2015] [Indexed: 01/06/2023]
16
Bioinformatics Analyses to Separate Species Specific mRNAs from Unknown Sequences in de novo Assembled Transcriptomes. ACTA ACUST UNITED AC 2015. [DOI: 10.1007/978-3-319-16480-9_32] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/29/2023]
17
Goel N, Singh S, Aseri TC. An Improved Method for Splice Site Prediction in DNA Sequences Using Support Vector Machines. Procedia Comput Sci 2015. [DOI: 10.1016/j.procs.2015.07.350] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/23/2022]
18
Feng Y, Luo L. Using long-range contact number information for protein secondary structure prediction. Int J Biomath 2014. [DOI: 10.1142/s1793524514500521] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
In this paper, we combine, for the first time, tetrapeptide structural words with contact number for protein secondary structure prediction. We used the increment of diversity combined with quadratic discriminant analysis to predict the structure of the central residue of a sequence fragment, using tetrapeptide structural words and long-range contact number as information sources. The Q3 accuracy exceeds 83% on a set of 194 proteins, and the prediction accuracies for the 20 amino acid residue types range from 81% to 88%. Moreover, we introduced the residue long-range contact, which directly indicates the sequence separation between contacting residues, and examined the negative influence of long-range residue interactions on secondary structure prediction. We also compared the method with existing prediction methods; the results show that our method is more effective for protein secondary structure prediction.
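The increment of diversity (ID) has a standard form: for a source X with counts n_i summing to N, the diversity is D(X) = N log N - sum(n_i log n_i), and the increment between sources X and Y is ID(X, Y) = D(X + Y) - D(X) - D(Y). It is zero when the two sources have identical proportions and grows as they diverge. The Python sketch below implements that measure and a simple minimum-ID assignment rule; the tetrapeptide word counts are hypothetical, and the quadratic discriminant step that the paper combines with ID is not shown.

```python
import math
from collections import Counter
from typing import Dict, Mapping


def diversity(counts: Mapping[str, int]) -> float:
    """D(X) = N log N - sum(n_i log n_i), with 0 log 0 taken as 0."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(c * math.log(c) for c in counts.values() if c > 0)


def increment_of_diversity(x: Mapping[str, int], y: Mapping[str, int]) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small values mean similar compositions."""
    combined = Counter(x)
    combined.update(y)  # Counter.update adds counts rather than replacing them
    return diversity(combined) - diversity(x) - diversity(y)


def nearest_source(query: Mapping[str, int], sources: Dict[str, Mapping[str, int]]) -> str:
    """Assign the query to the class whose standard source gives the smallest ID."""
    return min(sources, key=lambda label: increment_of_diversity(query, sources[label]))


if __name__ == "__main__":
    # Hypothetical tetrapeptide-word counts for helix (H) and strand (E) sources.
    sources = {
        "H": {"AAEL": 8, "LKEA": 2},
        "E": {"VTVK": 7, "IYVE": 3},
    }
    fragment = {"AAEL": 4, "LKEA": 1}
    print(nearest_source(fragment, sources))
```

Because D(X) equals N times the Shannon entropy of X's composition, ID measures how much mixing two sources raises their combined entropy.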
Affiliation(s)
- Yonge Feng
- College of Science, Inner Mongolia Agriculture University, Hohhot 010018, P. R. China
- Liaofu Luo
- School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, P. R. China
19
IN-MACA-MCC: Integrated Multiple Attractor Cellular Automata with Modified Clonal Classifier for Human Protein Coding and Promoter Prediction. Adv Bioinformatics 2014; 2014:261362. [PMID: 25132849 PMCID: PMC4123571 DOI: 10.1155/2014/261362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2014] [Revised: 06/07/2014] [Accepted: 06/14/2014] [Indexed: 11/17/2022]
Abstract
Predicting protein-coding and promoter regions is an important challenge in bioinformatics (Attwood and Teresa, 2000). Identifying these regions plays a crucial role in understanding genes. Many novel computational and mathematical methods have been introduced, and existing methods are being refined, to predict each of these regions separately; still, there is scope for improvement. We propose a classifier built with MACA (multiple attractor cellular automata) and MCC (modified clonal classifier) that predicts both regions with a single classifier. For protein-coding region prediction, the classifier is trained and tested with the Fickett and Tung (1992) datasets for DNA sequences of lengths 54, 108, and 162, and with the MMCRI datasets for DNA sequences of lengths 252 and 354. For promoter prediction, it is trained and tested with promoter sequences from the DBTSS (Yamashita et al., 2006) dataset and non-promoters from the EID (Saxonov et al., 2000) and UTRdb (Pesole et al., 2002) datasets. The proposed model predicts both regions, with an average accuracy of 90.5% for promoter prediction and 89.6% for protein-coding region prediction. The specificity and sensitivity values of promoter and protein-coding region predictions are 0.89 and 0.92, respectively.
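A multiple attractor cellular automaton classifies a pattern by letting it evolve until it falls into an attractor and then reading off the class attached to that attractor's basin. The Python sketch below shows that mechanism for a small null-boundary cellular automaton whose cells each follow rule 90 or rule 150, a simplified rule set chosen as an assumption; how IN-MACA-MCC selects its rules and combines them with the modified clonal classifier is not reproduced, and the names, rule vectors, and labels are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

State = Tuple[int, ...]


def step(state: State, rules: Sequence[int]) -> State:
    """One synchronous update of a null-boundary hybrid rule-90/150 automaton."""
    n = len(state)
    nxt = []
    for i, rule in enumerate(rules):
        left = state[i - 1] if i > 0 else 0        # null boundary on the left
        right = state[i + 1] if i < n - 1 else 0   # null boundary on the right
        if rule == 90:
            nxt.append(left ^ right)
        elif rule == 150:
            nxt.append(left ^ state[i] ^ right)
        else:
            raise ValueError("only rules 90 and 150 are supported in this sketch")
    return tuple(nxt)


def attractor(state: State, rules: Sequence[int]) -> State:
    """Evolve until a state repeats; label the cycle reached by its smallest state."""
    seen: Dict[State, int] = {}
    history: List[State] = []
    while state not in seen:
        seen[state] = len(history)
        history.append(state)
        state = step(state, rules)
    cycle = history[seen[state]:]
    return min(cycle)


def classify(pattern: State, rules: Sequence[int], attractor_labels: Dict[State, str]) -> str:
    """Map a pattern to the class label attached to the attractor it reaches."""
    return attractor_labels.get(attractor(pattern, rules), "unknown")


if __name__ == "__main__":
    rules = [150]  # a one-cell automaton with two fixed-point attractors
    labels = {(1,): "promoter", (0,): "non-promoter"}  # hypothetical basin labels
    print(classify((1,), rules, labels), classify((0,), rules, labels))
    print(attractor((1, 0, 0), [90, 90, 90]))
```

In a real MACA classifier the rule vector is chosen so that patterns of each class fall into basins of distinct attractors, and the pattern would be an encoded DNA window rather than a toy bit string.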
20
Feng Y, Lin H, Luo L. Prediction of protein secondary structure using feature selection and analysis approach. Acta Biotheor 2014; 62:1-14. [PMID: 24052343 DOI: 10.1007/s10441-013-9203-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/29/2012] [Accepted: 08/24/2013] [Indexed: 01/09/2023]
Abstract
The prediction of the secondary structure of a protein from its amino acid sequence is an important step towards the prediction of its three-dimensional structure. However, the accuracy of ab initio secondary structure prediction from sequence is currently about 80%, which is still far from satisfactory. In this study, we proposed a novel method that uses the binomial distribution to optimize tetrapeptide structural words and increment of diversity with quadratic discriminant analysis to predict the three-state secondary structure of proteins. A benchmark dataset of 2,640 proteins with sequence identity of less than 25% was used to train and test the proposed method. The results indicate that an overall accuracy of 87.8% was achieved in secondary structure prediction under ten-fold cross-validation. Moreover, the accuracy of the predicted secondary structures at the residue level ranges from 84 to 89%. These results suggest that the feature selection technique can detect the optimized tetrapeptide structural words that affect the accuracy of predicted secondary structures.
21
Chen S, Zhang CY, Song K. Recognizing short coding sequences of prokaryotic genome using a novel iteratively adaptive sparse partial least squares algorithm. Biol Direct 2013; 8:23. [PMID: 24067167 PMCID: PMC3852556 DOI: 10.1186/1745-6150-8-23] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/01/2013] [Accepted: 09/23/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Significant efforts have been made to address the problem of identifying short genes in prokaryotic genomes. However, most known methods are not effective in detecting short genes. Because of the limited information contained in short DNA sequences, it is very difficult to distinguish accurately between protein coding and non-coding sequences in prokaryotic genomes. We have developed a new Iteratively Adaptive Sparse Partial Least Squares (IASPLS) algorithm as the classifier to improve the accuracy of the identification process. RESULTS For testing, we chose short coding and non-coding sequences from seven prokaryotic organisms. We used seven feature sets (including GC content, Z-curve, etc.) of short genes. In comparison with the GeneMarkS, Metagene, Orphelia, and Heuristic Approach methods, our model achieved the best prediction performance in identifying short prokaryotic genes. Even when we focused on the very short length group ([60-100 nt)), our model provided sensitivity as high as 83.44% and specificity as high as 92.8%. These values are two or three times higher than those of three of the other methods, while Metagene fails to recognize genes in this length range. The experiments also showed that IASPLS improves identification accuracy compared with other widely used classifiers, i.e. Logistic, Random Forest (RF) and K nearest neighbors (KNN): the accuracy of IASPLS was 5.90% or more higher than that of the other methods. In addition to the improvements in accuracy, IASPLS required one-tenth of the computer time needed by KNN or RF. CONCLUSIONS We conclude that our method is preferable as an automated method of short gene classification. Its linearity and easily optimized parameters make it practicable for predicting short genes of newly sequenced or under-studied species.
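GC content is one of the seven feature sets named in the abstract. The abstract does not define the exact features, so the following is only a minimal sketch under an assumption: overall GC content plus GC content at each codon position of an in-frame sequence. The helper name `gc_features` is hypothetical.

```python
from typing import Dict


def gc_features(seq: str) -> Dict[str, float]:
    """Overall GC content plus GC content at codon positions 1-3 (GC1, GC2, GC3).

    Hypothetical feature set for illustration; assumes `seq` is an in-frame
    DNA string over A/C/G/T.
    """
    seq = seq.upper()

    def gc_fraction(s: str) -> float:
        # Fraction of G or C bases; 0.0 for an empty string.
        return (s.count("G") + s.count("C")) / len(s) if s else 0.0

    features = {"GC": gc_fraction(seq)}
    for pos in range(3):
        # seq[pos::3] collects the bases at one codon position.
        features[f"GC{pos + 1}"] = gc_fraction(seq[pos::3])
    return features


print(gc_features("ATGGCCGCGTAA"))
```

For the toy sequence above, GC is 7/12, GC1 and GC2 are 0.5, and GC3 is 0.75. Such values would form only one block of a feature vector handed to a classifier such as IASPLS.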
Affiliation(s)
- Sun Chen
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China.
22
Calculation of nucleosomal DNA deformation energy: its implication for nucleosome positioning. Chromosome Res 2012; 20:889-902. [DOI: 10.1007/s10577-012-9328-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 08/28/2012] [Revised: 11/09/2012] [Accepted: 11/15/2012] [Indexed: 10/27/2022]
23
iNuc-PhysChem: a sequence-based predictor for identifying nucleosomes via physicochemical properties. PLoS One 2012; 7:e47843. [PMID: 23144709 PMCID: PMC3483203 DOI: 10.1371/journal.pone.0047843] [Citation(s) in RCA: 165] [Impact Index Per Article: 13.8] [Received: 07/25/2012] [Accepted: 09/21/2012] [Indexed: 01/14/2023]
Abstract
Nucleosome positioning has important roles in key cellular processes. Although intensive efforts have been made in this area, the rules defining nucleosome positioning are still elusive and debated. In this study, we carried out a systematic comparison of the profiles of twelve DNA physicochemical features between nucleosomal and linker sequences in the Saccharomyces cerevisiae genome. We found that nucleosomal sequences have some position-specific physicochemical features, which can be used for in-depth study of nucleosomes. A new predictor, called iNuc-PhysChem, was developed for identifying nucleosomal sequences by incorporating these physicochemical properties into a 1788-D (dimensional) feature vector, which was further reduced to an 884-D vector via the IFS (incremental feature selection) procedure to optimize the feature set. In a cross-validation test on a benchmark dataset, iNuc-PhysChem achieved an overall success rate of over 96% in identifying nucleosomal or linker sequences. As a web-server, iNuc-PhysChem is freely accessible to the public at http://lin.uestc.edu.cn/server/iNuc-PhysChem. For the convenience of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without needing to follow the mathematics presented for completeness in developing the predictor. For those who prefer to run predictions on their own computers, the predictor's code can be downloaded from the web-server. It is anticipated that iNuc-PhysChem may become a useful high-throughput tool for both basic research and drug design.
24
Izumiyama T, Minoshima S, Yoshida T, Shimizu N. A novel big protein TPRBK possessing 25 units of TPR motif is essential for the progress of mitosis and cytokinesis. Gene 2012; 511:202-17. [PMID: 23036704 DOI: 10.1016/j.gene.2012.09.061] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 06/27/2012] [Revised: 09/07/2012] [Accepted: 09/20/2012] [Indexed: 10/27/2022]
Abstract
Through comprehensive analysis of the genomic DNA sequence of human chromosome 22, we identified a novel gene of 702 kb encoding a big protein of 2481 amino acid residues, and named it TPRBK (TPR containing big gene cloned at Keio). TPRBK possesses 25 units of the TPR motif, which is known to be associated with a diverse range of biological functions. Orthologous genes of human TPRBK were found widely in animal species, from insects to mammals, but not in plants, fungi or nematodes. Northern blotting and RT-PCR analyses revealed that the TPRBK gene is expressed ubiquitously in human and mouse fetal tissues and in various human, monkey and mouse cell lines. Immunofluorescent staining of synchronized monkey COS-7 cells with several relevant antibodies indicated that TPRBK changes its subcellular localization during the cell cycle: at interphase TPRBK is located on the centrosomes; during mitosis it translocates from the spindle poles to the mitotic spindles and then to the spindle midzone; and throughout cytokinesis it stays on the midbody. Co-immunoprecipitation assays and immunofluorescent staining with appropriate antibodies revealed that TPRBK binds to Aurora B, and that the two proteins translocate together throughout mitosis and cytokinesis. Treatment of cells with two drugs known to inhibit actin-myosin contractility (Blebbistatin and Y-27632) disturbed the proper intracellular localization of TPRBK. Moreover, knockdown of TPRBK expression by small interfering RNA (siRNA) suppressed the bundling of spindle midzone microtubules and disrupted midbody formation, arresting the cells at G(2)+M phase. These observations indicate that TPRBK is essential for the formation and integrity of the midbody; hence we postulate that TPRBK plays a critical role in the progress of mitosis and cytokinesis during the mammalian cell cycle.
Affiliation(s)
- Tomohiro Izumiyama
- Advanced Research Center for Genome Super Power, Keio University, Tsukuba, Japan
25
Song K, Zhang Z, Tong TP, Wu F. Classifier assessment and feature selection for recognizing short coding sequences of human genes. J Comput Biol 2012; 19:251-60. [PMID: 22401589 DOI: 10.1089/cmb.2011.0078] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
With the ever-increasing pace of genome sequencing, there is a great need for fast and accurate computational tools to identify genes in these genomes automatically. Although great progress has been made in the development of gene-finding algorithms during the past decades, there is still room for improvement. In particular, recognizing short exons in eukaryotes has still not been solved satisfactorily. This article assesses various linear and kernel-based classification algorithms and selects the best combination of Z-curve features to address this problem. Eight state-of-the-art linear and kernel-based supervised pattern recognition techniques were used to identify the short (21-192 bp) coding sequences of human genes. Judged by prediction accuracy, the tradeoff between sensitivity and specificity, and time consumption, partial least squares (PLS) and kernel partial least squares (KPLS) were found to be the best linear and kernel-based classifiers, respectively. A surprising result was that, by making good use of the interpretability of the PLS and Z-curve methods, a combination of 93 Z-curve features was shown to be the best selection. Using them with KPLS, the average recognition accuracy was improved by as much as 7.7% compared with that obtained by Fisher discriminant analysis using 189 Z-curve variables (Gao and Zhang, 2004). The code used is freely available from the following sources (implemented in MATLAB and supported on Linux and MS Windows): (1) SVM: http://www.support-vector-machines.org/SVM_soft.html. (2) GP: http://www.gaussianprocess.org. (3) KPLS and KFDA: Shawe-Taylor, J., and Cristianini, N. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK. (4) PLS: Wise, B.M., and Gallagher, N.B. 2011. PLS-Toolbox for use with MATLAB: ver 1.5.2. Eigenvector Technologies, Manson, WA.
Supplementary Material for this article is available at www.liebertonline.com/cmb.
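The abstract relies on Z-curve features without defining them. As a hedged reference point, the sketch below computes only the simplest phase-specific variant: for each codon position, three Z-curve components from the base frequencies (purine vs. pyrimidine, amino vs. keto, weak vs. strong hydrogen bonding), giving nine variables. The 93- and 189-variable sets discussed in the abstract are larger and are not reproduced here; the function name `zcurve_phase9` is hypothetical.

```python
from typing import List


def zcurve_phase9(seq: str) -> List[float]:
    """Nine phase-specific Z-curve variables: (x_i, y_i, z_i) for codon positions i = 1..3.

    With base frequencies a, c, g, t at one codon position:
      x = (a + g) - (c + t)   purine vs. pyrimidine
      y = (a + c) - (g + t)   amino vs. keto
      z = (a + t) - (g + c)   weak vs. strong hydrogen bonding
    """
    seq = seq.upper()
    features: List[float] = []
    for pos in range(3):
        bases = seq[pos::3]  # bases at this codon position
        n = len(bases)
        if n == 0:
            features.extend([0.0, 0.0, 0.0])
            continue
        a, c, g, t = (bases.count(b) / n for b in "ACGT")
        features.extend([(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)])
    return features


print(zcurve_phase9("ATGGCCGCGTAA"))
```

A linear classifier such as PLS, or a kernel method such as KPLS, would then be trained on vectors like this one; feature selection of the kind described above operates on a larger vector of the same family.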
Affiliation(s)
- Kai Song
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, China.
26
Sequence-dependent prediction of recombination hotspots in Saccharomyces cerevisiae. J Theor Biol 2012; 293:49-54. [DOI: 10.1016/j.jtbi.2011.10.004] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 06/22/2011] [Revised: 10/04/2011] [Accepted: 10/04/2011] [Indexed: 11/18/2022]
27
Yin PY, Shyu SJ, Yang SR, Chang YC. Reinforcement Learning for Improving Gene Identification Accuracy by Combination of Gene-Finding Programs. INTERNATIONAL JOURNAL OF APPLIED METAHEURISTIC COMPUTING 2012. [DOI: 10.4018/jamc.2012010104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Due to the explosive growth of the genome database, gene discovery has become one of the most computationally intensive tasks in bioinformatics. Many systems have been developed to find genes; however, there is still room to improve prediction accuracy. This paper proposes a reinforcement learning model for combining gene predictions from existing gene-finding programs. The model learns the optimal policy for accepting the best predictions. The fitness of a policy is reinforced if the selected prediction at a nucleotide site corresponds to the true annotation. The model searches for the optimal policy that maximizes the expected prediction accuracy over all nucleotide sites in the sequences. The experimental results demonstrate that the proposed model yields higher prediction accuracy than the single best program.
28
Zhu P, Bowden P, Zhang D, Marshall JG. Mass spectrometry of peptides and proteins from human blood. MASS SPECTROMETRY REVIEWS 2011; 30:685-732. [PMID: 24737629 DOI: 10.1002/mas.20291] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Received: 07/18/2008] [Revised: 12/09/2009] [Accepted: 01/19/2010] [Indexed: 06/03/2023]
Abstract
It is difficult to convey the accelerating rate and growing importance of mass spectrometry applications to human blood proteins and peptides. Mass spectrometry can rapidly detect and identify the ionizable peptides from the proteins in a simple mixture and reveal many of their post-translational modifications. However, blood is a complex mixture that may contain many proteins first expressed in cells and tissues. The complete analysis of blood proteins is a daunting task that will rely on a wide range of disciplines, including physics, chemistry, biochemistry, genetics, electromagnetic instrumentation, mathematics and computation. The comprehensive discovery and analysis of blood proteins will therefore rank among the great technical challenges and will require bringing together many of humanity's scientific achievements. A variety of methods have been used to fractionate, analyze and identify proteins from blood, each yielding a small piece of the whole and throwing the great size of the task into sharp relief. The approaches attempted to date clearly indicate that enumerating the proteins and peptides of blood can be accomplished. There is no doubt that the mass spectrometry of blood will be crucial to the discovery and analysis of proteins, enzyme activities, and post-translational processes that underlie the mechanisms of disease. At present, both discovery and quantification of proteins from blood commonly reach sensitivities of ∼1 ng/mL.
Affiliation(s)
- Peihong Zhu
- Department of Chemistry and Biology, Ryerson University, 350 Victoria Street, Toronto, Ontario, Canada M5B 2K3
29
Jin J, An J. Robust discriminant analysis and its application to identify protein coding regions of rice genes. Math Biosci 2011; 232:96-100. [PMID: 21575644 DOI: 10.1016/j.mbs.2011.04.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/21/2010] [Revised: 04/18/2011] [Accepted: 04/25/2011] [Indexed: 10/18/2022]
Abstract
Identification of protein coding regions is fundamentally a statistical pattern recognition problem. Discriminant analysis is a statistical technique for classifying a set of observations into predefined classes, and it is useful for solving such problems. It is well known that outliers are present in virtually every data set in any application domain, and classical discriminant analysis methods (including linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA)) do not work well when the data set contains outliers. To overcome this difficulty, a robust statistical method is used in this paper. We choose four different coding characters as discriminant variables, and the robust discriminant analysis method yields favorable results.
Affiliation(s)
- Jiao Jin
- Department of Statistics and Financial Mathematics, School of Mathematical Sciences, Beijing Normal University, Ministry of Education, Beijing, China
30
Xu S, Rao N, Chen X, Zhou B. Inferring an organism-specific optimal threshold for predicting protein coding regions in eukaryotes based on a bootstrapping algorithm. Biotechnol Lett 2011; 33:889-96. [DOI: 10.1007/s10529-011-0525-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/29/2010] [Accepted: 01/06/2011] [Indexed: 11/25/2022]
31
Eukaryotic and prokaryotic promoter prediction using hybrid approach. Theory Biosci 2010; 130:91-100. [DOI: 10.1007/s12064-010-0114-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 08/04/2010] [Accepted: 10/23/2010] [Indexed: 12/27/2022]
32
Zhao X, Pei Z, Liu J, Qin S, Cai L. Prediction of nucleosome DNA formation potential and nucleosome positioning using increment of diversity combined with quadratic discriminant analysis. Chromosome Res 2010; 18:777-85. [PMID: 20953693 DOI: 10.1007/s10577-010-9160-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/20/2010] [Revised: 09/17/2010] [Accepted: 09/30/2010] [Indexed: 10/18/2022]
Abstract
In this work, a novel method was developed to distinguish nucleosome DNA from linker DNA based on increment of diversity combined with quadratic discriminant analysis (IDQD), using the k-mer frequencies of nucleotides in the genome. When used to predict DNA potential for forming nucleosomes, the model achieved accuracies of 94.94%, 77.60%, and 86.81% for Saccharomyces cerevisiae, Homo sapiens, and Drosophila melanogaster, respectively. The area under the receiver operating characteristic curve of our classifier was 0.982 for S. cerevisiae. Our results indicate that DNA sequence preference is critical for nucleosome formation potential and is likely conserved across eukaryotes. The model successfully identified nucleosome-enriched or nucleosome-depleted regions in the S. cerevisiae genome, suggesting that nucleosome positioning depends on DNA sequence preference. Thus, the IDQD classifier is useful for predicting nucleosome positioning.
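Several entries in this list use the increment of diversity (ID). The sketch below follows the definition commonly given in the IDQD literature, stated here as our understanding rather than taken from this abstract: for k-mer counts n_i with total N, the diversity is D = N ln N - Σ n_i ln n_i, and ID(X, Y) = D(X + Y) - D(X) - D(Y), which is zero when X and Y have identical k-mer proportions and grows as they diverge. The toy sequences and function names are hypothetical, and the quadratic discriminant stage of IDQD is omitted.

```python
import math
from collections import Counter
from typing import Mapping


def kmer_counts(seq: str, k: int) -> Counter:
    """Counts of overlapping k-mers in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def diversity(counts: Mapping[str, int]) -> float:
    """D = N ln N - sum_i n_i ln n_i, with N = sum_i n_i (0 ln 0 taken as 0)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts.values() if n > 0)


def increment_of_diversity(x: Mapping[str, int], y: Mapping[str, int]) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller means X resembles source Y more."""
    merged = Counter(x) + Counter(y)
    return diversity(merged) - diversity(x) - diversity(y)


# Toy example with k = 2: the query should be closer to class A than to class B.
query = kmer_counts("ACGTACGT", 2)
class_a = kmer_counts("ACGTACGTACGT", 2)
class_b = kmer_counts("AAAAAAAATTTT", 2)
print(increment_of_diversity(query, class_a) < increment_of_diversity(query, class_b))  # True
```

In the full method, ID values computed against each class (for one or several k) serve as inputs to a quadratic discriminant rather than being compared directly as done here.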
Affiliation(s)
- Xiujuan Zhao
- School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
33
Wu J. Testing the coding potential of conserved short genomic sequences. Adv Bioinformatics 2010; 2010:287070. [PMID: 20224812 PMCID: PMC2834954 DOI: 10.1155/2010/287070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2009] [Accepted: 01/02/2010] [Indexed: 11/25/2022]
Abstract
Proposed is a procedure to test whether a genomic sequence contains coding DNA, called a coding potential region. The procedure tests the coding potential of conserved short genomic sequence, in which the assumptions on the probability models of gene structures are relaxed. Thus, it is expected to provide additional candidate regions that contain coding DNAs to the current genomic database. The procedure was applied to the set of highly conserved human-mouse sequences in the genome database at the University of California at Santa Cruz. For sequences containing RefSeq coding exons, the procedure detected 91.3% regions having coding potential in this set, which covers 83% of the human RefSeq coding exons, at a 2.6% false positive rate. The procedure detected 12,688 novel short regions with coding potential at the false discovery rate <0.05; 65.7% of the novel regions are between annotated genes.
Affiliation(s)
- Jing Wu
- Department of Statistics, Carnegie Mellon University, PA 15213, USA
34
Bowden P, Pendrak V, Zhu P, Marshall JG. Meta sequence analysis of human blood peptides and their parent proteins. J Proteomics 2010; 73:1163-75. [PMID: 20170764 DOI: 10.1016/j.jprot.2010.02.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 09/03/2009] [Revised: 01/23/2010] [Accepted: 02/09/2010] [Indexed: 11/19/2022]
Abstract
Sequence analysis of blood peptides and their qualities will be key to understanding the mechanisms that contribute to error in LC-ESI-MS/MS. Analysis of peptides and their proteins at the level of sequences is much more direct and informative than the comparison of disparate accession numbers. A portable database of all blood peptide and protein sequences with descriptor fields and gene ontology terms might be useful for designing immunological or MRM assays for human blood. The results of twelve studies of human blood peptides and/or proteins identified by LC-MS/MS and correlated against a disparate array of genetic libraries were parsed and matched to proteins from the human ENSEMBL, SwissProt and RefSeq databases by SQL. The reported peptide and protein sequences were organized into an SQL database with full protein sequences and up to five unique peptides in order of prevalence, along with the peptide count for each protein. Structured query language or BLAST was used to acquire descriptive information from current databases. Sampling error at the level of peptides is the largest source of disparity between groups. Chi-square analysis of peptide-to-protein distributions confirmed significant agreement between groups on identified proteins.
Affiliation(s)
- Peter Bowden
- Department of Chemistry and Biology, Ryerson University, Toronto, Canada
35
Abstract
The occupancy of nucleosomes along chromosomes is a key factor in gene regulation. However, except for promoter regions, the genome-wide properties and functions of nucleosome organization remain unclear in mammalian genomes. Using the computational model of Increment of Diversity with Quadratic Discriminant (IDQD) trained on microarray data, the nucleosome occupancy score (NOScore) was defined and applied to splice junction regions of constitutive, cassette exon, alternative 3′ and alternative 5′ splicing events in the human genome. We found an interesting relation between NOScore and RNA splicing: exon regions have higher NOScores than their flanking intron sequences in both constitutive and alternative splicing events, indicating the stronger nucleosome occupation potential of exon regions. In addition, NOScore valleys are present ∼25 bp upstream of the acceptor site in all splicing events. By defining a folding diversity-to-energy ratio to describe RNA structural flexibility, we demonstrated that primary RNA transcripts from nucleosome-occupied regions are relatively rigid and those from nucleosome-depleted regions are relatively flexible. The negative correlation between nucleosome occupation/depletion of a DNA sequence and the structural flexibility/rigidity of its primary transcript around splice junctions may provide clues to a deeper understanding of the unexpected role of nucleosome organization in the regulation of RNA splicing.
Affiliation(s)
- Wei Chen
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
36
Bowden P, Beavis R, Marshall J. Tandem mass spectrometry of human tryptic blood peptides calculated by a statistical algorithm and captured by a relational database with exploration by a general statistical analysis system. J Proteomics 2009; 73:103-11. [PMID: 19703602 DOI: 10.1016/j.jprot.2009.08.004] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 05/08/2009] [Revised: 08/04/2009] [Accepted: 08/17/2009] [Indexed: 01/23/2023]
Abstract
A goodness-of-fit test may be used to assign tandem mass spectra of peptides to amino acid sequences and to calculate directly the expected probability of misidentification. The product of the peptide expectation values directly yields the probability that the parent protein has been misidentified. A relational database could capture the mass spectral data and the best-fit results, and permit subsequent calculations by a general statistical analysis system. The many files of the HUPO blood protein data correlated by X!TANDEM against the proteins of ENSEMBL were collected into a relational database. A redundant set of 247,077 proteins and peptides was correlated by X!TANDEM and collapsed to a set of 34,956 peptides from 13,379 distinct proteins. About 6875 distinct proteins were represented by only a single distinct peptide, 2866 proteins showed 2 distinct peptides, and 3454 proteins showed at least three distinct peptides by X!TANDEM. More than 99% of the peptides were associated with proteins that had cumulative expectation values, i.e. probability of false positive identification, of one in one hundred or less. The distribution of peptides per protein from X!TANDEM was significantly different from that expected from random assignment of peptides.
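The abstract states that the product of the peptide expectation values gives the probability that the parent protein is misidentified. A minimal sketch of that arithmetic follows; treating peptide assignments as independent is an assumption implied by the product rule, and the function names and example values are hypothetical. The log10 form is included because products of many small values can underflow.

```python
import math
from typing import Sequence


def protein_expectation(peptide_evalues: Sequence[float]) -> float:
    """Protein expectation value as the product of its peptides' expectation values.

    Assumes the peptide assignments behave as independent events.
    """
    return math.prod(peptide_evalues)


def log10_protein_expectation(peptide_evalues: Sequence[float]) -> float:
    """Same quantity on a log10 scale; summing logs avoids floating-point underflow."""
    return sum(math.log10(e) for e in peptide_evalues)


e = protein_expectation([0.05, 0.02])
print(e)          # about 0.001
print(e <= 0.01)  # True: meets a one-in-one-hundred threshold
```

With two peptides at expectation values of 0.05 and 0.02, the protein-level value is about 0.001, which would fall within the "one in one hundred or less" criterion mentioned in the abstract.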
Affiliation(s)
- Peter Bowden
- Department of Chemistry and Biology, Ryerson University, 350 Victoria Street, Toronto, ON, Canada M5B 2K3
37
Chen W, Luo L. Classification of antimicrobial peptide using diversity measure with quadratic discriminant analysis. J Microbiol Methods 2009; 78:94-6. [PMID: 19348863 DOI: 10.1016/j.mimet.2009.03.013] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 02/05/2009] [Revised: 03/20/2009] [Accepted: 03/30/2009] [Indexed: 11/27/2022]
Abstract
Accurate classification of antimicrobial peptides according to their biological activities will facilitate the design of novel antimicrobial agents and the discovery of new therapeutic targets. In this work, the Increment of Diversity with Quadratic Discriminant analysis (IDQD) algorithm was proposed to classify antimicrobial peptides with diverse biological activities.
Affiliation(s)
- Wei Chen
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
38
Jost D, Everaers R. Genome wide application of DNA melting analysis. JOURNAL OF PHYSICS. CONDENSED MATTER : AN INSTITUTE OF PHYSICS JOURNAL 2009; 21:034108. [PMID: 21817253 DOI: 10.1088/0953-8984/21/3/034108] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
Abstract
Correspondences between functional and thermodynamic melting properties in a genome are increasingly being employed for ab initio gene finding and for interpreting the evolution of genomes. Here we present the first systematic genome-wide comparison between biologically coding domains and thermodynamically stable regions. In particular, we develop statistical methods to estimate the reliability of the resulting predictions. Not surprisingly, we find that the success of the approach depends on the difference in GC content between the coding and non-coding parts of the genome and on the percentage of coding base pairs in the sequence. These prerequisites vary strongly between species, and we observe no systematic differences between eukaryotes and prokaryotes. We find a number of organisms in which the strong correlation between coding domains and thermodynamically stable regions allows us to identify putative exons or genes to complement existing approaches. In contrast to previous investigations along these lines, we have not employed the Poland-Scheraga (PS) model of DNA melting but use the earlier Zimm-Bragg (ZB) model. The Ising-like form of the ZB model can be viewed as an approximation to the PS model, with averaged loop entropies included in the cooperativity factor. This results in a speed-up by a factor of 20-100 compared with the Fixman-Freire algorithm for the solution of the PS model. We show that for genomic sequences the resulting systematic errors are negligible compared with the parameterization uncertainty of the models. We argue that, with limited computing resources, available CPU power is better invested in broadening the statistical base for genomic investigations than in marginal improvements of the description of the physical melting behavior.
Affiliation(s)
- Daniel Jost
- Laboratoire de Physique de l'École Normale Supérieure de Lyon, Université de Lyon, CNRS UMR 5672, 46 Allée d'Italie 69364 Lyon Cedex 07, France
39
Zhang MQ. Using MZEF to find internal coding exons. CURRENT PROTOCOLS IN BIOINFORMATICS 2008; Chapter 4:Unit 4.2. [PMID: 18792940 DOI: 10.1002/0471250953.bi0402s00] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
MZEF (Michael Zhang's Exon Finder) was designed to help identify one of the most important classes of exons, i.e. the internal coding exons, in human genomic DNA sequences. It is neither for predicting intronless genes, nor for assembling predicted exons into complete gene models. There is also a mouse version (mMZEF) and an Arabidopsis version (aMZEF). This unit presents the Unix and Web versions of MZEF and reviews how to interpret the MZEF results.
Affiliation(s)
- Michael Q Zhang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
40
Lu J, Luo L, Zhang Y. Distance conservation of transcription regulatory motifs in human promoters. Comput Biol Chem 2008; 32:433-7. [PMID: 18722813 DOI: 10.1016/j.compbiolchem.2008.07.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/04/2007] [Revised: 03/20/2008] [Accepted: 07/02/2008] [Indexed: 10/21/2022]
Abstract
Understanding the interaction network among transcription regulatory elements in humans is an immediate challenge for modern molecular biology. A central problem is how to extract evolutionary information and detect evolutionary conservation by comparing the promoters of closely related species. Through comparative studies of k-mer distributions in human and mouse transcription factor binding site (TFBS) sequences, we discovered that the average distance between a pair of transcription regulatory 7-mer motifs is conserved in human-mouse promoters. Distance conservation is a new kind of evolutionary conservation that does not depend on the strict location of bases in the genome sequence. Exploiting the conservation of k-mer distance should help in designing a non-alignment-based approach for fast genome-wide discovery of transcription regulatory motifs. We demonstrated distance conservation through a genome-wide search for conserved regulatory 7-mer motifs, with a success rate of 90%. Then, after defining a human-mouse pair-distance divergence parameter, we studied tissue-specific motif pairs and found that the parameter for motif pairs is 11-16 times smaller than for their controls in 28 tissues, and that these pairs can be clearly differentiated on a two-dimensional parameter plane. Finally, we briefly discuss the mechanism of distance conservation, which is thought to be related to the modular structure of TFBSs.
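The abstract does not spell out how the average distance between two 7-mer motifs is computed. One plausible reading, shown here only as a hypothetical sketch, is the mean absolute offset between every occurrence of motif A and every occurrence of motif B in a promoter sequence; comparing this value between orthologous human and mouse promoters would then test distance conservation. The function names, the distance definition, and the toy sequence are all assumptions.

```python
from itertools import product
from typing import List, Optional


def occurrences(seq: str, motif: str) -> List[int]:
    """Start positions of exact (possibly overlapping) matches of motif in seq."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


def mean_pair_distance(seq: str, motif_a: str, motif_b: str) -> Optional[float]:
    """Mean |pos_b - pos_a| over all occurrence pairs; None if either motif is absent.

    Hypothetical definition of the 'average distance' between two motifs.
    """
    pos_a, pos_b = occurrences(seq, motif_a), occurrences(seq, motif_b)
    if not pos_a or not pos_b:
        return None
    dists = [abs(j - i) for i, j in product(pos_a, pos_b)]
    return sum(dists) / len(dists)


# Toy promoter: motif A at position 0, motif B at position 11.
print(mean_pair_distance("GGGCGGGAAAATGACTCA", "GGGCGGG", "TGACTCA"))  # 11.0
```

Because no alignment is involved, the same function can be applied to human and mouse sequences independently, which matches the non-alignment-based spirit described above.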
Affiliation(s)
- Jun Lu
- Laboratory of Theoretical Biophysics, Faculty of Science and Technology, Inner Mongolia University, Hohhot, China
41
Abstract
The CorePromoter program is very useful for identification of transcriptional start sites (TSS) and core promoter regions when 5'-upstream genomic DNA sequences of human genes are available. It is very simple to use and can be accessed either through the Web or after downloading to a local computer. The protocols in this unit introduce its basic methodology and discuss how to apply it to a sample problem in conjunction with other gene-finding programs.
Affiliation(s)
- Michael Q Zhang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
42
Davuluri RV. Application of FirstEF to find promoters and first exons in the human genome. Curr Protoc Bioinformatics 2008; Chapter 4:Unit4.7. [PMID: 18428702 DOI: 10.1002/0471250953.bi0407s01] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
Predicting first exons and promoters is an important part of gene finding in DNA sequence analysis. This unit presents FirstEF as a method for predicting the first exons and promoters. A combines FirstEF predictions with other information such as cDNA/EST matches.
43
Feng Y, Luo L. Use of tetrapeptide signals for protein secondary-structure prediction. Amino Acids 2008; 35:607-14. [PMID: 18431531 DOI: 10.1007/s00726-008-0089-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 12/15/2007] [Accepted: 03/04/2008] [Indexed: 10/22/2022]
Abstract
This paper develops a novel sequence-based method for protein secondary-structure prediction: tetra-peptide-based increment of diversity with quadratic discriminant analysis (TPIDQD). TPIDQD is based on tetra-peptide signals and predicts the structure of the central residue of a sequence fragment. The three-state overall per-residue accuracy (Q3) is about 80% in a threefold cross-validation test on 21-residue fragments from the CB513 dataset. Accuracy can be improved further by taking long-range sequence information (fragments longer than 21 residues) into account. The results show that tetra-peptide signals can indeed reflect some relationship between a protein's amino acid sequence and its secondary structure, indicating the importance of tetra-peptide signals as a protein folding code in protein structure prediction.
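Increment of diversity (ID) is the core quantity named in TPIDQD. This Python sketch assumes the form commonly used in this line of work:

- D(X) = N ln N − Σ n_i ln n_i, where n_i are the tetra-peptide counts of a source with N total counts.
- ID(X, Y) = D(X + Y) − D(X) − D(Y).

ID is zero when two count vectors are proportional and grows as their compositions diverge. The quadratic discriminant step is omitted. The helper names and toy "standard sources" are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def tetrapeptide_counts(fragment: str) -> Counter:
    """Count overlapping 4-residue words (tetra-peptides) in a fragment."""
    return Counter(fragment[i:i + 4] for i in range(len(fragment) - 3))


def diversity(counts: Counter) -> float:
    """D = N ln N - sum(n_i ln n_i); zero counts contribute nothing."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(c * math.log(c) for c in counts.values() if c > 0)


def increment_of_diversity(x: Counter, y: Counter) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); >= 0, and 0 when X and Y are proportional."""
    return diversity(x + y) - diversity(x) - diversity(y)


# Hypothetical "standard sources" built from toy training fragments.
helix_source = tetrapeptide_counts("AEELLKKAEELLKK")
strand_source = tetrapeptide_counts("VTVTVIVTVTVIVT")
query = tetrapeptide_counts("AEELLKKA")
# A smaller ID means the query's composition is closer to that source.
print(increment_of_diversity(query, helix_source) < increment_of_diversity(query, strand_source))
```

In a TPIDQD-style predictor, ID values against each class source would become inputs to the discriminant analysis, not a final decision on their own.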
Affiliation(s)
- Yonge Feng
- Laboratory of Theoretical Biophysics, Faculty of Science and Technology, Inner Mongolia University, Hohhot, 010021, China.
44
Abstract
As the number of sequenced genomes increases, the ability to deduce genome function becomes increasingly salient. For many genome sequences, the only annotation that will be available for the foreseeable future will be based on computational predictions and comparisons with functional elements in related species. Here we discuss computational approaches for automated genome-wide annotation of functional elements in mammalian genomes. These include methods for ab initio and comparative gene-structure predictions. Gene features such as intron splice sites, 3' untranslated regions, promoters, and cis-regulatory elements are discussed, as is a novel method for predicting DNaseI hypersensitive sites. Recent methodologies for predicting noncoding RNA genes, including microRNA genes and their targets, are also reviewed.
Affiliation(s)
- Steven J M Jones
- Genome Sciences Centre, British Columbia Cancer Research Center, Vancouver, British Columbia, V5Z 1L3, Canada.
45
An artificial neural network method for combining gene prediction based on equitable weights. Neurocomputing 2008. [DOI: 10.1016/j.neucom.2007.07.019] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
46
Yin C, Yau SST. Prediction of protein coding regions by the 3-base periodicity analysis of a DNA sequence. J Theor Biol 2007; 247:687-94. [PMID: 17509616 DOI: 10.1016/j.jtbi.2007.03.038] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.0] [Received: 10/22/2006] [Revised: 03/24/2007] [Accepted: 03/26/2007] [Indexed: 11/30/2022]
Abstract
With the exponential growth of genomic sequences, there is an increasing demand to accurately identify protein coding regions (exons) in genomic sequences. Despite much progress in the computational identification of protein coding regions over the last two decades, the performance and efficiency of prediction methods still need to be improved. In addition, it is essential to develop different prediction methods, since combining them may greatly improve prediction accuracy. This paper develops a new method to predict protein coding regions, based on the fact that most exon sequences have a 3-base periodicity, whereas intron sequences do not. The method computes the 3-base periodicity and the background noise of stepwise DNA segments of the target sequence, using the nucleotide distributions at the three codon positions. Exon and intron sequences can then be identified from trends in the ratio of the 3-base periodicity to the background noise. Case studies on genes from different organisms show that this method is an effective approach for exon prediction.
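The abstract does not state its exact formulas, so this Python sketch rests on two assumptions.

First, the 3-base periodicity of a segment is taken as the Fourier power at frequency N/3, summed over the four nucleotide indicator sequences. Grouping positions by codon position (i mod 3) reduces this to three counts c0, c1, c2 per nucleotide, giving c0² + c1² + c2² − c0c1 − c1c2 − c0c2.

Second, the background noise is taken as the mean power over all frequencies. By Parseval's theorem, this equals the number of A/C/G/T bases in the segment.

The window and step sizes are arbitrary placeholders.

```python
from typing import Dict, List, Tuple


def position_counts(segment: str) -> Dict[str, List[int]]:
    """Count each nucleotide at codon positions 0, 1, 2 (i mod 3); other symbols are skipped."""
    counts = {base: [0, 0, 0] for base in "ACGT"}
    for i, base in enumerate(segment.upper()):
        if base in counts:
            counts[base][i % 3] += 1
    return counts


def period3_power(segment: str) -> int:
    """Sum over A, C, G, T of |DFT at frequency N/3|^2 of the indicator sequence."""
    total = 0
    for c0, c1, c2 in position_counts(segment).values():
        total += c0 * c0 + c1 * c1 + c2 * c2 - c0 * c1 - c1 * c2 - c0 * c2
    return total


def signal_to_noise(segment: str) -> float:
    """Period-3 power divided by the mean spectral power (assumed noise measure)."""
    n = sum(1 for base in segment.upper() if base in "ACGT")
    return period3_power(segment) / n if n else 0.0


def sliding_ratio(seq: str, window: int = 120, step: int = 30) -> List[Tuple[int, float]]:
    """Ratio for stepwise windows; high values suggest coding (exon-like) segments."""
    return [
        (start, signal_to_noise(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


print(signal_to_noise("ATGATGATG"))  # 3.0: strongly periodic toy segment
print(signal_to_noise("AAAAAAAAA"))  # 0.0: no period-3 signal
```

Working from codon-position counts avoids computing a full FFT per window. This is the practical appeal of a count-based formulation.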
Affiliation(s)
- Changchuan Yin
- Department of Mathematics, Statistics and Computer Science, The University of Illinois at Chicago, M/C 249, Chicago, IL 60607-7045, USA
47
Yang H, Sasaki T, Minoshima S, Shimizu N. Identification of three novel proteins (SGSM1, 2, 3) which modulate small G protein (RAP and RAB)-mediated signaling pathway. Genomics 2007; 90:249-60. [PMID: 17509819 DOI: 10.1016/j.ygeno.2007.03.013] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Received: 11/29/2006] [Revised: 03/20/2007] [Accepted: 03/26/2007] [Indexed: 01/12/2023]
Abstract
We report a novel protein family of three members, each of which contains RUN and TBC motifs and appears to be associated with small G protein-mediated signal transduction pathways. We named these proteins small G protein signaling modulators (SGSM1/2/3). Northern blot analysis revealed that human SGSM2/3 are expressed ubiquitously in various tissues, whereas SGSM1 is expressed mainly in brain, heart, and testis. The mouse possesses the same family of genes, and in situ hybridization and immunohistochemical staining of tissue sections revealed that mouse Sgsm1/2/3 are expressed in neurons of the central nervous system, indicating a strong association of the Sgsm family with neuronal function. Furthermore, immunofluorescence microscopy localized endogenous Sgsm1 protein to the trans-Golgi network of mouse Neuro2a cells. Expression of various cDNA constructs followed by immunoprecipitation assays revealed that human SGSM1/2/3 proteins coprecipitate with RAP and RAB subfamily members of the small G protein superfamily. Based on these results, we postulated that SGSM family members function as modulators of small G protein RAP- and RAB-mediated neuronal signal transduction and vesicular transport pathways.
Affiliation(s)
- Hao Yang
- Department of Molecular Biology, Keio University School of Medicine, Tokyo 160-8582, Japan
48
Bernal A, Crammer K, Hatzigeorgiou A, Pereira F. Global discriminative learning for higher-accuracy computational gene prediction. PLoS Comput Biol 2007; 3:e54. [PMID: 17367206 PMCID: PMC1828702 DOI: 10.1371/journal.pcbi.0030054] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Received: 08/16/2006] [Accepted: 02/01/2007] [Indexed: 11/18/2022]
Abstract
Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVM) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns. We describe a new approach to statistical learning for sequence data that is broadly applicable to computational biology problems and that has experimentally demonstrated advantages over current hidden Markov model (HMM)-based methods for sequence analysis. The methods we describe in this paper, implemented in the CRAIG program, allow researchers to modularly specify and train sequence analysis models that combine a wide range of weakly informative features into globally optimal predictions. 
Our results for the gene prediction problem show significant improvements over existing ab initio gene predictors on a variety of tests, including the especially challenging ENCODE regions. Such improved predictions, particularly on initial and single exons, could benefit researchers who are seeking more accurate means of recognizing such important features as signal peptides and regulatory regions. More generally, we believe that our method, by combining the structure-describing capabilities of HMMs with the accuracy of margin-based classification methods, provides a general tool for statistical learning in biological sequences that will replace HMMs in any sequence modeling task for which there is annotated training data.
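CRAIG's discriminative training is easiest to understand by contrast with the HMM decoding that conventional ab initio predictors rely on. That decoding step can be sketched as Viterbi decoding. The two-state model below uses a GC-rich "exon-like" state and an AT-rich "intron-like" state. The model and all of its probabilities are hypothetical toy values. Real genefinders use many more states, together with signal models for splice sites and start/stop codons. This sketch is not CRAIG's semi-Markov CRF.

```python
import math
from typing import Dict, List, Sequence

Probs = Dict[str, float]


def viterbi(seq: str, states: Sequence[str], start_p: Probs,
            trans_p: Dict[str, Probs], emit_p: Dict[str, Probs]) -> List[str]:
    """Most probable hidden-state path under an HMM, computed in log space."""
    if not seq:
        return []
    # best[i][s]: log-probability of the best path that ends in state s at position i
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(seq)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[i - 1][r] + math.log(trans_p[r][s]))
            best[i][s] = (best[i - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][seq[i]]))
            back[i][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for i in range(len(seq) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy model: "E" = GC-rich exon-like state, "I" = AT-rich intron-like state.
STATES = ("E", "I")
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {
    "E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

print("".join(viterbi("GCGCGCATATATAT", STATES, START, TRANS, EMIT)))  # EEEEEEIIIIIIII
```

In this kind of model, the transition and emission tables are trained separately and then combined. The CRAIG abstract identifies that piecewise training as the limitation it addresses.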
Affiliation(s)
- Axel Bernal
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
49
Abstract
The BioPerl toolkit provides a library of hundreds of routines for processing sequence, annotation, alignment, and sequence analysis reports. It often serves as a bridge between different computational biology applications assisting the user to construct analysis pipelines. This chapter illustrates how BioPerl facilitates tasks such as writing scripts summarizing information from BLAST reports or extracting key annotation details from a GenBank sequence record.
50
Knapp K, Chen YPP. An evaluation of contemporary hidden Markov model genefinders with a predicted exon taxonomy. Nucleic Acids Res 2006; 35:317-24. [PMID: 17170005 PMCID: PMC1802560 DOI: 10.1093/nar/gkl1026] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 09/21/2006] [Revised: 11/13/2006] [Accepted: 11/13/2006] [Indexed: 11/15/2022]
Abstract
We present an independent evaluation of six recent hidden Markov model (HMM) genefinders. Each was tested on a new dataset (FSH298), and the results showed no dramatic improvement over the genefinders tested five years earlier. In addition, we introduce a comprehensive taxonomy of predicted exons and classify each predicted exon accordingly. These results are useful for measuring, with finer granularity, the effects of changes to a genefinder. We present an analysis of these results and identify four patterns of inaccuracy common to all HMM-based results.
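The abstract does not list the categories in the authors' taxonomy, so this Python sketch uses a coarser stand-in. It labels each predicted exon as an exact match, a partial overlap with an annotated exon, or wrong (no overlap). It reports annotated exons that no prediction touches as missed. It also computes the conventional exon-level sensitivity and specificity. Exons are assumed to be inclusive (start, end) coordinate pairs on one strand, and all names are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # inclusive (start, end) coordinates on one strand


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_predicted_exons(predicted: List[Exon], annotated: List[Exon]) -> List[str]:
    """Label each prediction 'exact', 'partial' (overlaps an annotated exon) or 'wrong'."""
    annotated_set = set(annotated)
    labels = []
    for exon in predicted:
        if exon in annotated_set:
            labels.append("exact")
        elif any(overlaps(exon, ref) for ref in annotated):
            labels.append("partial")
        else:
            labels.append("wrong")
    return labels


def missed_exons(predicted: List[Exon], annotated: List[Exon]) -> List[Exon]:
    """Annotated exons that no prediction overlaps at all."""
    return [ref for ref in annotated if not any(overlaps(ref, p) for p in predicted)]


def exon_sn_sp(predicted: List[Exon], annotated: List[Exon]) -> Tuple[float, float]:
    """Exon-level sensitivity and specificity: exact matches over annotated / predicted counts."""
    exact = len(set(predicted) & set(annotated))
    sn = exact / len(set(annotated)) if annotated else 0.0
    sp = exact / len(set(predicted)) if predicted else 0.0
    return sn, sp


annotated = [(100, 200), (300, 400), (500, 600)]
predicted = [(100, 200), (310, 400), (700, 800)]
print(classify_predicted_exons(predicted, annotated))  # ['exact', 'partial', 'wrong']
print(missed_exons(predicted, annotated))              # [(500, 600)]
print(exon_sn_sp(predicted, annotated))
```

A finer taxonomy could split "partial" by which boundary is wrong (5' or 3'). That is the kind of granularity the abstract says its classification adds.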
Affiliation(s)
- Keith Knapp
- Faculty of Science and Technology, Deakin University, Australia
- Yi-Ping Phoebe Chen
- Faculty of Science and Technology, Deakin University, Australia
- Australian Research Council Centre in Bioinformatics, Australia