1
Zhang H, Wafula EK, Eilers J, Harkess A, Ralph PE, Timilsena PR, dePamphilis CW, Waite JM, Honaas LA. Building a foundation for gene family analysis in Rosaceae genomes with a novel workflow: A case study in Pyrus architecture genes. Front Plant Sci 2022; 13:975942. [PMID: 36452099 PMCID: PMC9702816 DOI: 10.3389/fpls.2022.975942] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/22/2022] [Accepted: 09/21/2022] [Indexed: 05/26/2023]
Abstract
The rapid development of sequencing technologies has led to a deeper understanding of plant genomes. However, direct experimental evidence connecting genes to important agronomic traits is still lacking in most non-model plants. For instance, the genetic mechanisms underlying plant architecture are poorly understood in pome fruit trees, creating a major hurdle in developing new cultivars with desirable architecture, such as dwarfing rootstocks in European pear (Pyrus communis). An efficient way to identify genetic factors for important traits in non-model organisms can be to transfer knowledge across genomes. However, major obstacles exist, including complex evolutionary histories and variable quality and content of publicly available plant genomes. As researchers aim to link genes to traits of interest, these challenges can impede the transfer of experimental evidence across plant species, namely in the curation of high-quality, high-confidence gene models in an evolutionary context. Here we present a workflow using a collection of bioinformatic tools for the curation of deeply conserved gene families of interest across plant genomes. To study gene families involved in tree architecture in European pear and other rosaceous species, we used our workflow, plus a draft genome assembly and high-quality annotation of a second P. communis cultivar, 'd'Anjou.' Our comparative gene family approach revealed significant issues with the most recent 'Bartlett' genome - primarily thousands of missing genes due to methodological bias. After correcting assembly errors on a global scale in the 'Bartlett' genome, we used our workflow for targeted improvement of our genes of interest in both P. communis genomes, thus laying the groundwork for future functional studies in pear tree architecture. Further, our global gene family classification of 15 genomes across 6 genera provides a valuable and previously unavailable resource for the Rosaceae research community. With it, orthologs and other gene family members can be easily identified across any of the classified genomes. Importantly, our workflow can be easily adopted for any other plant genomes and gene families of interest.
Affiliation(s)
- Huiting Zhang
  - Tree Fruit Research Laboratory, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Wenatchee, WA, United States
  - Department of Horticulture, Washington State University, Pullman, WA, United States
- Eric K. Wafula
  - Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Jon Eilers
  - Tree Fruit Research Laboratory, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Wenatchee, WA, United States
- Alex E. Harkess
  - College of Agriculture, Auburn University, Auburn, AL, United States
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, United States
- Paula E. Ralph
  - Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Prakash Raj Timilsena
  - Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Claude W. dePamphilis
  - Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Jessica M. Waite
  - Tree Fruit Research Laboratory, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Wenatchee, WA, United States
- Loren A. Honaas
  - Tree Fruit Research Laboratory, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Wenatchee, WA, United States

2
Touati R, Tajouri A, Mesaoudi I, Oueslati AE, Lachiri Z, Kharrat M. New methodology for repetitive sequences identification in human X and Y chromosomes. Biomed Signal Process Control 2021; 64:102207. [PMID: 33101452 PMCID: PMC7572123 DOI: 10.1016/j.bspc.2020.102207] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/20/2019] [Revised: 07/23/2020] [Accepted: 09/01/2020] [Indexed: 11/24/2022]
Abstract
Repetitive DNA sequences make up a major proportion of the human genome and of the genomes of other species. The importance of each repeat type depends on many factors, including its structural and functional roles, positions, lengths, and copy numbers. Whether such sequences are conserved at different chromosomal locations remains an open question for biologists. Detecting their locations despite their great variability, and finding novel repetitive sequences, remain challenging tasks. To side-step this problem, we developed a new method based on signal and image processing tools. Using this method, we could find repetitive patterns in DNA images regardless of the repetition length. This new technique seems to be more efficient at detecting new repetitive sequences than classical bioinformatics tools. The classical tools show limited performance, especially in the presence of mutations (insertions or deletions), whereas modifying one or a few pixels in the image does not affect the global form of the repetitive pattern. As a consequence, we generated a new database of repetitive patterns that contains tandem and dispersed repeated sequences. The highly repetitive sequences we identified in the X and Y chromosomes are shown to be located in other human chromosomes or in other genomes. The generated data are then used as input to a convolutional neural network classifier. The resulting system is efficient and achieves an average recognition score of 94.4%.
Affiliation(s)
- Rabeb Touati
  - University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia
  - University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia
- Asma Tajouri
  - University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia
- Imen Mesaoudi
  - University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia
- Afef Elloumi Oueslati
  - University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia
- Zied Lachiri
  - University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia
- Maher Kharrat
  - University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia

3
Zheng Q, Chen T, Zhou W, Xie L, Su H. Gene prediction by the noise-assisted MEMD and wavelet transform for identifying the protein coding regions. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2020.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]

4
Das L, Das JK, Nanda S. Detection of exon location in eukaryotic DNA using a fuzzy adaptive Gabor wavelet transform. Genomics 2020; 112:4406-4416. [PMID: 32717319 DOI: 10.1016/j.ygeno.2020.07.020] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/07/2020] [Revised: 06/25/2020] [Accepted: 07/08/2020] [Indexed: 11/17/2022]
Abstract
Existing model-independent methods for detecting exons in DNA are not ideal, because the commonly employed fixed-window-length strategy produces spectral leakage that causes signal noise. The modified Gabor wavelet transform (GWT) uses a multiscale strategy to address this issue to some extent. Yet no rule regarding the occurrence of small and large exons has been specified. To overcome this randomness, the scaling factor of the GWT is adapted based on a fuzzy rule. A fuzzy approach is appropriate here because of the genetic code of the nucleotides and the fuzzy behavior of DNA configuration. Two fuzzy membership functions (large and small) account for the variation in the coding regions. The fuzzy-based learning parameter adaptively tunes the scale factor for fast and precise prediction of exons. A major advantage of the proposed approach is that it can efficiently isolate detailed sub-regions within each exon, demonstrating its efficacy compared with existing techniques.
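For context, the fixed-window baseline that this abstract criticizes can be sketched as a sliding-window measure of period-3 spectral power over binary nucleotide indicator sequences. This is a minimal illustration of that classical idea only, not the authors' fuzzy adaptive Gabor wavelet method. The function name `period3_power` and the defaults `window=351` and `step=3` are assumptions chosen for the example.

```python
import cmath


def period3_power(seq, window=351, step=3):
    """Sliding-window spectral power at frequency 1/3 cycles per base.

    Hypothetical helper for illustration; window and step are assumed defaults.
    Returns a list of (window_start, power) pairs.
    """
    seq = seq.upper()
    # exp(-2*pi*i*n/3) only takes three distinct values, so precompute them.
    twiddle = [cmath.exp(-2j * cmath.pi * k / 3) for k in range(3)]
    powers = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        total = 0.0
        for base in "ACGT":
            # DFT coefficient of this base's 0/1 indicator sequence at period 3.
            coeff = sum(twiddle[n % 3] for n, b in enumerate(chunk) if b == base)
            total += abs(coeff) ** 2
        powers.append((start, total))
    return powers
```

A sequence with strong three-base periodicity, such as a repeated codon, yields high power in every window, while a homopolymer yields essentially none. Because the window length is fixed, short and long exons are treated identically, which is the limitation the multiscale and adaptive approaches aim to address.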
Affiliation(s)
- Lopamudra Das
  - School of Electronics Engineering, KIIT University, Bhubaneswar, India
- J K Das
  - School of Electronics Engineering, KIIT University, Bhubaneswar, India
- Sarita Nanda
  - School of Electronics Engineering, KIIT University, Bhubaneswar, India

5
Identification of CpG Islands in DNA Sequences Using Short-Time Fourier Transform. Interdiscip Sci 2020; 12:355-367. [PMID: 32394270 DOI: 10.1007/s12539-020-00370-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/18/2019] [Revised: 04/07/2020] [Accepted: 04/17/2020] [Indexed: 10/24/2022]
Abstract
In the era of big data analysis, genomic data analysis is needed to extract the hidden information present in DNA sequences. CpG islands are one important hidden feature of DNA sequences: they are used as gene markers and are associated with cancer, among other conditions. Various methods have therefore been reported for identifying CpG islands in DNA sequences. The key contributions of this work are (i) extraction of the periodicity feature associated with CpG islands using the short-time Fourier transform (STFT) and (ii) an STFT-based algorithm for identifying CpG islands in DNA sequences. The results demonstrate that the proposed algorithm performs better than other reported methods for CpG island detection.
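The abstract does not specify how the DNA sequence is mapped to a signal or which periodicity feature is extracted, so the following is only a generic sketch of applying a short-time Fourier transform to a sequence-derived signal. It assumes a simple 0/1 indicator marking each CG dinucleotide, a Hann window, and a naive DFT per frame. The names `cpg_indicator` and `stft_magnitudes` and the defaults `frame_len=64` and `hop=32` are hypothetical.

```python
import cmath
import math


def cpg_indicator(seq):
    """Assumed mapping: 1.0 where a CG dinucleotide starts, else 0.0."""
    seq = seq.upper()
    return [1.0 if seq[i:i + 2] == "CG" else 0.0 for i in range(len(seq))]


def stft_magnitudes(signal, frame_len=64, hop=32):
    """Naive STFT: one magnitude spectrum (bins 0..frame_len/2) per frame."""
    # Hann window tapers each frame to reduce spectral leakage.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        windowed = [signal[start + n] * hann[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT sum for bin k; O(frame_len) per bin, fine for a sketch.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(windowed))
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames
```

For example, `stft_magnitudes(cpg_indicator("CG" * 64 + "AT" * 64))` gives frames with substantial energy over the CG-rich half and all-zero spectra over the AT-only half. A practical detector would still need a decision rule on top of these spectra, which this sketch leaves out.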