1
Sun L, Lai M, Ghouri F, Nawaz MA, Ali F, Baloch FS, Nadeem MA, Aasim M, Shahid MQ. Modern Plant Breeding Techniques in Crop Improvement and Genetic Diversity: From Molecular Markers and Gene Editing to Artificial Intelligence-A Critical Review. Plants (Basel, Switzerland) 2024; 13:2676. [PMID: 39409546 PMCID: PMC11478383 DOI: 10.3390/plants13192676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2024] [Revised: 09/08/2024] [Accepted: 09/22/2024] [Indexed: 10/20/2024]
Abstract
With the development of new technologies in recent years, researchers have made significant progress in crop breeding. Modern breeding differs from traditional breeding because of great changes in technical means and breeding concepts. Whereas traditional breeding initially focused on high yields, modern breeding focuses on breeding orientations based on different crops' audiences or by-products. The process of modern breeding starts from the creation of material populations, which can be constructed by natural mutagenesis, chemical mutagenesis, physical mutagenesis, transfer DNA (T-DNA), Tos17 (endogenous retrotransposon), etc. Then, gene function can be mined through QTL mapping, bulked-segregant analysis (BSA), genome-wide association studies (GWASs), RNA interference (RNAi), and gene editing. Gene function is then characterized at the transcriptional, post-transcriptional, translational, and post-translational levels. This article mainly discusses the application of the above modern scientific and technological methods of breeding, together with their advantages and limitations for crop breeding and diversity. In particular, the development of gene editing technology has contributed to modern breeding research.
Affiliation(s)
- Lixia Sun
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, South China Agricultural University, Guangzhou 510642, China
- Guangdong Provincial Key Laboratory of Plant Molecular Breeding, South China Agricultural University, Guangzhou 510642, China
- Mingyu Lai
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, South China Agricultural University, Guangzhou 510642, China
- Guangdong Provincial Key Laboratory of Plant Molecular Breeding, South China Agricultural University, Guangzhou 510642, China
- Fozia Ghouri
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, South China Agricultural University, Guangzhou 510642, China
- Guangdong Provincial Key Laboratory of Plant Molecular Breeding, South China Agricultural University, Guangzhou 510642, China
- Muhammad Amjad Nawaz
- Education Scientific Center of Nanotechnology, Far Eastern Federal University, 690091 Vladivostok, Russia
- Fawad Ali
- School of Tropical Agriculture and Forestry, Hainan University, Sanya 572025, China
- Faheem Shehzad Baloch
- Department of Biotechnology, Faculty of Science, Mersin University, Mersin 33343, Türkiye
- Muhammad Azhar Nadeem
- Faculty of Agricultural Sciences and Technologies, Sivas University of Science and Technology, Sivas 58140, Türkiye
- Muhammad Aasim
- Faculty of Agricultural Sciences and Technologies, Sivas University of Science and Technology, Sivas 58140, Türkiye
- Muhammad Qasim Shahid
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangdong Laboratory for Lingnan Modern Agriculture, South China Agricultural University, Guangzhou 510642, China
- Guangdong Provincial Key Laboratory of Plant Molecular Breeding, South China Agricultural University, Guangzhou 510642, China
2
Redelings BD, Holmes I, Lunter G, Pupko T, Anisimova M. Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications. Mol Biol Evol 2024; 41:msae177. [PMID: 39172750 PMCID: PMC11385596 DOI: 10.1093/molbev/msae177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 07/02/2024] [Accepted: 07/09/2024] [Indexed: 08/24/2024]
Abstract
Insertions and deletions constitute the second most important source of natural genomic variation. Insertions and deletions make up to 25% of genomic variants in humans and are involved in complex evolutionary processes including genomic rearrangements, adaptation, and speciation. Recent advances in long-read sequencing technologies allow detailed inference of insertion and deletion variation in species and populations. Yet, despite their importance, evolutionary studies have traditionally ignored or mishandled insertions and deletions due to a lack of comprehensive methodologies and statistical models of insertion and deletion dynamics. Here, we discuss methods for describing insertion and deletion variation and modeling insertions and deletions over evolutionary time. We provide practical advice for tackling insertions and deletions in genomic sequences and illustrate our discussion with examples of insertion- and deletion-induced effects in human and other natural populations and their contribution to evolutionary processes. We outline promising directions for future developments in statistical methodologies that would allow researchers to analyze insertion and deletion variation and its effects in large genomic data sets and to incorporate insertions and deletions in evolutionary inference.
Affiliation(s)
- Ian Holmes
- Department of Bioengineering, University of California, Berkeley, CA 94720, USA
- Calico Life Sciences LLC, South San Francisco, CA 94080, USA
- Gerton Lunter
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9713 GZ, The Netherlands
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Maria Anisimova
- Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
3
García Mesa JJ, Zhu Z, Cartwright RA. COATi: Statistical Pairwise Alignment of Protein-Coding Sequences. Mol Biol Evol 2024; 41:msae117. [PMID: 38869090 PMCID: PMC11255384 DOI: 10.1093/molbev/msae117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 04/26/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024]
Abstract
Sequence alignment is an essential method in bioinformatics and the basis of many analyses, including phylogenetic inference, ancestral sequence reconstruction, and gene annotation. Sequencing artifacts and errors made during genome assembly, such as abiological frameshifts and incorrect early stop codons, can impact downstream analyses leading to erroneous conclusions in comparative and functional genomic studies. More significantly, while indels can occur both within and between codons in natural sequences, most amino-acid- and codon-based aligners assume that indels only occur between codons. This mismatch between biology and alignment algorithms produces suboptimal alignments and errors in downstream analyses. To address these issues, we present COATi, a statistical, codon-aware pairwise aligner that supports complex insertion-deletion models and can handle artifacts present in genomic data. COATi allows users to reduce the amount of discarded data while generating more accurate sequence alignments. COATi can infer indels both within and between codons, leading to improved sequence alignments. We applied COATi to a dataset containing orthologous protein-coding sequences from humans and gorillas and conclude that 41% of indels occurred between codons, agreeing with previous work in other species. We also applied COATi to semiempirical benchmark alignments and find that it outperforms several popular alignment programs on several measures of alignment quality and accuracy.
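The distinction the abstract draws between indels that fall between codons and indels that fall within codons can be illustrated with a small sketch. This is not COATi's algorithm or API. It assumes a pairwise alignment already given as two equal-length gapped strings whose reference sequence starts in frame, and the function name `classify_indels` is hypothetical.

```python
from itertools import groupby


def classify_indels(ref_aln, qry_aln):
    """List gap events as (kind, ref_pos, length, phase).

    ref_pos is the number of ungapped reference bases before the event;
    phase 0 means the event starts between codons, 1 or 2 within a codon.
    Assumes the reference coding sequence starts in frame (an assumption).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    def label(pair):
        ref_base, qry_base = pair
        if ref_base == "-":
            return "insertion"  # bases present only in the query
        if qry_base == "-":
            return "deletion"   # bases present only in the reference
        return "match"

    events = []
    ref_pos = 0
    # Group consecutive alignment columns that share the same label.
    for kind, columns in groupby(zip(ref_aln, qry_aln), key=label):
        length = len(list(columns))
        if kind != "match":
            events.append((kind, ref_pos, length, ref_pos % 3))
        if kind != "insertion":
            ref_pos += length  # only reference bases advance the coordinate
    return events


print(classify_indels("ATGAAACCCGGG", "ATG---CCCGGG"))
print(classify_indels("ATGAAACCCGGG", "ATGA---CCGGG"))
```

The first call reports a deletion starting at phase 0 (between codons); the second reports one starting at phase 1 (within a codon), even though both remove three bases.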
Affiliation(s)
- Juan José García Mesa
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Ira A. Fulton Schools of Engineering, Arizona State University, Tempe, AZ, USA
- Ziqi Zhu
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
4
Susko E. Complex statistical modelling for phylogenetic inference. Can J Stat 2022. [DOI: 10.1002/cjs.11741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Edward Susko
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada B3H 3J5
5
Seo TK, Redelings BD, Thorne JL. Correlations between alignment gaps and nucleotide substitution or amino acid replacement. Proc Natl Acad Sci U S A 2022; 119:e2204435119. [PMID: 35972964 PMCID: PMC9407537 DOI: 10.1073/pnas.2204435119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2022] [Accepted: 07/11/2022] [Indexed: 11/18/2022]
Abstract
To assess the conventional treatment in evolutionary inference of alignment gaps as missing data, we propose a simple nonparametric test of the null hypothesis that the locations of alignment gaps are independent of the nucleotide substitution or amino acid replacement process. When we apply the test to 1,390 protein alignments that are informed by protein tertiary structure and use a 5% significance level, the null hypothesis of independence between amino acid replacement and gap location is rejected for ∼65% of datasets. Via simulations that include substitution and insertion-deletion, we show that the test performs well with true alignments. When we simulate according to the null hypothesis and then apply the test to optimal alignments that are inferred by each of four widely used software packages, the null hypothesis is rejected too frequently. Via further simulations and analyses, we show that the overly frequent rejections of the null hypothesis are not solely due to weaknesses of widely used software for finding optimal alignments. Instead, our evidence suggests that optimal alignments are unrepresentative of true alignments and that biased evolutionary inferences may result from relying upon individual optimal alignments.
Affiliation(s)
- Tae-Kun Seo
- Division of Life Sciences, Korea Polar Research Institute, Yeonsu-gu, Incheon 21990, Republic of Korea
- Benjamin D. Redelings
- Biology Department, Duke University, Durham, NC 27708
- Ronin Institute, Durham, NC 27705
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045
- Jeffrey L. Thorne
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695
- Department of Statistics, North Carolina State University, Raleigh, NC 27695
6
de Oliveira Martins L, Bloomfield S, Stoakes E, Grant AJ, Page AJ, Mather AE. Tatajuba: exploring the distribution of homopolymer tracts. NAR Genom Bioinform 2022; 4:lqac003. [PMID: 35118377 PMCID: PMC8808543 DOI: 10.1093/nargab/lqac003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2021] [Revised: 11/18/2021] [Accepted: 01/05/2022] [Indexed: 11/14/2022]
Abstract
Length variation of homopolymeric tracts, which induces phase variation, is known to regulate gene expression leading to phenotypic variation in a wide range of bacterial species. There is no specialized bioinformatics software which can, at scale, exhaustively explore and describe these features from sequencing data. Identifying these is non-trivial as sequencing and bioinformatics methods are prone to introducing artefacts when presented with homopolymeric tracts due to the decreased base diversity. We present tatajuba, which can automatically identify potential homopolymeric tracts and help predict their putative phenotypic impact, allowing for rapid investigation. We use it to detect all tracts in two separate datasets, one of Campylobacter jejuni and one of three Bordetella species, and to highlight those tracts that are polymorphic across samples. With this we confirm homopolymer tract variation with phenotypic impact found in previous studies and additionally find many more with potential variability. The software is written in C and is available under the open source licence GNU GPLv3.
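To make the notion of a homopolymeric tract concrete, the following is a minimal sketch of locating runs of a single repeated base in one sequence. It is not tatajuba's method (the software is written in C and works from sequencing data across samples). The function name `homopolymer_tracts` and the `min_len` threshold are assumptions chosen for illustration.

```python
import re


def homopolymer_tracts(seq, min_len=4):
    """Return (start, base, length) for every run of one base >= min_len long.

    min_len is an illustrative threshold, not a value taken from the paper.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


print(homopolymer_tracts("ACGGGGGTAAAATC"))
```

Comparing the reported lengths for the same tract across samples would then flag tracts that are polymorphic, which is the kind of variation the abstract describes.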
Affiliation(s)
- Samuel Bloomfield
- Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK
- Emily Stoakes
- Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, UK
- Andrew J Grant
- Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, UK
- Andrew J Page
- Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK
- Alison E Mather
- Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK
7
Seo TK, Gascuel O, Thorne JL. Measuring Phylogenetic Information of Incomplete Sequence Data. Syst Biol 2021; 71:630-648. [PMID: 34469581 DOI: 10.1093/sysbio/syab073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/02/2021] [Revised: 08/26/2021] [Accepted: 08/27/2021] [Indexed: 11/13/2022]
Abstract
Widely used approaches for extracting phylogenetic information from aligned sets of molecular sequences rely upon probabilistic models of nucleotide substitution or amino-acid replacement. The phylogenetic information that can be extracted depends on the number of columns in the sequence alignment and will be decreased when the alignment contains gaps due to insertion or deletion events. Motivated by the measurement of information loss, we suggest assessment of the Effective Sequence Length (ESL) of an aligned data set. The ESL can differ from the actual number of columns in a sequence alignment because of the presence of alignment gaps. Furthermore, the estimation of phylogenetic information is affected by model misspecification. Inevitably, the actual process of molecular evolution differs from the probabilistic models employed to describe this process. This disparity means the amount of phylogenetic information in an actual sequence alignment will differ from the amount in a simulated data set of equal size, which motivated us to develop a new test for model adequacy. Via theory and empirical data analysis, we show how to disentangle the effects of gaps and model misspecification. By comparing the Fisher information of actual and simulated sequences, we identify which alignment sites and tree branches are most affected by gaps and model misspecification.
Affiliation(s)
- Tae-Kun Seo
- Department of Biological Sciences, Korea Polar Research Institute, 26 Songdomirae-ro, Yeonsu-gu, Incheon 21990, Republic of Korea
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur and CNRS, Paris, France [Sabbatical affiliation of T-K S]
- Olivier Gascuel
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur and CNRS, Paris, France [Sabbatical affiliation of T-K S]
- Institut de Systématique, Evolution, Biodiversité (ISYEB - UMR 7205, CNRS, Muséum National d'Histoire Naturelle, SU, EPHE, UA), Paris, France [Current affiliation of O.G.]
- Jeffrey L Thorne
- Departments of Biological Sciences and Statistics, North Carolina State University, Raleigh, NC 27695-7566, USA
8
Holmes I. A Model of Indel Evolution by Finite-State, Continuous-Time Machines. Genetics 2020; 216:1187-1204. [PMID: 33020189 PMCID: PMC7768254 DOI: 10.1534/genetics.120.303630] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/01/2020] [Accepted: 09/22/2020] [Indexed: 01/09/2023]
Abstract
We introduce a systematic method of approximating finite-time transition probabilities for continuous-time insertion-deletion models on sequences. The method uses automata theory to describe the action of an infinitesimal evolutionary generator on a probability distribution over alignments, where both the generator and the alignment distribution can be represented by pair hidden Markov models (HMMs). In general, combining HMMs in this way induces a multiplication of their state spaces; to control this, we introduce a coarse-graining operation to keep the state space at a constant size. This leads naturally to ordinary differential equations for the evolution of the transition probabilities of the approximating pair HMM. The TKF91 model emerges as an exact solution to these equations for the special case of single-residue indels. For the more general case of multiple-residue indels, the equations can be solved by numerical integration. Using simulated data, we show that the resulting distribution over alignments, when compared to previous approximations, is a better fit over a broader range of parameters. We also propose a related approach to develop differential equations for sufficient statistics to estimate the underlying instantaneous indel rates by expectation maximization. Our code and data are available at https://github.com/ihh/trajectory-likelihood.
Affiliation(s)
- Ian Holmes
- Department of Bioengineering, University of California, Berkeley, California 94720