1
Revisiting the recombinant history of HIV-1 group M with dynamic network community detection. Proc Natl Acad Sci U S A 2022; 119:e2108815119. [PMID: 35500121 PMCID: PMC9171507 DOI: 10.1073/pnas.2108815119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open
Abstract
Recombination is a major mechanism through which HIV type 1 (HIV-1) maintains genetic diversity and interferes with viral eradication efforts. There is growing evidence demonstrating a recombinant origin of primate lentiviruses including HIV-1 group M (HIV-1/M). Inferring the extent of recombination across the entire HIV-1/M genome is of great importance as it provides deeper insights into the origin, dynamics, and evolution of the global pandemic. Here we propose an alternative method that can reconstruct the extent of genome-wide recombination in HIV-1, uncover reticulate patterns, and serve as a framework for HIV-1 classification. Our method provides an alternative approach for understanding the roles of virus recombination in the early evolutionary history of zoonosis for other emerging viruses.
The prevailing abundance of full-length HIV type 1 (HIV-1) genome sequences provides an opportunity to revisit the standard model of HIV-1 group M (HIV-1/M) diversity that clusters genomes into largely nonrecombinant subtypes, which is not consistent with recent evidence of deep recombinant histories for simian immunodeficiency virus (SIV) and other HIV-1 groups. Here we develop an unsupervised nonparametric clustering approach, which does not rely on predefined nonrecombinant genomes, by adapting a community detection method developed for dynamic social network analysis. We show that this method (dynamic stochastic block model [DSBM]) attains a significantly lower mean error rate in detecting recombinant breakpoints in simulated data (quasibinomial generalized linear model [GLM], P < 8 × 10⁻⁸), compared to other reference-free recombination detection programs (genetic algorithm for recombination detection [GARD], recombination detection program 4 [RDP4], and RDP5). When this method was applied to a representative sample of n = 525 actual HIV-1 genomes, we determined k = 29 as the optimal number of DSBM clusters and used change-point detection to estimate that at least 95% of these genomes are recombinant. Further, we identified both known and undocumented recombination hotspots in the HIV-1 genome and evidence of intersubtype recombination in HIV-1 subtype reference genomes. We propose that clusters generated by DSBM can provide an informative framework for HIV-1 classification.
2
Tongo M, Martin DP, Dorfman JR. Elucidation of Early Evolution of HIV-1 Group M in the Congo Basin Using Computational Methods. Genes (Basel) 2021; 12:genes12040517. [PMID: 33918115 PMCID: PMC8065694 DOI: 10.3390/genes12040517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2021] [Revised: 03/26/2021] [Accepted: 03/26/2021] [Indexed: 11/16/2022] Open
Abstract
The Congo Basin region is believed to be the site of the cross-species transmission event that yielded HIV-1 group M (HIV-1M). It is thus likely that the virus has been present and evolving in the region since that cross-species transmission. As HIV-1M was only discovered in the early 1980s, our directly observed record of the epidemic is largely limited to the past four decades. Nevertheless, by exploiting the genetic relatedness of contemporary HIV-1M sequences, phylogenetic methods provide a powerful framework for simultaneously investigating the evolutionary and epidemiologic history of the virus. Such an approach has shown that the currently classified HIV-1M subtypes and Circulating Recombinant Forms (CRFs) do not give a complete view of HIV-1 diversity. In addition, the currently identified major HIV-1M subtypes were likely genetically predisposed to becoming a major component of the present epidemic, even before the events that resulted in the global epidemic. Further efforts have identified statistically significant hot- and cold-spots of HIV-1M subtype sequence inheritance in genomic regions of recombinant forms. In this review we present our own and others' recent findings on the emergence and spread of HIV-1M variants in the region, which have provided insights into the early evolution of this virus.
Affiliation(s)
- Marcel Tongo
- Center for Research on Emerging and Re-Emerging Diseases (CREMER), Institute of Medical Research and Study of Medicinal Plants (IMPM), Yaoundé, Cameroon
- Darren P. Martin
- Division of Computational Biology, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
- Jeffrey R. Dorfman
- Division of Medical Virology, School of Pathology, Faculty of Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
3
Molecular and geographic characterization of HIV-1 BF recombinant viruses. Virus Res 2019; 270:197650. [PMID: 31279829 DOI: 10.1016/j.virusres.2019.197650] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/17/2019] [Revised: 07/01/2019] [Accepted: 07/03/2019] [Indexed: 01/21/2023]
Abstract
The Human Immunodeficiency Virus Type 1 (HIV-1) presents a wide genetic variability, which is represented by four groups, nine subtypes of group M and several recombinant forms. Among these, the BF recombinants have been distinguished by a high global dispersion and an increase in number and diversity. To date, 15 BF Circulating Recombinant Forms (CRFs) and diverse BF Unique Recombinant Forms (URFs) have been described. In Brazil, nine CRF_BF have been identified. The aim of this work was to perform molecular and geographic characterization of HIV-1 BF recombinant strains. Near full-length genomes of 265 BF recombinant viruses were collected from public databases and molecular analyses were performed. These sequences were originally retrieved between 1993 and 2006 and isolated from 16 countries (51.3% from Brazil). Analysis of the year of diagnosis showed that BF recombinants have circulated in Brazil since at least 1985. Most sequences displayed recombination in the pol (84.9%), gag (69.3%) and env (51.4%) regions. Subtype B predominated in all accessory and regulatory genes, except in vif, in which subtype F was predominant (40.4%). Twelve regions with a recombination rate higher than 10% were identified, especially one region inside the p24 gene (1359-1397) in which recombination was present in more than 30% of the sequences. Coreceptor usage prediction during viral entry showed that BF recombinants preferentially use CCR5 (67.2%) and the most frequent tetrapeptides found in the V3 loop were GPGR (47.9%) and GPGQ (21.1%). The frequency of X4/dual viruses was lower among subtype F (25.8%) V3 sequences than among subtype B (43%). In addition, mutations associated with intermediate or high resistance levels to PI (10.6%), NRTI (15.0%), NNRTI (14.0%) and INSTI (2.6%) were identified. The great diversity of the recombination patterns indicates that recombination between subtypes B and F is frequent, reflecting a probable high rate of dual infection and the acquisition of advantageous characteristics for viral fitness.
4
Berchuck SI, Mwanza JC, Warren JL. A spatially varying change points model for monitoring glaucoma progression using visual field data. SPATIAL STATISTICS 2019; 30:1-26. [PMID: 30931247 PMCID: PMC6438211 DOI: 10.1016/j.spasta.2019.02.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/09/2023]
Abstract
Glaucoma disease progression, as measured by visual field (VF) data, is often defined by periods of relative stability followed by an abrupt decrease in visual ability at some point in time. Determining the transition point of the disease trajectory to a more severe state is important clinically for disease management and for avoiding irreversible vision loss. Based on this, we present a unified statistical modeling framework that permits prediction of the timing and spatial location of future vision loss and informs clinical decisions regarding disease progression. The developed method incorporates anatomical information to create a biologically plausible data-generating model. We accomplish this by introducing a spatially varying coefficients model that includes spatially varying change points to detect structural shifts in both the mean and variance process of VF data across both space and time. The VF location-specific change point represents the underlying, and potentially censored, timing of true change in disease trajectory while a multivariate spatial boundary detection structure is introduced that accounts for the complex spatial connectivity of the VF and optic disc. We show that our method improves estimation and prediction of multiple aspects of disease management in comparison to existing methods through simulation and real data application. The R package spCP implements the new methodology.
Affiliation(s)
- Jean-Claude Mwanza
- Department of Ophthalmology, University of North Carolina-Chapel Hill, NC, USA
- Joshua L. Warren
- Department of Biostatistics, Yale University, New Haven, CT, USA
5
Tongo M, de Oliveira T, Martin DP. Patterns of genomic site inheritance in HIV-1M inter-subtype recombinants delineate the most likely genomic sites of subtype-specific adaptation. Virus Evol 2018; 4:vey015. [PMID: 29942655 PMCID: PMC6007327 DOI: 10.1093/ve/vey015] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022] Open
Abstract
Recombination between different HIV-1 group M (HIV-1M) subtypes is a major contributor to the ongoing genetic diversification of HIV-1M. However, it remains unclear whether the different genome regions of recombinants are randomly inherited from the different subtypes. To elucidate this, we analysed the distribution within 82 circulating and 201 unique recombinant forms (CRFs/URFs), of genome fragments derived from HIV-1M Subtypes A, B, C, D, F, and G and CRF01_AE. We found that viruses belonging to the analysed HIV-1M subtypes and CRF01_AE contributed certain genome fragments more frequently during recombination than other fragments. Furthermore, we identified statistically significant hot-spots of Subtype A sequence inheritance in genomic regions encoding portions of Gag and Nef, Subtype B in Pol, Tat and Env, Subtype C in Vif, Subtype D in Pol and Env, Subtype F in Gag, Subtype G in Vpu-Env and Nef, and CRF01_AE inheritance in Vpu and Env. The apparent non-randomness in the frequencies with which different subtypes have contributed specific genome regions to known HIV-1M recombinants is consistent with selection strongly impacting the survival of inter-subtype recombinants. We propose that hotspots of genomic region inheritance are likely to demarcate the locations of subtype-specific adaptive genetic variations.
Affiliation(s)
- Marcel Tongo
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine and Medical Sciences, College of Health Sciences, Nelson R Mandela School of Medicine, University of KwaZulu-Natal (UKZN), 719 Umbilo Road, Durban 4001, South Africa
- Center of Research for Emerging and Re-Emerging Diseases (CREMER), Institute of Medical Research and Study of Medicinal Plants (IMPM), Yaoundé, Cameroon
- Tulio de Oliveira
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine and Medical Sciences, College of Health Sciences, Nelson R Mandela School of Medicine, University of KwaZulu-Natal (UKZN), 719 Umbilo Road, Durban 4001, South Africa
- Darren P Martin
- Division of Computational Biology, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
6
Arenas M, Araujo NM, Branco C, Castelhano N, Castro-Nallar E, Pérez-Losada M. Mutation and recombination in pathogen evolution: Relevance, methods and controversies. INFECTION GENETICS AND EVOLUTION 2017; 63:295-306. [PMID: 28951202 DOI: 10.1016/j.meegid.2017.09.029] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 07/26/2017] [Revised: 09/20/2017] [Accepted: 09/21/2017] [Indexed: 02/06/2023]
Abstract
Mutation and recombination drive the evolution of most pathogens by generating the genetic variants upon which selection operates. Those variants can, for example, confer resistance to host immune systems and drug therapies or lead to epidemic outbreaks. Given their importance, diverse evolutionary studies have investigated the abundance and consequences of mutation and recombination in pathogen populations. However, some controversies persist regarding the contribution of each evolutionary force to the development of particular phenotypic observations (e.g., drug resistance). In this study, we revise the importance of mutation and recombination in the evolution of pathogens at both intra-host and inter-host levels. We also describe state-of-the-art analytical methodologies to detect and quantify these two evolutionary forces, including biases that are often ignored in evolutionary studies. Finally, we present some of our former studies involving pathogenic taxa where mutation and recombination played crucial roles in the recovery of pathogenic fitness, the generation of interspecific genetic diversity, or the design of centralized vaccines. This review also illustrates several common controversies and pitfalls in the analysis and in the evaluation and interpretation of mutation and recombination outcomes.
Affiliation(s)
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain; Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal.
- Natalia M Araujo
- Laboratory of Molecular Virology, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Brazil.
- Catarina Branco
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal.
- Nadine Castelhano
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal; Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal.
- Eduardo Castro-Nallar
- Universidad Andrés Bello, Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biológicas, Santiago, Chile.
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Ashburn, VA 20147, Washington, DC, United States; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal.
7
Recombination hotspots: Models and tools for detection. DNA Repair (Amst) 2016; 40:47-56. [PMID: 26991854 DOI: 10.1016/j.dnarep.2016.02.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 02/05/2016] [Accepted: 02/09/2016] [Indexed: 11/22/2022]
Abstract
Recombination hotspots are regions within the genome, 1 to 2 kb in size, where the rate and frequency of recombination are optimum. The recombination event is mediated by double-stranded break formation, guided by the combined enzymatic action of DNA topoisomerase and Spo11 endonuclease. These regions are distributed non-uniformly throughout the human genome and cause distortions in the genetic map. Numerous lines of evidence suggest that the number of hotspots known in humans has increased manifold in recent years. A few facts about hotspot evolution have also been put forward, indicating differences in hotspot position between chimpanzees and humans. In mice, recombination hotspots were found to be clustered within the major histocompatibility complex (MHC) region. Several models that help explain meiotic recombination have been proposed. Moreover, computational tools have been developed to locate hotspot positions and estimate their recombination rates in humans, a topic of great interest to population and medical geneticists. Here we review the molecular mechanisms, models and in silico prediction techniques for hotspots.
8
Pérez-Losada M, Arenas M, Galán JC, Palero F, González-Candelas F. Recombination in viruses: mechanisms, methods of study, and evolutionary consequences. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2015; 30:296-307. [PMID: 25541518 PMCID: PMC7106159 DOI: 10.1016/j.meegid.2014.12.022] [Citation(s) in RCA: 230] [Impact Index Per Article: 25.6] [Received: 11/23/2014] [Revised: 12/15/2014] [Accepted: 12/17/2014] [Indexed: 02/08/2023]
Abstract
Recombination is a pervasive process generating diversity in most viruses. It joins variants that arise independently within the same molecule, creating new opportunities for viruses to overcome selective pressures and to adapt to new environments and hosts. Consequently, the analysis of viral recombination attracts the interest of clinicians, epidemiologists, molecular biologists and evolutionary biologists. In this review we present an overview of three major areas related to viral recombination: (i) the molecular mechanisms that underlie recombination in model viruses, including DNA-viruses (Herpesvirus) and RNA-viruses (Human Influenza Virus and Human Immunodeficiency Virus), (ii) the analytical procedures to detect recombination in viral sequences and to determine the recombination breakpoints, along with the conceptual and methodological tools currently used and a brief overview of the impact of new sequencing technologies on the detection of recombination, and (iii) the major areas in the evolutionary analysis of viral populations on which recombination has an impact. These include the evaluation of selective pressures acting on viral populations, the application of evolutionary reconstructions in the characterization of centralized genes for vaccine design, and the evaluation of linkage disequilibrium and population structure.
Affiliation(s)
- Marcos Pérez-Losada
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Portugal; Computational Biology Institute, George Washington University, Ashburn, VA 20147, USA
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
- Juan Carlos Galán
- Servicio de Microbiología, Hospital Ramón y Cajal and Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain; CIBER en Epidemiología y Salud Pública, Spain
- Ferran Palero
- CIBER en Epidemiología y Salud Pública, Spain; Unidad Mixta Infección y Salud Pública, FISABIO-Universitat de València, Valencia, Spain
- Fernando González-Candelas
- CIBER en Epidemiología y Salud Pública, Spain; Unidad Mixta Infección y Salud Pública, FISABIO-Universitat de València, Valencia, Spain.
9
Roossinck MJ, García-Arenal F. Ecosystem simplification, biodiversity loss and plant virus emergence. Curr Opin Virol 2015; 10:56-62. [PMID: 25638504 PMCID: PMC7102708 DOI: 10.1016/j.coviro.2015.01.005] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 09/17/2014] [Revised: 01/08/2015] [Accepted: 01/14/2015] [Indexed: 01/02/2023]
Abstract
Plant viruses can emerge into crops from wild plant hosts, or conversely from domestic (crop) plants into wild hosts. Changes in ecosystems, including loss of biodiversity and increases in managed croplands, can impact the emergence of plant virus disease. Although data are limited, in general the loss of biodiversity is thought to contribute to disease emergence. More in-depth studies have been done for human viruses, but studies with plant viruses suggest similar patterns, and indicate that simplification of ecosystems through increased human management may increase the emergence of viral diseases in crops.
Affiliation(s)
- Marilyn J Roossinck
- Department of Plant Pathology and Environmental Microbiology, Center for Infectious Disease Dynamics, Pennsylvania State University, USA; Murdoch University, Perth, Australia.
- Fernando García-Arenal
- Centro de Biotecnología y Genómica de Plantas UPM-INIA, and E.T.S.I. Agrónomos, Campus de Montegancedo, Universidad Politécnica de Madrid, Spain
10
Golden M, Muhire BM, Semegni Y, Martin DP. Patterns of recombination in HIV-1M are influenced by selection disfavouring the survival of recombinants with disrupted genomic RNA and protein structures. PLoS One 2014; 9:e100400. [PMID: 24936864 PMCID: PMC4061080 DOI: 10.1371/journal.pone.0100400] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/28/2014] [Accepted: 05/27/2014] [Indexed: 11/18/2022] Open
Abstract
Genetic recombination is a major contributor to the ongoing diversification of HIV. It is clearly apparent that across the HIV-genome there are defined recombination hot and cold spots which tend to co-localise both with genomic secondary structures and with either inter-gene boundaries or intra-gene domain boundaries. There is also good evidence that most recombination breakpoints that are detectable within the genes of natural HIV recombinants are likely to be minimally disruptive of intra-protein amino acid contacts and that these breakpoints should therefore have little impact on protein folding. Here we further investigate the impact on patterns of genetic recombination in HIV of selection favouring the maintenance of functional RNA and protein structures. We confirm that chimaeric Gag p24, reverse transcriptase, integrase, gp120 and Nef proteins that are expressed by natural HIV-1 recombinants have significantly lower degrees of predicted folding disruption than randomly generated recombinants. Similarly, we use a novel single-stranded RNA folding disruption test to show that there is significant, albeit weak, evidence that natural HIV recombinants tend to have genomic secondary structures that more closely resemble parental structures than do randomly generated recombinants. These results are consistent with the hypothesis that natural selection has acted both in the short term to purge recombinants with disrupted RNA and protein folds, and in the longer term to modify the genome architecture of HIV to ensure that recombination prone sites correspond with those where recombination will be minimally deleterious.
Affiliation(s)
- Michael Golden
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Brejnev M. Muhire
- Institute of Infectious Diseases and Molecular Medicine, Computational Biology Group, University of Cape Town, Cape Town, South Africa
- Yves Semegni
- Department of Mathematics and Physics, Cape Peninsula University of Technology, Cape Town, South Africa
- Darren P. Martin
- Institute of Infectious Diseases and Molecular Medicine, Computational Biology Group, University of Cape Town, Cape Town, South Africa
11
Guerrero JA, Macías-Díaz JE. A computational method for the detection of activation/deactivation patterns in biological signals with three levels of electric intensity. Math Biosci 2014; 248:117-27. [PMID: 24418009 DOI: 10.1016/j.mbs.2013.12.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 02/08/2013] [Revised: 12/20/2013] [Accepted: 12/31/2013] [Indexed: 11/29/2022]
Abstract
In the present work, we develop a computational technique to approximate the changes of phase in temporal series associated with electric signals of muscles which perform activities at three different levels of intensity. We suppose that the temporal series are samples of independent, normally distributed random variables with mean equal to zero, and variance equal to one of three possible values, each of them associated with a certain degree of electric intensity. For example, these intensity levels may represent a leg muscle at rest, or active during a light activity (walking), or active during a highly demanding performance (jogging). The model is presented as a maximum likelihood problem involving discrete variables. In turn, this problem is transformed into a continuous one via the introduction of continuous variables with penalization parameters, and it is solved recursively through an iterative numerical method. An a posteriori treatment of the results is used in order to avoid the detection of relatively short periods of silence or activity. We perform simulations with synthetic data in order to assess the validity of our technique. Our computational results show that the method approximates the occurrence of the change points in synthetic temporal series well, even in the presence of autocorrelated sequences. Along the way, we show that a generalization of a computational technique for the change-point detection of electric signals with two phases of activity (Esquivel-Frausto et al., 2010 [40]) may be inapplicable in cases of temporal series with three levels of intensity. In this sense, the method proposed in the present manuscript improves on previous efforts of the authors.
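The abstract above frames change-point detection as a maximum likelihood problem over discrete intensity levels. A minimal sketch of that kind of objective is given here, under two loudly stated assumptions: the three variances are taken as known in advance, and a standard dynamic-programming (Viterbi-style) search with a fixed penalty per level change is substituted for the authors' continuous relaxation and iterative solver. The function name `segment_variance_levels`, the penalty value, and the synthetic series are all hypothetical.

```python
import math

def segment_variance_levels(x, variances, switch_penalty):
    """Label each sample with the index of one candidate variance.

    Minimises the Gaussian negative log-likelihood (zero mean) plus
    switch_penalty for every change of level. This is an illustrative
    stand-in, not the method of the cited paper.
    """
    if not x:
        return []
    k = len(variances)

    def nll(v, s):
        # Negative log-likelihood of sample v under N(0, s).
        return 0.5 * (math.log(2 * math.pi * s) + v * v / s)

    cost = [nll(x[0], s) for s in variances]
    back = []  # back[t][j]: best previous level if level j is used at t + 1
    for v in x[1:]:
        best_prev = min(range(k), key=cost.__getitem__)
        new_cost, pointers = [], []
        for j in range(k):
            stay = cost[j]
            jump = cost[best_prev] + switch_penalty
            if stay <= jump:
                pointers.append(j)
                prev = stay
            else:
                pointers.append(best_prev)
                prev = jump
            new_cost.append(prev + nll(v, variances[j]))
        back.append(pointers)
        cost = new_cost

    # Trace the cheapest labelling backwards from the final sample.
    state = min(range(k), key=cost.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Synthetic, deterministic series with three amplitude regimes (assumed data).
x = [0.1, -0.1] * 20 + [1.0, -1.0] * 20 + [3.0, -3.0] * 20
path = segment_variance_levels(x, [0.01, 1.0, 9.0], switch_penalty=5.0)
change_points = [i for i in range(1, len(path)) if path[i] != path[i - 1]]
print(change_points)
```

A larger penalty suppresses short-lived level changes, which loosely mirrors the a posteriori filtering of brief periods of silence or activity mentioned in the abstract.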
Affiliation(s)
- J A Guerrero
- Departamento de Estadística, Universidad Autónoma de Aguascalientes, Aguascalientes 20131, Mexico.
- J E Macías-Díaz
- Departamento de Matemáticas y Física, Universidad Autónoma de Aguascalientes, Aguascalientes 20131, Mexico.
12
Esquivel-Frausto ME, Guerrero JA, Macías-Díaz JE. Computational approximation of the likelihood ratio for testing the existence of change-points in a heteroscedastic series. J STAT COMPUT SIM 2013. [DOI: 10.1080/00949655.2012.663373] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
13
Presson AP, Kim N, Xiaofei Y, Chen IS, Kim S. Methodology and software to detect viral integration site hot-spots. BMC Bioinformatics 2011; 12:367. [PMID: 21914224 PMCID: PMC3203353 DOI: 10.1186/1471-2105-12-367] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 04/05/2011] [Accepted: 09/14/2011] [Indexed: 11/17/2022] Open
Abstract
Background: Modern gene therapy methods have limited control over where a therapeutic viral vector inserts into the host genome. Vector integration can activate local gene expression, which can cause cancer if the vector inserts near an oncogene. Viral integration hot-spots or 'common insertion sites' (CIS) are scrutinized to evaluate and predict patient safety. CIS are typically defined by a minimum density of insertions (such as 2-4 within a 30-100 kb region), which unfortunately depends on the total number of observed viral integration sites (VIS). This is problematic for comparing hot-spot distributions across data sets and patients, where the VIS numbers may vary.
Results: We develop two new methods for defining hot-spots that are relatively independent of data set size. Both methods operate on distributions of VIS across consecutive 1 Mb 'bins' of the genome. The first method, 'z-threshold', tallies the number of VIS per bin, converts these counts to z-scores, and applies a threshold to define high density bins. The second method, 'BCP', applies a Bayesian change-point model to the z-scores to define hot-spots. The novel hot-spot methods are compared with a conventional CIS method using simulated data sets and data sets from five published human studies, including the X-linked ALD (adrenoleukodystrophy), CGD (chronic granulomatous disease) and SCID-X1 (X-linked severe combined immunodeficiency) trials. The BCP analysis of the human X-linked ALD data for two patients separately (774 and 1627 VIS) and combined (2401 VIS) resulted in 5-6 hot-spots covering 0.17-0.251% of the genome and containing 5.56-7.74% of the total VIS. In comparison, the CIS analysis resulted in 12-110 hot-spots covering 0.018-0.246% of the genome and containing 5.81-22.7% of the VIS, corresponding to a greater number of hot-spots as the data set size increased. Our hot-spot methods enable one to evaluate the extent of VIS clustering, and formally compare data sets in terms of hot-spot overlap. Finally, we show that the BCP hot-spots from the repopulating samples coincide with greater gene and CpG island density than the median genome density.
Conclusions: The z-threshold and BCP methods are useful for comparing hot-spot patterns across data sets of disparate sizes. The methodology and software provided here should enable one to study hot-spot conservation across a variety of VIS data sets and evaluate vector safety for gene therapy trials.
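As a rough illustration of the z-threshold idea described in the abstract above (tally integration sites per fixed-width bin, standardize the counts, keep bins above a cutoff), the following Python sketch may help. It is not the authors' software: the function name `z_threshold_hotspots`, the use of the population standard deviation across all bins, and the cutoff of 2.0 are assumptions made here for illustration only, since the abstract does not specify them.

```python
from statistics import mean, pstdev

def z_threshold_hotspots(positions, genome_length, bin_size=1_000_000, z_cutoff=2.0):
    """Return (bin_index, count, z_score) for bins whose z-score exceeds z_cutoff.

    positions are 0-based integration-site coordinates, each assumed to be
    smaller than genome_length. The cutoff and the standardization are
    illustrative choices, not values taken from the cited paper.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1

    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # every bin has the same count, so nothing stands out

    hits = []
    for i, c in enumerate(counts):
        z = (c - mu) / sigma
        if z > z_cutoff:
            hits.append((i, c, z))
    return hits

# Toy data (assumed): one site in each of ten 1 Mb bins, plus seven extra in bin 3.
sites = [i * 1_000_000 + 5 for i in range(10)]
sites += [3_000_000 + k for k in range(100, 800, 100)]
for bin_index, count, z in z_threshold_hotspots(sites, genome_length=10_000_000):
    print(bin_index, count, round(z, 2))
```

Because the counts are standardized against the other bins of the same data set, the cutoff is less tied to the total number of sites than a fixed "k insertions within a window" rule, which is the motivation the abstract gives for the approach.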
Affiliation(s)
- Angela P Presson
- Department of Biostatistics, University of California Los Angeles, School of Public Health, USA.
14
Song G, Hsu CH, Riemer C, Zhang Y, Kim HL, Hoffmann F, Zhang L, Hardison RC, Green ED, Miller W. Conversion events in gene clusters. BMC Evol Biol 2011; 11:226. [PMID: 21798034 PMCID: PMC3161012 DOI: 10.1186/1471-2148-11-226] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/09/2011] [Accepted: 07/28/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Gene clusters containing multiple similar genomic regions in close proximity are of great interest for biomedical studies because of their associations with inherited diseases. However, such regions are difficult to analyze due to their structural complexity and their complicated evolutionary histories, reflecting a variety of large-scale mutational events. In particular, conversion events can mislead inferences about the relationships among these regions, as traced by traditional methods such as construction of phylogenetic trees or multi-species alignments.
RESULTS: To correct the distorted information generated by such methods, we have developed an automated pipeline called CHAP (Cluster History Analysis Package) for detecting conversion events. We used this pipeline to analyze the conversion events that affected two well-studied gene clusters (α-globin and β-globin) and three gene clusters for which comparative sequence data were generated from seven primate species: CCL (chemokine ligand), IFN (interferon), and CYP2abf (part of cytochrome P450 family 2). CHAP is freely available at http://www.bx.psu.edu/miller_lab.
CONCLUSIONS: These studies reveal the value of characterizing conversion events in the context of studying gene clusters in complex genomes.
Affiliation(s)
- Giltae Song
- Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, PA 16802 USA.
|
15
|
Monteiro-Cunha JP, Araujo AF, Santos E, Galvao-Castro B, Alcantara LCJ. Lack of high-level resistance mutations in HIV type 1 BF recombinant strains circulating in northeast Brazil. AIDS Res Hum Retroviruses 2011; 27:623-31. [PMID: 21087197 DOI: 10.1089/aid.2010.0126] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/23/2022] Open
Abstract
The genetic variability and the prevalence of drug resistance-associated mutations (DRAM) of HIV-1 isolates from 50 women and 8 children from Feira de Santana, Bahia, Brazil were investigated. DNA samples were obtained and pol sequences were generated by PCR and direct sequencing. Phylogenetic analysis showed that 39 (67.2%) samples were subtype B, four (6.9%) F, one (1.7%) C, and 14 (24.1%) BF recombinants. Four different BF recombination patterns were detected. Twelve (20.7%) samples shared the same breakpoint within the reverse transcriptase (RT) sequence. Fifty-five (94.8%) isolates showed several resistance-associated mutations in the RT and the protease (PR) genes. Ten (17.2%) isolates presented mutations associated with a high level of resistance: nine (15.5%) to nucleoside RT inhibitors (NRTI), four (6.9%) to nonnucleoside RT inhibitors (NNRTI), and three (5.2%) to PR inhibitors (PIs). Subtype B-infected patients had, on average, 0.5 high-level DRAM per sequence while no mutations were observed in BF recombinants, although the two groups were under ARV for a similar period of time. Our data indicate the predominance of the subtype B, followed by BF recombinants in this population, and the dissemination of a recombinant strain in Bahia, which could be related to adaptive advantages of these variants over the predominant subtype B.
Affiliation(s)
- Joana Paixao Monteiro-Cunha
- Laboratório Avançado de Saúde Pública (LASP), Centro de Pesquisa Gonçalo Moniz (CPqGM), Fundação Oswaldo Cruz (FIOCRUZ), Salvador, Bahia, Brazil
- Adriano Fernando Araujo
- Laboratório Avançado de Saúde Pública (LASP), Centro de Pesquisa Gonçalo Moniz (CPqGM), Fundação Oswaldo Cruz (FIOCRUZ), Salvador, Bahia, Brazil
- Edson Santos
- Fundação Bahiana para o Desenvolvimento das Ciências (FBDC), Escola Bahiana de Medicina e Saúde Pública (EBMSP), Salvador, Bahia, Brazil
- Bernardo Galvao-Castro
- Laboratório Avançado de Saúde Pública (LASP), Centro de Pesquisa Gonçalo Moniz (CPqGM), Fundação Oswaldo Cruz (FIOCRUZ), Salvador, Bahia, Brazil
- Fundação Bahiana para o Desenvolvimento das Ciências (FBDC), Escola Bahiana de Medicina e Saúde Pública (EBMSP), Salvador, Bahia, Brazil
- Luiz Carlos Junior Alcantara
- Laboratório Avançado de Saúde Pública (LASP), Centro de Pesquisa Gonçalo Moniz (CPqGM), Fundação Oswaldo Cruz (FIOCRUZ), Salvador, Bahia, Brazil
- Fundação Bahiana para o Desenvolvimento das Ciências (FBDC), Escola Bahiana de Medicina e Saúde Pública (EBMSP), Salvador, Bahia, Brazil
- Vaccine Branch, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
|
16
|
Rask TS, Hansen DA, Theander TG, Gorm Pedersen A, Lavstsen T. Plasmodium falciparum erythrocyte membrane protein 1 diversity in seven genomes--divide and conquer. PLoS Comput Biol 2010; 6. [PMID: 20862303 PMCID: PMC2940729 DOI: 10.1371/journal.pcbi.1000933] [Citation(s) in RCA: 261] [Impact Index Per Article: 18.6] [Received: 04/03/2010] [Accepted: 08/16/2010] [Indexed: 12/21/2022] Open
Abstract
The var gene encoded hyper-variable Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) family mediates cytoadhesion of infected erythrocytes to human endothelium. Antibodies blocking cytoadhesion are important mediators of malaria immunity acquired by endemic populations. The development of a PfEMP1 based vaccine mimicking natural acquired immunity depends on a thorough understanding of the evolved PfEMP1 diversity, balancing antigenic variation against conserved receptor binding affinities. This study redefines and reclassifies the domains of PfEMP1 from seven genomes. Analysis of domains in 399 different PfEMP1 sequences allowed identification of several novel domain classes, and a high degree of PfEMP1 domain compositional order, including conserved domain cassettes not always associated with the established group A–E division of PfEMP1. A novel iterative homology block (HB) detection method was applied, allowing identification of 628 conserved minimal PfEMP1 building blocks, describing on average 83% of a PfEMP1 sequence. Using the HBs, similarities between domain classes were determined, and Duffy binding-like (DBL) domain subclasses were found in many cases to be hybrids of major domain classes. Related to this, a recombination hotspot was uncovered between DBL subdomains S2 and S3. The VarDom server is introduced, from which information on domain classes and homology blocks can be retrieved, and new sequences can be classified. Several conserved sequence elements were found, including: (1) residues conserved in all DBL domains predicted to interact and hold together the three DBL subdomains, (2) potential integrin binding sites in DBLα domains, (3) an acylation motif conserved in group A var genes suggesting N-terminal N-myristoylation, (4) PfEMP1 inter-domain regions proposed to be elastic disordered structures, and (5) several conserved predicted phosphorylation sites. 
Ideally, this comprehensive categorization of PfEMP1 will provide a platform for future studies on var/PfEMP1 expression and function.
About one million African children die from malaria every year. The severity of malaria infections in part depends on which type of the parasitic protein PfEMP1 is expressed on the surface of the infected red blood cells. Natural immunity to malaria is mediated through antibodies to PfEMP1. Therefore hopes for a malaria vaccine based on PfEMP1 proteins have been raised. However, the large sequence variation among PfEMP1 molecules has caused great difficulties in executing and interpreting studies on PfEMP1. Here, we present an extensive sequence analysis of all currently available PfEMP1 sequences and show that PfEMP1 variation is ordered and can be categorized at different levels. In this way, PfEMP1 proteins belong to groups A–E and are composed of up to four components, each component containing specific DBL or CIDR domain subclasses, which in some cases form entire conserved domain combinations. Finally, each PfEMP1 can be described in high detail as a combination of 628 homology blocks. This dissection of PfEMP1 diversity also enables predictions of several functional sequence motifs relevant to the fold of PfEMP1 proteins and their ability to bind human receptors. We therefore believe that this description of PfEMP1 diversity is necessary and helpful for the design and interpretation of future PfEMP1 studies.
Affiliation(s)
- Thomas S. Rask
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Centre for Medical Parasitology, Department of Medical Microbiology and Immunology, University of Copenhagen, Copenhagen, Denmark
- * E-mail: (TSR); (TL)
- Daniel A. Hansen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Thor G. Theander
- Centre for Medical Parasitology, Department of Medical Microbiology and Immunology, University of Copenhagen, Copenhagen, Denmark
- Anders Gorm Pedersen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Thomas Lavstsen
- Centre for Medical Parasitology, Department of Medical Microbiology and Immunology, University of Copenhagen, Copenhagen, Denmark
- * E-mail: (TSR); (TL)
|
17
|
Esquivel-Frausto ME, Guerrero JA, Macías-Díaz JE. Activity pattern detection in electroneurographic and electromyogram signals through a heteroscedastic change-point method. Math Biosci 2010; 224:109-17. [PMID: 20093131 DOI: 10.1016/j.mbs.2010.01.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 06/29/2009] [Revised: 11/05/2009] [Accepted: 01/11/2010] [Indexed: 12/01/2022]
Abstract
In this work, we propose a heteroscedastic method in the detection of activity patterns of electroneurographic and electromyogram signals involved in rhythmic activities of nerves and muscles, respectively. The electric behavior observed in such signals is characterized by phases of activity and silence. The beginning and the length of electrically active and electrically silent phases in a signal allow us to quantitatively analyze the changes and the effects on a rhythmic activity produced by experimental changes. In order to distinguish between these two phases, signals are assumed to be a sample of a time-dependent, normally distributed random variable with non-constant variance, and that the determination of the variance at each point allows us to determine in which phase is the signal. The parameters of the model are determined by means of an iterative process which maximizes the log-likelihood under the proposed model. Moreover, we apply our method to the determination of the activity phases and silence phases in sequences of experimental and synthetic electroneurographic and electromyogram signals. The results obtained with synthetic data show that the method performs well in the determination of these activity patterns. Finally, the study of particular signals simulated under a generalized autoregressive conditional heteroscedasticity model suggests the robustness of the method with respect to the assumption of independence.
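The idea in this abstract, separating active from silent phases by the variance of a zero-mean normal signal, can be illustrated with a rough sketch. The code below is not the authors' estimator: it assumes only two variance levels, treats samples as independent, and uses hard assignments that alternate between labeling samples and re-estimating the two variances. The names `gaussian_loglik` and `segment_phases` are hypothetical and chosen for illustration only.

```python
import math
import random


def gaussian_loglik(x, var):
    """Log-density at x of a zero-mean normal distribution with variance var."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def segment_phases(signal, n_iter=50):
    """Label samples 0 (silent) or 1 (active) and estimate both variances.

    Assumption: a two-level variance model with hard assignments,
    used here as a simplified stand-in for the method in the abstract.
    """
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    energies = sorted(x * x for x in signal)
    half = len(energies) // 2
    floor = 1e-12  # keeps both variances strictly positive
    # Initial guess: lower and upper halves of the sorted squared samples.
    var_silent = max(sum(energies[:half]) / half, floor)
    var_active = max(sum(energies[half:]) / (len(energies) - half), floor)
    labels = None
    for _ in range(n_iter):
        # Assignment step: pick the phase with the higher log-likelihood.
        new_labels = [
            1 if gaussian_loglik(x, var_active) > gaussian_loglik(x, var_silent) else 0
            for x in signal
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: variance of a zero-mean normal is the mean square.
        active = [x * x for x, lab in zip(signal, labels) if lab == 1]
        silent = [x * x for x, lab in zip(signal, labels) if lab == 0]
        if active:
            var_active = max(sum(active) / len(active), floor)
        if silent:
            var_silent = max(sum(silent) / len(silent), floor)
    return labels, var_silent, var_active
```

Because each sample is labeled on its own, low-amplitude samples inside an active burst can be marked silent; a practical version would likely smooth the labels over time or model the phase boundaries explicitly.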
Affiliation(s)
- M E Esquivel-Frausto
- Departamento de Estadística, Universidad Autónoma de Aguascalientes, Aguascalientes, Ags. 20100, Mexico.
|
18
|
Boussau B, Guéguen L, Gouy M. A mixture model and a hidden Markov model to simultaneously detect recombination breakpoints and reconstruct phylogenies. Evol Bioinform Online 2009; 5:67-79. [PMID: 19812727 PMCID: PMC2747125 DOI: 10.4137/ebo.s2242] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/17/2022] Open
Abstract
Homologous recombination is a pervasive biological process that affects sequences in all living organisms and viruses. In the presence of recombination, the evolutionary history of an alignment of homologous sequences cannot be properly depicted by a single bifurcating tree: some sites have evolved along a specific phylogenetic tree, others have followed another path. Methods available to analyse recombination in sequences usually involve an analysis of the alignment through sliding-windows, or are particularly demanding in computational resources, and are often limited to nucleotide sequences. In this article, we propose and implement a Mixture Model on trees and a phylogenetic Hidden Markov Model to reveal recombination breakpoints while searching for the various evolutionary histories that are present in an alignment known to have undergone homologous recombination. These models are sufficiently efficient to be applied to dozens of sequences on a single desktop computer, and can handle equivalently nucleotide or protein sequences. We estimate their accuracy on simulated sequences and test them on real data.
Affiliation(s)
- Bastien Boussau
- Université de Lyon, université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 novembre 1918, Villeurbanne F-69622, France.
|
19
|
Molecular mechanisms of recombination restriction in the envelope gene of the human immunodeficiency virus. PLoS Pathog 2009; 5:e1000418. [PMID: 19424420 PMCID: PMC2671596 DOI: 10.1371/journal.ppat.1000418] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Received: 10/17/2008] [Accepted: 04/07/2009] [Indexed: 11/23/2022] Open
Abstract
The ability of pathogens to escape the host's immune response is crucial for the establishment of persistent infections and can influence virulence. Recombination has been observed to contribute to this process by generating novel genetic variants. Although distinctive recombination patterns have been described in many viral pathogens, little is known about the influence of biases in the recombination process itself relative to selective forces acting on newly formed recombinants. Understanding these influences is important for determining how recombination contributes to pathogen genome and proteome evolution. Most previous research on recombination-driven protein evolution has focused on relatively simple proteins, usually in the context of directed evolution experiments. Here, we study recombination in the envelope gene of HIV-1 between primary isolates belonging to subtypes that recombine naturally in the HIV/AIDS pandemic. By characterizing the early steps in the generation of recombinants, we provide novel insights into the evolutionary forces that shape recombination patterns within viral populations. Specifically, we show that the combined effects of mechanistic processes that determine the locations of recombination breakpoints across the HIV-1 envelope gene, and purifying selection acting against dysfunctional recombinants, can explain almost the entire distribution of breakpoints found within this gene in nature. These constraints account for the surprising paucity of recombination breakpoints found in infected individuals within this highly variable gene. Thus, the apparent randomness of HIV evolution via recombination may in fact be relatively more predictable than anticipated. In addition, the dominance of purifying selection in localized areas of the HIV genome defines regions where functional constraints on recombinants appear particularly strong, pointing to vulnerable aspects of HIV biology. 
Recombination allows mixing portions of genomes of different origins, generating chimeric genes and genomes. With respect to the random generation of new mutations, it can lead to the simultaneous insertion of several substitutions, introducing more drastic changes in the genome. Furthermore, recombination is expected to yield a higher proportion of functional products since it combines variants that already exist in the population and that are therefore compatible with the survival of the organism. However, when recombination involves genetically distant strains, it can be constrained by the necessity to retain the functionality of the resulting products. In pathogens, which are subjected to strong selective pressures, recombination is particularly important, and several viruses, such as the human immunodeficiency virus (HIV), readily recombine. Here, we demonstrate the existence of preferential regions for recombination in the HIV-1 envelope gene when crossing sequences representative of strains observed to recombine in vivo. Furthermore, some recombinants give a decreased proportion of functional products. When considering these factors, one can retrace the history of most natural HIV recombinants. Recombination in HIV appears not so unpredictable, therefore, and the existence of recombinants that frequently generate nonfunctional products highlights previously unappreciated limits of the genetic flexibility of HIV.
|
20
|
Westesson O, Holmes I. Accurate detection of recombinant breakpoints in whole-genome alignments. PLoS Comput Biol 2009; 5:e1000318. [PMID: 19300487 PMCID: PMC2651022 DOI: 10.1371/journal.pcbi.1000318] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 04/10/2008] [Accepted: 02/04/2009] [Indexed: 11/25/2022] Open
Abstract
We propose a novel method for detecting sites of molecular recombination in multiple alignments. Our approach is a compromise between previous extremes of computationally prohibitive but mathematically rigorous methods and imprecise heuristic methods. Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology over a multiple sequence alignment. We evaluate our method on benchmark datasets from previous studies on two recombinant pathogens, Neisseria and HIV-1, as well as simulated data. We show that we are not only able to detect recombinant regions of vastly different sizes but also the location of breakpoints with great accuracy. We show that our method does well inferring recombination breakpoints while at the same time maintaining practicality for larger datasets. In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions. In viral and bacterial pathogens, recombination has the ability to combine fitness-enhancing mutations. Accurate characterization of recombinant breakpoints in newly sequenced strains can provide information about the role of this process in evolution, for example, in immune evasion. Of particular interest are situations of an admixture of pathogen subspecies, recombination between whose genomes may change the apparent phylogenetic tree topology in different regions of a multiple-genome alignment. We describe an algorithm that can pinpoint recombination breakpoints to greater accuracy than previous methods, allowing detection of both short recombinant regions and long-range multiple crossovers. The algorithm is appropriate for the analysis of fast-evolving pathogen sequences where repeated substitutions may be observed at a single site in a multiple alignment (violating the “infinite sites” assumption inherent to some other breakpoint-detection algorithms). 
Simulations demonstrate the practicality of our implementation for alignments of longer sequences and more taxa than previous methods.
Affiliation(s)
- Oscar Westesson
- Department of Bioengineering, University of California Berkeley, Berkeley, California, United States of America
- Ian Holmes
- Department of Bioengineering, University of California Berkeley, Berkeley, California, United States of America
|
21
|
Marttinen P, Baldwin A, Hanage WP, Dowson C, Mahenthiralingam E, Corander J. Bayesian modeling of recombination events in bacterial populations. BMC Bioinformatics 2008; 9:421. [PMID: 18840286 PMCID: PMC2579306 DOI: 10.1186/1471-2105-9-421] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 01/30/2008] [Accepted: 10/07/2008] [Indexed: 11/10/2022] Open
Abstract
Background We consider the discovery of recombinant segments jointly with their origins within multilocus DNA sequences from bacteria representing heterogeneous populations of fairly closely related species. The currently available methods for recombination detection capable of probabilistic characterization of uncertainty have a limited applicability in practice as the number of strains in a data set increases. Results We introduce a Bayesian spatial structural model representing the continuum of origins over sites within the observed sequences, including a probabilistic characterization of uncertainty related to the origin of any particular site. To enable a statistically accurate and practically feasible approach to the analysis of large-scale data sets representing a single genus, we have developed a novel software tool (BRAT, Bayesian Recombination Tracker) implementing the model and the corresponding learning algorithm, which is capable of identifying the posterior optimal structure and to estimate the marginal posterior probabilities of putative origins over the sites. Conclusion A multitude of challenging simulation scenarios and an analysis of real data from seven housekeeping genes of 120 strains of genus Burkholderia are used to illustrate the possibilities offered by our approach. The software is freely available for download at URL .
Affiliation(s)
- Pekka Marttinen
- Department of Mathematics and Statistics, University of Helsinki, FIN-00014, Finland.
|
22
|
Affiliation(s)
- Bruce Rannala
- Genome Center and Department of Evolution and Ecology, University of California, Davis, California 95616;
- Ziheng Yang
- Department of Biology, University College London, London WC1E 6BT United Kingdom; Laboratory of Biometrics, Graduate School of Agriculture and Life Sciences, University of Tokyo, Tokyo, Japan;
|
23
|
Wan X, Weng J, Zhai H, Wang J, Lei C, Liu X, Guo T, Jiang L, Su N, Wan J. Quantitative trait loci (QTL) analysis for rice grain width and fine mapping of an identified QTL allele gw-5 in a recombination hotspot region on chromosome 5. Genetics 2008; 179:2239-52. [PMID: 18689882 PMCID: PMC2516094 DOI: 10.1534/genetics.108.089862] [Citation(s) in RCA: 115] [Impact Index Per Article: 7.2] [Received: 04/02/2008] [Accepted: 05/20/2008] [Indexed: 11/18/2022] Open
Abstract
Rice grain width and shape play a crucial role in determining grain quality and yield. The genetic basis of rice grain width was dissected into six additive quantitative trait loci (QTL) and 11 pairs of epistatic QTL using an F(7) recombinant inbred line (RIL) population derived from a single cross between Asominori (japonica) and IR24 (indica). QTL by environment interactions were evaluated in four environments. Chromosome segment substitution lines (CSSLs) harboring the six additive effect QTL were used to evaluate gene action across eight environments. A major, stable QTL, qGW-5, consistently decreased rice grain width in both the Asominori/IR24 RIL and CSSL populations with the genetic background Asominori. By investigating the distorted segregation of phenotypic values of rice grain width and genotypes of molecular markers in BC(4)F(2) and BC(4)F(3) populations, qGW-5 was dissected into a single recessive gene, gw-5, which controlled both grain width and length-width ratio. gw-5 was narrowed down to a 49.7-kb genomic region with high recombination frequencies on chromosome 5 using 6781 BC(4)F(2) individuals and 10 newly developed simple sequence repeat markers. Our results provide a basis for map-based cloning of the gw-5 gene and for marker-aided gene/QTL pyramiding in rice quality breeding.
Affiliation(s)
- Xiangyuan Wan
- National Key Laboratory for Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
|
24
|
Martins LDO, Leal E, Kishino H. Phylogenetic detection of recombination with a Bayesian prior on the distance between trees. PLoS One 2008; 3:e2651. [PMID: 18612422 PMCID: PMC2440540 DOI: 10.1371/journal.pone.0002651] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 04/11/2008] [Accepted: 06/07/2008] [Indexed: 11/18/2022] Open
Abstract
Genomic regions participating in recombination events may support distinct topologies, and phylogenetic analyses should incorporate this heterogeneity. Existing phylogenetic methods for recombination detection are challenged by the enormous number of possible topologies, even for a moderate number of taxa. If, however, the detection analysis is conducted independently between each putative recombinant sequence and a set of reference parentals, potential recombinations between the recombinants are neglected. In this context, a recombination hotspot can be inferred in phylogenetic analyses if we observe several consecutive breakpoints. We developed a distance measure between unrooted topologies that closely resembles the number of recombinations. By introducing a prior distribution on these recombination distances, a Bayesian hierarchical model was devised to detect phylogenetic inconsistencies occurring due to recombinations. This model relaxes the assumption of known parental sequences, still common in HIV analysis, allowing the entire dataset to be analyzed at once. On simulated datasets with up to 16 taxa, our method correctly detected recombination breakpoints and the number of recombination events for each breakpoint. The procedure is robust to rate and transition∶transversion heterogeneities for simulations with and without recombination. This recombination distance is related to recombination hotspots. Applying this procedure to a genomic HIV-1 dataset, we found evidence for hotspots and de novo recombination.
|