1
|
Abstract
Microarrays are part of a new class of biotechnologies that allow the expression levels of thousands of genes to be monitored simultaneously. In microarray data analysis, the comparison of gene expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large data sets. To identify genes with altered expression under two experimental conditions, we propose a nonparametric statistical approach. Specifically, we propose estimating the distributions of a t-type statistic and its null statistic, using kernel methods. A comparison of these two distributions by means of a likelihood ratio test can identify genes with significantly changed expression. A new method to provide more stable estimates of tail probabilities is proposed, as well as a method for the calculation of the cut-off point and the acceptance region. The methodology is applied to a leukaemia data set containing expression levels of 7129 genes, and is compared with a normal mixture model and the traditional t-test.
Affiliation(s)
- Ali Gannoun
- UMR CNRS 5149, Université Montpellier II, France
- Department of Microbiology, Howard University, Washington, DC, USA
- Statistical Genetics and Bioinformatics Unit, Howard University, National Human Genome Center, Washington, DC, USA
- Wolfgang Urfer
- Department of Statistics, University of Dortmund, Germany
- George E. Bonney
- Department of Microbiology, Howard University, Washington, DC, USA
- Statistical Genetics and Bioinformatics Unit, Howard University, National Human Genome Center, Washington, DC, USA
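As a rough illustration of the idea in the abstract above, the following sketch computes a t-type statistic per gene, builds a null statistic by shuffling condition labels, estimates both densities with a Gaussian kernel, and flags genes where the ratio of null density to observed density is small. The function names, the fixed bandwidth of 0.5, the label-shuffling null, the 0.2 threshold, and the simulated data are all assumptions made for illustration; the paper's own tail-probability estimator and cut-off calculation are not reproduced here.

```python
import math
import random
import statistics


def t_stat(x, y):
    """Welch-type two-sample t statistic for two lists of expression values."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx / len(x) + vy / len(y))


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` with a Gaussian kernel."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(z):
        return sum(math.exp(-0.5 * ((z - s) / bandwidth) ** 2) for s in sample) / norm

    return density


def likelihood_ratio(z, f_null, f_obs, floor=1e-12):
    """Ratio f0(z) / f(z); small values suggest altered expression."""
    return f_null(z) / max(f_obs(z), floor)


if __name__ == "__main__":
    rng = random.Random(0)
    genes = []
    for g in range(200):
        shift = 2.0 if g < 20 else 0.0  # hypothetical: first 20 genes truly changed
        a = [rng.gauss(shift, 1.0) for _ in range(8)]
        b = [rng.gauss(0.0, 1.0) for _ in range(8)]
        genes.append((a, b))

    observed = [t_stat(a, b) for a, b in genes]

    # Null statistics: shuffle the condition labels within each gene (an assumption).
    null = []
    for a, b in genes:
        pooled = a + b
        rng.shuffle(pooled)
        null.append(t_stat(pooled[:8], pooled[8:]))

    f_obs = gaussian_kde(observed, 0.5)
    f_null = gaussian_kde(null, 0.5)
    flagged = [g for g, z in enumerate(observed)
               if likelihood_ratio(z, f_null, f_obs) < 0.2]
    print(len(flagged), "genes flagged")
```

A fixed bandwidth keeps the sketch short; in practice the bandwidth and the threshold would be chosen from the data.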
|
2
|
Abstract
Genetic network analysis has recently proved useful in the study of gene-gene interactions, and the study of gene-gene correlations is a special case of network analysis. Many methods exist for this purpose. Most of them model the relationship between each gene and the set of genes under study. These methods work well in applications, but they often suffer from non-unique solutions, computational difficulties, or results that are hard to interpret. Here we study the problem from a different point of view: given a measure of pairwise gene-gene relationship, we use the technique of pattern image restoration to infer the optimal pairwise relationships in the network. With this method, the solution always exists and is unique, the results are easy to interpret in the global sense, and the computation is simple. The regulatory relationships among the genes are inferred according to the principle that neighboring genes tend to share some common features. The network is updated iteratively until convergence; each iteration monotonically reduces the entropy and variance of the network, so the limit network represents the clearest picture of the regulatory relationships among the genes that is provided by the data and recoverable by the model. The method is illustrated with simulated data and applied to real data sets.
Affiliation(s)
- Ao Yuan
- National Human Genome Center, Howard University, Washington DC, USA
|
3
|
Abbas MM, Bobo LD, Berka N, Bonney GE, Apprey V, Dunston GM. 81-P: Association of polymorphisms in innate and adaptive immune response genes in an East African cohort with trachoma. Hum Immunol 2008. [DOI: 10.1016/j.humimm.2008.08.100]
|
4
|
Abbas M, Bobo LD, Hsieh YH, Berka N, Dunston G, Bonney GE, Apprey V, Quinn TC, West SK. Human leukocyte antigen (HLA)-B, DRB1, and DQB1 allotypes associated with disease and protection of trachoma endemic villagers. Invest Ophthalmol Vis Sci 2008; 50:1734-8. [PMID: 18824733 DOI: 10.1167/iovs.08-2053]
Abstract
PURPOSE: Trachoma remains the leading preventable infectious cause of blindness in developing countries. Human leukocyte antigen (HLA) associations with ocular disease severity and persistent Chlamydia trachomatis infection of Tanzanians living in trachoma-endemic villages were examined to determine possible protective candidate allotypes for vaccine development. METHODS: Buccal swab scrapes were taken from subjects in the Trichiasis Study Group (TSG), which studied females only, and the Family Trachoma Study (FTS), which compared persistently infected probands who had severe disease with disease-free siblings and parents. DNA was purified for polymerase chain reaction sequence-specific oligonucleotide identification of HLA-DRB1, DQB1, and B allotypes. Infection was detected from conjunctival scrapes using a C. trachomatis-specific PCR-enzyme immunoassay for the MOMP-1 gene. RESULTS: In the TSG, DR*B11 (odds ratio [OR], 0.48; 95% confidence interval [CI], 0.26-0.90; P=0.02) was significantly associated with lack of trichiasis, whereas HLA-B*07 (OR, 3.26; 95% CI, 1.42-7.49; P=0.004) and HLA-B*08 (OR, 5.12; 95% CI, 1.74-15.05; P=0.001) were associated with trichiasis. In addition, HLA-B*14 was significantly associated with inflammatory trachoma + follicular trachoma (OR, 3.76; 95% CI, 1.70-8.33; P=0.04). There were no significant allele frequencies for the FTS. CONCLUSIONS: The data suggest that HLA-DRB*11 may offer protection from trichiasis in trachoma hyperendemic villages. Complete allotype identification and designation of its respective protective CD4(+) T-cell antigens could provide a testable candidate vaccine for blindness prevention. Additionally, buccal swab DNA was sufficiently stable when acquired under harsh field conditions and stored long term in the freezer for low-resolution HLA typing.
Affiliation(s)
- Muneer Abbas
- National Human Genome Center, Howard University, Washington, DC, USA
|
5
|
Yuan A, Bonney GE. A multivariate regression model for continuous genetic traits. J R Stat Soc Ser C Appl Stat 2006. [DOI: 10.1111/j.1467-9876.2006.00552.x]
|
6
|
Yuan A, Chen G, Rotimi C, Bonney GE. A statistical framework for haplotype block inference. J Bioinform Comput Biol 2006; 3:1021-38. [PMID: 16278945 DOI: 10.1142/s021972000500151x] [Received: 11/13/2004] [Revised: 03/29/2005] [Accepted: 03/30/2005]
Abstract
The existence of haplotype blocks transmitted from parents to offspring has been suggested recently. This has created an interest in the inference of the block structure and length. The motivation is that well-characterized haplotype blocks will make it easier to map quickly all the genes carrying human diseases. To study the inference of haplotype blocks systematically, we propose a statistical framework. In this framework, the optimal haplotype block partitioning is formulated as a problem of statistical model selection; missing data can be handled in a standard statistical way; population strata can be implemented; block structure inference and hypothesis testing can be performed; and prior knowledge, if present, can be incorporated to perform a Bayesian inference. The algorithm is linear in the number of loci, whereas many such algorithms are NP-hard. We illustrate the applications of our method to both simulated and real data sets.
Affiliation(s)
- Ao Yuan
- Statistical Genetics and Bioinformatics Unit, Howard University, Washington, DC 20059, USA.
|
7
|
Abstract
Recently, alcohol-related traits have been shown to have a genetic component. Here, we study the association of specific genetic measures in one of the three sets of electrophysiological measures in families with alcoholism distributed as part of the Genetic Analysis Workshop 14 data, the NTTH (non-target case of Visual Oddball experiment for 4 electrode placements) phenotypes: ntth1, ntth2, ntth3, and ntth4. We focused on the analysis of the 786 Affymetrix markers on chromosome 4. Our desire was to find at least a partial answer to the question of whether ntth1, ntth2, ntth3, and ntth4 are separately or jointly genetically controlled, so we studied the principal components that explain most of the covariation of the four quantitative traits. The first principal component, which explains 70% of the covariation, showed association but not genetic linkage to two markers: tsc0272102 and tsc0560854. On the other hand, ntth1 appeared to be the trait driving the variation in the second principal component, which showed association and genetic linkage at markers in four regions: tsc0045058, tsc1213381, tsc0055068, and tsc0051777 at map distances 53.26, 85.42, 89.31, and 172.86, respectively. These results show that the partial answer to our starting question for this brief analysis is that the NTTH phenotypes are not jointly genetically controlled. The component ntth1 displays marked genetic linkage.
Affiliation(s)
- Ao Yuan
- National Human Genome Center, Howard University, Washington DC, USA
- Department of Community Health and Family Medicine, Howard University, Washington DC, USA
- Victor Apprey
- National Human Genome Center, Howard University, Washington DC, USA
- Department of Community Health and Family Medicine, Howard University, Washington DC, USA
- Jules P Harrell
- Department of Psychology, Howard University, Washington DC, USA
- Robert E Taylor
- Alcoholism Research Center, Howard University, Washington DC, USA
- George E Bonney
- National Human Genome Center, Howard University, Washington DC, USA
- Department of Community Health and Family Medicine, Howard University, Washington DC, USA
- Division of Medical Genetics, Department of Pediatrics, Howard University, Washington DC, USA
|
8
|
Abstract
A genetic analysis of age of onset of alcoholism was performed on the Collaborative Study on the Genetics of Alcoholism data released for Genetic Analysis Workshop 14. Our study illustrates an application of the log-normal age of onset model in our software Genetic Epidemiology Models (GEMs). The phenotype ALDX1 of alcoholism was studied. The analysis strategy was first to find the markers of the Affymetrix SNP dataset with significant association with age of onset, and then to perform linkage analysis on them. ALDX1 revealed strong evidence of linkage for marker tsc0041591 on chromosome 2 and suggestive linkage for marker tsc0894042 on chromosome 3. The largest separation in mean age of onset of ALDX1 was between male smokers who carry the risk allele of tsc0041591 (19.76 years) and male smokers who do not (24.41 years). Hence, male smokers who carry the risk allele of marker tsc0041591 on chromosome 2 have an average onset of ALDX1 almost 5 years earlier than non-carriers.
Affiliation(s)
- Victor Apprey
- National Human Genome Center, Howard University, Washington DC 20059-0001 USA
- Department of Community Health and Family Medicine, Howard University, Washington DC 20059-0001 USA
- Joseph Afful
- Howard University Cancer Center, Howard University, Washington DC 20059-0001 USA
- Jules P Harrell
- Department of Psychology, Howard University, Washington DC 20059-0001 USA
- Robert E Taylor
- Alcohol Research Center, Department of Pharmacology, Howard University, Washington DC 20059-0001 USA
- George E Bonney
- National Human Genome Center, Howard University, Washington DC 20059-0001 USA
- Department of Community Health and Family Medicine, Howard University, Washington DC 20059-0001 USA
- Division of Medical Genetics, Department of Pediatrics, Howard University, Washington DC 20059-0001 USA
- Howard University Cancer Center, Howard University, Washington DC 20059-0001 USA
|
9
|
Yue Q, Apprey V, Bonney GE. Which strategy is better for linkage analysis: single-nucleotide polymorphisms or microsatellites? Evaluation by identity-by-state-identity-by-descent transformation affected sib-pair method on GAW14 data. BMC Genet 2005; 6 Suppl 1:S16. [PMID: 16451621 PMCID: PMC1866774 DOI: 10.1186/1471-2156-6-s1-s16]
Abstract
The central issue for Genetic Analysis Workshop 14 (GAW14) is the question of which is the better strategy for linkage analysis: the use of single-nucleotide polymorphisms (SNPs) or of microsatellite markers. To answer this question we analyzed the simulated data using Duffy's SIB-PAIR program, which can incorporate parental genotypes, and our identity-by-state-identity-by-descent (IBS-IBD) transformation method of affected sib-pair linkage analysis, which uses the matrix transformation between IBS and IBD. The advantages of our method are as follows: the assumption of Hardy-Weinberg equilibrium is not necessary; the parental genotype information may be entirely unknown; both IBS and its related IBD transformation can be used in the linkage analysis; and the determinant of the IBS-IBD transformation matrix provides a quantitative measure of the quality of the marker in linkage analysis. With the originally distributed simulated data, we found that 1) for microsatellite markers there are virtually no differences in type I and type II error rates whether or not parental genotypes are used; 2) on average, a microsatellite marker has more power than a SNP marker does in linkage detection; 3) if parental genotype information is used, SNP markers show lower type I error rates than microsatellite markers; and 4) if parental genotypes are not available, SNP markers show considerable variation in type I error rates for different methods.
Affiliation(s)
- Qingqi Yue
- National Human Genome Center, Howard University, Washington, DC 20059, USA
- Department of Community Health and Family Medicine, Howard University, Washington, DC 20059, USA
- Victor Apprey
- National Human Genome Center, Howard University, Washington, DC 20059, USA
- Department of Community Health and Family Medicine, Howard University, Washington, DC 20059, USA
- George E Bonney
- National Human Genome Center, Howard University, Washington, DC 20059, USA
- Department of Community Health and Family Medicine, Howard University, Washington, DC 20059, USA
- Division of Medical Genetics, Department of Pediatrics, Howard University, Washington, DC 20059, USA
|
10
|
|
11
|
Keita SOY, Kittles RA, Royal CDM, Bonney GE, Furbert-Harris P, Dunston GM, Rotimi CN. Conceptualizing human variation. Nat Genet 2004; 36:S17-20. [PMID: 15507998 DOI: 10.1038/ng1455] [Received: 09/09/2004] [Accepted: 09/23/2004]
Abstract
What is the relationship between the patterns of biological and sociocultural variation in extant humans? Is this relationship accurately described, or best explained, by the term 'race' and the schema of 'racial' classification? What is the relationship between 'race', genetics and the demographic groups of society? Can extant humans be categorized into units that can scientifically be called 'races'? These questions underlie the discussions that address the explanations for the observed differences in many domains between named demographic groups across societies. These domains include disease incidence and prevalence and other variables studied by biologists and social scientists. Here, we offer a perspective on understanding human variation by exploring the meaning and use of the term 'race' and its relationship to a range of data. The quest is for a more useful approach with which to understand human biological variation, one that may provide better research designs and inform public policy.
Affiliation(s)
- S O Y Keita
- National Human Genome Center, College of Medicine, Howard University, Washington, DC 20060, USA
|
12
|
Yuan A, Chen G, Chen Y, Rotimi C, Bonney GE. Identifying the susceptibility gene(s) in a set of trait-linked genes using genotype data. Genetics 2004; 167:1445-59. [PMID: 15280254 PMCID: PMC1470967 DOI: 10.1534/genetics.103.021600]
Abstract
There are generally three steps to isolate a disease linkage-susceptibility gene: genome-wide scan, fine mapping, and, last, positional cloning. The last step is time consuming and involves intensive laboratory work. In some cases, fine mapping cannot proceed further on a set of markers because they are tightly linked. For years, genetic statisticians have been trying different ways to narrow the fine-mapping results to provide some guidance for the next step of laboratory work. Although these methods are practical and efficient, most of them are based on IBD data, which usually can be inferred only from the genotype data with some uncertainty. The corresponding methods thus have no greater power than one using genotype data directly. Also, IBD-based methods apply only to relative pair data. Here, using genotype data, we have developed a statistical hypothesis-testing method to pinpoint a SNP, or SNPs, suspected of responsibility for a disease trait linkage among a set of SNPs tightly linked in a region. Our method uses genotype data of affected individuals or case-control studies, which are widely available in the laboratory. The testing statistic can be constructed using any genotype-based disease-marker disequilibrium measure and is asymptotically distributed as a chi-square mixture. This method can be used for singleton data, relative pair data, or general pedigree data. We have applied the method to simulated data as well as a real data set; it gives satisfactory results.
Affiliation(s)
- Ao Yuan
- Statistical Genetics and Bioinformatics Unit, National Human Genome Center, Howard University, Washington, DC 20059, USA.
|
13
|
Abstract
The assumption of Hardy-Weinberg equilibrium (HWE) among alleles is of fundamental importance in genetic studies. There are numerous methods for testing it using genotype count data. The exact test is used when the sample size is not large enough for asymptotic approximations. There are several numerical methods to carry out this test, such as complete enumeration, Monte Carlo and Markov chain Monte Carlo simulations. Complete enumeration is impractical in many applications, especially when the table counts are large. The Monte Carlo method is simple to use but still difficult when the table counts become large. The Markov chain Monte Carlo method, by sampling a sub-table each time, is suitable for this latter situation. Because they are based on switches among a few (no more than four) cells, the existing Markov chain samplers are highly dependent and inefficient for large tables. Here we consider a new Markov chain sampling scheme, in which a sub-table of user-specified size is updated at each iteration. The resulting chain is less dependent, and the sampling is flexible and efficient. The conventional test for HWE is based on a few test statistics, such as the likelihood and the chi-squared statistic. To expand the family of test statistics, we consider a class of divergence measures for the departure from HWE. Examples are given as illustrations.
Affiliation(s)
- Ao Yuan
- Statistical Genetics and Bioinformatics Unit, National Human Genome Center, Howard University, Washington, DC 20059, USA
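The plain Monte Carlo approach mentioned in the abstract above can be sketched for the simplest case, a single biallelic locus. This is a minimal illustration under stated assumptions, not the authors' sub-table Markov chain sampler: alleles are shuffled and re-paired at random (which keeps the allele counts fixed), and the chi-squared statistic is used as the test statistic. The function names, the number of draws, and the example counts are hypothetical.

```python
import random


def chi_square_hwe(n_aa, n_ab, n_bb):
    """Chi-squared distance between genotype counts and their HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def monte_carlo_hwe(n_aa, n_ab, n_bb, draws=2000, seed=0):
    """Monte Carlo p-value for HWE, conditioning on the observed allele counts."""
    rng = random.Random(seed)
    alleles = ["A"] * (2 * n_aa + n_ab) + ["B"] * (2 * n_bb + n_ab)
    observed = chi_square_hwe(n_aa, n_ab, n_bb)
    hits = 0
    for _ in range(draws):
        rng.shuffle(alleles)
        # Pair consecutive alleles to form one random genotype table.
        pairs = list(zip(alleles[0::2], alleles[1::2]))
        aa = sum(1 for x, y in pairs if x == y == "A")
        bb = sum(1 for x, y in pairs if x == y == "B")
        ab = len(pairs) - aa - bb
        if chi_square_hwe(aa, ab, bb) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (draws + 1)


if __name__ == "__main__":
    print(monte_carlo_hwe(30, 40, 30))  # hypothetical genotype counts
```

Swapping `chi_square_hwe` for another divergence measure only requires changing the statistic function, which is the kind of flexibility the abstract alludes to.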
|
14
|
Yuan A, Bonney GE. Two new recursive likelihood calculation methods for genetic analysis. Hum Hered 2003; 54:82-98. [PMID: 12566740 DOI: 10.1159/000067664] [Received: 02/08/2002] [Accepted: 09/06/2002]
Abstract
Recursive likelihood calculations for genetic analysis with ungenotyped pedigree data employ variations of the Elston-Stewart (ES) or the Lander-Green (LG) algorithms. With the ES algorithm, the number of loci may be limited but not the pedigree size. With the LG algorithm, the reverse is the case. We introduce two new algorithms for the computation of regressive likelihoods for pedigrees with multivariate traits. The first is an alternative formulation of our existing model, which leads to a simpler form in the binary trait, polygenic and mixed model cases. The second is an approximation model, which is computationally efficient. These methods apply to both continuous and binary traits, in the oligogenic and polygenic cases. Both methods coincide in the binary case. We considered these methods for cases in which all the traits are controlled by a single locus, and for cases in which each trait is controlled by one locus independent of the others. Simulation studies and an analysis of a real data set are presented for segregation analysis as illustrations. These methods can also be used in other model-based analyses. They are implemented in G.E.M.S., the genetic epidemiology models software.
Affiliation(s)
- Ao Yuan
- National Human Genome Center, Howard University, Statistical Genetics and Bioinformatics Unit, Washington DC, USA.
|
15
|
Gannoun A, Saracco J, Yuan A, Bonney GE. On Adaptive Transformation–Retransformation Estimate of Conditional Spatial Median. Commun Stat Theory Methods 2003. [DOI: 10.1081/sta-120023262]
|
16
|
Chaudru V, Laing A, Dunston GM, Adams-Campbell LL, Williams R, Lynch JJ, Leffall LD, DeWitty RL, Gause BL, Bonney GE, Demenais F. Interactions between genetic and reproductive factors in breast cancer risk in a population-based sample of African-American families. Genet Epidemiol 2002; 22:285-97. [PMID: 11984862 DOI: 10.1002/gepi.0171]
Abstract
Incidence of breast cancer (BC) varies among ethnic groups, with higher rates in white than in African-American women. Until now, most epidemiological and genetic studies have been carried out in white women. To investigate whether interactions between genetic and reproductive risk factors may explain part of the ethnic disparity in BC incidence, a genetic epidemiology study was conducted, between 1989 and 1994, at the Howard University Cancer Center (Washington, DC), which led to the recruitment of 245 African-American families. Segregation analysis of BC was performed by use of the class D regressive logistic model that allows for censored data to account for a variable age of onset of disease, as implemented in the REGRESS program. Segregation analysis of BC was consistent with a putative dominant gene effect (P < 0.000001) and residual sister-dependence (P < 0.0001). This putative gene was found to interact significantly with age at menarche (P = 0.048), and an interaction with a history of spontaneous abortions was suggested (P = 0.08). A late age at menarche increased BC risk in gene carriers but had a protective effect in non-gene carriers. A history of spontaneous abortions had a protective effect in gene carriers and increased BC risk in non-gene carriers. Our findings agree partially with a similar analysis of French families showing a significant gene x parity interaction and a suggestive gene x age at menarche interaction. Investigating gene x risk factor interactions in different populations may have important implications for further biological investigations and for BC risk assessment.
|
17
|
Abstract
Ascertainment concerns the manner by which families are selected for genetic analysis and how to correct for it in likelihood models. Because such families are often neither drawn at random nor selected according to well-defined rules, the problem of ascertainment correction in the genetic analysis of family data has proved durable. This paper undertakes a systematic study of ascertainment corrections in terms of smaller distinct units, which will usually be sibships, nuclear families, or small pedigrees. Three principal results are presented. The first is that ascertainment corrections in likelihood models for family data can be made in terms of smaller units, without breaking up the pedigree. The second is that the appropriate correction for single ascertainment in a unit is the reciprocal of the sum of the marginal probabilities of all the persons relevant to its ascertainment, as if affected. The third result is a generalization of the single ascertainment-correction formula to k-plex ascertainment, in which each unit has k or more affecteds. The correction is the reciprocal of the sum of the joint probabilities of all distinct sets of k persons in the unit, as if they were all affected. In extended families, two additional ascertainment schemes will be considered and explicit formulas will be presented. One of these schemes is "uniform-proband-status ascertainment," in which nonmembers of a given unit have the same chance as members to become probands if they are affected; the other scheme is the "inverse law of ascertainment," in which the chance that nonmembers of a unit will become probands for that unit decreases with degree of relationship. Several specific recommendations are made for further study.
Affiliation(s)
- G E Bonney
- Department of Biostatistics, Fox Chase Cancer Center, Philadelphia, USA.
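The second and third results stated in the abstract above can be written compactly. The notation below is assumed for illustration and is not taken from the paper: R denotes the set of persons in a unit who are relevant to its ascertainment, and Y_i = 1 means that person i is treated as affected. The display uses `\substack` from the amsmath package.

```latex
% Assumed notation: R = persons in the unit relevant to its ascertainment,
% Y_i = 1 if person i is treated as affected.
\[
C_{1} = \Bigl[\, \sum_{i \in R} P(Y_i = 1) \Bigr]^{-1},
\qquad
C_{k} = \Bigl[\, \sum_{\substack{S \subseteq R \\ |S| = k}}
        P\bigl(Y_j = 1 \ \text{for all } j \in S\bigr) \Bigr]^{-1}.
\]
```

Here C_1 is the single-ascertainment correction (reciprocal of the sum of marginal probabilities) and C_k is the k-plex correction (reciprocal of the sum of joint probabilities over all distinct sets of k persons); setting k = 1 in the second expression recovers the first.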
|
18
|
Amfoh KK, Shaw RF, Bonney GE. The Use of Logistic Models for the Analysis of Codon Frequencies of DNA Sequences in Terms of Explanatory Variables. Biometrics 1994. [DOI: 10.2307/2533443]
|
19
|
Amfoh KK, Shaw RF, Bonney GE. The use of logistic models for the analysis of codon frequencies of DNA sequences in terms of explanatory variables. Biometrics 1994; 50:1054-63. [PMID: 7786987]
Abstract
The development of the regressive logistic model applicable to the analysis of codon frequencies of DNA sequences in terms of explanatory variables is presented. A codon is a triplet of nucleotides that codes for an amino acid, and may be considered as a trivariate response (B1, B2, B3), where Bi (i = 1, 2, 3) is a categorical random variable with values A, C, G, T. The linear order of bases in the DNA and the possible statistical dependence of the bases in a given codon make the regressive logistic model a suitable tool for the analysis of codon frequencies. A problem of structural zeros arises from the fact that the stop codons (terminators) do not code for amino acids; this is solved by normalizing the likelihood function. Codon frequencies may also depend on the function of the gene, and they are known to differ between genes of the same genome. Differences also occur between synonymous codons for the same amino acid. Thus, the use of covariates that differ between synonymous codons as well as covariates that are constant within codons of the same amino acid may be useful in explaining the frequencies. As an illustration, the method is applied to the human mitochondrial genome using the following as explanatory variables: (1) TSCORE, a measure of the number of single base mutations required for a given codon to become a terminator; (2) AARISK, an indicator of a codon's ability to change by a single base substitution into triplets coding for amino acids with very different characteristics; (3) AVDIST, a measure of the typicality of the amino acid coded for by the triplets. The results indicate that models that incorporate dependency structure and covariates are to be preferred to either the models comprising covariates alone or dependency structure alone.
Affiliation(s)
- K K Amfoh
- Division of Biostatistics, Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111
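To make the codon setup in the abstract above concrete, the sketch below enumerates the 64 triplets over A, C, G, T, drops the terminators (the structural zeros), and computes for each codon the minimum number of single-base changes needed to reach a terminator. That distance is only an assumed stand-in in the spirit of TSCORE; the abstract does not give the exact definition the authors used. The stop codon set is the one listed in the standard vertebrate mitochondrial translation table, which is also an assumption about what the paper used, and all names are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# Assumption: terminators of the vertebrate mitochondrial genetic code.
STOP_CODONS = ("TAA", "TAG", "AGA", "AGG")


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_mutations_to_stop(codon):
    """Fewest single-base substitutions that turn `codon` into a terminator."""
    return min(hamming(codon, stop) for stop in STOP_CODONS)


# All 64 triplets (B1, B2, B3); removing the terminators leaves the cells
# over which a normalized likelihood would be defined.
ALL_CODONS = ["".join(t) for t in product(BASES, repeat=3)]
SENSE_CODONS = [c for c in ALL_CODONS if c not in STOP_CODONS]

if __name__ == "__main__":
    for codon in ("TGA", "CCC", "AGC"):
        print(codon, min_mutations_to_stop(codon))
```

A covariate like this differs between synonymous codons, which is the property the abstract highlights for explanatory variables of this kind.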
|
20
|
Li Z, Bonney GE, Lathrop GM, Rao DC. Genetic analysis combining path analysis with regressive models: the TAU model of multifactorial transmission. Hum Hered 1994; 44:305-11. [PMID: 7860082 DOI: 10.1159/000154236]
Abstract
We have extended regressive models by incorporating a simple path model (the TAU model). This was achieved for both class A and class D regressive models by expressing the residual correlations in the regressive models in terms of parameters of the path model. We have presented explicit solutions for path coefficients in terms of the residual correlations. These methods were applied to a French-Canadian family study on body mass index. It was found that the estimate of pseudopolygenic heritability was robust under class A (t² = 0.28) and class D (t² = 0.26) models.
Affiliation(s)
- Z Li
- Division of Biostatistics, Washington University School of Medicine, St. Louis, Mo. 63110
|
21
|
Rubin LA, Amos CI, Wade JA, Martin JR, Bale SJ, Little AH, Gladman DD, Bonney GE, Rubenstein JD, Siminovitch KA. Investigating the genetic basis for ankylosing spondylitis. Linkage studies with the major histocompatibility complex region. Arthritis Rheum 1994; 37:1212-20. [PMID: 8053961 DOI: 10.1002/art.1780370816]
Abstract
OBJECTIVE: To assess the hypothesis that B27 or a gene(s) in close proximity (e.g., within or near the major histocompatibility complex [MHC]) represents a disease-causing ankylosing spondylitis (AS) gene, and therefore contributes directly to the pathogenesis of this disorder. METHODS: MHC haplotypes were determined by both serologic and molecular analyses in 15 multiple-case AS families from Toronto and Newfoundland. Segregation of MHC haplotypes with AS within these families was examined by linkage and identity-by-descent analyses. Attributable risk estimates for various genetic markers and for sex were calculated. RESULTS: Linkage analyses established significant linkage between AS and the MHC, the maximal logarithm of odds (LOD) score being 3.48 at a recombination frequency (θ) of 0.05. In a second analysis in which the population association of the MHC gene HLA-B27 with AS was taken into account, the maximal LOD score was 7.5 at θ = 0.05. Identity-by-descent analyses showed a significant departure from random segregation among affected avuncular (P < 0.05) and cousin (P < 0.01) pairs. The presence of HLA-B40 in HLA-B27 positive individuals increased the risk for disease more than 3-fold, confirming previous reports. Disease susceptibility modeling suggested an autosomal dominant pattern of inheritance, with penetrance of approximately 20%. CONCLUSION: These data provide the first conclusive demonstration of linkage between the MHC region and AS, and confirm that genes within this region contribute directly to the genetic susceptibility for AS.
Affiliation(s)
- L A Rubin
- University of Toronto, Ontario, Canada
|
22
|
Li Z, Bonney GE, Rao DC. Genetic analysis combining path analysis with regressive models: the BETA path model of polygenic and familial environmental transmission. Genet Epidemiol 1994; 11:431-42. [PMID: 7835689 DOI: 10.1002/gepi.1370110505]
Abstract
We have extended the class D regressive model for the purpose of combined path and segregation analyses by incorporating the BETA path model. We have done this by expressing correlations among residuals from major genotype (RMGs) of family members under the class D regressive model as functions of path coefficients under the BETA path model. The likelihood function under the combined model was factorized into a product of conditional densities, which is dominated by bivariate normal densities. Statistical inferences under the combined model are analogous to those under the class D regressive model.
Affiliation(s)
- Z Li
- Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri
|
23
|
Laing AE, Demenais FM, Williams R, Kissling G, Chen VW, Bonney GE. Breast cancer risk factors in African-American women: the Howard University Tumor Registry experience. J Natl Med Assoc 1993; 85:931-9. [PMID: 8126744 PMCID: PMC2568204]
Abstract
This retrospective case-control study examines risk factors for breast cancer in African-American women, who recently have shown an increase in the incidence of this malignancy, especially in younger women. Our study involves 503 cases from the Howard University Hospital and 539 controls from the same hospital, seen from 1978 to 1987. Using information culled from medical charts, an analysis of various factors for their effect on breast cancer risk was made. The source of data necessarily meant that some known risk factors were missing. Increases in risk were found for known risk factors such as decreased age at menarche and a family history of breast cancer. No change in risk was observed with single marital status, nulliparity, premenopausal status, or lactation. An increased odds ratio was found for induced abortions, which was significant in women diagnosed after 50 years of age. Spontaneous abortions had a small but significant protective effect in the same subgroup of women. Birth control pill usage conferred a significantly increased risk. It is of note that abortions and oral contraceptive usage, not yet studied in African Americans, have been suggested as possibly contributing to the recent increase in breast cancer in young African-American women.
Affiliation(s)
- A E Laing
- Division of Biostatistics, Howard University Cancer Ctr, Washington, DC 20060
|
24
|
Borecki IB, Bonney GE, Rice T, Bouchard C, Rao DC. Influence of genotype-dependent effects of covariates on the outcome of segregation analysis of the body mass index. Am J Hum Genet 1993; 53:676-87. [PMID: 8352276 PMCID: PMC1682429]
Abstract
Several recent studies of the body mass index (BMI) have provided support for a recessive major gene influencing heaviness in humans. Segregation analysis of the BMI was carried out recently in a series of randomly sampled French-Canadian families to determine whether we could replicate the major gene finding by using a residual phenotype adjusted for the effects of age and sex. The best model included a recessive major effect for high BMI values with residual familial resemblance; however, Mendelian transmission could not be confirmed, and the no-transmission hypothesis (where all the tau's are constrained to be equal) was not rejected. Considering that the BMI is a complex phenotype affected by many factors and that there are known variations in body composition during growth and aging, we undertook a reanalysis of the data, using a model that allowed the estimation of genotype-specific age and gender effects. New tests on the transmission parameters satisfy the criteria for inferring Mendelian segregation. The results suggest that individuals with the "high" recessive genotype show the greatest degree of heaviness at birth, with a subsequent trend toward lower values throughout life, while individuals with the dominant "normal" genotypes show no appreciable trends with age. In addition, the "high" genotype appears to confer a greater degree of heaviness in females as compared with males. These results, along with other observations from the data, suggest that, while a recessive single gene influence may be discernible, the phenotypic expression of the BMI is likely to be complicated by genotype x environment interactions and, possibly, by the action of other loci. Further, the data also are consistent with the hypothesis that modifying factors may include the adoption of a more prudent life-style by individuals genetically predisposed to heaviness and a secular increase in the incidence, prevalence, and potency of environmentally based triggers leading to a higher penetrance of the "heavy" genotype in the young.
Affiliation(s)
- I B Borecki
- Division of Biostatistics, Washington University School of Medicine, St. Louis, MO 63110
|
25
|
Abstract
In this paper we consider the compound version of the class D regressive models for a p-variate phenotypic outcome. The likelihood function is given, and the results are illustrated with the Donner Laboratory data; for brevity of presentation, no major gene is assumed.
Affiliation(s)
- P Bagchi
- Fox Chase Cancer Center, Philadelphia, Pennsylvania
|
26
|
Hu N, Dawsey SM, Wu M, Bonney GE, He LJ, Han XY, Fu M, Taylor PR. Familial aggregation of oesophageal cancer in Yangcheng County, Shanxi Province, China. Int J Epidemiol 1992; 21:877-82. [PMID: 1468848 DOI: 10.1093/ije/21.5.877]
Abstract
Oesophageal cancer is the second most common cause of cancer death in China and is particularly prevalent in northern China. Genetic factors have been studied less than environmental factors in the aetiology of this disease. This study was conducted to evaluate familial aggregation of oesophageal cancer. All households in Yangcheng County were interviewed in 1979 to determine family history of oesophageal cancer. In 1989, vital status for all family members from three Yangcheng villages was determined and re-interviews were conducted among families who reported a positive family history of oesophageal cancer in 1979. Risk of oesophageal cancer was evaluated by comparing family and individual rates of oesophageal cancer during the 1979-1989 interval stratified by the number of family members with oesophageal cancer prior to 1979. More families with prior oesophageal cancer history reported new oesophageal cancer deaths during the follow-up period than families without prior history (19% versus 5%). Oesophageal cancer rates increased with increasing positivity of family history, and adjustment for other risk factors did not substantially alter this result. We conclude that these data provide evidence for familial aggregation of oesophageal cancer.
Affiliation(s)
- N Hu
- Department of Cell Biology, Chinese Academy of Medical Sciences, Beijing
|
27
|
Abstract
BACKGROUND Until recently, environmental factors were considered of greatest importance in the etiology of esophageal cancer. Recent studies, however, have suggested that genetic factors also have a role. PURPOSE Since no formal genetic study of this cancer has been previously reported, we carried out a statistical analysis to determine how important genetic factors are in the etiology of esophageal cancer in high-incidence areas of North China. METHODS Using a logistic regressive model, we performed a segregation analysis on 221 high-risk nuclear families from the Yaocun Commune, Linxian, Henan Province of China, with at least one affected family member and with all offspring aged 40 years or older. Three models, the mendelian, the environmental, and the no-transmission models, were each compared with the general-transmission model that incorporated both genetic and environmental factors. RESULTS According to Akaike's Information Criterion, the mendelian model provided the best fit for the data. By the chi-square test, the mendelian inheritance model was not rejected, but the environmental and the no-transmission models were both rejected. CONCLUSION The segregation analysis indicated an autosomal recessive mendelian inheritance, with the alleged mendelian gene present at a frequency of 19%, causing 4% of this population to be predisposed to develop esophageal cancer. Large, unmeasured, residual familial factors, however, were also significant. IMPLICATIONS Both an autosomal recessive gene and unexplained environmental factors appear to be important in the etiology of esophageal cancer in the subpopulation studied.
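The step from a 19% allele frequency to roughly 4% of the population being predisposed follows from Hardy-Weinberg proportions for a recessive gene. The short Python sketch below reproduces that arithmetic; it assumes random mating (Hardy-Weinberg equilibrium), and the function name is hypothetical.

```python
def recessive_homozygote_frequency(allele_freq: float) -> float:
    """Expected frequency of recessive homozygotes, q**2, under Hardy-Weinberg equilibrium."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return allele_freq ** 2


q = 0.19  # allele frequency reported in the abstract
predisposed = recessive_homozygote_frequency(q)
print(f"q = {q:.2f}  ->  q^2 = {predisposed:.4f}  (about {predisposed:.0%} of the population)")
```

For q = 0.19 this gives q² ≈ 0.036, which rounds to the 4% quoted above.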
Affiliation(s)
- C L Carter
- Division of Cancer Prevention and Control, National Cancer Institute (NCI), Rockville, Md
|
28
|
Demenais FM, Laing AE, Bonney GE. Numerical comparisons of two formulations of the logistic regressive models with the mixed model in segregation analysis of discrete traits. Genet Epidemiol 1992; 9:419-35. [PMID: 1487139 DOI: 10.1002/gepi.1370090605]
Abstract
Segregation analysis of discrete traits can be conducted by the classical mixed model and the recently introduced regressive models. The mixed model assumes an underlying liability to the disease, to which a major gene, a multifactorial component, and random environment contribute independently. Affected persons have a liability exceeding a threshold. The regressive logistic models assume that the logarithm of the odds of being affected is a linear function of major genotype effects, the phenotypes of older relatives, and other covariates. A formulation of the regressive models, based on an underlying liability model, has been recently proposed. The regression coefficients on antecedents are expressed in terms of the relevant familial correlations and a one-to-one correspondence with the parameters of the mixed model can thus be established. Computer simulations are conducted to evaluate the fit of the two formulations of the regressive models to the mixed model on nuclear families. The two forms of the class D regressive model provide a good fit to a generated mixed model, in terms of both hypothesis testing and parameter estimation. The simpler class A regressive model, which assumes that the outcomes of children depend solely on the outcomes of parents, is not robust against a sib-sib correlation exceeding that specified by the model, emphasizing testing class A against class D. The studies reported here show that if the true state of nature is that described by the mixed model, then a regressive model will do just as well. Moreover, the regressive models, allowing for more patterns of family dependence, provide a flexible framework to understand gene-environment interactions in complex diseases.
Affiliation(s)
- F M Demenais
- Division of Biostatistics, Howard University Cancer Center, Washington, D.C
|
29
|
Abstract
The regressive models for the analysis of family data are extended to include cases in which the within-sibship covariation may exceed that implied by the class A regressive model, but for which birth order is not required. In addition to specified major genes, if any, and common parental phenotypes, the excess within-sibship covariation may come from a common cumulative risk from unspecified factors such as a shared environment, and other genes. The within-sibship cumulative risk has a probability distribution in the population. The sib-sib correlation (more generally within-sibship statistical dependence) is equal for all pairs within a given sibship. The compound regressive model is thus a version of the class D regressive model with the property of within-sibship interchangeability. The work is motivated here by comparing and contrasting the Elston-Stewart algorithm and the Morton-MacLean algorithm for the mixed model of inheritance. This points the way to derive practical algorithms for the compound regressive models proposed, with easy extensions to pedigrees of arbitrary structure, and to multilocus problems.
Affiliation(s)
- G E Bonney
- Department of Biostatistics, Fox Chase Cancer Center, Philadelphia, Pa. 19111
|
30
|
Bonney GE, Amfoh KK, Sherman SL, Keats BJ. An application of empirical Bayes methods to updating linkage information on chromosome 21. Cytogenet Cell Genet 1992; 59:112-3. [PMID: 1737472 DOI: 10.1159/000133217]
Affiliation(s)
- G E Bonney
- Division of Biostatistics and Epidemiology, Howard University Cancer Center, Washington, D.C
|
31
|
Borecki IB, Lathrop GM, Bonney GE, Yaouanq J, Rao DC. Combined segregation and linkage analysis of genetic hemochromatosis using affection status, serum iron, and HLA. Am J Hum Genet 1990; 47:542-50. [PMID: 2393027 PMCID: PMC1683860]
Abstract
Characterizing the distribution of parameters of iron metabolism by hemochromatosis genotype remains an important goal vis-à-vis potential screening strategies to identify individuals at genetic risk, since a specific marker to detect the abnormal gene has not been identified as yet. In the present investigation, we analyze serum iron values in ascertained families using a method which incorporates both segregation of the clinical affection status and the HLA linkage information to identify the underlying genotypes. The analysis is performed using an extension of the model presented by Bonney et al., comprising regressive models for segregation analysis and the multipoint linkage strategy implemented in LINKAGE. The gene was found to be completely recessive with respect to both clinical manifestations and serum iron abnormalities, with significant differences in expression by sex. Clinical manifestations were present for all male homozygotes in this data set, suggesting that the recessive hemochromatosis genotype is fully penetrant at all ages in males. This was not the case for younger females. Significant genotype-specific age and sex effects were found for serum iron values. It is interesting that deletion of the HLA marker information did not affect our ability to resolve the genetic model when we analyzed a bivariate phenotype. This serves as a reminder that a search for relevant biological markers can be equally important in discerning the genetic etiology of a disease trait, as a search for linked genetic markers.
Affiliation(s)
- I B Borecki
- Division of Biostatistics, Washington University School of Medicine, St. Louis, MO 63110
|
32
|
Amos CI, Elston RC, Bonney GE, Keats BJ, Berenson GS. A multivariate method for detecting genetic linkage, with application to a pedigree with an adverse lipoprotein phenotype. Am J Hum Genet 1990; 47:247-54. [PMID: 2378349 PMCID: PMC1683708]
Abstract
The robust or model-free method for detecting linkage developed by Haseman and Elston for data from sib pairs is extended to incorporate observations of multiple traits on each individual. A method is proposed that estimates the linear function that results in the strongest correlation between the squared pair differences in the trait measurements and identity by descent at a marker locus. The method is illustrated by the study of apolipoprotein and cholesterol levels in individuals from a large family that had many members diagnosed with coronary heart disease.
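As background for the extension described in this abstract, the sketch below illustrates the original single-trait Haseman-Elston idea in Python: regress the squared sib-pair trait difference on the proportion of alleles shared identical by descent (IBD) at the marker, and read a negative slope as evidence for linkage. It is not the multivariate method of the paper; the function name and the data are hypothetical.

```python
def haseman_elston_slope(ibd_proportions, squared_differences):
    """Least-squares slope of squared sib-pair differences on IBD sharing.

    A clearly negative slope (pairs sharing more alleles IBD are more alike)
    is the signal of linkage in the single-trait Haseman-Elston regression.
    """
    n = len(ibd_proportions)
    if n < 2 or n != len(squared_differences):
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mean_x = sum(ibd_proportions) / n
    mean_y = sum(squared_differences) / n
    sxx = sum((x - mean_x) ** 2 for x in ibd_proportions)
    if sxx == 0:
        raise ValueError("IBD proportions must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ibd_proportions, squared_differences))
    return sxy / sxx


# Hypothetical sib pairs: IBD proportion at the marker and squared trait difference.
ibd = [0.0, 0.0, 0.5, 0.5, 0.5, 1.0, 1.0]
sq_diff = [4.1, 3.8, 3.1, 2.9, 3.0, 2.2, 1.9]
print(f"slope = {haseman_elston_slope(ibd, sq_diff):.2f}")
```

In the multivariate setting described above, the single trait would be replaced by a linear combination of several traits, with the weights chosen to make this association as strong as possible.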
Affiliation(s)
- C I Amos
- Family Studies Section, National Cancer Institute, Bethesda, MD 20892
|
33
|
|
34
|
Abstract
The regressive models describe familial patterns of dependence of quantitative measures by specifying regression relationships among a person's phenotype and genotype and the phenotypes and genotypes of antecedents. When the number of sibs in the pattern of dependence increases, as in the class D regressive model, computation of the likelihood becomes time consuming, since the Elston-Stewart algorithm cannot be used generally. On the other hand, the simpler class A regressive model, which imposes a restriction on the sib-sib correlation, may lead to inference of a spurious major gene, as already observed in some instances. A simulation study is performed to explore the robustness of the class A model with respect to false inference of a major gene and to search for faster methods of computing the likelihood under the class D model. The class A model is not robust against the presence of a sib-sib correlation exceeding that specified by the model, unless tests on transmission probabilities are performed carefully: false detection of a major gene falls from 26-30 to 0-4 data sets out of 30 replicates after testing both the Mendelian transmission and the absence of transmission of a major effect against the general transmission model. Among various approximations of the likelihood formulation of the class D model, approximations 6 and 8 are found to work appropriately in terms of both the estimation of all parameters and hypothesis testing, for each generating model. These approximations lessen the computer time by allowing use of the Elston-Stewart algorithm.
Affiliation(s)
- F M Demenais
- Division of Biostatistics and Epidemiology, Howard University Cancer Center, Washington, D.C. 20060
|
35
|
Abstract
The paper presents an extension of the regressive logistic models proposed by Bonney [Biometrics 42:611-625, 1986], to address the problems of variable age-of-onset and time-dependent covariates in analysis of familial diseases. This goal is achieved by using failure time data analysis methods, and partitioning the follow-up time into K mutually exclusive intervals. The conditional probability of being affected within the kth interval (k = 1, ..., K), given that the person was not affected before, represents the hazard function in this discrete formulation. A logistic model is used to specify a regression relationship between this hazard function and a set of explanatory variables including genotype, phenotypes of ancestors, and other covariates which can be time dependent. The probabilities that a given person either becomes affected within the kth interval (i.e., interval k includes age of onset of the person) or remains unaffected by the end of the kth interval (i.e., interval k includes age at examination of the person) are derived from the general results of failure time data analysis and used for the likelihood formulation. This proposed approach can be used in any genetic segregation and linkage analysis in which a penetrance function needs to be defined. Application of the method to familial leprosy data leads to results consistent with our previous analysis performed using the unified mixed model [Abel and Demenais, Am J Hum Genet 42:256-266, 1988], i.e., the presence of a recessive major gene controlling susceptibility to leprosy. Furthermore, a simulation study shows the capability of the new model to detect major gene effects and to provide accurate parameter estimates in a situation of complete ascertainment.
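The discrete-time formulation in this abstract can be sketched in a few lines of Python. The example below assumes a logistic hazard with one intercept per interval and a single additive covariate effect; the parameter values and function names are hypothetical and are not taken from the paper.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def interval_hazards(intercepts, covariate_effect):
    """Hazard in each interval: P(affected in interval k | unaffected before k)."""
    return [logistic(a + covariate_effect) for a in intercepts]


def prob_onset_in_interval(hazards, k: int) -> float:
    """P(onset falls in interval k), k = 1..K: survive intervals 1..k-1, then fail in k."""
    survival = 1.0
    for h in hazards[:k - 1]:
        survival *= 1.0 - h
    return survival * hazards[k - 1]


def prob_unaffected_through(hazards, k: int) -> float:
    """P(still unaffected at the end of interval k)."""
    survival = 1.0
    for h in hazards[:k]:
        survival *= 1.0 - h
    return survival


# Hypothetical baseline log-odds for K = 3 age intervals and a covariate effect of 0.5.
hazards = interval_hazards([-3.0, -2.0, -1.5], covariate_effect=0.5)
for k in range(1, len(hazards) + 1):
    print(f"interval {k}: P(onset) = {prob_onset_in_interval(hazards, k):.3f}")
print(f"P(unaffected through interval 3) = {prob_unaffected_through(hazards, 3):.3f}")
```

An affected person contributes `prob_onset_in_interval` for the interval containing the age of onset, and an unaffected person contributes `prob_unaffected_through` for the interval containing the age at examination. These are the two penetrance terms the abstract describes.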
Affiliation(s)
- L Abel
- Division of Biostatistics, Howard University Cancer Center, Washington, D.C
|
36
|
Abstract
The mixed model of segregation analysis specifies major gene effects and partitions the residual variance into polygenic and environmental components. The model explains familial correlations essentially in terms of genetic causation. The regressive model, on the other hand, is constructed by successively conditioning on ancestral phenotypes and major genes. Familial patterns of dependence are described in terms of correlations without necessarily introducing a particular scheme of causal relationship. These two approaches are compared both theoretically and numerically through computer simulations for the case of continuous traits on nuclear families. The class D regressive model, which is characterized by equal sib-sib correlations, is mathematically and numerically equivalent to the mixed model. The simpler class A regressive model, which is also characterized by equal sib-sib correlations determined in this case by the common parentage, provides good estimates of the mixed model parameters: major gene parameters and residual polygenic heritability, derived from the parent-offspring correlation. However, in the absence of a major gene, the restriction imposed by the class A model on the sibling correlation can affect the conclusions of segregation analysis: False inference of a major gene was observed in two out of ten replicates. Our simulations also indicate that the mixed model allowing for different heritabilities in adults and children leads to correct estimates of the major gene parameters and residual familial correlations (parent-offspring and sib-sib) as specified by the class A model. For all the models studied, major gene effects, when present, are correctly detected and estimated.
Affiliation(s)
- F M Demenais
- Division of Biostatistics and Epidemiology, Howard University Cancer Center, Washington, D.C. 20060
|
37
|
Abstract
Several methods have been proposed to take into account the variable age of onset of a disease in genetic analysis. A different approach is presented from an etiological point of view. To illustrate the method, we used leprosy, an infectious disease with a variable age of onset depending on both the time of contamination with the bacillus and the latency of the disease; the role of a major gene in the susceptibility to this disease has been recently detected. The age-of-onset function was modeled to account for the two temporal processes: contamination event and incubation period. For genetic analysis, this function was combined with the probability of being susceptible to the disease, which was expressed by the use of regressive models. To test this new approach, ten sets of 500 nuclear families were simulated considering different hypotheses of contamination risks, which were either constant or dependent on contacts with contagious leprosy patients, and varying the extent to which the disease is heritable. Analyses of these data using two versions of the model indicate that the model can detect familial correlations in variable age of onset and discriminate between the different simulated effects.
Affiliation(s)
- L Abel
- Unité de Recherche de Génétique Epidémiologique, I.N.S.E.R.M. U.155, Paris, France
|
38
|
Abstract
Regressive models are extended to disease phenotypes with two or more affection classes through the use of polychotomous logistic regression. The classes of affection may be ordered (ranked as on a liability continuum), or unordered. Data on affective disorders are used for illustration.
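For the unordered case, polychotomous logistic regression assigns each affection class a probability through a multinomial logit. The Python sketch below shows only that building block, assuming one linear predictor per non-reference class (the reference class has predictor 0); the predictor values and the function name are hypothetical, and the ordered (liability-ranked) case would use a different link.

```python
import math


def class_probabilities(linear_predictors):
    """Multinomial-logit probabilities; index 0 is the reference class.

    linear_predictors holds one value per non-reference class, for example
    intercept + genotype effect + effects of relatives' phenotypes.
    """
    scores = [0.0] + list(linear_predictors)
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical predictors for two affection classes relative to "unaffected".
probs = class_probabilities([-1.2, -2.5])
for label, p in zip(("unaffected", "class 1", "class 2"), probs):
    print(f"{label}: {p:.3f}")
```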
Affiliation(s)
- G E Bonney
- Howard University Cancer Center, Washington, D.C. 20060
|
39
|
Bonney GE, Lathrop GM, Lalouel JM. Combined linkage and segregation analysis using regressive models. Am J Hum Genet 1988; 43:29-37. [PMID: 3163888 PMCID: PMC1715288]
Abstract
Regressive models for segregation analysis have been extended to include multivariate data and linked marker loci. The new models have been applied to data from two pedigrees segregating a gene for cardiovascular disease.
Affiliation(s)
- G E Bonney
- Division of Biostatistics, Howard University Cancer Center, Washington, DC
|
40
|
Bonney GE. Logistic regression for dependent binary observations. Biometrics 1987; 43:951-73. [PMID: 3427178]
Abstract
The likelihood of a set of binary dependent outcomes, with or without explanatory variables, is expressed as a product of conditional probabilities each of which is assumed to be logistic. The models are called regressive logistic models. They provide a simple but relatively unknown parametrization of the multivariate distribution. They have the theoretical and practical advantage that they can be analyzed and fitted as in logistic regression for independent outcomes, and with the same computer programs. The paper is largely expository and is intended to motivate the development and usage of the regressive logistic models. The discussion includes serially dependent outcomes, equally predictive outcomes, more specialized patterns of dependence, multidimensional tables, and three examples.
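The factorization described in this abstract can be illustrated for the simplest case of serially dependent outcomes. The Python sketch below assumes that each outcome depends only on its immediate predecessor, coded +1 (affected), -1 (unaffected), or 0 (no predecessor); that coding, the parameter names `alpha` and `gamma`, and the example values are assumptions made for illustration.

```python
import math
from itertools import product


def regressive_log_likelihood(outcomes, alpha: float, gamma: float) -> float:
    """Log-likelihood of a binary sequence as a product of conditional logistic terms.

    logit P(y_i = 1 | y_{i-1}) = alpha + gamma * z_{i-1}, where z = 2*y - 1 for an
    observed predecessor and z = 0 for the first outcome (no predecessor).
    """
    log_lik = 0.0
    prev_code = 0
    for y in outcomes:
        p_affected = 1.0 / (1.0 + math.exp(-(alpha + gamma * prev_code)))
        log_lik += math.log(p_affected if y == 1 else 1.0 - p_affected)
        prev_code = 2 * y - 1
    return log_lik


# Hypothetical parameters: baseline log-odds -1.0, positive serial dependence 0.8.
print(f"{regressive_log_likelihood([1, 1, 0], alpha=-1.0, gamma=0.8):.4f}")

# The conditional factorization defines a proper joint distribution:
# the probabilities of all 2**3 possible sequences sum to 1.
total = sum(math.exp(regressive_log_likelihood(seq, -1.0, 0.8))
            for seq in product((0, 1), repeat=3))
print(f"{total:.6f}")
```

Because every factor is an ordinary logistic probability, such a model can be fitted with standard logistic regression software once the predecessor codes are included as covariates, which is the practical advantage the abstract emphasizes.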
Affiliation(s)
- G E Bonney
- Division of Biostatistics, Howard University Cancer Center, Washington, D.C. 20060
|
41
|
Bonney GE, Elston RC, Correa P, Tannenbaum SR, Haenszel W, Zavala DE, Fontham E, Zarama G, Gordillo G, Cuello C. Genetic etiology of gastric carcinoma: II. Segregation analysis of gastric pH, nitrate, and nitrite. Genet Epidemiol 1987; 4:103-14. [PMID: 3582957 DOI: 10.1002/gepi.1370040205]
Abstract
A study of gastric pH, nitrate, and nitrite in 110 families collected as part of a cohort from the Narino region of Colombia is presented. All three traits are familial and have a significant linearly increasing age trend. Gastric pH has a clear bimodal distribution but does not show Mendelian segregation. The nitrate distribution is slightly skewed, but generational heterogeneity explains the data best. Gastric nitrite is also bimodal with a clear break at concentration 1.08 micrograms/ml, and 74% of the observations at zero concentration; it shows a recessive Mendelian segregation with significant residual spouse correlation. This model also fits the data best when nitrite is dichotomized into detected (measurable) and undetected values. The estimated frequency of the recessive allele is .57, so that an estimated 32% of the population sampled are recessives. Recessives whose spouses have measurable nitrite have an estimated penetrance of 99.3% at age 30 years, whereas those whose spouses have zero or undetected nitrite have a penetrance of only 8.8% at age 30 years. It appears that gastric nitrite, and, from our previous study of these families, chronic atrophic gastritis are important biologic markers for the early identification of persons predisposed to gastric cancer.
|
42
|
Bonney GE. Regressive logistic models for familial disease and other binary traits. Biometrics 1986; 42:611-25. [PMID: 3567294]
Abstract
The simple Markovian structures of dependence, defined previously for continuous traits, are extended here to familial disease and other binary traits through the use of the logistic function. The regressive models so formulated can incorporate explanatory variables and major gene effects for segregation and linkage analyses. Thus, the goals of epidemiology and genetics in the analysis of familial disease can be accomplished in the same computational scheme.
|
43
|
|
44
|
Bonney GE, Elston RC, Correa P, Haenszel W, Zavala DE, Zarama G, Collazos T, Cuello C. Genetic etiology of gastric carcinoma: I. Chronic atrophic gastritis. Genet Epidemiol 1986; 3:213-24. [PMID: 3744019 DOI: 10.1002/gepi.1370030402]
Abstract
Scientific evidence has accumulated to show that chronic atrophic gastritis (CAG) is a precursor of gastric carcinoma, especially its intestinal histologic type; thus the etiology of CAG is of interest. Data on 110 families (557 individuals) collected as part of a large cohort from the Narino region of Colombia, South America, are analyzed to determine the familiality of CAG as a risk factor, and the possible involvement of a major gene in its etiology. We found that age and having an affected mother are important risk factors. In the sample, 45% are affected; 56% of individuals above 30 are affected, whereas only 28% of those 30 and under are affected; 48% of those with affected mothers are affected, but only 7% of those with unaffected mothers are affected. A positive spouse association was confounded with age. Sex and an affected father are not significant risk factors. The genetic (segregation) analysis showed Mendelian transmission of a recessive autosomal gene with penetrance dependent on age and mother's CAG status. Homozygous recessives account for an estimated 61% of the sampled population and have penetrance reaching 72% at age 30 if the mother is affected, and 41% if the mother is not affected. Carriers and non-carriers, who make up an estimated 39% of the sampled population, have an appreciable estimated risk after age 50. The environment, particularly diet, as the sole determinant of CAG needs reevaluation; some combined action of genes and environment seems more plausible.
|
45
|
|
46
|
Abstract
We consider the analysis of a continuous trait measured on human families and pedigrees to elucidate the mechanism of underlying major genes. Thinking in terms of the naturally Markovian structure of the dependencies in pedigrees rather than variance components, we describe natural classes of regressive models that are computationally feasible. These models can accommodate wide generalizations of the residual variation, and so should provide a stronger basis than other models proposed to date for inferring the segregation and linkage relationships of major genes.
|
47
|
|