1
Genomic characterisation of hormone receptor-positive breast cancer arising in very young women. Ann Oncol 2023; 34:397-409. [PMID: 36709040 PMCID: PMC10619213 DOI: 10.1016/j.annonc.2023.01.009] [Received: 08/09/2022] [Revised: 12/14/2022] [Accepted: 01/15/2023]
Abstract
BACKGROUND Very young premenopausal women diagnosed with hormone receptor-positive, human epidermal growth factor receptor 2-negative (HR+HER2-) early breast cancer (EBC) have higher rates of recurrence and death for reasons that remain largely unexplained. PATIENTS AND METHODS Genomic sequencing was applied to HR+HER2- tumours from patients enrolled in the Suppression of Ovarian Function Trial (SOFT) to determine genomic drivers that are enriched in young premenopausal women. Genomic alterations were characterised using next-generation sequencing from a subset of 1276 patients (deep targeted sequencing, n = 1258; whole-exome sequencing in a young-age, case-control subsample, n = 82). We defined copy number (CN) subgroups and assessed for features suggestive of homologous recombination deficiency (HRD). Genomic alteration frequencies were compared between young premenopausal women (<40 years) and older premenopausal women (≥40 years), and assessed for associations with distant recurrence-free interval (DRFI) and overall survival (OS). RESULTS Younger women (<40 years, n = 359) compared with older women (≥40 years, n = 917) had significantly higher frequencies of mutations in GATA3 (19% versus 16%) and CN amplifications (CNAs) (47% versus 26%), but significantly lower frequencies of mutations in PIK3CA (32% versus 47%), CDH1 (3% versus 9%), and MAP3K1 (7% versus 12%). Additionally, they had significantly higher frequencies of features suggestive of HRD (27% versus 21%) and a higher proportion of PIK3CA mutations with concurrent CNAs (23% versus 11%). Genomic features suggestive of HRD, PIK3CA mutations with CNAs, and CNAs were associated with significantly worse DRFI and OS compared with those without these features. These poor prognostic features were enriched in younger patients: present in 72% of patients aged <35 years, 54% aged 35-39 years, and 40% aged ≥40 years. 
Patients with poor prognostic features [n = 584 (46%)] versus those with none [n = 692 (54%)] had an 8-year DRFI of 84% versus 94% and OS of 88% versus 96%. Younger women (<40 years) had the poorest outcomes: 8-year DRFI 74% versus 85% and OS 80% versus 93%, respectively. CONCLUSION These results provide insights into genomic alterations that are enriched in young women with HR+HER2- EBC, provide a rationale for genomic subgrouping, and highlight priority molecular targets for future clinical trials.
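The 8-year DRFI and OS percentages above are survival-curve estimates read at a fixed time point. The Python sketch below shows how such a landmark estimate can be obtained. It assumes a plain Kaplan-Meier estimator with right censoring, and the trial's actual analysis may differ. The follow-up times and events are invented toy data, not SOFT data.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimator.

    times  -- follow-up time per patient (e.g. years)
    events -- True if the event (e.g. distant recurrence) was observed,
              False if follow-up was censored at that time
    Returns (time, survival) pairs at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_removed = 0
        # Patients sharing the same time t leave the risk set together.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_removed += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
            steps.append((t, surv))
        at_risk -= n_removed
    return steps


def survival_at(steps: List[Tuple[float, float]], t: float) -> float:
    """Read the estimated survival probability at time t off the step function."""
    s = 1.0
    for time, value in steps:
        if time > t:
            break
        s = value
    return s


# Toy data (invented): follow-up in years, True = distant recurrence observed.
times = [1, 2, 3, 4, 5, 9, 10, 10]
events = [True, False, True, False, False, True, False, False]
steps = kaplan_meier(times, events)
print(f"8-year estimate: {survival_at(steps, 8):.3f}")
```

On the toy data this prints 0.729. Each observed recurrence multiplies survival by (1 - events/at-risk). Censored patients leave the risk set without lowering the curve.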
2
Abstract PD5-03: Characterization of high TIL breast cancers reveals a prognostic and functionally distinct tissue-resident memory subpopulation. Cancer Res 2019; 79(4 Suppl):PD5-03. [DOI: 10.1158/1538-7445.sabcs18-pd5-03]
Abstract
Background: Tumor-infiltrating lymphocytes (TILs) assessed via light microscopy are prognostic and predictive in early-stage and advanced triple-negative and HER2-amplified breast cancer (BC). Higher TILs can also identify patients more likely to benefit from anti-PD-1 therapy. In this study we interrogated the T cell subsets that make up high TILs to determine whether distinct subpopulations are key mediators of anti-tumor immunity.
Methods: We characterised TILs, with a focus on CD3+ T cells, in 129 primary and metastatic BC samples using flow cytometry, bulk RNASeq on flow-sorted T cell populations, multiplex immunohistochemistry, and microdroplet-based single-cell 3' mRNA sequencing on the 10X Genomics Chromium platform. Cell type-specific gene expression signatures were derived from differential expression between putative T cell subpopulations. These signatures were investigated in clinical cohorts, including trial cohorts treated with pembrolizumab.
Results: High-TIL infiltrates consisted primarily of CD3+ T cells, with both CD8 and CD4 populations. Unsupervised clustering of the single-cell sequencing data identified 9 CD8 and CD4 subpopulations with distinct gene expression profiles. In addition to Tregs and CD8 effector memory (TEM) T cells, we found a CD8+ tissue-resident memory (TRM) population expressing higher levels of T cell checkpoints and cytotoxic markers than effector memory cells. In 2 primary tumours and 1 liver metastasis, bulk RNASeq of flow-sorted TEM and TRM cells corroborated the single-cell mRNASeq results. T cell receptor (TCR) profiling in these 3 samples found non-overlapping repertoires in the 2 primary tumours but overlap in the metastatic lesion, suggesting divergent developmental origins in the breast but the potential for nascent TRM differentiation in a metastatic niche. Clustering of these TCRs suggested differing antigen specificities between TRM and non-TRM CD8 T cells. Using METABRIC data, the CD8 TRM gene expression signature was prognostic for disease-free survival (DFS) in primary TNBCs (n = 329, log-rank p = 0.003) and further stratified cases with high and low CD8A expression for DFS (log-rank p = 0.03). The CD8 TRM signature was enriched in baseline tumour samples of responders (n = 9) compared with non-responders (n = 36) among 45 patients with metastatic melanoma treated with T cell checkpoint blockade (p < 0.0001). Additional single-cell sequencing data with TCR sequencing will be combined with these initial results, and an independent data set of single-cell mRNASeq and TCR-Seq on CD3+ BC TILs will be used to confirm our findings. Cell type-specific signatures will be explored in additional clinical cohorts, including KEYNOTE-086, and presented at the meeting.
Conclusion: Using single cell profiling of the immune microenvironment in BC we demonstrate that high TIL BCs contain multiple T cell subpopulations with different functional and prognostic significance. Our approach identified a CD8 TRM population with a distinct gene expression profile and strong expression of key immune checkpoints likely representing the presence of true tumor specific immunity. This population may be a key target of immune checkpoint blockade.
Citation Format: Savas P, Virassamy B, Ye C, Salim A, Mintoff CP, Caramia F, Salgado R, Teo ZL, Dushyanthen S, Byrne A, Luen SJ, Fox SB, Speed TP, Mackay LK, Neeson PJ, Loi S. Characterization of high TIL breast cancers reveals a prognostic and functionally distinct tissue-resident memory subpopulation [abstract]. In: Proceedings of the 2018 San Antonio Breast Cancer Symposium; 2018 Dec 4-8; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2019;79(4 Suppl):Abstract nr PD5-03.
3
Distinct epigenetic signatures delineate transcriptional programs during virus-specific CD8(+) T cell differentiation. Immunity 2014; 41:853-65. [PMID: 25517617 DOI: 10.1016/j.immuni.2014.11.001] [Received: 03/11/2014] [Accepted: 10/07/2014]
Abstract
The molecular mechanisms that regulate the rapid transcriptional changes that occur during cytotoxic T lymphocyte (CTL) proliferation and differentiation in response to infection are poorly understood. We have utilized ChIP-seq to assess histone H3 methylation dynamics within naive, effector, and memory virus-specific T cells isolated directly ex vivo after influenza A virus infection. Our results show that within naive T cells, codeposition of the permissive H3K4me3 and repressive H3K27me3 modifications is a signature of gene loci associated with gene transcription, replication, and cellular differentiation. Upon differentiation into effector and/or memory CTLs, the majority of these gene loci lose repressive H3K27me3 while retaining the permissive H3K4me3 modification. In contrast, immune-related effector gene promoters within naive T cells lacked the permissive H3K4me3 modification, with acquisition of this modification occurring upon differentiation into effector/memory CTLs. Thus, coordinate transcriptional regulation of CTL genes with related functions is achieved via distinct epigenetic mechanisms.
4
Abstract
As bona fide p53 transcriptional targets, miR-34 microRNAs (miRNAs) exhibit frequent alterations in many human tumor types and elicit multiple p53 downstream effects upon overexpression. Unexpectedly, miR-34 deletion alone fails to impair multiple p53-mediated tumor suppressor effects in mice, possibly due to the considerable redundancy in the p53 pathway. Here, we demonstrate that miR-34a represses HDM4, a potent negative regulator of p53, creating a positive feedback loop acting on p53. In a Kras-induced mouse lung cancer model, miR-34a deficiency alone does not exhibit a strong oncogenic effect. However, miR-34a deficiency strongly promotes tumorigenesis when p53 is haploinsufficient, suggesting that the defective p53-miR-34 feedback loop can enhance oncogenesis in a specific context. The importance of the p53/miR-34/HDM4 feedback loop is further confirmed by an inverse correlation between miR-34 and full-length HDM4 in human lung adenocarcinomas. In addition, human lung adenocarcinomas generate an elevated level of a short HDM4 isoform through alternative polyadenylation. This short HDM4 isoform lacks miR-34-binding sites in the 3' untranslated region (UTR), thereby evading miR-34 regulation to disable the p53-miR-34 positive feedback. Taken together, our results elucidated the intricate cross-talk between p53 and miR-34 miRNAs and revealed an important tumor suppressor effect generated by this positive feedback loop.
5
G protein-linked signaling pathways in bipolar and major depressive disorders. Front Genet 2013; 4:297. [PMID: 24391664 PMCID: PMC3870297 DOI: 10.3389/fgene.2013.00297] [Received: 02/26/2013] [Accepted: 12/05/2013]
Abstract
The G-protein linked signaling system (GPLS) comprises a large number of G-proteins, G protein-coupled receptors (GPCRs), GPCR ligands, and downstream effector molecules. G-proteins interact with both GPCRs and downstream effectors such as cyclic adenosine monophosphate (cAMP), phosphatidylinositols, and ion channels. The GPLS is implicated in the pathophysiology and pharmacology of both major depressive disorder (MDD) and bipolar disorder (BPD). This study evaluated whether the GPLS is altered at the transcript level. Gene expression in the dorsolateral prefrontal cortex (DLPFC) and anterior cingulate cortex (ACC) was compared among MDD, BPD, and control subjects using Affymetrix GeneChips and real-time quantitative PCR. High-quality brain tissue was used to control for the confounding effects of agonal events, tissue pH, RNA integrity, gender, and age. GPLS transcripts were altered, especially in the ACC of BPD and MDD subjects. Transcript levels of molecules that repress cAMP activity were increased in BPD and decreased in MDD. Two orphan GPCRs, GPRC5B and GPR37, showed significantly decreased expression levels in MDD and significantly increased expression levels in BPD. Our results suggest opposite changes in the GPLS in BPD and MDD: "activated" cAMP signaling activity in BPD and "blunted" cAMP signaling activity in MDD. GPRC5B and GPR37 both appear to have behavioral effects and are also candidate genes for neurodegenerative disorders. In the context of the opposite changes observed in BPD and MDD, these GPCRs warrant further study of their brain effects.
6
Colon tumour secretopeptidome: insights into endogenous proteolytic cleavage events in the colon tumour microenvironment. Biochim Biophys Acta 2013; 1834:2396-407. [PMID: 23684732 DOI: 10.1016/j.bbapap.2013.05.006] [Received: 03/06/2013] [Revised: 04/26/2013] [Accepted: 05/08/2013]
Abstract
The secretopeptidome comprises endogenous peptides derived from proteins secreted into the tumour microenvironment through classical and non-classical secretion. This study characterised the low-Mr (<3 kDa) component of the human colon tumour (LIM1215, LIM1863) secretopeptidome, as a first step towards gaining insights into extracellular proteolytic cleavage events in the tumour microenvironment. Based on two biological replicates, this secretopeptidome isolation strategy utilised differential centrifugal ultrafiltration in combination with analytical RP-HPLC and nanoLC-MS/MS. Secreted peptides were identified using a combination of Mascot and post-processing analyses including MSPro re-scoring, extended feature sets and Percolator, resulting in 474 protein identifications from 1228 peptides (≤1% q-value, ≤5% PEP), a 36% increase in peptide identifications when compared with conventional Mascot (homology ionscore thresholding). In both colon tumour models, 122 identified peptides were derived from 41 cell surface protein ectodomains, 23 peptides (12 proteins) from regulated intramembrane proteolysis (RIP), and 12 peptides (9 proteins) generated from intracellular domain proteolysis. Further analyses using the protease/substrate database MEROPS (http://merops.sanger.ac.uk/) revealed 335 (71%) proteins classified as originating from classical/non-classical secretion or from the cell membrane. Of these, peptides were identified from 42 substrates in MEROPS with defined protease cleavage sites, while peptides generated from a further 205 substrates were fragmented by hitherto unknown proteases. A salient finding was the identification of peptides from 88 classical/non-classical secreted substrates in MEROPS, implicated in tumour progression and angiogenesis (FGFBP1, PLXDC2), cell-cell recognition and signalling (DDR1, GPA33), and tumour invasiveness and metastasis (MACC1, SMAGP); the nature of the proteases responsible for these proteolytic events is unknown.
To confirm reproducibility of peptide fragment abundance in this study, we report the identification of a specific cleaved peptide fragment in the secretopeptidome from the colon-specific GPA33 antigen in 4/14 human CRC models. This improved secretopeptidome isolation and characterisation strategy has extended our understanding of endogenous peptides generated through proteolysis of classical/non-classical secreted proteins, extracellular proteolytic processing of cell surface membrane proteins, and peptides generated through RIP. The novel peptide cleavage site information in this study provides a useful first step in detailing proteolytic cleavage associated with tumourigenesis and the extracellular environment. This article is part of a Special Issue entitled: An Updated Secretome.
7
A DNA resequencing array for genes involved in Parkinson's disease. Parkinsonism Relat Disord 2012; 18:386-90. [PMID: 22243833 DOI: 10.1016/j.parkreldis.2011.12.012] [Received: 08/03/2011] [Revised: 11/25/2011] [Accepted: 12/20/2011]
Abstract
Parkinson's disease (PD) is aetiologically complex, with both familial and sporadic forms. Familial PD results from rare, highly penetrant pathogenic mutations, whereas multiple variants of low penetrance may contribute to the risk of sporadic PD. Common variants implicated in PD risk appear to explain only a minor proportion of the familial clustering observed in sporadic PD. It is therefore plausible that combinations of rare and/or common variants in genes already implicated in disease pathogenesis may help to explain the genetic basis of PD. We have developed a CustomSeq Affymetrix resequencing array to enable high-throughput sequencing of 13 genes (44 kb) implicated in the pathogenesis of PD. Using the array, we sequenced 269 individuals, including 186 PD patients and 75 controls, achieving overall call rates of 96.5% and 93.6% for the two versions of the array, respectively, and >99.9% accuracy for five samples also sequenced in parallel by capillary sequencing. We identified modest associations with common variants in SNCA and LRRK2 and a trend suggestive of an overrepresentation of rare variants in cases compared to controls for several genes. We propose that this technology offers a robust and cost-effective alternative to targeted sequencing using traditional sequencing methods, and here we demonstrate the potential of this approach for either routine clinical investigation or for research studies aimed at understanding the genetic aetiology of PD.
8
FIRMA: a method for detection of alternative splicing from exon array data. Bioinformatics 2008; 24:1707-14. [PMID: 18573797 PMCID: PMC2638867 DOI: 10.1093/bioinformatics/btn284] [Received: 02/25/2008] [Revised: 05/18/2008] [Accepted: 06/06/2008]
Abstract
MOTIVATION Analyses of EST data show that alternative splicing is much more widespread than once thought. The advent of exon and tiling microarrays means that researchers now have the capacity to experimentally measure alternative splicing on a genome-wide level. New methods are needed to analyze the data from these arrays. RESULTS We present a method, finding isoforms using robust multichip analysis (FIRMA), for detecting differential alternative splicing in exon array data. FIRMA has been developed for Affymetrix exon arrays, but could in principle be extended to other exon arrays, tiling arrays or splice junction arrays. We have evaluated the method using simulated data, and have also applied it to two datasets: a panel of 11 human tissues and a set of 10 pairs of matched normal and tumor colon tissue. FIRMA is able to detect exons in several genes confirmed by reverse transcriptase PCR. AVAILABILITY R code implementing our methods is contributed to the package aroma.affymetrix.
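FIRMA rests on one idea. Fit a gene-level, RMA-style probe-level model, then flag exons whose probes deviate systematically from that fit in a given sample. The pure-Python sketch below illustrates this idea under several assumptions:
- The input is log2 intensities for one gene, arranged as samples x probes.
- The model is fitted with Tukey median polish.
- Residuals are scaled by the median absolute residual of the whole gene.
- Each per-exon score is the median scaled residual over that exon's probes.

It is not the aroma.affymetrix implementation, and the function name `firma_like_scores` is hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def median_polish(y: Matrix, n_iter: int = 10) -> Tuple[float, List[float], List[float], Matrix]:
    """Tukey median polish: y[i][j] = overall + row[i] + col[j] + resid[i][j]."""
    n_rows, n_cols = len(y), len(y[0])
    resid = [row[:] for row in y]
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        for i in range(n_rows):                      # sweep row (sample) medians
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        for j in range(n_cols):                      # sweep column (probe) medians
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid


def firma_like_scores(y: Matrix, exon_of_probe: List[str]) -> Dict[str, List[float]]:
    """Per-exon, per-sample scores from gene-level model residuals.

    y             -- samples x probes matrix of log2 intensities for one gene
    exon_of_probe -- exon label of each probe (column)
    """
    _, _, _, resid = median_polish(y)
    # Assumed robust scale: median absolute residual over the whole gene.
    scale = median(abs(v) for row in resid for v in row) or 1.0
    scores = {}
    for exon in dict.fromkeys(exon_of_probe):        # unique labels, in order
        cols = [j for j, lab in enumerate(exon_of_probe) if lab == exon]
        scores[exon] = [median(resid[i][j] / scale for j in cols) for i in range(len(y))]
    return scores
```

A strongly negative score means that, in that sample, the exon's probes sit well below what the gene-level fit predicts. This pattern is consistent with the exon being skipped.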
9
Replication of KIAA0350, IL2RA, RPL5 and CD58 as multiple sclerosis susceptibility genes in Australians. Genes Immun 2008; 9:624-30. [PMID: 18650830 DOI: 10.1038/gene.2008.59]
Abstract
A recent genome-wide association study (GWAS) conducted by the International Multiple Sclerosis Genetics Consortium (IMSGC) identified a number of putative MS susceptibility genes. Here we have performed a replication study in 1134 Australian MS cases and 1265 controls for 17 risk-associated single nucleotide polymorphisms (SNPs) reported by the IMSGC. Of 16 SNPs that passed quality control filters, four, each corresponding to a different non-human leukocyte antigen (HLA) gene, were associated with disease susceptibility: KIAA0350 (rs6498169) P=0.001, IL2RA (rs2104286) P=0.033, RPL5 (rs6604026) P=0.041 and CD58 (rs12044852) P=0.042. There was no association (P=0.58) between rs6897932 in the IL7R gene and the risk of MS. No interactions were detected between the replicated IMSGC SNPs and HLA-DRB1*15, gender, disease course, disease progression or age-at-onset. We used a novel Bayesian approach to estimate the extent to which our data increased or decreased evidence for association with the six most-associated IMSGC loci. These analyses indicated that even modest P-values, such as those reported here, can contribute markedly to the posterior probability of 'true' association in replication studies. In conclusion, these data provide support for the involvement of four non-HLA genes in the pathogenesis of MS and, combined with previous data, increase the evidence of an association between KIAA0350 and risk of disease to genome-wide significance (P=3 × 10⁻⁸).
10
Abstract
MOTIVATION Although copy-number aberrations are known to contribute to the diversity of the human DNA and cause various diseases, many aberrations and their phenotypes are still to be explored. The recent development of single-nucleotide polymorphism (SNP) arrays provides researchers with tools for calling genotypes and identifying chromosomal aberrations at an order-of-magnitude greater resolution than possible a few years ago. The fundamental problem in array-based copy-number (CN) analysis is to obtain CN estimates at a single-locus resolution with high accuracy and precision such that downstream segmentation methods are more likely to succeed. RESULTS We propose a preprocessing method for estimating raw CNs from Affymetrix SNP arrays. Its core utilizes a multichip probe-level model analogous to that for high-density oligonucleotide expression arrays. We extend this model by adding an adjustment for sequence-specific allelic imbalances such as cross-hybridization between allele A and allele B probes. We focus on total CN estimates, which allows us to further constrain the probe-level model to increase the signal-to-noise ratio of CN estimates. Further improvement is obtained by controlling for PCR effects. Each part of the model is fitted robustly. The performance is assessed by quantifying how well raw CNs alone differentiate between one and two copies on Chromosome X (ChrX) at a single-locus resolution (27kb) up to a 200kb resolution. The evaluation is done with publicly available HapMap data. AVAILABILITY The proposed method is available as part of an open-source R package named aroma.affymetrix. Because it is a bounded-memory algorithm, any number of arrays can be analyzed.
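The evaluation above depends on turning probe-level signals into raw copy-number estimates. One common convention is assumed here; it is not taken from the paper. A sample's total signal θ = θ_A + θ_B at a locus is expressed relative to a reference, such as the median across a pool of reference arrays. The ratio is scaled so that a locus matching the reference gives two copies: C = 2·θ/θ_ref. The sketch shows only that final ratio step. It does not reproduce the allelic-crosstalk calibration, the probe-level model, or the PCR-effect control described in the abstract. The function name `raw_copy_numbers` is hypothetical.

```python
from statistics import median
from typing import List


def raw_copy_numbers(theta_a: List[float], theta_b: List[float],
                     ref_thetas: List[List[float]]) -> List[float]:
    """Raw total CN per locus: C = 2 * theta / theta_ref.

    theta_a, theta_b -- allele-specific signals of one sample, one value per locus
    ref_thetas       -- total signals (theta_A + theta_B) of reference arrays,
                        one list per array, in the same locus order
    """
    cns = []
    for k, (a, b) in enumerate(zip(theta_a, theta_b)):
        theta = a + b                                     # total-CN signal
        theta_ref = median(arr[k] for arr in ref_thetas)  # robust reference level
        cns.append(2.0 * theta / theta_ref)
    return cns


# Toy example: the second locus has half the reference signal, as a ChrX
# locus in a male sample would against a female reference pool.
print(raw_copy_numbers([500.0, 250.0], [500.0, 250.0],
                       [[1000.0, 1000.0], [900.0, 1100.0], [1100.0, 900.0]]))
```

The toy call prints [2.0, 1.0]. This one-versus-two-copies contrast is the kind the abstract uses on Chromosome X to assess how well raw CNs separate the two states.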
11
SNP mapping and candidate gene sequencing in the class I region of the HLA complex: searching for multiple sclerosis susceptibility genes in Tasmanians. ACTA ACUST UNITED AC 2007; 71:42-50. [PMID: 17971048 DOI: 10.1111/j.1399-0039.2007.00962.x]
Abstract
This study is an extension to previously published work that has linked variation in the human leukocyte antigen (HLA) class I region with susceptibility to multiple sclerosis (MS) in Australians from the Island State of Tasmania. Single nucleotide polymorphism (SNP) mapping was performed on an 865-kb candidate region (D6S1683-D6S265) in 166 Tasmanian MS families, and seven candidate genes [ubiquitin D (UBD), olfactory receptor 2H3 (OR2H3), gamma-aminobutyric acid B receptor 1 (GABBR1), myelin oligodendrocyte glycoprotein (MOG), HLA-F, HLA complex group 4 (HCG4) and HLA-G] were resequenced. SNPs tagging the extended MS susceptibility haplotype were genotyped in an independent sample of 356 Australian MS trios and SNPs in the MOG gene were significantly over-transmitted to MS cases. We identified significant effects on MS susceptibility of HLA-A*2 (OR: 0.51; P = 0.05) and A*3 (OR: 2.85; P = 0.005), and two coding polymorphisms in the MOG gene (V145I: P = 0.01, OR: 2.2; V142L: P = 0.04, OR: 0.45) after full conditioning on HLA-DRB1. We have therefore identified plausible candidates for the causal MS susceptibility allele, and although not conclusive at this stage, our data provide suggestive evidence for multiple class I MS susceptibility genes.
12
A genetic screen for behavioral mutations that perturb dopaminergic homeostasis in mice. Genes Brain Behav 2006; 5:19-28. [PMID: 16436185 DOI: 10.1111/j.1601-183x.2005.00127.x]
Abstract
Disruption of dopaminergic (DA) systems is thought to play a central role in the addictive process and in the pathophysiology of schizophrenia. Although inheritance plays an important role in the predisposition to these disorders, the genetic basis of this is not well understood. To provide additional insight, we have performed a modifier screen in mice designed to identify mutations that perturb DA homeostasis. With a genetic background sensitized by a mutation in the dopamine transporter (DAT), we used random chemical mutagenesis and screened for mutant mice with locomotor abnormalities. Four mutant lines were identified with quantitatively elevated levels of locomotor activity. Mapping of mutations in these lines identified two loci that alter activity only when dopamine levels are elevated by a DAT mutation and thus would only have been uncovered by this type of approach. One of these quantitative trait loci behaves as an enhancer of DA neurotransmission, whereas the other may act as a suppressor. In addition, we also identified three loci which are not dependent on the sensitized background but which also contribute to the overall locomotor phenotype.
13
Abstract
The relaxin-like peptide family consists of relaxin-1, relaxin-2, and relaxin-3 and the insulin-like peptides (INSL)-3, INSL4, INSL5, and INSL6 (human relaxin-2 is equivalent to relaxin-1 in other species). Evolution of this family has been contentious. We therefore sought to clarify the issue by performing phylogenetic analysis of all relaxin-like peptides from the available genomic databases. Surprisingly, the phylogeny, combined with previous biologic characterizations, suggests that although relaxin's original function was likely in the brain, its reproductive role was acquired just prior to the divergence of amphibians. This phylogeny also illuminates inconsistencies in relaxin evolution in invertebrates, chickens, and cows.
14
Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res 2005; 33:e175. [PMID: 16284200 PMCID: PMC1283542 DOI: 10.1093/nar/gni179]
Abstract
Genome-wide expression profiling is a powerful tool for implicating novel gene ensembles in cellular mechanisms of health and disease. The most popular platform for genome-wide expression profiling is the Affymetrix GeneChip. However, its selection of probes relied on earlier genome and transcriptome annotation, which differs significantly from current knowledge. The resulting informatics problems have a profound impact on the analysis and interpretation of the data. Here, we address these critical issues and offer a solution. We identified several classes of problems at the individual probe level in the existing annotation, under the assumption that current genome and transcriptome databases are more accurate than those used for GeneChip design. We then reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information. Comparing analysis results between the original and the redefined probe sets reveals approximately 30-50% discrepancy in the genes previously identified as differentially expressed, regardless of analysis method. Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions.
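The redefinition step described above amounts to regrouping individual probes according to where they map under current annotation. The toy sketch below shows that regrouping under simplifying assumptions:
- Each probe has already been aligned, and its matches are given as a set of gene identifiers.
- Probes that match zero genes or several genes are discarded.
- Probe sets with fewer than a minimum number of probes are dropped.

These rules, the input format, and the `min_probes` default are hypothetical. The authors' pipeline also builds transcript- and exon-specific sets and takes SNPs into account.

```python
from collections import defaultdict
from typing import Dict, List, Set


def redefine_probe_sets(probe_hits: Dict[str, Set[str]],
                        min_probes: int = 3) -> Dict[str, List[str]]:
    """Group probes into gene-specific probe sets.

    probe_hits -- probe ID -> set of gene IDs the probe sequence matches
                  under current annotation (hypothetical input format)
    min_probes -- drop gene probe sets with fewer probes than this (hypothetical default)
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for probe, genes in sorted(probe_hits.items()):
        if len(genes) == 1:          # keep only probes that map to exactly one gene
            (gene,) = genes
            groups[gene].append(probe)
    return {g: ps for g, ps in groups.items() if len(ps) >= min_probes}


# Toy example with made-up probe and gene IDs.
hits = {"p1": {"GENE_A"}, "p2": {"GENE_A"}, "p3": {"GENE_A"},
        "p4": {"GENE_A", "GENE_B"}, "p5": set(), "p6": {"GENE_B"}}
print(redefine_probe_sets(hits))
```

In the toy example, probe p4 is ambiguous and p5 no longer maps anywhere, so both are removed. GENE_B is left with too few probes to form a set.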
15
A comparison of match-only algorithms for the analysis of Plasmodium falciparum oligonucleotide arrays. Int J Parasitol 2005; 35:523-31. [PMID: 15826644 DOI: 10.1016/j.ijpara.2005.02.010] [Received: 07/26/2004] [Revised: 01/19/2005] [Accepted: 02/06/2005]
Abstract
This study is motivated by two data sets which employ a custom Plasmodium falciparum version of the Affymetrix GeneChip, containing only perfect match (PM) oligonucleotides. A PM-only chip cannot be analysed using the standard Affymetrix-supplied software. We compared the performance of three match-only algorithms on these data: the Match Only Integral Distribution (MOID) algorithm, Robust Multichip Analysis (RMA), and the Model Based Expression Index (MBEI). We validated the differential expression of several genes using quantitative reverse transcriptase-PCR. We also performed a comparison using two publicly available 'benchmarking' data sets: the Latin Square spike-in data set generated by Affymetrix, and the Gene Logic dilution series. Since we know what the true fold changes are in these special data sets, they are helpful for assessment of expression algorithms.
16
Experimental design and low-level analysis of microarray data. Int Rev Neurobiol 2004; 60:25-58. [PMID: 15474586 DOI: 10.1016/s0074-7742(04)60002-x]
17
Deriving statistical models for predicting peptide tandem MS product ion intensities. Biochem Soc Trans 2003; 31:1479-83. [PMID: 14641094 DOI: 10.1042/bst0311479]
Abstract
Improved search algorithms and scoring functions are required before the identification of peptide tandem MS data can be considered fully reliable and automatable. The development of models that can accurately predict product ion spectra from a peptide sequence would certainly help achieve this goal, but this first requires a better understanding of the process of fragmentation of peptides in the gas phase. We summarize recent developments in this area and show that the prediction of product ion spectra is feasible and should improve the identification of peptide tandem MS data, especially for peptides that receive low or insignificant scores from current search algorithms.
18
A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003; 19:185-93. [PMID: 12538238 DOI: 10.1093/bioinformatics/19.2.185]
Abstract
MOTIVATION When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays, and the standard normalization provided by Affymetrix does not perform well in these situations. RESULTS We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared with two methods that make use of a baseline array (a one-number scaling algorithm and a method that uses a non-linear normalizing relation) by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably. AVAILABILITY Software implementing all three of the complete data normalization methods is available as part of the R package affy, which is a part of the Bioconductor project http://www.bioconductor.org. SUPPLEMENTARY INFORMATION Additional figures may be found at http://www.stat.berkeley.edu/~bolstad/normalize/index.html
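The simplest and quickest complete-data method in this comparison is quantile normalization, which forces every array to share the same intensity distribution. Below is a minimal pure-Python sketch. Each value is replaced by the mean, across arrays, of the values at the same rank. Ties are broken by position here as a simplification. The Bioconductor implementations handle ties and missing values more carefully.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize equal-length arrays (one list of probe intensities per array)."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Reference distribution: mean across arrays of the k-th smallest value.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices, smallest first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]               # same rank -> same value
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]))
```

After normalization, every array contains exactly the same set of values. Only the ordering of probes within each array, which is assumed to carry the biological signal, is preserved.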
19
20
Abstract
We describe and assess the performance of the gene finding program pretty handy annotation tool (Phat) on sequence from the malaria parasite Plasmodium falciparum. Phat is based on a generalized hidden Markov model (GHMM) similar to the models used in GENSCAN, Genie and HMMgene. On a test set of 44 confirmed gene structures, Phat achieves nucleotide-level sensitivity and specificity of greater than 95%, performing as well as the other P. falciparum gene finding programs Hexamer and GlimmerM. Phat is particularly useful for P. falciparum and other eukaryotes for which few gene finding programs are available, because it is distributed with code for retraining it on new organisms. Moreover, the full source code is freely available under the GNU General Public License, allowing users to further develop and customize it.
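Nucleotide-level accuracy figures such as those quoted above are computed by comparing predicted and annotated coding bases position by position. A sketch assuming the conventional gene-finding definitions, sensitivity = TP / (TP + FN) and specificity = TP / (TP + FP) (this "specificity" is what other fields call precision); the evaluation code actually used for Phat may differ:

```python
def nucleotide_accuracy(predicted, actual):
    """Nucleotide-level sensitivity and specificity of a coding-base prediction.

    predicted, actual: equal-length sequences of booleans (True = coding base).
    Specificity uses the gene-finding convention TP / (TP + FP).
    """
    if len(predicted) != len(actual):
        raise ValueError("annotations must cover the same bases")
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity
```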
21
Abstract
Microarrays are part of a new class of biotechnologies that allow the expression levels of thousands of genes to be monitored simultaneously. Image analysis is an important aspect of microarray experiments, one that can have a large impact on subsequent analyses such as clustering or the identification of differentially expressed genes. This paper reviews a number of existing image analysis methods used on cDNA microarray data. In particular, it describes and discusses the different segmentation and background adjustment methods. We found that in some cases background adjustment can substantially reduce precision, that is, increase the variability of low-intensity spot values. In contrast, the choice of segmentation procedure seems to have a smaller impact.
22
Abstract
The use of transposons offers the possibility of a directed approach to DNA sequencing, in which a target DNA up to about 6 kb in length can be sequenced quickly and with minimal redundancy. Transposons are mobile DNA elements that can be inserted in a reasonably random fashion into the target DNA. An important part of this process is locating the transposon insertions (known as mapping) and selecting a sensible subset of transposons to use as priming sites for sequencing reactions. This paper presents a probabilistic method of scoring selected subsets of transposons and a graph-theoretic algorithm for selecting a subset of maximal score.
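The scoring method and graph algorithm are not spelled out here, so the following only illustrates how subset selection can be cast as a best-path problem on a graph. It assumes a hypothetical score (minus the number of sequencing reactions) and a hypothetical read model in which a primer at position 0 and a primer at each chosen insertion each read `read_len` bases to the right. Under those assumptions the best subset is a shortest path through a directed acyclic graph of sorted insertion sites, found by dynamic programming:

```python
def select_insertions(positions, target_len, read_len):
    """Return the fewest insertion sites whose reads tile [0, target_len),
    or None if no subset does.

    Hypothetical model: a primer at position 0 plus one primer per chosen
    insertion, each reading read_len bases to the right. The hypothetical
    score is minus the number of sequencing reactions.
    """
    # DAG nodes: start, usable insertion sites in order, end of target.
    nodes = [0] + sorted({p for p in positions if 0 < p < target_len}) + [target_len]
    n = len(nodes)
    best = [float("inf")] * n  # fewest reads needed to reach each node
    prev = [-1] * n            # predecessor on the best path
    best[0] = 0
    for j in range(1, n):
        for i in range(j):
            # Edge i -> j exists if a read starting at node i reaches node j.
            if nodes[j] - nodes[i] <= read_len and best[i] + 1 < best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    if best[-1] == float("inf"):
        return None
    chosen, k = [], prev[-1]
    while k > 0:  # walk back toward the start, collecting insertion sites
        chosen.append(nodes[k])
        k = prev[k]
    return sorted(chosen)
```

For example, with insertions at 300, 450, 900, 1200 and 1500 in a 2000-base target and 600-base reads, the sketch picks 300, 900 and 1500.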
23
Power and robustness of a score test for linkage analysis of quantitative traits using identity by descent data on sib pairs. Genet Epidemiol 2001; 20:415-31. [PMID: 11319783 DOI: 10.1002/gepi.1011]
Abstract
Identification of genes involved in complex traits by traditional (lod score) linkage analysis is difficult due to many complicating factors. Non-parametric procedures, however, generally have low power to detect genetic effects. Recently, Dudoit and Speed [2000] proposed a likelihood-based score test for detecting linkage with IBD data on sib pairs. This method uses the likelihood for theta, the recombination fraction between a trait locus and a marker locus, conditional on the phenotypes of the two sibs, to test the null hypothesis of no linkage (theta = 1/2). Although a genetic model must be specified, the approach offers several advantages. This paper presents simulation studies characterizing the power and robustness of this score test for linkage, and compares its power with that of the Haseman-Elston and modified Haseman-Elston tests. The score test has impressively high power across a broad range of true and assumed models, particularly under multiple ascertainment. Assuming an additive model with moderate allele frequency (p = 0.2 to 0.5), heritability H = 0.3 and a moderate residual correlation rho = 0.2 gave very good overall performance across a wide range of trait-generating models. Overall, our results indicate that, when used with the recommended additive model, this score test is robust and offers a high degree of protection against incorrect model assumptions.
24
Abstract
The score test of Dudoit and Speed [(2000) Biostatistics 1:1-26] to detect linkage between a trait locus and a marker locus, using identity by descent data on sib pairs, is extended to other types of relative pairs (grandparent/grandchild, avuncular, and half-sib relationships). The test is based on the likelihood of the recombination fraction theta between trait and marker loci, conditional on the phenotypes of the relatives. We present simulation studies characterizing the power and robustness of this linkage score test, and compare its power with that of the classical and modified Haseman-Elston tests. The score test has considerable power, particularly under sampling schemes where selection is on double probands. Using a generic additive model [Goldstein et al., submitted] with allele frequency p = 0.2, heritability H = 0.3, and a moderate residual correlation of rho = 0.2 gave very good overall performance across a wide range of trait-generating models.
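The recombination fraction enters the conditional likelihood through how often IBD status at the marker matches IBD status at the trait locus. For relative pairs who share either zero or one allele IBD through a single line of descent, this probability has a simple closed form. A sketch for two of the relationships named above (the avuncular case, which involves a sib pair, is omitted); both reduce to 1/2, the no-linkage value, at theta = 1/2:

```python
def ibd_concordance(relationship, theta):
    """Probability that IBD status at the marker equals IBD status at the
    trait locus, for recombination fraction theta between the two loci."""
    if relationship == "half-sib":
        # Two independent meioses from the common parent: statuses agree when
        # both meioses are non-recombinant or both are recombinant.
        return theta ** 2 + (1 - theta) ** 2
    if relationship == "grandparent-grandchild":
        # Only the parent-to-grandchild meiosis separates the two loci.
        return 1 - theta
    raise ValueError("unsupported relationship: " + relationship)
```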
25
Abstract
Based on the assumption that severe alterations in the expression of genes known to be involved in high-density lipoprotein (HDL) metabolism may affect the expression of other genes, we screened an array of >5000 mouse expressed sequence tags for altered gene expression in the livers of two lines of mice with dramatic decreases in HDL plasma concentrations. Labeled cDNA from livers of apolipoprotein AI (apoAI)-knockout mice, scavenger receptor BI (SR-BI) transgenic mice, and control mice was cohybridized to microarrays. Two-sample t statistics were used to identify genes with altered expression levels in the knockout or transgenic mice compared with control mice. In the SR-BI group, we found nine array elements representing at least five genes that were significantly altered on the basis of an adjusted P value < 0.05. In the apoAI-knockout group, eight array elements representing four genes were altered compared with the control group (adjusted P < 0.05). Several of the genes identified in the SR-BI transgenic mice suggest altered sterol metabolism and oxidative processes. These studies illustrate the use of multiple-testing methods for identifying genes with altered expression in replicated microarray experiments.
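A sketch of the two ingredients named above: a two-sample (Welch) t statistic per array element and a multiplicity adjustment. The adjustment shown is a single-step maxT permutation procedure in the style of Westfall and Young; treat it as an illustration, since the abstract does not state exactly how the adjusted P values were computed. `data` holds one row of log-ratios per array element, and `labels` marks control (0) versus treated (1) arrays:

```python
import random
import statistics


def welch_t(x, y):
    """Two-sample t statistic without assuming equal variances."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se


def abs_t_stats(data, labels):
    """|t| for every row, comparing label-1 columns with label-0 columns."""
    g0 = [i for i, lab in enumerate(labels) if lab == 0]
    g1 = [i for i, lab in enumerate(labels) if lab == 1]
    return [abs(welch_t([row[i] for i in g1], [row[i] for i in g0])) for row in data]


def maxt_adjusted_pvalues(data, labels, n_perm=1000, seed=0):
    """Single-step maxT adjusted p-values from random label permutations
    (an illustrative adjustment, not necessarily the one used in the study)."""
    rng = random.Random(seed)
    observed = abs_t_stats(data, labels)
    exceed = [0] * len(data)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        max_t = max(abs_t_stats(data, perm))
        for g, t in enumerate(observed):
            if max_t >= t:
                exceed[g] += 1
    # Count the observed labelling itself so no p-value is exactly zero.
    adjusted = [(c + 1) / (n_perm + 1) for c in exceed]
    return observed, adjusted
```

With few arrays per group the number of distinct labellings is small (70 for four versus four), which limits how small an adjusted P value can be.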
26
The current state of multiple sclerosis genetic research. Annals of the Academy of Medicine, Singapore 2000; 29:322-30. [PMID: 10976385]
Abstract
INTRODUCTION Multiple sclerosis (MS) is the most common genetic disease of the nervous system, with onset usually in young adulthood. Four genome-wide searches for MS susceptibility loci have been performed in different Caucasian populations, but none reported linkage at a level that would be regarded as significant by current criteria. Significant linkage of MS to allelic variants of the major histocompatibility complex (MHC) locus on chromosome 6p21 has been established, although its overall contribution to MS susceptibility has proven difficult to quantify. This review aims not only to update the reader on MS genetics research but also to provide a basic knowledge of the techniques being used to map MS susceptibility genes. The different methodologies are discussed, and specific studies are reviewed in context. METHODS This review is based on findings from original articles; the results of recent candidate gene studies are included to update previous review articles. RESULTS No non-MHC locus for MS has yet been firmly established, although enough findings are of sufficient interest to warrant further investigation and optimism. Stratification of genome scan data by MHC class II suggests that MHC class II interacts differentially with non-MHC loci and contributes moderately to disease susceptibility. Candidate gene studies have continued to return negative or ambiguous results, and follow-up fine mapping of suggestive linkages from the UK genome scan has not identified significant linkages. Genetic analysis of crosses between mouse strains that differ in susceptibility to experimental allergic encephalomyelitis (EAE) has yielded linkages corresponding to putative MS susceptibility loci. However, recent successes in transgenic mice may provide an alternative to EAE, which some regard as a poor model of MS. 
CONCLUSION The first whole-genome search for a common human disease was performed over five years ago, and it is now clear, from the lack of success in this field, that the genetic complexity of these traits has been underestimated. The genome-wide searches for MS susceptibility genes have suffered from insufficient statistical power, probably compounded by disease and genetic heterogeneity. Studies in isolated populations and better laboratory and clinical definitions of disease are both steps in the right direction to solving these problems. Notwithstanding the negative effects of genetic heterogeneity, pooling resources for meta-analyses may provide the increase in statistical power required to detect loci that exert a moderate or small effect on disease predisposition.
27
A score test for the linkage analysis of qualitative and quantitative traits based on identity by descent data from sib-pairs. Biostatistics 2000; 1:1-26. [PMID: 12933522 DOI: 10.1093/biostatistics/1.1.1]
Abstract
We propose a general likelihood-based approach to the linkage analysis of qualitative and quantitative traits using identity by descent (IBD) data from sib-pairs. We consider the likelihood of IBD data conditional on phenotypes and test the null hypothesis of no linkage between a marker locus and a gene influencing the trait using a score test in the recombination fraction theta between the two loci. This method unifies the linkage analysis of qualitative and quantitative traits into a single inferential framework, yielding a simple and intuitive test statistic. Conditioning on phenotypes avoids unrealistic random sampling assumptions and allows sib-pairs from differing ascertainment mechanisms to be incorporated into a single likelihood analysis. In particular, it allows the selection of sib-pairs based on their trait values and the analysis of only those pairs having the most informative phenotypes. The score test is based on the full likelihood, i.e. the likelihood based on all phenotype data rather than just differences of sib-pair phenotypes. Considering only phenotype differences, as in Haseman and Elston (1972) and Kruglyak and Lander (1995), may result in important losses in power. The linkage score test is derived under general genetic models for the trait, which may include multiple unlinked genes. Population genetic assumptions, such as random mating or linkage equilibrium at the trait loci, are not required. This score test is thus particularly promising for the analysis of complex human traits. The score statistic readily extends to accommodate incomplete IBD data at the test locus, by using the hidden Markov model implemented in the programs MAPMAKER/SIBS and GENEHUNTER (Kruglyak and Lander, 1995; Kruglyak et al., 1996). Preliminary simulation studies indicate that the linkage score test generally matches or outperforms the Haseman-Elston test, the largest gains in power being for selected samples of sib-pairs with extreme phenotypes.
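For contrast, the classical Haseman-Elston (1972) test mentioned above uses only the squared trait difference of each sib pair, regressed on the estimated proportion of alleles shared IBD at the marker; a negative slope suggests linkage. A minimal least-squares sketch (the one-sided significance test on the slope is omitted):

```python
def haseman_elston(trait1, trait2, pi_hat):
    """Classical Haseman-Elston regression for sib pairs.

    trait1, trait2: trait values of the two sibs in each pair.
    pi_hat: estimated proportion of alleles shared IBD at the marker.
    Returns (intercept, slope); a negative slope suggests linkage.
    """
    y = [(a - b) ** 2 for a, b in zip(trait1, trait2)]  # squared differences
    n = len(y)
    mean_x, mean_y = sum(pi_hat) / n, sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in pi_hat)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(pi_hat, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

Because each pair contributes only its trait difference, pairs that are both extreme in the same direction carry little information here, which is one source of the power losses the full-likelihood score test avoids.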
28
Abstract
As in other infectious diseases, the outcome of a Leishmania major infection is closely tied to the T helper cell response type; progressive disease is associated with a predominant Th2 lymphocyte response, healing with a Th1 response. In mice, susceptibility is genetically controlled, with BALB/c (C) mice being susceptible and C57BL/6 (B) mice being resistant. Using a genome-wide scan on two large populations of F2 mice created from these strains, we have shown previously that susceptibility to infection with L. major is controlled by two autosomal loci: lmr1 at the H2 locus, and lmr2 on chromosome 9. Employing a strategy to identify loci that interact, we show here that lmr1 and lmr2 interact synergistically, and we describe a new locus lmr3, lying on the X chromosome, whose effect depends on a specific lmr1 haplotype.
29
Abstract
Color separation is an essential step of data processing in the four-dye fluorescence detection strategy used in automated DNA sequencing. In this paper, we propose a model to describe the crosstalk phenomenon and show how the assumptions of the model are supported by experimental data. The crosstalk matrix is estimated via a reparameterization based on a mapping between the distribution of fluorescence intensities and that of dye concentrations, and an iterative algorithm is designed to implement the estimation. To evaluate the color-correction quality of a crosstalk matrix, we propose a quantitative measure based on the distribution of the color-corrected data. We illustrate this method by applying it to a slab gel electrophoresis sequencing trace obtained at the Human Genome Center at Lawrence Berkeley National Laboratory and to a capillary electrophoresis trace provided by the Department of Chemistry at UC Berkeley. The accuracy of the method is also assessed using the bootstrap.
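In the usual linear crosstalk model, assumed here, M[i][j] is the fraction of dye j's fluorescence recorded in channel i, an observed intensity vector is x = M c for dye concentrations c, and color correction recovers c by solving that linear system. The sketch covers only this correction step with M taken as known; estimating M, the main subject of the paper, is not reproduced:

```python
def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [list(row) + [vector[i]] for i, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("crosstalk matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def color_correct(crosstalk, trace):
    """Map each multichannel intensity vector in trace to dye concentrations.

    crosstalk[i][j] is the (assumed known) fraction of dye j's signal seen in
    channel i, so each observation satisfies x = crosstalk * c.
    """
    return [solve(crosstalk, observed) for observed in trace]
```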
30
Abstract
In the paper by Goldstein et al. (Genomics, 1995), the authors carried out a simulation study to investigate the relative efficiency of a no-interference linkage analysis compared with analyses under certain models that allow for interference. They showed that, for completely informative and independent recombination data, the analysis under the no-interference model was inefficient in the presence of interference. In practice, the assumption of completely informative markers is unrealistic for data from human pedigrees. We report the results of a study investigating whether this conclusion still holds for gametes arising within pedigrees. We consider the same two mapping problems as Goldstein et al.: exclusion mapping and gene ordering. The results were consistent with their findings, although the efficiency gains for analyses using the chi-square model were not as great in some cases, which is not unexpected with less than fully informative data. These results point to the need to develop new statistical and computational methods that incorporate interference into multipoint linkage mapping using pedigree data. Such methods would make efficient use of available, but sometimes scarce, data, especially in disease gene mapping.
31
Abstract
Half-tetrads, where two meiotic products from a single meiosis are recovered together, arise in different forms in a variety of organisms. Closely related to ordered tetrads, half-tetrads yield information on chromatid interference, chiasma interference, and centromere positions. In this article, for different half-tetrad types and different marker configurations, we derive the relations between multilocus half-tetrad probabilities and multilocus ordered tetrad probabilities. These relations are used to obtain equality and inequality constraints among multilocus half-tetrad probabilities that are imposed by the assumption of no chromatid interference. We illustrate how to apply these results to study chiasma interference and to map centromeres using multilocus half-tetrad data.
32
Abstract
Ordered tetrad data yield information on chromatid interference, chiasma interference, and centromere locations. In this article, we show that the assumption of no chromatid interference imposes certain constraints on multilocus ordered tetrad probabilities. Assuming no chromatid interference, these constraints can be used to order markers under general chiasma processes. We also derive multilocus tetrad probabilities under a class of chiasma interference models, the chi-square models. Finally, we compare centromere map functions under the chi-square models with map functions proposed in the literature. Results in this article can be applied to order genetic markers and map centromeres using multilocus ordered tetrad data.
33
A three-wavelength labeling approach for DNA sequencing using energy transfer primers and capillary electrophoresis. Electrophoresis 1998; 19:1403-14. [PMID: 9694290 DOI: 10.1002/elps.1150190835]
Abstract
Capillary electrophoresis DNA sequencing has been accomplished by using four different energy transfer primers and three fluorescence detection channels. Methods have also been developed to deconvolve the three-color data into the four base concentrations; the nonnegative least squares and model selection method gave the best accuracy. The three-color data were compared with sequencing data obtained using four detection channels and four energy transfer primers. The average accuracy rates obtained over three 500-base M13mp18 runs using three-color coding were 96% including 18 uncallable compressions and 99.6% if these compressions are excluded. The average accuracy rate obtained using four-color coding was 96.3% including 18 uncallable compressions and 99.9% if these compressions are excluded. Although it is unlikely that three-color schemes will replace four-color sequencing, these methods have exposed basic concepts that will be useful in developing higher-order multiplex coding methods for DNA analysis.
34
Abstract
This paper proposes an algorithm for haplotype analysis based on a Monte Carlo method. Haplotype configurations are generated according to the distribution of joint haplotypes of individuals in a pedigree given their phenotype data, via a Markov chain Monte Carlo algorithm. The haplotype configuration which maximizes this conditional probability distribution can thus be estimated. In addition, the set of haplotype configurations with relatively high probabilities can also be estimated as possible alternatives to the most probable one. This flexibility enables geneticists to choose the haplotype configurations which are most reasonable to them, allowing them to include their knowledge of the data under analysis.
35
Abstract
Analysis of linkage data has typically been carried out assuming genotyping errors are absent. Recent studies have shown, however, that the impact of ignoring genotyping errors can be great, especially in dense marker maps [Buetow, Am J Hum Genet 1991; 49:985-994; Lincoln and Lander, Genomics 1992; 14:604-610]. Because most organisms exhibit positive chiasma interference, we use the chi-square model [Foss et al., Genetics 1993; 133:681-691] to examine the role interference plays in the estimation of genetic distance in the presence of genotyping errors. For simplicity, we confine our analyses to samples of 1,000 fully informative gametes. Our results support previous findings that ignoring errors inflates distance estimates: the larger the error rate, the greater the inflation. For a given error rate, the relative error in estimated genetic distance is greatest when interference is weak or absent. We provide an approximation to the relative error that quantifies its dependence on distance, error rate, and interference. We also investigate the robustness of estimation to misspecification of the error rate. When the assumed error rate is too low, distance is overestimated and interference is underestimated; when the assumed error rate is too high, the situation is reversed (interference is overestimated and distance underestimated). Unfortunately, the joint estimation of distance and interference is not very robust to error misspecification.
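A worked sketch of why ignoring errors inflates distance. Assume two fully informative loci, each miscalled independently with probability e, where a miscall flips the scored parental origin. A recombination is then observed when the gamete is truly recombinant and zero or two loci are miscalled, or truly non-recombinant and exactly one locus is miscalled, giving r_obs = r + 2e(1 - e)(1 - 2r). For simplicity, the conversion to distance uses Haldane's no-interference map function rather than the chi-square model studied here:

```python
import math


def observed_recombination(r, error_rate):
    """Apparent recombination fraction between two fully informative loci
    when each locus is miscalled independently with probability error_rate."""
    one_error = 2 * error_rate * (1 - error_rate)  # exactly one locus miscalled
    # True recombinants stay recombinant unless exactly one locus is miscalled;
    # true non-recombinants look recombinant when exactly one locus is miscalled.
    return r * (1 - one_error) + (1 - r) * one_error


def haldane_distance(r):
    """Map distance in Morgans under Haldane's (no-interference) map function,
    used here only for simplicity."""
    return -0.5 * math.log(1 - 2 * r)
```

With r = 0.01 and e = 0.01, for example, r_obs is about 0.0294, roughly tripling the apparent map distance; the inflation is proportionally largest for closely spaced loci.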
36
Abstract
Various random fingerprinting methods are sometimes used to detect overlap between pairs of clones as a first step toward producing a minimal tiling path of clones for subsequent mapping and sequencing efforts. This paper evaluates and compares various statistical procedures for detecting pairwise overlap between clones when the fingerprints arise from any random process meeting simple, plausible assumptions about the relationship between overlap and the resulting fingerprint. Examples of such random processes include, but are not limited to, large-scale hybridization procedures designed to prepare tiling paths of clones for subsequent large-scale genomic sequencing. Our goals are to assess how well random fingerprinting can possibly detect overlap, to assess the effects of inevitable fingerprinting errors on statistical detection, to determine how one can make the best use of the data random fingerprinting provides, and to evaluate how well simple, heuristic techniques for overlap detection compare to more complex, likelihood-based approaches. The paper provides a quantitative assessment of the ability of any random fingerprinting procedure to detect various proportions of clonal overlap and shows the extent to which a small amount of experimental error will vitiate the performance of such techniques. The paper outlines a simple approximation method for constructing Bayesian overlap detectors, while concluding that detectors constructed from linear combinations of fingerprint data can be designed that will perform nearly as well as more complex, likelihood-based approaches.
37
Abstract
Crossover interference is now known to exist in humans but to date has been ignored in routine genetic mapping because of the computational burden involved. In a recent paper by Weeks et al. [Hum Hered 1993;43:86-97], interference was accounted for by the use of a variety of multilocus feasible map functions and a crossover model of Goldgar and Fain [Am J Hum Genet 1988;43:38-45]. In this paper, we present an alternative approach to incorporating crossover interference into multilocus likelihood computation, by modelling the underlying chiasma process directly using the chi-square model, supplemented by an assumption of no chromatid interference. This procedure was applied to the same CEPH consortium chromosome 10 data set that was analyzed by Weeks et al. A fit to the data was achieved which was significantly better than that offered by the no-interference model, and comparable to the best of the alternatives considered by Weeks et al. We briefly discuss the relative merits of the different models for interference.
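Under the chi-square model, chiasma-forming events occur as a Poisson process at rate 2(m + 1) per Morgan, and every (m + 1)th event (with a random starting phase) becomes a chiasma. Combined with no chromatid interference, this yields the closed-form map function

r(d) = 1/2 [1 - sum_{k=0..m} (1 - k/(m+1)) e^{-2(m+1)d} (2(m+1)d)^k / k!],

which reduces to Haldane's function when m = 0. A direct transcription:

```python
import math


def chi_square_map(d, m):
    """Recombination fraction at map distance d (Morgans) under the chi-square
    model with integer interference parameter m (m = 0 gives Haldane)."""
    lam = 2 * (m + 1) * d  # expected number of chiasma-forming events in the interval
    # Probability of no chiasma in the interval: with k events, a fraction
    # (m + 1 - k) / (m + 1) of starting phases avoids producing one.
    no_chiasma = sum((1 - k / (m + 1)) * math.exp(-lam) * lam ** k / math.factorial(k)
                     for k in range(m + 1))
    # Mather's formula under no chromatid interference.
    return 0.5 * (1 - no_chiasma)
```

For example, at d = 0.1 Morgans, m = 4 gives r of about 0.0999 versus about 0.0906 for Haldane (m = 0): interference suppresses the nearby double crossovers that would otherwise mask recombination.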
38
A note on the combination of estimates of a recombination fraction. Ann Hum Genet 1996; 60:251-7. [PMID: 8800441 DOI: 10.1111/j.1469-1809.1996.tb00428.x]
Abstract
A number of ways of combining two or more independent estimates of the same recombination fraction can be found in the literature. We revisit this topic in the context of human gene mapping, and explore the value of transforming the recombination fraction to a new parameter whose log-likelihood function is more nearly quadratic. It is shown that the arcsine of the cube-root is one such function. These observations lead naturally to a way of summarizing and combining the summarized set of log-likelihood functions of a common recombination fraction. This idea is illustrated using pedigree data concerning six loci on chromosome 10 from the CEPH consortium. A comparison is also made with the method of summarizing and combining using 'equivalent numbers' of recombinants and informative meioses.
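A sketch of the combining idea, assuming the transformed parameter is phi = arcsin(theta^(1/3)) (a literal reading of "the arcsine of the cube-root") and that each study is summarized by k recombinants among n informative meioses. If each log-likelihood is approximated by a quadratic in phi, centred at the study's estimate with curvature equal to its Fisher information, the sum of these quadratics is maximized at the information-weighted mean of the per-study estimates:

```python
import math


def to_phi(theta):
    """Assumed transform: arcsine of the cube root of theta."""
    return math.asin(theta ** (1 / 3))


def from_phi(phi):
    """Inverse of the assumed transform."""
    return math.sin(phi) ** 3


def combine(studies):
    """Pool (recombinants, informative meioses) pairs on the phi scale.

    Each study's log-likelihood is approximated by a quadratic in phi whose
    curvature is its Fisher information; the summed quadratics peak at the
    information-weighted mean of the per-study phi estimates.
    """
    weighted_sum = total_info = 0.0
    for k, n in studies:
        theta = k / n
        phi = to_phi(theta)
        # Binomial information for theta, rescaled by (dtheta/dphi)^2,
        # where theta = sin(phi)^3.
        dtheta_dphi = 3 * math.sin(phi) ** 2 * math.cos(phi)
        info = n / (theta * (1 - theta)) * dtheta_dphi ** 2
        weighted_sum += info * phi
        total_info += info
    return from_phi(weighted_sum / total_info)
```

The sketch requires 0 < k < n in every study; a study with no observed recombinants would need separate handling.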
39
Abstract
Various genetic map functions have been proposed to infer the unobservable genetic distance between two loci from the observable recombination fraction between them. Some map functions were found to fit data better than others. When there are more than three markers, multilocus recombination probabilities cannot be uniquely determined by the defining property of map functions, and different methods have been proposed to permit the use of map functions to analyze multilocus data. If for a given map function, there is a probability model for recombination that can give rise to it, then joint recombination probabilities can be deduced from this model. This provides another way to use map functions in multilocus analysis. In this paper we show that stationary renewal processes give rise to most of the map functions in the literature. Furthermore, we show that the interevent distributions of these renewal processes can all be approximated quite well by gamma distributions.
40
Abstract
The relative abundance and rarity of DNA words have been recognized in previous biological studies to have implications for the regulation, repair, and evolutionary mechanisms of a genome. In this paper, we review several different measures of abundance and rarity of DNA words, including z-scores, representation ratios, and cross-ratios, that have appeared in the recent literature, and examine the concordance among them using the human cytomegalovirus genome sequence. We then rank all words of length k = 2, ..., 5 of seven herpesvirus genomes according to their abundance, as measured by one of the z-scores based upon a stationary Markov model of order k-2. Using a simple metric on the ranks of 2-words of the seven herpesvirus sequences, we construct an evolutionary tree. Several 3-words are observed to be consistently over- or underrepresented in all seven herpesviruses. Furthermore, clusters of some of the most over- and underrepresented 4- and 5-words in the genomes are identified with functional sites such as the origins of replication and regulatory signals of individual viruses.
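Of the measures listed, the representation ratio is the simplest to compute: the observed count of a k-word divided by its expected count under a Markov model of order k-2, whose maximum-likelihood expectation is N(w1...w_{k-1}) N(w2...wk) / N(w2...w_{k-1}). A sketch for k >= 3 on a single strand, reporting only words that occur (the z-scores used for ranking additionally need a variance estimate, not shown):

```python
from collections import Counter


def word_counts(seq, k):
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def representation_ratios(seq, k):
    """Observed/expected ratio for each k-word occurring in seq, with the
    expectation taken under a Markov model of order k-2."""
    if k < 3:
        raise ValueError("this sketch handles k >= 3 only")
    observed = word_counts(seq, k)
    prefixes = word_counts(seq, k - 1)  # counts of (k-1)-words
    cores = word_counts(seq, k - 2)     # counts of (k-2)-words
    ratios = {}
    for word, count in observed.items():
        expected = prefixes[word[:-1]] * prefixes[word[1:]] / cores[word[1:-1]]
        ratios[word] = count / expected
    return ratios
```

Ratios well above 1 mark over-represented words and ratios well below 1 under-represented ones, relative to what the shorter-word composition of the genome predicts.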
41
Alveolar lining layer is thin and continuous: low-temperature scanning electron microscopy of rat lung. J Appl Physiol (1985) 1995; 79:1615-28. [PMID: 8594022 DOI: 10.1152/jappl.1995.79.5.1615]
Abstract
The low-temperature electron microscope, which preserves aqueous structures as solid water at liquid nitrogen temperature, was used to image the alveolar lining layer, including surfactant and its aqueous subphase, of air-filled lungs frozen in anesthetized rats at 15-cmH2O transpulmonary pressure. Lining layer thickness was measured on cross fractures of walls of the outermost subpleural alveoli, which could be solidified with metal mirror cryofixation at rates sufficient to limit ice crystal growth to 10 nm and prevent appreciable water movement. The thickness of the liquid layer averaged 0.14 micron over relatively flat portions of the alveolar walls, 0.89 micron at the alveolar wall junctions, and 0.09 micron over the protruding features (9 rats, 20 walls, 16 junctions, and 146 areas), for an area-weighted average thickness of 0.2 micron. The alveolar lining layer appears continuous, submerging epithelial cell microvilli and intercellular junctional ridges; it varies from a few nanometers to several micrometers in thickness and serves to smooth the alveolar air-liquid interface in lungs inflated to zone 1 or 2 conditions.
42
Tests of random mating for a highly polymorphic locus: application to HLA data. Biometrics 1995; 51:1064-76. [PMID: 7548691]
Abstract
Testing for random mating at an HLA locus is a difficult problem because the HLA loci are highly polymorphic. We discuss some methodological issues and propose several tests, which are evaluated in a simulation study. The single allele test and the shared allele test deal with small sample sizes by aggregating the data in different ways. The shared allele test is found to be more powerful than the single allele test at detecting non-random mating patterns involving a deficiency or an excess of similar genotypes. We show that random mating of couples at the genotype level implies random mating of couples at the allele level. Several multi-allele approaches are proposed for large population-based data sets; among them, the corrected allele-table test performs better than the generalized Wald test in terms of power and size. These methods are then applied to an HLA data set of Caucasian couples, and no solid evidence for non-random mating at the HLA A, B, and DR loci is found.
43
Reproductive failure and the major histocompatibility complex. Am J Hum Genet 1995; 56:1456-67. [PMID: 7762569 PMCID: PMC1801079]
Abstract
The association between HLA sharing and recurrent spontaneous abortion (RSA) was tested in 123 couples, and the association between HLA sharing and the outcome of in vitro fertilization (IVF) treatment for unexplained infertility was tested in 76 couples, using a new shared-allele test to identify more precisely the region of the major histocompatibility complex (MHC) influencing these reproductive defects. The shared-allele test circumvents the problem of rare alleles at HLA loci and at the same time provides a substantial gain in power over the simple chi-square test. Two statistical methods, a corrected homogeneity test and a bootstrap approach, were developed to compare the allele frequencies at each of the HLA-A, HLA-B, HLA-DR, and HLA-DQ loci; the frequencies were not statistically different among the three patient groups and the control group. There was a significant excess of HLA-DR sharing in couples with RSA and a significant excess of HLA-DQ sharing in couples with unexplained infertility who failed IVF treatment. These findings indicate that genes located in different parts of the class II region of the MHC affect different aspects of reproduction and strongly suggest that the sharing of HLA antigens per se is not the mechanism involved in the reproductive defects. The segment of the MHC that has genes affecting reproduction also has genes associated with different autoimmune diseases, and this juxtaposition may explain the association between reproductive defects and autoimmune diseases.
44
Abstract
In multilocus linkage analysis, it is common to assume that chiasma interference is absent. While this assumption provides mathematical tractability, there is substantial biological evidence contradicting it, particularly when the loci are closely spaced. The chi-square class of recombination models, recently described by Foss et al. (Genetics 133: 681-691, 1993), has a plausible biological basis and provides a dramatically improved fit over virtually all other models currently in use. Here, a simulation study assesses the efficiency of an analysis under the no-interference model relative to an analysis under a chi-square model that allows for interference. The results show that analysis under the no-interference model is inefficient in the presence of interference.
45
Abstract
In analyzing genetic linkage data, it is common to assume that the locations of crossovers along a chromosome follow a Poisson process, although it has long been known that this assumption does not fit the data. In many organisms, the presence of a crossover appears to inhibit the formation of another crossover nearby, a phenomenon known as "interference." We discuss several point process models for recombination that incorporate position interference but assume no chromatid interference. Using stochastic simulation, we fit the models to a multilocus Drosophila dataset by maximum likelihood. We find that some biologically inspired point process models with one or two additional parameters fit the data dramatically better than the usual "no-interference" Poisson model.
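As an illustration of a point process with position interference but no chromatid interference, the sketch below simulates crossovers under one such model, the chi-square (counting) model discussed in the neighbouring abstracts. It assumes that intermediates follow a Poisson process, that every (m+1)-th intermediate becomes a chiasma, and that each chiasma involves a given chromatid with probability 1/2. The function names and the Monte Carlo estimate of the recombination fraction are illustrative assumptions, not the authors' fitting procedure.

```python
import random


def simulate_crossovers(length, m, rng):
    """Crossover positions (in Morgans) on one meiotic product over the interval [0, length].

    Assumed chi-square model: crossover intermediates form a Poisson process with
    rate 2*(m+1) per Morgan on the four-strand bundle; starting from a uniformly
    random phase, every (m+1)-th intermediate is resolved as a chiasma. With no
    chromatid interference, each chiasma involves the sampled chromatid
    independently with probability 1/2.
    """
    rate = 2.0 * (m + 1)
    phase = rng.randrange(m + 1)  # a uniform phase makes the chiasma process stationary
    position, index, crossovers = 0.0, 0, []
    while True:
        position += rng.expovariate(rate)  # exponential gaps generate a Poisson process
        if position > length:
            return crossovers
        if index % (m + 1) == phase and rng.random() < 0.5:
            crossovers.append(position)
        index += 1


def simulated_recombination_fraction(d, m, n_meioses=20000, seed=1):
    """Monte Carlo recombination fraction between two loci d Morgans apart.

    A meiotic product is recombinant when an odd number of crossovers falls
    between the two loci.
    """
    rng = random.Random(seed)
    recombinant = sum(len(simulate_crossovers(d, m, rng)) % 2 for _ in range(n_meioses))
    return recombinant / n_meioses
```

With m = 0, every intermediate becomes a chiasma and the simulation reproduces the no-interference (Haldane) case. Larger m spaces chiasmata more regularly, which suppresses nearby double crossovers.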
46
Abstract
The nonrandom occurrence of crossovers along a single strand during meiosis can be caused by chromatid interference, crossover interference, or both. Although crossover interference has been consistently observed in almost all organisms since the first linkage studies, chromatid interference has been discussed less thoroughly in the literature, and the evidence for it is inconsistent. In this paper, placing virtually no restrictions on the nature of crossover interference, we describe the constraints on single spore data that follow from the assumption of no chromatid interference. These constraints are necessary consequences of no chromatid interference, but satisfying them is not sufficient to guarantee its absence: models can be constructed in which chromatid interference clearly exists but is not detectable with single spore data. We then extend the analysis to tetrad data, which permit more powerful tests of no chromatid interference. We note that the traditional tetrad-based test of no chromatid interference does not make full use of the information in the data, and we offer a statistical procedure for testing the no chromatid interference constraints that does. The procedure is then applied to data from several organisms. Although no strong evidence of chromatid interference is found, we do observe an excess of two-strand double recombinations, i.e., negative chromatid interference.
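For tetrad data, the traditional test of no chromatid interference compares the strand classes of double crossovers (one exchange in each of two intervals). Under no chromatid interference, two-strand, three-strand, and four-strand doubles are expected in a 1:2:1 ratio. The sketch below implements only that classical goodness-of-fit comparison, not the fuller procedure proposed in the paper. The function name is hypothetical.

```python
import math


def chromatid_ratio_test(two_strand, three_strand, four_strand):
    """Chi-square goodness-of-fit test of double-crossover strand classes against 1:2:1.

    Returns (statistic, p_value). The statistic has 2 degrees of freedom, and the
    chi-square survival function with 2 degrees of freedom is exactly exp(-x / 2).
    """
    observed = (two_strand, three_strand, four_strand)
    n = sum(observed)
    if n == 0:
        raise ValueError("no double crossovers observed")
    expected = (0.25 * n, 0.5 * n, 0.25 * n)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, math.exp(-statistic / 2.0)
```

For example, hypothetical counts of 40, 40, and 20 give a statistic of 12.0 on 2 degrees of freedom. An excess in the two-strand class, as reported above, points toward negative chromatid interference.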
47
Abstract
The chi-square model (also known as the gamma model with integer shape parameter) for the occurrence of crossovers along a chromosome was first proposed in the 1940s as a mathematically tractable description of interference that lacked a biological basis. Recently, the chi-square model has been reintroduced into the literature from a biological perspective: it arises from certain hypothesized constraints on the resolution of randomly distributed crossover intermediates. In this paper, assuming no chromatid interference, we derive the probability of any single spore or tetrad joint recombination pattern under the chi-square model. The method of maximum likelihood is then used to estimate the chi-square parameter m and the genetic distances among marker loci. We discuss how to interpret goodness-of-fit statistics appropriately when some recombination classes contain only a small number of observations. Finally, we compare the chi-square model with some other tractable models in the literature.
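In the simplest case of two loci, the single-spore recombination probability under the chi-square model has a closed form. The sketch below makes three assumptions: intermediates occur at rate 2(m+1) per Morgan, every (m+1)-th intermediate becomes a chiasma, and Mather's relation r = (1 - P0)/2 applies, where P0 is the probability of no chiasma in the interval. For two-point data, the maximum likelihood estimate of r is the observed recombinant proportion, so d can be estimated by inverting r(d). The function names are illustrative, and this is not the paper's multilocus likelihood.

```python
import math


def chi_square_recombination(d, m):
    """Recombination fraction for loci d Morgans apart under the chi-square model.

    Intermediates in the interval ~ Poisson(2*(m+1)*d); with a uniformly random phase,
    k intermediates contain no chiasma with probability 1 - k/(m+1) for k <= m.
    Mather's formula (no chromatid interference): r = (1 - P0) / 2.
    """
    lam = 2.0 * (m + 1) * d
    pmf = math.exp(-lam)  # Poisson probability of k = 0 intermediates
    p_no_chiasma = 0.0
    for k in range(m + 1):
        p_no_chiasma += pmf * (1.0 - k / (m + 1))
        pmf *= lam / (k + 1)  # advance the Poisson pmf from k to k + 1
    return 0.5 * (1.0 - p_no_chiasma)


def estimate_distance(recombinants, total, m, iterations=200):
    """Two-point MLE of map distance: solve r(d) = recombinants / total by bisection."""
    target = recombinants / total
    if target >= 0.5:
        return math.inf  # unlinked: r(d) < 0.5 for every finite d
    low, high = 0.0, 1.0
    while chi_square_recombination(high, m) < target:
        high *= 2.0  # expand the bracket until it contains the root
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if chi_square_recombination(mid, m) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)
```

With m = 0, the formula reduces to Haldane's r = (1 - exp(-2d))/2. Larger m keeps r closer to d over short intervals, reflecting stronger interference.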
48
Testing for segregation distortion in the HLA complex. Biometrics 1994; 50:1189-98. [PMID: 7787001] [Indexed: 01/27/2023]
Abstract
One of the long-standing issues in HLA research is whether there is segregation distortion in the HLA complex in human populations. In this paper we study some simple statistical models aimed at detecting segregation distortion. We present a statistic to test the Mendelian null hypothesis of equal transmission probabilities. To assess the possible contribution of multiple alleles to segregation distortion, we employ a specific log-linear model for transmission probabilities equivalent to the Bradley-Terry model in the literature of paired comparisons. We also provide a simple method for detecting a single allele effect, if present.
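The log-linear transmission model can be sketched as a Bradley-Terry fit. The sketch assumes, hypothetically, that the data are tallies counts[(i, j)] of how often a heterozygous i/j parent transmitted allele i. The model then sets P(i transmitted) = w_i / (w_i + w_j), so equal weights correspond to Mendelian transmission. The weights are fitted with the standard minorization-maximization (MM) update for Bradley-Terry models, and a likelihood-ratio statistic against equal weights is reported. This is not necessarily the statistic or estimation method used in the paper.

```python
import math


def fit_bradley_terry(counts, max_iter=1000, tol=1e-12):
    """Fit transmission weights w with P(parent i/j transmits i) = w[i] / (w[i] + w[j]).

    counts[(i, j)] = number of times a heterozygous i/j parent (i != j) transmitted i.
    Uses the MM update w[i] <- wins[i] / sum_j n_ij / (w[i] + w[j]), then rescales
    to geometric mean 1 (the weights are identifiable only up to a constant).
    Assumes every allele is transmitted at least once and the pairs are connected.
    """
    alleles = sorted({a for pair in counts for a in pair})
    wins = {a: 0.0 for a in alleles}
    pair_totals = {}  # unordered pair -> total transmissions from i/j parents
    for (i, j), c in counts.items():
        wins[i] += c
        key = tuple(sorted((i, j)))
        pair_totals[key] = pair_totals.get(key, 0.0) + c
    if any(v == 0 for v in wins.values()):
        raise ValueError("every allele must be transmitted at least once")
    w = {a: 1.0 for a in alleles}
    for _ in range(max_iter):
        denom = {a: 0.0 for a in alleles}
        for (i, j), n in pair_totals.items():
            denom[i] += n / (w[i] + w[j])
            denom[j] += n / (w[i] + w[j])
        new_w = {a: wins[a] / denom[a] for a in alleles}
        scale = math.exp(sum(math.log(v) for v in new_w.values()) / len(new_w))
        new_w = {a: v / scale for a, v in new_w.items()}
        done = max(abs(new_w[a] - w[a]) for a in alleles) < tol
        w = new_w
        if done:
            break
    return w


def log_likelihood(counts, w):
    """Bradley-Terry log-likelihood of the transmission tallies."""
    return sum(c * math.log(w[i] / (w[i] + w[j])) for (i, j), c in counts.items() if c > 0)


def mendelian_lrt(counts):
    """Likelihood-ratio statistic for equal transmission (all weights equal).

    Under the Mendelian null every transmission has probability 1/2; the statistic is
    compared with a chi-square distribution on (number of alleles - 1) degrees of freedom.
    """
    w = fit_bradley_terry(counts)
    ll_null = sum(counts.values()) * math.log(0.5)
    return 2.0 * (log_likelihood(counts, w) - ll_null), len(w) - 1
```

Because the weights are defined only up to a common factor, the sketch fixes their geometric mean at 1. A single allele with a weight far from the others would correspond to the "single allele effect" mentioned above.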
49
Abstract
Several recent mapping efforts have used so-called "directed" approaches to construct their maps. However, most, but not all, published methods for modeling the progress of physical mapping projects have focused on random approaches, such as bottom-up fingerprinting and STS-content mapping. Moreover, the few efforts that did model directed approaches required assuming that all insert lengths were equal. This assumption is unnecessary: using properties of stationary processes, one can derive simple asymptotic formulas that apply equally to constant and variable clone lengths. For constant clone lengths, these results are equivalent to, and extend, previously published results for directed mapping derived by other methods. Simulations show that these methods provide estimates well within the limits of uncertainty inherent in any mapping project.
50
Abstract
Large genomic DNA sequences contain regions with distinctive patterns of sequence organization. We describe a method that uses logarithms of probabilities based on seventh-order Markov chains to rapidly identify genomic sequences that do not resemble models of genome organization built from compilations of octanucleotide usage. Databases have been constructed from Escherichia coli and Saccharomyces cerevisiae DNA sequences longer than 1000 nt and from human sequences longer than 10,000 nt. Atypical genes and clusters of genes have been located in bacteriophage, yeast, and primate DNA sequences. We consider criteria for the statistical significance of the results, offer possible explanations for the observed variation in genome organization, and describe additional applications of these methods in DNA sequence analysis.
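The scoring idea can be sketched as follows. Octanucleotide counts from a reference collection define a seventh-order Markov chain, P(base | preceding 7 bases) = count(8-mer) / count(7-mer context), and a sequence is scored by its average log-probability per position. The pseudocount, the per-position averaging, the window sizes, and the function names are assumptions for illustration. In this sketch, low scores flag stretches that do not resemble the reference compositional model.

```python
import math
from collections import Counter

BASES = "ACGT"
ORDER = 7  # seventh-order chain: each base is conditioned on the preceding 7 bases


def train_octamer_model(sequences):
    """Count octanucleotides (8-mers) over ACGT-only windows of the reference sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - ORDER):
            word = seq[i:i + ORDER + 1]
            if all(base in BASES for base in word):
                counts[word] += 1
    return counts


def mean_log_probability(seq, counts, pseudocount=1.0):
    """Average natural-log probability per scored position under the 7th-order chain.

    P(x | context) = (count(context + x) + pseudocount) / (count(context) + 4 * pseudocount),
    where count(context) sums the octamer counts over the four possible next bases.
    """
    seq = seq.upper()
    total, scored = 0.0, 0
    for i in range(len(seq) - ORDER):
        word = seq[i:i + ORDER + 1]
        if not all(base in BASES for base in word):
            continue  # skip windows containing ambiguous bases
        context = word[:ORDER]
        context_count = sum(counts[context + b] for b in BASES)
        p = (counts[word] + pseudocount) / (context_count + 4 * pseudocount)
        total += math.log(p)
        scored += 1
    if scored == 0:
        raise ValueError("sequence too short or has no scorable octamers")
    return total / scored


def window_scores(seq, counts, window=1000, step=500):
    """Slide a window along seq; returns (start, score) pairs for locating atypical regions."""
    return [(start, mean_log_probability(seq[start:start + window], counts))
            for start in range(0, max(len(seq) - window, 0) + 1, step)]
```

Averaging per position makes windows of different lengths comparable. A real application would still need a significance criterion, such as the ones the abstract says are considered, before calling a region atypical.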