1
|
Whole-genome sequencing suggests a chemokine gene cluster that modifies age at onset in familial Alzheimer's disease. Mol Psychiatry 2015; 20:1294-300. [PMID: 26324103 PMCID: PMC4759097 DOI: 10.1038/mp.2015.131] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 03/31/2015] [Revised: 07/10/2015] [Accepted: 07/23/2015] [Indexed: 12/22/2022]
Abstract
We have sequenced the complete genomes of 72 individuals affected with early-onset familial Alzheimer's disease caused by an autosomal dominant, highly penetrant mutation in the presenilin-1 (PSEN1) gene, and performed genome-wide association testing to identify variants that modify age at onset (AAO) of Alzheimer's disease. Our analysis identified a haplotype of single-nucleotide polymorphisms (SNPs) on chromosome 17 within a chemokine gene cluster associated with delayed onset of mild-cognitive impairment and dementia. Individuals carrying this haplotype had a mean AAO of mild-cognitive impairment at 51.0 ± 5.2 years compared with 41.1 ± 7.4 years for those without these SNPs. This haplotype thus appears to modify Alzheimer's AAO, conferring a large (~10 years) protective effect. The associated locus harbors several chemokines including eotaxin-1 encoded by CCL11, and the haplotype includes a missense polymorphism in this gene. Validating this association, we found plasma eotaxin-1 levels were correlated with disease AAO in an independent cohort from the University of California San Francisco Memory and Aging Center. In this second cohort, the associated haplotype disrupted the typical age-associated increase of eotaxin-1 levels, suggesting a complex regulatory role for this haplotype in the general population. Altogether, these results suggest eotaxin-1 as a novel modifier of Alzheimer's disease AAO and open potential avenues for therapy.
|
2
|
Statistical analysis of MPSS measurements: application to the study of LPS-activated macrophage gene expression. Proc Natl Acad Sci U S A 2005; 102:1402-7. [PMID: 15668391 PMCID: PMC547838 DOI: 10.1073/pnas.0406555102] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
Massively Parallel Signature Sequencing (MPSS), a recently developed high-throughput transcription profiling technology, has the ability to profile almost every transcript in a sample without requiring prior knowledge of the sequence of the transcribed genes. As is the case with DNA microarrays, effective data analysis depends crucially on understanding how noise affects measurements. We analyze the sources of noise in MPSS and present a quantitative model describing the variability between replicate MPSS assays. We use this model to construct statistical hypotheses that test whether an observed change in gene expression in a pair-wise comparison is significant. This analysis is then extended to the determination of the significance of changes in expression levels measured over the course of a time series of measurements. We apply these analytic techniques to the study of a time series of MPSS gene expression measurements on LPS-stimulated macrophages. To evaluate our statistical significance metrics, we compare our results with published data on macrophage activation measured by using Affymetrix GeneChips.
|
3
|
Abstract
The availability of the complete genomic sequences of the human and mouse T cell receptor loci opens up new opportunities for understanding T cell receptors (TCRs) and their genes. The full complement of TCR gene segments is finally known and should prove a valuable resource for supporting functional studies. A rational nomenclature system has been implemented and is widely available through IMGT and other public databases. Systematic comparisons of the genomic sequences within each locus, between loci, and across species enable precise analyses of the various diversification mechanisms and some regulatory signals. The genomic landscape of the TCR loci provides fundamental insights into TCR evolution as highly localized and tightly regulated gene families.
|
4
|
Abstract
In pairwise end sequencing, sequences are determined from both ends of random subclones derived from a DNA target. Sufficiently similar overlapping end sequences are identified and grouped into contigs. When a clone's paired end sequences fall in different contigs, the contigs are connected together to form scaffolds. Increasingly, the goals of pairwise strategies are large and highly repetitive genomic targets. Here, we consider large-scale pairwise strategies that employ mixtures of subclone sizes. We explore the properties of scaffold formation within a hybrid theory/simulation mathematical model of a genomic target that contains many repeat families. Using this model, we evaluate problems that may arise, such as falsely linked end sequences (due either to random matches or to homologous repeats) and scaffolds that terminate without extending the full length of the target. We illustrate our model with an exploration of a strategy for sequencing the human genome. Our results show that, for a strategy that generates 10-fold sequence coverage derived from the ends of clones ranging in length from 2 to 150 kb, using an appropriate rule for detecting overlaps, we expect few false links while obtaining a single scaffold extending the length of each chromosome.
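The scaffold-formation step described in this abstract can be illustrated with a minimal sketch. This is not the authors' hybrid theory/simulation model; it only shows how contigs might be merged into scaffolds when a clone's two end sequences fall in different contigs. The function name `build_scaffolds` and the input format (pairs of contig indices) are hypothetical.

```python
def build_scaffolds(num_contigs, clone_links):
    """Group contigs into scaffolds using clone end-pair links.

    clone_links: iterable of (contig_a, contig_b) pairs, one per clone whose
    two end sequences were placed in contigs contig_a and contig_b.
    Returns a sorted list of contig-index lists, one list per scaffold.
    (Hypothetical helper for illustration only.)
    """
    parent = list(range(num_contigs))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in clone_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The clone bridges two contigs: join them into one scaffold.
            parent[root_b] = root_a

    groups = {}
    for contig in range(num_contigs):
        groups.setdefault(find(contig), []).append(contig)
    return sorted(groups.values())


if __name__ == "__main__":
    # Clones bridge contigs 0-1 and 1-2; contigs 3 and 4 stay unlinked.
    print(build_scaffolds(5, [(0, 1), (1, 2), (3, 3)]))
```

A false link, as discussed in the abstract, would show up here as a spurious pair in `clone_links` that merges two scaffolds that do not belong together.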
|
5
|
Abstract
The parking strategy is an iterative approach to DNA sequencing. Each iteration consists of sequencing a novel portion of target DNA that does not overlap any previously sequenced region. Subject to the constraint of no overlap, each new region is chosen randomly. A parking strategy is often ideal in the early stages of a project for rapidly generating unique data. As a project progresses, parking becomes progressively more expensive and eventually prohibitive. We present a mathematical model with a generalization to allow for overlaps. This model predicts multiple parameters, including progress, costs, and the distribution of gap sizes left by a parking strategy. The highly fragmented nature of the gaps left after an initial parking strategy may make it difficult to finish a project efficiently. Therefore, in addition to our parking model, we model gap closing by walking. Our gap-closing model is generalizable to many other strategies. Our discussion includes modified parking strategies and hybrids with other strategies. A hybrid parking strategy has been employed for portions of the Human Genome Project.
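A small simulation may make the no-overlap rule concrete. The sketch below is an assumption-laden illustration, not the model from the paper: it treats the target as an integer interval, draws candidate start positions uniformly, and accepts a read only if it overlaps no previously accepted read. The names `simulate_parking` and `gap_sizes` are hypothetical.

```python
import bisect
import random


def simulate_parking(target_len, read_len, attempts, seed=0):
    """Return sorted start positions of accepted (non-overlapping) reads."""
    rng = random.Random(seed)
    starts = []
    for _ in range(attempts):
        s = rng.randint(0, target_len - read_len)
        i = bisect.bisect_left(starts, s)
        # Accept only if the candidate clears both neighbours.
        left_ok = i == 0 or starts[i - 1] + read_len <= s
        right_ok = i == len(starts) or s + read_len <= starts[i]
        if left_ok and right_ok:
            starts.insert(i, s)
    return starts


def gap_sizes(starts, target_len, read_len):
    """Return the lengths of the unsequenced gaps left between reads."""
    gaps = []
    pos = 0
    for s in starts:
        gaps.append(s - pos)
        pos = s + read_len
    gaps.append(target_len - pos)
    return [g for g in gaps if g > 0]


if __name__ == "__main__":
    starts = simulate_parking(target_len=10_000, read_len=500, attempts=2_000)
    gaps = gap_sizes(starts, 10_000, 500)
    print(len(starts), "reads parked;", len(gaps), "gaps remain")
```

As more reads are parked, a growing share of attempts is rejected, which is one way to see why the abstract describes parking as progressively more expensive.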
|
6
|
|
7
|
Analysis of sequence-tagged-connector strategies for DNA sequencing. Genome Res 1999; 9:297-307. [PMID: 10077536 PMCID: PMC310733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]
Abstract
The BAC-end sequencing, or sequence-tagged-connector (STC), approach to genome sequencing involves sequencing the ends of BAC inserts to scatter sequence tags (STCs) randomly across the genome. Once any BAC or other large segment of DNA is sequenced to completion by conventional shotgun approaches, these STC tags can be used to identify a minimum tiling path of BAC clones overlapping the nucleation sequence for sequence extension. Here, we explore the properties of STC-sequencing strategies within a mathematical model of a random target with homologous repeats and imperfect sequencing technology to understand the consequences of varying various parameters on the incidence of problem clones and the cost of the sequencing project. Problem clones are defined as clones for which either (A) there is no identifiable overlapping STC to extend the sequence in a particular direction or (B) the identified STC with minimum overlap comes from a nonoverlapping clone, either owing to random false matches or repeat-family homology. Based on the minimum overlap, we estimate the number of clones to be entirely sequenced and, then, using cost estimates, identify the decision rule (the degree of sequence similarity required before a match is declared between an STC and a clone) to minimize overall sequencing cost. A method to optimize the overlap decision rule is highly desirable, because both the total cost and the number of problem clones are shown to be highly sensitive to this choice. For a target of 3 Gb containing approximately 800 Mb of repeats with 85%-90% identity, we expect <10 problem clones with 15 times coverage by 150-kb clones. We derive the optimal redundancy and insert sizes of clone libraries for sequencing genomes of various sizes, from microbial to human. We estimate that establishing the resource of STCs as a means of identifying minimally overlapping clones represents only 1%-3% of the total cost of sequencing the human genome, and, up to a point of diminishing returns, a larger STC resource is associated with a smaller total sequencing cost.
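To give a sense of scale for the 3 Gb example in this abstract, the following back-of-the-envelope sketch computes the library size implied by 15 times coverage with 150-kb clones. It assumes that "coverage" means clone (physical) coverage and that both ends of every clone yield a usable STC; neither assumption is stated in the abstract, and the function name `stc_library_size` is hypothetical.

```python
def stc_library_size(target_bp, clone_bp, clone_coverage):
    """Return (number of clones, number of STCs, mean STC spacing in bp).

    Assumes two usable end sequences (STCs) per clone.
    """
    clones = clone_coverage * target_bp / clone_bp
    stcs = 2 * clones
    spacing = target_bp / stcs
    return clones, stcs, spacing


if __name__ == "__main__":
    clones, stcs, spacing = stc_library_size(3_000_000_000, 150_000, 15)
    print(f"{clones:.0f} clones, {stcs:.0f} STCs, one STC every {spacing:.0f} bp")
```

Under these assumptions the library holds about 300,000 clones and 600,000 STCs, or roughly one tag every 5 kb of target.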
|
8
|
Abstract
Consider a DNA mapping project in which overlap of clones is inferred from multiple complete restriction enzyme digests. Each enzyme cuts each clone randomly into fragments whose lengths are determined with some error. Clones that share fragments with matching lengths could contain a region of overlap. However, common fragment lengths may be due to random coincidence leading to a false overlap declaration. Although the probability of false fragment matching is small, a mapping project involves a large number of clone comparisons. Consequently, erroneous fragment matches can be a serious problem. We use a geometrical probability approach to develop exact integral formulas and first-order approximations for the expected number and variance of classes of fragment pairs that will be identified falsely as matching. We also find exact formulas for the expected value, and variance of the number of true fragment matches. These formulas are useful in comparing different mapping strategies.
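The false-match problem described in this abstract can be explored numerically. The sketch below is a Monte Carlo illustration under simplifying assumptions that are not taken from the paper: cut sites are uniform on the clone, a single enzyme is used, and two fragments "match" when their lengths differ by no more than an absolute tolerance `tol`. It does not reproduce the exact integral formulas the authors derive, and all function names are hypothetical.

```python
import random


def digest(clone_len, n_cuts, rng):
    """Cut a clone at n_cuts uniform random sites; return fragment lengths."""
    cuts = sorted(rng.uniform(0, clone_len) for _ in range(n_cuts))
    edges = [0.0] + cuts + [float(clone_len)]
    return [b - a for a, b in zip(edges, edges[1:])]


def count_matches(frags_a, frags_b, tol):
    """Count fragment pairs whose lengths agree within tol."""
    return sum(1 for x in frags_a for y in frags_b if abs(x - y) <= tol)


def mean_false_matches(clone_len, n_cuts, tol, trials, seed=0):
    """Average number of coincidental matches between two unrelated clones."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = digest(clone_len, n_cuts, rng)
        b = digest(clone_len, n_cuts, rng)
        total += count_matches(a, b, tol)
    return total / trials


if __name__ == "__main__":
    # Two unrelated 40-kb clones, 10 cuts each, 50-bp sizing tolerance.
    print(mean_false_matches(40_000, 10, 50, trials=2_000))
```

Because the two simulated clones share no sequence, every match counted here is a coincidence. Multiplying a small per-pair rate by the large number of clone comparisons in a mapping project is what makes such matches a serious problem.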
|
9
|
Abstract
Consider a mapping project in which overlap of clonal segments is inferred from complete multiple restriction digests. The fragment sizes of the clones are measured with some error, potentially leading to a map with erroneous links. The number of errors in the map depends on the number and types of enzymes used to characterize the clones. The most critical parameter is the decision rule k, or the criterion for declaring clone overlap. Small changes in k may cause an order of magnitude change in the amount of work it takes to build a map of given completion. We observe that the cost of an optimal mapping strategy is approximately proportional to the target size. While this finding is encouraging, considerable effort is nonetheless required: for large-scale sequencing projects with up-front mapping, mapping will be a non-negligible fraction of the total sequencing cost.
|
10
|
Abstract
We expand the already large number of known trypsinogen nucleotide and amino acid sequences by presenting additional trypsinogen sequences from the tunicate (Boltenia villosa), the lamprey (Petromyzon marinus), the pufferfish (Fugu rubripes), and the frog (Xenopus laevis). The current array of known trypsinogen sequences now spans the entire vertebrate phylogeny. Phylogenetic analysis is made difficult by the presence of multiple isozymes within species and rates of evolution that vary highly between both species and isozymes. We nevertheless present a Fitch-Margoliash phylogeny constructed from pairwise distances. We employ this phylogeny as a vehicle for speculation on the evolution of the trypsinogen gene family as well as the general modes of evolution of multigene families. Unique attributes of the lamprey and tunicate trypsinogens are noted.
|
11
|
Abstract
Five to ten percent of breast cancer in the western world may be attributed to the inheritance of highly penetrant mutations in the breast and ovarian cancer susceptibility gene, BRCA1. The biological function of BRCA1 and factors affecting expressivity, such as gene-environment and gene-gene interactions, may be more effectively studied in appropriate animal models. We report the cloning and sequencing of the canine and murine BRCA1 genes and contrast the sequences with human BRCA1. The amino terminal 120 residues of the gene are > 80% identical among the three species. The C-terminus is also highly conserved, containing an 80 amino acid stretch that is over 80% identical. Motifs of likely functional significance are maintained, including the amino terminal RING finger motif (amino acids 24-64) and the granin consensus sequence (1214-1223). The distribution of missense mutations and neutral polymorphisms identified in BRCA1-linked breast cancer suggests that disease associated missense mutations occur at highly conserved residues whereas polymorphisms are in regions of lower conservation. Among eighteen missense mutations with unknown consequences, seven occur in amino acids that are identical across species. Four of these seven (E1219D, A1708E, P1749R and M1775R) are also within conserved domains. Taken together, these data predict regions of the gene which may be critical for normal function.
|
12
|
Abstract
Random subcloning strategies are commonly employed for analyzing pieces of DNA that are too large for direct analysis. Such strategies are applicable to gene finding, physical mapping, and DNA sequencing. Random subcloning refers to the generation of many small, directly analyzable fragments of DNA that represent random fragments of a larger whole, such as a genome. Following analysis of these fragments, a map or sequence of the original target may be reconstructed. Mathematical modeling is useful in planning such strategies and in providing a reference for their evaluation, both during execution and following completion. The statistical theory necessary for constructing these models has been developed independently over the last century. This paper brings this theory together into a statistical model for random subcloning strategies. This mathematical model retains its utility even at high subclone redundancies, which are necessary for project completion. The discussion here centers on shotgun sequencing, a random subcloning strategy envisioned as the method of choice for sequencing the human genome.
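For orientation, two standard textbook approximations for random shotgun coverage can be written in a few lines. These are the classic Clarke-Carbon and Lander-Waterman expressions, given here as an assumption about the kind of quantity such a model predicts; they are not necessarily the formulas used in this paper. With redundancy c = N·L/G (N reads of length L on a target of length G), the expected covered fraction is 1 − e^(−c) and the expected number of contigs is about N·e^(−c·σ), where σ = 1 − T/L and T is the minimum detectable overlap.

```python
import math


def expected_coverage(redundancy):
    """Expected fraction of the target covered at a given redundancy."""
    return 1.0 - math.exp(-redundancy)


def expected_contigs(num_reads, read_len, target_len, min_overlap=0):
    """Lander-Waterman estimate of the number of contigs (islands)."""
    c = num_reads * read_len / target_len
    sigma = 1.0 - min_overlap / read_len
    return num_reads * math.exp(-c * sigma)


if __name__ == "__main__":
    for c in (1, 5, 10):
        print(f"redundancy {c}: {expected_coverage(c):.4%} covered")
    # Illustrative numbers only: 800 reads of 500 bp on a 40-kb target.
    print(expected_contigs(800, 500, 40_000, min_overlap=50))
```

These simple forms ignore edge effects and repeats. The abstract's point about high redundancies matters because the uncovered fraction keeps shrinking exponentially as c grows.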
|
13
|
Abstract
Strategies for large-scale genomic DNA sequencing currently require physical mapping, followed by detailed mapping, and finally sequencing. The level of mapping detail determines the amount of effort, or sequence redundancy, required to finish a project. Current strategies attempt to find a balance between mapping and sequencing efforts. One such approach is to employ strategies that use sequence data to build physical maps. Such maps alleviate the need for prior mapping and reduce the final required sequence redundancy. To this end, the utility of correlating pairs of sequence data derived from both ends of subcloned templates is well recognized. However, optimal strategies employing such pairwise data have not been established. In the present work, we simulate and analyze the parameters of pairwise sequencing projects including template length, sequence read length, and total sequence redundancy. One pairwise strategy based on sequencing both ends of plasmid subclones is recommended and illustrated with raw data simulations. We find that pairwise strategies are effective with both small (cosmid) and large (megaYAC) targets and produce ordered sequence data with a high level of mapping completeness. They are ideal for finescale mapping and gene finding and as initial steps for either a high- or a low-redundancy sequencing effort. Such strategies are highly automatable.
|
14
|
Abstract
Adolescent guinea pigs (AGPs) demonstrate dry gas hyperpnea-induced bronchoconstriction (HIB) that shares key features with HIB in humans with asthma. The airways of immature animals exhibit enhanced reactivity to diverse types of stimulation. We tested whether dry gas HIB is also increased in newborn guinea pigs (NGPs). We quantified HIB as the fractional increase of respiratory system resistance (Rrs) over baseline (BL) in five 4- to 7-day-old NGPs after 10 min of hyperpnea, as well as changes in Rrs elicited by intravenous methacholine or capsaicin, and compared these responses with those of AGPs. During hyperpnea, analogous stimuli were delivered by mechanically imposing hyperpnea at 3.0, 4.5, and 6.0 times quiet eucapnic minute ventilation (VE). In AGPs, hyperpnea caused significant bronchoconstriction that increased with VE; peak fractional increase of Rrs was 7.6 ± 2.0 times BL. In contrast, hyperpnea caused insignificant bronchoconstriction in NGPs (1.4 ± 0.2 times BL after the largest VE; P < 0.05 vs. AGP). Responses elicited by methacholine (10^-10 to 10^-7 mol/kg) or capsaicin (0.01-10.0 µg/kg) were similar in NGPs and AGPs. In AGPs, hyperpnea suppressed HIB until posthyperpnea. To determine whether the reduced HIB of NGPs was caused by enhanced suppression, NGPs and AGPs were administered acetylcholine (10^-10 to 10^-7 mol/kg i.v.) during BL eucapnic ventilation and during eucapnic hyperpnea with warm humidified gas. Responses to acetylcholine were suppressed in AGPs and NGPs to a similar degree. We conclude that HIB is markedly diminished shortly after birth in guinea pigs and that it increases substantially during maturation. (ABSTRACT TRUNCATED AT 250 WORDS)
|