Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Journal Articles

Rank	Article	Article Type	Years Since Publication	Citation(s) in RCA
1	Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009;25:2078-9. [PMID: 19505943 PMCID: PMC2723002 DOI: 10.1093/bioinformatics/btp352] [Citation(s) in RCA: 41020] [Impact Index Per Article: 2563.8] [Received: 04/28/2009] [Revised: 05/28/2009] [Accepted: 05/30/2009] [Indexed: 11/24/2022] Abstract: SUMMARY: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. AVAILABILITY: http://samtools.sourceforge.net. MeSH Headings: Algorithms; Base Sequence; Computational Biology/methods; Genome; Genomics; Molecular Sequence Data; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software. Grants: R01 HG004719 NHGRI NIH HHS; U54 HG002750 NHGRI NIH HHS; 077192/Z/05/Z Wellcome Trust; U54HG002750 NHGRI NIH HHS.	Research Support, N.I.H., Extramural	16	41020
2	Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012;9:357-9. [PMID: 22388286 PMCID: PMC3322381 DOI: 10.1038/nmeth.1923] [Citation(s) in RCA: 36284] [Impact Index Per Article: 2791.1] [Received: 09/23/2011] [Accepted: 02/06/2012] [Indexed: 02/02/2023] Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy. MeSH Headings: Algorithms; Computational Biology/methods; Databases, Genetic; Genome, Human/genetics; Humans; Sequence Alignment/methods; Sequence Analysis, DNA/methods. Grants: R01 HG006677 NHGRI NIH HHS; R01-HG006102 NHGRI NIH HHS; R01 HG006102-02 NHGRI NIH HHS; R01 HG006677-12 NHGRI NIH HHS; R01 HG006102 NHGRI NIH HHS; R01 HG006677-13 NHGRI NIH HHS; R01-GM083873 NIGMS NIH HHS; R01 GM083873 NIGMS NIH HHS; R01 HG006102-01 NHGRI NIH HHS.	Research Support, N.I.H., Extramural	13	36284
3	Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods 2016;13:581-3. [PMID: 27214047 PMCID: PMC4927377 DOI: 10.1038/nmeth.3869] [Citation(s) in RCA: 17192] [Impact Index Per Article: 1910.2] [Received: 08/21/2015] [Accepted: 04/13/2016] [Indexed: 02/06/2023] Abstract: We present the open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors (https://github.com/benjjneb/dada2). DADA2 infers sample sequences exactly and resolves differences of as little as 1 nucleotide. In several mock communities, DADA2 identified more real variants and output fewer spurious sequences than other methods. We applied DADA2 to vaginal samples from a cohort of pregnant women, revealing a diversity of previously undetected Lactobacillus crispatus variants. MeSH Headings: Animals; Cohort Studies; Computational Biology/methods; DNA, Bacterial/genetics; False Positive Reactions; Feces/microbiology; Female; High-Throughput Nucleotide Sequencing/methods; Humans; Lactobacillus/classification; Lactobacillus/genetics; Lactobacillus/isolation & purification; Mice; Microbiota/genetics; Pregnancy; RNA, Ribosomal, 16S/genetics; Reproducibility of Results; Sequence Analysis, DNA/methods; Software; Vagina/microbiology. Grants: R01 AI112401 NIAID NIH HHS.	Research Support, N.I.H., Extramural	9	17192
4	Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann Y, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, Szustakowki J. Initial sequencing and analysis of the human genome. Nature 2001;409:860-921. [PMID: 11237011 DOI: 10.1038/35057062] [Citation(s) in RCA: 15031] [Impact Index Per Article: 626.3] [Indexed: 12/11/2022] Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. MeSH Headings: Animals; Chromosome Mapping; Conserved Sequence; CpG Islands; DNA Transposable Elements; Databases, Factual; Drug Industry; Evolution, Molecular; Forecasting; GC Rich Sequence; Gene Duplication; Genes; Genetic Diseases, Inborn; Genetics, Medical; Genome, Human; Human Genome Project; Humans; Mutation; Private Sector; Proteins/genetics; Proteome; Public Sector; RNA/genetics; Repetitive Sequences, Nucleic Acid; Sequence Analysis, DNA/methods; Species Specificity.		24	15031
5	Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 2004;21:263-5. [PMID: 15297300 DOI: 10.1093/bioinformatics/bth457] [Citation(s) in RCA: 11829] [Impact Index Per Article: 563.3] [Indexed: 01/12/2023] Abstract: Research over the last few years has revealed significant haplotype structure in the human genome. The characterization of these patterns, particularly in the context of medical genetic association studies, is becoming a routine research activity. Haploview is a software package that provides computation of linkage disequilibrium statistics and population haplotype patterns from primary genotype data in a visually appealing and interactive interface. AVAILABILITY: http://www.broad.mit.edu/mpg/haploview/ CONTACT: jcbarret@broad.mit.edu MeSH Headings: Algorithms; Chromosome Mapping/methods; Haplotypes/genetics; Internet; Linkage Disequilibrium/genetics; Programming Languages; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; User-Computer Interface.	Journal Article	21	11829
6	Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 2006;22:2688-90. [PMID: 16928733 DOI: 10.1093/bioinformatics/btl446] [Citation(s) in RCA: 10825] [Impact Index Per Article: 569.7] [Indexed: 11/12/2022] Abstract: RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML). Low-level technical optimizations, a modification of the search algorithm, and the use of the GTR+CAT approximation as replacement for GTR+Gamma yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data containing 1000 up to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets ≥4000 taxa it also runs 2-3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date containing 25,057 (1463 bp) and 2182 (51,089 bp) taxa, respectively. AVAILABILITY: icwww.epfl.ch/~stamatak MeSH Headings: Algorithms; Conserved Sequence; Evolution, Molecular; Models, Genetic; Models, Statistical; Phylogeny; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Sequence Homology, Nucleic Acid; Software; Species Specificity.	Research Support, Non-U.S. Gov't	19	10825
7	DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011;43:491-8. [PMID: 21478889 PMCID: PMC3083463 DOI: 10.1038/ng.806] [Citation(s) in RCA: 8239] [Impact Index Per Article: 588.5] [Received: 08/27/2010] [Accepted: 03/17/2011] [Indexed: 02/07/2023] Abstract: Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets. MeSH Headings: Data Interpretation, Statistical; Databases, Nucleic Acid; Exons; Genetic Variation; Genetics, Population/methods; Genetics, Population/statistics & numerical data; Genome, Human; Genotype; Humans; Polymorphism, Single Nucleotide; Sequence Alignment/methods; Sequence Alignment/statistics & numerical data; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data; Software. Grants: P30 DK043351 NIDDK NIH HHS; U01 HG005208 NHGRI NIH HHS; U54 HG003067 NHGRI NIH HHS; 54 HG003067 NHGRI NIH HHS.	Comparative Study	14	8239
8	Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigó R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Deslattes Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, Zhu X. The sequence of the human genome. Science 2001;291:1304-51. [PMID: 11181995 DOI: 10.1126/science.1058040] [Citation(s) in RCA: 7847] [Impact Index Per Article: 327.0] [Indexed: 12/13/2022] Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies, a whole-genome assembly and a regional chromosome assembly, were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge. MeSH Headings: Algorithms; Animals; Chromosome Banding; Chromosome Mapping; Chromosomes, Artificial, Bacterial; Computational Biology; Consensus Sequence; CpG Islands; DNA, Intergenic; Databases, Factual; Evolution, Molecular; Exons; Female; Gene Duplication; Genes; Genetic Variation; Genome, Human; Human Genome Project; Humans; Introns; Male; Phenotype; Physical Chromosome Mapping; Polymorphism, Single Nucleotide; Proteins/genetics; Proteins/physiology; Pseudogenes; Repetitive Sequences, Nucleic Acid; Retroelements; Sequence Analysis, DNA/methods; Species Specificity.		24	7847
9	Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006;22:1658-9. [PMID: 16731699 DOI: 10.1093/bioinformatics/btl158] [Citation(s) in RCA: 7297] [Impact Index Per Article: 384.1] [Indexed: 12/31/2022] Abstract: MOTIVATION: In 2001 and 2002, we published two papers (Bioinformatics, 17, 282-283, Bioinformatics, 18, 77-82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the applications of the underlying algorithm are not limited to only protein sequences clustering, here we present several new programs using the same algorithm including cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database and cd-hit-est-2d compares two nucleotide datasets. All these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on the popular sequence comparison and database search tools, such as BLAST. MeSH Headings: Algorithms; Animals; Cluster Analysis; Computational Biology/methods; DNA/chemistry; Databases, Nucleic Acid; Databases, Protein; Expressed Sequence Tags; Humans; Programming Languages; RNA/chemistry; Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods; Software.	Research Support, Non-U.S. Gov't	19	7297
10	Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008;18:821-9. [PMID: 18349386 PMCID: PMC2336801 DOI: 10.1101/gr.074492.107] [Citation(s) in RCA: 7167] [Impact Index Per Article: 421.6] [Received: 11/16/2007] [Accepted: 03/17/2008] [Indexed: 02/06/2023] Abstract: We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies. MeSH Headings: Algorithms; Animals; Chromosomes, Artificial, Bacterial; Computational Biology/methods; Computer Simulation; Genome, Bacterial; Genome, Human; Genomics; Humans; Mammals/genetics; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/standards; Streptococcus/genetics. Grants: G0300762 Medical Research Council.	other	17	7167
11	Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003;19:185-93. [PMID: 12538238 DOI: 10.1093/bioinformatics/19.2.185] [Citation(s) in RCA: 6170] [Impact Index Per Article: 280.5] [Indexed: 11/13/2022] Abstract: MOTIVATION: When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations. RESULTS: We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably. AVAILABILITY: Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is a part of the Bioconductor project http://www.bioconductor.org. SUPPLEMENTARY INFORMATION: Additional figures may be found at http://www.stat.berkeley.edu/~bolstad/normalize/index.html MeSH Headings: Algorithms; Calibration; Models, Genetic; Molecular Probes; Nonlinear Dynamics; Oligonucleotide Array Sequence Analysis/instrumentation; Oligonucleotide Array Sequence Analysis/methods; Oligonucleotide Array Sequence Analysis/standards; Quality Control; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/standards; Stochastic Processes.	Comparative Study	22	6170
12	Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 2014;9:e112963. [PMID: 25409509 PMCID: PMC4237348 DOI: 10.1371/journal.pone.0112963] [Citation(s) in RCA: 6058] [Impact Index Per Article: 550.7] [Received: 08/25/2014] [Accepted: 10/16/2014] [Indexed: 02/06/2023] Abstract: Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3–5 Kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants including duplications and resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software. MeSH Headings: Algorithms; Bacteria/classification; Bacteria/genetics; Genetic Variation; Genome, Bacterial; Molecular Sequence Data; Sequence Analysis, DNA/methods; Software. Grants: HHSN272200900018C NIAID NIH HHS; U19 AI110818 NIAID NIH HHS; U54 HG003067 NHGRI NIH HHS; U54HG003067 NHGRI NIH HHS.	Research Support, Non-U.S. Gov't	11	6058
13	Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 1999;27:573-80. [PMID: 9862982 PMCID: PMC148217 DOI: 10.1093/nar/27.2.573] [Citation(s) in RCA: 6000] [Impact Index Per Article: 230.8] [Indexed: 11/13/2022] Abstract: A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm's speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human beta T cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program. MeSH Headings: Algorithms; Chromosomes, Fungal/genetics; Cluster Analysis; Friedreich Ataxia/genetics; Genes, Fungal; Humans; Iron-Binding Proteins; Mannose-Binding Lectins; Membrane Proteins; Models, Statistical; Mutation; Pattern Recognition, Automated; Phosphotransferases (Alcohol Group Acceptor)/genetics; Probability; Pseudogenes; Receptors, Antigen, T-Cell, alpha-beta/genetics; Saccharomyces cerevisiae Proteins; Sequence Analysis, DNA/methods; Software; Tandem Repeat Sequences; Frataxin.	research-article	26	6000
14	1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA. A map of human genome variation from population-scale sequencing. Nature 2010;467:1061-73. [PMID: 20981092 PMCID: PMC3042601 DOI: 10.1038/nature09534] [Citation(s) in RCA: 5922] [Impact Index Per Article: 394.8] [Reference Citation Analysis] [Collaborators] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2010] [Accepted: 09/30/2010] [Indexed: 11/08/2022] Abstract The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. 
From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research. Collapse Key Words Collapse MESH Headings Calibration Chromosomes, Human, Y/genetics Computational Biology DNA Mutational Analysis DNA, Mitochondrial/genetics Evolution, Molecular Female Genetic Association Studies Genetic Variation/genetics Genetics, Population/methods Genome, Human/genetics Genome-Wide Association Study Genomics/methods Genotype Haplotypes/genetics Humans Male Mutation/genetics Pilot Projects Polymorphism, Single Nucleotide/genetics Recombination, Genetic/genetics Sample Size Selection, Genetic/genetics Sequence Alignment Sequence Analysis, DNA/methods Collapse Grants U01HG5208 NHGRI NIH HHS Howard Hughes Medical Institute 075491 Wellcome Trust U54 HG002750 NHGRI NIH HHS R01 GM059290 NIGMS NIH HHS 077014 Wellcome Trust WT086084/Z/08/Z Wellcome Trust 077009 Wellcome Trust R01HG4960 NHGRI NIH HHS 077192 Wellcome Trust R01 HG004960 NHGRI NIH HHS R01 MH084698 NIMH NIH HHS U01 HG005208 NHGRI NIH HHS WT081407/Z/06/Z Wellcome Trust 01MH84698 NIMH NIH HHS R01GM72861 NIGMS NIH HHS U54 HG003067 NHGRI NIH HHS U41 HG002371 NHGRI NIH HHS R01 HG004333 NHGRI NIH HHS U24 HG002371 NHGRI NIH HHS P41HG2371 NHGRI NIH HHS R01 HG002651 NHGRI NIH HHS R01 HG004719 NHGRI NIH HHS P41 HG004222 NHGRI NIH HHS R01HG4333 NHGRI NIH HHS R01HG3698 NHGRI NIH HHS R01 GM072861 NIGMS NIH HHS RC2 HG005552 NHGRI NIH HHS U54HG2757 NHGRI NIH HHS WT089088/Z/09/Z Wellcome Trust U54 HG002757 NHGRI NIH HHS 089088 Wellcome Trust R01GM59290 NIGMS NIH HHS U01HG5210 NHGRI NIH HHS U41HG4568 NHGRI NIH HHS U01 HG005211 NHGRI NIH HHS U01 HG005214 NHGRI NIH HHS WT075491/Z/04 
Wellcome Trust R01 HG003229 NHGRI NIH HHS P41 HG002371 NHGRI NIH HHS T32 GM007753 NIGMS NIH HHS U54 HG003273 NHGRI NIH HHS G0801823 Medical Research Council WT085532AIA Wellcome Trust U01HG5211 NHGRI NIH HHS U54HG3079 NHGRI NIH HHS U01HG5214 NHGRI NIH HHS U41 HG004568 NHGRI NIH HHS RG/09/012/28096 British Heart Foundation U54HG2750 NHGRI NIH HHS 089062 Wellcome Trust Intramural NIH HHS U01 HG005210 NHGRI NIH HHS R01 HG003698 NHGRI NIH HHS RC2HG5552 NHGRI NIH HHS P01HG4120 NHGRI NIH HHS U01HG5209 NHGRI NIH HHS 089061 Wellcome Trust 086084 Wellcome Trust 01HG3229 NHGRI NIH HHS U54HG3067 NHGRI NIH HHS P41HG4221 NHGRI NIH HHS R01 HG002510 NHGRI NIH HHS 081407 Wellcome Trust 085532 Wellcome Trust N01 HG062088 NHGRI NIH HHS U01 HG005209 NHGRI NIH HHS F31 HG005201 NHGRI NIH HHS P50HG2357 NHGRI NIH HHS R21 AA022707 NIAAA NIH HHS P41 HG004221 NHGRI NIH HHS P01 HG004120 NHGRI NIH HHS P41HG4222 NHGRI NIH HHS S10RR025056 NCRR NIH HHS R01HG2651 NHGRI NIH HHS R01HG4719 NHGRI NIH HHS P50 HG002357 NHGRI NIH HHS G0801823(89305) Medical Research Council U54HG3273 NHGRI NIH HHS Collapse Collaborators David Altshuler, Richard M Durbin, Gonçalo R Abecasis, David R Bentley, Aravinda Chakravarti, Andrew G Clark, Francis S Collins, Francisco M De La Vega, Peter Donnelly, Michael Egholm, Paul Flicek, Stacey B Gabriel, Richard A Gibbs, Bartha M Knoppers, Eric S Lander, Hans Lehrach, Elaine R Mardis, Gil A McVean, Deborah A Nickerson, Leena Peltonen, Alan J Schafer, Stephen T Sherry, Jun Wang, Richard Wilson, Richard A Gibbs, David Deiros, Mike Metzker, Donna Muzny, Jeff Reid, David Wheeler, Jun Wang, Jingxiang Li, Min Jian, Guoqing Li, Ruiqiang Li, Huiqing Liang, Geng Tian, Bo Wang, Jian Wang, Wei Wang, Huanming Yang, Xiuqing Zhang, Huisong Zheng, Eric S Lander, David L Altshuler, Lauren Ambrogio, Toby Bloom, Kristian Cibulskis, Tim J Fennell, Stacey B Gabriel, David B Jaffe, Erica Shefler, Carrie L Sougnez, David R Bentley, Niall Gormley, Sean Humphray, Zoya Kingsbury, Paula 
Kokko-Gonzales, Jennifer Stone, Kevin J McKernan, Gina L Costa, Jeffry K Ichikawa, Clarence C Lee, Ralf Sudbrak, Hans Lehrach, Tatiana A Borodina, Andreas Dahl, Alexey N Davydov, Peter Marquardt, Florian Mertes, Wilfiried Nietfeld, Philip Rosenstiel, Stefan Schreiber, Aleksey V Soldatov, Bernd Timmermann, Marius Tolzmann, Michael Egholm, Jason Affourtit, Dana Ashworth, Said Attiya, Melissa Bachorski, Eli Buglione, Adam Burke, Amanda Caprio, Christopher Celone, Shauna Clark, David Conners, Brian Desany, Lisa Gu, Lorri Guccione, Kalvin Kao, Jonathan Kebbler, Jennifer Knowlton, Matthew Labrecque, Louise McDade, Craig Mealmaker, Melissa Minderman, Anne Nawrocki, Faheem Niazi, Kristen Pareja, Ravi Ramenani, David Riches, Wanmin Song, Cynthia Turcotte, Shally Wang, Elaine R Mardis, Richard K Wilson, David Dooling, Lucinda Fulton, Robert Fulton, George Weinstock, Richard M Durbin, John Burton, David M Carter, Carol Churcher, Alison Coffey, Anthony Cox, Aarno Palotie, Michael Quail, Tom Skelly, James Stalker, Harold P Swerdlow, Daniel Turner, Anniek De Witte, Shane Giles, Richard A Gibbs, David Wheeler, Matthew Bainbridge, Danny Challis, Aniko Sabo, Fuli Yu, Jin Yu, Jun Wang, Xiaodong Fang, Xiaosen Guo, Ruiqiang Li, Yingrui Li, Ruibang Luo, Shuaishuai Tai, Honglong Wu, Hancheng Zheng, Xiaole Zheng, Yan Zhou, Guoqing Li, Jian Wang, Huanming Yang, Gabor T Marth, Erik P Garrison, Weichun Huang, Amit Indap, Deniz Kural, Wan-Ping Lee, Wen Fung Leong, Aaron R Quinlan, Chip Stewart, Michael P Stromberg, Alistair N Ward, Jiantao Wu, Charles Lee, Ryan E Mills, Xinghua Shi, Mark J Daly, Mark A DePristo, David L Altshuler, Aaron D Ball, Eric Banks, Toby Bloom, Brian L Browning, Kristian Cibulskis, Tim J Fennell, Kiran V Garimella, Sharon R Grossman, Robert E Handsaker, Matt Hanna, Chris Hartl, David B Jaffe, Andrew M Kernytsky, Joshua M Korn, Heng Li, Jared R Maguire, Steven A McCarroll, Aaron McKenna, James C Nemesh, Anthony A Philippakis, Ryan E Poplin, Alkes Price, Manuel A Rivas, 
Pardis C Sabeti, Stephen F Schaffner, Erica Shefler, Ilya A Shlyakhter, David N Cooper, Edward V Ball, Matthew Mort, Andrew D Phillips, Peter D Stenson, Jonathan Sebat, Vladimir Makarov, Kenny Ye, Seungtai C Yoon, Carlos D Bustamante, Andrew G Clark, Adam Boyko, Jeremiah Degenhardt, Simon Gravel, Ryan N Gutenkunst, Mark Kaganovich, Alon Keinan, Phil Lacroute, Xin Ma, Andy Reynolds, Laura Clarke, Paul Flicek, Fiona Cunningham, Javier Herrero, Stephen Keenen, Eugene Kulesha, Rasko Leinonen, William M McLaren, Rajesh Radhakrishnan, Richard E Smith, Vadim Zalunin, Xiangqun Zheng-Bradley, Jan O Korbel, Adrian M Stütz, Sean Humphray, Markus Bauer, R Keira Cheetham, Tony Cox, Michael Eberle, Terena James, Scott Kahn, Lisa Murray, Aravinda Chakravarti, Kai Ye, Francisco M De La Vega, Yutao Fu, Fiona C L Hyland, Jonathan M Manning, Stephen F McLaughlin, Heather E Peckham, Onur Sakarya, Yongming A Sun, Eric F Tsung, Mark A Batzer, Miriam K Konkel, Jerilyn A Walker, Ralf Sudbrak, Marcus W Albrecht, Vyacheslav S Amstislavskiy, Ralf Herwig, Dimitri V Parkhomchuk, Stephen T Sherry, Richa Agarwala, Hoda M Khouri, Aleksandr O Morgulis, Justin E Paschall, Lon D Phan, Kirill E Rotmistrovsky, Robert D Sanders, Martin F Shumway, Chunlin Xiao, Gil A McVean, Adam Auton, Zamin Iqbal, Gerton Lunter, Jonathan L Marchini, Loukas Moutsianas, Simon Myers, Afidalina Tumian, Brian Desany, James Knight, Roger Winer, David W Craig, Steve M Beckstrom-Sternberg, Alexis Christoforides, Ahmet A Kurdoglu, John V Pearson, Shripad A Sinari, Waibhav D Tembe, David Haussler, Angie S Hinrichs, Sol J Katzman, Andrew Kern, Robert M Kuhn, Molly Przeworski, Ryan D Hernandez, Bryan Howie, Joanna L Kelley, S Cord Melton, Gonçalo R Abecasis, Yun Li, Paul Anderson, Tom Blackwell, Wei Chen, William O Cookson, Jun Ding, Hyun Min Kang, Mark Lathrop, Liming Liang, Miriam F Moffatt, Paul Scheet, Carlo Sidore, Matthew Snyder, Xiaowei Zhan, Sebastian Zöllner, Philip Awadalla, Ferran Casals, Youssef Idaghdour, John 
Keebler, Eric A Stone, Martine Zilversmit, Lynn Jorde, Jinchuan Xing, Evan E Eichler, Gozde Aksay, Can Alkan, Iman Hajirasouliha, Fereydoun Hormozdiari, Jeffrey M Kidd, S Cenk Sahinalp, Peter H Sudmant, Elaine R Mardis, Ken Chen, Asif Chinwalla, Li Ding, Daniel C Koboldt, Mike D McLellan, David Dooling, George Weinstock, John W Wallis, Michael C Wendl, Qunyuan Zhang, Richard M Durbin, Cornelis A Albers, Qasim Ayub, Senduran Balasubramaniam, Jeffrey C Barrett, David M Carter, Yuan Chen, Donald F Conrad, Petr Danecek, Emmanouil T Dermitzakis, Min Hu, Ni Huang, Matt E Hurles, Hanjun Jin, Luke Jostins, Thomas M Keane, Si Quang Le, Sarah Lindsay, Quan Long, Daniel G MacArthur, Stephen B Montgomery, Leopold Parts, James Stalker, Chris Tyler-Smith, Klaudia Walter, Yujun Zhang, Mark B Gerstein, Michael Snyder, Alexej Abyzov, Suganthi Balasubramanian, Robert Bjornson, Jiang Du, Fabian Grubert, Lukas Habegger, Rajini Haraksingh, Justin Jee, Ekta Khurana, Hugo Y K Lam, Jing Leng, Xinmeng Jasmine Mu, Alexander E Urban, Zhengdong Zhang, Yingrui Li, Ruibang Luo, Gabor T Marth, Erik P Garrison, Deniz Kural, Aaron R Quinlan, Chip Stewart, Michael P Stromberg, Alistair N Ward, Jiantao Wu, Charles Lee, Ryan E Mills, Xinghua Shi, Steven A McCarroll, Eric Banks, Mark A DePristo, Robert E Handsaker, Chris Hartl, Joshua M Korn, Heng Li, James C Nemesh, Jonathan Sebat, Vladimir Makarov, Kenny Ye, Seungtai C Yoon, Jeremiah Degenhardt, Mark Kaganovich, Laura Clarke, Richard E Smith, Xiangqun Zheng-Bradley, Jan O Korbel, Sean Humphray, R Keira Cheetham, Michael Eberle, Scott Kahn, Lisa Murray, Kai Ye, Francisco M De La Vega, Yutao Fu, Heather E Peckham, Yongming A Sun, Mark A Batzer, Miriam K Konkel, Jerilyn A Walker, Chunlin Xiao, Zamin Iqbal, Brian Desany, Tom Blackwell, Matthew Snyder, Jinchuan Xing, Evan E Eichler, Gozde Aksay, Can Alkan, Iman Hajirasouliha, Fereydoun Hormozdiari, Jeffrey M Kidd, Ken Chen, Asif Chinwalla, Li Ding, Mike D McLellan, John W Wallis, Matt E Hurles, Donald F 
Conrad, Klaudia Walter, Yujun Zhang, Mark B Gerstein, Michael Snyder, Alexej Abyzov, Jiang Du, Fabian Grubert, Rajini Haraksingh, Justin Jee, Ekta Khurana, Hugo Y K Lam, Jing Leng, Xinmeng Jasmine Mu, Alexander E Urban, Zhengdong Zhang, Richard A Gibbs, Matthew Bainbridge, Danny Challis, Cristian Coafra, Huyen Dinh, Christie Kovar, Sandy Lee, Donna Muzny, Lynne Nazareth, Jeff Reid, Aniko Sabo, Fuli Yu, Jin Yu, Gabor T Marth, Erik P Garrison, Amit Indap, Wen Fung Leong, Aaron R Quinlan, Chip Stewart, Alistair N Ward, Jiantao Wu, Kristian Cibulskis, Tim J Fennell, Stacey B Gabriel, Kiran V Garimella, Chris Hartl, Erica Shefler, Carrie L Sougnez, Jane Wilkinson, Andrew G Clark, Simon Gravel, Fabian Grubert, Laura Clarke, Paul Flicek, Richard E Smith, Xiangqun Zheng-Bradley, Stephen T Sherry, Hoda M Khouri, Justin E Paschall, Martin F Shumway, Chunlin Xiao, Gil A McVean, Sol J Katzman, Gonçalo R Abecasis, Elaine R Mardis, David Dooling, Lucinda Fulton, Robert Fulton, Daniel C Koboldt, Richard M Durbin, Senduran Balasubramaniam, Allison Coffey, Thomas M Keane, Daniel G MacArthur, Aarno Palotie, Carol Scott, James Stalker, Chris Tyler-Smith, Mark B Gerstein, Suganthi Balasubramanian, Aravinda Chakravarti, Bartha M Knoppers, Gonçalo R Abecasis, Carlos D Bustamante, Neda Gharani, Richard A Gibbs, Lynn Jorde, Jane S Kaye, Alastair Kent, Taosha Li, Amy L McGuire, Gil A McVean, Pilar N Ossorio, Charles N Rotimi, Yeyang Su, Lorraine H Toji, Chris Tyler-Smith, Lisa D Brooks, Adam L Felsenfeld, Jean E McEwen, Assya Abdallah, Christopher R Juenger, Nicholas C Clemm, Francis S Collins, Audrey Duncanson, Eric D Green, Mark S Guyer, Jane L Peterson, Alan J Schafer, Yali Xue, Reed A Cartwright, Collapse	Research Support, N.I.H., Extramural	15	5922
15	Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol 2017;13:e1005595. [PMID: 28594827 PMCID: PMC5481147 DOI: 10.1371/journal.pcbi.1005595] [Citation(s) in RCA: 5246] [Impact Index Per Article: 655.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2017] [Revised: 06/22/2017] [Accepted: 05/22/2017] [Indexed: 12/11/2022] Open Abstract The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce complete genome assemblies, but the sequencing is more expensive and error-prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate "hybrid" assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler uses a novel semi-global aligner to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler. 
Collapse Key Words Collapse MESH Headings Algorithms Chromosome Mapping/methods Genome, Bacterial/genetics High-Throughput Nucleotide Sequencing/methods Sequence Alignment/methods Sequence Analysis, DNA/methods Software User-Computer Interface Collapse Grants National Health and Medical Research Council Collapse Collaborators Collapse	Journal Article	8	5246
16	Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 1998;8:175-85. [PMID: 9521921 DOI: 10.1101/gr.8.3.175] [Citation(s) in RCA: 4535] [Impact Index Per Article: 168.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Abstract The availability of massive amounts of DNA sequence information has begun to revolutionize the practice of biology. As a result, current large-scale sequencing output, while impressive, is not adequate to keep pace with growing demand and, in particular, is far short of what will be required to obtain the 3-billion-base human genome sequence by the target date of 2005. To reach this goal, improved automation will be essential, and it is particularly important that human involvement in sequence data processing be significantly reduced or eliminated. Progress in this respect will require both improved accuracy of the data processing software and reliable accuracy measures to reduce the need for human involvement in error correction and make human review more efficient. Here, we describe one step toward that goal: a base-calling program for automated sequencer traces, phred, with improved accuracy. phred appears to be the first base-calling program to achieve a lower error rate than the ABI software, averaging 40%-50% fewer errors in the data sets examined independent of position in read, machine running conditions, or sequencing chemistry. Collapse Key Words Collapse MESH Headings Algorithms Base Sequence Human Genome Project Humans Reproducibility of Results Sensitivity and Specificity Sequence Alignment Sequence Analysis, DNA/instrumentation Sequence Analysis, DNA/methods Sequence Analysis, DNA/standards Software/standards Collapse Grants Collapse Collaborators Collapse		27	4535
17	Jombart T. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 2008;24:1403-5. [PMID: 18397895 DOI: 10.1093/bioinformatics/btn129] [Citation(s) in RCA: 4374] [Impact Index Per Article: 257.3] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open Abstract Collapse Key Words Collapse MESH Headings Algorithms Chromosome Mapping/methods Genetic Markers/genetics Multivariate Analysis Programming Languages Sequence Alignment/methods Sequence Analysis, DNA/methods Software Collapse Grants Collapse Collaborators Collapse		17	4374
18	Rozas J, Sánchez-DelBarrio JC, Messeguer X, Rozas R. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 2004;19:2496-7. [PMID: 14668244 DOI: 10.1093/bioinformatics/btg359] [Citation(s) in RCA: 4110] [Impact Index Per Article: 195.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open Abstract SUMMARY DnaSP is a software package for the analysis of DNA polymorphism data. Present version introduces several new modules and features which, among other options allow: (1) handling big data sets (approximately 5 Mb per sequence); (2) conducting a large number of coalescent-based tests by Monte Carlo computer simulations; (3) extensive analyses of the genetic differentiation and gene flow among populations; (4) analysing the evolutionary pattern of preferred and unpreferred codons; (5) generating graphical outputs for an easy visualization of results. AVAILABILITY The software package, including complete documentation and examples, is freely available to academic users from: http://www.ub.es/dnasp Collapse Key Words Collapse MESH Headings Algorithms Gene Expression Profiling/methods Polymorphism, Genetic/genetics Sequence Alignment/methods Sequence Analysis, DNA/methods Software User-Computer Interface Collapse Grants Collapse Collaborators Collapse	Research Support, Non-U.S. Gov't	21	4110
19	Gautier L, Cope L, Bolstad BM, Irizarry RA. affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 2004;20:307-15. [PMID: 14960456 DOI: 10.1093/bioinformatics/btg405] [Citation(s) in RCA: 4029] [Impact Index Per Article: 191.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open Abstract MOTIVATION The processing of the Affymetrix GeneChip data has been a recent focus for data analysts. Alternatives to the original procedure have been proposed and some of these new methods are widely used. RESULTS The affy package is an R package of functions and classes for the analysis of oligonucleotide arrays manufactured by Affymetrix. The package is currently in its second release, affy provides the user with extreme flexibility when carrying out an analysis and make it possible to access and manipulate probe intensity data. In this paper, we present the main classes and functions in the package and demonstrate how they can be used to process probe-level data. We also demonstrate the importance of probe-level analysis when using the Affymetrix GeneChip platform. Collapse Key Words Collapse MESH Headings Algorithms DNA Probes/chemistry Databases, Bibliographic Databases, Genetic Gene Expression Profiling/instrumentation Gene Expression Profiling/methods Information Storage and Retrieval/methods Oligonucleotide Array Sequence Analysis/instrumentation Oligonucleotide Array Sequence Analysis/methods Sequence Alignment/methods Sequence Analysis, DNA/methods Software User-Computer Interface Collapse Grants P01 HL 66583 NHLBI NIH HHS Collapse Collaborators Collapse	Research Support, U.S. Gov't, P.H.S.	21	4029
20	Ley TJ, Miller C, Ding L, Raphael BJ, Mungall AJ, Robertson AG, Hoadley K, Triche TJ, Laird PW, Baty JD, Fulton LL, Fulton R, Heath SE, Kalicki-Veizer J, Kandoth C, Klco JM, Koboldt DC, Kanchi KL, Kulkarni S, Lamprecht TL, Larson DE, Lin L, Lu C, McLellan MD, McMichael JF, Payton J, Schmidt H, Spencer DH, Tomasson MH, Wallis JW, Wartman LD, Watson MA, Welch J, Wendl MC, Ally A, Balasundaram M, Birol I, Butterfield Y, Chiu R, Chu A, Chuah E, Chun HJ, Corbett R, Dhalla N, Guin R, He A, Hirst C, Hirst M, Holt RA, Jones S, Karsan A, Lee D, Li HI, Marra MA, Mayo M, Moore RA, Mungall K, Parker J, Pleasance E, Plettner P, Schein J, Stoll D, Swanson L, Tam A, Thiessen N, Varhol R, Wye N, Zhao Y, Gabriel S, Getz G, Sougnez C, Zou L, Leiserson MDM, Vandin F, Wu HT, Applebaum F, Baylin SB, Akbani R, Broom BM, Chen K, Motter TC, Nguyen K, Weinstein JN, Zhang N, Ferguson ML, Adams C, Black A, Bowen J, Gastier-Foster J, Grossman T, Lichtenberg T, Wise L, Davidsen T, Demchok JA, Shaw KRM, Sheth M, Sofia HJ, Yang L, Downing JR, Eley G. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med 2013;368:2059-74. [PMID: 23634996 PMCID: PMC3767041 DOI: 10.1056/nejmoa1301689] [Citation(s) in RCA: 3888] [Impact Index Per Article: 324.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Abstract BACKGROUND Many mutations that contribute to the pathogenesis of acute myeloid leukemia (AML) are undefined. The relationships between patterns of mutations and epigenetic phenotypes are not yet clear. METHODS We analyzed the genomes of 200 clinically annotated adult cases of de novo AML, using either whole-genome sequencing (50 cases) or whole-exome sequencing (150 cases), along with RNA and microRNA sequencing and DNA-methylation analysis. 
RESULTS AML genomes have fewer mutations than most other adult cancers, with an average of only 13 mutations found in genes. Of these, an average of 5 are in genes that are recurrently mutated in AML. A total of 23 genes were significantly mutated, and another 237 were mutated in two or more samples. Nearly all samples had at least 1 nonsynonymous mutation in one of nine categories of genes that are almost certainly relevant for pathogenesis, including transcription-factor fusions (18% of cases), the gene encoding nucleophosmin (NPM1) (27%), tumor-suppressor genes (16%), DNA-methylation-related genes (44%), signaling genes (59%), chromatin-modifying genes (30%), myeloid transcription-factor genes (22%), cohesin-complex genes (13%), and spliceosome-complex genes (14%). Patterns of cooperation and mutual exclusivity suggested strong biologic relationships among several of the genes and categories. CONCLUSIONS We identified at least one potential driver mutation in nearly all AML samples and found that a complex interplay of genetic events contributes to AML pathogenesis in individual patients. The databases from this study are widely available to serve as a foundation for further investigations of AML pathogenesis, classification, and risk stratification. (Funded by the National Institutes of Health.). 
Collapse Key Words Collapse MESH Headings Adult CpG Islands DNA Methylation Epigenomics Female Gene Expression Gene Fusion Genome, Human Humans Leukemia, Myeloid, Acute/classification Leukemia, Myeloid, Acute/genetics Male MicroRNAs/genetics Middle Aged Mutation Nucleophosmin Sequence Analysis, DNA/methods Collapse Grants U24CA143840 NCI NIH HHS P01CA101937 NCI NIH HHS P30 CA016672 NCI NIH HHS U24 CA143882 NCI NIH HHS U24 CA143866 NCI NIH HHS U24CA144025 NCI NIH HHS U54 HG003273 NHGRI NIH HHS R01 HG005690 NHGRI NIH HHS U24 CA143843 NCI NIH HHS U24CA143867 NCI NIH HHS U54 HG003079 NHGRI NIH HHS U24CA143799 NCI NIH HHS U24 CA143883 NCI NIH HHS R01 CA162086 NCI NIH HHS U54 HG003067 NHGRI NIH HHS U54HG003273 NHGRI NIH HHS U24 CA143835 NCI NIH HHS U24CA143858 NCI NIH HHS U24CA143882 NCI NIH HHS U54HG003067 NHGRI NIH HHS U24 CA143845 NCI NIH HHS U24 CA143799 NCI NIH HHS U24 CA144025 NCI NIH HHS U24CA143883 NCI NIH HHS U54HG003079 NHGRI NIH HHS U24 CA143840 NCI NIH HHS U24CA143835 NCI NIH HHS U24 CA143858 NCI NIH HHS P30 CA091842 NCI NIH HHS U24CA143843 NCI NIH HHS U24 CA143848 NCI NIH HHS P01 CA101937 NCI NIH HHS U24CA143848 NCI NIH HHS U24CA143866 NCI NIH HHS U24CA143845 NCI NIH HHS R01 CA083962 NCI NIH HHS U24 CA143867 NCI NIH HHS Collapse Collaborators Collapse	Research Support, N.I.H., Extramural	12	3888
21	Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, Sturn A, Snuffin M, Rezantsev A, Popov D, Ryltsov A, Kostukovich E, Borisovsky I, Liu Z, Vinsavich A, Trush V, Quackenbush J. TM4: a free, open-source system for microarray data management and analysis. Biotechniques 2003;34:374-8. [PMID: 12613259 DOI: 10.2144/03342mt01] [Citation(s) in RCA: 3728] [Impact Index Per Article: 169.5] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open Abstract Collapse Key Words Collapse MESH Headings Database Management Systems Hypermedia Information Storage and Retrieval/methods Oligonucleotide Array Sequence Analysis/methods Sequence Alignment Sequence Analysis, DNA/methods Software Software Design User-Computer Interface Collapse Grants Collapse Collaborators Collapse		22	3728
22	Lanfear R, Calcott B, Ho SYW, Guindon S. PartitionFinder: Combined Selection of Partitioning Schemes and Substitution Models for Phylogenetic Analyses. Mol Biol Evol 2012;29:1695-701. [PMID: 22319168 DOI: 10.1093/molbev/mss020] [Citation(s) in RCA: 3664] [Impact Index Per Article: 281.8] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open Abstract Collapse Key Words Collapse MESH Headings Algorithms Bayes Theorem Cluster Analysis Evolution, Molecular Likelihood Functions Models, Genetic Phylogeny Selection, Genetic Sequence Analysis, DNA/methods Software Collapse Grants Collapse Collaborators Collapse		13	3664
23	Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995;269:496-512. [PMID: 7542800 DOI: 10.1126/science.7542800] [Citation(s) in RCA: 3609] [Impact Index Per Article: 120.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Abstract An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism. Collapse Key Words Collapse MESH Headings Bacterial Proteins/genetics Base Composition Base Sequence Chromosome Mapping/methods Chromosomes, Bacterial Cloning, Molecular Costs and Cost Analysis DNA, Bacterial/genetics Databases, Factual Genes, Bacterial Genome, Bacterial Haemophilus influenzae/genetics Haemophilus influenzae/physiology Molecular Sequence Data Operon RNA, Bacterial/genetics RNA, Ribosomal/genetics Repetitive Sequences, Nucleic Acid Sequence Analysis, DNA/methods Software Collapse Grants Collapse Collaborators Collapse		30	3609
24	Huang X, Madan A. CAP3: A DNA sequence assembly program. Genome Res 1999;9:868-77. [PMID: 10508846 PMCID: PMC310812 DOI: 10.1101/gr.9.9.868] [Citation(s) in RCA: 3557] [Impact Index Per Article: 136.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Abstract We describe the third generation of the CAP sequence assembly program. The CAP3 program includes a number of improvements and new features. The program has a capability to clip 5' and 3' low-quality regions of reads. It uses base quality values in computation of overlaps between reads, construction of multiple sequence alignments of reads, and generation of consensus sequences. The program also uses forward-reverse constraints to correct assembly errors and link contigs. Results of CAP3 on four BAC data sets are presented. The performance of CAP3 was compared with that of PHRAP on a number of BAC data sets. PHRAP often produces longer contigs than CAP3 whereas CAP3 often produces fewer errors in consensus sequences than PHRAP. It is easier to construct scaffolds with CAP3 than with PHRAP on low-pass data with forward-reverse constraints. Collapse Key Words Collapse MESH Headings Algorithms Chromosomes, Bacterial/genetics Consensus Sequence Contig Mapping/methods Databases, Factual Models, Genetic Models, Theoretical Reproducibility of Results Sequence Alignment/methods Sequence Analysis, DNA/methods Software Collapse Grants R01 HG01502-03 NHGRI NIH HHS Collapse Collaborators Collapse	research-article	26	3557
25	Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DMA, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 2002;419:498-511. [PMID: 12368864 PMCID: PMC3836256 DOI: 10.1038/nature01097] [Citation(s) in RCA: 3148] [Impact Index Per Article: 136.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2002] [Accepted: 09/02/2002] [Indexed: 11/08/2022] Abstract The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria. 
Collapse Key Words Collapse MESH Headings Animals Chromosome Structures DNA Repair DNA Replication DNA, Protozoan/biosynthesis DNA, Protozoan/genetics Evolution, Molecular Genome, Protozoan Humans Malaria Vaccines Malaria, Falciparum/immunology Malaria, Falciparum/parasitology Malaria, Falciparum/prevention & control Membrane Transport Proteins/genetics Membrane Transport Proteins/metabolism Molecular Sequence Data Plasmodium falciparum/genetics Plasmodium falciparum/immunology Plasmodium falciparum/metabolism Plastids/genetics Proteome Protozoan Proteins/genetics Protozoan Proteins/metabolism Protozoan Proteins/physiology Recombination, Genetic Sequence Analysis, DNA/methods Collapse Grants Wellcome Trust 061524 Wellcome Trust R01 AI028398 NIAID NIH HHS Collapse Collaborators Collapse	research-article	23	3148
