101
|
An extensive evaluation of read trimming effects on Illumina NGS data analysis. PLoS One 2013; 8:e85024. [PMID: 24376861 PMCID: PMC3871669 DOI: 10.1371/journal.pone.0085024] [Citation(s) in RCA: 248] [Impact Index Per Article: 22.5] [Received: 09/27/2013] [Accepted: 11/21/2013] [Indexed: 12/13/2022] Open
Abstract
Next Generation Sequencing is having an extremely strong impact in biological and medical research and diagnostics, with applications ranging from gene expression quantification to genotyping and genome reconstruction. Sequencing data is often provided as raw reads which are processed prior to analysis. One of the most used preprocessing procedures is read trimming, which aims at removing low quality portions while preserving the longest high quality part of an NGS read. In the current work, we evaluate nine different trimming algorithms in four datasets and three common NGS-based applications (RNA-Seq, SNP calling and genome assembly). Trimming is shown to increase the quality and reliability of the analysis, with concurrent gains in terms of execution time and computational resources needed.
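The trimming goal stated in this abstract (drop low quality portions, keep the longest high quality part of a read) can be illustrated with a minimal Python sketch. It is not one of the nine algorithms evaluated in the paper. The function names, the Phred+33 quality encoding and the threshold of 20 are assumptions made for illustration only.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def longest_high_quality_span(scores, threshold=20):
    """Return (start, end) of the longest run of scores >= threshold.

    The span is half-open, so an empty result is reported as (0, 0).
    """
    best_start, best_len = 0, 0
    run_start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if run_start is None:
                run_start = i
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # a low quality base ends the current run
    return best_start, best_start + best_len


def trim_read(sequence, quality_string, threshold=20, offset=33):
    """Keep only the longest high quality stretch of a read (hypothetical helper)."""
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have equal length")
    scores = phred_scores(quality_string, offset)
    start, end = longest_high_quality_span(scores, threshold)
    return sequence[start:end], quality_string[start:end]
```

For example, `trim_read("ACGTACGTAC", "##IIIII#II")` keeps the five-base stretch `GTACG`, because `#` decodes to Phred 2 and `I` to Phred 40 under the assumed encoding.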
102
|
Measurement of top quark polarization in top-antitop events from proton-proton collisions at √s=7 TeV using the ATLAS detector. PHYSICAL REVIEW LETTERS 2013; 111:232002. [PMID: 24476258 DOI: 10.1103/physrevlett.111.232002] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/24/2013] [Indexed: 06/03/2023]
Abstract
This Letter presents measurements of the polarization of the top quark in top-antitop quark pair events, using $4.7~\mathrm{fb}^{-1}$ of proton-proton collision data recorded with the ATLAS detector at the Large Hadron Collider at $\sqrt{s} = 7$ TeV. Final states containing one or two isolated leptons (electrons or muons) and jets are considered. Two measurements of $\alpha_\ell P$, the product of the leptonic spin-analyzing power and the top quark polarization, are performed assuming that the polarization is introduced by either a CP conserving or a maximally CP violating production process. The measurements obtained, $\alpha_\ell P_{\mathrm{CPC}} = -0.035 \pm 0.014\,(\mathrm{stat}) \pm 0.037\,(\mathrm{syst})$ and $\alpha_\ell P_{\mathrm{CPV}} = 0.020 \pm 0.016\,(\mathrm{stat})\,^{+0.013}_{-0.017}\,(\mathrm{syst})$, are in good agreement with the standard model prediction of negligible top quark polarization.
103
|
Transcriptome sequencing and microarray design for functional genomics in the extremophile Arabidopsis relative Thellungiella salsuginea (Eutrema salsugineum). BMC Genomics 2013; 14:793. [PMID: 24228715 PMCID: PMC3832907 DOI: 10.1186/1471-2164-14-793] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 12/12/2012] [Accepted: 11/11/2013] [Indexed: 11/29/2022] Open
Abstract
Background: Most molecular studies of plant stress tolerance have been performed with Arabidopsis thaliana, although it is not particularly stress tolerant and may lack protective mechanisms required to survive extreme environmental conditions. Thellungiella salsuginea has attracted interest as an alternative plant model species with high tolerance of various abiotic stresses. While the T. salsuginea genome has recently been sequenced, its annotation is still incomplete and transcriptomic information is scarce. In addition, functional genomics investigations in this species are severely hampered by a lack of affordable tools for genome-wide gene expression studies. Results: Here, we report the results of Thellungiella de novo transcriptome assembly and annotation based on 454 pyrosequencing and development and validation of a T. salsuginea microarray. ESTs were generated from a non-normalized and a normalized library synthesized from RNA pooled from samples covering different tissues and abiotic stress conditions. Both libraries yielded partially unique sequences, indicating their necessity to obtain comprehensive transcriptome coverage. More than 1 million sequence reads were assembled into 42,810 unigenes, approximately 50% of which could be functionally annotated. These unigenes were compared to all available Thellungiella genome sequence information. In addition, the groups of Late Embryogenesis Abundant (LEA) proteins, Mitogen Activated Protein (MAP) kinases and protein phosphatases were annotated in detail. We also predicted the target genes for 384 putative miRNAs. From the sequence information, we constructed a 44 k Agilent oligonucleotide microarray. Comparison of same-species and cross-species hybridization results showed superior performance of the newly designed array for T. salsuginea samples. The developed microarrays were used to investigate transcriptional responses of T. salsuginea and Arabidopsis during cold acclimation using the MapMan software.
Conclusions: This study provides the first comprehensive transcriptome information for the extremophile Arabidopsis relative T. salsuginea. The data constitute a more than three-fold increase in the number of publicly available unigene sequences and will greatly facilitate genome annotation. In addition, we have designed and validated the first genome-wide microarray for T. salsuginea, which will be commercially available. Together with the publicly available MapMan software this will become an important tool for functional genomics of plant stress tolerance.
104
|
Measurement of the azimuthal angle dependence of inclusive jet yields in Pb+Pb collisions at $\sqrt{s_{NN}} = 2.76$ TeV with the ATLAS detector. PHYSICAL REVIEW LETTERS 2013; 111:152301. [PMID: 24160592 DOI: 10.1103/physrevlett.111.152301] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/26/2013] [Indexed: 06/02/2023]
Abstract
Measurements of the variation of inclusive jet suppression as a function of relative azimuthal angle, $\Delta\phi$, with respect to the elliptic event plane provide insight into the path-length dependence of jet quenching. ATLAS has measured the $\Delta\phi$ dependence of jet yields in $0.14~\mathrm{nb}^{-1}$ of $\sqrt{s_{NN}} = 2.76$ TeV Pb+Pb collisions at the LHC for jet transverse momenta $p_T > 45$ GeV in different collision centrality bins using an underlying event subtraction procedure that accounts for elliptic flow. The variation of the jet yield with $\Delta\phi$ was characterized by the parameter $v_2^{\mathrm{jet}}$ and the ratio of out-of-plane ($\Delta\phi \sim \pi/2$) to in-plane ($\Delta\phi \sim 0$) yields. Nonzero $v_2^{\mathrm{jet}}$ values were measured in all centrality bins for $p_T < 160$ GeV. The jet yields are observed to vary by as much as 20% between in-plane and out-of-plane directions.
105
|
Observation of associated near-side and away-side long-range correlations in $\sqrt{s_{NN}} = 5.02$ TeV proton-lead collisions with the ATLAS detector. PHYSICAL REVIEW LETTERS 2013; 110:182302. [PMID: 23683193 DOI: 10.1103/physrevlett.110.182302] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 12/20/2012] [Indexed: 06/02/2023]
Abstract
Two-particle correlations in relative azimuthal angle ($\Delta\phi$) and pseudorapidity ($\Delta\eta$) are measured in $\sqrt{s_{NN}} = 5.02$ TeV p+Pb collisions using the ATLAS detector at the LHC. The measurements are performed using approximately $1~\mu\mathrm{b}^{-1}$ of data as a function of transverse momentum ($p_T$) and the transverse energy ($\Sigma E_T^{\mathrm{Pb}}$) summed over $3.1 < \eta < 4.9$ in the direction of the Pb beam. The correlation function, constructed from charged particles, exhibits a long-range ($2 < |\Delta\eta| < 5$) "near-side" ($\Delta\phi \sim 0$) correlation that grows rapidly with increasing $\Sigma E_T^{\mathrm{Pb}}$. A long-range "away-side" ($\Delta\phi \sim \pi$) correlation, obtained by subtracting the expected contributions from recoiling dijets and other sources estimated using events with small $\Sigma E_T^{\mathrm{Pb}}$, is found to match the near-side correlation in magnitude, shape (in $\Delta\eta$ and $\Delta\phi$) and $\Sigma E_T^{\mathrm{Pb}}$ dependence. The resultant $\Delta\phi$ correlation is approximately symmetric about $\pi/2$, and is consistent with a dominant $\cos 2\Delta\phi$ modulation for all $\Sigma E_T^{\mathrm{Pb}}$ ranges and particle $p_T$.
106
|
Comparative study of RNA-seq- and microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics 2013; 29:717-24. [PMID: 23376351 DOI: 10.1093/bioinformatics/btt053] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Indexed: 12/20/2022]
Abstract
MOTIVATION: Coexpression networks are data-derived representations of genes behaving in a similar way across tissues and experimental conditions. They have been used for hypothesis generation and guilt-by-association approaches for inferring functions of previously unknown genes. So far, the main platform for expression data has been DNA microarrays; however, the recent development of RNA-seq allows for higher accuracy and coverage of transcript populations. It is therefore important to assess the potential for biological investigation of coexpression networks derived from this novel technique in a condition-independent dataset. RESULTS: We collected 65 publicly available Illumina RNA-seq high quality Arabidopsis thaliana samples and generated Pearson correlation coexpression networks. These networks were then compared with those derived from analogous microarray data. We show how Variance-Stabilizing Transformed (VST) RNA-seq data samples are the most similar to microarray ones, with respect to inter-sample variation, correlation coefficient distribution and network topological architecture. Microarray networks show a slightly higher score in biology-derived quality assessments such as overlap with the known protein-protein interaction network and edge ontological agreement. Different coexpression network centralities are investigated; in particular, we show how betweenness centrality is generally a positive marker for essential genes in A. thaliana, regardless of the platform originating the data. In the end, we focus on a specific gene network case, showing that although microarray data seem more suited for gene network reverse engineering, RNA-seq offers the great advantage of extending coexpression analyses to the entire transcriptome.
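As a rough illustration of what a Pearson correlation coexpression network is, the sketch below links two genes whenever the correlation of their expression profiles across samples reaches a cutoff. The gene names, the cutoff of 0.9 and the choice to keep only positive correlations are assumptions for this example and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, min_r=0.9):
    """Return {(gene_a, gene_b): r} for gene pairs with r >= min_r.

    `expression` maps a gene name to its values across the same samples.
    """
    edges = {}
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if r >= min_r:
            edges[(gene_a, gene_b)] = r
    return edges


# Toy data: four hypothetical genes measured in four samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}
toy_edges = coexpression_edges(toy_expression)
```

On the toy data only `geneA` and `geneB` end up connected; network measures such as the betweenness centrality mentioned in the abstract would then be computed on the resulting graph.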
107
|
Measurement of Z boson production in Pb-Pb collisions at $\sqrt{s_{NN}} = 2.76$ TeV with the ATLAS detector. PHYSICAL REVIEW LETTERS 2013; 110:022301. [PMID: 23383894 DOI: 10.1103/physrevlett.110.022301] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/24/2012] [Indexed: 06/01/2023]
Abstract
The ATLAS experiment has observed 1995 Z boson candidates in data corresponding to $0.15~\mathrm{nb}^{-1}$ of integrated luminosity obtained in the 2011 LHC Pb+Pb run at $\sqrt{s_{NN}} = 2.76$ TeV. The Z bosons are reconstructed via dielectron and dimuon decay channels, with a background contamination of less than 3%. Results from the two channels are consistent and are combined. Within the statistical and systematic uncertainties, the per-event Z boson yield is proportional to the number of binary collisions estimated by the Glauber model. The elliptic anisotropy of the azimuthal distribution of the Z boson with respect to the event plane is found to be consistent with zero.
108
|
LASSO modeling of the Arabidopsis thaliana seed/seedling transcriptome: a model case for detection of novel mucilage and pectin metabolism genes. MOLECULAR BIOSYSTEMS 2013; 8:2566-74. [PMID: 22735692 DOI: 10.1039/c2mb25096a] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 12/18/2022]
Abstract
Whole genome transcript correlation-based approaches have been shown to be enormously useful for candidate gene detection. Consequently, simple Pearson correlation has been widely applied in several web based tools. That said, several more sophisticated methods based on e.g. mutual information or Bayesian network inference have been developed and have been shown to be theoretically superior but are not yet commonly applied. Here, we propose the application of a recently developed statistical regression technique, the LASSO, to detect novel candidates from high throughput transcriptomic datasets. We apply the LASSO to a tissue specific dataset in the model plant Arabidopsis thaliana to identify novel players in Arabidopsis thaliana seed coat mucilage synthesis. We built LASSO models based on a list of genes known to be involved in a sub-pathway of Arabidopsis mucilage synthesis. After identifying a putative transcription factor, we verified its involvement in mucilage synthesis by obtaining knock-out mutants for this gene. We show that a loss of function of this putative transcription factor leads to a significant decrease in mucilage pectin.
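To make the LASSO idea concrete, here is a minimal coordinate-descent sketch in plain Python. It minimizes $\frac{1}{2n}\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_1$ without an intercept, so it assumes centered data. The function names, the toy matrix and the penalty value are illustrative assumptions, not the authors' implementation or settings. In a candidate-gene setting one would typically use the expression of known pathway genes as the response and other transcripts as predictors, which is also an assumption here.

```python
def soft_threshold(value, penalty):
    """Shrink a value toward zero by `penalty`, clipping at zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Fit LASSO coefficients by cyclic coordinate descent.

    X is a list of rows (samples) and y the response; both are assumed centered.
    """
    n_samples, n_features = len(X), len(X[0])
    beta = [0.0] * n_features
    for _ in range(n_iter):
        for j in range(n_features):
            rho = 0.0    # correlation of feature j with the partial residual
            scale = 0.0  # squared norm of feature j
            for i in range(n_samples):
                partial_fit = sum(
                    X[i][k] * beta[k] for k in range(n_features) if k != j
                )
                rho += X[i][j] * (y[i] - partial_fit)
                scale += X[i][j] ** 2
            if scale == 0.0:
                beta[j] = 0.0
                continue
            beta[j] = soft_threshold(rho / n_samples, penalty) / (scale / n_samples)
    return beta


# Toy example: the first predictor explains y, the second barely does.
toy_X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
toy_y = [2.0, -2.0, 0.1, -0.1]
toy_beta = lasso_coordinate_descent(toy_X, toy_y, penalty=0.25)
```

With the assumed penalty the weak second predictor is set exactly to zero while the first is shrunk but retained, which is the sparsity property that makes the method attractive for picking a short list of candidates.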
109
|
Search for dark matter candidates and large extra dimensions in events with a photon and missing transverse momentum in pp collision data at $\sqrt{s} = 7$ TeV with the ATLAS detector. PHYSICAL REVIEW LETTERS 2013; 110:011802. [PMID: 23383779 DOI: 10.1103/physrevlett.110.011802] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 09/20/2012] [Indexed: 06/01/2023]
Abstract
Results of a search for new phenomena in events with an energetic photon and large missing transverse momentum in proton-proton collisions at $\sqrt{s} = 7$ TeV are reported. Data collected by the ATLAS experiment at the LHC corresponding to an integrated luminosity of $4.6~\mathrm{fb}^{-1}$ are used. Good agreement is observed between the data and the standard model predictions. The results are translated into exclusion limits on models with large extra spatial dimensions and on pair production of weakly interacting dark matter candidates.
110
|
Search for magnetic monopoles in $\sqrt{s} = 7$ TeV pp collisions with the ATLAS detector. PHYSICAL REVIEW LETTERS 2012; 109:261803. [PMID: 23368550 DOI: 10.1103/physrevlett.109.261803] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 07/26/2012] [Indexed: 06/01/2023]
Abstract
This Letter presents a search for magnetic monopoles with the ATLAS detector at the CERN Large Hadron Collider using an integrated luminosity of $2.0~\mathrm{fb}^{-1}$ of pp collisions recorded at a center-of-mass energy of $\sqrt{s} = 7$ TeV. No event is found in the signal region, leading to an upper limit on the production cross section at 95% confidence level of $1.6/\epsilon$ fb for Dirac magnetic monopoles with the minimum unit magnetic charge and with mass between 200 GeV and 1500 GeV, where $\epsilon$ is the monopole reconstruction efficiency. The efficiency $\epsilon$ is high and uniform in the fiducial region given by pseudorapidity $|\eta| < 1.37$ and transverse kinetic energy $600\text{--}700 < E^{\mathrm{kin}} \sin\theta < 1400$ GeV. The minimum value of 700 GeV is for monopoles of mass 200 GeV, whereas the minimum value of 600 GeV is applicable for higher mass monopoles. Therefore, the upper limit on the production cross section at 95% confidence level is 2 fb in this fiducial region. Assuming the kinematic distributions from Drell-Yan pair production of spin-1/2 Dirac magnetic monopoles, the efficiency is in the range 1%-10%, leading to an upper limit on the cross section at 95% confidence level that varies from 145 fb to 16 fb for monopoles with mass between 200 GeV and 1200 GeV. This limit is weaker than the fiducial limit because most of these monopoles lie outside the fiducial region.
111
|
Search for direct top squark pair production in final states with one isolated lepton, jets, and missing transverse momentum in $\sqrt{s} = 7$ TeV pp collisions using $4.7~\mathrm{fb}^{-1}$ of ATLAS data. PHYSICAL REVIEW LETTERS 2012; 109:211803. [PMID: 23215588 DOI: 10.1103/physrevlett.109.211803] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 08/13/2012] [Indexed: 06/01/2023]
Abstract
A search is presented for direct top squark pair production in final states with one isolated electron or muon, jets, and missing transverse momentum in proton-proton collisions at $\sqrt{s} = 7$ TeV. The measurement is based on $4.7~\mathrm{fb}^{-1}$ of data collected with the ATLAS detector at the LHC. Each top squark is assumed to decay to a top quark and the lightest supersymmetric particle (LSP). The data are found to be consistent with standard model expectations. Top squark masses between 230 GeV and 440 GeV are excluded with 95% confidence for massless LSPs, and top squark masses around 400 GeV are excluded for LSP masses up to 125 GeV.
112
|
Search for a supersymmetric partner to the top quark in final states with jets and missing transverse momentum at $\sqrt{s} = 7$ TeV with the ATLAS detector. PHYSICAL REVIEW LETTERS 2012; 109:211802. [PMID: 23215587 DOI: 10.1103/physrevlett.109.211802] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/07/2012] [Indexed: 06/01/2023]
Abstract
A search for direct pair production of supersymmetric top squarks ($\tilde{t}_1$) is presented, assuming the $\tilde{t}_1$ decays into a top quark and the lightest supersymmetric particle, $\tilde{\chi}_1^0$, and that both top quarks decay to purely hadronic final states. A total of 16 (4) events are observed compared to a predicted standard model background of $13.5^{+3.7}_{-3.6}$ ($4.4^{+1.7}_{-1.3}$) events in two signal regions based on $\int L\,dt = 4.7~\mathrm{fb}^{-1}$ of pp collision data taken at $\sqrt{s} = 7$ TeV with the ATLAS detector at the LHC. An exclusion region in the $\tilde{t}_1$ versus $\tilde{\chi}_1^0$ mass plane is evaluated: $370 < m_{\tilde{t}_1} < 465$ GeV is excluded for $m_{\tilde{\chi}_1^0} \sim 0$ GeV while $m_{\tilde{t}_1} = 445$ GeV is excluded for $m_{\tilde{\chi}_1^0} \le 50$ GeV.
113
|
Misexpression of a chloroplast aspartyl protease leads to severe growth defects and alters carbohydrate metabolism in Arabidopsis. PLANT PHYSIOLOGY 2012; 160:1237-50. [PMID: 22987884 PMCID: PMC3490589 DOI: 10.1104/pp.112.204016] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 05/08/2023]
Abstract
The crucial role of carbohydrate in plant growth and morphogenesis is widely recognized. In this study, we describe the characterization of nana, a dwarf Arabidopsis (Arabidopsis thaliana) mutant impaired in carbohydrate metabolism. We show that the nana dwarf phenotype was accompanied by altered leaf morphology and a delayed flowering time. Our genetic and molecular data indicate that the mutation in nana is due to a transfer DNA insertion in the promoter region of a gene encoding a chloroplast-located aspartyl protease that alters its pattern of expression. Overexpression of the gene (oxNANA) phenocopies the mutation. Both nana and oxNANA display alterations in carbohydrate content, and the extent of these changes varies depending on growth light intensity. In particular, in low light, soluble sugar levels are lower and do not show the daily fluctuations observed in wild-type plants. Moreover, nana and oxNANA are defective in the expression of some genes implicated in sugar metabolism and photosynthetic light harvesting. Interestingly, some chloroplast-encoded genes as well as genes whose products seem to be involved in retrograde signaling appear to be down-regulated. These findings suggest that the NANA aspartic protease has an important regulatory function in chloroplasts that not only influences photosynthetic carbon metabolism but also plastid and nuclear gene expression.
114
|
Search for supersymmetry in events with three leptons and missing transverse momentum in $\sqrt{s} = 7$ TeV pp collisions with the ATLAS detector. PHYSICAL REVIEW LETTERS 2012; 108:261804. [PMID: 23004965 DOI: 10.1103/physrevlett.108.261804] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/25/2012] [Indexed: 06/01/2023]
Abstract
A search for the weak production of charginos and neutralinos decaying to a final state with three leptons (electrons or muons) and missing transverse momentum is presented. The analysis uses $2.06~\mathrm{fb}^{-1}$ of $\sqrt{s} = 7$ TeV proton-proton collision data delivered by the Large Hadron Collider and recorded with the ATLAS detector. Observations are consistent with standard model expectations in two signal regions that are either depleted or enriched in Z-boson decays. Upper limits at 95% confidence level are set in R-parity conserving phenomenological minimal supersymmetric and simplified models. For the simplified models, degenerate lightest chargino and next-to-lightest neutralino masses up to 300 GeV are excluded for mass differences from the lightest neutralino up to 300 GeV.
115
|
Measurement of the ZZ production cross section and limits on anomalous neutral triple gauge couplings in proton-proton collisions at $\sqrt{s} = 7$ TeV with the ATLAS detector. PHYSICAL REVIEW LETTERS 2012; 108:041804. [PMID: 22400826 DOI: 10.1103/physrevlett.108.041804] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/23/2011] [Indexed: 05/31/2023]
Abstract
A measurement of the ZZ production cross section in proton-proton collisions at $\sqrt{s} = 7$ TeV using data corresponding to an integrated luminosity of $1.02~\mathrm{fb}^{-1}$ recorded by the ATLAS experiment at the LHC is presented. Twelve events containing two Z boson candidates decaying to electrons and/or muons are observed, with an expected background of $0.3 \pm 0.3\,(\mathrm{stat})\,^{+0.4}_{-0.3}\,(\mathrm{syst})$ events. The cross section measured in a phase-space region with good detector acceptance and for dilepton masses within the range 66 to 116 GeV is $\sigma^{\mathrm{fid}}_{ZZ \to \ell^+\ell^-\ell^+\ell^-} = 19.4^{+6.3}_{-5.2}\,(\mathrm{stat})\,^{+0.9}_{-0.7}\,(\mathrm{syst}) \pm 0.7\,(\mathrm{lumi})$ fb. The resulting total cross section for on-shell ZZ production, $\sigma^{\mathrm{tot}}_{ZZ} = 8.5^{+2.7}_{-2.3}\,(\mathrm{stat})\,^{+0.4}_{-0.3}\,(\mathrm{syst}) \pm 0.3\,(\mathrm{lumi})$ pb, is consistent with the standard model expectation of $6.5^{+0.3}_{-0.2}$ pb calculated at the next-to-leading order in QCD. Limits on anomalous neutral triple gauge boson couplings are derived.
116
|
Abstract
The majority of eukaryotic organisms rely on molecular oxygen for respiratory energy production. When the supply of oxygen is compromised, a variety of acclimation responses are activated to reduce the detrimental effects of energy depletion. Various oxygen-sensing mechanisms have been described that are thought to trigger these responses, but they each seem to be kingdom specific and no sensing mechanism has been identified in plants until now. Here we show that one branch of the ubiquitin-dependent N-end rule pathway for protein degradation, which is active in both mammals and plants, functions as an oxygen-sensing mechanism in Arabidopsis thaliana. We identified a conserved amino-terminal amino acid sequence of the ethylene response factor (ERF)-transcription factor RAP2.12 to be dedicated to an oxygen-dependent sequence of post-translational modifications, which ultimately lead to degradation of RAP2.12 under aerobic conditions. When the oxygen concentration is low, as during flooding, RAP2.12 is released from the plasma membrane and accumulates in the nucleus to activate gene expression for hypoxia acclimation. Our discovery of an oxygen-sensing mechanism opens up new possibilities for improving flooding tolerance in crops.
117
|
PlaNet: combined sequence and expression comparisons across plant networks derived from seven species. THE PLANT CELL 2011; 23:895-910. [PMID: 21441431 PMCID: PMC3082271 DOI: 10.1105/tpc.111.083667] [Citation(s) in RCA: 144] [Impact Index Per Article: 11.1] [Received: 01/26/2011] [Revised: 01/26/2011] [Accepted: 03/07/2011] [Indexed: 05/17/2023]
Abstract
The model organism Arabidopsis thaliana is readily used in basic research due to resource availability and relative speed of data acquisition. A major goal is to transfer acquired knowledge from Arabidopsis to crop species. However, the identification of functional equivalents of well-characterized Arabidopsis genes in other plants is a nontrivial task. It is well documented that transcriptionally coordinated genes tend to be functionally related and that such relationships may be conserved across different species and even kingdoms. To exploit such relationships, we constructed whole-genome coexpression networks for Arabidopsis and six important plant crop species. The interactive networks, clustered using the HCCA algorithm, are provided under the banner PlaNet (http://aranet.mpimp-golm.mpg.de). We implemented a comparative network algorithm that estimates similarities between network structures. Thus, the platform can be used to swiftly infer similar coexpressed network vicinities within and across species and can predict the identity of functional homologs. We exemplify this using the PSA-D and chalcone synthase-related gene networks. Finally, we assessed how ontology terms are transcriptionally connected in the seven species and provide the corresponding MapMan term coexpression networks. The data support the contention that this platform will considerably improve transfer of knowledge generated in Arabidopsis to valuable crop species.
118
|
SLocX: Predicting Subcellular Localization of Arabidopsis Proteins Leveraging Gene Expression Data. FRONTIERS IN PLANT SCIENCE 2011; 2:43. [PMID: 22639594 PMCID: PMC3355584 DOI: 10.3389/fpls.2011.00043] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 05/18/2011] [Accepted: 08/12/2011] [Indexed: 05/08/2023]
Abstract
Despite the growing volume of experimentally validated knowledge about the subcellular localization of plant proteins, a well-performing in silico prediction tool is still a necessity. Existing tools, which employ information derived from protein sequence alone, offer limited accuracy and/or rely on full sequence availability. We explored whether gene expression profiling data can be harnessed to enhance prediction performance. To achieve this, we trained several support vector machines to predict the subcellular localization of Arabidopsis thaliana proteins using sequence-derived information, expression behavior, or a combination of these data and compared their predictive performance through a cross-validation test. We show that gene expression carries information about the subcellular localization not available in sequence information, yielding dramatic benefits for plastid localization prediction, and some notable improvements for other compartments such as the mitochondrion, the Golgi, and the plasma membrane. Based on these results, we constructed a novel subcellular localization prediction engine, SLocX, combining gene expression profiling data with protein sequence-based information. We then validated the results of this engine using an independent test set of annotated proteins and a transient expression of GFP fusion proteins. Here, we present the prediction framework and a website of predicted localizations for Arabidopsis. The relatively good accuracy of our prediction engine, even in cases where only partial protein sequence is available (e.g., in sequences lacking the N-terminal region), offers a promising opportunity for similar application to non-sequenced or poorly annotated plant species. Although the prediction scope of our method is currently limited by the availability of expression information on the ATH1 array, we believe that advances in gene expression measurement technology will make our method applicable to all Arabidopsis proteins.
119
|
Genomic and transcriptomic analysis of the AP2/ERF superfamily in Vitis vinifera. BMC Genomics 2010; 11:719. [PMID: 21171999 PMCID: PMC3022922 DOI: 10.1186/1471-2164-11-719] [Citation(s) in RCA: 177] [Impact Index Per Article: 12.6] [Received: 08/05/2010] [Accepted: 12/20/2010] [Indexed: 01/01/2023] Open
Abstract
Background: The AP2/ERF protein family contains transcription factors that play a crucial role in plant growth and development and in response to biotic and abiotic stress conditions in plants. Grapevine (Vitis vinifera) is the only woody crop whose genome has been fully sequenced. So far, no detailed expression profile of AP2/ERF-like genes is available for grapevine. Results: An exhaustive search for AP2/ERF genes was carried out on the Vitis vinifera genome and their expression profile was analyzed by Real-Time quantitative PCR (qRT-PCR) in different vegetative and reproductive tissues and at two different ripening stages. One hundred and forty-nine sequences, containing at least one ERF domain, were identified. Specific clusters within the AP2 and ERF families showed conserved expression patterns reminiscent of other species and grapevine-specific trends related to berry ripening. Moreover, putative targets of group IX ERFs were identified by co-expression and protein similarity comparisons. Conclusions: The grapevine genome contains a number of AP2/ERF genes comparable to that of other dicot species analyzed so far. We observed an increase in the size of specific groups within the ERF family, probably due to recent duplication events. Expression analyses in different aerial tissues display common features previously described in other plant systems and introduce possible new roles for members of some ERF groups during fruit ripening. The presented analysis of AP2/ERF genes in grapevine provides the basis for studying the molecular regulation of berry development and the ripening process.
120
|
Algorithm-driven artifacts in median polish summarization of microarray data. BMC Bioinformatics 2010; 11:553. [PMID: 21070630 PMCID: PMC2998528 DOI: 10.1186/1471-2105-11-553] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 01/05/2010] [Accepted: 11/11/2010] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: High-throughput measurement of transcript intensities using Affymetrix type oligonucleotide microarrays has produced a massive quantity of data during the last decade. Different preprocessing techniques exist to convert the raw signal intensities measured by these chips into gene expression estimates. Although these techniques have been widely benchmarked in the context of differential gene expression analysis, there are only a few examples where their performance has been assessed with respect to coexpression-based studies such as sample classification. RESULTS: In the present paper we benchmark the three most used normalization procedures (MAS5, RMA and GCRMA) in the context of inter-array correlation analysis, confirming and extending the finding that RMA and GCRMA consistently overestimate sample similarity upon normalization. We determine that median polish summarization is responsible for generating a large proportion of these over-similarity artifacts. Furthermore, we show that the most affected probesets also show internal signal disagreement, and tend to be composed of individual probes hitting different gene transcripts. We finally provide a correction to the RMA/GCRMA summarization procedure that massively reduces inter-array correlation artifacts, without affecting the detection of differentially expressed genes. CONCLUSIONS: We propose tRMA as a modification of RMA to normalize microarray experiments for correlation-based analysis.
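For readers unfamiliar with the summarization step discussed in this abstract, the following is a minimal sketch of Tukey's median polish on a small table. It assumes rows are probes and columns are arrays, with values already on a log scale. It illustrates the generic procedure only and is neither the RMA implementation nor the tRMA correction proposed by the authors; the function name and iteration settings are assumptions.

```python
from statistics import median


def median_polish(table, n_iter=10, tol=1e-9):
    """Decompose table[i][j] into overall + row effect + column effect + residual."""
    n_rows, n_cols = len(table), len(table[0])
    residuals = [list(row) for row in table]
    overall = 0.0
    row_effects = [0.0] * n_rows
    col_effects = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects.
        row_medians = [median(row) for row in residuals]
        for i in range(n_rows):
            row_effects[i] += row_medians[i]
            residuals[i] = [value - row_medians[i] for value in residuals[i]]
        shift = median(col_effects)
        overall += shift
        col_effects = [effect - shift for effect in col_effects]
        # Sweep column medians out of the residuals into the column effects.
        col_medians = [
            median(residuals[i][j] for i in range(n_rows)) for j in range(n_cols)
        ]
        for j in range(n_cols):
            col_effects[j] += col_medians[j]
            for i in range(n_rows):
                residuals[i][j] -= col_medians[j]
        shift = median(row_effects)
        overall += shift
        row_effects = [effect - shift for effect in row_effects]
        # Stop once a full sweep no longer changes anything noticeably.
        if max(abs(m) for m in row_medians + col_medians) < tol:
            break
    return overall, row_effects, col_effects, residuals


# Toy probeset: 3 probes (rows) by 3 arrays (columns), purely additive.
toy_table = [[2, 4, 6], [3, 5, 7], [4, 6, 8]]
toy_overall, toy_rows, toy_cols, toy_resid = median_polish(toy_table)
```

Under the stated layout, the per-array summary of the probeset would be the overall value plus each column effect. The toy table is purely additive, so all residuals vanish; with real probe data the residuals capture probe-level disagreement.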
|
121
|
Structural analysis of the RZZ complex reveals common ancestry with multisubunit vesicle tethering machinery. Structure 2010; 18:616-26. [PMID: 20462495 DOI: 10.1016/j.str.2010.02.014] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 10/02/2009] [Revised: 01/22/2010] [Accepted: 02/19/2010] [Indexed: 01/31/2023]
Abstract
The RZZ complex recruits dynein to kinetochores. We investigated structure, topology, and interactions of the RZZ subunits (ROD, ZWILCH, and ZW10) in vitro, in vivo, and in silico. We identify neuroblastoma-amplified gene (NAG), a ZW10 binder, as a ROD homolog. ROD and NAG contain an N-terminal beta propeller followed by an alpha solenoid, which is the architecture of certain nucleoporins and vesicle coat subunits, suggesting a distant evolutionary relationship. ZW10 binding to ROD and NAG is mutually exclusive. The resulting ZW10 complexes (RZZ and NRZ) respectively contain ZWILCH and RINT1 as additional subunits. The X-ray structure of ZWILCH, the first for an RZZ subunit, reveals a novel fold distinct from RINT1's. The evolutionarily conserved NRZ likely acts as a tethering complex for retrograde trafficking of COPI vesicles from the Golgi to the endoplasmic reticulum. The RZZ, limited to metazoans, probably evolved from the NRZ, exploiting the dynein-binding capacity of ZW10 to direct dynein to kinetochores.
|
122
|
Robin: an intuitive wizard application for R-based expression microarray quality assessment and analysis. PLANT PHYSIOLOGY 2010; 153:642-51. [PMID: 20388663 PMCID: PMC2879776 DOI: 10.1104/pp.109.152553] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Received: 12/22/2009] [Accepted: 04/06/2010] [Indexed: 05/17/2023]
Abstract
The wide application of high-throughput transcriptomics using microarrays has generated a plethora of technical platforms, data repositories, and sophisticated statistical analysis methods, leaving the individual scientist with the problem of choosing the appropriate approach to address a biological question. Several software applications that provide a rich environment for microarray analysis and data storage are available (e.g. GeneSpring, EMMA2), but these are mostly commercial or require an advanced informatics infrastructure. There is a need for a noncommercial, easy-to-use graphical application that helps the lab researcher find the proper method to analyze microarray data without requiring expert understanding of the complex underlying statistics or programming skills. We have developed Robin, a Java-based graphical wizard application that harnesses the advanced statistical analysis functions of the R/BioConductor project. Robin implements streamlined workflows that guide the user through all steps of two-color, single-color, or Affymetrix microarray analysis. It provides functions for thorough quality assessment of the data and automatically generates warnings to notify the user of potential outliers, low-quality chips, or low statistical power. The results are generated in a standard format that allows ready use with both specialized analysis tools like MapMan and PageMan and generic spreadsheet applications. To further improve user friendliness, Robin includes both integrated help and comprehensive external documentation. To demonstrate the statistical power and ease of use of the workflows in Robin, we present a case study in which we apply Robin to analyze a two-color microarray experiment comparing gene expression in tomato (Solanum lycopersicum) leaves, flowers, and roots.
|
123
|
Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. PLANT, CELL & ENVIRONMENT 2009; 32:1633-51. [PMID: 19712066 DOI: 10.1111/j.1365-3040.2009.02040.x] [Citation(s) in RCA: 323] [Impact Index Per Article: 21.5] [Indexed: 05/17/2023]
Abstract
Gene co-expression analysis has emerged in the past 5 years as a powerful tool for gene function prediction. In essence, co-expression analysis asks the question 'what are the genes that are co-expressed, that is, those that show similar expression profiles across many experiments, with my gene of interest?'. Genes that are highly co-expressed may be involved in the biological process or processes of the query gene. This review describes the tools that are available for performing such analyses and how each of these performs, and also discusses statistical issues including how normalization of gene expression data can influence co-expression results, calculation of co-expression scores and P values, and the influence of data sets used for co-expression analysis. Finally, examples from the literature will be presented, wherein co-expression has been used to corroborate and discover various aspects of plant biology.
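The query described in this abstract (which genes show expression profiles similar to a gene of interest?) can be illustrated with a small sketch. It assumes Pearson correlation as the co-expression score, which is one common choice and not necessarily the score used by any tool covered in the review. The function name `coexpressed_with`, the gene labels, and the toy values are hypothetical.

```python
import numpy as np

def coexpressed_with(expr, gene_ids, query, top_n=3):
    """Rank genes by Pearson correlation with a query gene.

    expr: 2-D array, one row per gene and one column per experiment.
    gene_ids: list of gene labels in the same order as the rows of expr.
    Note: a gene with a constant profile yields NaN correlations.
    """
    expr = np.asarray(expr, dtype=float)
    q = gene_ids.index(query)
    # np.corrcoef treats each row as one variable, so row q holds the
    # correlation of the query gene with every gene.
    r = np.corrcoef(expr)[q]
    order = np.argsort(-r)  # highest correlation first
    ranked = [(gene_ids[i], float(r[i])) for i in order if i != q]
    return ranked[:top_n]

# Toy data: 4 hypothetical genes measured in 4 experiments.
genes = ["geneA", "geneB", "geneC", "geneD"]
profiles = [[1.0, 2.0, 3.0, 4.0],
            [2.0, 4.0, 6.0, 8.5],
            [4.0, 3.0, 2.0, 1.0],
            [1.0, 3.0, 2.0, 4.0]]
neighbours = coexpressed_with(profiles, genes, "geneA")
```

Because the score is computed on whatever expression values are supplied, the ranking inherits any bias introduced by normalization, which is one of the caveats the abstract raises.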
|
124
|
Abstract
Transcriptional coordination, or co-expression, of genes may signify functional relatedness of the corresponding proteins. For example, several genes involved in secondary cell wall cellulose biosynthesis are co-expressed with genes engaged in the synthesis of xylan, which is a major component of the secondary cell wall. To extend these types of analyses, we investigated the co-expression relationships of all Carbohydrate-Active enZYmes (CAZy)-related genes for Arabidopsis thaliana. Thus, the intention was to transcriptionally link different cell wall-related processes to each other, and also to other biological functions. To facilitate easy manual inspection, we have displayed these interactions as networks and matrices, and created a web-based interface (http://aranet.mpimp-golm.mpg.de/corecarb) containing downloadable files for all the transcriptional associations.
|
125
|
Low duplicability and network fragility of cancer genes. Trends Genet 2008; 24:427-30. [PMID: 18675489 DOI: 10.1016/j.tig.2008.06.003] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Received: 02/18/2008] [Revised: 06/19/2008] [Accepted: 06/24/2008] [Indexed: 11/24/2022]
Abstract
We identified genomic and network properties of approximately 600 genes mutated in different cancer types. These genes tend not to duplicate but, unlike most human singletons, they encode central hubs of highly interconnected modules within the protein-protein interaction network (PIN). We find that cancer genes are fragile components of the human gene repertoire, sensitive to dosage modification. Furthermore, other nodes of the human PIN with similar properties are rare and probably enriched in candidate cancer genes.
|