1. Taylor SC, Nadeau K, Abbasi M, Lachance C, Nguyen M, Fenrich J. The Ultimate qPCR Experiment: Producing Publication Quality, Reproducible Data the First Time. Trends Biotechnol 2019; 37:761-774. [PMID: 30654913 DOI: 10.1016/j.tibtech.2018.12.002] [Citation(s) in RCA: 482] [Impact Index Per Article: 80.3] [Received: 10/01/2018] [Revised: 11/30/2018] [Accepted: 12/07/2018] [Indexed: 12/20/2022]
Abstract
Quantitative PCR (qPCR) is one of the most common techniques for quantification of nucleic acid molecules in biological and environmental samples. Although the methodology is perceived to be relatively simple, there are a number of steps and reagents that require optimization and validation to ensure reproducible data that accurately reflect the biological question(s) being posed. This review article describes and illustrates the critical pitfalls and sources of error in qPCR experiments, along with a rigorous, stepwise process to minimize variability, time, and cost in generating reproducible, publication-quality data every time. Finally, an approach to make an informed choice between qPCR and digital PCR technologies is described.
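The review does not itself supply code, but one routine validation step in qPCR assay development is estimating amplification efficiency from a serial-dilution standard curve. The sketch below assumes the commonly used relation E = 10^(-1/slope) - 1 for a least-squares fit of Cq against log10 input; the function name and the dilution data are hypothetical.

```python
def standard_curve_efficiency(log10_input, cq):
    """Least-squares fit of Cq = slope * log10(input) + intercept.

    Returns (slope, intercept, efficiency) with efficiency = 10**(-1/slope) - 1,
    so a slope of about -3.32 corresponds to ~100% efficiency (perfect doubling).
    """
    n = len(log10_input)
    if n != len(cq) or n < 2:
        raise ValueError("need at least two paired (log10 input, Cq) points")
    mean_x = sum(log10_input) / n
    mean_y = sum(cq) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_input)
    if sxx == 0:
        raise ValueError("input amounts must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_input, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, efficiency


# Hypothetical 10-fold dilution series (log10 copies) and measured Cq values.
slope, intercept, eff = standard_curve_efficiency([6, 5, 4, 3, 2], [15.1, 18.4, 21.8, 25.1, 28.4])
print(f"slope={slope:.3f}, efficiency={eff:.1%}")
```

Fitting Cq against log10 input, rather than the reverse, keeps the slope directly interpretable as cycles per 10-fold dilution.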
2. Kohl SM, Klein MS, Hochrein J, Oefner PJ, Spang R, Gronwald W. State-of-the art data normalization methods improve NMR-based metabolomic analysis. Metabolomics 2012; 8:146-160. [PMID: 22593726 PMCID: PMC3337420 DOI: 10.1007/s11306-011-0350-z] [Citation(s) in RCA: 148] [Impact Index Per Article: 11.4] [Received: 04/05/2011] [Accepted: 08/01/2011] [Indexed: 12/20/2022]
Abstract
Extracting biomedical information from large metabolomic datasets by multivariate data analysis is of considerable complexity. Common challenges include, among others, screening for differentially produced metabolites, estimation of fold changes, and sample classification. Prior to these analysis steps, it is important to minimize contributions from unwanted biases and experimental variance. This is the goal of data preprocessing. In this work, different data normalization methods were compared systematically using two different datasets generated by nuclear magnetic resonance (NMR) spectroscopy. To this end, two types of normalization methods were used: one aiming to remove unwanted sample-to-sample variation, the other adjusting the variance of the different metabolites by variable scaling and variance stabilization methods. The impact of all methods tested on sample classification was evaluated on urinary NMR fingerprints obtained from healthy volunteers and patients suffering from autosomal dominant polycystic kidney disease (ADPKD). Performance in screening for differentially produced metabolites was investigated on a dataset following a Latin-square design, in which varied amounts of 8 different metabolites were spiked into a human urine matrix while the total spike-in amount was kept constant. In addition, specific tests were conducted to systematically investigate the influence of the different preprocessing methods on the structure of the analyzed data. In conclusion, preprocessing methods originally developed for DNA microarray analysis, in particular Quantile and Cubic-Spline Normalization, performed best in reducing bias, accurately detecting fold changes, and classifying samples. Electronic supplementary material: The online version of this article (doi:10.1007/s11306-011-0350-z) contains supplementary material, which is available to authorized users.
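Quantile normalization, one of the two best performers in this comparison, forces every sample to share the same value distribution. A minimal sketch of the textbook algorithm follows (not the authors' implementation); it assumes each sample is a list of binned spectral intensities of equal length, and it breaks ties by sort order rather than averaging tied ranks.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (each a list of floats of equal length).

    Every sample is mapped onto a shared reference distribution: the mean of
    the k-th smallest value across all samples. Ties are broken by sort order.
    """
    if not samples:
        return []
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of features")
    # Reference distribution: mean of the sorted values, position by position.
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(col[k] for col in sorted_samples) / len(samples) for k in range(n)]
    result = []
    for s in samples:
        # Feature indices of this sample, from smallest to largest value.
        order = sorted(range(n), key=lambda i: s[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        result.append(normalized)
    return result


# Two hypothetical spectra with three bins each.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

Only the rank of a bin within its own sample survives; that is why the method removes sample-wide intensity differences but can also flatten genuine global shifts.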
3. Fromer M, Purcell SM. Using XHMM Software to Detect Copy Number Variation in Whole-Exome Sequencing Data. Curr Protoc Hum Genet 2014; 81:7.23.1-7.23.21. [PMID: 24763994 PMCID: PMC4065038 DOI: 10.1002/0471142905.hg0723s81] [Citation(s) in RCA: 111] [Impact Index Per Article: 10.1] [Indexed: 01/06/2023]
Abstract
Copy number variation (CNV) has emerged as an important genetic component in human diseases, which are increasingly being studied for large numbers of samples by sequencing the coding regions of the genome, i.e., exome sequencing. Nonetheless, detecting this variation from such targeted sequencing data is a difficult task, involving sorting out signal from noise, for which we have recently developed a set of statistical and computational tools called XHMM. In this unit, we give detailed instructions on how to run XHMM and how to use the resulting CNV calls in biological analyses.
4. Vigelsø A, Dybboe R, Hansen CN, Dela F, Helge JW, Guadalupe Grau A. GAPDH and β-actin protein decreases with aging, making Stain-Free technology a superior loading control in Western blotting of human skeletal muscle. J Appl Physiol (1985) 2014; 118:386-94. [PMID: 25429098 DOI: 10.1152/japplphysiol.00840.2014] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Indexed: 02/02/2023]
Abstract
Reference proteins (RP) or the total protein (TP) loaded are used to correct for uneven loading and/or transfer in Western blotting. However, signal sensitivity and the influence of physiological conditions may call these normalization methods into question. Therefore, three widely used reference proteins [β-actin, glyceraldehyde 3-phosphate dehydrogenase (GAPDH), and α-tubulin], as well as TP loaded measured by Stain-Free technology (SF), were tested as normalization tools. This was done using skeletal muscle samples from men subjected to physiological conditions often investigated in applied physiology where the intervention has been suggested to impede normalization (ageing, muscle atrophy, and different muscle fiber type composition). The linearity of signal and the methodological coefficient of variation were obtained. Furthermore, the inter- and intraindividual variation in signals obtained from SF and RP was measured in relation to ageing, muscle atrophy, and different muscle fiber type composition, respectively. Stronger linearity was observed for SF and β-actin than for GAPDH and α-tubulin. The methodological variation was relatively low in all four methods (4-11%). Protein levels of β-actin and GAPDH were lower in older men than in young men. In conclusion, β-actin, GAPDH, and α-tubulin should not be used for normalization in studies that include subjects with a large age difference. In contrast, the RPs may not be affected in studies that include muscle wasting and differences in muscle fiber type. The novel SF technology adds less variation to the results than the existing methods for correcting loading inaccuracy in Western blotting of human skeletal muscle in applied physiology.
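The two quantities this abstract relies on, a loading-corrected signal and a methodological coefficient of variation, are simple to compute. A minimal sketch, assuming one target-band intensity and one Stain-Free total-protein lane intensity per lane; the function names and numbers are hypothetical and this is not the authors' analysis pipeline.

```python
import statistics


def normalize_to_total_protein(target_signal, total_protein_signal):
    """Divide each lane's target-band intensity by that lane's total-protein signal."""
    if len(target_signal) != len(total_protein_signal):
        raise ValueError("one total-protein value per lane is required")
    return [t / tp for t, tp in zip(target_signal, total_protein_signal)]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical replicate lanes of one lysate, loaded unevenly.
target = [100.0, 150.0, 200.0]
total_protein = [1.0, 1.5, 2.0]
corrected = normalize_to_total_protein(target, total_protein)
print(corrected, coefficient_of_variation(corrected))
```

Replacing `total_protein` with a reference-protein band gives the RP variant; the abstract's point is that the RP denominator can itself change with age, which this arithmetic cannot detect.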
5. Mangat CS, Bharat A, Gehrke SS, Brown ED. Rank ordering plate data facilitates data visualization and normalization in high-throughput screening. J Biomol Screen 2014; 19:1314-20. [PMID: 24828052 DOI: 10.1177/1087057114534298] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Indexed: 11/16/2022]
Abstract
High-throughput screening (HTS) of chemical and microbial strain collections is an indispensable tool for modern chemical and systems biology; however, HTS data sets have inherent systematic and random error, which may lead to false-positive or false-negative results. Several methods of normalization of data exist; nevertheless, due to the limitations of each, no single method has been universally adopted. Here, we present a method of data visualization and normalization that is effective, intuitive, and easy to implement in a spreadsheet program. For each plate, the data are ordered by ascending values and a plot thereof yields a curve that is a signature of the plate data. Curve shape characteristics provide intuitive visualization of the frequency and strength of inhibitors, activators, and noise on the plate, allowing potentially problematic plates to be flagged. To reduce plate-to-plate variation, the data can be normalized by the mean of the middle 50% of ordered values, also called the interquartile mean (IQM) or the 50% trimmed mean of the plate. Positional effects due to bias in columns, rows, or wells can be corrected using the interquartile mean of each well position across all plates (IQMW) as a second level of normalization. We illustrate the utility of this method using data sets from biochemical and phenotypic screens.
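The two-level IQM/IQMW scheme described above can be sketched as follows. Assumptions: each plate is a list of well values in a fixed well order; both levels divide by the relevant IQM (the abstract says "normalized by", which is read here as division); and when the count is not a multiple of 4, whole values are trimmed rather than weighting partial observations. Function names are hypothetical.

```python
def interquartile_mean(values):
    """Mean of the middle 50% of the sorted values.

    Simplification: floor(n / 4) whole values are trimmed from each end.
    """
    ordered = sorted(values)
    k = len(ordered) // 4
    middle = ordered[k:len(ordered) - k]
    return sum(middle) / len(middle)


def normalize_plates(plates):
    """Two-level normalization of HTS plates (each a list of well values, same layout).

    Level 1 (plate): divide each well by the IQM of its plate.
    Level 2 (position): divide by the IQM of that well position across all plates (IQMW).
    """
    level1 = [[v / interquartile_mean(plate) for v in plate] for plate in plates]
    n_wells = len(level1[0])
    iqmw = [interquartile_mean([plate[w] for plate in level1]) for w in range(n_wells)]
    return [[plate[w] / iqmw[w] for w in range(n_wells)] for plate in level1]


# Hypothetical 8-well plates; the second has a stronger overall signal.
plates = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 40.0],
          [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 80.0]]
print(normalize_plates(plates))
```

Because the IQM ignores the top and bottom quarters, a few strong hits (like the last well above) do not drag the plate's scaling factor, which is the robustness argument made in the abstract.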
6. Krasnov GS, Kudryavtseva AV, Snezhkina AV, Lakunina VA, Beniaminov AD, Melnikova NV, Dmitriev AA. Pan-Cancer Analysis of TCGA Data Revealed Promising Reference Genes for qPCR Normalization. Front Genet 2019; 10:97. [PMID: 30881377 PMCID: PMC6406071 DOI: 10.3389/fgene.2019.00097] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 10/31/2018] [Accepted: 01/29/2019] [Indexed: 11/20/2022]
Abstract
Quantitative PCR (qPCR) remains the most widely used technique for gene expression evaluation. Obtaining reliable data with this method requires reference genes (RGs) with stable mRNA levels under the experimental conditions. This issue is especially crucial in cancer studies because each tumor has a unique molecular portrait. The Cancer Genome Atlas (TCGA) project provides RNA-Seq data for thousands of samples corresponding to dozens of cancers and provides a basis for assessing the suitability of genes as reference genes for qPCR data normalization. Using TCGA RNA-Seq data and the previously developed CrossHub tool, we evaluated the mRNA levels of 32 traditionally used RGs in 12 cancer types, including those of lung, breast, prostate, kidney, and colon. We developed an 11-component scoring system for the assessment of gene expression stability. Among the 32 genes, PUM1 was one of the most stably expressed in the majority of examined cancers, whereas GAPDH, which is widely used as an RG, showed significant mRNA level alterations in more than half of the cases. For each of the 12 cancer types, we suggested a pair of genes that are the most suitable for use as reference genes. These genes are characterized by high expression stability and absence of correlation between their mRNA levels. Next, the scoring system was expanded with several features of a gene: mutation rate, number of transcript isoforms and pseudogenes, participation in cancer-related processes on the basis of Gene Ontology, and mentions in PubMed-indexed articles. All genes covered by RNA-Seq data in TCGA were analyzed using the expanded scoring system, which allowed us to reveal novel promising RGs for each examined cancer type and to identify several "universal" pan-cancer RG candidates, including SF3A1, CIAO1, and SFRS4. The choice of RGs is the basis for precise gene expression evaluation by qPCR.
Here, we suggested optimal pairs of traditionally used RGs for 12 cancer types and identified novel promising RGs that demonstrate high expression stability and other features of reliable and convenient RGs (high expression level, low mutation rate, non-involvement in cancer-related processes, single transcript isoform, and absence of pseudogenes).
7. Hochrein J, Zacharias HU, Taruttis F, Samol C, Engelmann JC, Spang R, Oefner PJ, Gronwald W. Data Normalization of ¹H NMR Metabolite Fingerprinting Data Sets in the Presence of Unbalanced Metabolite Regulation. J Proteome Res 2015; 14:3217-28. [PMID: 26147738 DOI: 10.1021/acs.jproteome.5b00192] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 01/06/2023]
Abstract
Data normalization is an essential step in NMR-based metabolomics. Conducted properly, it improves data quality and removes unwanted biases. The choice of the appropriate normalization method is critical and depends on the inherent properties of the data set in question. In particular, the presence of unbalanced metabolic regulation, where the different specimens and cohorts under investigation do not contain approximately equal shares of up- and down-regulated features, may strongly influence data normalization. Here, we demonstrate the suitability of the Shapiro-Wilk test to detect such unbalanced regulation. Next, employing a Latin-square design consisting of eight metabolites spiked into a urine specimen at eight different known concentrations, we show that commonly used normalization and scaling methods fail to retrieve true metabolite concentrations in the presence of increasing amounts of glucose added to simulate unbalanced regulation. However, by learning the normalization parameters on a subset of nonregulated features only, Linear Baseline Normalization, Probabilistic Quotient Normalization, and Variance Stabilization Normalization were found to account well for different dilutions of the samples without distorting the true spike-in levels, even in the presence of marked unbalanced metabolic regulation. Finally, the methods described were applied successfully to a real-world example of unbalanced regulation, namely, a set of plasma specimens collected from patients with and without acute kidney injury after cardiac surgery with cardiopulmonary bypass.
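Probabilistic Quotient Normalization estimates one dilution factor per sample as the median of its feature-wise quotients to a reference spectrum. A minimal sketch follows; the `stable_features` argument stands in for the "subset of nonregulated features" described above (how that subset is selected is not shown), and the total-area pre-step and feature-wise median reference are common choices assumed here, not details taken from the paper.

```python
import statistics


def pqn(spectra, stable_features=None):
    """Probabilistic Quotient Normalization (sketch).

    spectra: list of samples, each a list of positive feature intensities of equal length.
    stable_features: optional indices of features assumed NOT to be regulated; if given,
        the dilution factor is estimated from these features only.
    """
    # Assumed pre-step: scale every sample to unit total area.
    scaled = []
    for s in spectra:
        total = sum(s)
        scaled.append([v / total for v in s])
    n_feat = len(scaled[0])
    # Reference spectrum: feature-wise median across all samples.
    reference = [statistics.median([s[j] for s in scaled]) for j in range(n_feat)]
    features = range(n_feat) if stable_features is None else stable_features
    normalized = []
    for s in scaled:
        # The median quotient to the reference is the estimated dilution factor.
        quotients = [s[j] / reference[j] for j in features if reference[j] > 0]
        dilution = statistics.median(quotients)
        normalized.append([v / dilution for v in s])
    return normalized


# Hypothetical: sample 2 is twice as concentrated AND has one strongly up-regulated feature.
print(pqn([[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 10.0]]))
```

In the example, total-area scaling alone would make the three unchanged features look lower in sample 2; the median quotient largely ignores the single regulated feature and restores them.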
8. Zacharias HU, Altenbuchinger M, Gronwald W. Statistical Analysis of NMR Metabolic Fingerprints: Established Methods and Recent Advances. Metabolites 2018; 8:E47. [PMID: 30154338 PMCID: PMC6161311 DOI: 10.3390/metabo8030047] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/02/2018] [Revised: 08/01/2018] [Accepted: 08/18/2018] [Indexed: 01/02/2023]
Abstract
In this review, we summarize established and recent bioinformatic and statistical methods for the analysis of NMR-based metabolomics. Data analysis of NMR metabolic fingerprints exhibits several challenges, including unwanted biases, high dimensionality, and typically low sample numbers. Common analysis tasks comprise the identification of differential metabolites and the classification of specimens. However, analysis results strongly depend on the preprocessing of the data, and there is no consensus yet on how to remove unwanted biases and experimental variance prior to statistical analysis. Here, we first review established and new preprocessing protocols and illustrate their pros and cons, including different data normalizations and transformations. Second, we give a brief overview of state-of-the-art statistical analysis in NMR-based metabolomics. Finally, we discuss a recent development in statistical data analysis, where data normalization becomes obsolete. This method, called zero-sum regression, builds metabolite signatures whose estimation as well as predictions are independent of prior normalization.
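Why can zero-sum regression make prior normalization unnecessary? For a linear model on log intensities, multiplying every feature of a sample by a dilution factor c adds log(c) times the sum of the coefficients to the prediction, which vanishes when the coefficients sum to zero. The sketch below only demonstrates this invariance with hypothetical coefficients; it does not fit a zero-sum model, and the function names are illustrative.

```python
import math


def predict(intercept, coefs, intensities):
    """Linear model on log-transformed intensities: b0 + sum_j b_j * log(x_j)."""
    return intercept + sum(b * math.log(x) for b, x in zip(coefs, intensities))


def is_zero_sum(coefs, tol=1e-12):
    """True if the coefficients sum to (numerically) zero."""
    return abs(sum(coefs)) < tol


# Hypothetical coefficients and one sample measured at two dilutions (factor 7).
coefs = [0.5, -0.2, -0.3]           # sums to zero
sample = [2.0, 3.0, 4.0]
diluted = [7 * x for x in sample]   # same composition, different total concentration
print(is_zero_sum(coefs), predict(0.1, coefs, sample), predict(0.1, coefs, diluted))
```

With coefficients that do not sum to zero, the two predictions differ by exactly log(7) times that sum, so the model would then depend on how samples were normalized.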
9. Kubinski R, Djamen-Kepaou JY, Zhanabaev T, Hernandez-Garcia A, Bauer S, Hildebrand F, Korcsmaros T, Karam S, Jantchou P, Kafi K, Martin RD. Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease. Front Genet 2022; 13:784397. [PMID: 35251123 PMCID: PMC8895431 DOI: 10.3389/fgene.2022.784397] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/27/2021] [Accepted: 01/13/2022] [Indexed: 12/14/2022]
Abstract
Patients with inflammatory bowel disease (IBD) wait months and undergo numerous invasive procedures between the initial appearance of symptoms and receiving a diagnosis. To reduce the time until diagnosis and improve patient wellbeing, machine learning algorithms capable of diagnosing IBD from the composition of the gut microbiome are currently being explored. To date, these models have had limited clinical application because their performance decreases when they are applied to a new cohort of patient samples. Various methods have been developed to analyze microbiome data, and these may improve the generalizability of machine learning IBD diagnostic tests. Given this abundance of methods, there is a need to benchmark the performance and generalizability of various machine learning pipelines (from data processing to training a machine learning model) for microbiome-based IBD diagnostic tools. We collected fifteen 16S rRNA microbiome datasets (7,707 samples) from North America to benchmark combinations of gut microbiome features, data normalization and transformation methods, batch effect correction methods, and machine learning models. Pipeline generalizability to new cohorts of patients was evaluated with two binary classification metrics following leave-one-dataset-out (LODO) cross-validation, in which all samples from one study were left out of the training set and used for testing. We demonstrate that taxonomic features processed with a compositional transformation method and batch effect correction with the naive zero-centering method attain the best classification performance. In addition, machine learning models that identify non-linear decision boundaries between labels are more generalizable than those that are linearly constrained. Lastly, we illustrate the importance of generating a curated training dataset to ensure similar performance across patient demographics.
These findings will help improve the generalizability of machine learning models as we move towards non-invasive diagnostic and disease management tools for patients with IBD.
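The preprocessing combination reported as best, a compositional transformation followed by naive zero-centering, and the LODO evaluation can be sketched as follows. Assumptions: the centered log-ratio (CLR) is used as one example of a compositional transformation (the paper benchmarks several); "naive zero-centering" is read as subtracting each dataset's per-feature mean; the pseudocount of 1.0 and all function names are hypothetical.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (assumed 1.0 here) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def zero_center_by_dataset(samples, dataset_ids):
    """Subtract each dataset's per-feature mean, so every feature has mean 0
    within every dataset (a naive batch-effect correction that uses no labels)."""
    groups = {}
    for row, ds in zip(samples, dataset_ids):
        groups.setdefault(ds, []).append(row)
    means = {ds: [sum(col) / len(col) for col in zip(*rows)] for ds, rows in groups.items()}
    return [[v - m for v, m in zip(row, means[ds])] for row, ds in zip(samples, dataset_ids)]


def lodo_splits(dataset_ids):
    """Yield (held_out_dataset, train_indices, test_indices), one split per dataset."""
    for held_out in sorted(set(dataset_ids)):
        train = [i for i, d in enumerate(dataset_ids) if d != held_out]
        test = [i for i, d in enumerate(dataset_ids) if d == held_out]
        yield held_out, train, test


# Hypothetical counts for 3 taxa in 4 samples from 2 studies.
counts = [[10, 0, 90], [20, 5, 75], [3, 1, 6], [8, 2, 10]]
studies = ["study_a", "study_a", "study_b", "study_b"]
features = zero_center_by_dataset([clr(c) for c in counts], studies)
for held_out, train_idx, test_idx in lodo_splits(studies):
    print(held_out, train_idx, test_idx)
```

Because zero-centering needs no labels, the held-out study can be centered on its own mean before prediction without leaking outcome information.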
10. Benedetti E, Gerstner N, Pučić-Baković M, Keser T, Reiding KR, Ruhaak LR, Štambuk T, Selman MH, Rudan I, Polašek O, Hayward C, Beekman M, Slagboom E, Wuhrer M, Dunlop MG, Lauc G, Krumsiek J. Systematic Evaluation of Normalization Methods for Glycomics Data Based on Performance of Network Inference. Metabolites 2020; 10:E271. [PMID: 32630764 PMCID: PMC7408386 DOI: 10.3390/metabo10070271] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/21/2020] [Revised: 05/29/2020] [Accepted: 06/04/2020] [Indexed: 01/15/2023]
Abstract
Glycomics measurements, like all other high-throughput technologies, are subject to technical variation due to fluctuations in the experimental conditions. The removal of this non-biological signal from the data is referred to as normalization. In contrast to other omics data types, no systematic evaluation of normalization options for glycomics data has been published so far. In this paper, we assess the quality of different normalization strategies for glycomics data with an innovative approach. It has been shown previously that Gaussian Graphical Models (GGMs) inferred from glycomics data are able to identify enzymatic steps in the glycan synthesis pathways in a data-driven fashion. Based on this finding, we quantify the quality of a given normalization method according to how well a GGM inferred from the respective normalized data reconstructs known synthesis reactions in the glycosylation pathway. The method therefore exploits a biological measure of goodness. We analyzed 23 different normalization combinations applied to six large-scale glycomics cohorts across three experimental platforms: Liquid Chromatography - ElectroSpray Ionization - Mass Spectrometry (LC-ESI-MS), Ultra High Performance Liquid Chromatography with Fluorescence Detection (UHPLC-FLD), and Matrix Assisted Laser Desorption Ionization - Fourier Transform Ion Cyclotron Resonance - Mass Spectrometry (MALDI-FTICR-MS). Based on our results, we recommend normalizing glycan data using the 'Probabilistic Quotient' method followed by log-transformation, irrespective of the measurement platform. This recommendation is further supported by an additional analysis in which we ranked normalization methods based on their statistical associations with age, a factor known to associate with glycomics measurements.
11. Philips A, Nowis K, Stelmaszczuk M, Jackowiak P, Podkowiński J, Handschuh L, Figlerowicz M. Expression Landscape of circRNAs in Arabidopsis thaliana Seedlings and Adult Tissues. Front Plant Sci 2020; 11:576581. [PMID: 33014000 PMCID: PMC7511659 DOI: 10.3389/fpls.2020.576581] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/26/2020] [Accepted: 08/25/2020] [Indexed: 05/27/2023]
Abstract
RNA-seq is currently the only method that can provide a comprehensive landscape of circular RNAs (circRNAs) in the whole organism and its particular organs. Recent years have brought an increasing number of RNA-seq-based reports on plant circRNAs. Notably, the picture they reveal is questionable and depends on the circRNA identification and quantification techniques applied. Consequently, little is known about the biogenesis and functions of circRNAs in plants. In this work, we tested two experimental and six bioinformatics procedures of circRNA analysis to determine the optimal approach for studying the profiles of circRNAs in Arabidopsis thaliana. Then, using the optimized strategy, we determined the accumulation of circular and corresponding linear transcripts in plant seedlings and organs. We observed that only a small fraction of circRNAs was reproducibly generated. Among them, two groups of circRNAs were discovered: ubiquitous and organ-specific. The highest number of circRNAs with significantly increased accumulation compared with other organs/seedlings was found in roots. The circRNAs in seedlings, leaves, and flowers originated mainly from genes involved in photosynthesis and the response to stimulus. The levels of circular and linear transcripts were not correlated. Although RNase R treatment enriches the analyzed RNA samples in circular transcripts, it may also have a negative impact on the stability of some circRNAs. We also showed that normalizing NGS data by library size is not appropriate for circRNA quantification. As alternatives, we proposed four other normalization types whose accuracy was confirmed by ddPCR. Moreover, we provided a comprehensive characterization of circRNAs in A. thaliana organs and in seedlings. Our analyses revealed that plant circRNAs are formed in both stochastic and controlled processes. The latter are less frequent and likely engage circRNA-specific mechanisms. Only a few circRNAs were organ-specific.
The lack of correlation between the accumulation of linear and circular transcripts indicated that their biogenesis depends on different mechanisms.
12. Ivanova L, Rangel-Huerta OD, Tartor H, Gjessing MC, Dahle MK, Uhlig S. Fish Skin and Gill Mucus: A Source of Metabolites for Non-Invasive Health Monitoring and Research. Metabolites 2021; 12:28. [PMID: 35050150 PMCID: PMC8781917 DOI: 10.3390/metabo12010028] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/24/2021] [Revised: 12/16/2021] [Accepted: 12/25/2021] [Indexed: 11/28/2022]
Abstract
Mucous membranes such as the gill and skin mucosa in fish protect them against a multitude of environmental factors. At the same time, changes in the molecular composition of mucus may provide valuable information about the interaction of the fish with their environment, as well as their health and welfare. In this study, the metabolite profiles of the plasma, skin mucus, and gill mucus of freshwater Atlantic salmon (Salmo salar) were compared using liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS). Several normalization procedures aimed at reducing unwanted variation in the untargeted data were tested. In addition, the basal metabolism of skin and gills and the impact of benzocaine, the anesthetic used for euthanasia, were studied. For targeted metabolomics, the commercial AbsoluteIDQ p400 HR kit was used to evaluate potential differences in metabolic composition between epidermal mucus and plasma. The targeted metabolomics data showed a high level of correlation between different types of biological fluids from the same individual, indicating that mucus metabolite composition could be used for fish health monitoring and research.
13. Haering M, Habermann BH. RNfuzzyApp: an R shiny RNA-seq data analysis app for visualisation, differential expression analysis, time-series clustering and enrichment analysis. F1000Res 2021; 10:654. [PMID: 35186266 PMCID: PMC8825645 DOI: 10.12688/f1000research.54533.1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 07/20/2021] [Indexed: 09/23/2023]
Abstract
RNA sequencing (RNA-seq) is a widely adopted, affordable method for large-scale gene expression profiling. However, user-friendly and versatile tools that let wet-lab biologists analyse RNA-seq data beyond standard analyses such as differential expression are rare. In particular, the analysis of time-series data is difficult for wet-lab biologists lacking advanced computational training. Furthermore, most meta-analysis tools are tailored to model organisms and are not easily adaptable to other species. With RNfuzzyApp, we provide a user-friendly, web-based R shiny app for differential expression analysis as well as time-series analysis of RNA-seq data. RNfuzzyApp offers several methods for normalization and differential expression analysis of RNA-seq data, providing easy-to-use toolboxes, interactive plots, and downloadable results. For time-series analysis, RNfuzzyApp presents the first web-based, fully automated pipeline for soft clustering with the Mfuzz R package, including methods to aid in cluster number selection, cluster overlap analysis, Mfuzz loop computations, and cluster enrichments. RNfuzzyApp is an intuitive, easy-to-use, and interactive R shiny app for RNA-seq differential expression and time-series analysis, offering a rich selection of interactive plots, providing a quick overview of raw data, and generating rapid analysis results. Furthermore, its orthology assignment, enrichment analysis, and ID conversion functions are accessible for non-model organisms.
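"Soft clustering" means each expression profile belongs to every cluster with a graded membership. Mfuzz is built on fuzzy c-means, and the sketch below shows only the standard fuzzy c-means membership formula for one profile given fixed centroids; it is not RNfuzzyApp or Mfuzz code, the iterative centroid updates are omitted, and the fuzzifier value and data are hypothetical.

```python
import math


def fuzzy_memberships(profile, centroids, m=2.0):
    """Fuzzy c-means membership of one profile in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with Euclidean distances d.
    m > 1 is the fuzzifier: larger m gives softer (more even) memberships.
    The memberships of one profile sum to 1.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be > 1")
    dists = [math.dist(profile, c) for c in centroids]
    # A profile lying exactly on a centroid belongs fully to that cluster.
    for i, d in enumerate(dists):
        if d == 0:
            return [1.0 if j == i else 0.0 for j in range(len(dists))]
    exponent = 2 / (m - 1)
    return [1 / sum((di / dk) ** exponent for dk in dists) for di in dists]


# Hypothetical standardized time course (4 time points) and two cluster centroids.
profile = [0.1, 0.5, 0.9, 1.2]
centroids = [[0.0, 0.4, 0.8, 1.2], [1.2, 0.8, 0.4, 0.0]]
print(fuzzy_memberships(profile, centroids))
```

A profile halfway between two centroids gets memberships near 0.5 each, which is the information that a hard clustering would discard.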
14. Zauber H, Schüler V, Schulze W. Systematic evaluation of reference protein normalization in proteomic experiments. Front Plant Sci 2013; 4:25. [PMID: 23450762 PMCID: PMC3583035 DOI: 10.3389/fpls.2013.00025] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/18/2012] [Accepted: 02/04/2013] [Indexed: 06/01/2023]
Abstract
Quantitative comparative analysis of protein abundances using peptide ion intensities and their modifications has become a widely used technique for studying various biological questions. In recent years, several methods for quantitative proteomics have been established using stable-isotope labeling and label-free approaches. We systematically evaluated the application of reference protein normalization (RPN) for proteomic experiments using a high mass accuracy LC-MS/MS platform. In RPN, all sample peptide intensities are normalized to the average protein intensity of a spiked reference protein. The main advantage of this method is that it avoids relative analysis based on the fraction of total signal, which is often strongly dependent on sample complexity. We show that reference protein ion intensity sums are sufficiently reproducible to ensure reliable normalization. We validated the RPN strategy by analyzing changes in protein abundances induced by nutrient starvation in Arabidopsis thaliana. Beyond that, we provide a general guideline for determining the optimal combination of sample protein and reference protein load on individual LC-MS/MS systems.
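A minimal sketch of reference protein normalization as described above. It assumes that "normalized to an average protein intensity of a spiked reference protein" means: sum the reference protein's peptide intensities in each sample, then rescale each sample so that this sum equals the mean reference sum across samples. The data layout, peptide IDs, and function name are hypothetical; this is not the authors' code.

```python
def reference_protein_normalize(samples, reference_peptides):
    """Rescale peptide intensities so every sample has the same reference-protein signal.

    samples: dict sample_name -> dict peptide_id -> intensity.
    reference_peptides: ids of peptides belonging to the spiked reference protein.
    Each sample is multiplied by (mean reference sum across samples) / (its own reference sum).
    """
    ref_sums = {
        name: sum(peps[p] for p in reference_peptides if p in peps)
        for name, peps in samples.items()
    }
    if any(total <= 0 for total in ref_sums.values()):
        raise ValueError("every sample needs a positive reference-protein signal")
    target = sum(ref_sums.values()) / len(ref_sums)
    return {
        name: {p: v * target / ref_sums[name] for p, v in peps.items()}
        for name, peps in samples.items()
    }


# Hypothetical run: sample s2 was injected at twice the amount of s1.
runs = {
    "s1": {"REF_pep1": 10.0, "REF_pep2": 10.0, "protX_pep1": 5.0},
    "s2": {"REF_pep1": 20.0, "REF_pep2": 20.0, "protX_pep1": 10.0},
}
print(reference_protein_normalize(runs, ["REF_pep1", "REF_pep2"]))
```

Unlike total-intensity scaling, the scaling factor here does not change when the rest of the proteome shifts, which is the complexity-independence the abstract emphasizes.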
15. Zheng H, Zhao H, Zhang X, Liang Z, He Q. Systematic Identification and Validation of Suitable Reference Genes for the Normalization of Gene Expression in Prunella vulgaris under Different Organs and Spike Development Stages. Genes (Basel) 2022; 13:1947. [PMID: 36360184 PMCID: PMC9689956 DOI: 10.3390/genes13111947] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/15/2022] [Revised: 10/19/2022] [Accepted: 10/24/2022] [Indexed: 08/01/2023]
Abstract
Quantitative real-time PCR (qRT-PCR) is an efficient and sensitive method for determining gene expression levels, but the accuracy of the results depends substantially on the stability of the reference gene (RG). Therefore, choosing an appropriate reference gene is a critical step in normalizing qRT-PCR data. Prunella vulgaris L. is a traditional Chinese medicinal herb widely used in China. Its main medicinal part is the fruiting spike, termed Spica Prunellae. However, few studies have so far addressed the mechanism of Spica Prunellae development, and no reliable RGs have been reported in P. vulgaris. In this study, the expression levels of 14 candidate RGs were analyzed in various organs and at different stages of Spica Prunellae development. Four statistical algorithms (Delta Ct, BestKeeper, NormFinder, and geNorm) were used to assess RG stability, and an integrated stability ranking was generated with the online RefFinder tool. The final ranking revealed that eIF-2 was the most stable RG, whereas VAB2 was the least suitable. Furthermore, eIF-2 + Histon3.3 was identified as the best RG combination across the different periods and across all samples. Finally, the expression of the PvTAT and Pv4CL2 genes, which are related to the regulation of rosmarinic acid synthesis, in different organs was used to verify the stable and unstable RGs. This work is the first to identify and verify stable RGs in P. vulgaris. It provides strong support for reliable qPCR analysis and lays the foundation for in-depth research on the developmental mechanism of Spica Prunellae.
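Of the four algorithms named above, geNorm's stability measure M is the easiest to state: for each candidate, average its pairwise variation (the standard deviation, across samples, of the log2 expression ratio) with every other candidate; lower M means more stable. A minimal sketch, assuming 100% amplification efficiency so that log2 ratios reduce to Cq differences; the gene names and Cq values are hypothetical, and this is not the geNorm or RefFinder implementation.

```python
import statistics


def genorm_m(cq):
    """geNorm-style stability value M for each candidate reference gene.

    cq: dict gene -> list of Cq values, one per sample, in the same sample order.
    Assumes 100% efficiency, so log2(expression_j / expression_k) = Cq_k - Cq_j,
    whose standard deviation equals that of the Cq differences.
    Returns dict gene -> M (mean pairwise variation); lower M = more stable.
    """
    genes = list(cq)
    if len(genes) < 2:
        raise ValueError("need at least two candidate genes")
    m_values = {}
    for j in genes:
        variations = []
        for k in genes:
            if k == j:
                continue
            diffs = [a - b for a, b in zip(cq[j], cq[k])]
            variations.append(statistics.stdev(diffs))
        m_values[j] = statistics.mean(variations)
    return m_values


# Hypothetical Cq values of three candidates across four samples.
candidates = {
    "geneA": [20.1, 21.0, 20.5, 20.8],
    "geneB": [24.0, 24.9, 24.3, 24.7],
    "geneC": [18.0, 19.5, 17.2, 18.9],
}
for gene, m in sorted(genorm_m(candidates).items(), key=lambda item: item[1]):
    print(gene, round(m, 3))
```

Because M is built from pairwise ratios, two candidates that rise and fall together will look stable relative to each other, so a ranking like this is usually cross-checked with other algorithms, as the study did.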
16. Herrmann HA, Rusz M, Baier D, Jakupec MA, Keppler BK, Berger W, Koellensperger G, Zanghellini J. Thermodynamic Genome-Scale Metabolic Modeling of Metallodrug Resistance in Colorectal Cancer. Cancers (Basel) 2021; 13:4130. [PMID: 34439283 PMCID: PMC8391396 DOI: 10.3390/cancers13164130] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/24/2021] [Revised: 07/23/2021] [Accepted: 08/03/2021] [Indexed: 12/11/2022]
Abstract
BACKGROUND Mass spectrometry-based metabolomics approaches provide an immense opportunity to enhance our understanding of the mechanisms that underpin the cellular reprogramming of cancers. Accurate comparative metabolic profiling of heterogeneous conditions, however, is still a challenge. METHODS Measuring both intracellular and extracellular metabolite concentrations, we constrain four instances of a thermodynamic genome-scale metabolic model of the HCT116 colorectal carcinoma cell line to compare the metabolic flux profiles of cells that are either sensitive or resistant to ruthenium- or platinum-based treatments with BOLD-100/KP1339 and oxaliplatin, respectively. RESULTS Normalizing according to growth rate and normalizing resistant cells according to their respective sensitive controls, we are able to dissect metabolic responses specific to the drug and to the resistance states. We find the normalization steps to be crucial in the interpretation of the metabolomics data and show that the metabolic reprogramming in resistant cells is limited to a select number of pathways. CONCLUSIONS Here, we elucidate the key importance of normalization steps in the interpretation of metabolomics data, allowing us to uncover drug-specific metabolic reprogramming during acquired metal-drug resistance.
research-article |
4 |
5 |
17
Comparison of 5 Normalization Methods for Knee Joint Moments in the Single-Leg Squat. J Appl Biomech 2022; 38:29-38. [PMID: 35042188 DOI: 10.1123/jab.2021-0143] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/27/2021] [Revised: 10/13/2021] [Accepted: 12/01/2021] [Indexed: 11/18/2022]
Abstract
Ratio scaling is the most common magnitude normalization approach for net joint moment (NJM) data. Generally, researchers compute a ratio between NJM and (some combination of) physical body characteristics (eg, mass, height, limb length, etc). However, 3 assumptions must be verified when normalizing NJM data this way. First, the regression line between NJM and the characteristic(s) used passes through the origin. Second, normalizing NJM eliminates its correlation with the characteristic(s). Third, the statistical interpretations following normalization are consistent with adjusted linear models. The study purpose was to assess these assumptions using data collected from 16 males and 16 females who performed a single-leg squat. Standard inverse dynamics analyses were conducted, and ratios were computed between the mediolateral and anteroposterior components of the knee NJM and participant mass, height, leg length, mass × height, and mass × leg length. Normalizing NJM-mediolateral by mass × height and mass × leg length satisfied all 3 assumptions. Normalizing NJM-anteroposterior by height and leg length satisfied all 3 assumptions. Therefore, if normalization of the knee NJM is deemed necessary to address a given research question, it can neither be assumed that using (any combination of) participant mass, height, or leg length as the denominator is appropriate nor consistent across joint axes.
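The first two assumptions in this abstract can be checked numerically. First, fit NJM against the body characteristic and inspect the intercept. Second, test whether the ratio-scaled NJM still correlates with that characteristic. The Python sketch below uses hypothetical data and plain least squares. It does not reproduce the study's inverse-dynamics pipeline or its third check (agreement with adjusted linear models).

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def check_ratio_scaling(njm, characteristic):
    """Return (intercept, r before scaling, r after scaling) for NJM / characteristic.

    Assumption 1 expects the intercept to be near zero; assumption 2 expects the
    correlation between the characteristic and the ratio to be near zero.
    """
    _, intercept = fit_line(characteristic, njm)
    ratios = [m / c for m, c in zip(njm, characteristic)]
    return intercept, pearson(characteristic, njm), pearson(characteristic, ratios)


# Hypothetical masses (kg) and knee NJM values (N*m) with a built-in offset of 10.
mass = [55.0, 62.0, 70.0, 78.0, 85.0]
njm = [2.0 * m + 10.0 for m in mass]
intercept, r_raw, r_scaled = check_ratio_scaling(njm, mass)
print(f"intercept={intercept:.2f}, r(mass, NJM)={r_raw:.3f}, r(mass, NJM/mass)={r_scaled:.3f}")
```

With a positive intercept, NJM/mass equals slope + intercept/mass, so the ratio falls as mass rises. Normalization therefore leaves a residual (negative) correlation, which is exactly the situation assumptions 1 and 2 are meant to rule out.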
3 |
4 |
18
Kim YJ, Kim KG. Detection and Weak Segmentation of Masses in Gray-Scale Breast Mammogram Images Using Deep Learning. Yonsei Med J 2022; 63:S63-S73. [PMID: 35040607 PMCID: PMC8790585 DOI: 10.3349/ymj.2022.63.s63] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/10/2021] [Revised: 11/10/2021] [Accepted: 11/11/2021] [Indexed: 11/27/2022]
Abstract
PURPOSE In this paper, we propose a deep-learning methodology to enhance the mass differentiation performance of a convolutional neural network (CNN)-based architecture. MATERIALS AND METHODS We differentiated breast mass lesions from gray-scale X-ray mammography images based on regions of interest (ROIs). Our dataset comprised breast mammogram images from 150 cases of malignant masses, from which we extracted the mass ROIs, and we built a CNN-based deep learning model trained on this dataset to identify ROI mass lesions. The test dataset was created by shifting some of the training images; thus, although the two datasets differed, they retained a deep structural similarity. We then applied the trained model to detect masses on 8-bit mammogram images containing malignant masses. The input images were preprocessed with an intensity scaling parameter before being used to train the CNN model for mass differentiation. RESULTS The highest area under the receiver operating characteristic curve was 0.897 (Î 20). CONCLUSION Our results indicate that the proposed patch-wise detection method can be used as a mass detection and segmentation tool.
research-article |
3 |
4 |
19
Liu F, Singhal K, Matney R, Acharya S, Akdis CA, Nadeau KC, Chien AS, Leib RD. Enhancing Data Reliability in TOMAHAQ for Large-Scale Protein Quantification. Proteomics 2020; 20:e1900105. [PMID: 32032464 DOI: 10.1002/pmic.201900105] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/13/2019] [Revised: 01/19/2020] [Indexed: 11/10/2022]
Abstract
The analytical scale of most mass-spectrometry-based targeted proteomics assays is usually limited by assay performance and instrument utilization. A recently introduced method, called triggered by offset, multiplexed, accurate mass, high resolution, and absolute quantitation (TOMAHAQ), combines peptide and sample multiplexing to improve analytical scale and quantitative performance simultaneously. In the present work, critical technical requirements and data analysis considerations for successfully implementing the TOMAHAQ technique are discussed, based on a study of 185 target peptides across more than 200 clinical plasma samples. Importantly, significant interference was observed to originate from the TMTzero reporter ion used for the synthetic trigger peptides. This interference is unexpected because, under typical TOMAHAQ conditions, only TMT10plex reporter ions from the target peptides should be observed. To unlock the technique's promise for high-throughput quantification, a post-acquisition data correction strategy is proposed that deconvolutes the reporter ion superposition and recovers reliable data.
Research Support, Non-U.S. Gov't |
5 |
4 |
20
Zhang Y, Fan S, Wohlgemuth G, Fiehn O. Denoising Autoencoder Normalization for Large-Scale Untargeted Metabolomics by Gas Chromatography-Mass Spectrometry. Metabolites 2023; 13:944. [PMID: 37623887 PMCID: PMC10456436 DOI: 10.3390/metabo13080944] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/13/2023] [Revised: 07/31/2023] [Accepted: 08/08/2023] [Indexed: 08/26/2023]
Abstract
Large-scale metabolomics assays are widely used in epidemiology for biomarker discovery and risk assessment. However, systematic errors introduced by instrumental signal drift pose a major challenge in large-scale assays, especially for derivatization-based gas chromatography-mass spectrometry (GC-MS). Here, we compare the results of different normalization methods for a type 2 diabetes cohort study with more than 4000 human plasma samples, in addition to 413 pooled quality control (QC) samples, 413 commercial pooled plasma samples, and a set of 25 stable isotope-labeled internal standards used for every sample. Data acquisition was conducted over 1.2 years and included seven column changes. In total, 413 pooled QC (training) and 413 BioIVT samples (validation) were used for normalization comparisons. Surprisingly, neither internal-standard nor sum-based normalization achieved a median precision better than 30% relative standard deviation (RSD) across all 563 metabolite annotations. While the machine-learning-based SERRF algorithm gave a median precision of 19% RSD on the pooled QC samples, external cross-validation with BioIVT plasma pools yielded a median RSD of 34%. We developed a new method: systematic error reduction by denoising autoencoder (SERDA). SERDA lowered the median RSD of the training QC samples to 16% and yielded an overall error of 19% RSD when applied to the independent BioIVT validation QC samples. This is the largest GC-MS metabolomics study ever reported, demonstrating that technical errors in this assay can be normalized and handled effectively. SERDA was further validated on two additional large-scale GC-MS-based human plasma metabolomics studies, confirming its superior performance over SERRF and sum normalization.
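Two quantities in this abstract, sum-based normalization and the median RSD of repeated pooled-QC injections, are simple to compute. The Python sketch below shows both on hypothetical QC data in which a single drift factor is shared by all metabolites. It is a baseline illustration only and does not implement SERRF or SERDA.

```python
from statistics import mean, median, stdev


def sum_normalize(sample):
    """Divide every metabolite intensity in one sample by that sample's total signal."""
    total = sum(sample.values())
    return {name: value / total for name, value in sample.items()}


def median_rsd(qc_samples):
    """Median relative standard deviation (%) over metabolites across repeated QC runs."""
    rsds = []
    for name in qc_samples[0]:
        values = [sample[name] for sample in qc_samples]
        rsds.append(100.0 * stdev(values) / mean(values))
    return median(rsds)


# Hypothetical pooled QC: the same pool injected three times with a shared signal drift.
pool = {"glucose": 100.0, "alanine": 40.0, "citrate": 10.0}
drift = [1.0, 0.8, 1.2]
qc_raw = [{name: value * d for name, value in pool.items()} for d in drift]
qc_norm = [sum_normalize(sample) for sample in qc_raw]

print(f"median RSD raw: {median_rsd(qc_raw):.1f}%")
print(f"median RSD after sum normalization: {median_rsd(qc_norm):.1f}%")
```

Sum normalization removes only a drift factor shared by every metabolite in a sample. Drift that differs between metabolites is left in place, and that is what QC-trained correction methods such as those compared in the abstract aim to model.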
research-article |
2 |
4 |
21
Ampavathi A, Saradhi TV. Multi disease-prediction framework using hybrid deep learning: an optimal prediction model. Comput Methods Biomech Biomed Engin 2021; 24:1146-1168. [PMID: 33427480 DOI: 10.1080/10255842.2020.1869726] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/06/2023]
Abstract
Big data approaches are generally helpful to the healthcare and biomedical sectors for predicting disease. For minor symptoms, it is difficult to see a doctor at the hospital at any time, so big data can provide essential information about diseases on the basis of a patient's symptoms. For many medical organizations, disease prediction is important for making the best feasible healthcare decisions. Conversely, the conventional medical care model relies on structured input, which requires more accurate and consistent prediction. This paper develops a multi-disease prediction model using an improved deep learning concept. Datasets for diabetes, hepatitis, lung cancer, liver tumor, heart disease, Parkinson's disease, and Alzheimer's disease were gathered from the benchmark UCI repository for the experiments. The proposed model involves three phases: (a) data normalization, (b) weighted normalized feature extraction, and (c) prediction. First, the dataset is normalized to bring the attributes' ranges to a common level. Next, weighted feature extraction is performed, in which each attribute value is multiplied by a weight function to produce large-scale deviation. The weight function is optimized using a combination of two meta-heuristic algorithms, termed the Jaya Algorithm-based Multi-Verse Optimization algorithm (JA-MVO). The optimally extracted features are passed to hybrid deep learning algorithms such as a Deep Belief Network (DBN) and a Recurrent Neural Network (RNN). As a modification to the hybrid deep learning architecture, the weights of both the DBN and the RNN are optimized using the same hybrid optimization algorithm. Finally, a comparative evaluation against existing models on various performance measures confirms the effectiveness of the proposed prediction model.
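The abstract does not say which normalization is used, so this sketch assumes min-max scaling to [0, 1], a common choice. The Python code then applies per-attribute weights, as in the weighted feature-extraction step. The weights are hypothetical placeholders for the values the JA-MVO optimizer would search for, and the optimizer itself is not shown.

```python
def min_max_normalize(rows):
    """Rescale every column (attribute) of a row-major table to the range [0, 1].

    Constant columns are mapped to 0.0 to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def weight_features(rows, weights):
    """Multiply each attribute value by the weight assigned to that attribute."""
    return [[v * w for v, w in zip(row, weights)] for row in rows]


# Hypothetical records: [age in years, fasting glucose in mg/dL].
records = [[50, 120.0], [30, 80.0], [70, 100.0]]
weights = [2.0, 0.5]  # placeholder weights; an optimizer such as JA-MVO would tune these
print(weight_features(min_max_normalize(records), weights))
# [[1.0, 0.5], [0.0, 0.0], [2.0, 0.25]]
```

Scaling first puts every attribute on the same footing. Any spread introduced afterwards then comes from the weights rather than from the attributes' original units.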
Journal Article |
4 |
3 |
22
Appropriate Reference Genes for RT-qPCR Normalization in Various Organs of Anemone flaccida Fr. Schmidt at Different Growing Stages. Genes (Basel) 2021; 12:genes12030459. [PMID: 33807101 PMCID: PMC8005022 DOI: 10.3390/genes12030459] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/19/2021] [Revised: 03/12/2021] [Accepted: 03/17/2021] [Indexed: 11/17/2022]
Abstract
Anemone flaccida Fr. Schmidt is a traditional medicinal herb in southwestern China with multiple pharmacological effects on bruise injuries and rheumatoid arthritis (RA). A new drug with a good curative effect on RA has recently been developed from an extract of A. flaccida rhizomes, whose main medicinal ingredients are triterpenoid saponins. Because of excessive exploitation, the wild population has become scarce and endangered in a few of its natural habitats, and research on cultivating the plant has begun. Studies of the expression of genes related to triterpenoid saponin biosynthesis are not only helpful for understanding how environmental factors affect the accumulation of medicinal ingredients but also necessary for monitoring the herb quality of cultivated plants. Reverse transcription quantitative polymerase chain reaction (RT-qPCR) is a sensitive and powerful technique that has been widely used to detect gene expression across plant tissues at different stages; however, its accuracy and reliability depend largely on reference gene selection. In this study, the expression of 10 candidate reference genes was evaluated in various organs of wild and cultivated plants at different stages using the geNorm, NormFinder, and BestKeeper algorithms. The purpose was to identify suitable reference genes for RT-qPCR detection in A. flaccida. The results showed that two reference genes were sufficient for RT-qPCR data normalization in A. flaccida. Because of their expression stability, PUBQ and ETIF1a can be used as reference genes in most organs at various stages, whereas PUBQ and EF1α were preferable in the rhizomes at the vegetative stage.
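A conclusion such as "two reference genes were sufficient" is usually drawn from geNorm's pairwise variation V_n/n+1. This is the standard deviation, across samples, of log2(NF_n / NF_n+1), where NF_n is the per-sample geometric mean of the relative quantities of the n most stable genes. The Python sketch below assumes 100% amplification efficiency and genes already ranked by stability. The Ct values are hypothetical, and this is not the geNorm software.

```python
import math
from statistics import stdev


def relative_quantities(ct, efficiency=2.0):
    """Relative quantities efficiency ** (min Ct - Ct); 2.0 assumes 100% efficiency."""
    ct_min = min(ct)
    return [efficiency ** (ct_min - c) for c in ct]


def normalization_factor(ranked_ct, n):
    """Per-sample geometric mean of the relative quantities of the first n genes."""
    quantities = [relative_quantities(ct) for ct in ranked_ct[:n]]
    return [math.prod(per_sample) ** (1.0 / n) for per_sample in zip(*quantities)]


def pairwise_variation(ranked_ct, n):
    """geNorm-style V_n/n+1: SD over samples of log2(NF_n / NF_n+1)."""
    nf_n = normalization_factor(ranked_ct, n)
    nf_next = normalization_factor(ranked_ct, n + 1)
    return stdev([math.log2(a / b) for a, b in zip(nf_n, nf_next)])


# Hypothetical Ct values (three samples), listed from most to least stable gene.
ranked = [
    [20.0, 21.0, 20.5],
    [24.0, 25.0, 24.5],
    [18.0, 19.1, 18.4],
]
v23 = pairwise_variation(ranked, 2)
print(f"V2/3 = {v23:.3f}; third gene needed: {v23 >= 0.15}")
```

The geNorm authors proposed 0.15 as a cutoff. Below it, adding the next gene is not considered necessary.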
Journal Article |
4 |
3 |
23
Li F, Rao G, Du J, Xiang Y, Zhang Y, Selek S, Hamilton JE, Xu H, Tao C. Ontological representation-oriented term normalization and standardization of the Research Domain Criteria. Health Informatics J 2019; 26:726-737. [PMID: 30843449 DOI: 10.1177/1460458219832059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/20/2023]
Abstract
The Research Domain Criteria, launched by the National Institute of Mental Health, is a new dimensional and interdisciplinary research framework for mental disorders. The Research Domain Criteria matrix is its core. Because an ontology supports semantic inferencing and automatic data processing, we aimed to transform the Research Domain Criteria matrix into an ontological structure. In terms of data normalization, which is essential to an ontology representation, the Research Domain Criteria elements (mainly in the Units of Analysis) have some limitations. In this article, we propose a series of solutions to improve data normalization of the Research Domain Criteria elements in the Units of Analysis, including leveraging standard terminologies (i.e. the Unified Medical Language System Metathesaurus), context-combining queries, and domain expertise. The evaluation showed a positive (Yes) rate of more than 80 percent, indicating that our work was favorably received by mental health professionals and that we have built a good data foundation for a future ontological representation of the Research Domain Criteria.
Research Support, N.I.H., Extramural |
6 |
1 |
24
A Comprehensive Mass Spectrometry-Based Workflow for Clinical Metabolomics Cohort Studies. Metabolites 2022; 12:metabo12121168. [PMID: 36557207 PMCID: PMC9782571 DOI: 10.3390/metabo12121168] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/20/2022] [Revised: 11/14/2022] [Accepted: 11/16/2022] [Indexed: 11/27/2022]
Abstract
As a comprehensive analysis of all metabolites in a biological system, metabolomics is being widely applied in various clinical and health areas for disease prediction, diagnosis, and prognosis. However, challenges remain in dealing with metabolomic complexity, massive data, metabolite identification, intra- and inter-individual variation, and reproducibility, which largely limit its widespread implementation. This study provides a comprehensive workflow for clinical metabolomics, including sample collection and preparation, mass spectrometry (MS) data acquisition, and data processing and analysis. Sample collection from multiple clinical sites was carried out strictly according to standardized operating procedures (SOP). During data acquisition, three types of quality control (QC) samples were used for each MS platform (GC-MS, LC-MS polar, and LC-MS lipid) to assess MS performance, facilitate metabolite identification, and eliminate contamination. Compound annotation and identification were implemented with commercial software and the in-house-developed PAppLine™ and UlibMS library. Batch effects were removed using a deep learning model (NormAE). Potential biomarker identification was performed with tree-based modeling algorithms, including random forest, AdaBoost, and XGBoost. Modeling performance was evaluated using the F1 score based on 10 repeated trials for each. Finally, a sub-cohort case study validated the reliability of the entire workflow.
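The F1 score used here to evaluate the biomarker models is the harmonic mean of precision and recall. The minimal Python sketch below computes it for one trial on hypothetical labels and summarizes a set of repeated-trial scores by mean and standard deviation. The abstract does not say how the 10 trials were split, so only the scoring and summary steps are shown.

```python
from statistics import mean, stdev


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize_trials(scores):
    """Mean and sample standard deviation of per-trial scores."""
    return mean(scores), stdev(scores)


# Hypothetical labels from one trial: 2 true positives, 1 false positive, 1 false negative.
print(f"{f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]):.3f}")  # 0.667
```
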
research-article |
3 |
1 |
25
Sullivan GJ, Barquist L, Cain AK. A method to correct for local alterations in DNA copy number that bias functional genomics assays applied to antibiotic-treated bacteria. mSystems 2024; 9:e0066523. [PMID: 38470252 PMCID: PMC11019837 DOI: 10.1128/msystems.00665-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 02/13/2024] [Indexed: 03/13/2024]
Abstract
Functional genomics techniques, such as transposon insertion sequencing and RNA-sequencing, are key to studying relative differences in bacterial mutant fitness or gene expression under selective conditions. However, certain stress conditions, mutations, or antibiotics can directly interfere with DNA synthesis, resulting in systematic changes in local DNA copy numbers along the chromosome. This can lead to artifacts in sequencing-based functional genomics data when comparing antibiotic treatment to an unstressed control. Further, relative differences in gene-wise read counts may result from alterations in chromosomal replication dynamics, rather than selection or direct gene regulation. We term this artifact "chromosomal location bias" and implement a principled statistical approach to correct it by calculating local normalization factors along the chromosome. These normalization factors are then directly incorporated into statistical analyses using standard RNA-sequencing analysis methods without modifying the read counts themselves, preserving important information about the mean-variance relationship in the data. We illustrate the utility of this approach by generating and analyzing a ciprofloxacin-treated transposon insertion sequencing data set in Escherichia coli as a case study. We show that ciprofloxacin treatment generates chromosomal location bias in the resulting data, and we further demonstrate that failing to correct for this bias leads to false predictions of mutant drug sensitivity as measured by minimum inhibitory concentrations. We have developed an R package and user-friendly graphical Shiny application, ChromoCorrect, that detects and corrects for chromosomal bias in read count data, enabling the application of functional genomics technologies to the study of antibiotic stress.
IMPORTANCE Altered gene dosage due to changes in DNA replication has been observed under a variety of stresses with a variety of experimental techniques.
However, the implications of changes in gene dosage for sequencing-based functional genomics assays are rarely considered. We present a statistically principled approach to correcting for the effect of changes in gene dosage, enabling testing for differences in the fitness effects or regulation of individual genes in the presence of confounding differences in DNA copy number. We show that failing to correct for these effects can lead to incorrect predictions of resistance phenotype when applying functional genomics assays to investigate antibiotic stress, and we provide a user-friendly application to detect and correct for changes in DNA copy number.
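A per-gene local normalization factor can be pictured as a running median of treated-versus-control log2 ratios along the chromosome. A copy-number shift moves every gene in a region together, whereas a genuinely selected or regulated gene stands out from its neighbours. The Python sketch below is that simplification and is not the ChromoCorrect method. The window size, pseudocount, library-size scaling, and linear (non-circular) chromosome are all assumptions.

```python
import math
from statistics import median


def local_normalization_factors(control, treated, window=5, pseudocount=0.5):
    """Per-gene factors from a running median of treated/control log2 ratios.

    control, treated: read counts per gene, ordered by chromosomal position.
    window: genes in each running-median window (truncated at the chromosome ends;
    the chromosome is treated as linear here, which is a simplifying assumption).
    Returns linear-scale factors, 2 ** (local median log2 ratio).
    """
    size_c, size_t = sum(control), sum(treated)
    ratios = [
        math.log2(((t + pseudocount) / size_t) / ((c + pseudocount) / size_c))
        for c, t in zip(control, treated)
    ]
    half = window // 2
    return [
        2 ** median(ratios[max(0, i - half): i + half + 1])
        for i in range(len(ratios))
    ]


# Hypothetical counts: the second half of the chromosome doubles in the treated sample.
control = [100] * 10
treated = [100] * 5 + [200] * 5
factors = local_normalization_factors(control, treated)
print([round(f, 2) for f in factors])
```

Consistent with the abstract, factors like these would enter the statistical model (for example as per-gene offsets) rather than be used to rescale the read counts themselves.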
research-article |
1 |
|