1. Chilimoniuk J, Erol A, Rödiger S, Burdukiewicz M. Challenges and opportunities in processing NanoString nCounter data. Comput Struct Biotechnol J 2024; 23:1951-1958. [PMID: 38736697 PMCID: PMC11087919 DOI: 10.1016/j.csbj.2024.04.061] [Received: 02/01/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/14/2024]
Abstract
NanoString nCounter is a medium-throughput technology used in mRNA and miRNA differential expression studies. It offers several advantages, including the absence of an amplification step and the ability to analyze low-grade samples. Despite these considerable strengths, the popularity of the nCounter platform in experimental research stabilized in 2022 and 2023, and this trend may continue in the coming years. This stagnation could be attributed to the absence of a standardized analytical pipeline or of guidance on optimal processing methods for nCounter data. To standardize the description of the nCounter data analysis workflow, we divided it into five distinct steps: data pre-processing, quality control, background correction, normalization and differential expression analysis. Next, we evaluated eleven R packages dedicated to nCounter data processing, identified which of these steps each package supports, and commented on their applications in studies of mRNA and miRNA samples.
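The background-correction and normalization steps named in this abstract can be illustrated with a minimal Python sketch. It assumes one common set of conventions (a background threshold of mean + 2 SD of the negative-control counts, then scaling to the geometric mean of the positive controls and afterwards of the housekeeping genes); the function name, probe names and counts are hypothetical, and this is not the implementation used by any of the eleven packages reviewed.

```python
import math
from statistics import mean, stdev


def geo_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(mean(math.log(v) for v in values))


def normalize(samples, neg, pos, hk):
    """Toy nCounter-style background correction and normalization.

    Assumed conventions, not taken from a specific package.
    samples: {sample_id: {probe_name: raw_count}}
    neg, pos, hk: negative-control, positive-control and housekeeping probe names.
    Returns {sample_id: {gene: log2 normalized count}} for all non-control probes.
    """
    controls = set(neg) | set(pos)

    # Background correction: subtract mean + 2 SD of the negative controls,
    # flooring at 1 so that log2 stays defined.
    corrected = {}
    for sid, counts in samples.items():
        negs = [counts[p] for p in neg]
        threshold = mean(negs) + 2 * stdev(negs)
        corrected[sid] = {g: max(c - threshold, 1.0)
                          for g, c in counts.items() if g not in controls}

    # Positive-control scaling: move each sample's positive-control geometric
    # mean to the across-sample average (technical efficiency).
    pos_gm = {sid: geo_mean([samples[sid][p] for p in pos]) for sid in samples}
    pos_ref = mean(pos_gm.values())
    scaled = {sid: {g: v * pos_ref / pos_gm[sid] for g, v in vals.items()}
              for sid, vals in corrected.items()}

    # Housekeeping scaling: same idea, using housekeeping genes (RNA input).
    hk_gm = {sid: geo_mean([vals[g] for g in hk]) for sid, vals in scaled.items()}
    hk_ref = mean(hk_gm.values())
    return {sid: {g: math.log2(v * hk_ref / hk_gm[sid]) for g, v in vals.items()}
            for sid, vals in scaled.items()}


if __name__ == "__main__":
    # Hypothetical counts: sample B is a uniform 2x technical inflation of A.
    a = {"NEG_A": 10, "NEG_B": 12, "NEG_C": 8, "POS_A": 1000, "POS_B": 500,
         "HK1": 400, "HK2": 300, "GENE1": 150, "GENE2": 60}
    b = {probe: 2 * count for probe, count in a.items()}
    result = normalize({"A": a, "B": b}, ["NEG_A", "NEG_B", "NEG_C"],
                       ["POS_A", "POS_B"], ["HK1", "HK2"])
    for gene in ("GENE1", "GENE2"):
        print(gene, round(result["A"][gene], 3), round(result["B"][gene], 3))
```

Because B differs from A only by a uniform factor, both samples end up with identical log2 values. Real data rarely behave this cleanly, which is part of the motivation for model-based approaches such as RCRnorm and MetaNorm.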
Affiliation(s)
- Anna Erol
- Clinical Research Centre, Medical University of Białystok, Białystok, Poland
- Stefan Rödiger
- Institute of Biotechnology, Faculty Environment and Natural Sciences, Brandenburg University of Technology Cottbus - Senftenberg, Senftenberg, Germany
- Michał Burdukiewicz
- Clinical Research Centre, Medical University of Białystok, Białystok, Poland
- Institute of Biotechnology and Biomedicine, Autonomous University of Barcelona, Barcelona, Spain
2. Barth J, Yang Y, Xiao G, Wang X. MetaNorm: incorporating meta-analytic priors into normalization of NanoString nCounter data. Bioinformatics 2024; 40:btae024. [PMID: 38237909 PMCID: PMC10826904 DOI: 10.1093/bioinformatics/btae024] [Received: 08/09/2023] [Revised: 12/28/2023] [Accepted: 01/12/2024] [Indexed: 02/01/2024]
Abstract
MOTIVATION Non-informative or diffuse prior distributions are widely employed in Bayesian data analysis to maintain objectivity. However, when meaningful prior information exists and can be identified, using an informative prior distribution that accurately reflects current knowledge may lead to superior outcomes and greater efficiency. RESULTS We propose MetaNorm, a Bayesian algorithm for normalizing NanoString nCounter gene expression data. MetaNorm is based on RCRnorm, a powerful method built on an integrated series of hierarchical models that allow various sources of error to be explained by different types of probes in the nCounter system. However, despite its impressive performance, RCRnorm is weakened by a lack of accurate prior information, low computational efficiency and occasional instability of its estimates. MetaNorm employs priors carefully constructed from a rigorous meta-analysis to leverage information from large public data. Combined with additional algorithmic enhancements, MetaNorm improves on RCRnorm by yielding more stable estimates of normalized values, better convergence diagnostics and superior computational efficiency. AVAILABILITY AND IMPLEMENTATION R code for replicating the meta-analysis and the normalization function can be found at github.com/jbarth216/MetaNorm.
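The contrast between diffuse and informative priors can be seen in the simplest conjugate case. The Python sketch below estimates a normal mean with known noise variance; it is a toy illustration with made-up numbers and a hypothetical function name, not MetaNorm's hierarchical model.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior mean and variance for a normal mean with known noise variance.

    Conjugate normal-normal update: precisions add, and the posterior mean is
    a precision-weighted average of the prior mean and the sample mean.
    """
    n = len(data)
    sample_mean = sum(data) / n
    prior_prec = 1.0 / prior_var
    data_prec = n / noise_var
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * sample_mean)
    return post_mean, post_var


if __name__ == "__main__":
    data = [2.0, 4.0]  # two noisy observations, sample mean 3.0
    # Diffuse prior: huge variance, so the data dominate.
    print(normal_posterior(0.0, 1e6, data, noise_var=4.0))
    # Informative prior centred at 5.0 with variance 0.5: pulls the estimate
    # towards 5.0 and shrinks the posterior variance.
    print(normal_posterior(5.0, 0.5, data, noise_var=4.0))
```

With only two observations, the informative prior moves the estimate from about 3.0 to 4.6 and cuts the posterior variance from about 2.0 to 0.4. Whether that is an improvement depends on the prior being accurate, which is why MetaNorm constructs its priors from a meta-analysis rather than choosing them ad hoc.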
Affiliation(s)
- Jackson Barth
- Department of Statistics and Data Science, Southern Methodist University, Dallas, TX 75275, United States
- Department of Statistical Science, Baylor University, Waco, TX 76798, United States
- Yuqiu Yang
- Department of Statistics and Data Science, Southern Methodist University, Dallas, TX 75275, United States
- Guanghua Xiao
- Quantitative Biomedical Research Center, The University of Texas Southwestern Medical Center, Dallas, TX 75390, United States
- Xinlei Wang
- Department of Statistics and Data Science, Southern Methodist University, Dallas, TX 75275, United States
- Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019, United States
- Division of Data Science, College of Science, University of Texas at Arlington, Arlington, TX 76019, United States
3. Xu C, Wang X, Lim J, Xiao G, Xie Y. RCRdiff: A fully integrated Bayesian method for differential expression analysis using raw NanoString nCounter data. Stat Med 2022; 41:665-680. [PMID: 34773277 PMCID: PMC8795478 DOI: 10.1002/sim.9250] [Received: 02/12/2021] [Revised: 08/23/2021] [Accepted: 10/16/2021] [Indexed: 11/05/2022]
Abstract
The medium-throughput mRNA abundance platform NanoString nCounter has gained great popularity in the past decade, owing to its high sensitivity and technical reproducibility as well as its remarkable applicability to ubiquitous formalin-fixed paraffin-embedded (FFPE) tissue samples. Building on RCRnorm, developed for normalizing NanoString nCounter data, and on the Bayesian LASSO for variable selection, we propose a fully integrated Bayesian method, called RCRdiff, to detect differentially expressed (DE) genes between different groups of tissue samples (e.g., normal and cancer). Unlike existing methods, which often require normalization to be performed beforehand, RCRdiff directly handles raw read counts and jointly models the behaviors of different types of internal controls along with DE and non-DE gene patterns. This avoids the efficiency loss caused by ignoring the estimation uncertainty of the normalization step in a sequential approach, and can thus offer more reliable statistical inference. We also propose clustering-based strategies for DE gene selection, which do not require any external dataset and are free of arbitrary cutoffs. The attractiveness of RCRdiff is demonstrated empirically via extensive simulations and data examples.
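The abstract does not spell out RCRdiff's clustering-based selection rule. As a hypothetical stand-in that shows what cutoff-free selection can mean, the Python sketch below runs 2-means (Lloyd's algorithm) on one score per gene, for example an absolute log2 fold change, and calls the genes in the high-score cluster DE without a hand-picked threshold. The function name and scores are illustrative assumptions.

```python
def two_means_split(scores, max_iter=100):
    """Split 1-D scores into low and high clusters with 2-means (Lloyd's algorithm).

    Returns the set of indices assigned to the high cluster. Centroids start
    at the minimum and maximum score; ties go to the low cluster.
    """
    lo, hi = min(scores), max(scores)
    if lo == hi:
        return set()  # all scores equal: nothing stands out
    high = set()
    for _ in range(max_iter):
        new_high = {i for i, s in enumerate(scores) if abs(s - hi) < abs(s - lo)}
        if new_high == high:
            break  # assignments stable: converged
        high = new_high
        low_vals = [s for i, s in enumerate(scores) if i not in high]
        high_vals = [scores[i] for i in high]
        lo = sum(low_vals) / len(low_vals)
        hi = sum(high_vals) / len(high_vals)
    return high


if __name__ == "__main__":
    # Hypothetical absolute log2 fold changes for six genes.
    abs_lfc = [0.1, 0.2, 0.15, 2.5, 3.0, 0.05]
    print(sorted(two_means_split(abs_lfc)))
```

In one dimension with two centroids, neither cluster can become empty, so the mean updates are always defined; the split point is learned from the data instead of being fixed in advance.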
Affiliation(s)
- Can Xu
- Department of Statistical Science, Southern Methodist University, Texas, USA
- Xinlei Wang
- Department of Statistical Science, Southern Methodist University, Texas, USA. Correspondence: Xinlei Wang, Department of Statistical Science, Southern Methodist University, Dallas, TX 75275.
- Johan Lim
- Department of Statistics, Seoul National University, Seoul, Korea
- Guanghua Xiao
- Department of Population & Data Sciences and Department of Bioinformatics, University of Texas Southwestern Medical Center, Texas, USA
- Yang Xie
- Department of Population & Data Sciences and Department of Bioinformatics, University of Texas Southwestern Medical Center, Texas, USA
4. Bhattacharya A, Hamilton AM, Furberg H, Pietzak E, Purdue MP, Troester MA, Hoadley KA, Love MI. An approach for normalization and quality control for NanoString RNA expression data. Brief Bioinform 2021; 22:bbaa163. [PMID: 32789507 PMCID: PMC8138885 DOI: 10.1093/bib/bbaa163] [Received: 06/05/2020] [Revised: 06/29/2020] [Accepted: 06/30/2020] [Indexed: 01/10/2023]
Abstract
The NanoString RNA counting assay for formalin-fixed paraffin embedded samples is unique in its sensitivity, technical reproducibility and robustness for analysis of clinical and archival samples. While commercial normalization methods are provided by NanoString, they are not optimal for all settings, particularly when samples exhibit strong technical or biological variation or where housekeeping genes have variable performance across the cohort. Here, we develop and evaluate a more comprehensive normalization procedure for NanoString data with steps for quality control, selection of housekeeping targets, normalization and iterative data visualization and biological validation. The approach was evaluated using a large cohort (N = 1649) from the Carolina Breast Cancer Study, two cohorts of moderate sample size (N = 359 and N = 130) and a small published dataset (N = 12). The iterative process developed here eliminates technical variation (e.g. from different study phases or sites) more reliably than the three other methods, including NanoString's commercial package, without diminishing biological variation, especially in long-term longitudinal multiphase or multisite cohorts. We also find that probe sets validated for nCounter, such as the PAM50 gene signature, are impervious to batch issues. This work emphasizes that systematic quality control, normalization and visualization of NanoString nCounter data are an imperative component of study design that influences results in downstream analyses.
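The abstract lists selection of housekeeping targets as a step but does not give the rule used. One simple heuristic, shown here as an assumption rather than the authors' procedure, ranks candidate genes by the standard deviation of their log2 counts across samples and keeps the most stable ones. The function name and counts are hypothetical.

```python
import math
from statistics import stdev


def rank_housekeeping(counts, candidates, k):
    """Return the k candidates with the smallest SD of log2(count + 1) across samples.

    counts: {sample_id: {gene: raw_count}}; needs at least two samples.
    """
    def spread(gene):
        return stdev(math.log2(sample[gene] + 1) for sample in counts.values())

    return sorted(candidates, key=spread)[:k]


if __name__ == "__main__":
    # Hypothetical raw counts; HK2 swings strongly between samples.
    counts = {"S1": {"HK1": 100, "HK2": 100, "HK3": 100},
              "S2": {"HK1": 110, "HK2": 400, "HK3": 95},
              "S3": {"HK1": 90, "HK2": 30, "HK3": 105}}
    print(rank_housekeeping(counts, ["HK1", "HK2", "HK3"], k=2))
```

This heuristic does not separate biological variability from differences in total RNA input, which affect all candidates at once, so it only ranks candidates; it does not replace the iterative visualization and biological validation the authors describe.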
Affiliation(s)
- Mark P Purdue
- Division of Cancer Epidemiology and Genetics, National Cancer Institute
5. Jia G, Wang X, Li Q, Lu W, Tang X, Wistuba I, Xie Y. RCRnorm: An integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data. Ann Appl Stat 2019; 13:1617-1647. [PMID: 33564347 PMCID: PMC7869841 DOI: 10.1214/19-aoas1249] [Indexed: 12/28/2022]
Abstract
Formalin-fixed paraffin-embedded (FFPE) samples have great potential for biomarker discovery, retrospective studies, and diagnosis or prognosis of diseases. Their application, however, is hindered by the unsatisfactory performance of traditional gene expression profiling techniques on damaged RNAs. The NanoString nCounter platform is well suited for profiling FFPE samples and measures gene expression with high sensitivity, which may greatly facilitate realizing the scientific and clinical value of FFPE samples. However, methodological development for normalization, a critical step when analyzing this type of data, lags far behind. Existing methods designed for the platform use information from different types of internal controls separately and rely on the overly simplified assumption that expression of housekeeping genes is constant across samples for global scaling. Thus, these methods are not optimized for the nCounter system, not to mention that they were not developed for FFPE samples. We construct an integrated system of random-coefficient hierarchical regression models to capture the main patterns and characteristics observed in NanoString data from FFPE samples, and develop a Bayesian approach to estimate parameters and normalize gene expression across samples. Our method, labeled RCRnorm, incorporates information from all aspects of the experimental design and simultaneously removes biases from various sources. It eliminates the unrealistic assumption about housekeeping genes and offers great interpretability. Furthermore, it is applicable to fresh-frozen or similar samples, which can generally be viewed as a reduced case of FFPE samples. Simulations and applications showed the superior performance of RCRnorm.
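The housekeeping assumption criticized in this abstract can be made concrete. In the Python sketch below (hypothetical function name and counts), each sample is scaled by the geometric mean of its housekeeping genes. When one housekeeping gene is in fact biologically up-regulated in sample B, global scaling turns that change into a spurious two-fold decrease of a target gene whose counts are identical in both samples.

```python
import math


def hk_scale_factors(samples, hk):
    """Per-sample global scaling factors from housekeeping-gene geometric means.

    Implicitly assumes housekeeping expression is constant across samples, so
    any difference in their geometric mean is treated as technical noise.
    """
    gms = {sid: math.exp(sum(math.log(c[g]) for g in hk) / len(hk))
           for sid, c in samples.items()}
    ref = sum(gms.values()) / len(gms)
    return {sid: ref / gm for sid, gm in gms.items()}


if __name__ == "__main__":
    # GENE has identical counts in A and B; HK2 is up-regulated 4x in B.
    samples = {"A": {"HK1": 100, "HK2": 100, "GENE": 50},
               "B": {"HK1": 100, "HK2": 400, "GENE": 50}}
    f = hk_scale_factors(samples, ["HK1", "HK2"])
    ratio = (samples["B"]["GENE"] * f["B"]) / (samples["A"]["GENE"] * f["A"])
    print("apparent log2 fold change of GENE:", round(math.log2(ratio), 3))
```

Here the factors are 1.5 for A and 0.75 for B, so GENE appears to drop by half. RCRnorm, according to the abstract, avoids this single global-scaling step by modeling all control types jointly without assuming constant housekeeping expression.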
Affiliation(s)
- Gaoxiang Jia
- Department of Statistical Science, Southern Methodist University, 3225 Daniel Avenue, P O Box 750332, Dallas, Texas 75275
- Quantitative Biomedical Research Center, Department of Clinical Sciences, The University of Texas Southwestern Medical Center, Dallas, Texas 75390
- Xinlei Wang
- Department of Statistical Science, Southern Methodist University, 3225 Daniel Avenue, P O Box 750332, Dallas, Texas 75275
- Qiwei Li
- Quantitative Biomedical Research Center, Department of Clinical Sciences, The University of Texas Southwestern Medical Center, Dallas, Texas 75390
- Wei Lu
- Department of Translational Molecular Pathology, University of Texas, MD Anderson Cancer Center, Houston, Texas 77030
- Ximing Tang
- Department of Translational Molecular Pathology, University of Texas, MD Anderson Cancer Center, Houston, Texas 77030
- Ignacio Wistuba
- Department of Translational Molecular Pathology, University of Texas, MD Anderson Cancer Center, Houston, Texas 77030
- Yang Xie
- Quantitative Biomedical Research Center, Department of Clinical Sciences, The University of Texas Southwestern Medical Center, Dallas, Texas 75390