1
|
Paton V, Ramirez Flores RO, Gabor A, Badia-I-Mompel P, Tanevski J, Garrido-Rodriguez M, Saez-Rodriguez J. Assessing the impact of transcriptomics data analysis pipelines on downstream functional enrichment results. Nucleic Acids Res 2024:gkae552. [PMID: 38943333 DOI: 10.1093/nar/gkae552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 06/03/2024] [Accepted: 06/19/2024] [Indexed: 07/01/2024] Open
Abstract
Transcriptomics is widely used to assess the state of biological systems. There are many tools for the different steps, such as normalization, differential expression, and enrichment. While numerous studies have examined the impact of method choices on differential expression results, little attention has been paid to their effects on further downstream functional analysis, which typically provides the basis for interpretation and follow-up experiments. To address this, we introduce FLOP, a comprehensive nextflow-based workflow combining methods to perform end-to-end analyses of transcriptomics data. We illustrate FLOP on datasets ranging from end-stage heart failure patients to cancer cell lines. We discovered effects not noticeable at the gene-level, and observed that not filtering the data had the highest impact on the correlation between pipelines in the gene set space. Moreover, we performed three benchmarks to evaluate the 12 pipelines included in FLOP, and confirmed that filtering is essential in scenarios of expected moderate-to-low biological signal. Overall, our results underscore the impact of carefully evaluating the consequences of the choice of preprocessing methods on downstream enrichment analyses. We envision FLOP as a valuable tool to measure the robustness of functional analyses, ultimately leading to more reliable and conclusive biological findings.
Affiliation(s)
- Victor Paton
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
| | - Ricardo Omar Ramirez Flores
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
| | - Attila Gabor
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
| | - Pau Badia-I-Mompel
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
| | - Jovan Tanevski
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
| | - Martin Garrido-Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
|
2
|
Jiang G, Zheng JY, Ren SN, Yin W, Xia X, Li Y, Wang HL. A comprehensive workflow for optimizing RNA-seq data analysis. BMC Genomics 2024; 25:631. [PMID: 38914930 PMCID: PMC11197194 DOI: 10.1186/s12864-024-10414-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 05/15/2024] [Indexed: 06/26/2024] Open
Abstract
BACKGROUND Current RNA-seq analysis software tends to use similar parameters across different species without considering species-specific differences. However, the suitability and accuracy of these tools may vary when analyzing data from different species, such as humans, animals, plants, fungi, and bacteria. For most laboratory researchers lacking a background in information science, determining how to construct an analysis workflow that meets their specific needs from the array of complex analytical tools available poses a significant challenge. RESULTS By utilizing RNA-seq data from plants, animals, and fungi, it was observed that different analytical tools demonstrate some variations in performance when applied to different species. A comprehensive experiment was conducted specifically for analyzing plant pathogenic fungal data, focusing on differential gene analysis as the ultimate goal. In this study, 288 pipelines using different tools were applied to analyze five fungal RNA-seq datasets, and the performance of their results was evaluated based on simulation. This led to the establishment of a relatively universal and superior fungal RNA-seq analysis pipeline that can serve as a reference, and certain standards for selecting analysis tools were derived. Additionally, we compared various tools for alternative splicing analysis. The results based on simulated data indicated that rMATS remained the optimal choice, although consideration could be given to supplementing with tools such as SpliceWiz. CONCLUSION The experimental results demonstrate that, in comparison to the default software parameter configurations, the analysis combination results after tuning can provide more accurate biological insights. It is beneficial to carefully select suitable analysis software based on the data, rather than indiscriminately choosing tools, in order to achieve high-quality analysis results more efficiently.
Affiliation(s)
- Gao Jiang
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing, 100083, People's Republic of China
| | - Juan-Yu Zheng
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing, 100083, People's Republic of China
| | - Shu-Ning Ren
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, People's Republic of China
| | - Weilun Yin
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, People's Republic of China
| | - Xinli Xia
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, People's Republic of China
| | - Yun Li
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing, 100083, People's Republic of China.
| | - Hou-Ling Wang
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, People's Republic of China.
|
3
|
Xiao J, Yao X, Guan X, Xiong J, Fang Y, Zhang J, Zhang Y, Moming A, Su Z, Jin J, Ge Y, Wang J, Fan Z, Tang S, Shen S, Deng F. Viromes of Haemaphysalis longicornis reveal different viral abundance and diversity in free and engorged ticks. Virol Sin 2024; 39:194-204. [PMID: 38360150 DOI: 10.1016/j.virs.2024.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 02/08/2024] [Indexed: 02/17/2024] Open
Abstract
Haemaphysalis longicornis ticks, commonly found in East Asia, can transmit various pathogenic viruses, including the severe fever with thrombocytopenia syndrome virus (SFTSV) that has caused febrile diseases among humans in Hubei Province. However, understanding of the viromes of H. longicornis was limited, and the prevalence of viruses among H. longicornis ticks in Hubei was not well clarified. This study investigates the viromes of both engorged (fed) and free (unfed) H. longicornis ticks across three mountainous regions in Hubei Province from 2019 to 2020. RNA-sequencing analysis identified viral sequences that were related to 39 reference viruses belonging to unclassified viruses and seven RNA viral families, namely Chuviridae, Nairoviridae, Orthomyxoviridae, Parvoviridae, Phenuiviridae, Rhabdoviridae, and Totiviridae. Viral abundance and diversity in these ticks were analysed, and phylogenetic characteristics of the Henan tick virus (HNTV), Dabieshan tick virus (DBSTV), Okutama tick virus (OKTV), and Jingmen tick virus (JMTV) were elucidated based on their full genomic sequences. Prevalence analysis demonstrated that DBSTV was the most common virus found in individual H. longicornis ticks (12.59%), followed by HNTV (0.35%), whereas JMTV and OKTV were not detected. These results improve our understanding of H. longicornis tick viromes in central China and highlight the role of tick feeding status and geography in shaping the viral community. The findings of new viral strains and their potential impact on public health raise the need to strengthen surveillance efforts for comprehensively assessing their spillover potentials.
Affiliation(s)
- Jian Xiao
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China; University of Chinese Academy of Sciences, Beijing, 101408, China
| | - Xuan Yao
- Hubei Provincial Center for Disease Control and Prevention, Wuhan, 430070, China
| | - Xuhua Guan
- Hubei Provincial Center for Disease Control and Prevention, Wuhan, 430070, China
| | - Jinfeng Xiong
- Hubei Provincial Center for Disease Control and Prevention, Wuhan, 430070, China
| | - Yaohui Fang
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China
| | - Jingyuan Zhang
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China
| | - You Zhang
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China; Current address: Department of Medical Laboratory, The Second Affiliated Hospital, Hainan Medical University, Haikou, 57000, China
| | - Abulimiti Moming
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China; Xinjiang Key Laboratory of Vector-borne Infectious Diseases, Urumqi, 830002, China
| | - Zhengyuan Su
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China
| | - Jiayin Jin
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China
| | - Yingying Ge
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China
| | - Jun Wang
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China
| | - Zhaojun Fan
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China
| | - Shuang Tang
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China
| | - Shu Shen
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China; Hubei Jiangxia Laboratory, Wuhan, 430200, China; Xinjiang Key Laboratory of Vector-borne Infectious Diseases, Urumqi, 830002, China.
| | - Fei Deng
- Key Laboratory of Virology and Biosafety and National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, 430071, China.
|
4
|
Singh V, Kirtipal N, Song B, Lee S. Normalization of RNA-Seq data using adaptive trimmed mean with multi-reference. Brief Bioinform 2024; 25:bbae241. [PMID: 38770720 PMCID: PMC11107385 DOI: 10.1093/bib/bbae241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 04/04/2024] [Accepted: 05/07/2024] [Indexed: 05/22/2024] Open
Abstract
The normalization of RNA sequencing data is a primary step for downstream analysis. The most popular methods used for normalization are the trimmed mean of M values (TMM) and DESeq. The TMM tries to trim away extreme log fold changes of the data to normalize the raw read counts based on the remaining non-differentially expressed genes. However, the major problem with the TMM is that the values of trimming factor M are heuristic. This paper tries to estimate the adaptive value of M in TMM based on Jaeckel's Estimator, and each sample acts as a reference to find the scale factor of each sample. The presented approach is validated on SEQC, MAQC2, MAQC3, PICKRELL and two simulated datasets with two-group and three-group conditions by varying the percentage of differential expression and the number of replicates. The performance of the present approach is compared with various state-of-the-art methods, and it is better in terms of area under the receiver operating characteristic curve and differential expression.
Affiliation(s)
- Vikas Singh
- School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea
| | - Nikhil Kirtipal
- School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea
| | - Byeongsop Song
- School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea
| | - Sunjae Lee
- School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea
|
5
|
Zou C, Tan H, Huang K, Zhai R, Yang M, Huang A, Wei X, Mo R, Xiong F. Physiological Characteristic Changes and Transcriptome Analysis of Maize (Zea mays L.) Roots under Drought Stress. Int J Genomics 2024; 2024:5681174. [PMID: 38269194 PMCID: PMC10807950 DOI: 10.1155/2024/5681174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Revised: 10/08/2023] [Accepted: 12/18/2023] [Indexed: 01/26/2024] Open
Abstract
Water deficit is a key factor limiting yield in maize (Zea mays L.). It is crucial to elucidate the molecular regulatory networks of stress tolerance for genetic enhancement of drought tolerance. The mechanism of drought tolerance of maize was explored by comparing physiological and transcriptomic data from roots at the seedling stage under normal conditions and under polyethylene glycol- (PEG-) induced drought stress (5%, 10%, 15%, and 20%). The content of saccharide, SOD, CAT, and MDA showed an upward trend, proteins showed a downward trend, and the levels of POD first showed an upward trend and then decreased. Compared with the control group, a total of 597, 2748, 6588, and 5410 differentially expressed genes were found at 5%, 10%, 15%, and 20% PEG, respectively, and 354 common DEGs were identified in these comparisons. Some differentially expressed genes were remarkably enriched in the MAPK signaling pathway and plant hormone signal transduction. The 50 transcription factors (TFs) divided into 15 categories were screened from the 354 common DEGs during drought stress. Auxin response factor 10 (ARF10), auxin-responsive protein IAA9 (IAA9), auxin response factor 14 (ARF14), auxin-responsive protein IAA1 (IAA1), auxin-responsive protein IAA27 (IAA27), and 1 ethylene response sensor 2 (ERS2) were upregulated. The two TFs, including bHLH 35 and bHLH 96, involved in the MAPK signal pathway and plant hormones pathway, were significantly upregulated in the 5%, 10%, 15%, and 20% PEG stress groups. The present study provides greater insight into the fundamental transcriptome reprogramming of grain crops under drought.
Affiliation(s)
- Chenglin Zou
- Maize Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, Guangxi, China
| | - Hua Tan
- Maize Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, Guangxi, China
| | - Kaijian Huang
- Maize Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, Guangxi, China
| | - Ruining Zhai
- Maize Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, Guangxi, China
| | - Meng Yang
- Maize Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, Guangxi, China
| | - Aihua Huang
- Maize Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, Guangxi, China
| | - Xinxing Wei
- Maize Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, Guangxi, China
| | - Runxiu Mo
- Maize Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, Guangxi, China
| | - Faqian Xiong
- Cash Crops Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, Guangxi, China
|
6
|
Wang G, Tian X, Peng R, Huang Y, Li Y, Li Z, Hu X, Luo Z, Zhang Y, Cui X, Niu L, Lu G, Yang F, Gao L, Chan JFW, Jin Q, Yin F, Tang C, Ren Y, Du J. Genomic and phylogenetic profiling of RNA of tick-borne arboviruses in Hainan Island, China. Microbes Infect 2024; 26:105218. [PMID: 37714509 DOI: 10.1016/j.micinf.2023.105218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 09/11/2023] [Accepted: 09/11/2023] [Indexed: 09/17/2023]
Abstract
Ticks act as vectors and hosts of numerous arboviruses. Examples of medically important arboviruses include the tick-borne encephalitis virus, Crimean Congo hemorrhagic fever, and severe fever with thrombocytopenia syndrome. Recently, some novel arboviruses have been identified in blood specimens of patients with unexplained fever and a history of tick bites in Inner Mongolia. Consequently, tick-borne viruses are a major focus of infectious disease research. However, the spectrum of tick-borne viruses in subtropical areas of China has yet to be sufficiently characterized. In this study, we collected 855 ticks from canine and bovine hosts in four locations in Hainan Province. The ticks were combined into 18 pools according to genus and location. Viral RNA-sequence libraries were subjected to transcriptome sequencing analysis. Molecular clues from metagenomic analyses were used to classify sequence reads into virus species, genera, or families. The diverse viral reads closely associated with mammals were assigned to 12 viral families and important tick-borne viruses, such as Jingmen, Beiji nairovirus, and Colorado tick fever. Our virome and phylogenetic analyses of the arbovirus strains provide basic data for preventing and controlling human infectious diseases caused by tick-borne viruses in the subtropical areas of China.
Affiliation(s)
- Gaoyu Wang
- Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China
| | - Xiuying Tian
- Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China
| | - Ruoyan Peng
- Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China
| | - Yi Huang
- Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China
| | - Youyou Li
- Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China
| | - Zihan Li
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China; Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China
| | - Xiaoyuan Hu
- Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China
| | - Zufen Luo
- Department of Infectious Disease, the Second Affiliated Hospital of Hainan Medical University, Haikou, 570216, China
| | - Yun Zhang
- Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China
| | - Xiuji Cui
- Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China
| | - Lina Niu
- Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China
| | - Gang Lu
- Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China
| | - Fan Yang
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China
| | - Lei Gao
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China
| | - Jasper Fuk-Woo Chan
- State Key Laboratory of Emerging Infectious Diseases, Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China
| | - Qi Jin
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China
| | - Feifei Yin
- Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China
| | - Chuanning Tang
- Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China.
| | - Yi Ren
- Haikou Maternal and Child Health Hospital, Haikou, 570102, China.
| | - Jiang Du
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China; Hainan Medical University-The University of Hong Kong Joint Laboratory of Tropical Infectious Diseases, Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, 571199, China.
|
7
|
Xia Y. Statistical normalization methods in microbiome data with application to microbiome cancer research. Gut Microbes 2023; 15:2244139. [PMID: 37622724 PMCID: PMC10461514 DOI: 10.1080/19490976.2023.2244139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 07/12/2023] [Accepted: 07/31/2023] [Indexed: 08/26/2023] Open
Abstract
Mounting evidence has shown that the gut microbiome is associated with various cancers, including gastrointestinal (GI) tract and non-GI tract cancers. However, microbiome data have unique characteristics and pose major challenges for standard statistical methods, which can produce invalid or misleading results. Thus, analyzing microbiome data requires not only appropriate statistical methods but also normalization of the data prior to statistical analysis. Here, we first describe the unique characteristics of microbiome data and the challenges in analyzing them (Section 2). Then, we provide an overall review of the available normalization methods for 16S rRNA and shotgun metagenomic data along with examples of their applications in microbiome cancer research (Section 3). In Section 4, we comprehensively investigate how the normalization methods for 16S rRNA and shotgun metagenomic data are evaluated. Finally, we summarize and conclude with remarks on statistical normalization methods (Section 5). Altogether, this review aims to provide a broad and comprehensive view and remarks on the promises and challenges of the statistical normalization methods in microbiome data with microbiome cancer research examples.
Affiliation(s)
- Yinglin Xia
- Division of Gastroenterology and Hepatology, Department of Medicine, University of Illinois Chicago, Chicago, USA
|
8
|
Hu Y, Wang L, Yang G, Wang S, Guo M, Lu H, Zhang T. VDR promotes testosterone synthesis in mouse Leydig cells via regulation of cholesterol side chain cleavage cytochrome P450 (Cyp11a1) expression. Genes Genomics 2023; 45:1377-1387. [PMID: 37747642 DOI: 10.1007/s13258-023-01444-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2022] [Accepted: 09/30/2022] [Indexed: 09/26/2023]
Abstract
BACKGROUND The vitamin D receptor (VDR) mediates the pleiotropic biological actions that include osteoporosis, immune responses and androgen synthesis. VDR is widely expressed in testis cells such as Leydig cells, Sertoli cells, and sperm. The levels of steroids are critical for sexual development. In the early stage of steroidogenesis, cholesterol is converted to pregnenolone (precursor of most steroid hormones) by cholesterol side-chain lyase (CYP11A1), which eventually synthesizes the male hormone testosterone. OBJECTIVE This study aims to reveal how VDR regulates CYP11A1 expression and affects testosterone synthesis in murine Leydig cells. METHODS The levels of VDR and CYP11A1 were determined by quantitative real-time polymerase chain reaction (RT-qPCR) or western blot. Targeted relationship between VDR and Cyp11a1 was evaluated by dual-luciferase reporter assay. The testosterone concentrations in cell culture media serum were determined by enzyme-linked immunosorbent assay (ELISA). RESULTS Phylogenetic and motif analysis showed that the Cyp11a1 family had sequence loss, which may have special biological functions during evolution. The results of promoter prediction showed that vitamin D response element (VDRE) existed in the upstream promoter region of murine Cyp11a1. Dual-luciferase assay confirmed that VDR could bind candidate VDREs in the upstream region of Cyp11a1 and enhance gene expression. Tissue distribution and localization analysis showed that Cyp11a1 was mainly expressed in testis, and dominantly existed in murine Leydig cells. Furthermore, over-expression of VDR and CYP11A1 significantly increased testosterone synthesis in mouse Leydig cells. CONCLUSIONS Active vitamin D3 (VD3) and Vdr interference treatment showed that VD3/VDR had a positive regulatory effect on Cyp11a1 expression and testosterone secretion. VDR promotes testosterone synthesis in male mice by up-regulating Cyp11a1 expression, which plays an important role in male reproduction.
Affiliation(s)
- Yuanyuan Hu
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China
| | - Ling Wang
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China
- Shaanxi Province Key Laboratory of Bio-Resources, Shaanxi University of Technology, Hanzhong, 723001, China
| | - Ge Yang
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China
| | - Shanshan Wang
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China
| | - Miaomiao Guo
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China
| | - Hongzhao Lu
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China
- Qinba State Key Laboratory of Biological Resources and Ecological Environment, Shaanxi University of Technology, Hanzhong, 723001, China
| | - Tao Zhang
- School of Biological Science and Engineering, Shaanxi University of Technology, Hanzhong, 723001, China.
- QinLing-Bashan Mountains Bioresources Comprehensive Development C. I. C., Shaanxi University of Technology, Hanzhong, 723001, China.
- Qinba State Key Laboratory of Biological Resources and Ecological Environment, Shaanxi University of Technology, Hanzhong, 723001, China.
|
9
|
Stokes T, Cen HH, Kapranov P, Gallagher IJ, Pitsillides AA, Volmar C, Kraus WE, Johnson JD, Phillips SM, Wahlestedt C, Timmons JA. Transcriptomics for Clinical and Experimental Biology Research: Hang on a Seq. Advanced Genetics (Hoboken, N.J.) 2023; 4:2200024. [PMID: 37288167 PMCID: PMC10242409 DOI: 10.1002/ggn2.202200024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Indexed: 06/09/2023]
Abstract
Sequencing the human genome empowers translational medicine, facilitating transcriptome-wide molecular diagnosis, pathway biology, and drug repositioning. Initially, microarrays were used to study the bulk transcriptome, but now short-read RNA sequencing (RNA-seq) predominates. Although RNA-seq is positioned as a superior technology that makes the discovery of novel transcripts routine, most RNA-seq analyses are in fact modeled on the known transcriptome. Limitations of the RNA-seq methodology have emerged, while the design of, and the analysis strategies applied to, arrays have matured. An equitable comparison between these technologies is provided, highlighting advantages that modern arrays hold over RNA-seq. Array protocols more accurately quantify constitutively expressed protein coding genes across tissue replicates, and are more reliable for studying lower expressed genes. Arrays reveal long noncoding RNAs (lncRNA) are neither sparsely nor lower expressed than protein coding genes. Heterogeneous coverage of constitutively expressed genes observed with RNA-seq undermines the validity and reproducibility of pathway analyses. The factors driving these observations, many of which are relevant to long-read or single-cell sequencing, are discussed. As proposed herein, a reappreciation of bulk transcriptomic methods is required, including wider use of the modern high-density array data, to urgently revise existing anatomical RNA reference atlases and assist with more accurate study of lncRNAs.
Collapse
Affiliation(s)
- Tanner Stokes
- Faculty of Science, McMaster University, Hamilton L8S 4L8, Canada
- Haoning Howard Cen
- Life Sciences Institute, University of British Columbia, Vancouver V6T 1Z3, Canada
- Iain J Gallagher
- School of Applied Sciences, Edinburgh Napier University, Edinburgh EH11 4BN, UK
- James D. Johnson
- Life Sciences Institute, University of British Columbia, Vancouver V6T 1Z3, Canada
- James A. Timmons
- Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- William Harvey Research Institute, Queen Mary University London, London EC1M 6BQ, UK
- Augur Precision Medicine LTD, Stirling FK9 5NF, UK
10
Altay G, Zapardiel-Gonzalo J, Peters B. RNA-seq preprocessing and sample size considerations for gene network inference. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.02.522518. [PMID: 36711979 PMCID: PMC9881880 DOI: 10.1101/2023.01.02.522518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Background Gene network inference (GNI) methods have the potential to reveal functional relationships between different genes and their products. Most GNI algorithms have been developed for microarray gene expression datasets and their application to RNA-seq data is relatively recent. As the characteristics of RNA-seq data differ from those of microarray data, it remains an open question which preprocessing methods should be applied to RNA-seq data prior to GNI to attain optimal performance, and what sample size is required to obtain reliable GNI estimates. Results We ran 9144 analyses of 7 different RNA-seq datasets to evaluate 300 different preprocessing combinations that include data transformations, normalizations and association estimators. We found that there was no single best performing preprocessing combination but that there were several good ones. The performance varied widely across datasets, which emphasizes the importance of choosing an appropriate preprocessing configuration before GNI. Two preprocessing combinations appeared promising in general: first, Log-2 TPM (transcript per million) with variance-stabilizing transformation (VST) and the Pearson Correlation Coefficient (PCC) association estimator; second, raw RNA-seq count data with PCC. Along with these two, we also identified 18 other good preprocessing combinations. Any of these combinations might perform best on a given dataset. Therefore, the GNI performance of these approaches should be measured on any new dataset to select the best performing one for it. In terms of the required biological sample size of RNA-seq data, we found that between 30 and 85 samples were required to generate reliable GNI estimates. Conclusions This study provides practical recommendations on default choices for data preprocessing prior to GNI analysis of RNA-seq data to obtain optimal performance results.
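To make the preprocessing terms in this abstract concrete, the following is a minimal sketch of a log2 TPM transform followed by a Pearson correlation between genes. It is only an illustration, not the authors' pipeline: the function names, the `pseudocount` parameter, and the genes-by-samples matrix layout are assumptions, and the VST step mentioned in the abstract is omitted.

```python
import numpy as np

def log2_tpm(counts, lengths_kb, pseudocount=1.0):
    """Convert raw counts (genes x samples) to log2(TPM + pseudocount).

    `lengths_kb` holds one gene length per row, in kilobases.
    All names here are hypothetical, chosen for illustration.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_kb, dtype=float)[:, None]
    rate = counts / lengths_kb                      # reads per kilobase
    tpm = rate / rate.sum(axis=0, keepdims=True) * 1e6  # each sample sums to 1e6
    return np.log2(tpm + pseudocount)

def pearson_gene_network(expr):
    """Gene-by-gene Pearson correlation matrix; rows of `expr` are genes."""
    return np.corrcoef(expr)
```

Each column of the TPM matrix sums to one million before the log transform, so samples with different sequencing depths are placed on a common scale before correlations are computed.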
Affiliation(s)
- Gökmen Altay
- La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Bjoern Peters
- La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
11
Lucena-Leandro VS, Abreu EFA, Vidal LA, Torres CR, Junqueira CICVF, Dantas J, Albuquerque ÉVS. Current Scenario of Exogenously Induced RNAi for Lepidopteran Agricultural Pest Control: From dsRNA Design to Topical Application. Int J Mol Sci 2022; 23:ijms232415836. [PMID: 36555476 PMCID: PMC9785151 DOI: 10.3390/ijms232415836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 11/24/2022] [Accepted: 11/25/2022] [Indexed: 12/24/2022]
Abstract
Invasive insects cost the global economy around USD 70 billion per year. Moreover, the increase in agricultural insect pests raises concerns about constraints on global food security and rising infestation under climate change. Current agricultural pest management largely relies on plant breeding (with or without transgenes) and chemical pesticides. Both approaches face serious technological obsolescence in the field due to plant resistance breakdown or development of insecticide resistance. The need for new modes of action (MoA) for managing crop health is growing each year, driven by market demands to reduce economic losses and by consumer demand for phytosanitary measures. The disabling of pest genes through sequence-specific expression silencing is a promising tool in the development of environmentally friendly and safe biopesticides. The specificity conferred by long dsRNA-based solutions helps minimize effects on off-target genes in the insect pest genome and on the target gene in non-target organisms (NTOs). In this review, we summarize the status of gene silencing by RNA interference (RNAi) for agricultural control. More specifically, we focus on the engineering, development and application of gene silencing to control Lepidoptera through non-transforming dsRNA technologies. Despite some delivery and stability drawbacks of topical applications, we reviewed works showing convincing proof-of-concept results that point to innovative solutions. Considerations about the regulation of the ongoing research on dsRNA-based pesticides to produce commercialized products for exogenous application are discussed. Academic and industry initiatives have revealed a worthy effort to control Lepidoptera pests with this new mode of action, which provides more sustainable and reliable technologies for field management. New data on the genomics of this taxon may contribute to a future customized target gene portfolio. As a case study, we illustrate how dsRNA and associated methodologies could be applied to control an important lepidopteran coffee pest.
Affiliation(s)
- Leonardo A. Vidal
- Embrapa Recursos Genéticos e Biotecnologia, Brasília 70770-917, DF, Brazil
- Department of Cellular Biology, Institute of Biological Sciences, Campus Darcy Ribeiro, Universidade de Brasília (UnB), Brasília 70910-9002, DF, Brazil
- Caroline R. Torres
- Embrapa Recursos Genéticos e Biotecnologia, Brasília 70770-917, DF, Brazil
- Department of Agronomy and Veterinary Medicine, Campus Darcy Ribeiro, Universidade de Brasília (UnB), Brasília 70910-9002, DF, Brazil
- Camila I. C. V. F. Junqueira
- Embrapa Recursos Genéticos e Biotecnologia, Brasília 70770-917, DF, Brazil
- Department of Agronomy and Veterinary Medicine, Campus Darcy Ribeiro, Universidade de Brasília (UnB), Brasília 70910-9002, DF, Brazil
- Juliana Dantas
- Embrapa Recursos Genéticos e Biotecnologia, Brasília 70770-917, DF, Brazil
12
Costa-Silva J, Domingues DS, Menotti D, Hungria M, Lopes FM. Temporal progress of gene expression analysis with RNA-Seq data: A review on the relationship between computational methods. Comput Struct Biotechnol J 2022; 21:86-98. [PMID: 36514333 PMCID: PMC9730150 DOI: 10.1016/j.csbj.2022.11.051] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/03/2022]
Abstract
Analysis of differential gene expression from RNA-seq data has become a standard for several research areas. The steps for the computational analysis include many data types and file formats, and a wide variety of computational tools that can be applied alone or together as pipelines. This paper presents a review of the differential expression analysis pipeline, addressing its steps and the respective objectives, the principal methods available in each step, and their properties, therefore introducing an organized overview to this context. This review aims to address mainly the aspects involved in the differentially expressed gene (DEG) analysis from RNA sequencing data (RNA-seq), considering the computational methods. In addition, a timeline of the computational methods for DEG is shown and discussed, and the relationships existing between the most important computational tools are presented by an interaction network. A discussion on the challenges and gaps in DEG analysis is also highlighted in this review. This paper will serve as a tutorial for new entrants into the field and help established users update their analysis pipelines.
Affiliation(s)
- Juliana Costa-Silva
- Department of Informatics – Federal University of Paraná, Rua Coronel Francisco Heráclito dos Santos, 100, 81531-990 Curitiba, Paraná, Brazil
- Douglas S. Domingues
- Department of Genetics, “Luiz de Queiroz” College of Agriculture, University of São Paulo, Av. Pádua Dias, 11, 13418-900 Piracicaba, São Paulo, Brazil
- David Menotti
- Department of Informatics – Federal University of Paraná, Rua Coronel Francisco Heráclito dos Santos, 100, 81531-990 Curitiba, Paraná, Brazil
- Mariangela Hungria
- Department of Soil Biotecnology - Embrapa Soybean, Cx. Postal 231, 86000-970 Londrina, Paraná, Brazil
- Fabrício Martins Lopes
- Department of Computer Science, Universidade Tecnológica Federal do Paraná – UTFPR, Av. Alberto Carazzai, 1640, 86300-000, Cornélio Procópio, Paraná, Brazil
13
Kochhar P, Vukku M, Rajashekhar R, Mukhopadhyay A. microRNA signatures associated with fetal growth restriction: a systematic review. Eur J Clin Nutr 2022; 76:1088-1102. [PMID: 34741137 DOI: 10.1038/s41430-021-01041-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/05/2020] [Revised: 10/17/2021] [Accepted: 10/19/2021] [Indexed: 12/20/2022]
Abstract
Placental-origin microRNA (miRNA) profiles can be useful for early diagnosis and management of fetal growth restriction (FGR) and associated complications. We conducted a systematic review to identify case-control studies that have examined miRNA signatures associated with human FGR. We systematically searched the PubMed and ScienceDirect databases for relevant articles and manually searched the reference lists of the relevant articles through May 18, 2021. Of the 2133 studies identified, 21 were included. FGR-associated upregulation of miR-210 and miR-424, and downregulation of miR-221-3p and of a placenta-specific miRNA cluster located on C19MC (miR-518b, miR-519d), were reported by more than one included study. Analysis of the target genes of these miRNAs, as well as pathway analysis, pointed to the involvement of angiogenesis and growth signaling pathways, such as the phosphatidylinositol 3-kinase-protein kinase B (PI3K-Akt) pathway. Only 3 of the 21 included studies reported FGR-associated miRNAs in matched placental and maternal blood samples. We conclude that FGR-associated placental miRNAs could be used to inform clinical practice towards early diagnosis of FGR, provided enough evidence from studies on matched placental and maternal blood samples becomes available. Prospective Register of Systematic Reviews (PROSPERO) registration number: CRD42019136762.
Affiliation(s)
- P Kochhar
- Division of Nutrition, St. John's Research Institute, A Recognized Research Centre of University of Mysore, Bangalore, India
- M Vukku
- Division of Nutrition, St. John's Research Institute, A Recognized Research Centre of University of Mysore, Bangalore, India
- R Rajashekhar
- Division of Nutrition, St. John's Research Institute, A Recognized Research Centre of University of Mysore, Bangalore, India; Department of Neurology, National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, India
- A Mukhopadhyay
- Division of Nutrition, St. John's Research Institute, A Recognized Research Centre of University of Mysore, Bangalore, India
14
Thawng CN, Smith GB. A transcriptome software comparison for the analyses of treatments expected to give subtle gene expression responses. BMC Genomics 2022; 23:452. [PMID: 35725382 PMCID: PMC9208185 DOI: 10.1186/s12864-022-08673-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/11/2022] [Accepted: 05/26/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND In this comparative study we evaluate the performance of four software tools: DNAstar-D (DESeq2), DNAstar-E (edgeR), CLC Genomics and Partek Flow for identification of differentially expressed genes (DEGs) using a transcriptome of E. coli. The RNA-seq data are from the effect of below-background radiation 5.5 nGy total dose (0.2 nGy/hr) on E. coli grown shielded from natural radiation 655 m below ground in a pre-World War II steel vault. The gene expression response to three supplemented sources of radiation designed to mimic natural background, 1952-5720 nGy in total dose (71-208 nGy/hr), are compared to this "radiation-deprived" treatment. In addition, RNA-seq data of Caenorhabditis elegans nematode from similar radiation treatments was analyzed by three of the software packages. RESULTS In E. coli, the four software programs identified one of the supplementary sources of radiation (KCl) to evoke about 5 times more transcribed genes than the minus-radiation treatment (69-114 differentially expressed genes, DEGs), and so the rest of the analyses used this KCl vs "Minus" comparison. After imposing a 30-read minimum cutoff, one of the DNAStar options shared two of the three steps (mapping, normalization, and statistic) with Partek Flow (they both used median of ratios to normalize and the DESeq2 statistical package), and these two programs identified the highest number of DEGs in common with each other (53). In contrast, when the programs used different approaches in each of the three steps, between 31 and 40 DEGs were found in common. Regarding the extent of expression differences, three of the four programs gave high fold-change results (15-178 fold), but one (DNAstar's DESeq2) resulted in more conservative fold-changes (1.5-3.5). In a parallel study comparing three qPCR commercial validation software programs, these programs also gave variable results as to which genes were significantly regulated. Similarly, the C. elegans analysis showed exaggerated fold-changes in CLC and DNAstar's edgeR while DNAstar-D was more conservative. CONCLUSIONS Regarding the extent of expression (fold-change), and considering the subtlety of the very low level radiation treatments, in E. coli three of the four programs gave what we consider exaggerated fold-change results (15-178 fold), but one (DNAstar's DESeq2) gave more realistic fold-changes (1.5-3.5). When RT-qPCR validation comparisons to transcriptome results were carried out, they supported the more conservative DNAstar-D's expression results. When another model organism's (nematode) response to these radiation differences was similarly analyzed, DNAstar-D also resulted in the most conservative expression patterns. Therefore, we would propose DESeq2 ("DNAstar-D") as an appropriate software tool for differential gene expression studies for treatments expected to give subtle transcriptome responses.
Affiliation(s)
- Cung Nawl Thawng
- Biology Department and Molecular Biology Program, New Mexico State University, Las Cruces, NM, USA
- Geoffrey Battle Smith
- Biology Department and Molecular Biology Program, New Mexico State University, Las Cruces, NM, USA
15
Analysis of Gut Microbiome Structure Based on GMPR+Spectrum. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12125895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
The gut microbiome is related to many major human diseases, and it is of great significance to study the structure of the gut microbiome under different conditions. Multivariate statistics or pattern recognition methods have often been used to identify different structural patterns in gut microbiome data. However, these methods have some limitations. Minimal hepatic encephalopathy (MHE) datasets were taken as an example. Because taxa may be physically absent or insufficiently sampled during sequencing, the microbiome data contain many zeros. Therefore, the geometric mean of pairwise ratios (GMPR) was used to normalize the gut microbiome data, Spectrum was then used to analyze the structure of the gut microbiome, and lastly the structure of the core microflora was compared using network analysis. The reproducibility of GMPR, assessed by the intraclass correlation coefficient (ICC), was significantly better than that of other normalization methods. In addition, the running time, Normalized Mutual Information (NMI), Davies-Bouldin Index (DBI), and Calinski-Harabasz index (CH) of GMPR+Spectrum were far superior to those of other clustering algorithms such as M3C and iClusterPlus. GMPR+Spectrum not only performs better but also effectively identifies the structural differences of intestinal microbiota in different patients and uncovers distinctive critical bacteria, such as Akkermansia and Lactobacillus, in MHE patients, which may provide a new reference for the study of the gut microbiome in disease.
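The normalization named in this abstract can be sketched briefly. The code below is a minimal illustration of GMPR size factors as the method is commonly described: for each pair of samples, take the median count ratio over taxa that are nonzero in both, then take the geometric mean of those medians across partner samples. The function name, the samples-by-taxa layout, and the choice to exclude the self-comparison are assumptions made here, not details given in the abstract.

```python
import numpy as np

def gmpr_size_factors(counts):
    """GMPR-style size factors for a samples x taxa count matrix.

    Hypothetical helper for illustration. A sample that shares no
    nonzero taxon with any other sample gets NaN.
    """
    counts = np.asarray(counts, dtype=float)
    n_samples = counts.shape[0]
    factors = np.full(n_samples, np.nan)
    for i in range(n_samples):
        log_medians = []
        for j in range(n_samples):
            if i == j:
                continue  # assumption: skip the self-comparison
            shared = (counts[i] > 0) & (counts[j] > 0)
            if not shared.any():
                continue  # no taxon observed in both samples
            ratios = counts[i, shared] / counts[j, shared]
            log_medians.append(np.log(np.median(ratios)))
        if log_medians:
            # geometric mean of the pairwise median ratios
            factors[i] = np.exp(np.mean(log_medians))
    return factors
```

Dividing each sample's counts by its size factor gives normalized abundances. Because ratios are taken only over taxa that are nonzero in both samples of a pair, zeros do not enter the computation, which is the property the abstract relies on for zero-heavy data.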
16
Sun X, Xu H, Liu G, Chen J, Xu J, Li M, Liu L. A Robust Immuno-Prognostic Model of Non-Muscle-Invasive Bladder Cancer Indicates Dynamic Interaction in Tumor Immune Microenvironment Contributes to Cancer Progression. Front Genet 2022; 13:833989. [PMID: 35719408 PMCID: PMC9205430 DOI: 10.3389/fgene.2022.833989] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/12/2021] [Accepted: 04/28/2022] [Indexed: 12/24/2022]
Abstract
Non-muscle-invasive bladder cancer (NMIBC) accounts for more than 70% of urothelial cancer. More than half of NMIBC patients experience recurrence, progression, or metastasis, which substantially reduces quality of life and survival time. Identifying the high-risk patients prone to progression remains the primary concern in the risk management of NMIBC. In this study, we included 1370 NMIBC transcriptomic profiles from nine public datasets, identified nine tumor-infiltrating marker cells highly related to the survival of NMIBC, quantified the proportions of these cells using self-defined differentially expressed signature genes, and established a robust immuno-prognostic model dividing NMIBC patients into low-risk versus high-risk progression groups. Our model implies that the loss of crosstalk between tumor cells and adjacent normal epithelium, along with enriched cell proliferation signals, may facilitate tumor progression. Thus, evaluating tumor progression should consider various components of the tumor immune microenvironment rather than a single marker in a single dimension. Moreover, we also emphasize the need for appropriate meta-analysis methods to integrate evidence from multiple sources during feature selection in large-scale heterogeneous omics data such as those used in our study.
Affiliation(s)
- Xiaomeng Sun
- Institutes of Biomedical Sciences and School of Basic Medical Sciences, Fudan University, Shanghai, China
- Research Institute, GloriousMed Clinical Laboratory Co., Ltd., Shanghai, China
- Huilin Xu
- Institutes of Biomedical Sciences and School of Basic Medical Sciences, Fudan University, Shanghai, China
- Gang Liu
- Institutes of Biomedical Sciences and School of Basic Medical Sciences, Fudan University, Shanghai, China
- Jiani Chen
- Department of Pharmacy, Second Affiliated Hospital of Naval Medical University, Shanghai, China
- Jinrong Xu
- Department of Electronic Engineering, Taiyuan Institute of Technology, Taiyuan, China
- Mingming Li
- Department of Pharmacy, Second Affiliated Hospital of Naval Medical University, Shanghai, China
- Lei Liu
- Institutes of Biomedical Sciences and School of Basic Medical Sciences, Fudan University, Shanghai, China
- *Correspondence: Jinrong Xu; Mingming Li; Lei Liu
17
Vandenbon A. Evaluation of critical data processing steps for reliable prediction of gene co-expression from large collections of RNA-seq data. PLoS One 2022; 17:e0263344. [PMID: 35089979 PMCID: PMC8797241 DOI: 10.1371/journal.pone.0263344] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/22/2021] [Accepted: 01/16/2022] [Indexed: 11/19/2022]
Abstract
Motivation Gene co-expression analysis is an attractive tool for leveraging enormous amounts of public RNA-seq datasets for the prediction of gene functions and regulatory mechanisms. However, the optimal data processing steps for the accurate prediction of gene co-expression from such large datasets remain unclear. Especially the importance of batch effect correction is understudied. Results We processed RNA-seq data of 68 human and 76 mouse cell types and tissues using 50 different workflows into 7,200 genome-wide gene co-expression networks. We then conducted a systematic analysis of the factors that result in high-quality co-expression predictions, focusing on normalization, batch effect correction, and measure of correlation. We confirmed the key importance of high sample counts for high-quality predictions. However, choosing a suitable normalization approach and applying batch effect correction can further improve the quality of co-expression estimates, equivalent to a >80% and >40% increase in samples. In larger datasets, batch effect removal was equivalent to a more than doubling of the sample size. Finally, Pearson correlation appears more suitable than Spearman correlation, except for smaller datasets. Conclusion A key point for accurate prediction of gene co-expression is the collection of many samples. However, paying attention to data normalization, batch effects, and the measure of correlation can significantly improve the quality of co-expression estimates.
Affiliation(s)
- Alexis Vandenbon
- Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto, Japan
- Institute for Liberal Arts and Sciences, Kyoto University, Kyoto, Japan
18
Khan RIN, Sahu AR, Malla WA, Praharaj MR, Hosamani N, Kumar S, Gupta S, Sharma S, Saxena A, Varshney A, Singh P, Verma V, Kumar P, Singh G, Pandey A, Saxena S, Gandham RK, Tiwari AK. Systems biology under heat stress in Indian cattle. Gene 2021; 805:145908. [PMID: 34411649 DOI: 10.1016/j.gene.2021.145908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2021] [Revised: 08/11/2021] [Accepted: 08/13/2021] [Indexed: 11/26/2022]
Abstract
Transcriptome profiling of Vrindavani and Tharparkar cattle (n = 5 each) revealed that more genes were dysregulated in Vrindavani than in Tharparkar. A contrast in gene expression was observed, with 18.9% of the genes upregulated in Vrindavani being downregulated in Tharparkar and 17.8% of the genes upregulated in Tharparkar being downregulated in Vrindavani. Functional annotation of genes differentially expressed in Tharparkar and Vrindavani revealed that the systems biology in Tharparkar is moving towards counteracting the effects of heat stress. Unlike Vrindavani, Tharparkar is endowed not only with higher expression of the scavengers (UBE2G1, UBE2S, and UBE2H) of misfolded proteins but also with protectors (VCP, Serp1, and CALR) of naïve unfolded proteins. Further, higher expression of antioxidants in Tharparkar enables it to cope with the higher levels of free radicals generated as a result of heat stress. In this study, we found relevant genes dysregulated in Tharparkar in the direction that can counter heat stress.
Affiliation(s)
- Raja Ishaq Nabi Khan
- Division of Veterinary Biotechnology, Indian Veterinary Research Institute, Bareilly, India
- Amit Ranjan Sahu
- Division of Veterinary Biotechnology, Indian Veterinary Research Institute, Bareilly, India
- Waseem Akram Malla
- Division of Veterinary Biotechnology, Indian Veterinary Research Institute, Bareilly, India
- Manas Ranjan Praharaj
- Computational Biology and Genomics, National Institute of Animal Biotechnology, Hyderabad, India
- Neelima Hosamani
- Computational Biology and Genomics, National Institute of Animal Biotechnology, Hyderabad, India
- Shakti Kumar
- Computational Biology and Genomics, National Institute of Animal Biotechnology, Hyderabad, India
- Smita Gupta
- Division of Veterinary Biotechnology, Indian Veterinary Research Institute, Bareilly, India
- Shweta Sharma
- Division of Veterinary Biotechnology, Indian Veterinary Research Institute, Bareilly, India
- Archana Saxena
- Division of Veterinary Biotechnology, Indian Veterinary Research Institute, Bareilly, India
- Anshul Varshney
- Division of Veterinary Biotechnology, Indian Veterinary Research Institute, Bareilly, India
- Pragya Singh
- Division of Veterinary Biotechnology, Indian Veterinary Research Institute, Bareilly, India
- Vinay Verma
- Division of Physiology and Climatology, Indian Veterinary Research Institute, Bareilly, India
- Puneet Kumar
- Division of Physiology and Climatology, Indian Veterinary Research Institute, Bareilly, India
- Gyanendra Singh
- Division of Physiology and Climatology, Indian Veterinary Research Institute, Bareilly, India
- Aruna Pandey
- Division of Veterinary Biotechnology, Indian Veterinary Research Institute, Bareilly, India
- Shikha Saxena
- Division of Veterinary Biotechnology, Indian Veterinary Research Institute, Bareilly, India
- Ravi Kumar Gandham
- Computational Biology and Genomics, National Institute of Animal Biotechnology, Hyderabad, India
- Ashok Kumar Tiwari
- Division of Biological Standardization, Indian Veterinary Research Institute, Bareilly, India
19
Helmy M, Agrawal R, Ali J, Soudy M, Bui TT, Selvarajoo K. GeneCloudOmics: A Data Analytic Cloud Platform for High-Throughput Gene Expression Analysis. FRONTIERS IN BIOINFORMATICS 2021; 1:693836. [PMID: 36303746 PMCID: PMC9581002 DOI: 10.3389/fbinf.2021.693836] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/07/2021] [Accepted: 10/14/2021] [Indexed: 11/18/2022]
Abstract
Gene expression profiling techniques, such as DNA microarray and RNA-Sequencing, have provided significant impact on our understanding of biological systems. They contribute to almost all aspects of biomedical research, including studying developmental biology, host-parasite relationships, disease progression and drug effects. However, the high-throughput data generations present challenges for many wet experimentalists to analyze and take full advantage of such rich and complex data. Here we present GeneCloudOmics, an easy-to-use web server for high-throughput gene expression analysis that extends the functionality of our previous ABioTrans with several new tools, including protein datasets analysis, and a web interface. GeneCloudOmics allows both microarray and RNA-Seq data analysis with a comprehensive range of data analytics tools in one package that no other current standalone software or web-based tool can do. In total, GeneCloudOmics provides the user access to 23 different data analytical and bioinformatics tasks including reads normalization, scatter plots, linear/non-linear correlations, PCA, clustering (hierarchical, k-means, t-SNE, SOM), differential expression analyses, pathway enrichments, evolutionary analyses, pathological analyses, and protein-protein interaction (PPI) identifications. Furthermore, GeneCloudOmics allows the direct import of gene expression data from the NCBI Gene Expression Omnibus database. The user can perform all tasks rapidly through an intuitive graphical user interface that overcomes the hassle of coding, installing tools/packages/libraries and dealing with operating systems compatibility and version issues, complications that make data analysis tasks challenging for biologists. Thus, GeneCloudOmics is a one-stop open-source tool for gene expression data analysis and visualization. It is freely available at http://combio-sifbi.org/GeneCloudOmics.
Affiliation(s)
- Mohamed Helmy
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada
- Rahul Agrawal
- Department of Geology and Geophysics, Indian Institute of Technology (IIT) Kharagpur, Kharagpur, India
- Javed Ali
- Department of Geology and Geophysics, Indian Institute of Technology (IIT) Kharagpur, Kharagpur, India
- Mohamed Soudy
- Proteomics and Metabolomics Unit, Children Cancer Hospital (CCHE-57357), Cairo, Egypt
- Thuy Tien Bui
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Kumar Selvarajoo
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Singapore Institute of Food and Biotechnology Innovation (SIFBI), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore (NUS), Singapore, Singapore
- *Correspondence: Kumar Selvarajoo
20
Constructing a Defined Starter for Multispecies Vinegar Fermentation via Evaluating the Vitality and Dominance of Functional Microbes in Autochthonous Starter. Appl Environ Microbiol 2021; 88:e0217521. [PMID: 34818103 DOI: 10.1128/aem.02175-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
Abstract
Mature vinegar culture has usually been used as a type of autochthonous starter for rapidly initiate initiating the next batch of acetic acid fermentation (AAF) and maintaining the batch-to-batch uniformity of AAF in the production of traditional cereal vinegar. However, the vitality and dominance of functional microbes in autochthonous starters remain unclear, which hinders further improvement of fermentation yield and production. Here, based on metagenomic (MG), metatranscriptomic (MT), and 16S rRNA gene sequencings, 11 bacterial operational taxonomic units (OTUs) with significant metabolic activity (MT/MG ratio >1) and dominance (relative abundance >1%) were targeted in the autochthonous vinegar starter, all of which were assigned to 4 species (Acetobacter pasteurianus, Lactobacillus acetotolerans, L. helveticus, Acetilactobacillus jinshanensis). Then, we evaluated the successions and interactions of these 11 bacterial OTUs at different AAF stages. Last, a defined starter was constructed with 4 core species isolated from the autochthonous starter (A. pasteurianus, L. acetotolerans, L. helveticus, Ac. jinshanensis). The defined starter culture could rapidly initiate the AAF in a sterile or unsterilized environment and similar dynamics of metabolites (ethanol, titratable acidity, acetic acid, lactic acid, and volatile compounds) and environmental indexes (temperature, pH) of fermentation were observed as compared with that of autochthonous starter (P > 0.05). This work provides a method to construct a defined microbiota from a complex system while preserving its metabolic function. IMPORTANCE Complex microorganisms are beneficial to the flavor formation in natural food fermentation, but they also pose challenges to the mass production of standardized products. It is attractive to construct a defined starter to rapidly initiate fermentation process and significantly improve fermentation yield. 
This study provides a comprehensive understanding of vital and dominant species in the autochthonous vinegar starter via multi-omics, and designs a defined microbial community for the efficient fermentation of cereal vinegar.
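The two selection thresholds named in this abstract (MT/MG ratio >1 and relative abundance >1%) can be illustrated with a minimal sketch. The `Otu` record, the field names, and the example values are hypothetical, and the abstract does not say which profile the relative abundance is taken from, so the sketch simply assumes one abundance value per OTU.

```python
from dataclasses import dataclass


@dataclass
class Otu:
    """Hypothetical record for one OTU; all fields are illustrative."""
    name: str
    mg_fraction: float         # share of metagenomic (MG) reads, 0-1
    mt_fraction: float         # share of metatranscriptomic (MT) reads, 0-1
    relative_abundance: float  # community relative abundance, 0-1 (assumed source)


def select_core_otus(otus, min_ratio=1.0, min_abundance=0.01):
    """Keep OTUs that are both active (MT/MG > min_ratio) and dominant."""
    selected = []
    for otu in otus:
        if otu.mg_fraction <= 0:
            continue  # ratio undefined without MG signal
        ratio = otu.mt_fraction / otu.mg_fraction
        if ratio > min_ratio and otu.relative_abundance > min_abundance:
            selected.append(otu.name)
    return selected


# Made-up values, not data from the study.
example = [
    Otu("OTU_a", 0.30, 0.45, 0.30),    # active and dominant
    Otu("OTU_b", 0.20, 0.10, 0.20),    # dominant, but MT/MG < 1
    Otu("OTU_c", 0.005, 0.02, 0.005),  # active, but below 1% abundance
]
print(select_core_otus(example))
```

Only OTUs passing both filters are retained, which mirrors the "vitality and dominance" criterion described above.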
|
21
|
Sobreiro MB, Collevatti RG, Dos Santos YLA, Bandeira LF, Lopes FJF, Novaes E. RNA-Seq reveals different responses to drought in Neotropical trees from savannas and seasonally dry forests. BMC Plant Biol 2021; 21:463. [PMID: 34641780 PMCID: PMC8507309 DOI: 10.1186/s12870-021-03244-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/23/2021] [Accepted: 09/24/2021] [Indexed: 05/24/2023]
Abstract
BACKGROUND Water is one of the main limiting factors for plant growth and crop productivity. Plants constantly monitor water availability and can rapidly adjust their metabolism by altering gene expression. This leads to phenotypic plasticity, which aids rapid adaptation to climate changes. Here, we address phenotypic plasticity under drought stress by analyzing differentially expressed genes (DEG) in four phylogenetically related neotropical Bignoniaceae tree species: two from savanna, Handroanthus ochraceus and Tabebuia aurea, and two from seasonally dry tropical forests (SDTF), Handroanthus impetiginosus and Handroanthus serratifolius. To the best of our knowledge, this is the first report of an RNA-Seq study comparing tree species from seasonally dry tropical forest and savanna ecosystems. RESULTS Using a completely randomized block design with 4 species × 2 treatments (drought and wet) × 3 blocks (24 plants) and an RNA-seq approach, we detected a higher number of DEGs between treatments for the SDTF species H. serratifolius (3153 up-regulated and 2821 down-regulated under drought) and H. impetiginosus (332 and 207), than for the savanna species. H. ochraceus showed the lowest number of DEGs, with only five up- and nine down-regulated genes, while T. aurea exhibited 242 up- and 96 down-regulated genes. The number of shared DEGs among species was not related to habitat of origin or phylogenetic relationship, since both T. aurea and H. impetiginosus shared a similar number of DEGs with H. serratifolius. All four species shared a low number of enriched gene ontology (GO) terms and, in general, exhibited different mechanisms of response to water deficit. We also found 175 down-regulated and 255 up-regulated transcription factors from several families, indicating the importance of these master regulators in drought response. CONCLUSION Our findings show that phylogenetically related species may respond differently at the gene expression level to drought stress. 
Savanna species seem to be less responsive to drought at the transcriptional level, likely due to morphological and anatomical adaptations to seasonal drought. The species with the largest geographic range and widest edaphic-climatic niche, H. serratifolius, was the most responsive, exhibiting the highest number of DEG and up- and down-regulated transcription factors (TF).
Affiliation(s)
- Mariane B Sobreiro
- Laboratório de Genética & Biodiversidade, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, GO, 74690-900, Brazil
| | - Rosane G Collevatti
- Laboratório de Genética & Biodiversidade, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, GO, 74690-900, Brazil
| | - Yuri L A Dos Santos
- Laboratório de Genética e Genômica de Plantas, Escola de Agronomia, Universidade Federal de Goiás, Goiânia, GO, 74690-900, Brazil
| | - Ludmila F Bandeira
- Laboratório de Genética e Genômica de Plantas, Escola de Agronomia, Universidade Federal de Goiás, Goiânia, GO, 74690-900, Brazil
| | - Francis J F Lopes
- Laboratório de Fisiologia Vegetal, Instituto de Ciências Biológicas, Universidade Federal de Goiás, Goiânia, GO, 74690-900, Brazil
| | - Evandro Novaes
- Laboratório de Genética Molecular, Departamento de Biologia, Universidade Federal de Lavras, Lavras, MG, 37200-900, Brazil.
|
22
|
Zhang Y, Hu B, Agwanda B, Fang Y, Wang J, Kuria S, Yang J, Masika M, Tang S, Lichoti J, Fan Z, Shi Z, Ommeh S, Wang H, Deng F, Shen S. Viromes and surveys of RNA viruses in camel-derived ticks revealing transmission patterns of novel tick-borne viral pathogens in Kenya. Emerg Microbes Infect 2021; 10:1975-1987. [PMID: 34570681 PMCID: PMC8525980 DOI: 10.1080/22221751.2021.1986428] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
Tick-borne viruses (TBVs) capable of transmitting between ticks and hosts have been increasingly recognized as a global public health concern. In this study, Hyalomma ticks and serum samples from camels were collected using recorded sampling correlations in eastern Kenya. Viromes of pooled ticks were profiled by metagenomic sequencing, revealing a diverse community of viruses related to at least 11 families. Five highly abundant viruses, including three novel viruses (Iftin tick virus, Mbalambala tick virus [MATV], and Bangali torovirus [BanToV]) and new strains of previously identified viruses (Bole tick virus 4 [BLTV4] and Liman tick virus [LMTV]), were characterized in terms of genome sequences, organizations, and phylogeny, and their molecular prevalence was investigated in individual ticks. Moreover, viremia and antibody responses to these viruses have been investigated in camels. MATV, BLTV4, LMTV, and BanToV were identified as viral pathogens that can potentially cause zoonotic diseases. The transmission patterns of these viruses were summarized, suggesting three different types according to the sampling relationships between viral RNA-positive ticks and camels positive for viral RNA and/or antibodies. They also revealed the frequent transmission of BanToV and limited but effective transmission of other viruses between ticks and camels. Furthermore, follow-up surveys on TBVs from tick, animal, and human samples with definite sampling relationships are suggested. The findings revealed substantial threats from the emerging TBVs and may guide the prevention and control of TBV-related zoonotic diseases in Kenya and in other African countries.
Affiliation(s)
- You Zhang
- State Key Laboratory of Virology and National Virus Resource Centre, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, People's Republic of China.,University of Chinese Academy of Sciences, Beijing, People's Republic of China
| | - Ben Hu
- CAS Key Laboratory of Special Pathogens and Biosafety, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, People's Republic of China
| | - Bernard Agwanda
- Department of Zoology, National Museums of Kenya, Nairobi, Kenya
| | - Yaohui Fang
- State Key Laboratory of Virology and National Virus Resource Centre, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, People's Republic of China.,University of Chinese Academy of Sciences, Beijing, People's Republic of China
| | - Jun Wang
- State Key Laboratory of Virology and National Virus Resource Centre, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, People's Republic of China
| | - Stephen Kuria
- Institute For Biotechnology Research (IBR), Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya
| | - Juan Yang
- State Key Laboratory of Virology and National Virus Resource Centre, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, People's Republic of China
| | - Moses Masika
- Department of Medical Microbiology, University of Nairobi, Nairobi, Kenya
| | - Shuang Tang
- State Key Laboratory of Virology and National Virus Resource Centre, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, People's Republic of China
| | - Jacqueline Lichoti
- Directorate of Veterinary Services, State Department of Livestock, Ministry of Agriculture, Livestock, Fisheries and Irrigation, Nairobi, Kenya
| | - Zhaojun Fan
- State Key Laboratory of Virology and National Virus Resource Centre, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, People's Republic of China
| | - Zhengli Shi
- CAS Key Laboratory of Special Pathogens and Biosafety, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, People's Republic of China
| | - Sheila Ommeh
- Institute For Biotechnology Research (IBR), Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya
| | - Hualin Wang
- State Key Laboratory of Virology and National Virus Resource Centre, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, People's Republic of China
| | - Fei Deng
- State Key Laboratory of Virology and National Virus Resource Centre, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, People's Republic of China
| | - Shu Shen
- State Key Laboratory of Virology and National Virus Resource Centre, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, People's Republic of China
|
23
|
Genome-Wide Identification and Transcriptional Expression Profiles of Transcription Factor WRKY in Common Walnut ( Juglans regia L.). Genes (Basel) 2021; 12:genes12091444. [PMID: 34573426 PMCID: PMC8466090 DOI: 10.3390/genes12091444] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2021] [Revised: 09/07/2021] [Accepted: 09/17/2021] [Indexed: 11/16/2022] Open
Abstract
The transcription factor WRKY is widely distributed in the plant kingdom, playing a significant role in plant growth, development and response to stresses. Walnut is an economically important temperate tree species valued for both its edible nuts and high-quality wood, and its response to various stresses is an important factor that determines the quality of its fruit. However, in walnut trees themselves, information about the WRKY gene family remains scarce. In this paper, we perform a comprehensive study of the WRKY gene family in walnut. In total, we identified 103 WRKY genes in the common walnut that are clustered into 4 groups and distributed on 14 chromosomes. The conserved domains all contained a WRKY domain, and motif 2 was observed in most WRKYs, suggesting a high degree of conservation and similar functions within each subfamily. However, gene structure was significantly differentiated between different subfamilies. Synteny analysis indicates that there were 56 gene pairs in J. regia and A. thaliana, 76 in J. regia and J. mandshurica, 75 in J. regia and J. microcarpa, 76 in J. regia and P. trichocarpa, and 33 in J. regia and Q. robur, indicating that the WRKY gene family may come from a common ancestor. GO and KEGG enrichment analysis showed that the WRKY gene family was involved in resistance traits and the plant-pathogen interaction pathway. In anthracnose-resistant F26 fruits (AR) and anthracnose-susceptible F423 fruits (AS), transcriptome and qPCR analysis results showed that JrWRKY83, JrWRKY73 and JrWRKY74 were expressed significantly more highly in resistant cultivars, indicating that these three genes may be important contributors to stress resistance in walnut trees. Furthermore, we investigate how these three genes potentially target miRNAs and interact with proteins. JrWRKY73 was targeted by the miR156 family, including 12 miRNAs; this miRNA family targets WRKY genes to enhance plant defense. 
JrWRKY73 also interacted with the resistance gene AtMPK6, showing that it may play a crucial role in walnut defense.
|
24
|
Assessment of reference genes at six different developmental stages of Schistosoma mansoni for quantitative RT-PCR. Sci Rep 2021; 11:16816. [PMID: 34413342 PMCID: PMC8376997 DOI: 10.1038/s41598-021-96055-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2021] [Accepted: 07/31/2021] [Indexed: 12/13/2022] Open
Abstract
Reverse-transcription quantitative real-time polymerase chain reaction (RT-qPCR) is the most used, fast, and reproducible method to confirm large-scale gene expression data. The use of stable reference genes for the normalization of RT-qPCR assays is recognized worldwide. No systematic study for selecting appropriate reference genes for usage in RT-qPCR experiments comparing gene expression levels at different Schistosoma mansoni life-cycle stages has been performed. Most studies rely on genes commonly used in other organisms, such as actin, tubulin, and GAPDH. Therefore, the present study focused on identifying reference genes suitable for RT-qPCR assays across six S. mansoni developmental stages. The expression levels of 25 novel candidates that we selected based on the analysis of public RNA-Seq datasets, along with eight commonly used reference genes, were systematically tested by RT-qPCR across six developmental stages of S. mansoni (eggs, miracidia, cercariae, schistosomula, adult males and adult females). The stability of genes was evaluated with geNorm, NormFinder and RefFinder algorithms. The least stable candidate reference genes tested were actin, tubulin and GAPDH. The two most stable reference genes suitable for RT-qPCR normalization were Smp_101310 (Histone H4 transcription factor) and Smp_196510 (Ubiquitin recognition factor in ER-associated degradation protein 1). Performance of these two genes as normalizers was successfully evaluated with females maintained unpaired or paired to males in culture for 8 days, or with worm pairs exposed for 16 days to double-stranded RNAs to silence a protein-coding gene. This study provides reliable reference genes for RT-qPCR analysis using samples from six different S. mansoni life-cycle stages.
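As an illustration of how a stability ranking such as the geNorm one mentioned above can be computed, the following minimal sketch implements the commonly described geNorm M value: for each candidate gene, the average standard deviation of its log2 expression ratio against every other candidate across samples. The function name and the toy input are hypothetical, and the sketch assumes strictly positive relative quantities and the sample standard deviation; it is not the authors' code or the geNorm software itself.

```python
import math
from statistics import mean, stdev


def genorm_m(expr):
    """Return one M value per gene; lower M means more stable.

    expr: list of samples, each a list of positive relative quantities
          (one value per candidate gene, same gene order in every sample).
    """
    logs = [[math.log2(v) for v in sample] for sample in expr]
    n_genes = len(logs[0])
    m_values = []
    for j in range(n_genes):
        pairwise_sd = []
        for k in range(n_genes):
            if k == j:
                continue
            # log2 ratio of gene j to gene k in every sample
            ratios = [sample[j] - sample[k] for sample in logs]
            pairwise_sd.append(stdev(ratios))
        m_values.append(mean(pairwise_sd))
    return m_values


# Toy data: genes 0 and 1 keep a constant ratio, gene 2 fluctuates.
toy = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, 8.0],
    [4.0, 8.0, 2.0],
]
m = genorm_m(toy)
print(m)
```

In this toy input the third gene receives the largest M value, so it would be ranked as the least stable candidate.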
|
25
|
Carmona-Mora P, Ander BP, Jickling GC, Dykstra-Aiello C, Zhan X, Ferino E, Hamade F, Amini H, Hull H, Sharp FR, Stamova B. Distinct peripheral blood monocyte and neutrophil transcriptional programs following intracerebral hemorrhage and different etiologies of ischemic stroke. J Cereb Blood Flow Metab 2021; 41:1398-1416. [PMID: 32960689 PMCID: PMC8142129 DOI: 10.1177/0271678x20953912] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/10/2020] [Revised: 07/07/2020] [Accepted: 07/29/2020] [Indexed: 12/25/2022]
Abstract
Understanding cell-specific transcriptome responses following intracerebral hemorrhage (ICH) and ischemic stroke (IS) will improve knowledge of the immune response to brain injury. Transcriptomic profiles of 141 samples from 48 subjects with ICH, different IS etiologies, and vascular risk factor controls were characterized using RNA-seq in isolated neutrophils, monocytes and whole blood. In both IS and ICH, monocyte genes were down-regulated, whereas neutrophil gene expression changes were generally up-regulated. The monocyte down-regulated response to ICH included innate, adaptive immune, dendritic, NK cell and atherosclerosis signaling. Neutrophil responses to ICH included tRNA charging, mitochondrial dysfunction, and ER stress pathways. Common monocyte and neutrophil responses to ICH included interferon signaling, neuroinflammation, death receptor signaling, and NFAT pathways. Suppressed monocyte responses to IS included interferon and dendritic cell maturation signaling, phagosome formation, and IL-15 signaling. Activated neutrophil responses to IS included oxidative phosphorylation, mTOR, BMP, growth factor signaling, and calpain proteases-mediated blood-brain barrier (BBB) dysfunction. Common monocyte and neutrophil responses to IS included JAK1, JAK3, STAT3, and thrombopoietin signaling. Cell-type and cause-specific approaches will assist the search for future IS and ICH biomarkers and treatments.
Affiliation(s)
- Paulina Carmona-Mora
- Department of Neurology, School of Medicine, University of California, Davis, Sacramento, CA, USA
| | - Bradley P Ander
- Department of Neurology, School of Medicine, University of California, Davis, Sacramento, CA, USA
| | - Glen C Jickling
- Department of Neurology, School of Medicine, University of California, Davis, Sacramento, CA, USA
- Department of Medicine, University of Alberta, Edmonton, Canada
| | - Cheryl Dykstra-Aiello
- Department of Neurology, School of Medicine, University of California, Davis, Sacramento, CA, USA
| | - Xinhua Zhan
- Department of Neurology, School of Medicine, University of California, Davis, Sacramento, CA, USA
| | - Eva Ferino
- Department of Neurology, School of Medicine, University of California, Davis, Sacramento, CA, USA
| | - Farah Hamade
- Department of Neurology, School of Medicine, University of California, Davis, Sacramento, CA, USA
| | - Hajar Amini
- Department of Neurology, School of Medicine, University of California, Davis, Sacramento, CA, USA
| | - Heather Hull
- Department of Neurology, School of Medicine, University of California, Davis, Sacramento, CA, USA
| | - Frank R Sharp
- Department of Neurology, School of Medicine, University of California, Davis, Sacramento, CA, USA
| | - Boryana Stamova
- Department of Neurology, School of Medicine, University of California, Davis, Sacramento, CA, USA
|
26
|
Van Houtven J, Cuypers B, Meysman P, Hooyberghs J, Laukens K, Valkenborg D. Constrained Standardization of Count Data from Massive Parallel Sequencing. J Mol Biol 2021; 433:166966. [PMID: 33794260 DOI: 10.1016/j.jmb.2021.166966] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2020] [Revised: 02/26/2021] [Accepted: 03/23/2021] [Indexed: 11/22/2022]
Abstract
In high-throughput omics disciplines like transcriptomics, researchers face a need to assess the quality of an experiment prior to an in-depth statistical analysis. To efficiently analyze such voluminous collections of data, researchers need triage methods that are both quick and easy to use. Such a normalization method for relative quantitation, CONSTANd, was recently introduced for isobarically-labeled mass spectra in proteomics. It transforms the data matrix of abundances through an iterative, convergent process enforcing three constraints: (I) identical column sums; (II) each row sum is fixed (across matrices) and (III) identical to all other row sums. In this study, we investigate whether CONSTANd is suitable for count data from massively parallel sequencing, by qualitatively comparing its results to those of DESeq2. Further, we propose an adjustment of the method so that it may be applied to identically balanced but differently sized experiments for joint analysis. We find that CONSTANd can process large data sets at well over 1 million count records per second whilst mitigating unwanted systematic bias and thus quickly uncovering the underlying biological structure when combined with a PCA plot or hierarchical clustering. Moreover, it allows joint analysis of data sets obtained from different batches, with different protocols and from different labs but without exploiting information from the experimental setup other than the delineation of samples into identically processed sets (IPSs). CONSTANd's simplicity and applicability to proteomics as well as transcriptomics data make it an interesting candidate for integration in multi-omics workflows.
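The iterative, constraint-enforcing process described above can be sketched as generic matrix raking (iterative proportional fitting): rows and columns are rescaled in turn until all row sums are equal and all column sums are equal. The targets below (each row sums to 1, each column to m/n for an m × n matrix) and the function name are assumptions chosen for self-consistency; the published CONSTANd method may use different targets and stopping rules.

```python
def rake_constand_like(matrix, tol=1e-9, max_iter=1000):
    """Alternately rescale rows and columns of a strictly positive matrix.

    Assumed targets: every row sums to 1, every column sums to m / n.
    Returns a new list of lists; the input is not modified.
    """
    data = [[float(v) for v in row] for row in matrix]
    m, n = len(data), len(data[0])
    row_target = 1.0
    col_target = m / n
    for _ in range(max_iter):
        # Step 1: scale each row to the row target.
        for row in data:
            s = sum(row)
            for c in range(n):
                row[c] *= row_target / s
        # Step 2: scale each column to the column target.
        col_sums = [sum(row[c] for row in data) for c in range(n)]
        for row in data:
            for c in range(n):
                row[c] *= col_target / col_sums[c]
        # Columns are exact after step 2; stop once rows also agree.
        if max(abs(sum(row) - row_target) for row in data) < tol:
            break
    return data


balanced = rake_constand_like([[10, 20, 30], [5, 5, 10]])
for row in balanced:
    print([round(v, 4) for v in row])
```

Because only ratios within the matrix survive the rescaling, two matrices processed this way end up on a common scale, which is the property the abstract relies on for joint analysis of identically processed sets.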
Affiliation(s)
- Joris Van Houtven
- Flemish Institute for Technological Research (VITO), Boeretang 200, B-2400 Mol, Belgium; Universiteit Hasselt, Data Science Institute (DSI), Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Agoralaan, Diepenbeek BE 3590, Belgium; Universiteit Antwerpen, Centre for Proteomics, Groenenborgerlaan 171, Antwerpen BE 2020, Belgium.
| | - Bart Cuypers
- Universiteit Antwerpen, Biomedical Informatics Network Antwerp (Biomina), Middelheimlaan 1, Antwerpen BE 2020, Belgium; Molecular Parasitology Unit, Institute of Tropical Medicine, Nationalestraat 155, Antwerpen BE 2020, Belgium; Universiteit Antwerpen, Adrem Data Lab, Department of Computer Sciences, Middelheimlaan 1, Antwerpen BE 2020, Belgium
| | - Pieter Meysman
- Universiteit Antwerpen, Biomedical Informatics Network Antwerp (Biomina), Middelheimlaan 1, Antwerpen BE 2020, Belgium; Universiteit Antwerpen, Adrem Data Lab, Department of Computer Sciences, Middelheimlaan 1, Antwerpen BE 2020, Belgium
| | - Jef Hooyberghs
- Flemish Institute for Technological Research (VITO), Boeretang 200, B-2400 Mol, Belgium; Universiteit Hasselt, Data Science Institute (DSI), Theoretical Physics, Agoralaan, Diepenbeek BE 3590, Belgium
| | - Kris Laukens
- Universiteit Antwerpen, Biomedical Informatics Network Antwerp (Biomina), Middelheimlaan 1, Antwerpen BE 2020, Belgium; Universiteit Antwerpen, Adrem Data Lab, Department of Computer Sciences, Middelheimlaan 1, Antwerpen BE 2020, Belgium
| | - Dirk Valkenborg
- Universiteit Hasselt, Data Science Institute (DSI), Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Agoralaan, Diepenbeek BE 3590, Belgium; Universiteit Antwerpen, Centre for Proteomics, Groenenborgerlaan 171, Antwerpen BE 2020, Belgium.
|
27
|
Stupnikov A, McInerney CE, Savage KI, McIntosh SA, Emmert-Streib F, Kennedy R, Salto-Tellez M, Prise KM, McArt DG. Robustness of differential gene expression analysis of RNA-seq. Comput Struct Biotechnol J 2021; 19:3470-3481. [PMID: 34188784 PMCID: PMC8214188 DOI: 10.1016/j.csbj.2021.05.040] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2020] [Revised: 05/25/2021] [Accepted: 05/25/2021] [Indexed: 01/05/2023] Open
Abstract
RNA-sequencing (RNA-seq) is a relatively new technology that lacks standardisation. RNA-seq can be used for Differential Gene Expression (DGE) analysis; however, no consensus exists as to which methodology ensures robust and reproducible results. Indeed, it is broadly acknowledged that DGE methods provide disparate results. Despite obstacles, RNA-seq assays are in advanced development for clinical use but further optimisation will be needed. Herein, five DGE models (DESeq2, voom + limma, edgeR, EBSeq, NOISeq) for gene-level detection were investigated for robustness to sequencing alterations using a controlled analysis of fixed count matrices. Two breast cancer datasets were analysed with full and reduced sample sizes. DGE model robustness was compared between filtering regimes and for different expression levels (high, low) using unbiased metrics. Test sensitivity estimated as relative False Discovery Rate (FDR), concordance between model outputs and comparisons of a 'population' of slopes of relative FDRs across different library sizes, generated using linear regressions, were examined. Patterns of relative DGE model robustness proved dataset-agnostic and reliable for drawing conclusions when sample sizes were sufficiently large. Overall, the non-parametric method NOISeq was the most robust followed by edgeR, voom, EBSeq and DESeq2. Our rigorous appraisal provides information for method selection for molecular diagnostics. Metrics may prove useful towards improving the standardisation of RNA-seq for precision medicine.
Affiliation(s)
- A Stupnikov
- Department of Biological and Medical Physics, Moscow Institute of Physics and Technology, Dolgoprudny, Russian Federation.,Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
| | - C E McInerney
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
| | - K I Savage
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
| | - S A McIntosh
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
| | - F Emmert-Streib
- Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
| | - R Kennedy
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
| | - M Salto-Tellez
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
| | - K M Prise
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
| | - D G McArt
- Patrick G. Johnson Centre for Cancer Research, Queen's University, Belfast, Northern Ireland, UK
|
28
|
Yang J, Wang D, Yang Y, Yang W, Jin W, Niu X, Gong J. A systematic comparison of normalization methods for eQTL analysis. Brief Bioinform 2021; 22:6278608. [PMID: 34015824 DOI: 10.1093/bib/bbab193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2021] [Revised: 04/14/2021] [Accepted: 04/28/2021] [Indexed: 11/15/2022] Open
Abstract
Expression quantitative trait loci (eQTL) analysis has been widely used in interpreting disease-associated loci through correlating genetic variant loci with the expression of specific genes. RNA-sequencing (RNA-Seq), which can quantify gene expression at the genome-wide level, is often used in eQTL identification. Since different normalization methods of gene expression have substantial impacts on RNA-seq downstream analysis, it is of great necessity to systematically compare the effects of these methods on eQTL identification. Here, by using RNA-seq and genotype data of four different cancers in The Cancer Genome Atlas (TCGA) database, we comprehensively evaluated the effect of eight commonly used normalization methods on eQTL identification. Our results showed that the application of different methods could cause 20-30% differences in the final results of eQTL identification. Among these methods, COUNT, Median of Ratio (MED) and Trimmed Mean of M-values (TMM) generated similar results for identifying eQTLs, while Fragments Per Kilobase Million (FPKM) or RANK produced more divergent results compared with other methods. Based on the accuracy and receiver operating characteristic (ROC) curve, the TMM method was found to be the optimal method for normalizing gene expression data in eQTL analysis. In addition, we also evaluated the performance of different pairwise combinations of these methods. As a result, compared with single normalization methods, the combination of methods can not only identify more cis-eQTLs, but also improve the performance of the ROC curve. Overall, this study provides a comprehensive comparison of normalization methods for identifying eQTLs from RNA-seq data, and proposes some practical recommendations for diverse scenarios.
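Of the normalization methods compared above, the median-of-ratios approach is simple enough to sketch in a few lines. The version below follows the widely known DESeq-style definition (per-sample size factor = median, over genes, of the count divided by that gene's geometric mean across samples). Whether the study's "MED" method matches this in every detail is an assumption, and the function name and toy counts are hypothetical.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Return one size factor per sample.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero and the ratio is undefined.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c <= 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for s, c in enumerate(gene):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    # statistics.median raises StatisticsError if no gene was usable.
    return [math.exp(median(r)) for r in log_ratios]


# Toy counts: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 5],  # skipped: contains a zero
]
size_factors = median_of_ratios_size_factors(toy_counts)
normalized = [[c / sf for c, sf in zip(gene, size_factors)] for gene in toy_counts]
print(size_factors)
```

Dividing each column by its size factor puts the two toy samples on the same scale; using the median rather than the total makes the factor robust to a few highly expressed genes.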
Affiliation(s)
- Jiajun Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
| | - Dongyang Wang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
| | - Yanbo Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
| | - Wenqian Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
| | - Weiwei Jin
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
| | - Xiaohui Niu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China
| | - Jing Gong
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, P. R. China.,College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430070, P. R. China
|
29
|
Identification of transcriptional subtypes in lung adenocarcinoma and squamous cell carcinoma through integrative analysis of microarray and RNA sequencing data. Sci Rep 2021; 11:8709. [PMID: 33888829 PMCID: PMC8062554 DOI: 10.1038/s41598-021-88209-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2020] [Accepted: 04/08/2021] [Indexed: 02/02/2023] Open
Abstract
Classification of tumors into subtypes can inform personalized approaches to treatment including the choice of targeted therapies. The two most common lung cancer histological subtypes, lung adenocarcinoma and lung squamous cell carcinoma, have been previously divided into transcriptional subtypes using microarray data, and corresponding signatures were subsequently used to classify RNA-seq data. Cross-platform unsupervised classification facilitates the identification of robust transcriptional subtypes by combining vast amounts of publicly available microarray and RNA-seq data. However, cross-platform classification is challenging because of intrinsic differences in data generated using the two gene expression profiling technologies. In this report, we show that robust gene expression subtypes can be identified in integrated data representing over 3500 normal and tumor lung samples profiled using two widely used platforms, Affymetrix HG-U133 Plus 2.0 Array and Illumina HiSeq RNA sequencing. We tested and analyzed consensus clustering for 384 combinations of data processing methods. The agreement between subtypes identified in single-platform and cross-platform normalized data was then evaluated using a variety of statistics. Results show that unsupervised learning can be achieved with combined microarray and RNA-seq data using selected preprocessing, cross-platform normalization, and unsupervised feature selection methods. Our analysis confirmed three lung adenocarcinoma transcriptional subtypes, but only two consistent subtypes in squamous cell carcinoma, as opposed to four subtypes previously identified. Further analysis showed that tumor subtypes were associated with distinct patterns of genomic alterations in genes coding for therapeutic targets. Importantly, by integrating quantitative proteomics data, we were able to identify tumor subtype biomarkers that effectively classify samples on the basis of both gene and protein expression. 
This study provides the basis for further integrative data analysis across gene and protein expression profiling platforms.
|
30
|
de Vries JJC, Brown JR, Couto N, Beer M, Le Mercier P, Sidorov I, Papa A, Fischer N, Oude Munnink BB, Rodriquez C, Zaheri M, Sayiner A, Hönemann M, Cataluna AP, Carbo EC, Bachofen C, Kubacki J, Schmitz D, Tsioka K, Matamoros S, Höper D, Hernandez M, Puchhammer-Stöckl E, Lebrand A, Huber M, Simmonds P, Claas ECJ, López-Labrador FX. Recommendations for the introduction of metagenomic next-generation sequencing in clinical virology, part II: bioinformatic analysis and reporting. J Clin Virol 2021; 138:104812. [PMID: 33819811 DOI: 10.1016/j.jcv.2021.104812] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2021] [Accepted: 03/20/2021] [Indexed: 12/11/2022]
Abstract
Metagenomic next-generation sequencing (mNGS) is an untargeted technique for determination of microbial DNA/RNA sequences in a variety of sample types from patients with infectious syndromes. mNGS is still in its early stages of broader translation into clinical applications. To further support the development, implementation, optimization and standardization of mNGS procedures for virus diagnostics, the European Society for Clinical Virology (ESCV) Network on Next-Generation Sequencing (ENNGS) has been established. The aim of ENNGS is to bring together professionals involved in mNGS for viral diagnostics to share methodologies and experiences, and to develop application guidelines. Following the ENNGS publication Recommendations for the introduction of mNGS in clinical virology, part I: wet lab procedure in this journal, the current manuscript aims to provide practical recommendations for the bioinformatic analysis of mNGS data and reporting of results to clinicians.
Affiliation(s)
- Jutte J C de Vries
- Clinical Microbiological Laboratory, department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
- Julianne R Brown
- Microbiology, Virology and Infection Prevention & Control, Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom.
- Natacha Couto
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom.
- Martin Beer
- Friedrich-Loeffler-Institute, Institute of Diagnostic Virology, Greifswald, Germany.
- Igor Sidorov
- Clinical Microbiological Laboratory, department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
- Anna Papa
- Department of Microbiology, Medical School, Aristotle University of Thessaloniki, Greece.
- Nicole Fischer
- University Medical Center Hamburg-Eppendorf, UKE Institute for Medical Microbiology, Virology and Hygiene, Germany.
- Christophe Rodriquez
- Department of Virology, University hospital Henri Mondor, Assistance Public des Hopitaux de Paris, Créteil, France.
- Maryam Zaheri
- Institute of Medical Virology, University of Zurich, Switzerland.
- Arzu Sayiner
- Dokuz Eylul University, Medical Faculty, Department of Medical Microbiology, Izmir, Turkey.
- Mario Hönemann
- Institute of Virology, Leipzig University, Leipzig, Germany.
- Alba Perez Cataluna
- Department of Preservation and Food Safety Technologies, IATA-CSIC, Paterna, Valencia, Spain.
- Ellen C Carbo
- Clinical Microbiological Laboratory, department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
- Jakub Kubacki
- Institute of Virology, University of Zurich, Switzerland.
- Dennis Schmitz
- RIVM National Institute for Public Health and Environment, Bilthoven, the Netherlands.
- Katerina Tsioka
- Department of Microbiology, Medical School, Aristotle University of Thessaloniki, Greece.
- Sébastien Matamoros
- Medical Microbiology and Infection Control, Amsterdam UMC, Amsterdam, the Netherlands.
- Dirk Höper
- Friedrich-Loeffler-Institute, Institute of Diagnostic Virology, Greifswald, Germany.
- Marta Hernandez
- Laboratory of Molecular Biology and Microbiology, Instituto Tecnologico Agrario de Castilla y Leon, Valladolid, Spain.
- Michael Huber
- Institute of Medical Virology, University of Zurich, Switzerland.
- Peter Simmonds
- Nuffield Department of Medicine, University of Oxford, Oxford, UK.
- Eric C J Claas
- Clinical Microbiological Laboratory, department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
- F Xavier López-Labrador
- Virology Laboratory, Genomics and Health Area, Centre for Public Health Research (FISABIO-Public Health), Valencia, Spain; Department of Microbiology, Medical School, University of Valencia, Spain; CIBERESP, Instituto de Salud Carlos III, Madrid, Spain.
|
31
|
Giraud D, Lima O, Rousseau-Gueutin M, Salmon A, Aïnouche M. Gene and Transposable Element Expression Evolution Following Recent and Past Polyploidy Events in Spartina (Poaceae). Front Genet 2021; 12:589160. [PMID: 33841492 PMCID: PMC8027259 DOI: 10.3389/fgene.2021.589160] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/30/2020] [Accepted: 02/23/2021] [Indexed: 12/18/2022] Open
Abstract
Gene expression dynamics is a key component of polyploid evolution, varying in nature, intensity, and temporal scales, most particularly in allopolyploids, where two or more sub-genomes from differentiated parental species and different repeat contents are merged. Here, we investigated transcriptome evolution at different evolutionary time scales among tetraploid, hexaploid, and neododecaploid Spartina species (Poaceae, Chloridoideae) that successively diverged in the last 6-10 my, at the origin of differential phenotypic and ecological traits. Of particular interest are the recent (19th century) hybridizations between the two hexaploids Spartina alterniflora (2n = 6x = 62) and S. maritima (2n = 6x = 60) that resulted in two sterile F1 hybrids: Spartina × townsendii (2n = 6x = 62) in England and Spartina × neyrautii (2n = 6x = 62) in France. Whole genome duplication of S. × townsendii gave rise to the invasive neo-allododecaploid species Spartina anglica (2n = 12x = 124). New transcriptome assemblies and annotations for tetraploids and the enrichment of previously published reference transcriptomes for hexaploids and the allododecaploid allowed us to identify 42,423 clusters of orthologs and to distinguish 21 transcribed transposable element (TE) lineages across the seven investigated Spartina species. In 4x and 6x mesopolyploids, gene and TE expression changes were consistent with phylogenetic relationships and divergence, revealing weak expression differences in the tetraploid sister species Spartina bakeri and Spartina versicolor (<2 my divergence time) compared to marked transcriptome divergence between the hexaploids S. alterniflora and S. maritima that diverged 2-4 mya. Differentially expressed genes were involved in glycolysis, post-transcriptional protein modifications, epidermis development, and carotenoid biosynthesis.
Most detected TE lineages (except SINE elements) were more highly expressed in hexaploids than in tetraploids, in line with their abundance in the corresponding genomes. Comparatively, astonishing (52%) expression repatterning and deviation from parental additivity were observed following recent reticulate evolution (involving the F1 hybrids and the neo-allododecaploid S. anglica), with various patterns of biased homoeologous gene expression, including genes involved in epigenetic regulation. Downregulation of TEs was observed in both hybrids and accentuated in the neo-allopolyploid. Our results reinforce the view that allopolyploidy is a springboard to new regulatory patterns, offering worldwide invasive species such as S. anglica the opportunity to colonize stressful and fluctuating environments on saltmarshes.
Affiliation(s)
- Delphine Giraud
- UMR CNRS 6553 Ecosystèmes, Biodiversité, Evolution (ECOBIO), Université de Rennes 1, Rennes, France
- Oscar Lima
- UMR CNRS 6553 Ecosystèmes, Biodiversité, Evolution (ECOBIO), Université de Rennes 1, Rennes, France
- Armel Salmon
- UMR CNRS 6553 Ecosystèmes, Biodiversité, Evolution (ECOBIO), Université de Rennes 1, Rennes, France
- Malika Aïnouche
- UMR CNRS 6553 Ecosystèmes, Biodiversité, Evolution (ECOBIO), Université de Rennes 1, Rennes, France
|
32
|
A comprehensive analysis of tumor microenvironment-related genes in colon cancer. Clin Transl Oncol 2021; 23:1769-1781. [PMID: 33689097 DOI: 10.1007/s12094-021-02578-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/12/2021] [Accepted: 02/23/2021] [Indexed: 12/19/2022]
Abstract
BACKGROUND The development and progression of colon cancer are significantly affected by the tumor microenvironment, which has attracted much attention. The primary goal of our study was to identify tumor microenvironment-related genes in colon cancer. METHOD This study quantified the immune and stromal landscape by applying the ESTIMATE algorithm to the gene expression matrix obtained from the UCSC Xena database. Dysregulated genes were identified using the limma R package, and relevant pathways and biofunctions were identified using enrichment analysis. A least absolute shrinkage and selection operator (LASSO) regression was used to select the pivotal genes from the differentially expressed genes (DEGs). Then, survival analysis was performed to determine the hub genes and a prognostic model was constructed by these hub genes with (or) TNM stage. In addition, associations between hub gene expression and immune cell infiltration were assessed. RESULTS A total of 725 DEGs were identified. Most of the enrichment analysis results were immune-related terms. Thirteen genes were selected as the hub genes, and a moderate-to-strong positive correlation between most hub genes and several immune cell types was observed. In addition, the prognostic value of the hub genes was comparable to that of TNM staging. CONCLUSIONS Our study provides a better understanding of how interactions between the 13 immune-prognostic hub genes and immune cells in the tumor microenvironment affect biological processes in colon cancer. These genes predict prognosis about as well as TNM staging. They are expected to become novel prognostic biomarkers and targets of immunotherapies for colon cancer.
|
33
|
Lim DK, Rashid NU, Ibrahim JG. Model-Based Feature Selection and Clustering of RNA-Seq Data for Unsupervised Subtype Discovery. Ann Appl Stat 2021; 15:481-508. [PMID: 34457104 PMCID: PMC8386505 DOI: 10.1214/20-aoas1407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Clustering is a form of unsupervised learning that aims to uncover latent groups within data based on similarity across a set of features. A common application of this in biomedical research is in delineating novel cancer subtypes from patient gene expression data, given a set of informative genes. However, it is typically unknown a priori which genes may be informative in discriminating between clusters, and what the optimal number of clusters is. Few methods exist for performing unsupervised clustering of RNA-seq samples, and none currently adjust for between-sample global normalization factors, select cluster-discriminatory genes, or account for potential confounding variables during clustering. To address these issues, we propose the Feature Selection and Clustering of RNA-seq (FSCseq): a model-based clustering algorithm that utilizes a finite mixture of regression (FMR) model and the quadratic penalty method with a Smoothly-Clipped Absolute Deviation (SCAD) penalty. The maximization is done by a penalized Classification EM algorithm, allowing us to include normalization factors and confounders in our modeling framework. Given the fitted model, our framework allows for subtype prediction in new patients via posterior probabilities of cluster membership, even in the presence of batch effects. Based on simulations and real data analysis, we show the advantages of our method relative to competing approaches.
Affiliation(s)
- David K Lim
- University of North Carolina at Chapel Hill, NC, USA
- Naim U Rashid
- University of North Carolina at Chapel Hill, NC, USA
|
34
|
Systematic comparison and assessment of RNA-seq procedures for gene expression quantitative analysis. Sci Rep 2020; 10:19737. [PMID: 33184454 PMCID: PMC7665074 DOI: 10.1038/s41598-020-76881-x] [Citation(s) in RCA: 85] [Impact Index Per Article: 21.3] [Received: 05/15/2020] [Accepted: 11/03/2020] [Indexed: 01/16/2023] Open
Abstract
RNA-seq is currently considered the most powerful, robust and adaptable technique for measuring gene expression and transcription activation at genome-wide level. As the analysis of RNA-seq data is complex, it has prompted a large amount of research on algorithms and methods. This has resulted in a substantial increase in the number of options available at each step of the analysis. Consequently, there is no clear consensus about the most appropriate algorithms and pipelines that should be used to analyse RNA-seq data. In the present study, 192 pipelines using alternative methods were applied to 18 samples from two human cell lines and the performance of the results was evaluated. Raw gene expression signal was quantified by non-parametric statistics to measure precision and accuracy. Differential gene expression performance was estimated by testing 17 differential expression methods. The procedures were validated by qRT-PCR in the same samples. This study weighs up the advantages and disadvantages of the tested algorithms and pipelines providing a comprehensive guide to the different methods and procedures applied to the analysis of RNA-seq data, both for the quantification of the raw expression signal and for the differential gene expression.
|
35
|
Elolimy AA, Washam C, Byrum S, Chen C, Dawson H, Bowlin AK, Randolph CE, Saraf MK, Yeruva L. Formula Diet Alters the Ileal Metagenome and Transcriptome at Weaning and during the Postweaning Period in a Porcine Model. mSystems 2020; 5:e00457-20. [PMID: 32753508 PMCID: PMC7406227 DOI: 10.1128/msystems.00457-20] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/22/2020] [Accepted: 07/21/2020] [Indexed: 01/05/2023] Open
Abstract
Exclusive breastfeeding impacts the intestinal microbiome and is associated with better immune function than is seen with milk formula (MF) feeding in infants, yet the mechanisms are poorly defined. The porcine model was used to evaluate the impact of MF on ileal microbial communities and gene expression relative to human milk (HM)-fed piglets. Fifty-two Dutch Landrace male piglets were fed an isocaloric diet of either HM (n = 26) or MF (n = 26) from day 2 through day 21 of age and weaned to a solid diet until day 51. Eleven piglets from each group were euthanized at day 21, while the remaining piglets (HM, n = 15; MF, n = 15) were euthanized at day 51 to collect ileal epithelium (EP) scrapings and ileal (IL) tissues. The epithelial mucosa was subjected to shotgun metagenome sequencing, and EP and IL tissues were used for transcriptome analysis. On day 21, transcriptome data revealed that the levels of pathways involved in inflammation and apoptosis were significantly higher in MF piglets than in HM piglets, whereas the levels of tight junctions and pathogen detection systems were lower in MF piglets than in HM piglets. The MF impacts on the small intestine were maintained over the postweaning period (day 51) as indicated by higher levels of Dialister invisus bacteria and higher levels of expression of genes associated with inflammation and apoptosis pathways relative to the HM group. The current study demonstrated that MF might impact local intestinal inflammation, apoptosis, and tight junctions and might suppress pathogen recognition in the small intestine compared with HM. IMPORTANCE Exclusive human milk (HM) breastfeeding for the first 6 months of age in infants is recommended to improve health outcomes during early life and beyond. When women are unable to provide sufficient HM, milk formula (MF) is often recommended as a complementary or alternative source of nutrition.
Previous studies in piglets demonstrated that MF alters the gut microbiome and induces inflammatory cytokine production. The links between MF feeding, gut microbiome, and inflammation status are unclear due to challenges associated with the collection of intestinal samples from human infants. The current report provides the first insight into MF-microbiome-inflammation connections in the small intestine compared with HM feeding using a porcine model. The present results showed that, compared with HM, MF might impact immune function through the induction of ileal inflammation, apoptosis, and tight junction disruption, and likely compromised immune defense and pathogen detection in the small intestine.
Affiliation(s)
- Ahmed A Elolimy
- Arkansas Children's Nutrition Center, Little Rock, Arkansas, USA
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Charity Washam
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Stephanie Byrum
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Celine Chen
- Diet, Genomics & Immunology Laboratory, USDA-ARS Beltsville Human Nutrition Research Center, Beltsville, Maryland, USA
- Harry Dawson
- Diet, Genomics & Immunology Laboratory, USDA-ARS Beltsville Human Nutrition Research Center, Beltsville, Maryland, USA
- Anne K Bowlin
- Arkansas Children's Nutrition Center, Little Rock, Arkansas, USA
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Manish K Saraf
- Arkansas Children's Nutrition Center, Little Rock, Arkansas, USA
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Laxmi Yeruva
- Arkansas Children's Nutrition Center, Little Rock, Arkansas, USA
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
- Arkansas Children's Research Institute, Little Rock, Arkansas, USA
|
36
|
Zhao S, Ye Z, Stanton R. Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols. RNA 2020; 26:903-909. [PMID: 32284352 PMCID: PMC7373998 DOI: 10.1261/rna.074922.120] [Citation(s) in RCA: 191] [Impact Index Per Article: 47.8] [Indexed: 05/09/2023]
Abstract
In recent years, RNA-sequencing (RNA-seq) has emerged as a powerful technology for transcriptome profiling. For a given gene, the number of mapped reads is not only dependent on its expression level and gene length, but also the sequencing depth. To normalize these dependencies, RPKM (reads per kilobase of transcript per million reads mapped) and TPM (transcripts per million) are used to measure gene or transcript expression levels. A common misconception is that RPKM and TPM values are already normalized, and thus should be comparable across samples or RNA-seq projects. However, RPKM and TPM represent the relative abundance of a transcript among a population of sequenced transcripts, and therefore depend on the composition of the RNA population in a sample. Quite often, it is reasonable to assume that total RNA concentration and distributions are very close across compared samples. Nevertheless, the sequenced RNA repertoires may differ significantly under different experimental conditions and/or across sequencing protocols; thus, the proportion of gene expression is not directly comparable in such cases. In this review, we illustrate typical scenarios in which RPKM and TPM are misused, unintentionally, and hope to raise scientists' awareness of this issue when comparing them across samples or different sequencing protocols.
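The composition dependence described in this abstract can be illustrated with a small worked example. The sketch below uses the standard definitions of RPKM and TPM; the gene lengths and read counts are made-up numbers chosen for illustration and are not taken from the review.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts)
    return [c * 1e9 / (total_reads * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


# Hypothetical three-gene samples (illustrative values only).
lengths = [1000, 2000, 1000]   # gene lengths in base pairs
sample_a = [100, 200, 100]     # read counts per gene
sample_b = [100, 200, 1700]    # gene 3 strongly induced, genes 1 and 2 unchanged

tpm_a = tpm(sample_a, lengths)
tpm_b = tpm(sample_b, lengths)

# Gene 1 has the same read count in both samples, yet its TPM differs
# because gene 3 takes up a much larger share of the pool in sample B.
print("gene 1 TPM, sample A:", round(tpm_a[0], 1))
print("gene 1 TPM, sample B:", round(tpm_b[0], 1))
```

In this toy case gene 1's TPM falls from about 333,333 to about 52,632 even though its read count is identical in both samples, which is the kind of shift in relative abundance the authors warn about when the RNA repertoire differs between samples.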
Affiliation(s)
- Shanrong Zhao
- Integrative Biology Center of Excellence, Pfizer Worldwide Research and Development, Cambridge, Massachusetts 02139, USA
- Zhan Ye
- Early Clinical Development, Pfizer Worldwide Research and Development, Cambridge, Massachusetts 02139, USA
- Robert Stanton
- Integrative Biology Center of Excellence, Pfizer Worldwide Research and Development, Cambridge, Massachusetts 02139, USA
|
37
|
Richard M, Decamps C, Chuffart F, Brambilla E, Rousseaux S, Khochbin S, Jost D. PenDA, a rank-based method for personalized differential analysis: Application to lung cancer. PLoS Comput Biol 2020; 16:e1007869. [PMID: 32392248 PMCID: PMC7274464 DOI: 10.1371/journal.pcbi.1007869] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/22/2020] [Revised: 06/05/2020] [Accepted: 04/11/2020] [Indexed: 12/27/2022] Open
Abstract
The hopes of precision medicine rely on our capacity to measure a patient's high-throughput genomic information and to integrate it for personalized diagnosis and adapted treatment. Reaching these ambitious objectives will require the development of efficient tools for the detection of molecular defects at the individual level. Here, we propose a novel method, PenDA, to perform Personalized Differential Analysis at the scale of a single sample. PenDA is based on the local ordering of gene expressions within individual cases and infers the deregulation status of genes in a sample of interest compared to a reference dataset. Based on realistic simulations of RNA-seq data of tumors, we showed that PenDA outperforms existing approaches with very high specificity and sensitivity and is robust to normalization effects. Applying the method to lung cancer cohorts, we observed that deregulated genes in tumors exhibit a cancer-type-specific commitment towards up- or down-regulation. Based on the individual information of deregulation given by PenDA, we were able to define two new molecular histologies for lung adenocarcinomas that are strongly correlated with survival. In particular, we identified 37 biomarkers whose up-regulation leads to a poor prognosis, which we validated on two independent cohorts. PenDA provides a robust, generic tool to extract personalized deregulation patterns that can then be used for the discovery of therapeutic targets and for personalized diagnosis. An open-access, user-friendly R package is available at https://github.com/bcm-uga/penda.
The hopes of precision medicine rely on our capacity to measure individual molecular information for personalized diagnosis and treatment. These challenging prospects will only be possible with the development of efficient methodological tools to identify patient-specific molecular defects from the wealth of precise molecular information accessible at the single-individual, single-tissue or even single-cell levels. Such methods will provide a better understanding of disease-specific biological mechanisms and will promote the development of personalized therapeutic strategies. Here we describe a novel method, named PenDA, to perform differential analysis of gene expression at the individual level. Based on a realistic benchmark of simulated tumors, we demonstrated that PenDA reaches very high efficiency in detecting sample-specific deregulated genes. We then applied the method to two large cohorts associated with lung cancer. A detailed statistical analysis of the results allowed us to isolate genes with specific deregulation patterns, such as genes that are up-regulated in all tumors or genes that are expressed but never deregulated in any tumor. Given their specificities, these genes are likely to be of interest in therapeutic research. In particular, we were able to identify 37 new biomarkers associated with poor prognosis that we validated on two independent cohorts.
Affiliation(s)
- Magali Richard
- Univ Grenoble Alpes, CNRS, Grenoble INP, TIMC-IMAG, Grenoble, France
- * E-mail: (MR); (DJ)
- Florent Chuffart
- CNRS UMR 5309, Inserm U1209, Univ Grenoble Alpes, Institute for Advanced Biosciences, Grenoble, France
- Elisabeth Brambilla
- CHUGA, Inserm U1209, Univ Grenoble Alpes, Institute for Advanced Biosciences, Grenoble, France
- Sophie Rousseaux
- CNRS UMR 5309, Inserm U1209, Univ Grenoble Alpes, Institute for Advanced Biosciences, Grenoble, France
- Saadi Khochbin
- CNRS UMR 5309, Inserm U1209, Univ Grenoble Alpes, Institute for Advanced Biosciences, Grenoble, France
- Daniel Jost
- Univ Grenoble Alpes, CNRS, Grenoble INP, TIMC-IMAG, Grenoble, France
- University of Lyon, ENS de Lyon, Univ Claude Bernard, CNRS, Laboratory of Biology and Modelling of the Cell, Lyon, France
- * E-mail: (MR); (DJ)
|
38
|
Wang B. A Zipf-plot based normalization method for high-throughput RNA-seq data. PLoS One 2020; 15:e0230594. [PMID: 32271772 PMCID: PMC7144957 DOI: 10.1371/journal.pone.0230594] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/26/2019] [Accepted: 03/03/2020] [Indexed: 12/02/2022] Open
Abstract
Normalization is crucial in RNA-seq data analyses. Due to the existence of excessive zeros and a large number of small measures, it is challenging to find reliable linear rescaling normalization parameters. We propose a Zipf plot based normalization method (ZN) assuming that all gene profiles have similar upper tail behaviors in their expression distributions. The new normalization method uses global information of all genes in the same profile without gene-level expression alteration. It doesn’t require the majority of genes to be not differentially expressed (DE), and can be applied to data where the majority of genes are weakly or not expressed. Two normalization schemes are implemented with ZN: a linear rescaling scheme and a non-linear transformation scheme. The linear rescaling scheme can be applied alone or together with the non-linear normalization scheme. The performance of ZN is benchmarked against five popular linear normalization methods for RNA-seq data. Results show that the linear rescaling normalization scheme by itself works well and is robust. The non-linear normalization scheme can further improve the normalization outcomes and is optional if the Zipf plots show parallel patterns.
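The remark that the non-linear scheme is optional when the Zipf plots "show parallel patterns" follows from a simple property: on a log-log rank plot, multiplying a profile by a constant shifts the curve vertically without changing its shape. The sketch below only illustrates that property with made-up numbers; it is not the ZN algorithm, and `zipf_points` is a hypothetical helper name.

```python
import math


def zipf_points(expr):
    """Return (log10 rank, log10 value) pairs for positive values, largest first."""
    positive = sorted((v for v in expr if v > 0), reverse=True)
    return [(math.log10(rank), math.log10(v)) for rank, v in enumerate(positive, start=1)]


# Hypothetical expression profile with zeros and a heavy upper tail.
profile = [0, 0, 5, 40, 300, 2500, 0, 12]
# The same profile as it would look at three times the sequencing depth.
scaled = [3 * v for v in profile]

base = zipf_points(profile)
shifted = zipf_points(scaled)

# The vertical gap between the two curves is constant and equals log10(3),
# so a single linear rescaling factor is enough to align them.
offsets = [s[1] - b[1] for b, s in zip(base, shifted)]
print([round(o, 4) for o in offsets])
```

When two profiles differ only by such a constant offset in their upper tails, dividing one of them by the corresponding factor superimposes the curves; non-parallel curves are the case where a non-linear transformation would be needed.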
Affiliation(s)
- Bin Wang
- Department of Mathematics and Statistics, University of South Alabama, Mobile, AL, United States of America
|
39
|
Hu G, Grover CE, Arick MA, Liu M, Peterson DG, Wendel JF. Homoeologous gene expression and co-expression network analyses and evolutionary inference in allopolyploids. Brief Bioinform 2020; 22:1819-1835. [PMID: 32219306 PMCID: PMC7986634 DOI: 10.1093/bib/bbaa035] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/17/2019] [Revised: 02/06/2020] [Accepted: 02/24/2020] [Indexed: 12/29/2022] Open
Abstract
Polyploidy is a widespread phenomenon throughout eukaryotes. Due to the coexistence of duplicated genomes, polyploids offer unique challenges for estimating gene expression levels, which is essential for understanding the massive and various forms of transcriptomic responses accompanying polyploidy. Although previous studies have explored the bioinformatics of polyploid transcriptomic profiling, the causes and consequences of inaccurate quantification of transcripts from duplicated gene copies have not been addressed. Using transcriptomic data from the cotton genus (Gossypium) as an example, we present an analytical workflow to evaluate a variety of bioinformatic method choices at different stages of RNA-seq analysis, from homoeolog expression quantification to downstream analysis used to infer key phenomena of polyploid expression evolution. In general, EAGLE-RC and GSNAP-PolyCat outperform other quantification pipelines tested, and their derived expression dataset best represents the expected homoeolog expression and co-expression divergence. The performance of co-expression network analysis was less affected by homoeolog quantification than by network construction methods, where weighted networks outperformed binary networks. By examining the extent and consequences of homoeolog read ambiguity, we illuminate the potential artifacts that may affect our understanding of duplicate gene expression, including an overestimation of homoeolog co-regulation and the incorrect inference of subgenome asymmetry in network topology. Taken together, our work points to a set of reasonable practices that we hope are broadly applicable to the evolutionary exploration of polyploids.
Affiliation(s)
- Guanjing Hu
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Corrinne E Grover
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Mark A Arick
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Meiling Liu
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Daniel G Peterson
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Jonathan F Wendel
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
|
40
|
Patient-Tailored Radiation Therapy for Rectal Cancer: The Devil Is in the Details. Dis Colon Rectum 2020; 63:265-266. [PMID: 32032138 DOI: 10.1097/dcr.0000000000001567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]
|
41
|
Yang S, Wachtel MS, Wu J. DFseq: Distribution-Free Method to Detect Differential Gene Expression for RNA-Sequencing Data. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:558-565. [PMID: 30176602 DOI: 10.1109/tcbb.2018.2866994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Many current RNA-sequencing data analysis methods compare expression one gene at a time, taking little account of the correlations among genes. In this study, we propose a method to convert such a one-dimensional comparison approach into a two-dimensional evaluation of the ratio of standard deviations (SD) of two constructed random variables. This method allows the identification of differentially expressed genes while controlling a preset significance level conditional on the read count mean-variance relationship. Meanwhile, correlations among genes are naturally accommodated due to the clustering of genes with similar distributions in the proposed σ-σ plot. The proposed distribution-free method is designated as DFseq, because it does not depend on a parametric distribution to fit read counts. As a result, compared with parametric methods, DFseq can effectively handle genes with a bimodal-like distribution and/or genes with excessive zero read counts, as well as genes with outlying observations. In addition, DFseq is an ideal platform for comparing the performance of different differential gene expression detection methods.
42
Li X, Cooper NGF, O'Toole TE, Rouchka EC. Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies. BMC Genomics 2020; 21:75. [PMID: 31992223 PMCID: PMC6986029 DOI: 10.1186/s12864-020-6502-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 06/14/2019] [Accepted: 01/16/2020] [Indexed: 12/20/2022] Open
Abstract
Background High-throughput RNA sequencing (RNA-seq) has evolved as an important analytical tool in molecular biology. Although the utility and importance of this technique have grown, uncertainties regarding the proper analysis of RNA-seq data remain. Of primary concern, there is no consensus regarding which normalization and statistical methods are the most appropriate for analyzing this data. The lack of standardized analytical methods leads to uncertainties in data interpretation and study reproducibility, especially with studies reporting high false discovery rates. In this study, we compared a recently developed normalization method, UQ-pgQ2, with three of the most frequently used alternatives including RLE (relative log estimate), TMM (Trimmed-mean M values) and UQ (upper quartile normalization) in the analysis of RNA-seq data. We evaluated the performance of these methods for gene-level differential expression analysis by considering the factors, including: 1) normalization combined with the choice of a Wald test from DESeq2 and an exact test/QL (Quasi-likelihood) F-Test from edgeR; 2) sample sizes in two balanced two-group comparisons; and 3) sequencing read depths. Results Using the MAQC RNA-seq datasets with small sample replicates, we found that UQ-pgQ2 normalization combined with an exact test can achieve better performance in terms of power and specificity in differential gene expression analysis. However, using an intra-group analysis of false positives from real and simulated data, we found that a Wald test performs better than an exact test when the number of sample replicates is large and that a QL F-test performs the best given sample sizes of 5, 10 and 15 for any normalization. The RLE, TMM and UQ methods performed similarly given a desired sample size. Conclusion We found the UQ-pgQ2 method combined with an exact test/QL F-test is the best choice in order to control false positives when the sample size is small. When the sample size is large, UQ-pgQ2 with a QL F-test is a better choice for the type I error control in an intra-group analysis. We observed read depths have a minimal impact for differential gene expression analysis based on the simulated data.
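As an illustration of one of the library size normalization methods compared in this abstract, the following is a minimal sketch of the median-of-ratios idea commonly associated with RLE. It is not taken from the paper: the function name `rle_size_factors` and the toy count matrix are hypothetical, and real implementations in DESeq2 and edgeR differ in details such as rescaling of the factors.

```python
import math
from statistics import median


def rle_size_factors(count_matrix):
    """Median-of-ratios size factors (illustrative sketch only).

    count_matrix: list of genes, each a list of raw counts per sample.
    Returns one scaling factor per sample.
    """
    n_samples = len(count_matrix[0])

    # Keep only genes with a nonzero count in every sample, so that the
    # per-gene geometric mean across samples is well defined.
    usable = [row for row in count_matrix if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")

    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]

    # For each sample, take the median log-ratio of its counts to the
    # per-gene geometric means, then exponentiate.
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical toy data: sample 2 was sequenced at twice the depth of sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
size_factors = rle_size_factors(toy_counts)
```

Dividing each sample's counts by its factor puts the samples on a comparable scale; in the toy data the second factor is twice the first, reflecting the doubled depth.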
Affiliation(s)
- Xiaohong Li
- Department of Anatomical Sciences and Neurobiology, University of Louisville, Louisville, KY, USA
- Nigel G F Cooper
- Department of Anatomical Sciences and Neurobiology, University of Louisville, Louisville, KY, USA
- Eric C Rouchka
- Department of Computer Science and Engineering, University of Louisville, Louisville, KY, USA
43
Genome-Wide Analysis of Cyclophilin Proteins in 21 Oomycetes. Pathogens 2019; 9:pathogens9010024. [PMID: 31888032 PMCID: PMC7168621 DOI: 10.3390/pathogens9010024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/26/2019] [Revised: 12/12/2019] [Accepted: 12/20/2019] [Indexed: 12/20/2022] Open
Abstract
Cyclophilins (CYPs), a highly-conserved family of proteins, belong to a subgroup of immunophilins. Ubiquitous in eukaryotes and prokaryotes, CYPs have peptidyl-prolyl cis–trans isomerase (PPIase) activity and have been implicated as virulence factors in plant pathogenesis by oomycetes. We identified 16 CYP orthogroups from 21 diverse oomycetes. Each species was found to encode 15 to 35 CYP genes. Three of these orthogroups contained proteins with signal peptides at the N-terminal end, suggesting a role in secretion. Multidomain analysis revealed five conserved motifs of the CYP domain of oomycetes shared with other eukaryotic PPIases. Expression analysis of CYP proteins in different asexual life stages of the hemibiotrophic Phytophthora infestans and the biotrophic Plasmopara halstedii demonstrated distinct expression profiles between life stages. In addition to providing detailed comparative information on the CYPs in multiple oomycetes, this study identified candidate CYP effectors that could be the foundation for future studies of virulence.
44
Abrams ZB, Johnson TS, Huang K, Payne PRO, Coombes K. A protocol to evaluate RNA sequencing normalization methods. BMC Bioinformatics 2019; 20:679. [PMID: 31861985 PMCID: PMC6923842 DOI: 10.1186/s12859-019-3247-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND RNA sequencing technologies have allowed researchers to gain a better understanding of how the transcriptome affects disease. However, sequencing technologies often unintentionally introduce experimental error into RNA sequencing data. To counteract this, normalization methods are standardly applied with the intent of reducing the non-biologically derived variability inherent in transcriptomic measurements. However, the comparative efficacy of the various normalization techniques has not been tested in a standardized manner. Here we propose tests that evaluate numerous normalization techniques and apply them to a large-scale standard data set. These tests comprise a protocol that allows researchers to measure the amount of non-biological variability present in any data set after normalization has been performed, a crucial step in assessing the biological validity of data following normalization. RESULTS In this study we present two tests to assess the validity of normalization methods applied to a large-scale data set collected for systematic evaluation purposes. We tested various RNASeq normalization procedures and concluded that transcripts per million (TPM) was the best performing normalization method based on its preservation of biological signal as compared to the other methods tested. CONCLUSION Normalization is of vital importance to accurately interpret the results of genomic and transcriptomic experiments. More work, however, needs to be performed to optimize normalization methods for RNASeq data. The present effort helps pave the way for more systematic evaluations of normalization methods across different platforms. With our proposed schema researchers can evaluate their own or future normalization methods to further improve the field of RNASeq normalization.
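For readers unfamiliar with the transcripts per million (TPM) measure named in this abstract, the standard definition can be sketched as follows. This is not code from the paper; the function name `tpm` and the toy counts and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Convert one sample's raw read counts to transcripts per million.

    counts: raw read counts per gene or transcript.
    lengths_bp: matching feature lengths in base pairs (assumed > 0).
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")

    # Step 1: normalise each count by feature length (reads per kilobase).
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]

    # Step 2: rescale so that the values in the sample sum to one million.
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(counts)
    return [r / total * 1e6 for r in rpk]


# Hypothetical toy sample: counts scale exactly with length, so all
# three features end up with the same TPM value.
toy_tpm = tpm([100, 200, 300], [1000, 2000, 3000])
```

Because the values in each sample sum to the same total, TPM values express relative abundance within a sample rather than absolute expression.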
Affiliation(s)
- Zachary B Abrams
- Department of Biomedical Informatics, Ohio State University, 250 Lincoln Tower, 1800 Cannon Dr., Columbus, OH, 43210, USA
- Travis S Johnson
- Department of Biomedical Informatics, Ohio State University, 250 Lincoln Tower, 1800 Cannon Dr., Columbus, OH, 43210, USA
- Department of Medicine, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN, 46202, USA
- Kun Huang
- Department of Medicine, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN, 46202, USA
- Regenstrief Institute, Indiana University, 1101 West 10th Street, Indianapolis, IN, 46262, USA
- Philip R O Payne
- Department of Biomedical Informatics, Washington University, 4444 Forest Park Ave, Suite 6318 Campus Box 8102, St. Louis, MO, 63108-2212, USA
- Kevin Coombes
- Department of Biomedical Informatics, Ohio State University, 250 Lincoln Tower, 1800 Cannon Dr., Columbus, OH, 43210, USA
45
Jiang S, Cheng SJ, Ren LC, Wang Q, Kang YJ, Ding Y, Hou M, Yang XX, Lin Y, Liang N, Gao G. An expanded landscape of human long noncoding RNA. Nucleic Acids Res 2019; 47:7842-7856. [PMID: 31350901 PMCID: PMC6735957 DOI: 10.1093/nar/gkz621] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 04/28/2019] [Revised: 06/18/2019] [Accepted: 07/11/2019] [Indexed: 12/21/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) are emerging as key regulators of multiple essential biological processes involved in physiology and pathology. By analyzing the largest compendium of 14,166 samples from normal and tumor tissues, we significantly expand the landscape of human long noncoding RNA with a high-quality atlas: RefLnc (Reference catalog of LncRNA). Powered by comprehensive annotation across multiple sources, RefLnc helps to pinpoint 275 novel intergenic lncRNAs correlated with sex, age or race as well as 369 novel ones associated with patient survival, clinical stage, tumor metastasis or recurrence. Integrated in a user-friendly online portal, the expanded catalog of human lncRNAs provides a valuable resource for investigating lncRNA function in both human biology and cancer development.
Affiliation(s)
- Shuai Jiang
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Si-Jin Cheng
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Li-Chen Ren
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Qian Wang
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Yu-Jian Kang
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Yang Ding
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Mei Hou
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Xiao-Xu Yang
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Yuan Lin
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Nan Liang
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Ge Gao
- Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
46
Liu W, Jacquiod S, Brejnrod A, Russel J, Burmølle M, Sørensen SJ. Deciphering links between bacterial interactions and spatial organization in multispecies biofilms. The ISME Journal 2019; 13:3054-3066. [PMID: 31455806 PMCID: PMC6864094 DOI: 10.1038/s41396-019-0494-9] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 06/12/2019] [Revised: 07/26/2019] [Accepted: 07/29/2019] [Indexed: 01/23/2023]
Abstract
Environmental microbes frequently live in multispecies biofilms where mutualistic relationships and co-evolution may occur, defining spatial organization for member species and overall community functions. In this context, intrinsic properties emerging from microbial interactions, such as efficient organization optimizing growth and activities in multispecies biofilms, may become the object of fitness selection. However, little is known about the nature of the underlying interspecies interactions during establishment of a predictable spatial organization within multispecies biofilms. We present a comparative metatranscriptomic analysis of bacterial strains residing in triple-species and four-species biofilms, aiming at deciphering the molecular mechanisms underpinning the bacterial interactions responsible for the remarkably enhanced biomass production and associated typical spatial organization they display. Metatranscriptomic profiles concurred with changes in micro-site occupation in response to the addition/removal of a single species, being driven by cooperation, competition, and facilitation processes. We conclude that the enhanced biomass production of the four-species biofilm is an intrinsic community property emerging from finely tuned space optimization achieved through concerted antagonistic and mutualistic interactions, where each species occupies a defined micro-site favoring its own growth. Our results further illustrate how molecular mechanisms can be better interpreted when supported by visual imaging of actual microscopic spatial organization, and we propose phenotypic adaptation selected by social interactions as molecular mechanisms stabilizing microbial communities.
Affiliation(s)
- Wenzheng Liu
- School of Food and Pharmaceutical engineering, Nanjing Normal University, Nanjing, China
- Samuel Jacquiod
- Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Agroécologie, AgroSup Dijon, INRA, Univ. Bourgogne Franche-Comté, Franche-Comté, France
- Asker Brejnrod
- Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jakob Russel
- Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Mette Burmølle
- Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Søren J Sørensen
- School of Food and Pharmaceutical engineering, Nanjing Normal University, Nanjing, China
- Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
47
van Rooij J, Mandaviya PR, Claringbould A, Felix JF, van Dongen J, Jansen R, Franke L, 't Hoen PAC, Heijmans B, van Meurs JBJ. Evaluation of commonly used analysis strategies for epigenome- and transcriptome-wide association studies through replication of large-scale population studies. Genome Biol 2019; 20:235. [PMID: 31727104 PMCID: PMC6857161 DOI: 10.1186/s13059-019-1878-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/31/2019] [Accepted: 11/02/2019] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND A large number of analysis strategies are available for DNA methylation (DNAm) array and RNA-seq datasets, but it is unclear which strategies are best to use. We compare commonly used strategies and report how they influence results in large cohort studies. RESULTS We tested the associations of DNAm and RNA expression with age, BMI, and smoking in four different cohorts (n = ~ 2900). By comparing strategies against the base model on the number and percentage of replicated CpGs for DNAm analyses or genes for RNA-seq analyses in a leave-one-out cohort replication approach, we find the choice of the normalization method and statistical test does not strongly influence the results for DNAm array data. However, adjusting for cell counts or hidden confounders substantially decreases the number of replicated CpGs for age and increases the number of replicated CpGs for BMI and smoking. For RNA-seq data, the choice of the normalization method, gene expression inclusion threshold, and statistical test does not strongly influence the results. Including five principal components or excluding correction of technical covariates or cell counts decreases the number of replicated genes. CONCLUSIONS Results were not influenced by the normalization method or statistical test. However, the correction method for cell counts, technical covariates, principal components, and/or hidden confounders does influence the results.
Affiliation(s)
- Jeroen van Rooij
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam, the Netherlands
- Pooja R Mandaviya
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Annique Claringbould
- Faculty of Medical Sciences, University of Groningen, Groningen, the Netherlands
- Janine F Felix
- The Generation R Study Group, Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
- The Generation R Study Group, Department of Pediatrics, Erasmus Medical Center, Rotterdam, the Netherlands
- Jenny van Dongen
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Rick Jansen
- Department of Psychiatry, VU University Medical Center, Amsterdam, the Netherlands
- Lude Franke
- Department of Genetics, University of Groningen, Groningen, the Netherlands
- Peter A C 't Hoen
- Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center Nijmegen, Nijmegen, the Netherlands
- Bas Heijmans
- Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
- Joyce B J van Meurs
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam, the Netherlands
48
Mandelboum S, Manber Z, Elroy-Stein O, Elkon R. Recurrent functional misinterpretation of RNA-seq data caused by sample-specific gene length bias. PLoS Biol 2019; 17:e3000481. [PMID: 31714939 PMCID: PMC6850523 DOI: 10.1371/journal.pbio.3000481] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 06/30/2019] [Accepted: 10/08/2019] [Indexed: 11/19/2022] Open
Abstract
Data normalization is a critical step in RNA sequencing (RNA-seq) analysis, aiming to remove systematic effects from the data to ensure that technical biases have minimal impact on the results. Analyzing numerous RNA-seq datasets, we detected a prevalent sample-specific length effect that leads to a strong association between gene length and fold-change estimates between samples. This stochastic sample-specific effect is not corrected by common normalization methods, including reads per kilobase of transcript length per million reads (RPKM), Trimmed Mean of M values (TMM), relative log expression (RLE), and quantile and upper-quartile normalization. Importantly, we demonstrate that this bias causes recurrent false positive calls by gene-set enrichment analysis (GSEA) methods, thereby leading to frequent functional misinterpretation of the data. Gene sets characterized by markedly short genes (e.g., ribosomal protein genes) or long genes (e.g., extracellular matrix genes) are particularly prone to such false calls. This sample-specific length bias is effectively removed by the conditional quantile normalization (cqn) and EDASeq methods, which allow the integration of gene length as a sample-specific covariate. Consequently, using these normalization methods led to substantial reduction in GSEA false results while retaining true ones. In addition, we found that application of gene-set tests that take into account gene–gene correlations attenuates false positive rates caused by the length bias, but statistical power is reduced as well. Our results advocate the inspection and correction of sample-specific length biases as default steps in RNA-seq analysis pipelines and reiterate the need to account for intergene correlations when performing gene-set enrichment tests to lessen false interpretation of transcriptomic data. 
Analysis of numerous RNA-seq datasets reveals a recurrent sample-specific length bias that causes frequent false positive calls by gene-set enrichment analyses, leading to functional misinterpretation of the data. Its removal requires methods that allow the integration of gene length as sample-specific covariate.
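The length effect described in this abstract, an association between gene length and fold-change estimates, can be screened for with a simple rank correlation. The sketch below is only a diagnostic under that assumption, not the cqn or EDASeq correction and not the authors' procedure; the function names and toy values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def length_bias_spearman(gene_lengths, log2_fold_changes):
    """Spearman correlation between gene length and log2 fold change."""
    return pearson(average_ranks(gene_lengths), average_ranks(log2_fold_changes))


# Hypothetical toy comparison in which fold change rises with gene length.
toy_lengths = [500, 1200, 3000, 8000, 20000]
toy_lfc = [-0.8, -0.3, 0.1, 0.4, 0.9]
rho = length_bias_spearman(toy_lengths, toy_lfc)
```

A correlation far from zero between a pair of samples would suggest inspecting the data for length bias before running gene-set enrichment; what counts as "far" is a judgment call that this sketch does not settle.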
Affiliation(s)
- Shir Mandelboum
- School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Zohar Manber
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Orna Elroy-Stein
- School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- * E-mail: (OE-S); (RE)
- Ran Elkon
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- * E-mail: (OE-S); (RE)
49
Reyes ALP, Silva TC, Coetzee SG, Plummer JT, Davis BD, Chen S, Hazelett DJ, Lawrenson K, Berman BP, Gayther SA, Jones MR. GENAVi: a shiny web application for gene expression normalization, analysis and visualization. BMC Genomics 2019; 20:745. [PMID: 31619158 PMCID: PMC6796420 DOI: 10.1186/s12864-019-6073-7] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 01/10/2019] [Accepted: 08/29/2019] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND The development of next generation sequencing (NGS) methods led to a rapid rise in the generation of large genomic datasets, but the development of user-friendly tools to analyze and visualize these datasets has not kept pace. This presents a two-fold challenge to biologists: the expertise to select an appropriate data analysis pipeline, and the need for bioinformatics or programming skills to apply this pipeline. The development of graphical user interface (GUI) applications hosted on web-based servers such as Shiny can make complex workflows accessible across operating systems and internet browsers to those without programming knowledge. RESULTS We have developed GENAVi (Gene Expression Normalization Analysis and Visualization) to provide a user-friendly interface for normalization and differential expression analysis (DEA) of human or mouse feature count level RNA-Seq data. GENAVi is a GUI based tool that combines Bioconductor packages in a format for scientists without bioinformatics expertise. We provide a panel of 20 cell lines commonly used for the study of breast and ovarian cancer within GENAVi as a foundation for users to bring their own data to the application. Users can visualize expression across samples, cluster samples based on gene expression or correlation, calculate and plot the results of principal components analysis, perform DEA and gene set enrichment and produce plots for each of these analyses. To allow scalability for large datasets we have provided local install via three methods. We improve on available tools by offering a range of normalization methods and a simple-to-use interface that provides clear and complete session reporting for reproducible analysis. CONCLUSION The development of tools using a GUI makes them practical and accessible to scientists without bioinformatics expertise, or access to a data analyst with relevant skills. While several GUI based tools are currently available for RNA-Seq analysis, we improve on these existing tools. This user-friendly application provides a convenient platform for the normalization, analysis and visualization of gene expression data for scientists without bioinformatics expertise.
Affiliation(s)
- Alberto Luiz P Reyes
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Tiago C Silva
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Simon G Coetzee
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Jasmine T Plummer
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Brian D Davis
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Stephanie Chen
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Dennis J Hazelett
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Kate Lawrenson
- Women's Cancer Program, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Benjamin P Berman
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Simon A Gayther
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Michelle R Jones
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
50
Improved cellulase production in recombinant Saccharomyces cerevisiae by disrupting the cell wall protein-encoding gene CWP2. J Biosci Bioeng 2019; 129:165-171. [PMID: 31537451 DOI: 10.1016/j.jbiosc.2019.08.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/25/2019] [Revised: 08/23/2019] [Accepted: 08/23/2019] [Indexed: 12/27/2022]
Abstract
Budding yeast Saccharomyces cerevisiae has been widely used for heterologous protein production. However, low protein production titer and secretion levels continue to challenge its practical applications. The yeast cell wall plays important roles in yeast cell growth and environmental responses. Nevertheless, the effects of yeast cell wall proteins on heterologous protein production and secretion remain unclear. CWP2 encodes a mannoprotein that is the major component of the yeast cell wall. So far, studies on its function have been very limited. Here we show that CWP2 disruption improved extracellular cellobiohydrolase activity by 85.9%. A calcofluor white hypersensitivity assay revealed increased sensitivity of the mutant compared to the parental strain, indicating impaired cell wall integrity. However, no changes were observed in normal cell growth or growth stressed by tunicamycin and dithiothreitol, suggesting that the unfolded protein response pathway was not affected by the gene disruption. Comparative transcriptome analysis revealed changes in multiple genes involved in cell wall structure, biosynthesis, and cell wall integrity induced by CWP2 disruption, suggesting a pivotal role of Cwp2p in yeast cell wall organization. Notably, CWP2 disruption also led to elevated transcription of a large number of genes involved in ribosome biogenesis, which indicated that CWP2 is involved not only in yeast cell wall biosynthesis, but also in protein translation. This work reveals novel insights into the functions of CWP2 and also presents a new strategy to increase heterologous protein production in yeast strains by manipulating cell wall-related proteins.