1
Makino M, Shimizu K, Kadota K. Enhanced clustering-based differential expression analysis method for RNA-seq data. MethodsX 2024; 12:102518. [PMID: 38179066 PMCID: PMC10764243 DOI: 10.1016/j.mex.2023.102518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 12/10/2023] [Indexed: 01/06/2024] Open
Abstract
RNA-seq is a tool for measuring gene expression and is commonly used to identify differentially expressed genes (DEGs). Gene clustering has been widely used to classify DEGs with similar expression patterns, but rarely to identify the DEGs themselves. We recently reported that clustering-based methods for identifying DEGs (called MBCdeg1 and MBCdeg2) have great potential. However, these methods left room for improvement. This study reports an improved version (named MBCdeg3). We compared a total of six competing methods: three conventional R packages (edgeR, DESeq2, and TCC) and three versions of MBCdeg (i.e., MBCdeg1, 2, and 3) corresponding to three different normalization algorithms. Because MBCdeg3 performs well in many simulation scenarios of RNA-seq count data, it replaces the MBCdeg1 and 2 of our previous report.
• MBCdeg3 is a method for both identification and classification of DEGs from RNA-seq count data.
• MBCdeg3 is available as an R function; R is commonly used in the field of expression analysis.
• MBCdeg3 performs well in a variety of simulation scenarios for RNA-seq count data.
Affiliation(s)
- Manon Makino
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo 113-8657, Japan
- Kentaro Shimizu
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo 113-8657, Japan
- Koji Kadota
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo 113-8657, Japan
  - Interfaculty Initiative in Information Studies, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo 113-8657, Japan
2
Liu F, Yang Y, Xu XS, Yuan M. MESBC: A novel mutually exclusive spectral biclustering method for cancer subtyping. Comput Biol Chem 2024; 109:108009. [PMID: 38219419 DOI: 10.1016/j.compbiolchem.2023.108009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 12/22/2023] [Accepted: 12/24/2023] [Indexed: 01/16/2024]
Abstract
Many soft biclustering algorithms have been developed and applied to various biological and biomedical data analyses. However, few mutually exclusive (hard) biclustering algorithms have been proposed, although such algorithms could better identify disease or molecular subtypes with survival significance based on genomic or transcriptomic data. In this study, we developed a novel mutually exclusive spectral biclustering (MESBC) algorithm, based on a spectral method, to detect mutually exclusive biclusters. MESBC simultaneously detects relevant features (genes) and the corresponding condition (patient) subgroups and therefore automatically uses the signature features of each subtype to perform the clustering. Extensive simulations revealed that MESBC detected pre-specified biclusters more accurately than non-negative matrix factorization (NMF) and Dhillon's algorithm, particularly in very noisy data. Further analysis on real datasets obtained from the TCGA database showed that MESBC provided more accurate (i.e., smaller p-value) overall survival prediction in patients with lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) than the existing gold-standard subtypes for lung cancers (integrative clustering). Furthermore, MESBC detected several genes with significant prognostic value in both LUAD and LUSC patients. External validation on an independent, unseen GEO dataset of LUAD showed that MESBC-derived clusters based on TCGA data still exhibited clear biclustering patterns and consistent, outstanding prognostic predictability, demonstrating the robust generalizability of MESBC. Therefore, MESBC could potentially be used as a risk stratification tool to optimize treatment for the patient, improve the selection of patients for clinical trials, and contribute to the development of novel therapeutic agents.
Affiliation(s)
- Fengrong Liu
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Yaning Yang
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
- Min Yuan
  - School of Public Health Administration, Anhui Medical University, Hefei 230032, China
3
Singh V, Kirtipal N, Song B, Lee S. Normalization of RNA-Seq data using adaptive trimmed mean with multi-reference. Brief Bioinform 2024; 25:bbae241. [PMID: 38770720 PMCID: PMC11107385 DOI: 10.1093/bib/bbae241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 04/04/2024] [Accepted: 05/07/2024] [Indexed: 05/22/2024] Open
Abstract
The normalization of RNA sequencing data is a primary step for downstream analysis. The most popular normalization methods are the trimmed mean of M values (TMM) and DESeq. TMM trims away extreme log fold changes in the data and normalizes the raw read counts based on the remaining non-differentially expressed genes. However, a major problem with TMM is that the value of the trimming factor M is heuristic. This paper estimates an adaptive value of M in TMM based on Jaeckel's Estimator, with each sample in turn acting as a reference to find the scale factor of each sample. The presented approach is validated on SEQC, MAQC2, MAQC3, PICKRELL and two simulated datasets with two-group and three-group conditions by varying the percentage of differential expression and the number of replicates. The performance of the present approach is compared with various state-of-the-art methods, and it is better in terms of area under the receiver operating characteristic curve and differential expression.
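The trimming idea described in this abstract can be illustrated with a minimal sketch of conventional TMM. This is not the adaptive method proposed in the paper, and it is not edgeR's implementation: it omits the precision weights used there and applies fixed trim fractions. The function name `tmm_scale_factor` and its parameters are hypothetical, and the defaults of 0.30 and 0.05 are assumed to mirror the usual fixed trimming of M values and A values.

```python
import numpy as np

def tmm_scale_factor(sample, reference, m_trim=0.30, a_trim=0.05):
    """Simplified, unweighted TMM scale factor of `sample` relative to `reference`.

    Assumption: fixed trim fractions and no precision weights, for illustration only.
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Log ratios are only defined for genes with positive counts in both samples.
    keep = (sample > 0) & (reference > 0)
    p_s = sample[keep] / sample.sum()        # proportions of the library
    p_r = reference[keep] / reference.sum()

    m = np.log2(p_s / p_r)                   # M values: log fold changes
    a = 0.5 * np.log2(p_s * p_r)             # A values: mean log abundance

    def inside(x, frac):
        # True for values lying between the lower and upper trimming quantiles.
        lo, hi = np.quantile(x, [frac, 1.0 - frac])
        return (x >= lo) & (x <= hi)

    # Discard extreme fold changes and extreme abundances, then average M.
    kept = inside(m, m_trim) & inside(a, a_trim)
    return float(2.0 ** m[kept].mean())
```

In this sketch, a sample that differs from the reference only in sequencing depth yields a factor of 1, while a sample in which a few genes are strongly upregulated yields a factor below 1, because the remaining genes occupy a smaller share of the library. The fixed `m_trim` is the heuristic that the paper proposes to replace with an adaptively estimated value.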
Affiliation(s)
- Vikas Singh
  - School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea
- Nikhil Kirtipal
  - School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea
- Byeongsop Song
  - School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea
- Sunjae Lee
  - School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea
4
Nießl C, Hoffmann S, Ullmann T, Boulesteix AL. Explaining the optimistic performance evaluation of newly proposed methods: A cross-design validation experiment. Biom J 2024; 66:e2200238. [PMID: 36999395 DOI: 10.1002/bimj.202200238] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/31/2022] [Revised: 01/09/2023] [Accepted: 01/10/2023] [Indexed: 04/01/2023]
Abstract
The constant development of new data analysis methods in many fields of research is accompanied by an increasing awareness that these new methods often perform better in their introductory paper than in subsequent comparison studies conducted by other researchers. We attempt to explain this discrepancy by conducting a systematic experiment that we call "cross-design validation of methods". In the experiment, we select two methods designed for the same data analysis task, reproduce the results shown in each paper, and then reevaluate each method based on the study design (i.e., datasets, competing methods, and evaluation criteria) that was used to show the abilities of the other method. We conduct the experiment for two data analysis tasks, namely cancer subtyping using multiomic data and differential gene expression analysis. Three of the four methods included in the experiment indeed perform worse when they are evaluated on the new study design, which is mainly caused by the different datasets. Apart from illustrating the many degrees of freedom existing in the assessment of a method and their effect on its performance, our experiment suggests that the performance discrepancies between original and subsequent papers may not only be caused by the nonneutrality of the authors proposing the new method but also by differences regarding the level of expertise and field of application. Authors of new methods should thus focus not only on a transparent and extensive evaluation but also on comprehensive method documentation that enables the correct use of their methods in subsequent studies.
Affiliation(s)
- Christina Nießl
  - Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany
  - Munich Center for Machine Learning (MCML), Munich, Germany
- Sabine Hoffmann
  - Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany
  - Department of Statistics, LMU Munich, Munich, Germany
- Theresa Ullmann
  - Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany
- Anne-Laure Boulesteix
  - Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany
5
Stokes T, Cen HH, Kapranov P, Gallagher IJ, Pitsillides AA, Volmar C, Kraus WE, Johnson JD, Phillips SM, Wahlestedt C, Timmons JA. Transcriptomics for Clinical and Experimental Biology Research: Hang on a Seq. Advanced Genetics (Hoboken, N.J.) 2023; 4:2200024. [PMID: 37288167 PMCID: PMC10242409 DOI: 10.1002/ggn2.202200024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Indexed: 06/09/2023]
Abstract
Sequencing the human genome empowers translational medicine, facilitating transcriptome-wide molecular diagnosis, pathway biology, and drug repositioning. Initially, microarrays were used to study the bulk transcriptome, but now short-read RNA sequencing (RNA-seq) predominates. Although RNA-seq is positioned as a superior technology that makes the discovery of novel transcripts routine, most RNA-seq analyses are in fact modeled on the known transcriptome. Limitations of the RNA-seq methodology have emerged, while the design of, and the analysis strategies applied to, arrays have matured. An equitable comparison between these technologies is provided, highlighting advantages that modern arrays hold over RNA-seq. Array protocols more accurately quantify constitutively expressed protein coding genes across tissue replicates, and are more reliable for studying lower expressed genes. Arrays reveal that long noncoding RNAs (lncRNA) are neither sparsely expressed nor expressed at lower levels than protein coding genes. The heterogeneous coverage of constitutively expressed genes observed with RNA-seq undermines the validity and reproducibility of pathway analyses. The factors driving these observations, many of which are relevant to long-read or single-cell sequencing, are discussed. As proposed herein, a reappreciation of bulk transcriptomic methods is required, including wider use of modern high-density array data, to urgently revise existing anatomical RNA reference atlases and assist with more accurate study of lncRNAs.
Affiliation(s)
- Tanner Stokes
  - Faculty of Science, McMaster University, Hamilton L8S 4L8, Canada
- Haoning Howard Cen
  - Life Sciences Institute, University of British Columbia, Vancouver V6T 1Z3, Canada
- Iain J Gallagher
  - School of Applied Sciences, Edinburgh Napier University, Edinburgh EH11 4BN, UK
- James D. Johnson
  - Life Sciences Institute, University of British Columbia, Vancouver V6T 1Z3, Canada
- James A. Timmons
  - Miller School of Medicine, University of Miami, Miami, FL 33136, USA
  - William Harvey Research Institute, Queen Mary University London, London EC1M 6BQ, UK
  - Augur Precision Medicine LTD, Stirling FK9 5NF, UK
6
Guan S, Zhong L, Yu H, Wang L, Jin Y, Liu J, Xiang H, Yu H, Wang L, Wang D. Molecular docking and proteomics reveals the synergistic antibacterial mechanism of theaflavin with β-lactam antibiotics against MRSA. Front Microbiol 2022; 13:993430. [PMID: 36452924 PMCID: PMC9702817 DOI: 10.3389/fmicb.2022.993430] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/13/2022] [Accepted: 10/11/2022] [Indexed: 04/09/2024] Open
Abstract
Recurrent epidemics of methicillin-resistant Staphylococcus aureus (MRSA) have illustrated that the effectiveness of antibiotics in clinical application is rapidly fading. A feasible approach is to combine natural products with existing antibiotics to achieve an antibacterial effect. In this molecular docking study, we found that theaflavin (TF) preferentially binds the allosteric site of penicillin-binding protein 2a (PBP2a) and induces the PBP2a active site to open, which makes it easier for β-lactam antibiotics to act against MRSA infection, rather than directly exerting antibacterial activity at the active site. Subsequent TMT-labeled proteomics analysis showed that TF treatment did not significantly change the landscape of the S. aureus USA300 proteome. Checkerboard dilution tests and kill curve assays were performed to validate the synergistic effect of TF and ceftiofur, and the fractional inhibitory concentration index (FICI) was 0.1875. The antibacterial effect of TF combined with ceftiofur was better than that of single-drug treatment in vitro. In addition, TF effectively enhanced the activity of ceftiofur in a mouse model of MRSA-induced pneumonia. Our findings provide a potential therapeutic strategy that combines existing antibiotics with natural products to resolve the prevalent infections of multidrug-resistant pathogens.
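For reference, the FICI quoted in this abstract is conventionally computed from the minimum inhibitory concentrations (MICs) of each drug alone and in combination, and values of 0.5 or below are commonly read as synergy. The abstract does not report the underlying MICs, so the fractions in the last line are a purely hypothetical decomposition that happens to sum to 0.1875.

```latex
\mathrm{FICI}
  = \frac{\mathrm{MIC}_{\mathrm{TF}}^{\text{combined}}}{\mathrm{MIC}_{\mathrm{TF}}^{\text{alone}}}
  + \frac{\mathrm{MIC}_{\mathrm{ceftiofur}}^{\text{combined}}}{\mathrm{MIC}_{\mathrm{ceftiofur}}^{\text{alone}}}

% Hypothetical illustration only (not values from the study):
\frac{1}{8} + \frac{1}{16} = 0.125 + 0.0625 = 0.1875 \le 0.5
```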
Affiliation(s)
- Shuhan Guan
  - College of Animal Science, Jilin University, Changchun, China
- Ling Zhong
  - College of Animal Science, Jilin University, Changchun, China
- Hangqian Yu
  - College of Animal Science, Jilin University, Changchun, China
- Li Wang
  - Changchun University of Chinese Medicine, Changchun, China
- Yajing Jin
  - College of Animal Science, Jilin University, Changchun, China
- Jingyu Liu
  - College of Animal Science, Jilin University, Changchun, China
- Hua Xiang
  - College of Animal Medicine, Jilin Agricultural University, Changchun, China
- Hao Yu
  - College of Animal Science, Jilin University, Changchun, China
- Lin Wang
  - State Key Laboratory for Zoonotic Diseases, Institute of Zoonosis, College of Veterinary Medicine, Jilin University, Changchun, China
- Dacheng Wang
  - College of Animal Science, Jilin University, Changchun, China