1
Identification of Plasma Biomarkers from Rheumatoid Arthritis Patients Using an Optimized Sequential Window Acquisition of All THeoretical Mass Spectra (SWATH) Proteomics Workflow. Proteomes 2023; 11:32. [PMID: 37873874 PMCID: PMC10594463 DOI: 10.3390/proteomes11040032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 09/28/2023] [Accepted: 10/02/2023] [Indexed: 10/25/2023] Open
Abstract
Rheumatoid arthritis (RA) is a systemic autoimmune and inflammatory disease. Plasma biomarkers are critical for understanding disease mechanisms, treatment effects, and diagnosis. Mass spectrometry-based proteomics is a powerful tool for unbiased biomarker discovery. However, plasma proteomics is significantly hampered by signal interference from high-abundance proteins, low overall protein coverage, and high levels of missing data from data-dependent acquisition (DDA). To achieve quantitative proteomics analysis for plasma samples with a balance of throughput, performance, and cost, we developed a workflow incorporating plate-based high abundance protein depletion and sample preparation, comprehensive peptide spectral library building, and data-independent acquisition (DIA) SWATH mass spectrometry-based methodology. In this study, we analyzed plasma samples from both RA patients and healthy donors. The results showed that the new workflow performance exceeded that of the current state-of-the-art depletion-based plasma proteomic platforms in terms of both data quality and proteome coverage. Proteins from biological processes related to the activation of systemic inflammation, suppression of platelet function, and loss of muscle mass were enriched and differentially expressed in RA. Some plasma proteins, particularly acute-phase reactant proteins, showed great power to distinguish between RA patients and healthy donors. Moreover, protein isoforms in the plasma were also analyzed, providing even deeper proteome coverage. This workflow can serve as a basis for further application in discovering plasma biomarkers of other diseases.
2
Abstract
Alternative splicing is pivotal to the regulation of gene expression and protein diversity in eukaryotic cells. The detection of alternative splicing events requires specific omics technologies. Although short-read RNA sequencing has successfully supported a plethora of investigations on alternative splicing, the emerging technologies of long-read RNA sequencing and top-down mass spectrometry open new opportunities to identify alternative splicing and protein isoforms with less ambiguity. Here, we summarize improvements in short-read RNA sequencing for alternative splicing analysis, including percent splicing index estimation and differential analysis. We also review the computational methods used in top-down proteomics analysis regarding proteoform identification, including the construction of databases of protein isoforms and statistical analyses of search results. While many improvements in sequencing and computational methods will result from emerging technologies, there should be future endeavors to increase the effectiveness, integration, and proteome coverage of alternative splicing events.
3
Network integration and protein structural binding analysis of neurodegeneration-related interactome. Brief Bioinform 2023:bbad237. [PMID: 37350526 DOI: 10.1093/bib/bbad237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Revised: 05/25/2023] [Accepted: 06/08/2023] [Indexed: 06/24/2023] Open
Abstract
Neurodegenerative diseases (NDs) are usually associated with aggregation and molecular interactions of pathological proteins. Integrating the accumulated data from clinical and biomedical research will allow pathological proteins and their interactors to be uncovered. It is also important to systematically study their interacting proteins in order to find more related proteins and potential therapeutic targets. Understanding binding regions in protein interactions will help functional proteomics and provide an alternative method for predicting novel interactions. This study integrated data from biomedical research to achieve systematic mining and analysis of pathogenic proteins and their interaction network. A workflow was built that collects information on proteins involved in NDs and related protein-protein interactions (PPIs) and provides interactive visualizations. It also included protein isoforms and mapped them in a disease-related PPI network to illuminate the impact of alternative splicing on protein binding. The interacting proteins enriched by diseases and biological processes (BPs) revealed possible regulatory modules. A high-resolution network with structural affinity information was generated. Finally, the Neurodegenerative Disease Atlas (NDAtlas) was constructed with an interactive and intuitive view of protein docking with 3D molecular graphics beyond the traditional 2D network. NDAtlas is available at http://bis.zju.edu.cn/ndatlas.
4
Characterization of the genomic and transcriptional structure of chicken NRG4 gene. YI CHUAN = HEREDITAS 2023; 45:447-458. [PMID: 37194591 DOI: 10.16288/j.yczz.23-001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/18/2023]
Abstract
Neuregulin 4 (NRG4) is an important adipocytokine, which plays crucial roles in maintaining energy balance, regulating glucose and lipid metabolism, and preventing non-alcoholic fatty liver disease in mammals. At present, the genomic organization, transcript and protein isoforms of human NRG4 gene have been fully explored. Previous studies in our laboratory have shown that the NRG4 gene is expressed in chicken adipose tissue, but the chicken NRG4 (cNRG4) genomic structure, transcript and protein isoforms are still unknown. To this end, in this study, the genomic and transcriptional structure of the cNRG4 gene were systematically investigated using rapid amplification of cDNA ends (RACE) and reverse transcription-polymerase chain reaction (RT-PCR). The results showed that the coding region (CDS) of the cNRG4 gene was small, but it had a very complex transcriptional structure characterized by multiple transcription start sites, alternative splicing, intron retention, cryptic exons, and alternative polyadenylation, thus leading to production of four 5′UTR isoforms (cNRG4 A, cNRG4 B, cNRG4 C, and cNRG4 D) and six 3′UTR isoforms (cNRG4 a, cNRG4 b, cNRG4 c, cNRG4 d, cNRG4 e, and cNRG4 f) of the cNRG4 gene. The cNRG4 gene spanned 21,969 bp of genomic DNA (Chr.10:3,490,314~3,512,282) and consisted of 11 exons and 10 introns. Compared with the cNRG4 gene mRNA sequence (NM_001030544.4), two novel exons and one cryptic exon of the cNRG4 gene were identified in this study. Bioinformatics analysis, RT-PCR, cloning and sequencing analysis showed that the cNRG4 gene could encode three protein isoforms (cNRG4-1, cNRG4-2 and cNRG4-3). This study lays a foundation for further research on the function and regulation of the cNRG4 gene.
5
Identification of Protein Isoforms Using Reference Databases Built from Long and Short Read RNA-Sequencing. J Proteome Res 2022; 21:1628-1639. [PMID: 35612954 DOI: 10.1021/acs.jproteome.1c00968] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Alternative splicing can lead to distinct protein isoforms. These can have different functions in specific cells and tissues or in different developmental stages. In this study, we explored whether transcripts assembled from long read, nanopore-based, direct RNA-sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparing with Illumina-based short read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212) identified with isoform-specific peptides via custom long-read-based databases were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long read, direct RNA-seq allows the discovery of novel protein isoforms not already in reference databases or custom databases built from short read RNA-seq data. Our analysis highlights the benefits of long read RNA-seq data in the generation of reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.
6
Sweet taste perception in mice is blunted by PTBP1-regulated skipping of Tas1r2 exon 4. Chem Senses 2022; 47:6884719. [PMID: 36484118 DOI: 10.1093/chemse/bjac034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022] Open
Abstract
Taste perception, initiated by activation of taste receptors in taste bud cells, is crucial for regulating nutrient intake. Genetic polymorphisms in taste receptor genes cannot fully explain the wide individual variations of taste sensitivity. Alternative splicing (AS) is a ubiquitous posttranscriptional mode of gene regulation that enriches the functional diversity of proteins. Here, we report the identification of a novel splicing variant of sweet taste receptor gene Tas1r2 (Tas1r2_∆e4) in mouse taste buds and the mechanism by which it diminishes sweet taste responses in vitro and in vivo. Skipping of Tas1r2 exon 4 in Tas1r2_∆e4 led to loss of amino acids in the extracellular Venus flytrap domain, and the truncated isoform reduced the response of sweet taste receptors (STRs) to all sweet compounds tested by generating nonfunctional T1R2/T1R3 STR heterodimers. The splicing factor PTBP1 (polypyrimidine tract-binding protein 1) promoted Tas1r2_∆e4 generation through binding to a polypyrimidine-rich splicing silencer in Tas1r2 exon 4, thus decreasing STR function and sweet taste perception in mice. Taken together, these data reveal the existence of a regulated AS event in Tas1r2 expression and its effect on sweet taste perception, providing a novel mechanism for modulating taste sensitivity at the posttranscriptional level.
7
Proteogenomics Integrating Novel Junction Peptide Identification Strategy Discovers Three Novel Protein Isoforms of Human NHSL1 and EEF1B2. J Proteome Res 2021; 20:5294-5303. [PMID: 34420305 DOI: 10.1021/acs.jproteome.1c00373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
In eukaryotes, alternative pre-mRNA splicing allows a single gene to encode different protein isoforms that function in many biological processes, and they are used as biomarkers or therapeutic targets for diseases. Although protein isoforms in the human genome are well annotated, we speculate that some low-abundance protein isoforms may still be under-annotated because most genes have a primary coding product and alternative protein isoforms tend to be under-expressed. A peptide coencoded by a novel exon and an annotated exon separated by an intron is known as a novel junction peptide. In the absence of known transcripts and homologous proteins, traditional whole-genome six-frame translation-based proteogenomics cannot identify novel junction peptides, and it cannot capture novel alternative splice sites. In this article, we first propose a strategy and tool for identifying novel junction peptides, called CJunction, which we then integrate into a proteogenomics process specifically designed for novel protein isoform discovery and apply to the analysis of a deep-coverage HeLa mass spectrometry data set with identifier PXD004452 in ProteomeXchange. We succeeded in identifying and validating three novel protein isoforms of two functionally important genes, NHSL1 (causative gene of Nance-Horan syndrome) and EEF1B2 (translation elongation factor), which validates our hypothesis. These novel protein isoforms differ significantly in sequence from the annotated gene-coding products because of their novel N-termini, suggesting that they may have substantially different functions.
8
The GTPase Domain of MX2 Interacts with the HIV-1 Capsid, Enabling Its Short Isoform to Moderate Antiviral Restriction. Cell Rep 2020; 29:1923-1933.e3. [PMID: 31722207 PMCID: PMC7391006 DOI: 10.1016/j.celrep.2019.10.009] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 04/01/2019] [Revised: 07/18/2019] [Accepted: 10/02/2019] [Indexed: 01/27/2023] Open
Abstract
Myxovirus resistance 2 (MX2/MXB) is an interferon (IFN)-induced HIV-1 restriction factor that inhibits viral nuclear DNA accumulation. The amino-terminal domain of MX2 binds the viral capsid and is essential for inhibition. Using in vitro assembled Capsid-Nucleocapsid (CANC) complexes as a surrogate for the HIV-1 capsid lattice, we reveal that the GTPase (G) domain of MX2 contains a second, independent capsid-binding site. The importance of this interaction was addressed in competition assays using the naturally occurring non-antiviral short isoform of MX2 that lacks the amino-terminal 25 amino acids. Specifically, these experiments show that the G domain enhances MX2 function, and the foreshortened isoform acts as a functional suppressor of the full-length protein in a G-domain-dependent manner. The interaction of MX2 with its HIV-1 capsid substrate is therefore multi-faceted: there are dual points of contact that, together with protein oligomerization, contribute to the complexity of MX2 regulation.
Highlights: MX2 interacts with the HIV-1 capsid via N-terminal and GTPase (G) domains. The G-domain interaction enhances MX2 binding to the viral capsid. The MX2 short isoform is not antiviral and binds the capsid through its G domain. The MX2 short isoform suppresses the antiviral activity of the long isoform.
9
About three-fourths of mouse proteins unexpectedly appear at a low position of SDS-PAGE, often as additional isoforms, questioning whether all protein isoforms have been eliminated in gene-knockout cells or organisms. Protein Sci 2020; 29:978-990. [PMID: 31930537 DOI: 10.1002/pro.3823] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/10/2019] [Revised: 01/01/2020] [Accepted: 01/05/2020] [Indexed: 01/08/2023]
Abstract
Most genes in evolutionarily complex genomes are expressed to multiple protein isoforms, but there is not yet any simple high-throughput approach to identify these isoforms. Using an oversimplified top-down LC-MS/MS strategy, we detected, around the 26-kD position of SDS-PAGE, proteins produced from 782 genes in a Cdk4-/- mouse embryonic fibroblast cell line. Interestingly, only 213 (27.24%, about one-fourth) of these 782 genes have their proteins with a theoretical molecular mass (TMM) 10% smaller or larger than 26 kD, that is, between 23 and 29 kD, the range set as allowed variation in SDS-PAGE. These 213 proteins are considered as the wild type (WT). The remaining three-fourths includes proteins from 66 (8.44%) genes with a TMM smaller than 23 kD and proteins from 503 (64.32%, nearly two-thirds) genes with a TMM larger than 29 kD; these proteins are categorized into a larger-group or a smaller-group, respectively, for their appearance at a higher or lower position of SDS-PAGE. For instance, at this 26-kD position we detected proteins from the Rps27a, Snrpf, Hist1h4a, and Rps25 genes whose proteins' TMM is 8.6, 9.7, 11.4, and 13.7 kD, respectively, and detected proteins from the Plelc1 and Prkdc genes, whose largest isoform is 533.9 and 471.1 kD, respectively. We extrapolate that many of those proteins migrating unexpectedly in SDS-PAGE may be isoforms besides the WT protein. Moreover, we also detected a Cdk4 protein in this Cdk4-/- cell line, which raises the question of whether other gene-knockout cells or organisms show similar incompleteness of the knockout.
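The grouping rule in this abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function name `classify_by_tmm` and its parameters are hypothetical, the 23 and 29 kD bounds are the rounded window quoted in the abstract, and the masses are the examples it lists. Following the abstract's convention, a protein whose TMM lies below the window is labelled "larger-group" because it appears higher on the gel than its mass predicts.

```python
def classify_by_tmm(tmm_kd, low_kd=23.0, high_kd=29.0):
    """Label a protein found at the 26-kD gel position by its theoretical mass.

    low_kd and high_kd are the rounded +/-10% bounds quoted in the abstract.
    """
    if tmm_kd < low_kd:
        # Lighter than expected for this gel position: appears "larger".
        return "larger-group"
    if tmm_kd > high_kd:
        # Heavier than expected for this gel position: appears "smaller".
        return "smaller-group"
    return "WT"


# Theoretical masses (kD) taken from the examples in the abstract.
examples = {"Rps27a": 8.6, "Snrpf": 9.7, "Hist1h4a": 11.4, "Rps25": 13.7, "Prkdc": 471.1}
labels = {gene: classify_by_tmm(mass) for gene, mass in examples.items()}

for gene, label in labels.items():
    print(f"{gene}: {label}")
```

Passing the bounds as parameters keeps the same rule reusable for other gel positions.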
10
Discovery of the Liver Hyaluronan Receptor for Endocytosis (HARE) and Its Progressive Emergence as the Multi-Ligand Scavenger Receptor Stabilin-2. Biomolecules 2019; 9:biom9090454. [PMID: 31500161 PMCID: PMC6769870 DOI: 10.3390/biom9090454] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/17/2019] [Revised: 08/30/2019] [Accepted: 09/02/2019] [Indexed: 12/14/2022] Open
Abstract
Since the discovery of a novel liver hyaluronan (HA) clearance receptor in 1981 by Laurent, Fraser and coworkers, 22 different ligands cleared by the renamed receptor (the Hyaluronan Receptor for Endocytosis (HARE); Stabilin-2 (Stab2)) were discovered over 37 years. Ligands fall into three groups: (1) 11 anionic polymers, (2) seven cleaved or modified proteins and (3) four types of cells. Seven synthetic ligands, not found normally in serum or tissues, likely mimic natural molecules cleared by the receptor. In 2002 we purified and cloned HARE, based on HA-binding activity, and two other groups cloned full-length receptor; FEEL-2 and Stab2. Macrophages likely require full-length Stab2 for efficient binding and phagocytosis of bacteria or apoptotic cells, since cell-binding domains are throughout the receptor. In contrast, all 16 known single-molecule binding sites are only within the C-terminal half (190HARE). The HARE isoform is generated by proteolysis, not mRNA splicing. The majority of circulating ligands is cleared by HARE, since sinusoidal endothelial cells of liver, spleen and lymph node express twice as many HARE half-receptors as full-length receptors. Based on their significant binding and functional differences, a modified receptor nomenclature is proposed that designates HARE as the C-terminal half-receptor isoform and Stab2 as the full-length receptor isoform.
11
Large Scale Profiling of Protein Isoforms Using Label-Free Quantitative Proteomics Revealed the Regulation of Nonsense-Mediated Decay in Moso Bamboo (Phyllostachys edulis). Cells 2019; 8:E744. [PMID: 31330982 PMCID: PMC6678154 DOI: 10.3390/cells8070744] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/13/2019] [Revised: 07/12/2019] [Accepted: 07/16/2019] [Indexed: 12/13/2022] Open
Abstract
Moso bamboo is an important forest species with a variety of ecological, economic, and cultural values. However, the gene annotation of moso bamboo is based only on transcriptome sequencing and lacks proteomic evidence. The lignification and fiber content of moso bamboo make protein extraction with conventional methods difficult, which seriously hinders research on the proteomics of moso bamboo. The purpose of this study is to establish efficient methods for extracting total proteins from moso bamboo for subsequent mass spectrometry-based quantitative proteome identification. Here, we have successfully established a set of efficient methods for extracting total proteins of moso bamboo followed by mass spectrometry-based label-free quantitative proteome identification, which further improved the protein annotation of moso bamboo genes. In this study, 10,376 predicted coding genes were confirmed by quantitative proteomics, accounting for 35.8% of all annotated protein-coding genes. Proteome analysis also revealed the protein-coding potential of 1015 predicted long noncoding RNAs (lncRNAs), accounting for 51.03% of annotated lncRNAs. Thus, mass spectrometry-based proteomics provides a reliable method for gene annotation. In particular, quantitative proteomics revealed the translation patterns of proteins in moso bamboo. In addition, the 3284 transcript isoforms from 2663 genes identified by Pacific BioSciences (PacBio) single-molecule real-time long-read isoform sequencing (Iso-Seq) were confirmed at the protein level by mass spectrometry. Furthermore, domain analysis of mass spectrometry-identified proteins encoded in the same genomic locus revealed variations in domain composition pointing towards a functional diversification of protein isoforms. Finally, we found that some transcripts targeted by nonsense-mediated mRNA decay (NMD) could also be translated into proteins.
In summary, the proteomic analysis in this study improves the proteomics-assisted genome annotation of moso bamboo, is valuable to large-scale functional genomics research in moso bamboo, and provides a theoretical basis and technical support for directional gene function analysis at the proteomics level.
12
Investigating protein patterns in human leukemia cell line experiments: A Bayesian approach for extremely small sample sizes. Stat Methods Med Res 2019; 29:1181-1196. [PMID: 31172886 DOI: 10.1177/0962280219852721] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
Human cancer cell line experiments are valuable for investigating drug sensitivity biomarkers. The number of biomarkers measured in these experiments is typically on the order of several thousand, whereas the number of samples is often limited to one or at most three replicates for each experimental condition. We have developed an innovative Bayesian approach that efficiently identifies clusters of proteins that exhibit similar patterns of expression. Motivated by the availability of ion mobility mass spectrometry data on cell line experiments in myelodysplastic syndrome and acute myeloid leukemia, our methodology can identify proteins that follow biologically meaningful trends of expression. Extensive simulation studies demonstrate good performance of the proposed method even in the presence of relatively small effects and sample sizes.
13
Expression of murine muscle-enriched A-type lamin-interacting protein (MLIP) is regulated by tissue-specific alternative transcription start sites. J Biol Chem 2018; 293:19761-19770. [PMID: 30389785 DOI: 10.1074/jbc.ra118.003758] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/14/2018] [Revised: 10/28/2018] [Indexed: 11/06/2022] Open
Abstract
Muscle-enriched lamin-interacting protein (Mlip) is an alternatively spliced gene whose splicing specificity is dictated by tissue type. MLIP is most abundantly expressed in brain, cardiac, and skeletal muscle. In the present study, we systematically mapped the transcriptional start and stop sites of murine Mlip. Rapid amplification of cDNA ends (RACE) of Mlip transcripts from the brain, heart, and skeletal muscle revealed two transcriptional start sites (TSSs), exon 1a and exon 1b, and only one transcriptional termination site. RT-PCR analysis of the usage of the two identified TSSs revealed that the heart utilizes only exon 1a for MLIP expression, whereas the brain exclusively uses the exon 1b TSS. Loss of Mlip exon 1a in mice resulted in a 7-fold increase in the prevalence of centralized nuclei in muscle fibers, with the Mlip exon 1a-deficient satellite cells on single fibers exhibiting a significant delay in commitment to a MYOD-positive phenotype. Furthermore, we demonstrate that the A-type lamin-binding domain in MLIP is encoded in exon 1a, indicating that MLIP isoforms generated with the exon 1b TSS lack the A-type lamin-binding domain. Collectively, these findings suggest that Mlip tissue-specific expression and alternative splicing play a critical role in determining MLIP's functions in mice.
14
Framework and resource for more than 11,000 gene-transcript-protein-reaction associations in human metabolism. Proc Natl Acad Sci U S A 2017; 114:E9740-E9749. [PMID: 29078384 DOI: 10.1073/pnas.1713050114] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/22/2022] Open
Abstract
Alternative splicing plays important roles in generating different transcripts from one gene, and consequently various protein isoforms. However, there has been no systematic approach that facilitates characterizing functional roles of protein isoforms in the context of the entire human metabolism. Here, we present a systematic framework for the generation of gene-transcript-protein-reaction associations (GeTPRA) in the human metabolism. The framework in this study generated 11,415 GeTPRA corresponding to 1,106 metabolic genes for both principal and nonprincipal transcripts (PTs and NPTs) of metabolic genes. The framework further evaluates GeTPRA, using a human genome-scale metabolic model (GEM) that is biochemically consistent and transcript-level data compatible, and subsequently updates the human GEM. A generic human GEM, Recon 2M.1, was developed for this purpose, and subsequently updated to Recon 2M.2 through the framework. Both PTs and NPTs of metabolic genes were considered in the framework based on prior analyses of 446 personal RNA-Seq data and 1,784 personal GEMs reconstructed using Recon 2M.1. The framework and the GeTPRA will contribute to better understanding human metabolism at the systems level and enable further medical applications.
15
Probably less than one-tenth of the genes produce only the wild type protein without at least one additional protein isoform in some human cancer cell lines. Oncotarget 2017; 8:82714-82727. [PMID: 29137297 PMCID: PMC5669923 DOI: 10.18632/oncotarget.20015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/14/2017] [Accepted: 06/30/2017] [Indexed: 11/25/2022] Open
Abstract
To estimate how many genes produce multiple protein isoforms, we electrophoresed proteins from MCF7 and MDA-MB231 (MB231) human breast cancer cells in SDS-PAGE and excised narrow stripes of the gel at the 48 kD, 55 kD and 72 kD positions. Proteins in these stripes were identified using liquid chromatography and tandem mass spectrometry. A total of 765, 750 and 679 proteins from MB231 cells, as well as 470, 390 and 490 proteins from MCF7 cells, were identified from the 48 kD, 55 kD and 72 kD stripes, respectively. We arbitrarily allowed a 10% technical variation from the proteins' theoretical molecular mass (TMM) and considered those proteins with their TMMs within the 43-53 kD, 49-61 kD and 65-79 kD ranges as the wild type (WT) expected from the corresponding stripe, whereas those with a TMM above or below this range were assigned to a smaller- or larger-group, respectively. Only 263 (34.4%), 269 (35.9%) and 151 (22.2%) proteins from MB231 cells and 117 (24.9%), 135 (34.6%) and 130 (26.5%) proteins from MCF7 cells from the 48 kD, 55 kD and 72 kD stripes, respectively, belonged to the WT, while the remaining majority belonged to the smaller- or larger-groups. Only about 3-16%, on average about 10% regardless of the stripe and cell line, of the proteins appeared in only one stripe and within the WT range, while the remaining preponderance appeared also in additional stripe(s) or had a larger or smaller TMM. We conclude that few (fewer than 10%) of the human genes produce only the WT protein without additional isoform(s).
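The wild-type shares reported in this abstract follow directly from the counts it gives. The short Python check below recomputes them along with the ±10% mass windows. It is a sketch for the reader's convenience: the dictionary layout and the helper names `wt_share` and `mass_window` are assumptions made for illustration, while all counts come from the abstract.

```python
# Proteins identified per (cell line, stripe position in kD), from the abstract.
identified = {
    ("MB231", 48): 765, ("MB231", 55): 750, ("MB231", 72): 679,
    ("MCF7", 48): 470, ("MCF7", 55): 390, ("MCF7", 72): 490,
}
# Proteins whose theoretical mass falls inside the +/-10% window (the WT group).
wild_type = {
    ("MB231", 48): 263, ("MB231", 55): 269, ("MB231", 72): 151,
    ("MCF7", 48): 117, ("MCF7", 55): 135, ("MCF7", 72): 130,
}


def wt_share(key):
    """Percentage of identified proteins that belong to the WT group."""
    return round(100 * wild_type[key] / identified[key], 1)


def mass_window(position_kd, tolerance=0.10):
    """Lower and upper mass bounds (kD) for a stripe, before rounding to whole kD."""
    return (round(position_kd * (1 - tolerance), 1),
            round(position_kd * (1 + tolerance), 1))


shares = {key: wt_share(key) for key in identified}
for (cell_line, position), share in shares.items():
    print(f"{cell_line} {position} kD: {share}% WT, window {mass_window(position)}")
```

The unrounded windows (for example 43.2 to 52.8 kD for the 48 kD stripe) match the whole-number ranges quoted in the abstract.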
16
Constrained selected reaction monitoring: quantification of selected post-translational modifications and protein isoforms. Methods 2013; 61:304-12. [PMID: 23523700 PMCID: PMC3990191 DOI: 10.1016/j.ymeth.2013.03.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 10/30/2012] [Revised: 03/02/2013] [Accepted: 03/06/2013] [Indexed: 10/27/2022] Open
Abstract
Selected reaction monitoring (SRM) is a mass spectrometry method that can target signature peptides to provide for the detection and quantitation of specific proteins in complex biological samples. When quantifying a protein, multiple peptides are generated using a specific protease such as trypsin, thereby allowing a choice of signature peptides with robust signals. In contrast, signature peptide selection can be constrained when the goal is to monitor a specific post-translational modification (PTM) or protein isoform, as the signature peptide must include the amino acid residue(s) of PTM attachment or sequence variation. This can force the selection of a signature peptide with a weak SRM response or one that is confounded by high background. In this article, we discuss steps that can be optimized to maximize peptide selection and assay performance of constrained SRM assays, including tuning instrument parameters, fragmenting product ions, using a different protease, and enriching the sample. Examples are provided for phosphorylated or citrullinated peptides and protein isoforms.
17
Isoforms of the neuronal glutamate transporter gene, SLC1A1/EAAC1, negatively modulate glutamate uptake: relevance to obsessive-compulsive disorder. Transl Psychiatry 2013; 3:e259. [PMID: 23695234 PMCID: PMC3669922 DOI: 10.1038/tp.2013.35] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 02/07/2023] Open
Abstract
The SLC1A1 gene, which encodes the neuronal glutamate transporter, EAAC1, has consistently been implicated in obsessive-compulsive disorder (OCD) in genetic studies. Moreover, neuroimaging, biochemical and clinical studies support a role for glutamatergic dysfunction in OCD. Although SLC1A1 is an excellent candidate gene for OCD, little is known about its regulation at the genomic level. Here, we report the identification and characterization of three alternative SLC1A1/EAAC1 mRNAs: a transcript derived from an internal promoter, termed P2 to distinguish it from the transcript generated by the primary promoter (P1), and two alternatively spliced mRNAs: ex2skip, which is missing exon 2, and ex11skip, which is missing exon 11. All isoforms inhibit glutamate uptake from the full-length EAAC1 transporter. Ex2skip and ex11skip also display partial colocalization and interact with the full-length EAAC1 protein. The three isoforms are evolutionarily conserved between human and mouse, and are expressed in brain, kidney and lymphocytes under nonpathological conditions, suggesting that the isoforms are physiological regulators of EAAC1. Moreover, under specific conditions, all SLC1A1 transcripts were differentially expressed in lymphocytes derived from subjects with OCD compared with controls. These initial results reveal the complexity of SLC1A1 regulation and the potential clinical utility of profiling glutamatergic gene expression in OCD and other psychiatric disorders.
18
Alternative splicing in the aldo-keto reductase superfamily: implications for protein nomenclature. Chem Biol Interact 2013; 202:153-8. [PMID: 23298867 PMCID: PMC3758225 DOI: 10.1016/j.cbi.2012.12.012] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 09/17/2012] [Revised: 12/06/2012] [Accepted: 12/18/2012] [Indexed: 12/13/2022]
Abstract
The aldo-keto reductase superfamily contains 173 proteins which are present in all phyla. Examination of the human and mouse genomes has shown that in some instances a single AKR gene can give rise to alternatively spliced mRNA variants, which in some cases give rise to more than one protein isoform. This is currently well documented in the AKR6A subfamily, which contains the β-subunits of the voltage-gated potassium ion channels. With the emergence of second generation sequencing it is likely that the occurrence of transcript variants and protein isoforms from a single AKR gene may become commonplace. To deal with this issue we recommend that the Ensembl database nomenclature be used to annotate the transcript variants from a single AKR gene. However, since multiple transcript variants could give rise to either the same or multiple protein isoforms from the same AKR gene, we also propose to expand the nomenclature of the AKR protein superfamily, so that when a protein isoform is shown to be expressed and is functional it would be assigned the standard AKR name followed by a "period or full-stop" and a number for that unique isoform. Numbers will be assigned chronologically and linked to the respective transcripts annotated in Ensembl, e.g. AKR6A5.1 (Kvβ2.1) (AKR6A5-001, -006 and -201), followed by AKR6A5.2 (Kvβ2.2) (AKR6A5-002, -202). This nomenclature is expandable and it enables multiple protein isoforms to be assigned to their respective transcripts when they arise from the same AKR gene or for a single protein isoform to be assigned to multiple transcripts when the transcripts encode the same AKR protein.
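The proposed naming scheme amounts to a small mapping between isoform names and Ensembl transcript identifiers, which can be sketched in Python. This is an illustrative data structure only: the helper `isoform_name` and the dictionary names are hypothetical, and the identifiers are the AKR6A5 examples quoted in the abstract.

```python
def isoform_name(akr_name, isoform_number):
    """Build an isoform name: standard AKR name, a period, then the isoform number."""
    return f"{akr_name}.{isoform_number}"


# One protein isoform may be encoded by several Ensembl transcripts.
isoform_to_transcripts = {
    isoform_name("AKR6A5", 1): ["AKR6A5-001", "AKR6A5-006", "AKR6A5-201"],
    isoform_name("AKR6A5", 2): ["AKR6A5-002", "AKR6A5-202"],
}

# Reverse lookup: each transcript maps back to exactly one protein isoform.
transcript_to_isoform = {
    transcript: isoform
    for isoform, transcripts in isoform_to_transcripts.items()
    for transcript in transcripts
}

print(transcript_to_isoform["AKR6A5-006"])
```

Keeping the mapping one-to-many in one direction and many-to-one in the other mirrors the abstract's point that several transcripts can encode the same protein.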