Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

47
(from Reference Citation Analysis)

Article PDFs (22)

Cited by > 0 (42)

Searched Name

Nicolas Carels

Number	Citation Analysis
1	Unveiling the Dynamics behind Glioblastoma Multiforme Single-Cell Data Heterogeneity. Int J Mol Sci 2024;25:4894. [PMID: 38732140 PMCID: PMC11084314 DOI: 10.3390/ijms25094894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 04/02/2024] [Accepted: 04/03/2024] [Indexed: 05/13/2024] [Open]
Abstract: Glioblastoma Multiforme is a brain tumor distinguished by its aggressiveness. We suggested that this aggressiveness leads single-cell RNA-sequencing (scRNA-seq) data to span a representative portion of the cancer attractor's domain. This conjecture allowed us to interpret the scRNA-seq heterogeneity as reflecting a representative trajectory within the attractor's domain. We considered factors such as genomic instability to characterize the cancer dynamics through stochastic fixed points. The fixed points were derived from centroids obtained through various clustering methods to verify our method's sensitivity. This methodological foundation is based upon sample and time average equivalence, assigning an interpretative value to the data cluster centroids and supporting parameter estimation. We used stochastic simulations to reproduce the dynamics, and our results showed an alignment between experimental and simulated dataset centroids. We also computed the Waddington landscape, which provided a visual framework for validating the centroids and standard deviations as characterizations of cancer attractors. Additionally, we examined the stability and transitions between attractors and revealed a potential interplay between subtypes. These transitions might be related to cancer recurrence and progression, connecting the molecular mechanisms of cancer heterogeneity with statistical properties of gene expression dynamics. Our work advances the modeling of gene expression dynamics and paves the way for personalized therapeutic interventions.
Key Words: Glioblastoma Multiforme; cancer attractors; epigenetic landscape; gene regulatory network dynamics; heterogeneity; parameter sets estimation; single-cell RNA sequencing
MeSH Headings: Glioblastoma/genetics; Glioblastoma/pathology; Glioblastoma/metabolism; Humans; Single-Cell Analysis/methods; Brain Neoplasms/genetics; Brain Neoplasms/pathology; Brain Neoplasms/metabolism; Gene Expression Regulation, Neoplastic; Genetic Heterogeneity; Gene Expression Profiling/methods; Genomic Instability; Sequence Analysis, RNA/methods; Cluster Analysis
Grants: 88887.597339/2021-00 (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior)
2	Optimizing therapeutic targets for breast cancer using boolean network models. Comput Biol Chem 2024;109:108022. [PMID: 38350182 DOI: 10.1016/j.compbiolchem.2024.108022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Revised: 09/18/2023] [Accepted: 01/31/2024] [Indexed: 02/15/2024]
Abstract: Studying gene regulatory networks associated with cancer provides valuable insights for therapeutic purposes, given that cancer is fundamentally a genetic disease. However, as the number of genes in the system increases, the complexity arising from the interconnections between network components grows exponentially. In this study, using Boolean logic to adjust the existing relationships between network components has facilitated simplifying the modeling process, enabling the generation of attractors that represent cell phenotypes based on breast cancer RNA-seq data. A key therapeutic objective is to guide cells, through targeted interventions, to transition from the current cancer attractor to a physiologically distinct attractor unrelated to cancer. To achieve this, we developed a computational method that identifies network nodes whose inhibition can facilitate the desired transition from one tumor attractor to another associated with apoptosis, leveraging transcriptomic data from cell lines. To validate the model, we utilized previously published in vitro experiments where the downregulation of specific proteins resulted in cell growth arrest and death of a breast cancer cell line. The method proposed in this manuscript combines diverse data sources, conducts structural network analysis, and incorporates relevant biological knowledge on apoptosis in cancer cells. This comprehensive approach aims to identify potential targets of significance for personalized medicine.
Key Words: Apoptosis; Boolean networks; Epigenetic landscape attractors; Gene regulatory network analysis; Systems biology of cancer
MeSH Headings: Humans; Female; Models, Genetic; Breast Neoplasms/genetics; Algorithms; Gene Regulatory Networks; MCF-7 Cells; Models, Biological
3	A Strategy Utilizing Protein-Protein Interaction Hubs for the Treatment of Cancer Diseases. Int J Mol Sci 2023;24:16098. [PMID: 38003288 PMCID: PMC10671768 DOI: 10.3390/ijms242216098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Revised: 09/04/2023] [Accepted: 09/12/2023] [Indexed: 11/26/2023] [Open]
Abstract: We describe a strategy for the development of a rational approach to neoplastic disease therapy based on the demonstration that scale-free networks are susceptible to specific attacks directed against their connective hubs. This strategy involves the (i) selection of up-regulated hubs of connectivity in the tumor interactome, (ii) drug repurposing of these hubs, (iii) RNA silencing of non-druggable hubs, (iv) in vitro hub validation, (v) tumor-on-a-chip, (vi) in vivo validation, and (vii) clinical trial. Hubs are protein targets that are assessed as targets for rational therapy of cancer in the context of personalized oncology. We confirmed the existence of a negative correlation between malignant cell aggressivity and the target number needed for specific drugs or RNA interference (RNAi) to maximize the benefit to the patient's overall survival. Interestingly, we found that some additional proteins not generally targeted by drug treatments might justify the addition of inhibitors designed against them in order to improve therapeutic outcomes. However, many proteins are not druggable, or the available pharmacopeia for these targets is limited, which justifies a therapy based on encapsulated RNAi.
Key Words: RNA-seq; RNAi; attractors; clinical trial; drug repurposing; hubs; in vivo validation; interactome; tumor on a chip; tumors
MeSH Headings: Humans; Neoplasms/drug therapy; Neoplasms/genetics; Protein Interaction Mapping
Grants: no grant number (Brazilian government, Fiocruz)
4	Plant Tolerance to Drought Stress with Emphasis on Wheat. Plants (Basel) 2023;12:2170. [PMID: 37299149 DOI: 10.3390/plants12112170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 03/16/2023] [Accepted: 03/29/2023] [Indexed: 06/12/2023]
Abstract: Environmental stresses, such as drought, have negative effects on crop yield. Drought is a stress whose impact tends to increase in some critical regions. However, the worldwide population is continuously increasing and climate change may affect its food supply in the upcoming years. Therefore, there is an ongoing effort to understand the molecular processes that may contribute to improving drought tolerance of strategic crops. These investigations should contribute to delivering drought-tolerant cultivars by selective breeding. For this reason, it is worthwhile to regularly review the literature concerning the molecular mechanisms and technologies that could facilitate gene pyramiding for drought tolerance. This review summarizes achievements obtained using QTL mapping, genomics, synteny, epigenetics, and transgenics for the selective breeding of drought-tolerant wheat cultivars. Synthetic apomixis combined with the msh1 mutation opens the way to induce and stabilize epigenomes in crops, which offers the potential of accelerating selective breeding for drought tolerance in arid and semi-arid regions.
Key Words: ChIP; QTL; climate change; epigenetic; genomics; histone code; transcription factors; transgenic crops
5	Agathisflavone, a natural biflavonoid that inhibits SARS-CoV-2 replication by targeting its proteases. Int J Biol Macromol 2022;222:1015-1026. [PMID: 36183752 PMCID: PMC9525951 DOI: 10.1016/j.ijbiomac.2022.09.204] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/15/2022] [Revised: 09/20/2022] [Accepted: 09/22/2022] [Indexed: 11/16/2022]
Abstract: Despite the fast development of vaccines, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) still circulates through variants of concern (VoC) and escapes the humoral immune response. SARS-CoV-2 has provoked over 200,000 deaths/month since its emergence and only a few antiviral drugs have shown clinical benefit up to this moment. Thus, chemical structures endowed with anti-SARS-CoV-2 activity are important for continuous antiviral development, and natural products represent a fruitful source of substances with biological activity. In the present study, agathisflavone (AGT), a biflavonoid from Anacardium occidentale, was investigated as a candidate anti-SARS-CoV-2 compound. In silico and enzymatic analysis indicated that AGT may target mainly the viral main protease (Mpro) and not the papain-like protease (PLpro) in a non-competitive way. Cell-based assays in a type II pneumocyte cell lineage (Calu-3) showed that SARS-CoV-2 is more susceptible to AGT than to apigenin (APG, monomer of AGT), in a dose-dependent manner, with an EC₅₀ of 4.23 ± 0.21 μM and CC₅₀ of 61.3 ± 0.1 μM and with a capacity to inhibit the level of the pro-inflammatory mediator tumor necrosis factor-alpha (TNF-α). These results configure AGT as an interesting chemical scaffold for the development of novel semisynthetic antivirals against SARS-CoV-2.
6	A Data Science Approach for the Identification of Molecular Signatures of Aggressive Cancers. Cancers (Basel) 2022;14:2325. [PMID: 35565454 PMCID: PMC9103663 DOI: 10.3390/cancers14092325] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/07/2022] [Revised: 03/04/2022] [Accepted: 03/12/2022] [Indexed: 02/05/2023] [Open]
Abstract: The main hallmarks of cancer include sustaining proliferative signaling and resisting cell death. We analyzed the genes of the WNT pathway and seven cross-linked pathways that may explain the differences in aggressiveness among cancer types. We divided six cancer types (liver, lung, stomach, kidney, prostate, and thyroid) into classes of high (H) and low (L) aggressiveness considering the TCGA data, and their correlations between Shannon entropy and 5-year overall survival (OS). Then, we used principal component analysis (PCA), a random forest classifier (RFC), and protein-protein interactions (PPI) to find the genes that correlated with aggressiveness. Using PCA, we found GRB2, CTNNB1, SKP1, CSNK2A1, PRKDC, HDAC1, YWHAZ, YWHAB, and PSMD2. Except for PSMD2, the RFC analysis showed a different list, which was CAD, PSMD14, APH1A, PSMD2, SHC1, TMEFF2, PSMD11, H2AFZ, PSMB5, and NOTCH1. Both methods use different algorithmic approaches and have different purposes, which explains the discrepancy between the two gene lists. The key genes of aggressiveness found by PCA were those that maximized the separation of H and L classes according to its third component, which represented 19% of the total variance. By contrast, RFC classified whether the RNA-seq of a tumor sample was of the H or L type. Interestingly, PPIs showed that the genes of PCA and RFC lists were connected neighbors in the PPI signaling network of WNT and cross-linked pathways.
Key Words: PCA; RFC; RNA-seq; WNT pathways; aggressiveness; cancer; interactome; machine learning; prognostic genes
Grants: E-26/010.002175/2019 (Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro); E- 704 26/290.077/2017 - 227190 (Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro); 2019-20 Brazil Accelerator Fund (Queen's Medical Centre)
7	Data-Driven Modeling of Breast Cancer Tumors Using Boolean Networks. Front Big Data 2021;4:656395. [PMID: 34746770 PMCID: PMC8564392 DOI: 10.3389/fdata.2021.656395] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/20/2021] [Accepted: 09/22/2021] [Indexed: 12/05/2022] [Open]
Abstract: Cancer is a genomic disease involving various intertwined pathways with complex cross-communication links. Conceptually, this complex interconnected system forms a network, which allows one to model the dynamic behavior of the elements that characterize it to describe the entire system's development in its various evolutionary stages of carcinogenesis. Knowing the activation or inhibition status of the genes that make up the network during its temporal evolution is necessary for the rational intervention on the critical factors for controlling the system's dynamic evolution. In this report, we proposed a methodology for building data-driven Boolean networks that model breast cancer tumors. We defined the network components and topology based on gene expression data from RNA-seq of breast cancer cell lines. We used a Boolean logic formalism to describe the network dynamics. The combination of single-cell RNA-seq and interactome data enabled us to study the dynamics of malignant subnetworks of up-regulated genes. First, we used the same Boolean function construction scheme for each network node, based on canalyzing functions. Using single-cell breast cancer datasets from The Cancer Genome Atlas, we applied a binarization algorithm. The binarized version of scRNA-seq data allowed identifying attractors specific to patients and critical genes related to each breast cancer subtype. The model proposed in this report may serve as a basis for a methodology to detect critical genes involved in malignant attractor stability, whose inhibition could have potential applications in cancer theranostics.
Key Words: Boolean networks; breast cancer modeling; cancer theranostics; gene regulatory network analysis; systems biology of cancer
8	SARS-CoV-2 Proteins Bind to Hemoglobin and Its Metabolites. Int J Mol Sci 2021;22:9035. [PMID: 34445741 PMCID: PMC8396565 DOI: 10.3390/ijms22169035] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/18/2021] [Revised: 07/28/2021] [Accepted: 08/10/2021] [Indexed: 01/19/2023] [Open]
Abstract: (1) Background: coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been linked to hematological dysfunctions, but there are few experimental data that explain this. Spike (S) and Nucleoprotein (N) proteins have been putatively associated with these dysfunctions. In this work, we analyzed the recruitment of hemoglobin (Hb) and other metabolites (hemin and protoporphyrin IX, PpIX) by SARS-CoV-2 proteins using different approaches. (2) Methods: shotgun proteomics (LC-MS/MS) after affinity column adsorption identified hemin-binding SARS-CoV-2 proteins. The parallel synthesis of peptides technique was used to study the interaction of the receptor-binding domain (RBD) and N-terminal domain (NTD) of the S protein with Hb, and in silico analysis was used to identify the binding motifs of the N protein. The plaque assay was used to investigate the inhibitory effect of Hb and the metabolites hemin and PpIX on virus adsorption and replication in Vero cells. (3) Results: the proteomic analysis by LC-MS/MS identified the S, N, M, Nsp3, and Nsp7 as putative hemin-binding proteins. Six short sequences in the RBD and 11 in the NTD of the spike were identified by peptide microarray to interact with Hb, and three motifs in the N protein were identified by in silico analysis to bind heme. An inhibitory effect in vitro of Hb, hemin, and PpIX at different levels was observed. Strikingly, free Hb at 1 mM suppressed viral replication (99%), and its interaction with SARS-CoV-2 was localized to the RBD region of the spike protein. (4) Conclusions: in this study, we identified that (at least) five proteins (S, N, M, Nsp3, and Nsp7) of SARS-CoV-2 recruit Hb/metabolites. The motifs of the RBD of the SARS-CoV-2 spike that bind Hb and the heme-binding sites of the N protein were disclosed. In addition, these compounds and PpIX block the virus's adsorption and replication. Furthermore, we also identified heme-binding motifs and interaction with hemin in N protein and other structural (S and M) and non-structural (Nsp3 and Nsp7) proteins.
Key Words: COVID-19; M; N; Nsp3; Nsp7; RBD; S; SARS-CoV-2; hemin; hemoglobin; protein–protein binding
MeSH Headings: COVID-19/blood; COVID-19/etiology; Hemin/metabolism; Hemoglobins/metabolism; Hemoglobins/ultrastructure; Humans; Molecular Docking Simulation; Protein Binding; Protein Domains; Proteomics; Protoporphyrins/metabolism; SARS-CoV-2/metabolism; SARS-CoV-2/pathogenicity; Viral Nonstructural Proteins/metabolism; Viral Nonstructural Proteins/ultrastructure; Viral Structural Proteins/metabolism; Viral Structural Proteins/ultrastructure; Virus Attachment; Virus Replication
Grants: VPPCB-007FIO-18-2-21, VPPCB-005-FIO-20, and VPPIS-005FIO-20-2-51; B3-Bovespa (FIOCRUZ/INOVA); #467.488.2014-2, #301744/2019-0 (Conselho Nacional de Desenvolvimento Científico e Tecnológico); #110.198-13, #210.003/2018 (Carlos Chagas Filho Foundation for Research Support of the State of Rio de Janeiro/FAPERJ)
9	Galaxy and MEAN Stack to Create a User-Friendly Workflow for the Rational Optimization of Cancer Chemotherapy. Front Genet 2021;12:624259. [PMID: 33679888 PMCID: PMC7935533 DOI: 10.3389/fgene.2021.624259] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/31/2020] [Accepted: 01/22/2021] [Indexed: 12/24/2022] [Open]
Abstract: One aspect of personalized medicine is aiming at identifying specific targets for therapy considering the gene expression profile of each patient individually. The real-world implementation of this approach is better achieved by user-friendly bioinformatics systems for healthcare professionals. In this report, we present an online platform that endows users with an interface designed using MEAN stack supported by a Galaxy pipeline. This pipeline targets connection hubs in the subnetworks formed by the interactions between the proteins of genes that are up-regulated in tumors. This strategy has been proven to be suitable for the inhibition of tumor growth and metastasis in vitro. Therefore, Perl and Python scripts were enclosed in Galaxy for translating RNA-seq data into protein targets suitable for the chemotherapy of solid tumors. Consequently, we validated the process of target diagnosis by (i) reference to subnetwork entropy, (ii) the critical value of density probability of differential gene expression, and (iii) the inhibition of the most relevant targets according to TCGA and GDC data. Finally, the most relevant targets identified by the pipeline are stored in MongoDB and can be accessed through the aforementioned internet portal designed to be compatible with mobile or small devices through Angular libraries.
Key Words: Galaxy; MEAN stack; Shannon entropy; Angular; personalized medicine; protein–protein network; systems biology; translational oncology
10	Proteome of the Triatomine Digestive Tract: From Catalytic to Immune Pathways; Focusing on Annexin Expression. Front Mol Biosci 2020;7:589435. [PMID: 33363206 PMCID: PMC7755933 DOI: 10.3389/fmolb.2020.589435] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/31/2020] [Accepted: 10/21/2020] [Indexed: 12/15/2022] [Open]
Abstract: Rhodnius prolixus, Panstrongylus megistus, Triatoma infestans, and Dipetalogaster maxima are all triatomines and potential vectors of the protozoan Trypanosoma cruzi responsible for human Chagas' disease. Considering that the T. cruzi cycle occurs inside the triatomine digestive tract (TDT), the analysis of the TDT protein profile is an essential step to understand TDT physiology during T. cruzi infection. To characterize the protein profile of TDT of D. maxima, P. megistus, R. prolixus, and T. infestans, a shotgun liquid chromatography-tandem mass spectrometry (LC-MS/MS) approach was applied in this report. Most proteins were found to be closely related to metabolic pathways such as gluconeogenesis/glycolysis, citrate cycle, fatty acid metabolism, and oxidative phosphorylation, but also to the immune system. We annotated this new proteome contribution, gathering it with those previously published, in accordance with Gene Ontology and KEGG. Enzymes were classified in terms of class, acceptor, and function, while the proteins from the immune system were annotated by reference to the pathways of humoral response, cell cycle regulation, Toll, IMD, JNK, Jak-STAT, and MAPK, as available from the Insect Innate Immunity Database (IIID). These pathways were further subclassified in recognition, signaling, response, coagulation, melanization and none. Finally, phylogenetic affinities and gene expression of annexins were investigated for understanding their role in the protection and homeostasis of intestinal epithelial cells against inflammation.
Key Words: annexin; Chagas disease; digestive tract; enzymes; immunity; mass spectrometry; triatomine
11	Modeling Basins of Attraction for Breast Cancer Using Hopfield Networks. Front Genet 2020;11:314. [PMID: 32318098 PMCID: PMC7154169 DOI: 10.3389/fgene.2020.00314] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/19/2019] [Accepted: 03/16/2020] [Indexed: 12/26/2022] [Open]
Abstract: Cancer is a genetic disease for which traditional treatments cause harmful side effects. After two decades of genomics technological breakthroughs, personalized medicine is being used to improve treatment outcomes and mitigate side effects. In mathematical modeling, it has been proposed that cancer matches an attractor in Waddington's epigenetic landscape. The use of Hopfield networks is an attractive modeling approach because it requires neither previous biological knowledge about protein-protein interactions nor kinetic parameters. In this report, Hopfield network modeling was used to analyze bulk RNA-Seq data of paired breast tumor and control samples from 70 patients. We characterized the control and tumor attractors with respect to their size and potential energy and correlated the Euclidean distances between the tumor samples and the control attractor with their corresponding clinical data. In addition, we developed a protocol that outlines the key genes involved in tumor state stability. We found that the tumor basin of attraction is larger than that of the control and that tumor samples are associated with a more substantial negative energy than control samples, which is in agreement with previous reports. Moreover, we found a negative correlation between the Euclidean distances from tumor samples to the control attractor and patient overall survival. The ascending order of each node's density in the weight matrix and the descending order of the number of patients that have the target active only in the tumor sample were the parameters that withdrew more tumor samples from the tumor basin of attraction with fewer gene inhibitions. The combinations of therapeutic targets were specific to each patient. We performed an initial validation through simulation of trastuzumab treatment effects in HER2+ breast cancer samples. For that, we built an energy landscape composed of single-cell and bulk RNA-Seq data from trastuzumab-treated and non-treated HER2+ samples. The trajectory from the non-treated bulk sample toward the treated bulk sample was inferred through the perturbation of differentially expressed genes between these samples. Among them, we characterized key genes involved in the trastuzumab response according to the literature.
Key Words: Hopfield network; basin region of attraction of a minimizer; breast cancer; dynamic system; systems biology
12	Signaling Complexity Measured by Shannon Entropy and Its Application in Personalized Medicine. Front Genet 2019;10:930. [PMID: 31695721 PMCID: PMC6816034 DOI: 10.3389/fgene.2019.00930] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/20/2019] [Accepted: 09/05/2019] [Indexed: 12/28/2022] [Open]
Abstract: Traditional approaches to cancer therapy seek common molecular targets in tumors from different patients. However, molecular profiles differ between patients, and most tumors exhibit inherent heterogeneity. Hence, imprecise targeting commonly results in side effects, reduced efficacy, and drug resistance. By contrast, personalized medicine aims to establish a molecular diagnosis specific to each patient, which is currently feasible due to the progress achieved with high-throughput technologies. In this report, we explored data from human RNA-seq and protein–protein interaction (PPI) networks using bioinformatics to investigate the relationship between tumor entropy and aggressiveness. To compare PPI subnetworks of different sizes, we calculated the Shannon entropy associated with vertex connections of differentially expressed genes comparing tumor samples with their paired control tissues. We found that the inhibition of up-regulated connectivity hubs led to a higher reduction of subnetwork entropy compared to that obtained with the inhibition of targets selected at random. Furthermore, these hubs were described to be participating in tumor processes. We also found a significant negative correlation between subnetwork entropies of tumors and the respective 5-year survival rates of the corresponding cancer types. This correlation was also observed considering patients with lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD) based on the clinical data from The Cancer Genome Atlas database (TCGA). Thus, network entropy increases in parallel with tumor aggressiveness but does not correlate with PPI subnetwork size. This correlation is consistent with previous reports and allowed us to assess the number of hubs to be inhibited for therapy to be effective, in the context of precision medicine, by reference to the 100% patient survival rate 5 years after diagnosis. Large standard deviations of subnetwork entropies and variations in target numbers per patient among tumor types characterize tumor heterogeneity.
Key Words: RNA-seq; chemotherapy; interactome; molecular target; precision medicine
13	In vitro Trypanocidal Activity, Genomic Analysis of Isolates, and in vivo Transcription of Type VI Secretion System of Serratia marcescens Belonging to the Microbiota of Rhodnius prolixus Digestive Tract. Front Microbiol 2019;9:3205. [PMID: 30733713 PMCID: PMC6353840 DOI: 10.3389/fmicb.2018.03205] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/28/2018] [Accepted: 12/11/2018] [Indexed: 11/13/2022] [Open]
Abstract: Serratia marcescens is a bacterium with the ability to colonize several niches, including some eukaryotic hosts. S. marcescens has been recently found in the gut of hematophagous insects that act as parasite vectors, such as Anopheles, Rhodnius, and Triatoma. While some S. marcescens strains have been reported as symbiotic or pathogenic to other insects, the role of S. marcescens populations from the gut microbiota of Rhodnius prolixus, a vector of Chagas' disease, remains unknown. Bacterial colonies from R. prolixus gut were isolated on BHI agar. After BOX-PCR fingerprinting, the genomic sequences of two isolates, RPA1 and RPH1, were compared to those of other S. marcescens from the NCBI database in order to estimate their evolutionary divergence. The in vitro trypanolytic activity of these two bacterial isolates against Trypanosoma cruzi (Dm28c clone and Y strain) was assessed by microscopy. In addition, the gene expression of type VI secretion system (T6SS) was detected in vivo by RT-PCR. Comparative genomics of RPA1 and RPH1 revealed, besides plasmid presence and genomic islands, genes related to motility, attachment, and quorum sensing in both genomes, while genes for urea hydrolysis and type II secretion system (T2SS) were found only in the RPA1 genome. The in vitro trypanolytic activity of both S. marcescens strains was stronger in their stationary phases of growth than in their exponential ones, with 65–70 and 85–90% of epimastigotes (Dm28c clone and Y strain, respectively) being lysed after incubation with RPA1 or RPH1 in stationary phase. Although T6SS transcripts were detected in guts up to 40 days after feeding (DAF), R. prolixus morbidity or mortality did not appear to be affected. In this report, we made available two trypanolytic S. marcescens strains from R. prolixus gut to the scientific community together with their genomic sequences. Here, we describe their genomic features with the purpose of bringing new insights into the S. marcescens adaptations for colonization of the specific niche of triatomine guts. This study provides the basis for a better understanding of the role of S. marcescens in the microbiota of R. prolixus gut as a potential antagonist of T. cruzi in this complex system.
Key Words: Rhodnius prolixus; Serratia marcescens; Trypanosoma cruzi; antagonistic genes; trypanocidal activity
14	Specific enzyme functionalities of Fusarium oxysporum compared to host plants. Gene 2018;676:219-226. [PMID: 29981422 DOI: 10.1016/j.gene.2018.07.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2017] [Revised: 05/14/2018] [Accepted: 07/01/2018] [Indexed: 11/29/2022] Abstract The genus Fusarium contains some of the most studied and important species of plant pathogens that economically affect world agriculture and horticulture. Fusarium spp. are ubiquitous fungi widely distributed in soil, plants as well as in different organic substrates and are also considered as opportunistic human pathogens. The identification of specific enzymes essential to the metabolism of these fungi is expected to provide molecular targets to control the diseases they induce in their hosts. Through applications of traditional techniques of sequence homology comparison by similarity search and Markov modeling, this report describes the characterization of enzymatic functionalities associated with protein targets that could be considered for the control of root rots induced by Fusarium oxysporum. From the analysis of 318 F. graminearum enzymes, we retrieved 30 enzymes that are specific to F. oxysporum compared to 15 species of host plants. By comparing these 30 specific enzymes of F. oxysporum with the genomes of Arabidopsis thaliana, Brassica rapa, Glycine max, Jatropha curcas and Ricinus communis, we found 7 key specific enzymes whose inhibition is expected to affect significantly the development of the fungus and 5 specific enzymes that were considered here to be secondary because they are inserted in pathways with alternative routes. Key Words: Arabidopsis; Fusariosis; Molecular targets; Oilseed plants; Specific enzymes
15	Validation of a network-based strategy for the optimization of combinatorial target selection in breast cancer therapy: siRNA knockdown of network targets in MDA-MB-231 cells as an in vitro model for inhibition of tumor development. Oncotarget 2018;7:63189-63203. [PMID: 27527857 PMCID: PMC5325356 DOI: 10.18632/oncotarget.11055] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2016] [Accepted: 07/10/2016] [Indexed: 12/14/2022] Open Abstract Network-based strategies provided by systems biology are attractive tools for cancer therapy. Modulation of cancer networks by anticancer drugs may alter the response of malignant cells and/or drive network re-organization into the inhibition of cancer progression. Previously, using systems biology approach and cancer signaling networks, we identified top-5 highly expressed and connected proteins (HSP90AB1, CSNK2B, TK1, YWHAB and VIM) in the invasive MDA-MB-231 breast cancer cell line. Here, we have knocked down the expression of these proteins, individually or together using siRNAs. The transfected cell lines were assessed for in vitro cell growth, colony formation, migration and invasion relative to control transfected MDA-MB-231, the non-invasive MCF-7 breast carcinoma cell line and the non-tumoral mammary epithelial cell line MCF-10A. The knockdown of the top-5 upregulated connectivity hubs successfully inhibited the in vitro proliferation, colony formation, anchorage independence, migration and invasion in MDA-MB-231 cells; with minimal effects in the control transfected MDA-MB-231 cells or MCF-7 and MCF-10A cells. 
The in vitro validation of bioinformatics predictions regarding optimized multi-target selection for therapy suggests that protein expression levels together with protein-protein interaction network analysis may provide an optimized combinatorial target selection for a highly effective anti-metastatic precision therapy in triple-negative breast cancer. This approach increases the ability to identify not only druggable hubs as essential targets for cancer survival, but also interactions most susceptible to synergistic drug action. The data provided in this report constitute a preliminary step toward the personalized clinical application of our strategy to optimize the therapeutic use of anti-cancer drugs. Key Words: cancer therapy; network-based strategy; siRNA therapy; triple-negative breast cancer (TNBC)
16	Abstract B43: Validation of a network-based strategy for the optimization of combinatorial target selection in breast cancer therapy. Clin Cancer Res 2018. [DOI: 10.1158/1557-3265.tcm17-b43] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Abstract Abstract Network-based strategies provided by systems biology are attractive tools for cancer therapy. Modulation of cancer networks by anticancer drugs may alter the response of malignant cells and/or drive network reorganization into the inhibition of cancer progression. Previously, using systems biology approach and cancer signaling networks, we identified the top 5 highly expressed and connected proteins (HSP90AB1, CSNK2B, TK1, YWHAB, and VIM) in the invasive MDA-MB-231 breast cancer cell line. Here, we have knocked down the expression of these proteins, individually or together using siRNAs. The transfected cell lines were assessed for in vitro cell growth, colony formation, migration, and invasion relative to control transfected MDA-MB-231, the noninvasive MCF-7 breast carcinoma cell line, and the nontumoral mammary epithelial cell line MCF-10A. The knockdown of the top-5 upregulated connectivity hubs successfully inhibited the in vitro proliferation, colony formation, anchorage independence, migration, and invasion in MDA-MB-231 cells, with minimal effects in the control transfected MDA-MB-231 cells or MCF-7 and MCF-10A cells. The in vitro validation of bioinformatics predictions regarding optimized multitarget selection for therapy suggests that protein expression levels together with protein-protein interaction network analysis may provide an optimized combinatorial target selection for a highly effective antimetastatic precision therapy in triple-negative breast cancer. 
This approach increases the ability to identify not only druggable hubs as essential targets for cancer survival, but also interactions most susceptible to synergistic drug action. The data provided in this report constitute a preliminary step toward the personalized clinical application of our strategy to optimize the therapeutic use of anticancer drugs. Citation Format: Tatiana Martins Tilli, Nicolas Carels, Jack Adam Tuszynski, Manijeh Pasdar. Validation of a network-based strategy for the optimization of combinatorial target selection in breast cancer therapy [abstract]. In: Proceedings of the AACR International Conference held in cooperation with the Latin American Cooperative Oncology Group (LACOG) on Translational Cancer Medicine; May 4-6, 2017; São Paulo, Brazil. Philadelphia (PA): AACR; Clin Cancer Res 2018;24(1_Suppl):Abstract nr B43.
17	A Metagenomic Analysis of Bacterial Microbiota in the Digestive Tract of Triatomines. Bioinform Biol Insights 2017;11:1177932217733422. [PMID: 28989277 PMCID: PMC5624349 DOI: 10.1177/1177932217733422] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2016] [Accepted: 04/10/2017] [Indexed: 12/04/2022] Open Abstract The digestive tract of triatomines (DTT) is an ecological niche favored by microbiota whose enzymatic profile is adapted to the specific substrate availability in this medium. This report describes the molecular enzymatic properties that promote bacterial prominence in the DTT. The microbiota composition was assessed previously based on 16S ribosomal DNA, and whole sequenced genomes of bacteria from the same genera were used to calculate the GC level of rare and prominent bacterial species in the DTT. The enzymatic reactions encoded by coding sequences of both rare and common bacterial species were then compared and revealed key functions explaining why some genera outcompete others in the DTT. Representativeness of DTT microbiota was investigated by shotgun sequencing of DNA extracted from bacteria grown in liquid Luria-Bertani broth (LB) medium. Results showed that GC-rich bacteria outcompete GC-poor bacteria and are the dominant components of the DTT microbiota. In addition, oxidoreductases are the main enzymatic components of these bacteria. In particular, nitrate reductases (anaerobic respiration), oxygenases (catabolism of complex substrates), acetate-CoA ligase (tricarboxylic acid cycle and energy metabolism), and kinase (signaling pathway) were the major enzymatic determinants present together with a large group of minor enzymes including hydrogenases involved in energy and amino acid metabolism. 
In conclusion, despite their slower growth in liquid LB medium, bacteria from GC-rich genera outcompete the GC-poor bacteria because their specific enzymatic abilities impart a selective advantage in the DTT. Key Words: EC number; GC content; ecological niche; gene number; genome size; midgut
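The GC level that entry 17 uses to contrast GC-rich and GC-poor bacteria can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the 0.5 cut-off used to label a sequence are hypothetical choices made only for this example.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / total


def label_genome(sequence: str, threshold: float = 0.5) -> str:
    """Label a sequence as GC-rich or GC-poor.

    The threshold is an assumption for illustration, not a value
    taken from the article.
    """
    return "GC-rich" if gc_content(sequence) >= threshold else "GC-poor"
```

Ambiguous symbols such as N are ignored, so the fraction is computed over called bases only.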
18	A Computational Methodology to Overcome the Challenges Associated With the Search for Specific Enzyme Targets to Develop Drugs Against Leishmania major. Bioinform Biol Insights 2017. [PMID: 28638238 PMCID: PMC5470852 DOI: 10.1177/1177932217712471] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open Abstract We present an approach for detecting enzymes that are specific to Leishmania major compared with Homo sapiens and provide targets that may assist research in drug development. This approach is based on traditional techniques of sequence homology comparison by similarity search and Markov modeling; it integrates the characterization of enzymatic functionality, secondary and tertiary protein structures, protein domain architecture, and metabolic environment. From 67 enzymes represented by 42 enzymatic activities classified by AnEnPi (Analogous Enzymes Pipeline) as specific for L major compared with H sapiens, only 40 (23 Enzyme Commission [EC] numbers) could actually be considered as strictly specific to L major and 27 enzymes (19 EC numbers) were disregarded for having ambiguous homologies or analogies with H sapiens. Among the 40 strictly specific enzymes, we identified sterol 24-C-methyltransferase, pyruvate phosphate dikinase, trypanothione synthetase, and RNA-editing ligase as 4 essential enzymes for L major that may serve as targets for drug development. Key Words: AnEnPi; Leishmaniasis; genomics; metabolism; sequence homology; specific enzymes
19	A strategy to identify housekeeping genes suitable for analysis in breast cancer diseases. BMC Genomics 2016;17:639. [PMID: 27526934 PMCID: PMC4986254 DOI: 10.1186/s12864-016-2946-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2015] [Accepted: 07/18/2016] [Indexed: 02/03/2023] Open Abstract BACKGROUND The selection of suitable internal control genes is crucial for proper interpretation of real-time PCR data. Here we outline a strategy to identify housekeeping genes that could serve as suitable internal control for comparative analyses of gene expression data in breast cancer cell lines and tissues obtained by high throughput sequencing and quantitative real-time PCR (qRT-PCR). METHODS The strategy proposed includes the large-scale screening of potential candidate reference genes from RNA-seq data as well as their validation by qRT-PCR, and careful examination of reference data from the International Cancer Genome Consortium, The Cancer Genome Atlas and Gene Expression Omnibus repositories. RESULTS The identified set of reference genes, also called novel housekeeping genes that includes CCSER2, SYMPK, ANKRD17 and PUM1, proved to be less variable and thus potentially more accurate for research and clinical analyses of breast cell lines and tissue samples compared to the traditional housekeeping genes used to this end. DISCUSSION These results highlight the importance of a massive evaluation of housekeeping genes for their relevance as internal control for optimized intra- and inter-assay comparison of gene expression. CONCLUSION We developed a strategy to identify and evaluate the significance of housekeeping genes as internal control for the intra- and inter-assay comparison of gene expression in breast cancer that could be applied to other tumor types and diseases. 
20	Toward precision medicine of breast cancer. Theor Biol Med Model 2016;13:7. [PMID: 26925829 PMCID: PMC4772532 DOI: 10.1186/s12976-016-0035-4] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2015] [Accepted: 02/15/2016] [Indexed: 12/17/2022] Open Abstract In this review, we report on breast cancer's molecular features and on how high throughput technologies are helping in understanding the dynamics of tumorigenesis and cancer progression with the aim of developing precision medicine methods. We first address the current state of the art in breast cancer therapies and challenges in order to progress towards its cure. Then, we show how the interaction of high-throughput technologies with in silico modeling has led to set up useful inferences for promising strategies of target-specific therapies with low secondary effect incidence for patients. Finally, we discuss the challenge of pharmacogenetics in the clinical practice of cancer therapy. All these issues are explored within the context of precision medicine.
21	Characterization of the microbiota in the guts of Triatoma brasiliensis and Triatoma pseudomaculata infected by Trypanosoma cruzi in natural conditions using culture independent methods. Parasit Vectors 2015;8:245. [PMID: 25903360 PMCID: PMC4429471 DOI: 10.1186/s13071-015-0836-z] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2015] [Accepted: 03/31/2015] [Indexed: 01/31/2023] Open Abstract Background Chagas disease is caused by Trypanosoma cruzi, which is transmitted by triatomine vectors. The northeastern region of Brazil is endemic for Chagas disease and has the largest diversity of triatomine species. T. cruzi development in its triatomine vector depends on diverse factors, including the composition of bacterial gut microbiota. Methods We characterized the triatomines captured in the municipality of Russas (Ceará) by sequencing the cytochrome c oxidase subunit I (COI) gene. The composition of the bacterial community in the gut of peridomestic Triatoma brasiliensis and Triatoma pseudomaculata was investigated using culture independent methods based on the amplification of the 16S rRNA gene by polymerase chain reaction (PCR), denaturing gradient gel electrophoresis (DGGE), DNA fragment cloning, Sanger sequencing and 454 pyrosequencing. Additionally, we identified TcI and TcII types of T. cruzi by sequencing amplicons from the gut metagenomic DNA with primers for the mini-exon gene. Results Triatomines collected in the peridomestic ecotopes were diagnosed as T. pseudomaculata and T. brasiliensis by comparing their COI sequence with GenBank. The rate of infection by T. cruzi in adult triatomines reached 80% for T. pseudomaculata and 90% for T. brasiliensis. According to the DNA sequences from the DGGE bands, the triatomine gut microbiota was primarily composed of Proteobacteria and Actinobacteria. 
However, Firmicutes and Bacteroidetes were also detected, although in much lower proportions. Serratia was the main genus, as it was encountered in all samples analyzed by DGGE and 454 pyrosequencing. Members of Corynebacterinae, a suborder of the Actinomycetales, formed the next most important group. The cloning and sequencing of full-length 16S rRNA genes confirmed the presence of Serratia marcescens, Dietzia sp., Gordonia terrae, Corynebacterium stationis and Corynebacterium glutamicum. Conclusions The study of the bacterial microbiota in the triatomine gut has gained increased attention because of the possible role it may play in the epidemiology of Chagas disease by competing with T. cruzi. Culture independent methods have shown that the bacterial composition of the microbiota in the guts of peridomestic triatomines is made up of only a few bacterial species.
22	An Interpretation of the Ancestral Codon from Miller's Amino Acids and Nucleotide Correlations in Modern Coding Sequences. Bioinform Biol Insights 2015;9:37-47. [PMID: 25922573 PMCID: PMC4401237 DOI: 10.4137/bbi.s24021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2015] [Revised: 03/08/2015] [Accepted: 03/13/2015] [Indexed: 12/31/2022] Open Abstract Purine bias, which is usually referred to as an “ancestral codon”, is known to result in short-range correlations between nucleotides in coding sequences, and it is common in all species. We demonstrate that RWY is a more appropriate pattern than the classical RNY, and purine bias (Rrr) is the product of a network of nucleotide compensations induced by functional constraints on the physicochemical properties of proteins. Through deductions from universal correlation properties, we also demonstrate that amino acids from Miller’s spark discharge experiment are compatible with functional primeval proteins at the dawn of living cell radiation on earth. These amino acids match the hydropathy and secondary structures of modern proteins. Key Words: ancestral codon; genomics; protein features; purine bias; short-range correlations
23	Editorial: Sustainable production of renewable energy from non-food crops. Biotechnol J 2015;10:503-4. [PMID: 25847435 DOI: 10.1002/biot.201500100] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Abstract Since the world faced the petroleum crisis in the 1970s and people started to realize the limitation of fossil energy resources coupled with concerns over the effects of increasing carbon dioxide in the atmosphere, major efforts were devoted to the search for alternative energy sources.
24	Perennial plants for biofuel production: bridging genomics and field research. Biotechnol J 2014;10:505-7. [PMID: 25382800 DOI: 10.1002/biot.201400201] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Abstract Development of dedicated perennial crops has been indicated as a strategic action to meet the growing demand for biofuels. Breeding of perennial crops, however, is often time- and resource-consuming. As genomics offers a platform from which to learn more about the relationships of genes and phenotypes, its operational use in the context of breeding programs through strategies such as genomic selection promises to foster the development of perennial crops dedicated to biodiesel production by increasing the efficiency of breeding programs and by shortening the length of the breeding cycles.
25	The Purine Bias of Coding Sequences is Determined by Physicochemical Constraints on Proteins. Bioinform Biol Insights 2014;8:93-108. [PMID: 24899802 PMCID: PMC4039185 DOI: 10.4137/bbi.s13161] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2013] [Revised: 11/24/2013] [Accepted: 11/24/2013] [Indexed: 01/02/2023] Open Abstract For this report, we analyzed protein secondary structures in relation to the statistics of three nucleotide codon positions. The purpose of this investigation was to find which properties of the ribosome, tRNA or protein level, could explain the purine bias (Rrr) as it is observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on ribosomal machinery. The physicochemical constraints on proteins mainly come from the hydropathy and molecular weight (MW) of secondary structures as well as the energy cost of amino acid synthesis. 
These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which is in favor of a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of secondary protein structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional constraints on proteins. Key Words: RNY; ancestral codon; energy cost; genomics; helix; purine bias; ribosome; secondary structure; sheet; translation; turn coil
26	A Statistical Method without Training Step for the Classification of Coding Frame in Transcriptome Sequences. Bioinform Biol Insights 2013;7:35-54. [PMID: 23400232 PMCID: PMC3561939 DOI: 10.4137/bbi.s10053] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open Abstract In this study, we investigated the modalities of coding open reading frame (cORF) classification of expressed sequence tags (EST) by using the universal feature method (UFM). The UFM algorithm is based on the scoring of purine bias (Rrr) and stop codon frequencies. UFM classifies ORFs as coding or non-coding through a score based on 5 factors: (i) stop codon frequency; (ii) the product of the probabilities of purines occurring in the three positions of nucleotide triplets; (iii) the product of the probabilities of Cytosine (C), Guanine (G), and Adenine (A) occurring in the 1st, 2nd, and 3rd positions of triplets, respectively; (iv) the probabilities of a G occurring in the 1st and 2nd positions of triplets; and (v) the probabilities of a T occurring in the 1st and an A in the 2nd position of triplets. Because UFM is based on primary determinants of coding sequences that are conserved throughout the biosphere, it is suitable for cORF classification of any sequence in eukaryote transcriptomes without prior knowledge. Considering the protein sequences of the Protein Data Bank (RCSB PDB or more simply PDB) as a reference, we found that UFM classifies cORFs of ≥200 bp (if the coding strand is known) and cORFs of ≥300 bp (if the coding strand is unknown), and releases them in their coding strand and coding frame, which allows their automatic translation into protein sequences with a success rate equal to or higher than 95%. 
We first established the statistical parameters of UFM using ESTs from Plasmodium falciparum, Arabidopsis thaliana, Oryza sativa, Zea mays, Drosophila melanogaster, Homo sapiens and Chlamydomonas reinhardtii in reference to the protein sequences of PDB. Second, we showed that the success rate of cORF classification using UFM is expected to apply to approximately 95% of higher eukaryote genes that encode for proteins. Third, we used UFM in combination with CAP3 to assemble large EST samples into cORFs that we used to analyze transcriptome phenotypes in rice, maize, and humans. We discuss the error rate and the interference of noisy sequences such as pseudogenes, transposons, and retrotransposons. This method is suitable for rapid cORF extraction from transcriptome data and allows correct description of the genome phenotypes of plant genomes without prior knowledge. Additional care is necessary when addressing the human transcriptome due to the interference caused by large amounts of noisy sequences. UFM can be regarded as a low complexity tool for prior knowledge extraction concerning the coding fraction of the transcriptome of any eukaryote. Due to its low level of complexity, UFM is also very robust to variations of codon usage. Key Words: CDS; EST; ORF; RNY; UFM; classification; genomics
27	Cultivation-independent methods reveal differences among bacterial gut microbiota in triatomine vectors of Chagas disease. PLoS Negl Trop Dis 2012;6:e1631. [PMID: 22563511 PMCID: PMC3341335 DOI: 10.1371/journal.pntd.0001631] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2011] [Accepted: 03/17/2012] [Indexed: 11/18/2022] Open Abstract Background Chagas disease is a trypanosomiasis whose agent is the protozoan parasite Trypanosoma cruzi, which is transmitted to humans by hematophagous bugs known as triatomines. Even though insecticide treatments allow effective control of these bugs in most Latin American countries where Chagas disease is endemic, the disease still affects a large proportion of the population of South America. The features of the disease in humans have been extensively studied, and the genome of the parasite has been sequenced, but no effective drug is yet available to treat Chagas disease. The digestive tract of the insect vectors in which T. cruzi develops has been much less well investigated than blood from its human hosts and constitutes a dynamic environment with very different conditions. Thus, we investigated the composition of the predominant bacterial species of the microbiota in insect vectors from Rhodnius, Triatoma, Panstrongylus and Dipetalogaster genera. Methodology/Principal Findings Microbiota of triatomine guts were investigated using cultivation-independent methods, i.e., phylogenetic analysis of 16s rDNA using denaturing gradient gel electrophoresis (DGGE) and cloned-based sequencing. The Chao index showed that the diversity of bacterial species in triatomine guts is low, comprising fewer than 20 predominant species, and that these species vary between insect species. 
The analyses showed that Serratia predominates in Rhodnius, Arsenophonus predominates in Triatoma and Panstrongylus, while Candidatus Rohrkolberia predominates in Dipetalogaster. Conclusions/Significance The microbiota of triatomine guts represents one of the factors that may interfere with T. cruzi transmission and virulence in humans. The knowledge of its composition according to insect species is important for designing measures of biological control for T. cruzi. We found that the predominant species of the bacterial microbiota in triatomines form a group of low complexity whose structure differs according to the vector genus. Chagas disease is one of the most important endemic diseases of South and Central America. Its causative agent is the protozoan Trypanosoma cruzi, which is transmitted to humans by blood-feeding insects known as triatomine bugs. These vectors mainly belong to Rhodnius, Triatoma and Panstrongylus genera of Reduviidae. The bacterial communities in the guts of these vectors may have important effects on the biology of T. cruzi. For this reason, we analyzed the bacterial diversity hosted in the gut of different species of triatomines using cultivation-independent methods. Among Rhodnius sp., we observed similar bacterial communities from specimens obtained from insectaries or sylvatic conditions. Endosymbionts of the Arsenophonus genus were preferentially associated with insects of the Panstrongylus and Triatoma genera, whereas the bacterial genus Serratia and Candidatus Rohrkolberia were typical of Rhodnius and Dipetalogaster, respectively. The diversity of the microbiota tended to be the largest in the Triatoma genus, with species of both Arsenophonus and Serratia being detected in T. infestans. 
MESH Headings: Animals; Bacteria/classification; Bacteria/genetics; Biodiversity; Cluster Analysis; DNA, Bacterial/chemistry; DNA, Bacterial/genetics; DNA, Ribosomal/chemistry; DNA, Ribosomal/genetics; Disease Vectors; Female; Gastrointestinal Tract/microbiology; Humans; Male; Molecular Sequence Data; Phylogeny; RNA, Ribosomal, 16S/genetics; Sequence Analysis, DNA; South America; Triatominae/microbiology
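Entry 27 reports that the Chao index indicated low species diversity in triatomine guts. The abstract does not say which variant of the estimator was used, so the following Python sketch assumes the common bias-corrected Chao1 form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of species seen exactly once and exactly twice. The function name and the input layout (one abundance count per species) are illustrative assumptions.

```python
def chao1(abundances):
    """Bias-corrected Chao1 richness estimate.

    abundances: iterable of per-species counts (for example, clone counts
    per 16S phylotype). Zero counts are ignored.
    """
    counts = [n for n in abundances if n > 0]
    s_obs = len(counts)                        # observed species
    f1 = sum(1 for n in counts if n == 1)      # singletons
    f2 = sum(1 for n in counts if n == 2)      # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

An estimate close to the observed species count suggests that sampling has captured most of the predominant species; many singletons push the estimate upward.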
28	ESTs from Seeds to Assist the Selective Breeding of Jatropha curcas L. for Oil and Active Compounds. GENOMICS INSIGHTS 2010. [PMID: 26217103 PMCID: PMC4510598 DOI: 10.4137/gei.s4340] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Abstract We report here on the characterization of a cDNA library from seeds of Jatropha curcas L. at three stages of fruit maturation before yellowing. We sequenced a total of 2200 clones and obtained a set of 931 non-redundant sequences (unigenes) after trimming and quality control, i.e., 140 contigs and 791 singlets with PHRED quality ≥10. We found low levels of sequence redundancy and extensive metabolic coverage by homology comparison to GO. After comparison of 5841 non-redundant ESTs from a total of 13193 reads from GenBank with KEGG, we identified tags with nucleotide variations among J. curcas accessions for genes of fatty acid, terpene, alkaloid, quinone and hormone pathways of biosynthesis. More specifically, the expression level of four genes (palmitoyl-acyl carrier protein thioesterase, 3-ketoacyl-CoA thiolase B, lysophosphatidic acid acyltransferase and geranyl pyrophosphate synthase) measured by real-time PCR proved to be significantly different between leaves and fruits. Since the nucleotide polymorphism of these tags is associated with a higher level of gene expression in fruits compared to leaves, we propose this approach to speed up the search for quantitative traits in selective breeding of J. curcas. We also discuss its potential utility for the selective breeding of economically important traits in J. curcas. Key Words: Jatropha curcas; alkaloids; biofuel; fatty acids; genomics; terpenes
29	Classifying coding DNA with nucleotide statistics. Bioinform Biol Insights 2009;3:141-54. [PMID: 20140062 PMCID: PMC2808172 DOI: 10.4137/bbi.s3030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022] Abstract: In this report, we compared the success rate of classification of coding sequences (CDS) vs. introns by Codon Structure Factor (CSF) and by a method that we called Universal Feature Method (UFM). UFM is based on the scoring of purine bias (Rrr) and stop codon frequency. We show that the success rate of CDS/intron classification by UFM is higher than by CSF. UFM classifies ORFs as coding or non-coding through a score based on (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of Cytosine (C), Guanine (G), and Adenine (A) probabilities in the 1st, 2nd, and 3rd positions of triplets, respectively, (iv) the probabilities of G in 1st and 2nd position of triplets and (v) the distance of their GC3 vs. GC2 levels to the regression line of the universal correlation. More than 80% of CDSs (true positives) of Homo sapiens (>250 bp), Drosophila melanogaster (>250 bp) and Arabidopsis thaliana (>200 bp) are successfully classified with a false positive rate lower than or equal to 5%. The method releases coding sequences in their coding strand and coding frame, which allows their automatic translation into protein sequences with 95% confidence. The method is a natural consequence of the compositional bias of nucleotides in coding sequences. Key Words: ancestral codon; coding features; genomics; open reading frame; purines bias; universal correlation.
30	Universal Features for the Classification of Coding and Non-coding DNA Sequences. Bioinform Biol Insights 2009;3:37-49. [PMID: 20140069 PMCID: PMC2808180 DOI: 10.4137/bbi.s2236] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022] Abstract: In this report, we revisited simple features that allow the classification of coding sequences (CDS) from non-coding DNA. The spectrum of codon usage of our sequence sample is large and suggests that these features are universal. The features that we investigated combine (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of Cytosine, Guanine, Adenine probabilities in 1st, 2nd, 3rd position of triplets, respectively, (iv) the product of G and C probabilities in 1st and 2nd position of triplets. These features are a natural consequence of the physico-chemical properties of proteins and their combination is successful in classifying CDS and non-coding DNA (introns) with a success rate >95% above 350 bp. The coding strand and coding frame are implicitly deduced when the sequences are classified as coding. Key Words: ancestral codon; coding features; exon prediction; genomics; open reading frame; purine bias.
31	Single nucleotide polymorphisms from Theobroma cacao expressed sequence tags associated with witches’ broom disease in cacao. GENETICS AND MOLECULAR RESEARCH 2009;8:799-808. [DOI: 10.4238/vol8-3gmr603] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/03/2022]
32	Comparative analysis of expressed genes from cacao meristems infected by Moniliophthora perniciosa. ANNALS OF BOTANY 2007;100:129-40. [PMID: 17557832 PMCID: PMC2735303 DOI: 10.1093/aob/mcm092] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 05/03/2023] Abstract: BACKGROUND AND AIMS: Witches' broom disease is caused by the hemibiotrophic basidiomycete Moniliophthora perniciosa, and is one of the most important diseases of cacao in the western hemisphere. Because very little is known about the global process of such disease development, expressed sequence tags (ESTs) were used to identify genes expressed during the Theobroma cacao-Moniliophthora perniciosa interaction. METHODS: Two cDNA libraries corresponding to the resistant (RT) and susceptible (SP) cacao-M. perniciosa interactions were constructed from total RNA, using the DB SMART Creator cDNA library kit (Clontech). Clones were randomly selected, sequenced from the 5' end and analysed using bioinformatics tools including in silico analysis of the differential gene expression. KEY RESULTS: A total of 6884 ESTs were generated from the RT and SP cDNA libraries. These ESTs were composed of 2585 singlets and 341 contigs for a total of 2926 non-redundant sequences. The redundancy of the libraries was low and their specificity high when compared with the few other cacao libraries already published. Sequence analysis allowed the assignment of a putative functional category for 54% of sequences, whereas approx. 22% of sequences corresponded to unknown function and approx. 24% of sequences did not show any significant similarity with other proteins present in the database. Despite the similar overall distribution of the sequences in functional categories between the two libraries, qualitative differences were observed.
Genes involved during the defence response to pathogen infection or in programmed cell death were identified, such as pathogenesis related-proteins, trypsin inhibitor or oxalate oxidase, and some of them showed an in silico differential expression between the resistant and the susceptible interactions. CONCLUSIONS: As far as is known this is the first EST resource from the cacao-M. perniciosa interaction and it is believed that it will provide a significant contribution to the understanding of the molecular mechanisms of the resistance and susceptibility of cacao to M. perniciosa, to develop strategies to control witches' broom, and as a source of polymorphism for molecular marker development and marker-assisted selection. Key Words: Theobroma cacao; Moniliophthora perniciosa; ESTs; resistance; programmed cell death; witches' broom disease. MESH Headings: Agaricales/physiology; Cacao/genetics; Cacao/metabolism; Cacao/microbiology; Computational Biology; Expressed Sequence Tags; Gene Expression Profiling; Gene Library; Immunity, Innate/genetics; Meristem/genetics; Meristem/metabolism; Meristem/microbiology; Plant Diseases/genetics; Sequence Analysis, DNA.
33	The maize gene space is compositionally compartimentalized. FEBS Lett 2005;579:3867-71. [PMID: 15996663 DOI: 10.1016/j.febslet.2005.05.063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/08/2005] [Accepted: 05/13/2005] [Indexed: 11/18/2022] Abstract: Previous investigations by Southern hybridization of cDNA with compositional DNA fractions showed that the majority of maize genes are located in a narrow GC range of DNA fragments and that the corresponding gene space was GC-richer than the region of the genome where zein genes are found. Here, we revisited the maize gene space using new data from the maize genome sequencing initiative. We found that the maize gene space itself is formed of two compositional compartments, i.e., a GC-poor and a GC-rich, characterized by a different distribution of Opie and Huck retrotransposons. The GC-rich compartment tends to be richer in GC-rich genes than the GC-poor compartment. However, the gene space compartimentalization of maize is much simpler than that of human. MESH Headings: Blotting, Southern; Computational Biology/methods; DNA/genetics; DNA Transposable Elements/genetics; DNA, Complementary/metabolism; Databases, Genetic; Genome, Plant; Models, Genetic; Software; Zea mays/genetics; Zein/genetics.
34	The pig genome: compositional analysis and identification of the gene-richest regions in chromosomes and nuclei. Gene 2004;343:245-51. [PMID: 15588579 DOI: 10.1016/j.gene.2004.09.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Received: 06/27/2004] [Revised: 09/07/2004] [Accepted: 09/17/2004] [Indexed: 10/26/2022] Abstract: The isochore organization of the mammalian genome comprises a general pattern and some special patterns, the former being characterized by a wider compositional distribution of the DNA fragments. The large majority of the mammalian genomes belong to the former, and only some groups, such as the Myomorpha sub-order of Rodentia, belong to the latter. Here we describe the compositional organization of the pig (Sus scrofa) genome that belongs to the general mammalian pattern. We investigated (i) the compositional distribution of the genes by analysis of their GC3 levels (the GC levels at the third codon positions), and (ii) the correlation between the GC3 value of orthologous genes from pig and other vertebrates (human, calf, mouse, chicken, and Xenopus). As expected, the highest gene concentration corresponded to the H3 isochore family, and the highest GC3 correlations were observed in the pig/human and pig/calf comparisons. Then we identified, by in situ hybridization of the GC-richest H3 isochores, the pig chromosomal regions endowed by the highest gene-density that largely corresponded to the telomeric chromosomal bands. Moreover, we observed that these gene-rich bands are syntenic with the previously identified GC-richest/gene richest H3+ bands of the human chromosomes. At the cell nucleus level, we observed that the gene-dense region corresponded to the more internal compartment, as previously found in human and avian cell nuclei.
MESH Headings: Animals; Base Composition; Cattle; Cell Nucleus/genetics; Chickens; Chromosome Banding; Chromosomes, Mammalian; GC Rich Sequence; Genome; Humans; In Situ Hybridization; Isochores/genetics; Karyotyping; Mice; Swine/genetics; Synteny; Xenopus.
35	The mutual information theory for the certification of rice coding sequences. FEBS Lett 2004;568:155-8. [PMID: 15196938 DOI: 10.1016/j.febslet.2004.05.026] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/07/2004] [Accepted: 05/13/2004] [Indexed: 11/16/2022] Abstract: We report here the use of the mutual information theory for the certification of annotated rice coding sequences of both GenBank and TIGR databases. Considering coding sequences larger than 600 bp, we successfully screened out genes with aberrant compositional features. We found that they represent about 10% of both datasets after cleaning for gene redundancy. Most of the rejected accessions showed a different trend in GC3% vs GC2% plot compared to the set of accessions that have been published in international journals. This suggests the existence of a bias in the pattern recognition algorithms used by gene prediction programs. MESH Headings: Information Theory; Oryza/genetics.
36	Using analytical ultracentrifugation to study compositional variation in vertebrate genomes. EUROPEAN BIOPHYSICS JOURNAL: EBJ 2003;32:418-26. [PMID: 12684711 DOI: 10.1007/s00249-003-0294-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Received: 06/11/2002] [Revised: 12/19/2002] [Accepted: 01/28/2003] [Indexed: 10/26/2022] Abstract: Although much attention has recently been directed to analytical ultracentrifugation (AUC), the revival of interest has hardly addressed the applications of this technology in genome analysis, and the extent to which AUC studies can quickly and effectively complement modern sequence-based analyses of genomes, e.g. by anticipating, extending or checking results that can be obtained by cloning and sequencing. In particular, AUC yields a quick overview of the base compositional structure of a species' genome even if no DNA sequences are available and the species is unlikely to be sequenced in the near future. The link between AUC and DNA sequences dates back to 1959, when a precise linear relation was discovered between the GC (guanine+cytosine) level of DNA fragments and their buoyant density in CsCl as measured at sedimentation equilibrium. A 24-hour AUC run of a high molecular weight sample of a species' total DNA already yields the GC distribution of its genome. AUC methods based on this principle remain sensitive tools in the age of genomics, and can now be fine-tuned by comparing CsCl absorbance profiles with the corresponding sequence histograms. The CsCl profiles of vertebrates allow insight into structural and functional properties that correlate with base composition, and their changes during vertebrate evolution can be monitored by comparing CsCl profiles of different taxa. Such comparisons also allow consistency checks of phylogenetic hypotheses at different taxonomic levels.
We here discuss some of the information that can be deduced from CsCl profiles, with emphasis on mammalian DNAs. MESH Headings: Animals; Centrifugation, Density Gradient/methods; Cystine/analysis; Cystine/chemistry; DNA/analysis; DNA/chemistry; DNA Mutational Analysis/methods; Gene Expression Profiling/methods; Genetic Variation; Genome; Guanidine/analysis; Guanidine/chemistry; Humans; Mice; Models, Chemical; Molecular Weight; Phylogeny; Reproducibility of Results; Sensitivity and Specificity; Vertebrates.
37	Compositional mapping of chicken chromosomes and identification of the gene-richest regions. Chromosome Res 2002;9:521-32. [PMID: 11721951 DOI: 10.1023/a:1012436900788] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.4] [Indexed: 01/11/2023] Abstract: 'Compositional chromosomal mapping', namely the assessment of the GC level of chromosomal bands, led to the identification, in the human chromosomes, of the GC-richest H3+ bands and of the GC-poorest L1+ bands, which were so called on the basis of the isochore family predominantly present in the bands. The isochore organization of the avian genome is very similar to those of most mammals, the only difference being the presence of an additional, GC-richest, H4 isochore family. In contrast, the avian karyotypes are very different from those of mammals, being characterized, in most species, by few macrochromosomes and by a large number of microchromosomes. The 'compositional mapping' of chicken mitotic and meiotic chromosomes by in-situ hybridization of isochore families showed that the chicken GC-richest isochores are localized not only on a large number of microchromosomes but also on almost all telomeric bands of macrochromosomes. On the other hand, the GC-poorest isochores are generally localized on the internal regions of macrochromosomes and are almost absent in microchromosomes. Thus, the distinct localization of the GC-richest and the GC-poorest bands observed on human chromosomes appears to be a general feature of chromosomes from warm-blooded vertebrates. MESH Headings: Animals; Chickens/genetics; Chromosome Banding; Chromosome Mapping/veterinary.
38	Genome properties of the diatom Phaeodactylum tricornutum. PLANT PHYSIOLOGY 2002;129:993-1002. [PMID: 12114555 PMCID: PMC166495 DOI: 10.1104/pp.010713] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.6] [Received: 08/10/2001] [Revised: 01/28/2002] [Accepted: 03/28/2002] [Indexed: 05/20/2023] Abstract: Diatoms are a ubiquitous class of microalgae of extreme importance for global primary productivity and for the biogeochemical cycling of minerals such as silica. However, very little is known about diatom cell biology or about their genome structure. For diatom researchers to take advantage of genomics and post-genomics technologies, it is necessary to establish a model diatom species. Phaeodactylum tricornutum is an obvious candidate because of its ease of culture and because it can be genetically transformed. Therefore, we have examined its genome composition by the generation of approximately 1,000 expressed sequence tags. Although more than 60% of the sequences could not be unequivocally identified by similarity to sequences in the databases, approximately 20% had high similarity with a range of genes defined functionally at the protein level. It is interesting that many of these sequences are more similar to animal rather than plant counterparts. Base composition at each codon position and GC content of the genome were compared with Arabidopsis, maize (Zea mays), and Chlamydomonas reinhardtii. It was found that distribution of GC within the coding sequences is as homogeneous in P. tricornutum as in Arabidopsis, but with a slightly higher GC content. Furthermore, we present evidence that the P. tricornutum genome is likely to be small (less than 20 Mb). Therefore, this combined information supports the development of this species as a model system for molecular-based studies of diatom biology.
The nucleotide sequence data reported has been deposited in GenBank Nucleotide Sequence Database (dbEST section) under accession nos. BI306757 through BI307753. MESH Headings: Animals; Arabidopsis/genetics; Base Composition; Centrifugation, Density Gradient/methods; Chlamydomonas reinhardtii/genetics; DNA/analysis; DNA, Complementary/chemistry; DNA, Complementary/genetics; Diatoms/genetics; Evolution, Molecular; Expressed Sequence Tags; Gene Library; Genome; Molecular Sequence Data; Phytoplankton/genetics; Sequence Analysis, DNA; Zea mays/genetics.
39	Compositional heterogeneity within and among isochores in mammalian genomes. I. CsCl and sequence analyses. Gene 2001;276:15-24. [PMID: 11591467 DOI: 10.1016/s0378-1119(01)00667-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022] Abstract: GC level distributions of a species' nuclear genome, or of its compositional fractions, encode key information on structural and functional properties of the genome and on its evolution. They can be calculated either from absorbance profiles of the DNA in CsCl density gradients at sedimentation equilibrium, or by scanning long contigs of largely sequenced genomes. In the present study, we address the quantitative characterization of the compositional heterogeneity of genomes, as measured by the GC distributions of fixed-length fragments. Special attention is given to mammalian genomes, since their compartmentalization into isochores implies two levels of heterogeneity, intra-isochore (local) and inter-isochore (global). This partitioning is a natural one, since large-scale compositional properties vary much more among isochores than within them. Intra-isochore GC distributions become roughly Gaussian for long fragments, and their standard deviations decrease only slowly with increasing fragment length, unlike random sequences. This effect can be explained by 'long-range' correlations, often overlooked, that are present along isochores. MESH Headings: Animals; Base Composition; Centrifugation, Density Gradient; Cesium; Chlorides; DNA/chemistry; DNA/genetics; GC Rich Sequence/genetics; Genome; Humans.
40	Translational selection shapes codon usage in the GC-rich genome of Chlamydomonas reinhardtii. FEBS Lett 2001;501:127-30. [PMID: 11470270 DOI: 10.1016/s0014-5793(01)02644-8] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.0] [Indexed: 10/18/2022] Abstract: In unicellular species codon usage is determined by mutational biases and natural selection. Among prokaryotes, the influence of these factors is different if the genome is skewed towards AT or GC, since in AT-rich organisms translational selection is absent. On the other hand, in AT-rich unicellular eukaryotes the two factors are present. In order to understand if GC-rich genomes display a similar behavior, the case of Chlamydomonas reinhardtii was studied. Since we found that translational selection strongly influences codon usage in this species, we conclude that there is not a common pattern among unicellular organisms. MESH Headings: AT Rich Sequence/genetics; Animals; Chlamydomonas reinhardtii/genetics; Codon/genetics; GC Rich Sequence/genetics; Genes, Protozoan/genetics; Mutation; Translations.
41	Diversity and phylogenetic implications of CsCl profiles from rodent DNAs. Mol Phylogenet Evol 2000;17:219-30. [PMID: 11083936 DOI: 10.1006/mpev.2000.0838] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022] Abstract: Buoyant density profiles of high-molecular-weight DNAs sedimented in CsCl gradients, i.e., compositional distributions of 50- to 100-kb genomic fragments, have revealed a clear difference between the murids so far studied and most other mammals, including other rodents. Sequence analyses have revealed other, related, compositional differences between murids and nonmurids. In the present study, we obtained CsCl profiles of 17 rodent species representing 13 families. The modal buoyant densities obtained for rodents span the full range of values observed in other eutherians. More remarkably, the skewness (asymmetry, mean - modal buoyant density) of the rodent profiles extends to values well below those of other eutherians. Scatterplots of these and related CsCl profile parameters show groups of rodent families that agree largely with established rodent taxonomy, in particular with the monophyly of the Geomyoidea superfamily and the position of the Dipodidae family within the Myomorpha. In contrast, while confirming and extending previously reported differences between the profiles of Myomorpha and those of other rodents, the CsCl data question a traditional hypothesis positing Gliridae within Myomorpha, as does the recently sequenced mitochondrial genome of dormouse. Analysis of CsCl profiles is presented here as a rapid, robust method for exploring rodent and other vertebrate systematics.
MESH Headings: Animals; Base Composition; Centrifugation, Density Gradient; Cesium; Chlorides; DNA/chemistry; DNA/genetics; DNA, Satellite/chemistry; DNA, Satellite/genetics; Genetic Variation; Phylogeny; Rodentia/classification; Rodentia/genetics; Species Specificity.
42	The compositional organization and the expression of the Arabidopsis genome. FEBS Lett 2000;472:302-6. [PMID: 10788631 DOI: 10.1016/s0014-5793(00)01476-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022] Abstract: The base composition patterns of genes, coding sequences and gene expression levels were analyzed in the available long sequences (contigs) of Arabidopsis. Chromosome 5 was analyzed in detail and all chromosomes for which sequence data are now available show essentially the same large-scale compositional properties. Guanine+cytosine levels of genes and of their coding regions, as well as gene densities and expression levels, all show a marked tendency to be higher in the distal regions of Arabidopsis chromosomes. MESH Headings: Arabidopsis/genetics; Base Composition; Gene Expression; Genome, Plant.
43	Two classes of genes in plants. Genetics 2000;154:1819-25. [PMID: 10747072 PMCID: PMC1461008 DOI: 10.1093/genetics/154.4.1819] [Citation(s) in RCA: 141] [Impact Index Per Article: 5.9] [Indexed: 11/14/2022] Abstract: Two classes of genes were identified in three Gramineae (maize, rice, barley) and six dicots (Arabidopsis, soybean, pea, tobacco, tomato, potato). One class, the GC-rich class, contained genes with no, or few, short introns. In contrast, the GC-poor class contained genes with numerous, long introns. The similarity of the properties of each class, as present in the genomes of maize and Arabidopsis, is particularly remarkable in view of the fact that these plants exhibit large differences in genome size, average intron size, and DNA base composition. The functional relevance of the two classes of genes is stressed by (1) the conservation in homologous genes from maize and Arabidopsis not only of the number of introns and of their positions, but also of the relative size of concatenated introns; and (2) the existence of two similar classes of genes in vertebrates; interestingly, the differences in intron sizes and numbers in genes from the GC-poor and GC-rich classes are much more striking in plants than in vertebrates. MESH Headings: GC Rich Sequence; Genes, Plant; Plants/genetics; Species Specificity.
44	Synonymous and nonsynonymous substitutions in genes from Gramineae: intragenic correlations. J Mol Evol 1999;49:330-42. [PMID: 10473774 DOI: 10.1007/pl00006556] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022] Abstract: In this work, we have investigated the relationships between synonymous and nonsynonymous rates and base composition in coding sequences from Gramineae to analyze the factors underlying the variation in substitutional rates. We have shown that in these genes the rates of nucleotide divergence, both synonymous and nonsynonymous, are, to some extent, dependent on each other and on the base composition. In the first place, the variation in nonsynonymous rate is related to the GC level at the second codon position (the higher the GC(2) level, the higher the amino acid replacement rate). The correlation is especially strong with T(2), the coefficients being significant in the three data sets analyzed. This correlation between nonsynonymous rate and base composition at the second codon position is also detectable at the intragenic level, which implies that the factors that tend to increase the intergenic variance in nonsynonymous rates also affect the intragenic variance. On the other hand, we have shown that the synonymous rate is strongly correlated with the GC(3) level. This correlation is observed both across genes and at the intragenic level. Similarly, the nonsynonymous rate is also affected at the intragenic level by GC(3) level, like the silent rate. In fact, synonymous and nonsynonymous rates exhibit a parallel behavior in relation to GC(3) level, indicating that the intragenic patterns of both silent and amino acid divergence rates are influenced in a similar way by the intragenic variation of GC(3).
This result, taken together with the fact that the number of genes displaying intragenic correlation coefficients between synonymous and nonsynonymous rates is not very high, but higher than random expectation (in the three data sets analyzed), strongly suggests that the processes of silent and amino acid replacement divergence are, at least in part, driven by common evolutionary forces in genes from Gramineae. MESH Headings: Base Composition; Edible Grain/genetics; Enzymes/genetics; Evolution, Molecular; Genes, Plant; Genetic Variation; Hordeum/genetics; Oryza/genetics; Phylogeny; Plant Proteins/genetics; Triticum/genetics; Zea mays/genetics.
45	Compositional properties of homologous coding sequences from plants. J Mol Evol 1998;46:45-53. [PMID: 9419224 DOI: 10.1007/pl00006282] [Citation(s) in RCA: 48] [Impact Index Per Article: 1.8] [Indexed: 02/05/2023] Abstract: In this work, we investigated (1) the compositional distributions of all available nuclear coding sequences (and of their three codon positions) of six dicots and four Gramineae; this considerably expanded our knowledge about the differences previously seen between these two groups of plants; (2) the compositional correlations of homologous genes from dicots and from Gramineae, as well as from both groups; all correlations were characterized by very good coefficients, with slopes close to unity in the former two cases and very high in the last; (3) the compositional transition that accompanied the emergence of Gramineae from an ancestral monocot; (4) the compositional correlations between exons and introns, which were very good in Gramineae, but only poor to good in dicots; and (5) the compositional profiles of homologous genes from angiosperms, which were characterized by a series of peaks (exons) and valleys (introns) separated by 15-20% GC. The conservative and transitional modes of compositional evolution in plant genes and their general implications are discussed.
46	The distribution of genes in the genomes of Gramineae. Proc Natl Acad Sci U S A 1997;94:6857-61. [PMID: 9192656 PMCID: PMC21249 DOI: 10.1073/pnas.94.13.6857] [Citation(s) in RCA: 122] [Impact Index Per Article: 4.5] [Indexed: 02/04/2023] Abstract: Recent investigations showed that most maize genes are present in compositional fractions of nuclear DNA that cover only a 1-2% GC (molar fraction of guanosine plus cytosine in DNA) range and represent only 10-20% of the genome. These fractions, which correspond to compositional genome compartments that are distributed on all chromosomes, were collectively called the "gene space." Outside the gene space, the maize genome appears to contain no genes, except for some zein genes and for ribosomal genes. Here, we investigated the distribution of genes in the genomes of two other Gramineae, rice and barley, and used a new set of probes to study further the gene distribution of maize. We found that the distribution of genes in these three genomes is basically similar in that all genes, except for ribosomal genes and some storage protein genes, were located in gene spaces that (i) cover GC ranges of 0.8%, 1.0%, and 1.6% and represent 12%, 17%, and 24% of the genomes of barley, maize, and rice, respectively; (ii) are due to a remarkably uniform base composition in the sequences surrounding the genes, which are now known to consist mainly of transposons; (iii) have sizes approximately proportional to genome sizes, suggesting that expansion-contraction phenomena proceed in parallel in the gene space and in the gene-empty regions of the genome; and (iv) only hybridize on the gene spaces (and not on the other DNA fractions) of other Gramineae.
Key Words: chromosomes; isochores; plants. MESH Headings: Genes, Plant; Genome, Plant; Hordeum/genetics; Oryza/genetics.
47	The gene distribution of the maize genome. Proc Natl Acad Sci U S A 1995;92:11057-60. [PMID: 7479936 PMCID: PMC40570 DOI: 10.1073/pnas.92.24.11057] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.3] [Indexed: 01/25/2023] Abstract: Previous investigations from our laboratory showed that the genomes of plants, like those of vertebrates, are mosaics of isochores, i.e., of very long DNA segments that are compositionally homogeneous and that can be subdivided into a small number of families characterized by different GC levels (GC is the mole fraction of guanine+cytosine). Compositional DNA fractions corresponding to different isochore families were used to investigate, by hybridization with appropriate probes, the gene distribution in vertebrate genomes. Here we report such a study on the genome of a plant, maize. The gene distribution that we found is most striking, in that almost all genes are present in isochores covering an extremely narrow (1-2%) GC range and only representing 10-20% of the genome. This gene distribution, which seems to characterize other Gramineae as well, is remarkably different from the gene distribution previously found in vertebrate genomes. MESH Headings: Base Composition; DNA Transposable Elements; Genes, Plant; Zea mays/genetics.