126
Mora-Márquez F, Chano V, Vázquez-Poletti JL, López de Heredia U. TOA: A software package for automated functional annotation in non-model plant species. Mol Ecol Resour 2021; 21:621-636. [PMID: 33070442 DOI: 10.22541/au.159611047.70067764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/04/2020] [Revised: 10/01/2020] [Accepted: 10/13/2020] [Indexed: 05/19/2023]
Abstract
The increase in sequencing capacity provided by high-throughput platforms has made it possible to routinely obtain large sets of genomic and transcriptomic sequences from model and non-model organisms. Subsequent genomic analysis and gene discovery in next-generation sequencing experiments are, however, bottlenecked by functional annotation. One common way to functionally annotate sets of sequences obtained from next-generation sequencing experiments is to search for homologous sequences and retrieve the related functional information deposited in genomic databases. Functional annotation is especially challenging for non-model organisms, such as many plant species. In such cases, existing free and commercial general-purpose applications may not offer complete and accurate results. We present TOA (Taxonomy-oriented annotation), a user-friendly, open-source Python application designed to establish functional annotation pipelines geared towards non-model plant species. It runs on Linux/Mac computers, high-performance computing (HPC) clusters and cloud servers. TOA performs homology searches against proteins stored in the PLAZA databases, NCBI RefSeq Plant, the Nucleotide Database and the Non-Redundant Protein Sequence Database. It outputs functional information from several ontology systems: Gene Ontology, InterPro, EC, KEGG, Mapman and MetaCyc. Software performance was validated on several plant benchmark data sets by comparing TOA with other functional annotation solutions in terms of runtime, total number of annotated sequences and accuracy of the functional information. TOA outperformed the other software in both the number of annotated sequences and the accuracy of the annotation, and constitutes a good alternative for improving functional annotation in plants. TOA is especially recommended for gymnosperms or for low-quality sequence data sets of non-model plants.
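The homology-based strategy described above can be sketched as follows: search, then inherit the annotation of the best-matching database sequence. This is not TOA's code. The sketch assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns) and keeps one best hit per query by bit score. The e-value cutoff of 1e-5 and the `go_by_subject` mapping from database IDs to GO terms are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_tsv: str, max_evalue: float = 1e-5) -> dict[str, str]:
    """Map each query to the subject with the highest bit score.

    max_evalue is a hypothetical cutoff, not a value taken from TOA.
    """
    best: dict[str, tuple[float, str]] = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) > max_evalue:
            continue  # discard weak hits
        score = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or score > best[query][0]:
            best[query] = (score, rec["sseqid"])
    return {query: subject for query, (_, subject) in best.items()}


def transfer_go(hits: dict[str, str],
                go_by_subject: dict[str, set[str]]) -> dict[str, set[str]]:
    """Give each query a copy of its best hit's GO terms (empty if unknown)."""
    return {q: set(go_by_subject.get(s, set())) for q, s in hits.items()}


# Invented BLAST rows: c1 has two hits, c2 only a weak one.
blast = (
    "c1\tP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "c1\tP2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-40\t150\n"
    "c2\tP3\t30.0\t50\t35\t2\t1\t50\t1\t50\t0.5\t20\n"
)
hits = best_hits(blast)
print(hits)                                       # {'c1': 'P1'}
print(transfer_go(hits, {"P1": {"GO:0005515"}}))  # {'c1': {'GO:0005515'}}
```

Real pipelines usually weigh several hits, taxonomy and coverage rather than trusting a single best hit. The one-hit rule here only keeps the example short.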

127
Musilova J, Kourilova X, Bezdicek M, Lengerova M, Obruca S, Skutkova H, Sedlar K. First Complete Genome of the Thermophilic Polyhydroxyalkanoates-Producing Bacterium Schlegelella thermodepolymerans DSM 15344. Genome Biol Evol 2021; 13:6081016. [PMID: 33432323 PMCID: PMC8023429 DOI: 10.1093/gbe/evab007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/06/2021] [Indexed: 11/13/2022]
Abstract
Schlegelella thermodepolymerans is a moderately thermophilic bacterium capable of producing polyhydroxyalkanoates, biodegradable polymers that represent an alternative to conventional plastics. Here, we present the first complete genome of the type strain S. thermodepolymerans DSM 15344, assembled by a hybrid approach using both long (Oxford Nanopore) and short (Illumina) reads. The genome consists of a single 3,858,501-bp circular chromosome with a GC content of 70.3%. Genome annotation identified 3,650 genes in total, of which 3,598 open reading frames belonged to protein-coding genes. Functional annotation of the genome and division of genes into clusters of orthologous groups (COGs) revealed a relatively high number of genes (1,013) with unknown function or no assigned COG. This reflects how little is known about thermophilic polyhydroxyalkanoate-producing bacteria at the genome level. On the other hand, 270 genes involved in energy conversion and production were detected. This group covers genes involved in catabolic processes, which suggests that S. thermodepolymerans DSM 15344 can utilize and biotechnologically convert various substrates such as lignocellulose-based saccharides, glycerol, or lipids. Based on its genome, S. thermodepolymerans DSM 15344 is a metabolically versatile bacterium with great biotechnological potential.
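The GC content quoted above is the share of G and C among the sequenced bases. Below is a minimal sketch of that calculation. Excluding ambiguous symbols such as N from the denominator is an assumption made here; other tools may count them.

```python
def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T).

    Assumption: other symbols (e.g. N) are ignored entirely.
    """
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


print(round(gc_content("GGCCAT"), 2))  # 66.67
print(gc_content("atnngc"))            # 50.0
```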

128
Jin Y, Zhang Z, Xi Y, Yang Z, Xiao Z, Guan S, Qu J, Wang P, Zhao R. Identification and Functional Verification of Cold Tolerance Genes in Spring Maize Seedlings Based on a Genome-Wide Association Study and Quantitative Trait Locus Mapping. Front Plant Sci 2021; 12:776972. [PMID: 34956272 PMCID: PMC8696014 DOI: 10.3389/fpls.2021.776972] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/14/2021] [Accepted: 11/19/2021] [Indexed: 05/13/2023]
Abstract
Maize (Zea mays L.) is a tropical crop, and low temperature has become one of the main abiotic stresses affecting its growth and development. Jilin province, the main maize production area in China, often suffers from varying degrees of cold damage in spring, which seriously affects maize quality and yield. Global climate change and food security concerns make several goals urgent: discovering cold tolerance genes, developing cold tolerance molecular markers, and creating cold-tolerant germplasm. These are needed to improve maize resilience to such conditions and to increase overall yield. In this study, whole-genome sequencing and genotyping by sequencing were used to perform a genome-wide association study (GWAS) and quantitative trait locus (QTL) mapping in two populations, respectively. Overall, four single-nucleotide polymorphisms (SNPs) and 12 QTLs were significantly associated with cold tolerance. Joint analysis revealed an overlap between the GWAS and QTL mapping results on chromosome 3, where the gene Zm00001d002729 was identified as a potential cold tolerance factor. We verified the function of this target gene through overexpression, suppression of expression, and genetic transformation into maize. Overexpression of Zm00001d002729 improved cold tolerance. Identifying cold tolerance genes helps clarify the mechanism underlying this trait in maize. It also provides a foundation for adapting maize to colder environments in the future, helping to ensure food security.
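The abstract does not describe how the GWAS and QTL results were intersected. One simple interpretation is positional: report each significant SNP that falls inside a QTL interval on the same chromosome. The sketch below works under that assumption. The SNP IDs, coordinates and QTL names are invented.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A QTL interval on one chromosome (1-based, inclusive coordinates)."""
    chrom: str
    start: int
    end: int
    name: str


def snps_in_qtls(snps: list[tuple[str, str, int]],
                 qtls: list[Interval]) -> list[tuple[str, str]]:
    """Return (snp_id, qtl_name) for every SNP lying inside a QTL interval."""
    overlaps = []
    for snp_id, chrom, pos in snps:
        for qtl in qtls:
            if qtl.chrom == chrom and qtl.start <= pos <= qtl.end:
                overlaps.append((snp_id, qtl.name))
    return overlaps


# Invented example data: (snp_id, chromosome, position).
snps = [("snp1", "3", 150), ("snp2", "3", 500), ("snp3", "5", 150)]
qtls = [Interval("3", 100, 200, "qA"), Interval("5", 400, 600, "qB")]
print(snps_in_qtls(snps, qtls))  # [('snp1', 'qA')]
```

A nested loop is enough for a handful of significant SNPs and QTLs. Sorted sweeps or interval trees scale better for genome-wide sets.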

129
Lobb B, Tremblay BJM, Moreno-Hagelsieb G, Doxey AC. An assessment of genome annotation coverage across the bacterial tree of life. Microb Genom 2020; 6. [PMID: 32124724 PMCID: PMC7200070 DOI: 10.1099/mgen.0.000341] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022]
Abstract
Although gene-finding in bacterial genomes is relatively straightforward, the automated assignment of gene function is still challenging, resulting in a vast quantity of hypothetical sequences of unknown function. But how prevalent are hypothetical sequences across bacteria, what proportion of genes in different bacterial genomes remain unannotated, and what factors affect annotation completeness? To address these questions, we surveyed over 27 000 bacterial genomes from the Genome Taxonomy Database, and measured genome annotation completeness as a function of annotation method, taxonomy, genome size, 'research bias' and publication date. Our analysis revealed that 52 and 79 % of the average bacterial proteome could be functionally annotated based on protein and domain-based homology searches, respectively. Annotation coverage using protein homology search varied significantly from as low as 14 % in some species to as high as 98 % in others. We found that taxonomy is a major factor influencing annotation completeness, with distinct trends observed across the microbial tree (e.g. the lowest level of completeness was found in the Patescibacteria lineage). Most lineages showed a significant association between genome size and annotation incompleteness, likely reflecting a greater degree of uncharacterized sequences in 'accessory' proteomes than in 'core' proteomes. Finally, research bias, as measured by publication volume, was also an important factor influencing genome annotation completeness, with early model organisms showing high completeness levels relative to other genomes in their own taxonomic lineages. Our work highlights the disparity in annotation coverage across the bacterial tree of life and emphasizes a need for more experimental characterization of accessory proteomes as well as understudied lineages.
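At its simplest, the completeness measure discussed above is the percentage of a genome's predicted proteins that receive at least one annotation, averaged over genomes. Below is a minimal sketch with invented protein IDs. How the study defined a hit (databases, thresholds) is not reproduced here.

```python
from statistics import mean


def annotation_coverage(proteome: set[str], annotated: set[str]) -> float:
    """Percentage of a genome's proteins that received at least one annotation."""
    if not proteome:
        raise ValueError("empty proteome")
    return 100.0 * len(proteome & annotated) / len(proteome)


def mean_coverage(genomes: dict[str, tuple[set[str], set[str]]]) -> float:
    """Average per-genome coverage; each value is (proteome, annotated_ids)."""
    return mean(annotation_coverage(p, a) for p, a in genomes.values())


# Invented toy genome: protein-level vs domain-level hits.
proteome = {"p1", "p2", "p3", "p4"}
print(annotation_coverage(proteome, {"p1", "p2"}))        # 50.0
print(annotation_coverage(proteome, {"p1", "p2", "p3"}))  # 75.0
```

Computing coverage separately for each search method, as here, mirrors the abstract's comparison of protein-based and domain-based homology searches.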

130
Mora-Márquez F, Chano V, Vázquez-Poletti JL, López de Heredia U. TOA: A software package for automated functional annotation in non-model plant species. Mol Ecol Resour 2020; 21:621-636. [PMID: 33070442 DOI: 10.1111/1755-0998.13285] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/04/2020] [Revised: 10/01/2020] [Accepted: 10/13/2020] [Indexed: 01/05/2023]
Abstract
The increase in sequencing capacity provided by high-throughput platforms has made it possible to routinely obtain large sets of genomic and transcriptomic sequences from model and non-model organisms. Subsequent genomic analysis and gene discovery in next-generation sequencing experiments are, however, bottlenecked by functional annotation. One common way to functionally annotate sets of sequences obtained from next-generation sequencing experiments is to search for homologous sequences and retrieve the related functional information deposited in genomic databases. Functional annotation is especially challenging for non-model organisms, such as many plant species. In such cases, existing free and commercial general-purpose applications may not offer complete and accurate results. We present TOA (Taxonomy-oriented annotation), a user-friendly, open-source Python application designed to establish functional annotation pipelines geared towards non-model plant species. It runs on Linux/Mac computers, high-performance computing (HPC) clusters and cloud servers. TOA performs homology searches against proteins stored in the PLAZA databases, NCBI RefSeq Plant, the Nucleotide Database and the Non-Redundant Protein Sequence Database. It outputs functional information from several ontology systems: Gene Ontology, InterPro, EC, KEGG, Mapman and MetaCyc. Software performance was validated on several plant benchmark data sets by comparing TOA with other functional annotation solutions in terms of runtime, total number of annotated sequences and accuracy of the functional information. TOA outperformed the other software in both the number of annotated sequences and the accuracy of the annotation, and constitutes a good alternative for improving functional annotation in plants. TOA is especially recommended for gymnosperms or for low-quality sequence data sets of non-model plants.

131
Sheng M, She J, Xu W, Hong Y, Su Z, Zhang X. HpeNet: Co-expression Network Database for de novo Transcriptome Assembly of Paeonia lactiflora Pall. Front Genet 2020; 11:570138. [PMID: 33193666 PMCID: PMC7641121 DOI: 10.3389/fgene.2020.570138] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/06/2020] [Accepted: 08/18/2020] [Indexed: 01/23/2023]
Abstract
The herbaceous peony (Paeonia lactiflora Pall.) is a well-known ornamental and medicinal flowering plant found in China. Its high medicinal value has long been recognized in traditional Chinese medicine (as Radix paeoniae Alba and Radix paeoniae Rubra). In recent years it has also become economically valued for its oilseed: like other Paeonia species, it has been identified as a novel source of the α-linolenic acid used in seed oil production. However, its genome has not yet been sequenced, and few transcriptome data on P. lactiflora are available. To obtain a comprehensive transcriptome for P. lactiflora, RNA from 10 tissues of P. lactiflora cv. Shaoyou17C was used for de novo assembly, yielding 416,062 unigenes. A homology search showed that 236,222 (approximately 57%) unigenes had at least one BLAST hit in one or more public data resources. Constructing co-expression networks is a feasible means of improving unigene annotation. Using in-house transcriptome data, we obtained a co-expression network covering 95.13% of the unigenes. We then integrated co-expression network analyses and lipid-related pathway genes to study lipid metabolism in P. lactiflora cultivars. Finally, we constructed the online database HpeNet (http://bioinformatics.cau.edu.cn/HpeNet) to integrate transcriptome data, gene information, the co-expression network, and other resources. The database can also be searched for gene details, gene functions, orthologous matches, and other data. Our online database may help the research community identify functional genes and study P. lactiflora more conveniently. We hope that de novo transcriptome assembly, combined with co-expression networks, can provide a feasible means of predicting gene function in species that lack a reference genome.

132
Vyse TJ, Cunninghame Graham DS. Trans-Ancestral Fine-Mapping and Epigenetic Annotation as Tools to Delineate Functionally Relevant Risk Alleles at IKZF1 and IKZF3 in Systemic Lupus Erythematosus. Int J Mol Sci 2020; 21:ijms21218383. [PMID: 33182226 PMCID: PMC7664943 DOI: 10.3390/ijms21218383] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/28/2020] [Revised: 10/09/2020] [Accepted: 10/13/2020] [Indexed: 12/19/2022]
Abstract
Background: Prioritizing tag-SNPs carried on extended risk haplotypes at susceptibility loci for common disease is a challenge. Methods: We used trans-ancestral exclusion mapping to narrow the risk haplotypes at IKZF1 and IKZF3 identified across multiple ancestries in SLE GWAS and ImmunoChip datasets. We characterized functional annotation data across each risk haplotype from publicly available datasets. These included ENCODE, the Roadmap Epigenomics Consortium, promoter capture Hi-C (PC Hi-C) data from the 3D Genome Browser, the NESDA NTR conditional eQTL database, GeneCards GeneHancer, and transcription factor (TF) binding sites from HaploReg v4. Results: We refined the 60 kb associated haplotype upstream of IKZF1 to just 12 tag-SNPs tagging a 47.7 kb core risk haplotype. DNase I hypersensitivity and H3K27ac modification were preferentially enriched across the 3′ end of the risk haplotype. Four tag-SNPs there shared allele-specific TF binding sites with promoter variants, which are eQTLs for IKZF1 in whole blood. At IKZF3, we refined a core risk haplotype of 101 kb (27 tag-SNPs) from an initial extended haplotype of 194 kb (282 tag-SNPs). This core haplotype had widespread DNase I hypersensitivity, H3K27ac modification and multiple allele-specific TF binding sites. Dimerization of Fox family TFs bound at the 3′ end and the promoter of IKZF3 may stabilize chromatin looping across the locus. Conclusions: We combined trans-ancestral exclusion mapping and epigenetic annotation to identify variants at both IKZF1 and IKZF3 with the highest likelihood of biological relevance. The approach should be of strong interest to other complex-trait geneticists seeking to attribute biological relevance to risk alleles on extended risk haplotypes in their disease of interest.

133
Identification of Candidate Genes and Pathways Associated with Obesity-Related Traits in Canines via Gene-Set Enrichment and Pathway-Based GWAS Analysis. Animals (Basel) 2020; 10:ani10112071. [PMID: 33182249 PMCID: PMC7695335 DOI: 10.3390/ani10112071] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/10/2020] [Revised: 11/06/2020] [Accepted: 11/06/2020] [Indexed: 02/06/2023]
Abstract
The present study aimed to identify causative loci and genes enriched in pathways associated with canine obesity using a genome-wide association study (GWAS). The GWAS was first performed to identify candidate single-nucleotide polymorphisms (SNPs) associated with obesity and obesity-related traits, including body weight and blood sugar, in 153 dogs of 18 different breeds. A total of 10 and 2 SNPs were significantly (p < 3.74 × 10⁻⁷) associated with body weight and blood sugar, respectively. None of the SNPs was significantly associated with the obesity trait. We subsequently followed up the GWAS analysis with gene-set enrichment and pathway analyses. For the gene-set enrichment analysis, gene sets were created by subsetting the GWAS results at a threshold of p < 0.01. This yielded 1057, 1409, and 1243 SNPs annotated to 449, 933, and 820 genes for obesity, body weight, and blood sugar, respectively. In total, 84 GO terms and 21 KEGG pathways were enriched for obesity, 114 GO terms and 44 KEGG pathways for blood sugar, and 120 GO terms and 24 KEGG pathways for body weight. We highlighted five enriched pathways and seven GO terms that were shared among all three traits. The pathways were Wnt signaling pathway, adherens junction, pathways in cancer, axon guidance, and insulin secretion. The GO terms were fat cell differentiation, calcium ion binding, cytoplasm, nucleus, phospholipid transport, central nervous system development, and cell surface. Our data provide insights into the genes and pathways associated with obesity and obesity-related traits.
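The abstract does not name the enrichment test. A common choice for GO/KEGG over-representation is a one-sided hypergeometric test, sketched below with only the standard library. The parameter names are illustrative. In practice the resulting p-values are usually also corrected for multiple testing, which is not shown.

```python
from math import comb


def hypergeom_enrichment_p(k: int, N: int, K: int, n: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric.

    N: genes in the background, K: background genes in the pathway/GO term,
    n: genes in the candidate set, k: candidate genes in the pathway/GO term.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    top = min(K, n)
    # Sum the probabilities of observing k, k+1, ..., top overlapping genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), top + 1))
    return hits / comb(N, n)


# Toy case: all 5 candidates fall in a 5-gene term out of 10 genes (p = 1/252).
print(round(hypergeom_enrichment_p(5, 10, 5, 5), 5))  # 0.00397
```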

134
Ahmad S, Ballester PJ, Fernandez M. Editorial: Intelligent Systems for Genome Functional Annotations. Front Genet 2020; 11:915. [PMID: 33061935 PMCID: PMC7477101 DOI: 10.3389/fgene.2020.00915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2020] [Accepted: 07/23/2020] [Indexed: 11/27/2022]

135
Li L, Liu H, Wen W, Huang C, Li X, Xiao S, Wu M, Shi J, Xu D. Full Transcriptome Analysis of Callus Suspension Culture System of Bletilla striata. Front Genet 2020; 11:995. [PMID: 33193583 PMCID: PMC7593603 DOI: 10.3389/fgene.2020.00995] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/26/2020] [Accepted: 08/05/2020] [Indexed: 12/13/2022]
Abstract
Background: Bletilla striata has been widely used in the pharmaceutical industry. To produce secondary metabolites efficiently in suspension-cultured B. striata cells, it is important to explore the full-length transcriptome and the genes related to cell growth and chemical production across all culture stages. We combined single-molecule real-time (SMRT) sequencing and second-generation sequencing (SGS) to generate a complete, full-length transcriptome of suspension-cultured B. striata cells. Methods: The B. striata transcriptome was assembled de novo using PacBio isoform sequencing (Iso-Seq) of a pooled RNA sample. The pool was derived from 23 samples covering 10 culture stages, to explore the potential for capturing full-length transcript isoforms. Unigenes were obtained after splicing, assembly, and clustering, and were corrected using the SGS results. The unigenes were compared against the databases, and their functions were annotated and classified. Results and conclusions: A total of 100,276 high-quality full-length transcripts were obtained, with an average length of 2530 bp and an N50 of 3302 bp. About 52% of all sequences were annotated with Gene Ontology terms. In addition, 53,316 unigenes received KOG annotations and were divided into 26 functional categories, and 80,020 unigenes were mapped to 363 KEGG pathways. Furthermore, 15,133 long non-coding RNAs (lncRNAs) were detected. In addition, 68,996 coding sequences were identified based on SSR analysis, and 31 randomly selected primer pairs amplified stable bands. In conclusion, our results provide new full-length transcriptome data and genetic resources for identifying growth- and metabolism-related genes. These lay a solid foundation for further research on the growth regulation mechanisms and genetic engineering breeding of B. striata.
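The N50 reported above is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The sketch below walks sequences longest-first and stops once the running total reaches half of all bases.

```python
def n50(lengths: list[int]) -> int:
    """Walk sequences longest-first; return the length at which the
    running total first reaches at least half of all bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids rounding total / 2
            return length
    raise AssertionError("unreachable: the loop always returns")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

The N50 only summarizes contiguity. A set of transcripts can have a high N50 and still contain many short or redundant sequences.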

136
Zhang T, Kayani MUR, Hong L, Zhang C, Zhong J, Wang Z, Chen L. Dynamics of the Salivary Microbiome During Different Phases of Crohn's Disease. Front Cell Infect Microbiol 2020; 10:544704. [PMID: 33123492 PMCID: PMC7574453 DOI: 10.3389/fcimb.2020.544704] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/22/2020] [Accepted: 09/07/2020] [Indexed: 12/18/2022]
Abstract
Crohn's disease is a chronic disorder that typically affects the gastrointestinal tract. Its increasing incidence in recent years, especially in Asian countries, calls for studies that provide new insights into the etiology and pathogenesis of the disease. Among other causative factors, the gut microbiome and its cross-talk with the salivary microbiome plausibly play a role in the pathogenesis of Crohn's disease. The gut microbiome has been extensively studied; however, the salivary microbiome and its dynamics during different phases of this disease remain understudied. In this study, we collected saliva samples from patients during the active and remission phases of the disease and compared them with control samples. We highlighted differences in taxonomic composition as well as in predicted functional pathways. The α and β diversities were significantly lower during the active phase than in the remission phase and in healthy samples. In general, Firmicutes were the most abundant phylum across the three sample groups, followed by Bacteroidetes and Proteobacteria. At the genus level, Streptococcus, Neisseria, Prevotella, Haemophilus, and Veillonella were the five most abundant taxa. Differential abundance analysis of the three sample groups identified significant enrichment of 30 bacterial taxa in the active phase, including g_Prevotella, f_Prevotellaceae, and p_Bacteroidetes. The remission phase and control groups exhibited significant enrichment of 24 and 22 bacterial taxa, respectively. Eleven differentially abundant pathways were also identified: four were significantly enriched in healthy controls, whereas the other seven were significantly enriched in the active phase of the disease. Several important pathways, such as ribosome biogenesis and energy metabolism, were depleted in the active phase.
Our study has highlighted several taxa and functional categories that could be implicated in the onset of Crohn's disease and thus have the potential to serve as biomarkers of active disease. However, these findings require further validation through future functional studies.
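The abstract does not say which α-diversity index was used. The Shannon index is one widely used option. It is sketched below from per-taxon read counts using the natural logarithm. β diversity, which compares samples with each other, is not covered.

```python
from math import log


def shannon_index(counts: list[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


print(round(shannon_index([10, 10]), 4))       # 0.6931 (= ln 2)
print(round(shannon_index([97, 1, 1, 1]), 4))  # 0.1677, lower because one taxon dominates
```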

137
Ejigu GF, Jung J. Review on the Computational Genome Annotation of Sequences Obtained by Next-Generation Sequencing. Biology (Basel) 2020; 9:E295. [PMID: 32962098 PMCID: PMC7565776 DOI: 10.3390/biology9090295] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 08/21/2020] [Revised: 09/13/2020] [Accepted: 09/16/2020] [Indexed: 12/16/2022]
Abstract
Next-Generation Sequencing (NGS) has made it easier to obtain genome-wide sequence data and has shifted the research focus toward genome annotation. The challenging tasks involved in annotation rely on currently available tools and techniques to decode the information contained in nucleotide sequences. This information will improve our understanding of general aspects of life and evolution and our ability to diagnose genetic disorders. Here, we summarize both structural and functional annotation, as well as the associated comparative annotation tools and pipelines. We highlight visualization tools that greatly aid the annotation process, as well as the contributions of the scientific community to annotation. Further, we discuss quality-control practices and the need for re-annotation, and outline future directions for annotation.

138
Li T, Li X, Guo Y, Zheng G, Yu T, Zeng W, Qiu L, He X, Yang Y, Zheng X, Li Y, Huang H, Liu X. Distinct mRNA and long non-coding RNA expression profiles of decidual natural killer cells in patients with early missed abortion. FASEB J 2020; 34:14264-14286. [PMID: 32915478 DOI: 10.1096/fj.202000621r] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/18/2020] [Revised: 07/20/2020] [Accepted: 08/03/2020] [Indexed: 12/11/2022]
Abstract
Early non-chromosome-related missed abortion (MA) is commonly associated with an altered immunological environment during pregnancy. Human decidual natural killer (dNK) cells, the most abundant lymphocyte population within the first-trimester maternal-fetal interface, are vital maternal regulators of immune tolerance mediating successful embryo implantation and placentation. Previous studies have shown that dNK cells may play a role in MA. However, the gene expression status and specific altered manifestations of dNK cells in patients with early MA remain largely unknown. Here, we show that MA dNK cells have distinct mRNA and lncRNA expression profiles through RNA sequencing, with a total of 276 mRNAs and 67 lncRNAs being differentially expressed compared with controls. Protein-protein interaction analysis of differentially expressed mRNAs was performed to identify hub genes and key modules. An lncRNA-mRNA regulatory network characterized by the small-world property was constructed to reveal the regulation of mRNA transcription by differential hub lncRNAs. Functional annotation of differentially expressed mRNAs and lncRNAs was performed to disclose their potential roles in MA pathogenesis. Our data highlight several enriched biological processes (immune response, inflammatory response, cell adhesion, and extracellular matrix [ECM] organization) and signaling pathways (cytokine-cytokine receptor interaction, ECM-receptor interaction, Toll-like receptor signaling pathway, and phosphatidylinositol signaling system) that may influence MA. This study is the first to demonstrate the involvement of altered mRNA and lncRNA expression profiles in the dNK cell pathogenesis of early MA, facilitating a better understanding of the underlying molecular mechanisms and the development of novel MA therapeutic strategies targeting key mRNAs and lncRNAs.

139
Wang Q, Shen Y, Hu H, Fan C, Zhang A, Ding R, Ye B, Xiang M. Systematic Transcriptome Analysis of Noise-Induced Hearing Loss Pathogenesis Suggests Inflammatory Activities and Multiple Susceptible Molecules and Pathways. Front Genet 2020; 11:968. [PMID: 33005175 PMCID: PMC7483666 DOI: 10.3389/fgene.2020.00968] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/19/2020] [Accepted: 07/31/2020] [Indexed: 12/12/2022]
Abstract
Noise-induced hearing loss (NIHL) is characterized by damage to cochlear neurons and associated hair cells; however, a systematic evaluation of NIHL pathogenesis is still lacking. Here, we systematically evaluated differentially expressed genes in 22 cochlear samples from an NIHL mouse model. We performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis and weighted gene co-expression network analysis (WGCNA). Core modules were detected using protein–protein interactions and WGCNA, together with functional annotation, diagnostic value evaluation, and experimental validation. Pooled functional annotation suggested the involvement of multiple inflammatory pathways, including the TNF, IL-17, NF-kappa B and p53 signaling pathways and the rheumatoid arthritis pathway. The core modules suggested that responses to cytokines, heat, cAMP, ATP, mechanical stimuli, and immune responses were important in NIHL pathogenesis. These activities primarily occurred on the external side of the plasma membrane, in the extracellular region, and in the nucleus. Binding activities, including CCR2 receptor binding, protein binding, and transcription factor binding, may be important. Additionally, the hub molecules with diagnostic value included Relb, Hspa1b, Ccl2, Ptgs2, Ldlr, Plat, and Ccl17. An evaluation of Relb and Hspa1b protein levels showed that Relb was upregulated in spiral ganglion neurons, which might have diagnostic value. In conclusion, this study indicates that the inflammatory response is involved in auditory organ changes in NIHL pathogenesis. Several molecules and activities also exert essential yet subtle influences with translational potential for pharmacological intervention.

140
Pucker B, Reiher F, Schilbert HM. Automatic Identification of Players in the Flavonoid Biosynthesis with Application on the Biomedicinal Plant Croton tiglium. Plants (Basel) 2020; 9:E1103. [PMID: 32867203 PMCID: PMC7570183 DOI: 10.3390/plants9091103] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 06/30/2020] [Revised: 08/11/2020] [Accepted: 08/25/2020] [Indexed: 02/06/2023]
Abstract
Flavonoid biosynthesis is a well-characterised model system for specialised metabolism and transcriptional regulation in plants. Flavonoids have numerous biological functions, such as UV protection and pollinator attraction, as well as biotechnological potential. Here, we present Knowledge-based Identification of Pathway Enzymes (KIPEs), an automatic approach for identifying players in flavonoid biosynthesis. KIPEs combines comprehensive sequence similarity analyses with the inspection of functionally relevant amino acid residues and domains in the submitted peptide sequences. Comprehensive sequence sets of flavonoid biosynthesis enzymes and knowledge about functionally relevant amino acids were collected. As a proof of concept, KIPEs was applied to investigate flavonoid biosynthesis in the medicinal plant Croton tiglium on the basis of a transcriptome assembly. Enzyme candidates for all steps in the biosynthesis network were identified and matched to previous reports of corresponding metabolites in Croton species.

141
A Perspective on Enzyme Inhibitors from Marine Organisms. Mar Drugs 2020; 18:md18090431. [PMID: 32824888 PMCID: PMC7551548 DOI: 10.3390/md18090431] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/06/2020] [Accepted: 08/14/2020] [Indexed: 12/11/2022]
Abstract
Marine habitats are promising sources for the identification of novel organisms as well as natural products. Still, we lack detailed knowledge of most of the marine biosphere. In the last decade, a number of reports have described the potential for identifying novel bioactive compounds or secondary metabolites from marine environments. This is, and will remain, a promising source of candidate compounds for pharmaceutical research and chemical biology. In recent years, a number of novel techniques have been introduced into the field, making it easier to prospect for natural products such as enzyme inhibitors. These novel compounds then need to be characterized and evaluated in comparison with well-known representatives. A number of current research projects target the exploitation of marine organisms and thus the corresponding diversity of metabolites, which are often regarded as potential drugs or biologically active compounds. Among these, enzyme inhibitors are an important class of compounds. There is room for new discoveries, and some of the more recent ones are highlighted herein.
142
MiR-93/miR-375: Diagnostic Potential, Aggressiveness Correlation and Common Target Genes in Prostate Cancer. Int J Mol Sci 2020; 21:ijms21165667. [PMID: 32784653 PMCID: PMC7460886 DOI: 10.3390/ijms21165667] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/01/2020] [Revised: 07/29/2020] [Accepted: 08/05/2020] [Indexed: 12/15/2022]
Abstract
Dysregulation of miRNAs plays a fundamental role in the initiation, development and progression of prostate cancer (PCa). The potential of miRNAs in gene therapy and diagnostic applications is well documented. To further improve the ability of miRNAs to distinguish between PCa and benign prostatic hyperplasia (BPH) patients, nine miRNAs (-21, -27b, -93, -141, -205, -221, -182, -375 and let-7a) with the highest reported differentiation power were chosen and, for the first time, used in comparative studies of serum and prostate tissue samples. Spearman correlations and receiver operating characteristic (ROC) analyses were applied to assess the capability of serum miRNAs to discriminate between PCa and BPH patients. The present study clearly demonstrates that miR-93 and miR-375 could be considered as single blood-based, non-invasive molecules to distinguish PCa from BPH patients. We also identify six common PCa-related target genes of these two miRNAs (CCND2, MAP3K2, MXI1, PAFAH1B1, YOD1, ZFYVE26) that share the molecular function of protein binding (GO:0005515 term). A high diagnostic value of serum-derived miR-182 is also newly described (AUC = 0.881, 95% confidence interval (CI) = 0.816–0.946, p < 0.0001; sensitivity 85% and specificity 79%).
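The AUC, sensitivity and specificity quoted above can be computed from per-patient scores without any library. A minimal sketch, assuming each patient has one serum miRNA level and that higher levels indicate PCa; the values below are hypothetical and the code is not the authors' analysis pipeline. It uses the Mann-Whitney formulation, which equals the area under the empirical ROC curve.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive scores above a random negative.

    This Mann-Whitney formulation equals the area under the empirical ROC curve;
    ties contribute one half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_specificity(pos_scores, neg_scores, threshold):
    """Call a sample positive when score >= threshold."""
    tp = sum(s >= threshold for s in pos_scores)
    tn = sum(s < threshold for s in neg_scores)
    return tp / len(pos_scores), tn / len(neg_scores)


# Hypothetical serum levels (arbitrary units) for three PCa and three BPH patients.
pca = [0.9, 0.8, 0.4]
bph = [0.3, 0.5, 0.1]
print(roc_auc(pca, bph))                        # 8 of 9 pairs ranked correctly
print(sensitivity_specificity(pca, bph, 0.45))  # 2 of 3 in each group
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 - specificity traces the ROC curve itself; the pairwise count gives its area directly.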
143
Prabhu D, Rajamanikandan S, Anusha SB, Chowdary MS, Veerapandiyan M, Jeyakanthan J. In silico Functional Annotation and Characterization of Hypothetical Proteins from Serratia marcescens FGI94. BIOL BULL+ 2020; 47:319-331. [PMID: 32834707 PMCID: PMC7394047 DOI: 10.1134/s1062359020300019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/13/2019] [Revised: 09/28/2019] [Accepted: 09/30/2019] [Indexed: 01/16/2023]
Abstract
Serratia marcescens, a rod-shaped Gram-negative bacterium, is classified as an opportunistic pathogen in the family Enterobacteriaceae. It causes a wide variety of infections in humans, including urinary, respiratory, ocular lens and ear infections, osteomyelitis, endocarditis, meningitis and septicemia. Over the past decade, antibiotic resistance has become a serious health care issue, and effective means to control the dissemination of S. marcescens resistance are urgently needed. The whole-genome sequence of S. marcescens strain FGI94 encodes 4434 functional proteins, of which 690 (15.56%) are classified as hypothetical proteins (HPs). In the present study, we applied various bioinformatics tools based on protein family comparison, motifs, functional properties of amino acids and genome context to assign possible functions to the HPs. Pseudo sequences (protein sequences of ≤100 amino acid residues) were excluded from the study. Although we predicted functions for 483 proteins, only 108 of these predictions could be made with a high level of confidence. The predicted HPs were classified into various classes such as enzymes, transporters, binding proteins, cell division, cell regulatory and other proteins. The outcome of the study could help to understand the molecular mechanisms of bacterial pathogenesis and provide insight into the identification of potential targets for drug and vaccine development.
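The first filtering step described above (keep hypothetical proteins, drop sequences of 100 residues or fewer) is easy to reproduce on a FASTA file. A minimal sketch, assuming hypothetical proteins are recognisable by the word "hypothetical" in their FASTA headers; that labelling convention and the toy records are assumptions, not details from the study.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def hypothetical_candidates(records, min_length=101):
    """Keep proteins whose header says 'hypothetical' and that exceed 100 residues."""
    return [(h, s) for h, s in records
            if "hypothetical" in h.lower() and len(s) >= min_length]


fasta = "\n".join([
    ">p1 hypothetical protein", "M" + "A" * 149,  # 150 aa: kept
    ">p2 hypothetical protein", "M" + "A" * 79,   # 80 aa: pseudo sequence, dropped
    ">p3 DNA gyrase subunit A", "M" + "A" * 199,  # annotated protein, dropped
])
print([h for h, _ in hypothetical_candidates(parse_fasta(fasta))])
```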
144
Pranavathiyani G, Prava J, Rajeev AC, Pan A. Novel Target Exploration from Hypothetical Proteins of Klebsiella pneumoniae MGH 78578 Reveals a Protein Involved in Host-Pathogen Interaction. Front Cell Infect Microbiol 2020; 10:109. [PMID: 32318354 PMCID: PMC7146069 DOI: 10.3389/fcimb.2020.00109] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/14/2020] [Accepted: 02/28/2020] [Indexed: 11/13/2022]
Abstract
The opportunistic pathogen Klebsiella pneumoniae is a causative agent of several hospital-acquired infections. It has become resistant to a wide range of currently available antibiotics, leading to high mortality rates among patients; this has further led to a demand for novel therapeutic interventions to treat such infections. Using a series of in silico analyses, the present study aims to explore novel drug/vaccine candidates from the hypothetical proteins of K. pneumoniae. A total of 540 proteins were found to be hypothetical in this organism. Analysis of these 540 hypothetical proteins revealed 30 pathogen-specific proteins essential for pathogen survival. A motif/domain family analysis, similarity search against known proteins, gene ontology, and protein–protein interaction analysis of the shortlisted 30 proteins led to functional assignment for 17 proteins. They were mainly cataloged as enzymes, lipoproteins, stress-induced proteins, transporters, and other proteins (viz., two-component proteins, skeletal proteins and toxins). Among the annotated proteins, 16 proteins located in the cytoplasm, periplasm, and inner membrane were considered potential drug targets, and one extracellular protein was considered a vaccine candidate. A druggability analysis indicated that the 17 identified drug/vaccine candidates were “novel”. Furthermore, a host–pathogen interaction analysis of these target candidates revealed a betaine/carnitine/choline transporter (BCCT) family protein that interacts with five host proteins. Structure prediction and validation were carried out for this protein, which could aid in structure-based inhibitor design.
145
An α/β-Hydrolase Fold Subfamily Comprising Pseudomonas Quinolone Signal-Cleaving Dioxygenases. Appl Environ Microbiol 2020; 86:AEM.00279-20. [PMID: 32086305 DOI: 10.1128/aem.00279-20] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/04/2020] [Accepted: 02/12/2020] [Indexed: 01/25/2023]
Abstract
The quinolone ring is a common core structure of natural products exhibiting antimicrobial, cytotoxic, and signaling activities. A prominent example is the Pseudomonas quinolone signal (PQS), a quorum-sensing signal molecule involved in the regulation of virulence of Pseudomonas aeruginosa. The key reaction in quinolone inactivation and biodegradation is the cleavage of the 3-hydroxy-4(1H)-quinolone ring, catalyzed by 3-hydroxy-4(1H)-quinolone 2,4-dioxygenases (HQDs), which are members of the α/β-hydrolase fold superfamily. The α/β-hydrolase fold core domain consists of a β-sheet surrounded by α-helices, with an active site usually containing a catalytic triad comprising a nucleophilic residue, an acidic residue, and a histidine. The nucleophile is located at the tip of a sharp turn, called the "nucleophilic elbow." In this work, we developed a search workflow for the identification of HQD proteins from databases. Search and validation criteria include an [H-x(2)-W] motif at the nucleophilic elbow, an [HFP-x(4)-P] motif comprising the catalytic histidine, the presence of a helical cap domain, the positioning of the triad's acidic residue at the end of β-strand 6, and a set of conserved hydrophobic residues contributing to the substrate cavity. The 161 candidate proteins identified from the UniProtKB database originate from environmental and plant-associated microorganisms from all domains of life. Verification and characterization of HQD activity of 9 new candidate proteins confirmed the reliability of the search strategy and suggested residues correlating with distinct substrate preferences. Among the new HQDs, PQS dioxygenases from Nocardia farcinica, N. cyriacigeorgica, and Streptomyces bingchenggensis are likely part of a catabolic pathway for alkylquinolone utilization.
IMPORTANCE Functional annotation of protein sequences is a major requirement for the investigation of metabolic pathways and the identification of sought-after biocatalysts. To identify heterocyclic ring-cleaving dioxygenases within the huge superfamily of α/β-hydrolase fold proteins, we defined search and validation criteria for the primarily motif-based identification of 3-hydroxy-4(1H)-quinolone 2,4-dioxygenases (HQDs). HQDs are key enzymes for the inactivation of metabolites, which can have signaling, antimicrobial, or cytotoxic functions. The HQD candidates detected in this study occur particularly in environmental and plant-associated microorganisms. Because HQDs active toward the Pseudomonas quinolone signal (PQS) likely contribute to interactions within microbial communities and modulate the virulence of Pseudomonas aeruginosa, we analyzed the catalytic properties of a PQS-cleaving subset of HQDs and specified characteristics to identify PQS-cleaving dioxygenases within the HQD family.
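The two sequence motifs used as search criteria read like PROSITE-style patterns, where `x` is any residue and `x(n)` is a run of n arbitrary residues; "HFP" is taken here as three consecutive residues. A minimal sketch, under that reading, that translates such motifs into Python regular expressions and reports overlapping matches. It covers only a subset of the pattern syntax, the toy sequence is hypothetical, and the paper's structural criteria (helical cap, β-strand 6 position, hydrophobic cavity residues) are not modelled.

```python
from __future__ import annotations

import re

# One pattern element: 'x', a residue run, or a [..] class, optionally with (n) or (n,m).
_ELEMENT = re.compile(r"^(x|[A-Z]+|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?$")


def motif_to_regex(motif: str) -> str:
    """Translate a PROSITE-like motif such as 'H-x(2)-W' into a Python regex."""
    parts = []
    for element in motif.split("-"):
        match = _ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported motif element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        if low is not None:
            if len(core) > 1 and not core.startswith("["):
                core = f"(?:{core})"  # repeat the whole residue run
            core += f"{{{low}}}" if high is None else f"{{{low},{high}}}"
        parts.append(core)
    return "".join(parts)


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 1-based start positions of (possibly overlapping) motif matches."""
    pattern = re.compile(f"(?={motif_to_regex(motif)})")
    return [m.start() + 1 for m in pattern.finditer(sequence)]


toy = "AAHGSWAAHFPQRSTPAA"  # hypothetical sequence, not a real HQD
print(find_motif(toy, "H-x(2)-W"))    # [3]
print(find_motif(toy, "HFP-x(4)-P"))  # [9]
```

The zero-width lookahead lets matches overlap, which matters when scanning whole proteomes for short motifs.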
146
Xie J, Ma A, Fennell A, Ma Q, Zhao J. It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data. Brief Bioinform 2020; 20:1449-1464. [PMID: 29490019 DOI: 10.1093/bib/bby014] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 11/15/2017] [Revised: 01/16/2018] [Indexed: 12/12/2022]
Abstract
Biclustering is a powerful data mining technique that clusters the rows and columns of a matrix-format data set simultaneously. It was first applied to gene expression data in 2000, to identify genes co-expressed under a subset of all conditions/samples. During the past 17 years, dozens of biclustering algorithms and tools have been developed to help make sense of the large data sets generated by high-throughput omics technologies. These algorithms and tools have been applied to a wide variety of data types, including, but not limited to, genomes, transcriptomes, exomes, epigenomes, phenomes and pharmacogenomes. However, there is still a considerable gap between biclustering methodology development and comprehensive data interpretation, mainly because of a lack of knowledge about selecting appropriate biclustering tools and supporting computational techniques for specific studies. Here, we first give a brief introduction to existing biclustering algorithms and tools in the public domain, and then systematically summarize the basic applications of biclustering to biological data and more advanced applications to biomedical data. This review will help researchers analyze their big data effectively and generate valuable biological knowledge and novel insights more efficiently.
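What "co-expressed under a subset of conditions" means can be made concrete with one classical bicluster quality score, the mean squared residue of Cheng and Church (2000). This is a minimal sketch of that single score, not of any particular algorithm surveyed in the review; the small expression matrix is hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue H(I, J) of the submatrix matrix[rows][cols].

    H = mean over (i, j) of (a_ij - a_iJ - a_Ij + a_IJ)^2, where a_iJ, a_Ij and a_IJ
    are the row, column and overall means of the submatrix. H == 0 for a perfectly
    additive bicluster (each value = row effect + column effect).
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical genes x conditions matrix.
expression = [[1, 2, 9],
              [3, 4, 0],
              [7, 1, 5]]
print(mean_squared_residue(expression, [0, 1], [0, 1]))        # 0.0: coherent bicluster
print(mean_squared_residue(expression, [0, 1, 2], [0, 1, 2]))  # larger: whole matrix
```

Genes 0 and 1 differ by a constant shift across conditions 0 and 1, so they form a perfect additive bicluster there even though they disagree on condition 2; this is exactly the kind of local pattern that ordinary clustering of whole rows would miss.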
147
NLGenomeSweeper: A Tool for Genome-Wide NBS-LRR Resistance Gene Identification. Genes (Basel) 2020; 11:genes11030333. [PMID: 32245073 PMCID: PMC7141099 DOI: 10.3390/genes11030333] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/28/2020] [Revised: 03/16/2020] [Accepted: 03/17/2020] [Indexed: 12/02/2022]
Abstract
Although there are a number of bioinformatic tools to identify plant nucleotide-binding leucine-rich repeat (NLR) disease resistance genes based on conserved protein sequences, only a few of these tools have attempted to identify disease resistance genes that have not been annotated in the genome. The overall goal of the NLGenomeSweeper pipeline is to annotate NLR disease resistance genes, including RPW8, in the genome assembly with high specificity and a focus on complete functional genes. This is based on the identification of the complete NB-ARC domain, the most conserved domain of NLR genes, using the BLAST suite. In this way, the tool has a high specificity for complete genes and relatively intact pseudogenes. The tool returns all candidate NLR gene locations as well as InterProScan ORF and domain annotations for manual curation of the gene structure.
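Requiring the complete NB-ARC domain amounts to keeping only BLAST hits whose alignment spans most of the domain query. A minimal sketch, assuming a domain query of known length has been searched against the genome and results were written in the default 12-column BLAST tabular format (`-outfmt 6`); the 90% coverage and 1e-5 e-value cutoffs and the example rows are hypothetical, not NLGenomeSweeper's settings.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float


def parse_outfmt6(text):
    """Parse default 12-column BLAST tabular output (-outfmt 6)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], int(f[6]), int(f[7]),
                        int(f[8]), int(f[9]), float(f[10])))
    return hits


def complete_domain_hits(hits, query_length, min_coverage=0.9, max_evalue=1e-5):
    """Keep hits whose alignment spans most of the domain query (hypothetical cutoffs)."""
    kept = []
    for h in hits:
        covered = abs(h.qend - h.qstart) + 1
        if covered / query_length >= min_coverage and h.evalue <= max_evalue:
            kept.append(h)
    return kept


rows = "\n".join([
    "NBARC\tchr1\t45.0\t290\t100\t5\t1\t290\t1000\t1870\t1e-40\t250",   # near-complete
    "NBARC\tchr2\t40.0\t120\t60\t2\t100\t220\t5000\t5360\t1e-20\t100",  # fragment
    "NBARC\tchr3\t30.0\t295\t150\t9\t3\t297\t9000\t8115\t0.01\t50",     # weak hit
])
print([h.subject for h in complete_domain_hits(parse_outfmt6(rows), query_length=300)])
```

Coverage is measured on the query (domain) coordinates, so hits on the minus strand (sstart > send) are handled the same way as plus-strand hits.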
148
Song B, Tang Y, Wei Z, Liu G, Su J, Meng J, Chen K. PIANO: A Web Server for Pseudouridine-Site (Ψ) Identification and Functional Annotation. Front Genet 2020; 11:88. [PMID: 32226440 PMCID: PMC7080813 DOI: 10.3389/fgene.2020.00088] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 11/11/2019] [Accepted: 01/27/2020] [Indexed: 12/04/2022]
Abstract
Known as the "fifth RNA nucleotide", pseudouridine (Ψ or psi) is the first-discovered and most abundant RNA modification, occurring at uridine sites, and it plays a prominent role in a number of biological processes. Thousands of Ψ sites have been identified in different biological contexts thanks to advances in high-throughput sequencing technology; nevertheless, the transcriptome-wide distribution, biomolecular functions, regulatory mechanisms, and disease relevance of pseudouridylation remain largely elusive. We report here PIANO, a web server for pseudouridine (Ψ) site identification and functional annotation. PIANO is built upon a high-accuracy predictor that exploits both conventional sequence features and 42 additional genomic features. When tested on six independent datasets generated by four independent Ψ-profiling technologies (Ψ-seq, RBS-seq, Pseudo-seq, and CeU-seq) as benchmarks, PIANO achieved an average AUC of 0.955 and 0.838 under the full transcript and mature mRNA models, respectively, a substantial improvement in accuracy over existing in silico Ψ-site prediction methods, i.e., PPUS (0.713 and 0.707), iRNA-PseU (0.713 and 0.712), and PseUI (0.634 and 0.652). In addition, the PIANO web server systematically annotates the predicted Ψ sites with post-transcriptional regulatory mechanisms (miRNA targets, RBP-binding regions, and splicing sites) in its prediction report to help users explore the potential machinery of Ψ. Moreover, a concise query interface was built for 4,303 known Ψ sites, currently the largest collection of experimentally validated human Ψ sites. The PIANO website is freely accessible at: http://piano.rnamd.com.
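A typical "conventional sequence feature" for site prediction is a one-hot encoding of a fixed window around each candidate uridine. The sketch below shows only that step, under assumed choices (window of 2·flank+1 nucleotides, N-padding at transcript ends, four binary columns per position); it is not PIANO's exact encoding and omits its 42 genomic features.

```python
BASES = "ACGU"


def uridine_sites(seq):
    """0-based positions of U (T is converted to U first)."""
    seq = seq.upper().replace("T", "U")
    return [i for i, base in enumerate(seq) if base == "U"]


def window_around(seq, center, flank, pad="N"):
    """Return the 2*flank+1 nt window centred on `center`, padded with N at the ends."""
    seq = seq.upper().replace("T", "U")
    padded = pad * flank + seq + pad * flank
    return padded[center:center + 2 * flank + 1]


def one_hot(window):
    """Flatten a window into a 4-per-position binary vector (N -> all zeros)."""
    return [1 if base == b else 0 for base in window for b in BASES]


transcript = "GGUACUA"  # hypothetical transcript fragment
for site in uridine_sites(transcript):
    win = window_around(transcript, site, flank=3)
    print(site, win, len(one_hot(win)))  # each vector has 4 * 7 = 28 entries
```

Each fixed-length vector can then be fed to any classifier; the AUC values reported above summarise how well such a classifier ranks true Ψ sites above other uridines.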
149
Qiu F, Bachle S, Nippert JB, Ungerer MC. Comparing control options for time-series RNA sequencing experiments in nonmodel organisms: An example from grasses. Mol Ecol Resour 2020; 20. [PMID: 31957196 DOI: 10.1111/1755-0998.13137] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/31/2019] [Accepted: 01/13/2020] [Indexed: 01/23/2023]
Abstract
RNA sequencing (RNA-seq) is a widely used approach to investigate gene expression and is increasingly used in time-course studies to characterize transcriptomic changes over time. Two primary options are available as controls in time-course experiments: samples collected at the first sampling time (temporal control, TC) or samples collected in parallel at each individual sampling time (biological control, BC). While both approaches are used in experimental studies, we know of no analyses performed to date that directly compare the effects of control type on identifying differentially expressed genes (DEGs) and on subsequent functional analysis. In the current study, we compare experimental results using these different control types for time-course RNA-seq drought stress experiments in two wild grass species in the genus Paspalum. BC assemblies yielded a higher number of loci in both species. With both control types, the number of DEGs increased with increasing stress and then decreased dramatically at the recovery time point. Expression levels of the same DEGs were highly correlated between control types in both species, ranging from r = .653 to r = .852. We also observed similar rank orders of shared enriched Gene Ontology terms with the two control types. Collectively, our findings suggest that differential gene expression and functional annotation results are similar between control types. The ultimate choice of control type will depend on the experimental length and organism type, with labour time and sequencing costs as additional factors to be considered.
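The difference between the two control types comes down to which denominator a fold change uses, and the comparison reported above is a correlation between the resulting expression changes. A minimal sketch with hypothetical normalized counts for four genes; the simple pseudocount log2 fold change is an illustrative simplification, since real DEG calling relies on dedicated count-based statistical models.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2((treated + c) / (control + c)); the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical normalized counts for four genes.
stressed_t = [120.0, 15.0, 300.0, 40.0]  # drought-stressed group at time t
stressed_t0 = [60.0, 30.0, 100.0, 35.0]  # stressed group at the first sampling time
control_t = [70.0, 25.0, 140.0, 45.0]    # well-watered group sampled at time t

lfc_tc = [log2_fold_change(s, c) for s, c in zip(stressed_t, stressed_t0)]  # temporal control
lfc_bc = [log2_fold_change(s, c) for s, c in zip(stressed_t, control_t)]    # biological control
print(round(pearson_r(lfc_tc, lfc_bc), 3))
```

TC uses one shared baseline for every later time point, whereas BC needs an extra control sample at each time point; this is the sequencing-cost trade-off the abstract mentions.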
150
Characterization of Embryonic Skin Transcriptome in Anser cygnoides at Three Feather Follicles Developmental Stages. G3-GENES GENOMES GENETICS 2020; 10:443-454. [PMID: 31792007 PMCID: PMC7003092 DOI: 10.1534/g3.119.400875] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
To enrich the Anser cygnoides genomic resources and identify the gene expression profiles of primary and secondary feather follicle development, a de novo transcriptome assembly of skin tissues was established by analyzing three developmental stages at embryonic days 14, 18, and 28 (E14, E18, E28). Sequencing generated 436,730,608 clean reads from nine libraries, which were assembled de novo into 56,301 unigenes. There were 2,298, 9,423 and 12,559 unigenes showing differential expression in the three stages, respectively. Furthermore, differentially expressed genes (DEGs) were functionally classified according to Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and series-cluster analysis. Relevant specific GO terms such as epithelium development, regulation of keratinocyte proliferation and morphogenesis of an epithelium were identified. In all, 15,144 DEGs were clustered into eight profiles with distinct expression patterns, and 2,424 DEGs were assigned to 198 KEGG pathways. Skin development-related pathways (mitogen-activated protein kinase signaling pathway, extracellular matrix-receptor interaction, Wingless-type signaling pathway) and genes (delta like canonical Notch ligand 1, fibroblast growth factor 2, Snail family transcriptional repressor 2, bone morphogenetic protein 6, polo like kinase 1) were identified, and eight DEGs were selected to verify the reliability of the transcriptome results by real-time quantitative PCR. The findings of this study will provide key insights into the complex molecular mechanisms and breeding techniques underlying the developmental characteristics of skin and feather follicles in Anser cygnoides.