1
Kundu P, Beura S, Mondal S, Das AK, Ghosh A. Machine learning for the advancement of genome-scale metabolic modeling. Biotechnol Adv 2024; 74:108400. [PMID: 38944218 DOI: 10.1016/j.biotechadv.2024.108400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 05/13/2024] [Accepted: 06/23/2024] [Indexed: 07/01/2024]
Abstract
Constraint-based modeling (CBM) has evolved as the core systems biology tool to map the interrelations between genotype, phenotype, and external environment. The recent advancement of high-throughput experimental approaches and multi-omics strategies has generated a plethora of new and precise information from wide-ranging biological domains. On the other hand, the continuously growing field of machine learning (ML) and its specialized branch of deep learning (DL) provide essential computational architectures for decoding complex and heterogeneous biological data. In recent years, both multi-omics and ML have assisted in the escalation of CBM. Condition-specific omics data, such as transcriptomics and proteomics, helped contextualize the model prediction while analyzing a particular phenotypic signature. At the same time, the advanced ML tools have eased the model reconstruction and analysis to increase the accuracy and prediction power. However, the development of these multi-disciplinary methodological frameworks mainly occurs independently, which limits the concatenation of biological knowledge from different domains. Hence, we have reviewed the potential of integrating multi-disciplinary tools and strategies from various fields, such as synthetic biology, CBM, omics, and ML, to explore the biochemical phenomenon beyond the conventional biological dogma. How the integrative knowledge of these intersected domains has improved bioengineering and biomedical applications has also been highlighted. We categorically explained the conventional genome-scale metabolic model (GEM) reconstruction tools and their improvement strategies through ML paradigms. Further, the crucial role of ML and DL in omics data restructuring for GEM development has also been briefly discussed. Finally, the case-study-based assessment of the state-of-the-art method for improving biomedical and metabolic engineering strategies has been elaborated. 
Therefore, this review demonstrates how integrating experimental and in silico strategies can help map the ever-expanding knowledge of biological systems driven by condition-specific cellular information. This multiview approach will elevate the application of ML-based CBM in the biomedical and bioengineering fields for the betterment of society and the environment.
Affiliation(s)
- Pritam Kundu
  - School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Satyajit Beura
  - Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Suman Mondal
  - P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Kumar Das
  - Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Amit Ghosh
  - School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India.
2
Suárez-Vega A, Gutiérrez-Gil B, Fonseca PAS, Hervás G, Pelayo R, Toral PG, Marina H, de Frutos P, Arranz JJ. Milk transcriptome biomarker identification to enhance feed efficiency and reduce nutritional costs in dairy ewes. Animal 2024; 18:101250. [PMID: 39096599 DOI: 10.1016/j.animal.2024.101250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 07/03/2024] [Accepted: 07/05/2024] [Indexed: 08/05/2024] Open
Abstract
In recent years, rising prices for high-quality protein-based feeds have significantly increased nutrition costs. Consequently, investigating strategies to reduce these expenses and improve feed efficiency (FE) has become increasingly important for the dairy sheep industry. This research investigates the impact of nutritional protein restriction (NPR) during prepuberty and FE on the milk transcriptome of dairy Assaf ewes (sampled during the first lactation). To this end, we first compared transcriptomic differences between NPR and control ewes. Subsequently, we evaluated gene expression differences between ewes with divergent FE, using feed conversion ratio (FCR), residual feed intake (RFI), and consensus classifications of high- and low-FE animals for both indices. Lastly, we assessed milk gene expression as a predictor of FE phenotype using random forest. No effect was found for the prepubertal NPR on milk performance or FE. Moreover, at the milk transcriptome level, only one gene, HBB, was differentially expressed between the NPR (n = 14) and the control group (n = 14). Further, the transcriptomic analysis between divergent FE sheep revealed 114 differentially expressed genes (DEGs) for the RFI index (high-FE_RFI = 10 vs low-FE_RFI = 10), 244 for FCR (high-FE_FCR = 10 vs low-FE_FCR = 10), and 1,016 DEGs between divergent consensus ewes for both indices (high-FE_consensus = 8 vs low-FE_consensus = 8). These results underscore the critical role of the selected FE indices for RNA-Seq analyses, revealing that consensus divergent animals for both indices maximise differences in transcriptomic responses. Genes overexpressed in high-FE_consensus ewes were associated with milk production and mammary gland development, while low-FE_consensus genes were linked to higher metabolic expenditure for tissue organisation and repair.
The best prediction accuracy for FE phenotype using random forest was obtained for a set of 44 genes consistently differentially expressed across lactations, with Spearman correlations of 0.37 and 0.22 for FCR and RFI, respectively. These findings provide insights into potential sustainability strategies for dairy sheep, highlighting the utility of transcriptomic markers as FE proxies.
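The Spearman correlations reported above measure how well the ranking of predicted FE values agrees with the ranking of observed values. The following is a minimal sketch of that statistic using only the Python standard library; the function names and the example values are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical observed and predicted feed conversion ratios for five ewes.
observed_fcr = [1.2, 1.5, 1.1, 1.9, 1.4]
predicted_fcr = [1.3, 1.4, 1.2, 1.6, 1.7]
rho = spearman(observed_fcr, predicted_fcr)
```

Because the statistic depends only on ranks, it rewards a predictor that orders animals correctly even when the predicted magnitudes are off.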
Affiliation(s)
- A Suárez-Vega
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 Leon, Spain
- B Gutiérrez-Gil
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 Leon, Spain
- P A S Fonseca
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 Leon, Spain
- G Hervás
  - Instituto de Ganadería de Montaña (CSIC-University of León), Finca Marzanas s/n, 24346 Grulleros, León, Spain
- R Pelayo
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 Leon, Spain
- P G Toral
  - Instituto de Ganadería de Montaña (CSIC-University of León), Finca Marzanas s/n, 24346 Grulleros, León, Spain
- H Marina
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 Leon, Spain
- P de Frutos
  - Instituto de Ganadería de Montaña (CSIC-University of León), Finca Marzanas s/n, 24346 Grulleros, León, Spain
- J J Arranz
  - Dpto. Producción Animal, Facultad de Veterinaria, Universidad de León, Campus de Vegazana s/n, 24007 Leon, Spain.
3
Zhang X, Zhang Y, Wang L, Wu G, Pan C. Three novel simple sequence repeats (SSRs) identified by MALDI-TOF-MS method were associated with backfat in pig. Anim Biotechnol 2023; 34:1014-1021. [PMID: 35048796 DOI: 10.1080/10495398.2021.2009845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
Backfat is an important and highly heritable economic trait, but it is difficult to evaluate. Thus, it is of great significance to explore the optimal backfat thickness of pigs by using marker-assisted selection (MAS) to speed up the breeding process and improve economic efficiency. This study aimed to investigate the relationship between genetic variations (e.g., SSRs) and backfat of Qinghai Bamei pigs using MALDI-TOF Mass Spectrometry (MALDI-TOF-MS). Herein, five alternative SSR loci (namely V1, V2, V3, V4 and V5) were selected for subsequent detection. The results suggested that 3 (141-, 143- and 145-), 3 (128-, 130- and 132-), 2 (160- and 162-), 2 (136- and 139-) and 3 (170-, 184- and 192-) alleles of V1, V2, V3, V4 and V5 were found, respectively. Subsequent analysis showed that the five SSRs were in linkage equilibrium and that Hap19 (141-/132-/160-/139-/192-) had the highest haplotype frequency (13.1%). Among these five SSR loci, V1, V2 and V3 were significantly associated with the backfat of Qinghai Bamei sows. These findings enriched the study of SSRs in Qinghai Bamei pigs, and (AC)n (Chr15:85485851-85485995), (AC)n (Chr10:52724583-52724713) and (TG)n (Chr4:90732644-90732802) could be utilized as candidate loci for MAS in the pig industry.
HIGHLIGHTS
- Five novel SSR loci were identified in pigs through MALDI-TOF MS.
- V1, V2 and V3 loci were significantly associated with the backfat of pigs.
Affiliation(s)
- Xuelian Zhang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Yanghai Zhang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
  - Meat Science and Muscle Biology Laboratory, Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Lei Wang
  - College of Animal Science and Veterinary Medicine, Qinghai University, Xining, China
- Guofang Wu
  - College of Animal Science and Veterinary Medicine, Qinghai University, Xining, China
- Chuanying Pan
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
4
Suárez-Vega A, Frutos P, Gutiérrez-Gil B, Esteban-Blanco C, Toral PG, Arranz JJ, Hervás G. Feed efficiency in dairy sheep: An insight from the milk transcriptome. Front Vet Sci 2023; 10:1122953. [PMID: 37077950 PMCID: PMC10106586 DOI: 10.3389/fvets.2023.1122953] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/13/2022] [Accepted: 03/16/2023] [Indexed: 04/05/2023] Open
Abstract
Introduction: As higher feed efficiency in dairy ruminants means a higher capability to transform feed nutrients into milk and milk components, differences in feed efficiency are expected to be partly linked to changes in the physiology of the mammary glands. Therefore, this study aimed to determine the biological functions and key regulatory genes associated with feed efficiency in dairy sheep using the milk somatic cell transcriptome.
Material and methods: RNA-Seq data from high (H-FE, n = 8) and low (L-FE, n = 8) feed efficiency ewes were compared through differential expression analysis (DEA) and sparse Partial Least Square-Discriminant analysis (sPLS-DA).
Results: In the DEA, 79 genes were identified as differentially expressed between both conditions, while the sPLS-DA identified 261 predictive genes [variable importance in projection (VIP) > 2] that discriminated H-FE and L-FE sheep.
Discussion: The DEA between sheep with divergent feed efficiency allowed the identification of genes associated with the immune system and stress in L-FE animals. In addition, the sPLS-DA approach revealed the importance of genes involved in cell division (e.g., KIF4A and PRC1) and cellular lipid metabolic process (e.g., LPL, SCD, GPAM, and ACOX3) for the H-FE sheep in the lactating mammary gland transcriptome. A set of discriminant genes, commonly identified by the two statistical approaches, was also detected, including some involved in cell proliferation (e.g., SESN2, KIF20A, or TOP2A) or encoding heat-shock proteins (HSPB1). These results provide novel insights into the biological basis of feed efficiency in dairy sheep, highlighting the informative potential of the mammary gland transcriptome as a target tissue and revealing the usefulness of combining univariate and multivariate analysis approaches to elucidate the molecular mechanisms controlling complex traits.
Affiliation(s)
- Aroa Suárez-Vega
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, León, Spain
- Pilar Frutos
  - Instituto de Ganadería de Montaña (CSIC-Universidad de León), Grulleros, León, Spain
- Beatriz Gutiérrez-Gil
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, León, Spain
- Cristina Esteban-Blanco
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, León, Spain
- Pablo G. Toral
  - Instituto de Ganadería de Montaña (CSIC-Universidad de León), Grulleros, León, Spain
- Juan-José Arranz
  - Departamento de Producción Animal, Facultad de Veterinaria, Universidad de León, León, Spain
  - *Correspondence: Juan-José Arranz
- Gonzalo Hervás
  - Instituto de Ganadería de Montaña (CSIC-Universidad de León), Grulleros, León, Spain
5
Blood-based gene expression as non-lethal tool for inferring salinity-habitat history of European eel (Anguilla anguilla). Sci Rep 2022; 12:22142. [PMID: 36550161 PMCID: PMC9780358 DOI: 10.1038/s41598-022-26302-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022] Open
Abstract
The European eel is a facultative catadromous species, meaning that it can skip the freshwater phase or move between marine and freshwater habitats during its continental life stage. Otolith microchemistry, used to determine the habitat use of eel or its salinity history, requires the sacrifice of animals. In this context, blood-based gene expression may represent a non-lethal alternative. In this work, we tested the ability of blood transcriptional profiling to identify the different salinity-habitat histories of European eel. Eels collected from different locations in Norway were classified through otolith microchemistry as freshwater residents (FWR), seawater residents (SWR) or inter-habitat shifters (IHS). We detected 3451 differentially expressed genes from blood by comparing FWR and SWR groups, and then used that subset of genes in a machine learning approach (i.e., random forest) to the extended FWR, SWR, and IHS group. Random forest correctly classified 100% of FWR and SWR and 83% of the IHS using a minimum of 30 genes. The implementation of this non-lethal approach may replace otolith-based microchemistry analysis for the general assessment of life-history tactics in European eels. Overall, this approach is promising for the replacement or reduction of other lethal analyses in determining certain fish traits.
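The per-group classification rates quoted above (100% for FWR and SWR, 83% for IHS) are per-class recalls: the share of animals in each true group that the classifier assigned to that group. A minimal sketch of that calculation follows; the group sizes and predictions are hypothetical, since the abstract does not report them.

```python
from collections import Counter


def per_class_recall(true_labels, predicted_labels):
    """Fraction of each true class that was predicted as that class."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical labels: 4 FWR, 4 SWR and 6 IHS eels, with one IHS eel
# misclassified as a freshwater resident.
true_groups = ["FWR"] * 4 + ["SWR"] * 4 + ["IHS"] * 6
predicted_groups = ["FWR"] * 4 + ["SWR"] * 4 + ["IHS"] * 5 + ["FWR"]
recall = per_class_recall(true_groups, predicted_groups)
```

Reporting recall per group, rather than one overall accuracy, keeps a small or difficult group such as IHS from being masked by the groups that are easy to separate.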
6
Shetta O, Niranjan M, Dasmahapatra S. Convex Multi-View Clustering Via Robust Low Rank Approximation With Application to Multi-Omic Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3340-3352. [PMID: 34705655 DOI: 10.1109/tcbb.2021.3122961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Recent advances in high throughput technologies have made large amounts of biomedical omics data accessible to the scientific community. Single omic data clustering has proved its impact in the biomedical and biological research fields. Multi-omic data clustering and multi-omic data integration techniques have shown improved clustering performance and biological insight. Cancer subtype clustering is an important task in the medical field to be able to identify a suitable treatment procedure and prognosis for cancer patients. State of the art multi-view clustering methods are based on non-convex objectives which only guarantee non-global solutions that are high in computational complexity. Only a few convex multi-view methods are present. However, their models do not take into account the intrinsic manifold structure of the data. In this paper, we introduce a convex graph regularized multi-view clustering method that is robust to outliers. We compare our algorithm to state of the art convex and non-convex multi-view and single view clustering methods, and show its superiority in clustering cancer subtypes on publicly available cancer genomic datasets from the TCGA repository. We also show our method's better ability to potentially discover cancer subtypes compared to other state of the art multi-view methods.
7
8
Bin Satter K, Ramsey Z, Tran PMH, Hopkins D, Bearden G, Richardson KP, Terris MK, Savage NM, Kavuri SK, Purohit S. Development of a Single Molecule Counting Assay to Differentiate Chromophobe Renal Cancer and Oncocytoma in Clinics. Cancers (Basel) 2022; 14:3242. [PMID: 35805014 PMCID: PMC9265083 DOI: 10.3390/cancers14133242] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2022] [Revised: 06/28/2022] [Accepted: 06/28/2022] [Indexed: 02/01/2023] Open
Abstract
Malignant chromophobe renal cancer (chRCC) and benign oncocytoma (RO) are two renal tumor types that are difficult to differentiate using histology and immunohistochemistry-based methods because of their similarity in appearance. We previously developed a transcriptomics-based classification pipeline with the "Chromophobe-Oncocytoma Gene Signature" (COGS) on a single-molecule counting platform. Renal cancer patients (n = 32, chRCC = 17, RO = 15) were recruited from Augusta University Medical Center (AUMC). Formalin-fixed paraffin-embedded (FFPE) blocks from their excised tumors were collected. We created a custom single-molecule counting code set for COGS to assay RNA from FFPE blocks. Utilizing hematoxylin-eosin stain, pathologists were able to correctly classify these tumor types (91.8%). Our unsupervised learning with UMAP (uniform manifold approximation and projection, accuracy = 0.97) and hierarchical clustering (accuracy = 1.0) identified two clusters congruent with their histology. We next developed and compared four supervised models (random forest, support vector machine, generalized linear model with L2 regularization, and supervised UMAP). Supervised UMAP classified all the cases correctly (sensitivity = 1, specificity = 1, accuracy = 1), followed by random forest models (sensitivity = 0.84, specificity = 1, accuracy = 1). This pipeline can be used as a clinical tool by pathologists to differentiate chRCC from RO.
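Sensitivity, specificity and accuracy, as reported for the classifiers above, all derive from the same four confusion-matrix counts. The sketch below shows those definitions in plain Python. The class sizes (17 chRCC, 15 RO) come from the abstract; treating chRCC as the positive class and the two misclassified cases are assumptions made only for illustration.

```python
def binary_metrics(true_labels, predicted_labels, positive):
    """Compute sensitivity, specificity and accuracy for one positive class."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive:
            if guess == positive:
                tp += 1  # positive case correctly detected
            else:
                fn += 1  # positive case missed
        else:
            if guess == positive:
                fp += 1  # negative case wrongly flagged
            else:
                tn += 1  # negative case correctly rejected
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical predictions: two of the 17 chRCC cases are called RO,
# and all 15 RO cases are called correctly.
true_types = ["chRCC"] * 17 + ["RO"] * 15
predicted_types = ["chRCC"] * 15 + ["RO"] * 2 + ["RO"] * 15
metrics = binary_metrics(true_types, predicted_types, positive="chRCC")
```

In this hypothetical case the specificity is 1 while the sensitivity is 15/17, and the accuracy of 30/32 lies between the two, weighted by the class sizes.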
Affiliation(s)
- Khaled Bin Satter
  - Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, 1120, 15th St., Augusta, GA 30912, USA; (K.B.S.); (P.M.H.T.); (D.H.); (G.B.); (K.P.R.)
- Zach Ramsey
  - Department of Pathology, Medical College of Georgia, Augusta University, 1120 15th St., Augusta, GA 30912, USA; (Z.R.); (N.M.S.); (S.K.K.)
- Paul M. H. Tran
  - Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, 1120, 15th St., Augusta, GA 30912, USA; (K.B.S.); (P.M.H.T.); (D.H.); (G.B.); (K.P.R.)
- Diane Hopkins
  - Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, 1120, 15th St., Augusta, GA 30912, USA; (K.B.S.); (P.M.H.T.); (D.H.); (G.B.); (K.P.R.)
- Gregory Bearden
  - Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, 1120, 15th St., Augusta, GA 30912, USA; (K.B.S.); (P.M.H.T.); (D.H.); (G.B.); (K.P.R.)
- Katherine P. Richardson
  - Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, 1120, 15th St., Augusta, GA 30912, USA; (K.B.S.); (P.M.H.T.); (D.H.); (G.B.); (K.P.R.)
- Martha K. Terris
  - Department of Urology, Medical College of Georgia, Augusta University, 1120 15th St., Augusta, GA 30912, USA;
- Natasha M. Savage
  - Department of Pathology, Medical College of Georgia, Augusta University, 1120 15th St., Augusta, GA 30912, USA; (Z.R.); (N.M.S.); (S.K.K.)
- Sravan K. Kavuri
  - Department of Pathology, Medical College of Georgia, Augusta University, 1120 15th St., Augusta, GA 30912, USA; (Z.R.); (N.M.S.); (S.K.K.)
- Sharad Purohit
  - Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, 1120, 15th St., Augusta, GA 30912, USA; (K.B.S.); (P.M.H.T.); (D.H.); (G.B.); (K.P.R.)
  - Department of Obstetrics and Gynecology, Medical College of Georgia, Augusta University, 1120 15th St., Augusta, GA 30912, USA
  - Department of Undergraduate Health Professionals, College of Allied Health Sciences, Augusta University, 1120 15th St., Augusta, GA 30912, USA
9
Transcriptome profile analysis identifies candidate genes of feed utilization in Dorper and Small Tail Han Crossbred sheep. Small Rumin Res 2022. [DOI: 10.1016/j.smallrumres.2022.106788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
10
Verma P, Shakya M. Machine learning model for predicting Major Depressive Disorder using RNA-Seq data: optimization of classification approach. Cogn Neurodyn 2022; 16:443-453. [PMID: 35401859 PMCID: PMC8934793 DOI: 10.1007/s11571-021-09724-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2021] [Revised: 08/28/2021] [Accepted: 09/12/2021] [Indexed: 10/20/2022] Open
Abstract
Among human brain disorders, Major Depressive Disorder (MDD) is seen as a lethal disease in which a person may go to the extent of suicidal behavior. Physical detection of MDD patients is less precise, but machine learning can aid in improved classification of the disease. The present research used RNA-seq data from three classes to identify differentially expressed genes (DEGs) and then trained a random forest (RF) machine learning model on the key gene data. The three classes in the sample are 29 CON (sudden-death healthy controls), 21 MDD-S (MDD suicides), and 9 MDD (non-suicide MDD). PCA analysis yielded 99 key genes, which account for 47.1% of the data variability. Model training on these 99 genes indicated improved classification. The RF classification model has an accuracy of 61.11% on test data and 97.56% on training data. The RF method also offered greater accuracy than the KNN method. The 99 genes were annotated using the DAVID and ClueGO packages. Some of the important pathways and functions observed in the study were glutamatergic synapse, GABA receptor activation, long-term synaptic depression, and morphine addiction. Of the 99 genes, four (DLGAP1, GNG2, GRIA1, and GRIA4) were found to be predominantly involved in the glutamatergic synapse pathway. Another substantial link was observed in GABA receptor activation, involving two genes, GABBR2 and GNG2. The genes found responsible for long-term synaptic depression were GRIA1, MAPT, and PTEN. The morphine addiction pathway comprised three genes, namely GABBR2, GNG2, and PDE4D. For massive datasets, this approach will act as the gold standard. The cases of CON, MDD, and MDD-S are physically distinct. There was dysregulation in the expression level of 12 genes.
These 12 genes act as possible biomarkers for Major Depressive Disorder and open up a new path for further study of depressed subjects.
Affiliation(s)
- Pragya Verma
  - Department of Mathematics, Bioinformatics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, 462003 India
- Madhvi Shakya
  - Department of Mathematics, Bioinformatics and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, 462003 India
11
Krishnakumar R, Ruffing AM. OperonSEQer: A set of machine-learning algorithms with threshold voting for detection of operon pairs using short-read RNA-sequencing data. PLoS Comput Biol 2022; 18:e1009731. [PMID: 34986143 PMCID: PMC8765615 DOI: 10.1371/journal.pcbi.1009731] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/29/2021] [Revised: 01/18/2022] [Accepted: 12/07/2021] [Indexed: 11/19/2022] Open
Abstract
Operon prediction in prokaryotes is critical not only for understanding the regulation of endogenous gene expression, but also for exogenous targeting of genes using newly developed tools such as CRISPR-based gene modulation. A number of methods have used transcriptomics data to predict operons, based on the premise that contiguous genes in an operon will be expressed at similar levels. While promising results have been observed using these methods, most of them do not address uncertainty caused by technical variability between experiments, which is especially relevant when the amount of data available is small. In addition, many existing methods do not provide the flexibility to determine the stringency with which genes should be evaluated for being in an operon pair. We present OperonSEQer, a set of machine learning algorithms that uses the statistic and p-value from a non-parametric analysis of variance test (Kruskal-Wallis) to determine the likelihood that two adjacent genes are expressed from the same RNA molecule. We implement a voting system to allow users to choose the stringency of operon calls depending on whether their priority is high recall or high specificity. In addition, we provide the code so that users can retrain the algorithm and re-establish hyperparameters based on any data they choose, allowing for this method to be expanded as additional data is generated. We show that our approach detects operon pairs that are missed by current methods by comparing our predictions to publicly available long-read sequencing data. OperonSEQer therefore improves on existing methods in terms of accuracy, flexibility, and adaptability.
Bacteria and archaea, single-cell organisms collectively known as prokaryotes, live in all imaginable environments and comprise the majority of living organisms on this planet. Prokaryotes play a critical role in the homeostasis of multicellular organisms (such as animals and plants) and ecosystems. In addition, bacteria can be pathogenic and cause a variety of diseases in these same hosts and ecosystems. In short, understanding the biology and molecular functions of bacteria and archaea and devising mechanisms to engineer and optimize their properties are critical scientific endeavors with significant implications in healthcare, agriculture, manufacturing, and climate science, among others. One major molecular difference between unicellular and multicellular organisms is the way they express genes: multicellular organisms make individual RNA molecules for each gene, while prokaryotes express operons (i.e., a group of genes coding functionally related proteins) in contiguous polycistronic RNA molecules. Understanding which genes exist within operons is critical for elucidating basic biology and for engineering organisms. In this work, we use a combination of statistical and machine learning-based methods to use next-generation sequencing data to predict operon structure across a range of prokaryotes. Our method provides an easily implemented, robust, accurate, and flexible way to determine operon structure in an organism-agnostic manner using readily available data.
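Two ideas from the abstract above lend themselves to a small illustration: the Kruskal-Wallis statistic used as a model input, and threshold voting across several models. The sketch below is not OperonSEQer's code. The function names, the coverage values, the choice of what is compared, and the six model calls are all hypothetical, and the H statistic is computed without the tie correction that statistical packages usually apply.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic using average ranks (no tie correction)."""
    pooled = sorted(value for group in groups for value in group)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        # Tied values share the mean of the positions they occupy.
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    n = len(pooled)
    rank_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    return 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)


def threshold_vote(model_calls, min_votes):
    """Call an operon pair when at least min_votes models say yes."""
    return sum(model_calls) >= min_votes


# Hypothetical read-coverage samples for two adjacent genes.
coverage_gene_a = [1, 2, 3]
coverage_gene_b = [4, 5, 6]
h_stat = kruskal_wallis_h(coverage_gene_a, coverage_gene_b)

# Hypothetical yes/no calls from six trained models for one gene pair.
calls = [True, True, False, True, False, False]
lenient_call = threshold_vote(calls, min_votes=2)
strict_call = threshold_vote(calls, min_votes=5)
```

A low vote threshold accepts a pair that only a few models support, which favors recall; a high threshold demands broad agreement, which favors specificity. That is the trade-off the abstract says users can choose between.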
Affiliation(s)
- Raga Krishnakumar
  - Systems Biology Department, Sandia National Laboratories, Livermore, California, United States of America
- Anne M. Ruffing
  - Molecular and Microbiology Department, Sandia National Laboratories, Albuquerque, New Mexico, United States of America
12
Passamonti MM, Somenzi E, Barbato M, Chillemi G, Colli L, Joost S, Milanesi M, Negrini R, Santini M, Vajana E, Williams JL, Ajmone-Marsan P. The Quest for Genes Involved in Adaptation to Climate Change in Ruminant Livestock. Animals (Basel) 2021; 11:2833. [PMID: 34679854 PMCID: PMC8532622 DOI: 10.3390/ani11102833] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/19/2021] [Revised: 09/21/2021] [Accepted: 09/23/2021] [Indexed: 12/14/2022] Open
Abstract
Livestock radiated out from domestication centres to most regions of the world, gradually adapting to diverse environments, from very hot to sub-zero temperatures and from wet and humid conditions to deserts. The climate is changing; generally global temperature is increasing, although there are also more extreme cold periods, storms, and higher solar radiation. These changes impact livestock welfare and productivity. This review describes advances in the methodology for studying livestock genomes and the impact of the environment on animal production, giving examples of discoveries made. Sequencing livestock genomes has facilitated genome-wide association studies to localize genes controlling many traits, and population genetics has identified genomic regions under selection or introgressed from one breed into another to improve production or facilitate adaptation. Landscape genomics, which combines global positioning and genomics, has identified genomic features that enable animals to adapt to local environments. Combining the advances in genomics and methods for predicting changes in climate is generating an explosion of data which calls for innovations in the way big data sets are treated. Artificial intelligence and machine learning are now being used to study the interactions between the genome and the environment to identify historic effects on the genome and to model future scenarios.
Collapse
Affiliation(s)
- Matilde Maria Passamonti
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Elisa Somenzi
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Mario Barbato
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Giovanni Chillemi
- Department for Innovation in Biological, Agro-Food and Forest Systems–DIBAF, Università Della Tuscia, Via S. Camillo de Lellis snc, 01100 Viterbo, Italy; (G.C.); (M.M.)
- Licia Colli
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Research Center on Biodiversity and Ancient DNA—BioDNA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
- Stéphane Joost
- Laboratory of Geographic Information Systems (LASIG), School of Architecture, Civil and Environmental Engineering (ENAC), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; (S.J.); (E.V.)
- Marco Milanesi
- Department for Innovation in Biological, Agro-Food and Forest Systems–DIBAF, Università Della Tuscia, Via S. Camillo de Lellis snc, 01100 Viterbo, Italy; (G.C.); (M.M.)
- Riccardo Negrini
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Monia Santini
- Impacts on Agriculture, Forests and Ecosystem Services (IAFES) Division, Fondazione Centro Euro-Mediterraneo Sui Cambiamenti Climatici (CMCC), Viale Trieste 127, 01100 Viterbo, Italy
- Elia Vajana
- Laboratory of Geographic Information Systems (LASIG), School of Architecture, Civil and Environmental Engineering (ENAC), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; (S.J.); (E.V.)
- John Lewis Williams
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Paolo Ajmone-Marsan
- Department of Animal Science, Food and Nutrition—DIANA, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy; (M.M.P.); (E.S.); (M.B.); (L.C.); (R.N.); (J.L.W.)
- Nutrigenomics and Proteomics Research Center—PRONUTRIGEN, Università Cattolica del Sacro Cuore, Via Emilia Parmense, 84, 29122 Piacenza, Italy
13
Messad F, Louveau I, Renaudeau D, Gilbert H, Gondret F. Analysis of merged whole blood transcriptomic datasets to identify circulating molecular biomarkers of feed efficiency in growing pigs. BMC Genomics 2021; 22:501. [PMID: 34217223 PMCID: PMC8254903 DOI: 10.1186/s12864-021-07843-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2020] [Accepted: 06/24/2021] [Indexed: 11/10/2022]
Abstract
Background: Improving feed efficiency (FE) is an important goal because of its economic and environmental significance for farm animal production. The FE phenotype is complex and is based on measurements of individual feed consumption and average daily gain during a test period, which are costly and time-consuming to obtain. Identifying reliable predictors of FE is a strategy to reduce phenotyping efforts. Results: Gene expression data of the whole blood from three independent experiments were combined and analyzed by machine learning algorithms to propose molecular biomarkers of FE traits in growing pigs. These datasets included Large White pigs from two lines divergently selected for residual feed intake (RFI), a measure of net FE, for which individual feed conversion ratio (FCR) and blood microarray data were available. Merging the three datasets allowed FCR values (Mean = 2.85; Min = 1.92; Max = 5.00) to be considered for a total of n = 148 pigs, with a large range of body weights (15 to 115 kg) and different test period durations (2 to 9 weeks). Random forest (RF) and gradient tree boosting (GTB) were applied to the whole blood transcripts (26,687 annotated molecular probes) to identify the most important variables for binary classification on RFI groups and a quantitative prediction of FCR, respectively. The dataset was split into learning (n = 74) and validation (n = 74) sets. With iterative steps of variable selection, about three hundred (328 to 391) molecular probes participating in various biological pathways were identified as important predictors of RFI or FCR. With the GTB algorithm, simpler models were proposed that combined 34 unique expressed genes to classify pigs into RFI groups (100% success) and 25 unique expressed genes to predict FCR values (R2 = 0.80, RMSE = 8%). The accuracy of the RF models was slightly lower in classification and markedly lower in regression.
Conclusion: From small subsets of genes expressed in the whole blood, it is possible to predict the binary class and the individual value of feed efficiency. These predictive models offer good prospects for identifying animals with higher feed efficiency in precision farming applications. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07843-4.
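The regression metrics quoted in this abstract (R2 and an RMSE given as a percentage) can be illustrated with a minimal sketch. The FCR values below are hypothetical and are not taken from the study, and expressing RMSE relative to the mean observed value is an assumption, since the abstract does not say how the percentage was derived.

```python
from math import sqrt


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_rmse(observed, predicted):
    """RMSE divided by the mean observed value (assumed normalisation)."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_obs = sum(observed) / len(observed)
    return sqrt(mse) / mean_obs


# Hypothetical observed and predicted feed conversion ratios for four pigs.
observed_fcr = [2.0, 2.5, 3.0, 3.5]
predicted_fcr = [2.1, 2.4, 3.2, 3.3]

r2 = r_squared(observed_fcr, predicted_fcr)
rel_rmse = relative_rmse(observed_fcr, predicted_fcr)
print(f"R2 = {r2:.2f}, relative RMSE = {rel_rmse:.1%}")
```

In practice both metrics would be computed on the validation set only, so that they reflect prediction rather than fit.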
Affiliation(s)
- Farouk Messad
- PEGASE, INRAE, Institut Agro, 35590, Saint-Gilles, France
- Hélène Gilbert
- GenPhySE, INRAE, INP-ENVT, 31326, Castanet Tolosan, France
14
Chen W, Alexandre PA, Ribeiro G, Fukumasu H, Sun W, Reverter A, Li Y. Identification of Predictor Genes for Feed Efficiency in Beef Cattle by Applying Machine Learning Methods to Multi-Tissue Transcriptome Data. Front Genet 2021; 12:619857. [PMID: 33664767 PMCID: PMC7921797 DOI: 10.3389/fgene.2021.619857] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/21/2020] [Accepted: 01/15/2021] [Indexed: 12/22/2022]
Abstract
Machine learning (ML) methods have shown promising results in identifying genes when applied to large transcriptome datasets. However, no attempt has been made to compare the performance of combinations of different ML methods in predicting high feed efficiency (HFE) and low feed efficiency (LFE) animals. In this study, using RNA sequencing data from five tissues (adrenal gland, hypothalamus, liver, skeletal muscle, and pituitary) of nine HFE and nine LFE Nellore bulls, we evaluated the prediction accuracies of five analytical methods in classifying FE animals. These included two conventional methods for differential gene expression (DGE) analysis (t-test and edgeR) as benchmarks, and three ML methods: Random Forests (RFs), Extreme Gradient Boosting (XGBoost), and a combination of RF and XGBoost (RX). The utility of a subset of candidate genes selected by each method for classifying FE animals was assessed with a support vector machine (SVM). Among all methods, the smallest subset of genes (117), identified by RX, outperformed those chosen by t-test, edgeR, RF, or XGBoost in classification accuracy. Gene co-expression network analysis confirmed the interactivity among these genes and showed that their relevance within the network was related to their ML-based prediction ranking. The results demonstrate the great potential of applying a combination of ML methods to large transcriptome datasets to identify biologically important genes for accurately classifying FE animals.
Affiliation(s)
- Weihao Chen
- College of Animal Science and Technology, Yangzhou University, Yangzhou, China; CSIRO Agriculture and Food, St Lucia, QLD, Australia
- Gabriela Ribeiro
- School of Animal Science and Food Engineering, University of São Paulo, Pirassununga, Brazil
- Heidge Fukumasu
- School of Animal Science and Food Engineering, University of São Paulo, Pirassununga, Brazil
- Wei Sun
- College of Animal Science and Technology, Yangzhou University, Yangzhou, China; Institute of Agriculture Science and Technology Development, Yangzhou University, Yangzhou, China; Joint International Research Laboratory of Agriculture and Agri-Product Safety of Ministry of Education of China, Yangzhou University, Yangzhou, China
- Yutao Li
- CSIRO Agriculture and Food, St Lucia, QLD, Australia
15
Crisci E, Moroldo M, Vu Manh TP, Mohammad A, Jourdren L, Urien C, Bouguyon E, Bordet E, Bevilacqua C, Bourge M, Pezant J, Pléau A, Boulesteix O, Schwartz I, Bertho N, Giuffra E. Distinctive Cellular and Metabolic Reprogramming in Porcine Lung Mononuclear Phagocytes Infected With Type 1 PRRSV Strains. Front Immunol 2020; 11:588411. [PMID: 33365028 PMCID: PMC7750501 DOI: 10.3389/fimmu.2020.588411] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/29/2020] [Accepted: 10/19/2020] [Indexed: 01/17/2023]
Abstract
Porcine reproductive and respiratory syndrome (PRRS) has an extensive impact on pig production. The causative virus (PRRSV) is divided into two species, PRRSV-1 (European origin) and PRRSV-2 (North American origin). Within PRRSV-1, PRRSV-1.3 strains, such as Lena, are more pathogenic than PRRSV-1.1 strains, such as Flanders 13 (FL13). To date, the molecular interactions of PRRSV with primary lung mononuclear phagocyte (MNP) subtypes, including conventional dendritic cell types 1 (cDC1) and 2 (cDC2), monocyte-derived DCs (moDC), and pulmonary intravascular macrophages (PIM), have not been thoroughly investigated. Here, we analyze the transcriptome profiles of in vivo FL13-infected parenchymal MNP subpopulations and of in vitro FL13- and Lena-infected parenchymal MNP. The cell-specific expression profiles of in vivo sorted cells correlated with those of their murine counterparts (AM, cDC1, cDC2, moDC), with the exception of PIM. Both in vivo and in vitro, FL13 infection altered the expression of a low number of host genes, and in vitro infection with Lena confirmed the higher ability of this strain to modulate the host response. Machine learning (ML) and gene set enrichment analysis (GSEA) revealed additional relevant genes and pathways modulated by FL13 infection that were not identified by conventional analyses. GSEA increased the number of cellular pathways enriched in the FL13 data set, but ML allowed a more complete understanding of functional profiles during FL13 in vitro infection. The data indicate that cellular reprogramming differs upon Lena and FL13 infection and that the latter might keep antiviral and inflammatory macrophage/DC functions silent. Although the slow replication kinetics of FL13 likely contribute to differences in cellular gene expression, the data suggest distinct mechanisms of interaction of the two viruses with the innate immune system during early infection.
Affiliation(s)
- Elisa Crisci
- Université Paris Saclay, INRAE, AgroParisTech, GABI, Jouy-en-Josas, France
- Marco Moroldo
- Université Paris Saclay, INRAE, AgroParisTech, GABI, Jouy-en-Josas, France
- Ammara Mohammad
- Genomics Core Facility, Institut de Biologie de l'ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, Paris, France
- Laurent Jourdren
- Genomics Core Facility, Institut de Biologie de l'ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, Paris, France
- Celine Urien
- Virologie et Immunologie Moléculaire, INRAE, Université Paris-Saclay, Jouy-en-Josas, France
- Edwige Bouguyon
- Virologie et Immunologie Moléculaire, INRAE, Université Paris-Saclay, Jouy-en-Josas, France
- Elise Bordet
- Virologie et Immunologie Moléculaire, INRAE, Université Paris-Saclay, Jouy-en-Josas, France
- Claudia Bevilacqua
- Université Paris Saclay, INRAE, AgroParisTech, GABI, Jouy-en-Josas, France
- Mickael Bourge
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France
- Jérémy Pezant
- Plate-Forme d'Infectiologie Expérimentale-PFIE-UE1277, Centre Val de Loire, INRAE, Nouzilly, France
- Alexis Pléau
- Plate-Forme d'Infectiologie Expérimentale-PFIE-UE1277, Centre Val de Loire, INRAE, Nouzilly, France
- Olivier Boulesteix
- Plate-Forme d'Infectiologie Expérimentale-PFIE-UE1277, Centre Val de Loire, INRAE, Nouzilly, France
- Isabelle Schwartz
- Virologie et Immunologie Moléculaire, INRAE, Université Paris-Saclay, Jouy-en-Josas, France
- Nicolas Bertho
- Virologie et Immunologie Moléculaire, INRAE, Université Paris-Saclay, Jouy-en-Josas, France
- Elisabetta Giuffra
- Université Paris Saclay, INRAE, AgroParisTech, GABI, Jouy-en-Josas, France
16
17
Shetta O, Niranjan M. Robust subspace methods for outlier detection in genomic data circumvents the curse of dimensionality. Royal Society Open Science 2020; 7:190714. [PMID: 32257299 PMCID: PMC7062061 DOI: 10.1098/rsos.190714] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/16/2019] [Accepted: 12/12/2019] [Indexed: 06/11/2023]
Abstract
The application of machine learning to inference problems in biology is dominated by supervised learning problems of regression and classification, and unsupervised learning problems of clustering and variants of low-dimensional projections for visualization. A class of problems that has not gained much attention is the detection of outliers in datasets, which arise from causes such as gross experimental, reporting or labelling errors. Outliers could also be small parts of a dataset that are functionally distinct from the majority of a population. Outlier data are often identified by considering the probability density of normal data and comparing data likelihoods against some threshold. This classical approach suffers from the curse of dimensionality, which is a serious problem for omics data, which are often very high-dimensional. We develop an outlier detection method based on structured low-rank approximation methods. The objective function includes a regularizer based on neighbourhood information captured in the graph Laplacian. Results on publicly available genomic data show that our method robustly detects outliers whereas a density-based method fails even at moderate dimensions. Moreover, we show that our method has better clustering and visualization performance on the recovered low-dimensional projection when compared with popular dimensionality reduction techniques.
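The classical density-threshold approach that this abstract contrasts with its own method can be sketched in one dimension. This is only an illustration of the baseline idea, not the authors' low-rank method; the Gaussian density model, the sample values, and the threshold of -3.0 are all assumptions chosen for the example.

```python
import math
import statistics


def gaussian_log_density(x, mean, std):
    """Log of the univariate normal density at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def flag_outliers(values, log_density_threshold):
    """Flag points whose log-density under a fitted Gaussian is below the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation of the sample
    return [gaussian_log_density(v, mean, std) < log_density_threshold for v in values]


# Hypothetical measurements: six similar values and one gross error.
values = [5.0, 5.2, 4.9, 5.1, 4.8, 5.0, 9.0]
flags = flag_outliers(values, log_density_threshold=-3.0)
print(flags)
```

In one dimension a handful of points is enough to estimate the density. With thousands of features and few samples, as is typical for omics data, that estimate becomes unreliable, which is the curse of dimensionality the abstract refers to.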
Affiliation(s)
- Omar Shetta
- Author for correspondence: Omar Shetta e-mail:
18
Ramayo-Caldas Y, Mármol-Sánchez E, Ballester M, Sánchez JP, González-Prendes R, Amills M, Quintanilla R. Integrating genome-wide co-association and gene expression to identify putative regulators and predictors of feed efficiency in pigs. Genet Sel Evol 2019; 51:48. [PMID: 31477014 PMCID: PMC6721172 DOI: 10.1186/s12711-019-0490-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/13/2018] [Accepted: 08/19/2019] [Indexed: 02/08/2023]
Abstract
BACKGROUND: Feed efficiency (FE) has a major impact on the economic sustainability of pig production. We used a systems-based approach that integrates single nucleotide polymorphism (SNP) co-association and gene-expression data to identify candidate genes, biological pathways, and potential predictors of FE in a Duroc pig population. RESULTS: We applied an association weight matrix (AWM) approach to analyse the results from genome-wide association studies (GWAS) for nine FE-associated and production traits using 31K SNPs, defining residual feed intake (RFI) as the target phenotype. The resulting co-association network was formed by 829 SNPs. Additive effects of this SNP panel explained 61% of the phenotypic variance of RFI, and the resulting phenotype prediction accuracy estimated by cross-validation was 0.65 (vs. 0.20 using pedigree-based best linear unbiased prediction and 0.12 using the 31K SNPs). Sixty-eight transcription factor (TF) genes were identified in the co-association network; based on the lossless approach, the putative main regulators were COPS5, GTF2H5, RUNX1, HDAC4, ESR1, USP16, SMARCA2 and GTF2F2. Furthermore, gene expression data of the gluteus medius muscle were explored through differential expression and multivariate analyses. A list of candidate genes showing functional and/or structural associations with FE was compiled from the results of both the AWM and gene expression analyses. It included the aforementioned TF genes and others that have key roles in metabolism, e.g. ESRRG, RXRG, PPARGC1A, TCF7L2, LHX4, MAML2, NFATC3, NFKBIZ, TCEA1, CDCA7L, LZTFL1 or CBFB. The most enriched biological pathways in this list were associated with behaviour, immunity, nervous system, and neurotransmitters, including melatonin, glutamate receptor, and gustation pathways. Finally, an expression GWAS identified 269 SNPs associated with the candidate genes' expression (eSNPs).
Addition of these eSNPs to the AWM panel of 829 SNPs did not improve the accuracy of genomic predictions. CONCLUSIONS: Candidate genes that have a direct or indirect effect on FE-related traits belong to various biological processes that are mainly related to immunity, behaviour, energy metabolism, and the nervous system. The pituitary gland, hypothalamus and thyroid axis, and estrogen signalling play fundamental roles in the regulation of FE in pigs. The 829 selected SNPs explained 61% of the phenotypic variance of RFI, which is a promising prospect for genetic selection on FE based on molecular prediction.
Affiliation(s)
- Yuliaxis Ramayo-Caldas
- Animal Breeding and Genetics Program, Institute for Research and Technology in Food and Agriculture (IRTA), Torre Marimon, 08140 Caldes de Montbui, Spain
- Emilio Mármol-Sánchez
- Department of Animal Genetics, Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus de la Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Maria Ballester
- Animal Breeding and Genetics Program, Institute for Research and Technology in Food and Agriculture (IRTA), Torre Marimon, 08140 Caldes de Montbui, Spain
- Juan Pablo Sánchez
- Animal Breeding and Genetics Program, Institute for Research and Technology in Food and Agriculture (IRTA), Torre Marimon, 08140 Caldes de Montbui, Spain
- Rayner González-Prendes
- Department of Animal Genetics, Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus de la Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Marcel Amills
- Department of Animal Genetics, Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus de la Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Raquel Quintanilla
- Animal Breeding and Genetics Program, Institute for Research and Technology in Food and Agriculture (IRTA), Torre Marimon, 08140 Caldes de Montbui, Spain
19
Messad F, Louveau I, Koffi B, Gilbert H, Gondret F. Investigation of muscle transcriptomes using gradient boosting machine learning identifies molecular predictors of feed efficiency in growing pigs. BMC Genomics 2019; 20:659. [PMID: 31419934 PMCID: PMC6697907 DOI: 10.1186/s12864-019-6010-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 04/06/2019] [Accepted: 07/30/2019] [Indexed: 01/09/2023]
Abstract
Background: Improving feed efficiency (FE) is a major challenge in pig production. This complex trait is highly variable. Therefore, the identification of predictors of FE may be a relevant strategy to reduce phenotyping efforts in breeding and selection programs. The aim of this study was to investigate the suitability of expressed muscle genes for predicting FE traits in growing pigs. The approach considered different transcriptomics experiments to cover a large range of FE values and identify reliable predictors. Results: Microarray data were obtained from longissimus muscles of two lines divergently selected for residual feed intake (RFI). Pigs (n = 71) from three experiments belonged to generations 6 to 8 of selection, were fed either a diet with a standard composition or a diet rich in fiber and lipids, received feed ad libitum or at a restricted level, and weighed between 80 and 115 kg at slaughter. For each pig, the breeding value for RFI was estimated (RFI-BV), and feed conversion ratio (FCR) and energy-based feed conversion ratio (FCRe) were calculated during the test periods. Gradient boosting algorithms were used on the merged muscle transcriptomes to identify very important predictors of FE traits. About 20,405 annotated molecular probes were commonly expressed in longissimus muscle across experiments. Six to 267 expressed muscle genes covering a variety of biological processes were identified as important predictors for RFI-BV (R2 = 0.63–0.65), FCR (R2 = 0.61–0.70) and FCRe (R2 = 0.49–0.52). The error of prediction was less than 8% for FCR. Altogether, 56 predictors were common to RFI-BV and FCR. Expression levels of 24 target genes were further measured by qPCR. Linear regression confirmed the good accuracy of combining mRNA levels of these genes to fit FE traits (RFI-BV: R2 = 0.73; FCR: R2 = 0.76; FCRe: R2 = 0.75).
A stepwise regression procedure highlighted 10 genes (FKBP5, MUM1, AKAP12, FYN, TMED3, PHKB, TGF, SOCS6, ILR4, and FRAS1) in a linear combination predicting FCR and FCRe. In addition, FKBP5 and the expression levels of five other genes (IGF2, SERINC3, CSRNP3, EZR and RPL16) significantly contributed to RFI-BV. Conclusion: It was possible to identify a few genes expressed in muscle that might be reliable predictors of feed efficiency. Electronic supplementary material: The online version of this article (10.1186/s12864-019-6010-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Farouk Messad
- Pegase, INRA, Agrocampus Ouest, 35590, Saint-Gilles, France
- Basile Koffi
- Pegase, INRA, Agrocampus Ouest, 35590, Saint-Gilles, France