1
Wang G, Lai H, Bi S, Guo D, Zhao X, Chen X, Liu S, Liu X, Su Y, Yi H, Li G. ddRAD‐Seq reveals evolutionary insights into population differentiation and the cryptic phylogeography of Hyporhamphus intermedius in Mainland China. Ecol Evol 2022; 12:e9053. [PMID: 35813915 PMCID: PMC9251877 DOI: 10.1002/ece3.9053] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/08/2021] [Revised: 05/28/2022] [Accepted: 06/08/2022] [Indexed: 11/12/2022] Open
Abstract
Species differentiation and local adaptation in heterogeneous environments have attracted much attention, although little is known about the mechanisms involved. Hyporhamphus intermedius is an anadromous, brackish‐water halfbeak that is widely distributed in coastal areas and hyperdiverse freshwater systems in China, making it an interesting model for research on phylogeography and local adaptation. Here, 156 individuals were sampled at eight sites from heterogeneous aquatic habitats to examine environmental and genetic contributions to phenotypic divergence. Using double‐digest restriction‐site‐associated DNA sequencing (ddRAD‐Seq) in the specimens from the different watersheds, 5498 single nucleotide polymorphisms (SNPs) were found among populations, with obvious population differentiation. We find that present‐day Mainland China populations are structured into distinct genetic clusters stretching from southern and northern ancestries, mirroring geography. Following a transplant event in the Plateau Lakes, virtually no variation in genetic diversity occurred in two populations, even though two main splits were unveiled in the demographic history. Additionally, dorsal and anal fin traits varied widely between the southern group and the others, which highlighted previously unrecognized lineages. We then explore genotype–phenotype–environment associations and predict candidate loci. Subgroup ranges appeared to correspond to geographic regions with heterogeneous hydrological factors, indicating that these features are likely important drivers of diversification. Accordingly, we conclude that genetic and phenotypic polymorphism and a moderate amount of genetic differentiation occurred, which might be ascribed to population subdivision and the impact of abiotic factors.
Affiliation(s)
- Gongpei Wang
  - Guangdong Province Key Laboratory for Aquatic Economic Animals, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat‐Sen University, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou, China
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat‐Sen University, Guangzhou, China
  - Guangdong Provincial Engineering Technology Research Center for Healthy Breeding of Important Economic Fish, Guangzhou, China
- Han Lai
  - Guangdong Province Key Laboratory for Aquatic Economic Animals, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat‐Sen University, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou, China
  - Guangdong Provincial Engineering Technology Research Center for Healthy Breeding of Important Economic Fish, Guangzhou, China
- Sheng Bi
  - Guangdong Province Key Laboratory for Aquatic Economic Animals, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat‐Sen University, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou, China
  - Guangdong Provincial Engineering Technology Research Center for Healthy Breeding of Important Economic Fish, Guangzhou, China
- Dingli Guo
  - Guangdong Province Key Laboratory for Aquatic Economic Animals, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat‐Sen University, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou, China
  - Guangdong Provincial Engineering Technology Research Center for Healthy Breeding of Important Economic Fish, Guangzhou, China
- Xiaopin Zhao
  - Guangdong Province Key Laboratory for Aquatic Economic Animals, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat‐Sen University, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou, China
  - Guangdong Provincial Engineering Technology Research Center for Healthy Breeding of Important Economic Fish, Guangzhou, China
- Xiaoli Chen
  - Guangdong Province Key Laboratory for Aquatic Economic Animals, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat‐Sen University, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou, China
  - Guangdong Provincial Engineering Technology Research Center for Healthy Breeding of Important Economic Fish, Guangzhou, China
- Shuang Liu
  - Guangdong Province Key Laboratory for Aquatic Economic Animals, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat‐Sen University, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou, China
  - Guangdong Provincial Engineering Technology Research Center for Healthy Breeding of Important Economic Fish, Guangzhou, China
- Xuange Liu
  - Guangdong Province Key Laboratory for Aquatic Economic Animals, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat‐Sen University, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou, China
  - Guangdong Provincial Engineering Technology Research Center for Healthy Breeding of Important Economic Fish, Guangzhou, China
- Yuqin Su
  - Guangdong Province Key Laboratory for Aquatic Economic Animals, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat‐Sen University, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou, China
  - Guangdong Provincial Engineering Technology Research Center for Healthy Breeding of Important Economic Fish, Guangzhou, China
- Huadong Yi
  - Guangdong Province Key Laboratory for Aquatic Economic Animals, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat‐Sen University, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou, China
  - Guangdong Provincial Engineering Technology Research Center for Healthy Breeding of Important Economic Fish, Guangzhou, China
- Guifeng Li
  - Guangdong Province Key Laboratory for Aquatic Economic Animals, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat‐Sen University, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou, China
  - Guangdong Provincial Engineering Technology Research Center for Healthy Breeding of Important Economic Fish, Guangzhou, China
2
Palaiokostas C, Anjum A, Jeuthe H, Kurta K, Lopes Pinto F, Koning DJ. A genomic‐based vision on the genetic diversity and key performance traits in selectively bred Arctic charr (Salvelinus alpinus). Evol Appl 2021; 15:565-577. [PMID: 35505879 PMCID: PMC9046918 DOI: 10.1111/eva.13261] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/29/2021] [Revised: 04/19/2021] [Accepted: 05/29/2021] [Indexed: 12/25/2022] Open
Abstract
Routine implementation of genomic information for guiding selection decisions is not yet common in the majority of aquaculture species. Reduced representation sequencing approaches offer a cost‐effective solution for obtaining genome‐wide information in species with a limited availability of genomic resources. In the current study, we implemented double‐digest restriction site‐associated DNA sequencing (ddRAD‐seq) on an Arctic charr strain with the longest known history of selection (approximately 40 years) aiming to improve selection decisions. In total, 1730 animals reared at four different farms in Sweden and spanning year classes 2013–2017 were genotyped using ddRAD‐seq. Approximately 5000 single nucleotide polymorphisms (SNPs) were identified, genetic diversity‐related metrics were estimated, and genome‐wide association studies (GWAS) for body length at different time points and age of sexual maturation were conducted. Low genetic differentiation amongst animals from the different farms was observed based on both the results from pairwise Fst values and principal component analysis (PCA). The existence of associations was investigated between the mean genome‐wide heterozygosity of each full‐sib family (year class 2017) and the corresponding inbreeding coefficient or survival to the eyed stage. A moderate correlation (−0.33) was estimated between the mean observed heterozygosity of each full‐sib family and the corresponding inbreeding coefficient, while no linear association was obtained with the survival to the eyed stage. GWAS did not detect loci with major effect for any of the studied traits. However, genomic regions explaining more than 1% of the additive genetic variance for either of the studied traits were suggested across 14 different chromosomes.
Overall, key insights valuable for future selection decisions of Arctic charr have been obtained, suggesting ddRAD as an attractive genotyping platform for obtaining genome‐wide information in a cost‐effective manner.
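The family-level comparison summarized in this abstract (mean observed heterozygosity per full-sib family against the family's inbreeding coefficient) can be illustrated with a minimal Python sketch. It assumes genotypes coded as alternate-allele counts (0, 1, 2, with `None` for a missing call) and a plain Pearson correlation; the family names, genotypes, and inbreeding values below are hypothetical toy data, not values from the study, and the authors' actual pipeline is not described here.

```python
from math import sqrt
from statistics import mean


def observed_heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous (coded 1)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(1 for g in called if g == 1) / len(called)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data: each family holds per-individual genotype vectors.
families = {
    "fam_A": [[0, 1, 1, 2], [1, 1, 0, None]],
    "fam_B": [[0, 0, 1, 2], [2, 0, 0, 1]],
    "fam_C": [[0, 0, 0, 2], [2, 2, 0, 1]],
}
# Hypothetical pedigree-based inbreeding coefficients per family.
inbreeding = {"fam_A": 0.01, "fam_B": 0.05, "fam_C": 0.10}

# Mean observed heterozygosity per family, averaged over its individuals.
family_het = {
    fam: mean(observed_heterozygosity(ind) for ind in individuals)
    for fam, individuals in families.items()
}

names = sorted(families)
r = pearson([family_het[f] for f in names], [inbreeding[f] for f in names])
print(f"correlation between family heterozygosity and inbreeding: {r:.2f}")
```

In this toy data the more inbred families carry fewer heterozygous calls, so the correlation is negative, which is the direction reported in the abstract.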
Affiliation(s)
- Christos Palaiokostas
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Anam Anjum
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Henrik Jeuthe
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
  - Aquaculture Center North, Kälarne, Sweden
- Khrystyna Kurta
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Fernando Lopes Pinto
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Dirk Jan Koning
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
3
Peñaloza C, Manousaki T, Franch R, Tsakogiannis A, Sonesson AK, Aslam ML, Allal F, Bargelloni L, Houston RD, Tsigenopoulos CS. Development and testing of a combined species SNP array for the European seabass (Dicentrarchus labrax) and gilthead seabream (Sparus aurata). Genomics 2021; 113:2096-2107. [PMID: 33933591 PMCID: PMC8276775 DOI: 10.1016/j.ygeno.2021.04.038] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/12/2020] [Revised: 03/30/2021] [Accepted: 04/27/2021] [Indexed: 12/23/2022]
Abstract
SNP arrays are powerful tools for high-resolution studies of the genetic basis of complex traits, facilitating both selective breeding and population genomic research. The European seabass (Dicentrarchus labrax) and the gilthead seabream (Sparus aurata) are the two most important fish species for Mediterranean aquaculture. While selective breeding programmes increasingly underpin stock supply for this industry, genomic selection is not yet widespread. Genomic selection has major potential to expedite genetic gain, particularly for traits practically impossible to measure on selection candidates, such as disease resistance and fillet characteristics. The aim of our study was to design a combined-species 60 K SNP array for European seabass and gilthead seabream, and to test its performance on farmed and wild populations from numerous locations throughout the species range. To achieve this, high coverage Illumina whole-genome sequencing of pooled samples was performed for 24 populations of European seabass and 27 populations of gilthead seabream. This resulted in a database of ~20 million SNPs per species, which were then filtered to identify high-quality variants and create the final set for the development of the ‘MedFish’ SNP array. The array was then tested by genotyping a subset of the discovery populations, highlighting a high conversion rate to functioning polymorphic assays on the array (92% in seabass; 89% in seabream) and repeatability (99.4–99.7%). The platform interrogates ~30 K markers in each species, includes features such as SNPs previously shown to be associated with performance traits, and is enriched for SNPs predicted to have high functional effects on proteins. The array was demonstrated to be effective at detecting population structure across a wide range of fish populations from diverse geographical origins, and to examine the extent of haplotype sharing among Mediterranean farmed fish populations. 
In conclusion, the new MedFish array enables efficient and accurate high-throughput genotyping for genome-wide distributed SNPs for each fish species, and will facilitate stock management, population genomics approaches, and acceleration of selective breeding through genomic selection. A 60 K SNP array (MedFish) was designed for European seabass and gilthead seabream from wild and domesticated populations. The array exhibited a high conversion rate (92% in seabass; 89% in seabream) and repeatability (99.4 and 99.7%). The MedFish array is expected to facilitate stock management and acceleration of selective breeding via genomic selection.
Affiliation(s)
- C Peñaloza
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian EH25 9RG, UK
- T Manousaki
  - Hellenic Centre for Marine Research, Thalassocosmos Gournes Pediados, 71500 Irakleio, Crete, Greece
- R Franch
  - Padova University, Via Ugo Bassi, 58/B, I-35131 Padova, Italy
- A Tsakogiannis
  - Hellenic Centre for Marine Research, Thalassocosmos Gournes Pediados, 71500 Irakleio, Crete, Greece
- A K Sonesson
  - Nofima, Norwegian Institute of Food, Fisheries and Aquaculture Research, PO Box 210, N-1432 Ås, Norway
- M L Aslam
  - Nofima, Norwegian Institute of Food, Fisheries and Aquaculture Research, PO Box 210, N-1432 Ås, Norway
- F Allal
  - MARBEC, University of Montpellier, Ifremer, CNRS, IRD, 34250 Palavas-les-Flots, France
- L Bargelloni
  - Padova University, Via Ugo Bassi, 58/B, I-35131 Padova, Italy
- R D Houston
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian EH25 9RG, UK
- C S Tsigenopoulos
  - Hellenic Centre for Marine Research, Thalassocosmos Gournes Pediados, 71500 Irakleio, Crete, Greece
4
Pappas F, Palaiokostas C. Genotyping Strategies Using ddRAD Sequencing in Farmed Arctic Charr (Salvelinus alpinus). Animals (Basel) 2021; 11:899. [PMID: 33801139 PMCID: PMC8004150 DOI: 10.3390/ani11030899] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/12/2021] [Revised: 03/13/2021] [Accepted: 03/16/2021] [Indexed: 12/17/2022] Open
Abstract
Incorporation of genomic technologies into fish breeding programs is a modern reality, promising substantial advances regarding the accuracy of selection, monitoring the genetic diversity and pedigree record verification. Single nucleotide polymorphism (SNP) arrays are the most commonly used genomic tool, but the investments required make them unsustainable for emerging species, such as Arctic charr (Salvelinus alpinus), where production volume is low. The requirement to genotype a large number of animals for breeding practices necessitates cost effective genotyping approaches. In the current study, we used double digest restriction site-associated DNA (ddRAD) sequencing of either high or low coverage to genotype Arctic charr from the Swedish national breeding program and performed analytical procedures to assess their utility in a range of tasks. SNPs were identified and used for deciphering the genetic structure of the studied population, estimating genomic relationships and implementing an association study for growth-related traits. Missing information and underestimation of heterozygosity in the low coverage set were limiting factors in genetic diversity and genomic relationship analyses, where high coverage performed notably better. On the other hand, the high coverage dataset proved to be valuable when it comes to identifying loci that are associated with phenotypic traits of interest. In general, both genotyping strategies offer sustainable alternatives to hybridization-based genotyping platforms and show potential for applications in aquaculture selective breeding.
Affiliation(s)
- Christos Palaiokostas
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, P.O. Box 7090, 750 07 Uppsala, Sweden
5
Abd El-Wahab MMH, Aljabri M, Sarhan MS, Osman G, Wang S, Mabrouk M, El-Shabrawi HM, Gabr AMM, Abd El-Haliem AM, O’Sullivan DM, El-Soda M. High-Density SNP-Based Association Mapping of Seed Traits in Fenugreek Reveals Homology with Clover. Genes (Basel) 2020; 11:E893. [PMID: 32764325 PMCID: PMC7464718 DOI: 10.3390/genes11080893] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/05/2020] [Revised: 07/28/2020] [Accepted: 08/02/2020] [Indexed: 12/02/2022] Open
Abstract
Fenugreek as a self-pollinated plant is ideal for genome-wide association mapping where traits can be marked by their association with natural mutations. However, fenugreek is poorly investigated at the genomic level due to the lack of information regarding its genome. To fill this gap, we genotyped a collection of 112 genotypes with 153,881 SNPs using double digest restriction site-associated DNA sequencing. We used 38,142 polymorphic SNPs to prove the suitability of the population for association mapping. One significant SNP was associated with both seed length and seed width, and another SNP was associated with seed color. Due to the lack of a comprehensive genetic map, it is neither possible to align the newly developed markers to chromosomes nor to predict the underlying genes. Therefore, systematic targeting of those markers to homologous genomes of other legumes can overcome those problems. A BLAST search using the genomic fenugreek sequence flanking the identified SNPs showed high homology with several members of the Trifolieae tribe indicating the potential of translational approaches to improving our understanding of the fenugreek genome. Using such a comprehensively-genotyped fenugreek population is the first step towards identifying genes underlying complex traits and to underpin fenugreek marker-assisted breeding programs.
Affiliation(s)
- Mustafa M. H. Abd El-Wahab
  - Department of Agronomy, Faculty of Agriculture, Cairo University, Giza 12613, Egypt
- Maha Aljabri
  - Department of Biology, Faculty of Applied Sciences, Umm Al-Qura University, Makkah 21955, Saudi Arabia
  - Research Laboratories Centre, Faculty of Applied Science, Umm Al-Qura University, Makkah 21955, Saudi Arabia
- Mohamed S. Sarhan
  - Environmental Studies and Research Unit, Cairo University, Giza 12613, Egypt
- Gamal Osman
  - Department of Biology, Faculty of Applied Sciences, Umm Al-Qura University, Makkah 21955, Saudi Arabia
  - Research Laboratories Centre, Faculty of Applied Science, Umm Al-Qura University, Makkah 21955, Saudi Arabia
  - Agricultural Genetic Engineering Research Institute (AGERI), ARC, Giza 12915, Egypt
- Shichen Wang
  - Genomics and Bioinformatics Service Texas A&M AgriLife Research, Amarillo College Station, Amarillo, TX 77845, USA
- Mahmoud Mabrouk
  - Department of Agronomy, Faculty of Agriculture, Cairo University, Giza 12613, Egypt
- Hattem M. El-Shabrawi
  - Plant Biotechnology Department, National Research Center, Giza 12622, Egypt
- Ahmed M. M. Gabr
  - Plant Biotechnology Department, National Research Center, Giza 12622, Egypt
- Ahmed M. Abd El-Haliem
  - Plant Physiology, University of Amsterdam, Swammerdam Institute for Life Sciences Amsterdam, 1098 XH Amsterdam, The Netherlands
- Donal M. O’Sullivan
  - School of Agriculture, Policy and Development, University of Reading, Whiteknights, Reading RG6 6AR, UK
- Mohamed El-Soda
  - Department of Genetics, Faculty of Agriculture, Cairo University, Giza 12613, Egypt
6
Palaiokostas C, Clarke SM, Jeuthe H, Brauning R, Bilton TP, Dodds KG, McEwan JC, De Koning DJ. Application of Low Coverage Genotyping by Sequencing in Selectively Bred Arctic Charr (Salvelinus alpinus). G3 (Bethesda, Md.) 2020; 10:2069-2078. [PMID: 32312839 PMCID: PMC7263669 DOI: 10.1534/g3.120.401295] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/18/2020] [Accepted: 04/16/2020] [Indexed: 12/12/2022]
Abstract
Arctic charr (Salvelinus alpinus) is a species of high economic value for the aquaculture industry, and of high ecological value due to its Holarctic distribution in both marine and freshwater environments. Novel genome sequencing approaches enable the study of population and quantitative genetic parameters even in species with limited or no prior genomic resources. Low coverage genotyping by sequencing (GBS) was applied in a selected strain of Arctic charr in Sweden originating from a landlocked freshwater population. For the needs of the current study, animals from year classes 2013 (171 animals; parental population) and 2017 (759 animals; 13 full-sib families) were used as a template for identifying genome-wide single nucleotide polymorphisms (SNPs). GBS libraries were constructed using the PstI and MspI restriction enzymes. Approximately 14.5K SNPs passed quality control and were used for estimating a genomic relationship matrix. Thereafter a wide range of analyses were conducted in order to gain insights regarding genetic diversity and investigate the efficiency of the genomic information for parentage assignment and breeding value estimation. Heterozygosity estimates for both year classes suggested a slight excess of heterozygotes. Furthermore, FST estimates among the families of year class 2017 ranged between 0.009 and 0.066. Principal components analysis (PCA) and discriminant analysis of principal components (DAPC) were applied aiming to identify the existence of genetic clusters among the studied population. Results obtained were in accordance with pedigree records, allowing the identification of individual families. Additionally, DNA parentage verification was performed, with results in accordance with the pedigree records with the exception of a putative dam where full-sib genotypes suggested a potential recording error.
Breeding value estimation for juvenile growth through the usage of the estimated genomic relationship matrix clearly outperformed the pedigree equivalent in terms of prediction accuracy (0.51 opposed to 0.31). Overall, low coverage GBS has proven to be a cost-effective genotyping platform that is expected to boost the selection efficiency of the Arctic charr breeding program.
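The abstract mentions estimating a genomic relationship matrix from the SNPs but does not say which estimator was used. As an illustration only, the sketch below assumes the widely used VanRaden (method 1) formulation, G = ZZ' / (2 Σ p(1 − p)), where Z holds genotypes centered by twice the allele frequency. It also assumes complete genotypes coded 0/1/2 with no missing calls, which low coverage data would not satisfy without additional handling; the toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """VanRaden-style genomic relationship matrix (assumed estimator).

    genotypes: list of individuals, each a list of alternate-allele
    counts (0, 1 or 2) with no missing values.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Alternate-allele frequency of each SNP, estimated from the sample.
    p = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # Scaling term: expected genotype variance summed over SNPs.
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # Center each genotype by twice the allele frequency.
    z = [[ind[j] - 2 * p[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: individuals 0 and 1 are identical, 2 is opposite.
toy = [
    [0, 1, 2],
    [0, 1, 2],
    [2, 1, 0],
]
G = genomic_relationship_matrix(toy)
for row in G:
    print(["%.2f" % value for value in row])
```

In the toy data the two identical individuals share a positive relationship coefficient, while the individual carrying the opposite homozygous genotypes has a negative coefficient with both of them.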
Affiliation(s)
- Christos Palaiokostas
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Box 7090, 750 07 Uppsala, Sweden
- Shannon M Clarke
  - Invermay Agricultural Centre, AgResearch, Private Bag 50034, Mosgiel 9053, New Zealand
- Henrik Jeuthe
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Box 7090, 750 07 Uppsala, Sweden
  - Aquaculture Center North, Åvägen 17, 844 61 Kälarne, Sweden
- Rudiger Brauning
  - Invermay Agricultural Centre, AgResearch, Private Bag 50034, Mosgiel 9053, New Zealand
- Timothy P Bilton
  - Invermay Agricultural Centre, AgResearch, Private Bag 50034, Mosgiel 9053, New Zealand
  - Department of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand
- Ken G Dodds
  - Invermay Agricultural Centre, AgResearch, Private Bag 50034, Mosgiel 9053, New Zealand
- John C McEwan
  - Invermay Agricultural Centre, AgResearch, Private Bag 50034, Mosgiel 9053, New Zealand
- Dirk-Jan De Koning
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Box 7090, 750 07 Uppsala, Sweden
7
Abstract
Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only a few have been implemented in R and made publicly available as R packages, and those offer few options. The R package MXM offers a variety of feature selection algorithms, and has unique features that make it advantageous over its competitors: a) it contains feature selection algorithms that can treat numerous types of target variables, including continuous, percentages, time to event (survival), binary, nominal, ordinal, clustered, counts, left censored, etc.; b) it contains a variety of regression models that can be plugged into the feature selection algorithms (for example, with time to event data the user can choose among Cox, Weibull, log logistic or exponential regression); c) it includes an algorithm for detecting multiple solutions (many sets of statistically equivalent features; plainly speaking, two features carry statistically equivalent information when substituting one for the other does not affect the inference or the conclusions); and d) it includes memory-efficient algorithms for high volume data, that is, data that cannot be loaded into R (on a machine with 16 GB of RAM, for example, R cannot directly load 16 GB of data; by utilizing the proper package, we load the data and then perform feature selection). In this paper, we qualitatively compare MXM with other relevant feature selection packages and discuss its advantages and disadvantages. Further, we provide a demonstration of MXM's algorithms using real high-dimensional data from various applications.
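To make the idea of selecting a minimal predictive feature set concrete, here is a minimal Python sketch of a greedy forward stagewise procedure. It is not one of MXM's algorithms and does not use the MXM API; it is a generic stand-in in which the function name `forward_select`, the correlation `threshold`, and the toy data are all assumptions made for illustration. At each step it picks the feature most correlated with the current residual, regresses the residual on that feature, and stops when no remaining feature is sufficiently correlated.

```python
from math import sqrt


def _center(values):
    mu = sum(values) / len(values)
    return [v - mu for v in values]


def _corr(a, b):
    """Correlation of two already-centered vectors (0.0 if either is flat)."""
    den = sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / den if den > 0 else 0.0


def forward_select(features, target, threshold=0.5):
    """Greedy forward stagewise selection (illustrative, not MXM)."""
    cols = {name: _center(vals) for name, vals in features.items()}
    residual = _center(target)
    total_ss = sum(r * r for r in residual)
    selected = []
    while len(selected) < len(cols):
        # Stop once the residual is numerically zero relative to the target.
        if sum(r * r for r in residual) <= 1e-12 * total_ss:
            break
        candidates = [name for name in cols if name not in selected]
        best = max(candidates, key=lambda name: abs(_corr(cols[name], residual)))
        score = abs(_corr(cols[best], residual))
        if score < threshold or score == 0.0:
            break
        # Univariate least-squares fit of the residual on the chosen feature.
        x = cols[best]
        beta = sum(a * r for a, r in zip(x, residual)) / sum(a * a for a in x)
        residual = [r - beta * a for r, a in zip(residual, x)]
        selected.append(best)
    return selected


# Hypothetical toy data: x1_scaled carries the same information as x1.
features = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x1_scaled": [2, 4, 6, 8, 10, 12],
    "noise": [1, -1, -1, 1, 1, -1],
}
target = [3.1, 5.9, 9.2, 11.8, 15.1, 17.9]

selected = forward_select(features, target)
print(selected)
```

The toy data includes two interchangeable features on purpose: whichever of `x1` or `x1_scaled` is picked first, the other adds nothing afterwards, which is a simple instance of the statistically equivalent solutions the abstract describes.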
Affiliation(s)
- Michail Tsagris
  - Department of Economics, University of Crete, Rethymnon, 74100, Greece
  - Department of Computer Science, University of Crete, Heraklion, Crete, 70013, Greece
  - Statistical Learning Lab, Foundation of Research and Technology Hellas, Heraklion, Crete, 70013, Greece
- Ioannis Tsamardinos
  - Department of Computer Science, University of Crete, Heraklion, Crete, 70013, Greece
  - Institute of Applied and Computational Mathematics, Foundation of Research and Technology Hellas, Heraklion, Crete, 70013, Greece
  - Gnosis Data Analysis (PC), Heraklion, Crete, 71305, Greece