1
An HE, Mun MH, Malik A, Kim CB. Development of a two-layer machine learning model for the forensic application of legal and illegal poppy classification based on sequence data. Forensic Sci Int Genet 2024; 71:103061. [PMID: 38820740 DOI: 10.1016/j.fsigen.2024.103061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 02/09/2024] [Accepted: 05/06/2024] [Indexed: 06/02/2024]
Abstract
Poppies are beneficial plants with a variety of applications, including medicinal, edible, ornamental, and industrial purposes. Some Papaver species are forensically significant plants because they contain opium, a narcotic substance. Internationally trafficked species of illegal poppies are being identified by DNA barcoding employing multiple markers in response to their forensic value. However, effective markers for precise species identification of legal and illegal poppies are still under discussion: research on illegal poppies has focused on Papaver somniferum L., and species identification studies of Papaver bracteatum and Papaver setigerum DC. are still lacking. Therefore, to evaluate the performance of genetic markers and classify their DNA sequences in the genus Papaver, this study developed the first machine learning-based two-layer model, in which the first layer classifies a given sequence as a legal or illegal poppy and the second layer identifies the species of illegal poppies from their sequences. We constructed the dataset and investigated biological features from four markers, internal transcribed spacer 1 (ITS1), internal transcribed spacer 2 (ITS2), transfer RNA Leucine (trnL), and the transfer RNA Leucine - transfer RNA Phenylalanine intergenic spacer (trnL-trnF intergenic spacer), and their combination, using four machine learning algorithms: K-nearest neighbor (KNN), Naïve Bayes (NB), extreme gradient boost (XGBoost) and Random Forest (RF). According to our findings, for Layer 1 (classifying legal and illegal poppies), KNN-based models using the combined ITS region achieved the best accuracy, 0.846 and 0.889 on the training and test sets, respectively. Additionally, for Layer 2 (identifying illegal poppy species), KNN-based models using the combined ITS region achieved the best performance, 0.833 and 1.000 on the training and test sets, respectively.
To validate the model, combined ITS region sequences (ITS1 and ITS2) from blind poppy samples were used as a case study, with Layer 1 correctly classifying legal and illegal poppies with over 0.830 accuracy. Layer 2 correctly identified P. setigerum DC.; however, only one of the three P. somniferum L. species was accurately identified. Nevertheless, our research shows that machine learning can be used to classify and identify legal and illegal poppy species using DNA barcodes, which can then serve as an efficient and effective forensic tool for improved law enforcement and a safer society.
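The two-layer design described in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the k-mer frequency features, the value of `K`, the hand-rolled nearest-neighbour vote, and the toy sequences and species labels are all assumptions introduced here, since the abstract does not specify how features were encoded.

```python
from collections import Counter
from itertools import product
import math

K = 2  # k-mer length; an illustrative choice, not taken from the paper
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_vector(seq):
    """Return normalised k-mer frequencies for one DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[k] for k in KMERS) or 1
    return [counts[k] / total for k in KMERS]


def knn_predict(train, query, n_neighbors=1):
    """Majority label among the n_neighbors closest training vectors."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


def classify(seq, layer1, layer2):
    """Layer 1: legal vs illegal. Layer 2: species, only for illegal calls."""
    vec = kmer_vector(seq)
    status = knn_predict(layer1, vec)
    if status == "legal":
        return status, None
    return status, knn_predict(layer2, vec)


# Made-up toy sequences and labels, NOT real ITS barcodes or real species.
toy = [
    ("ACACACACACAC", "legal", None),
    ("CACACACACACA", "legal", None),
    ("GGGGGGTTTTTT", "illegal", "species_A"),
    ("AAAAAACCCCCC", "illegal", "species_B"),
]
layer1 = [(kmer_vector(s), status) for s, status, _ in toy]
layer2 = [(kmer_vector(s), sp) for s, status, sp in toy if status == "illegal"]

print(classify("GGGGGTTTTTTT", layer1, layer2))
```

Only sequences that Layer 1 flags as illegal reach Layer 2, which mirrors the cascade in the abstract; the second layer is trained on illegal samples alone.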
Affiliation(s)
- Hyung-Eun An
  - Department of Biotechnology, Sangmyung University, Seoul 03016, Republic of Korea
- Min-Ho Mun
  - Department of Biotechnology, Sangmyung University, Seoul 03016, Republic of Korea
- Adeel Malik
  - Institute of Intelligence Informatics Technology, Sangmyung University, Seoul 03016, Republic of Korea
- Chang-Bae Kim
  - Department of Biotechnology, Sangmyung University, Seoul 03016, Republic of Korea
2
Kaňuková Š, Ondreičková K, Mihálik D, Kraic J. New Set of EST-STR Markers for Discrimination of Related Papaver somniferum L. Varieties. Life (Basel) 2023; 14:72. [PMID: 38255686 PMCID: PMC10820365 DOI: 10.3390/life14010072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 12/22/2023] [Accepted: 12/29/2023] [Indexed: 01/24/2024] Open
Abstract
Papaver somniferum L. is cultivated for its edible seeds and for the production of alkaloids. A serious problem in seed trade and processing is the intentional mixing of excellent food-quality seeds with non-food-grade seeds. Tracking the correct or illegitimate handling of seeds requires an efficient method for discrimination and individualization of poppy varieties. As in human and animal forensics, variable DNA regions containing short tandem repeats (STRs), located either in non-coding DNA or in gene sequences (EST-STRs), are preferred markers for discrimination between genotypes. Primers designed for 10 poppy EST-STR loci not analyzed so far were tested for their discriminatory ability on a set of 23 related P. somniferum L. genotypes. Thirty-three EST-STR alleles were identified in total. Their polymorphic information content (PIC) values were in the range of 0.175-0.649. The PI value varied in the range of 0.140-0.669, and the cumulative PI was 1.2 × 10⁻⁵. PIsibs values varied between 0.436 and 0.820 and the cumulative value was lower (5.0 × 10⁻³). All analyzed genotypes were distinguished from one another, each with its own unique EST-STR profile. These newly developed EST-STR markers more effectively discriminated P. somniferum L. genotypes, even those genotypes whose DNA profiles were previously identical.
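The per-locus statistics quoted in this abstract can be illustrated with a short sketch. It assumes that PI denotes probability of identity and PIsibs the probability of identity among siblings (the abstract does not expand either abbreviation), and it uses the commonly cited textbook formulas for PIC, PI and PIsibs; whether the authors used exactly these estimators, or a software package with corrections, is not stated. The function names and the allele frequencies are hypothetical.

```python
from math import prod


def pic(freqs):
    """Polymorphic information content: 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    hom = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1 - hom - cross


def prob_identity(freqs):
    """Chance that two unrelated individuals share a genotype at one locus."""
    same_hom = sum(p ** 4 for p in freqs)
    same_het = sum(
        (2 * freqs[i] * freqs[j]) ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return same_hom + same_het


def prob_identity_sibs(freqs):
    """Chance that two full siblings share a genotype at one locus."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def cumulative(per_locus_values):
    """Multiply per-locus probabilities, assuming independent loci."""
    return prod(per_locus_values)


# Hypothetical allele frequencies for two loci (not data from the paper).
loci = [[0.5, 0.5], [0.6, 0.3, 0.1]]
print([round(pic(f), 3) for f in loci])
print(cumulative(prob_identity(f) for f in loci))
print(cumulative(prob_identity_sibs(f) for f in loci))
```

Because the cumulative value is a product of per-locus probabilities, each additional independent locus can only shrink it, which is why a panel of 10 loci can reach very small cumulative PI values.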
Affiliation(s)
- Šarlota Kaňuková
  - Department of Applied Biology and Genetics, Research Institute of Plant Production, National Agricultural and Food Centre, Bratislavska cesta 122, 92168 Piestany, Slovakia
  - Department of Biotechnology, Faculty of Natural Sciences, University of Ss. Cyril and Methodius, Namestie J. Herdu 2, 91701 Trnava, Slovakia
- Katarína Ondreičková
  - Department of Applied Biology and Genetics, Research Institute of Plant Production, National Agricultural and Food Centre, Bratislavska cesta 122, 92168 Piestany, Slovakia
- Daniel Mihálik
  - Department of Applied Biology and Genetics, Research Institute of Plant Production, National Agricultural and Food Centre, Bratislavska cesta 122, 92168 Piestany, Slovakia
  - Department of Biotechnology, Faculty of Natural Sciences, University of Ss. Cyril and Methodius, Namestie J. Herdu 2, 91701 Trnava, Slovakia
- Ján Kraic
  - Department of Applied Biology and Genetics, Research Institute of Plant Production, National Agricultural and Food Centre, Bratislavska cesta 122, 92168 Piestany, Slovakia
  - Department of Biotechnology, Faculty of Natural Sciences, University of Ss. Cyril and Methodius, Namestie J. Herdu 2, 91701 Trnava, Slovakia
3
Affiliation(s)
- David Love
  - United States Drug Enforcement Administration, Special Testing and Research Laboratory, USA
- Nicole S. Jones
  - RTI International, Applied Justice Research Division, Center for Forensic Sciences, 3040 E. Cornwallis Road, Research Triangle Park, NC, 22709-2194, USA
  - 701 13th Street, N.W., Suite 750, Washington, DC, 20005-3967, USA
4
Zhu Y, Huang Y, Wei K, Yu J, Jiang J. Full-length transcriptome analysis of Zanthoxylum nitidum (Roxb.) DC. PeerJ 2023; 11:e15321. [PMID: 37163151 PMCID: PMC10164372 DOI: 10.7717/peerj.15321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 04/10/2023] [Indexed: 05/11/2023] Open
Abstract
Zanthoxylum nitidum (Roxb.) DC. (Z. nitidum) is a type of Chinese Dao-di herb, also called Liangmianzhen, which is widely used to treat arthralgia, rheumatic arthralgia, and stomach pain. However, genomic resources for Z. nitidum are still scarce. This study provides transcriptomic resources for Z. nitidum by applying single-molecule real-time (SMRT) sequencing technology. In total, 456,109 circular consensus sequencing (CCS) reads were generated with a mean length of 2,216 bp from Z. nitidum roots, old stems, young branches, leaves, flowers, and fruits. Of these total reads, 353,932 were full-length nonchimeric (FLNC) reads with an average length of 1,996 bp. A total of 16,163 transcripts with a mean length of 1,171 bp were acquired. Of these transcripts, 14,231 (88%) were successfully annotated using public databases. Across all the 16,163 transcripts, we identified 6,255 long non-coding RNAs (lncRNAs) and 22,780 simple sequence repeats (SSRs). Furthermore, 3,482 transcription factors were identified. Among the SSR loci, 1-3 nucleotide repeats were dominant, occupying 99.36% of the total SSR loci, with mono-, di-, and tri-nucleotide repeats accounting for 61.80%, 19.89%, and 5.02% of the total SSR loci, respectively. A total of 36 out of 100 randomly selected primer pairs were verified to be positive, 20 of which showed polymorphism. These findings enrich the genetic resources available for facilitating future studies and research on relevant topics such as population genetics in Z. nitidum.
Affiliation(s)
- Yanxia Zhu
  - Guangxi Key Laboratory of Medicinal Resources Protection and Genetic Improvement, Guangxi Botanical Garden of Medicinal Plants, Nanning, China
- Yanfen Huang
  - Guangxi Key Laboratory of Medicinal Resources Protection and Genetic Improvement, Guangxi Botanical Garden of Medicinal Plants, Nanning, China
- Kunhua Wei
  - Guangxi Key Laboratory of Medicinal Resources Protection and Genetic Improvement, Guangxi Botanical Garden of Medicinal Plants, Nanning, China
- Junnan Yu
  - ChongQing Jinzhi Quality Certification Co., LTD, Chongqing, China
- Jianping Jiang
  - Guangxi Key Laboratory for High-quality Formation and Utilization of Dao-di Herbs, Guangxi Botanical Garden of Medicinal Plants, Nanning, China
5
Wang L, Li F, Wang N, Gao Y, Liu K, Zhang G, Sun J. Characterization of the Dicranostigma leptopodum chloroplast genome and comparative analysis within subfamily Papaveroideae. BMC Genomics 2022; 23:794. [PMID: 36460956 PMCID: PMC9717546 DOI: 10.1186/s12864-022-09049-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/20/2022] [Accepted: 11/25/2022] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND: Dicranostigma leptopodum (Maxim.) Fedde is a perennial herb with bright yellow flowers, well known as "Hongmao Cao" for its medicinal properties, and is an excellent early spring flower used in urban greening. However, its molecular genomic information remains largely unknown. Here, we sequenced and analyzed the chloroplast genome of D. leptopodum to discover its genome structure, organization, and phylogenomic position within the subfamily Papaveroideae.
RESULTS: The chloroplast genome size of D. leptopodum was 162,942 bp, and D. leptopodum exhibited a characteristic circular quadripartite structure, with a large single-copy (LSC) region (87,565 bp), a small single-copy (SSC) region (18,759 bp) and a pair of inverted repeat (IR) regions (28,309 bp). The D. leptopodum chloroplast genome encoded 113 genes, including 79 protein-coding genes, 30 tRNA genes, and four rRNA genes. The dynamics of the genome structures, genes, IR contraction and expansion, long repeats, and single sequence repeats exhibited similarities, with slight differences observed among the eight Papaveroideae species. In addition, seven interspace regions and three coding genes displayed highly variable divergence, signifying their potential to serve as molecular markers for phylogenetic and species identification studies. Molecular evolution analyses indicated that most of the genes were undergoing purifying selection. Phylogenetic analyses revealed that D. leptopodum formed a clade with the tribe Chelidonieae.
CONCLUSIONS: Our study provides detailed information on the D. leptopodum chloroplast genome, expanding the available genomic resources that may be used for future evolution and genetic diversity studies.
Affiliation(s)
- Lei Wang
  - College of Horticulture and Plant Protection, Henan University of Science and Technology, Luoyang, 471023 Henan China
- Fuxing Li
  - College of Horticulture and Plant Protection, Henan University of Science and Technology, Luoyang, 471023 Henan China
- Ning Wang
  - College of Horticulture and Plant Protection, Henan University of Science and Technology, Luoyang, 471023 Henan China
- Yongwei Gao
  - Laboratory of Systematic Evolution and Biogeography of Woody Plants, School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083 China
- Kangjia Liu
  - Laboratory of Systematic Evolution and Biogeography of Woody Plants, School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083 China
- Gangmin Zhang
  - Laboratory of Systematic Evolution and Biogeography of Woody Plants, School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083 China
- Jiahui Sun
  - State Key Laboratory Breeding Base of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700 China
6
Chang M, Kim JY, Lee H, Lee EJ, Lee WH, Moon S, Choe S, Choung CM. Development of diagnostic SNP markers and a novel SNP genotyping assay for distinguishing opium poppies. Forensic Sci Int 2022; 339:111416. [DOI: 10.1016/j.forsciint.2022.111416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 07/14/2022] [Accepted: 08/02/2022] [Indexed: 11/04/2022]
7
He Y, Chen J, Tang C, Deng Q, Guo L, Cheng Y, Li Z, Wang T, Xu J, Gao C. Genetic Diversity and Population Structure of Fusarium commune Causing Strawberry Root Rot in Southcentral China. Genes (Basel) 2022; 13:899. [PMID: 35627284 PMCID: PMC9140712 DOI: 10.3390/genes13050899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Revised: 05/10/2022] [Accepted: 05/12/2022] [Indexed: 02/04/2023] Open
Abstract
Strawberry plants and fruits are vulnerable to infections by a broad range of pathogens and pests. However, knowledge about the epidemiology of pathogens causing strawberry diseases is limited. In this study, we analyzed Fusarium commune, a major fungal pathogen causing strawberry root rot, from diseased strawberry root tissues in southcentral China. A total of 354 isolates were obtained from 11 locations that spanned about 700 km from both south to north and east to west. Multilocus genotypes of all isolates were obtained using seven polymorphic simple sequence repeat markers developed in this study. Our analyses revealed significant genetic diversity within each of the 11 local populations of F. commune. STRUCTURE analysis revealed that the optimal number of genetic populations for the 354 strains was two, with most local geographic populations containing isolates in both genetic clusters. Interestingly, many isolates showed allelic ancestry to both genetic clusters, consistent with recent hybridization between the two genetic clusters. In addition, though alleles and genotypes were frequently shared among local populations, statistically significant genetic differentiations were found among the local populations. However, the observed F. commune population genetic distances were not correlated with geographic distances. Together, our analyses suggest that populations of F. commune causing strawberry root rot are likely endemic to southcentral China, with each local population containing shared and unique genetic elements. Though the observed gene flow among geographic regions was relatively low, human activities will likely accelerate pathogen dispersals, resulting in the generation of new genotypes through mating and recombination.
Affiliation(s)
- Yunlu He
  - Institute of Bast Fiber Crops and Center of Southern Economic Crops, Chinese Academy of Agricultural Sciences, Changsha 410205, China
- Jia Chen
  - Institute of Bast Fiber Crops and Center of Southern Economic Crops, Chinese Academy of Agricultural Sciences, Changsha 410205, China
- Chao Tang
  - Institute of Bast Fiber Crops and Center of Southern Economic Crops, Chinese Academy of Agricultural Sciences, Changsha 410205, China
- Qiao Deng
  - Institute of Bast Fiber Crops and Center of Southern Economic Crops, Chinese Academy of Agricultural Sciences, Changsha 410205, China
- Litao Guo
  - Institute of Bast Fiber Crops and Center of Southern Economic Crops, Chinese Academy of Agricultural Sciences, Changsha 410205, China
- Yi Cheng
  - Institute of Bast Fiber Crops and Center of Southern Economic Crops, Chinese Academy of Agricultural Sciences, Changsha 410205, China
- Zhimin Li
  - Institute of Bast Fiber Crops and Center of Southern Economic Crops, Chinese Academy of Agricultural Sciences, Changsha 410205, China
- Tuhong Wang
  - Institute of Bast Fiber Crops and Center of Southern Economic Crops, Chinese Academy of Agricultural Sciences, Changsha 410205, China
- Jianping Xu
  - Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
- Chunsheng Gao
  - Institute of Bast Fiber Crops and Center of Southern Economic Crops, Chinese Academy of Agricultural Sciences, Changsha 410205, China