1
Kang M, Kim S, Lee DB, Hong C, Hwang KB. Gene-specific machine learning for pathogenicity prediction of rare BRCA1 and BRCA2 missense variants. Sci Rep 2023; 13:10478. [PMID: 37380723 DOI: 10.1038/s41598-023-37698-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 06/26/2023] [Indexed: 06/30/2023] Open
Abstract
Machine learning-based pathogenicity prediction helps interpret rare missense variants of BRCA1 and BRCA2, which are associated with hereditary cancers. Recent studies have shown that classifiers trained using variants of a specific gene or a set of genes related to a particular disease perform better than those trained using all variants, due to their higher specificity, despite the smaller training dataset size. In this study, we further investigated the advantages of "gene-specific" machine learning compared to "disease-specific" machine learning. We used 1068 rare (gnomAD minor allele frequency (MAF) < 0.005) missense variants of 28 genes associated with hereditary cancers for our investigation. Popular machine learning classifiers were employed: regularized logistic regression, extreme gradient boosting, random forests, support vector machines, and deep neural networks. As features, we used MAFs from multiple populations, functional prediction and conservation scores, and positions of variants. The disease-specific training dataset included the gene-specific training dataset and was more than 7 times larger. However, we observed that gene-specific training variants were sufficient to produce the optimal pathogenicity predictor if a suitable machine learning classifier was employed. Therefore, we recommend gene-specific over disease-specific machine learning as an efficient and effective method for predicting the pathogenicity of rare BRCA1 and BRCA2 missense variants.
Affiliation(s)
- Moonjong Kang
  - Research Center, Software Division, NGeneBio, Seoul, 08390, Korea
- Seonhwa Kim
  - Research Center, Software Division, NGeneBio, Seoul, 08390, Korea
- Da-Bin Lee
  - Department of Computer Science and Engineering, Graduate School, Soongsil University, Seoul, 06978, Korea
- Changbum Hong
  - Research Center, Software Division, NGeneBio, Seoul, 08390, Korea
- Kyu-Baek Hwang
  - Department of Computer Science and Engineering, Graduate School, Soongsil University, Seoul, 06978, Korea
2
Kim M, Hwang KB. An empirical evaluation of sampling methods for the classification of imbalanced data. PLoS One 2022; 17:e0271260. [PMID: 35901023 PMCID: PMC9333262 DOI: 10.1371/journal.pone.0271260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Accepted: 06/28/2022] [Indexed: 11/18/2022] Open
Abstract
In numerous classification problems, the class distribution is not balanced. For example, positive examples are rare in the fields of disease diagnosis and credit card fraud detection. General machine learning methods are known to be suboptimal for such imbalanced classification. One popular solution is to balance training data by oversampling the underrepresented (or undersampling the overrepresented) classes before applying machine learning algorithms. However, despite its popularity, the effectiveness of sampling has not been rigorously and comprehensively evaluated. This study assessed combinations of seven sampling methods and eight machine learning classifiers (56 varieties in total) using 31 datasets with varying degrees of imbalance. We used the areas under the precision-recall curve (AUPRC) and the receiver operating characteristic curve (AUROC) as the performance measures. The AUPRC is known to be more informative for imbalanced classification than the AUROC. We observed that sampling significantly changed the performance of the classifier (paired t-tests, P < 0.05) in only a few cases (12.2% in AUPRC and 10.0% in AUROC). Surprisingly, sampling was more likely to reduce than to improve the classification performance. Moreover, the adverse effects of sampling were more pronounced in AUPRC than in AUROC. Among the sampling methods, undersampling performed worse than the others. Also, sampling was more effective for improving linear classifiers. Most importantly, we did not need sampling to obtain the optimal classifier for most of the 31 datasets. In addition, we found two interesting examples in which sampling significantly reduced AUPRC while significantly improving AUROC (paired t-tests, P < 0.05). In conclusion, the applicability of sampling is limited because it could be ineffective or even harmful. Furthermore, the choice of the performance measure is crucial for decision making. Our results provide valuable insights into the effect and characteristics of sampling for imbalanced classification.
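To make the contrast between the two performance measures concrete, the following is a minimal sketch in plain Python, not the evaluation code used in the study. It assumes binary labels, untied scores, and uses average precision as one common estimator of the AUPRC; the function names and the toy data are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties count as half a win (none occur in the toy data below).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values at each positive's rank."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical imbalanced toy set: 2 positives among 10 examples.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.65]

print(auroc(labels, scores))              # 0.875
print(average_precision(labels, scores))  # 0.75
```

On this toy set, two negatives ranked above one positive cost more in average precision than in AUROC, which is one way the two measures can disagree when positives are rare.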
Affiliation(s)
- Misuk Kim
  - Department of Computer Science and Engineering, Graduate School, Soongsil University, Seoul, Korea
- Kyu-Baek Hwang
  - Department of Computer Science and Engineering, Graduate School, Soongsil University, Seoul, Korea
3
Nguyen TM, Le HL, Hwang KB, Hong YC, Kim JH. Predicting High Blood Pressure Using DNA Methylome-Based Machine Learning Models. Biomedicines 2022; 10:1406. [PMID: 35740428 PMCID: PMC9220060 DOI: 10.3390/biomedicines10061406] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/14/2022] [Revised: 06/06/2022] [Accepted: 06/10/2022] [Indexed: 12/12/2022] Open
Abstract
DNA methylation modification plays a vital role in the pathophysiology of high blood pressure (BP). Herein, we applied three machine learning (ML) algorithms, namely deep learning (DL), support vector machine, and random forest, to detect high BP using DNA methylome data. Peripheral blood samples of 50 elderly individuals were collected at three visits for DNA methylome profiling. Participants who had a history of hypertension and/or a current high BP measurement were considered to have high BP. The whole dataset was randomly divided to conduct a nested five-group cross-validation for evaluating prediction performance. Data in each outer training set were independently normalized using a min–max scaler, reduced in dimensionality using principal component analysis, and then fed into the three predictive algorithms. Of the three ML algorithms, DL achieved the best performance (AUPRC = 0.65, AUROC = 0.73, accuracy = 0.69, and F1-score = 0.73). To confirm the reliability of using the DNA methylome as a biomarker for high BP, we constructed mixed-effects models and found that 61,694 methylation sites located in 15,523 intragenic regions and 16,754 intergenic regions were significantly associated with BP measures. Our proposed models pioneered the methodology of applying ML and DNA methylome data for early detection of high BP in clinical practice.
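The per-training-set normalization mentioned in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' pipeline: the helper names and the toy values are hypothetical, and the principal component analysis step is omitted. The point shown is only that scaling parameters are estimated from the training rows and then reused unchanged on held-out rows.

```python
def fit_min_max(rows):
    """Learn per-feature (min, max) from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, params):
    """Scale rows with previously learned parameters; constant features map to 0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, params)]
        for row in rows
    ]


# Hypothetical toy data: two features, two training rows, two held-out rows.
train = [[0.0, 10.0], [10.0, 30.0]]
held_out = [[5.0, 20.0], [20.0, 10.0]]

params = fit_min_max(train)            # uses the training rows only
print(apply_min_max(train, params))    # [[0.0, 0.0], [1.0, 1.0]]
print(apply_min_max(held_out, params)) # [[0.5, 0.5], [2.0, 0.0]]
```

Held-out values can fall outside [0, 1], as the second held-out row does here; that is expected when the scaler never sees the held-out data.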
Affiliation(s)
- Thi Mai Nguyen
  - Department of Integrative Bioscience & Biotechnology, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea
- Hoang Long Le
  - Department of Computer Science & Engineering, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea
- Kyu-Baek Hwang
  - School of Computer Science & Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Korea
- Yun-Chul Hong
  - Department of Preventive Medicine, College of Medicine, Seoul National University, Seoul 03080, Korea
  - Institute of Environmental Medicine, Seoul National University Medical Research Center, Seoul 03080, Korea
- Jin Hee Kim
  - Department of Integrative Bioscience & Biotechnology, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea
  - Correspondence: Tel.: +82-2-3408-3655
4
Li H, Na S, Hwang KB, Paek E. TIDD: tool-independent and data-dependent machine learning for peptide identification. BMC Bioinformatics 2022; 23:109. [PMID: 35354356 PMCID: PMC8969291 DOI: 10.1186/s12859-022-04640-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Accepted: 03/16/2022] [Indexed: 11/10/2022] Open
Abstract
Background: In shotgun proteomics, database search engines have been developed to assign peptides to tandem mass (MS/MS) spectra and at the same time post-processing (or rescoring) approaches over the search results have been proposed to increase the number of confident peptide identifications. The most popular post-processing approaches such as Percolator and PeptideProphet have improved rates of peptide identifications by combining multiple scores from database search engines while applying machine learning techniques. Existing post-processing approaches, however, are limited when dealing with results from new search engines because their features for machine learning must be optimized specifically for each search engine.
Results: We propose a universal post-processing tool, called TIDD, which supports confident peptide identifications regardless of the search engine adopted. TIDD can work for any (including newly developed) search engines because it calculates universal features that assess peptide-spectrum match quality while it allows additional features provided by search engines (or users) as well. Even though it relies on universal features independent of search tools, TIDD showed similar or better performance than Percolator in terms of peptide identification. TIDD identified 10.23–38.95% more PSMs than target-decoy estimation for MSFragger, which is not supported by Percolator. TIDD offers an easy-to-use simple graphical user interface for user convenience.
Conclusions: TIDD successfully eliminated the requirement for an optimal feature engineering per database search tool, and thus, can be applied directly to any database search results including newly developed ones.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04640-y.
Affiliation(s)
- Honglan Li
  - Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
- Seungjin Na
  - Institute for Artificial Intelligence Research, Hanyang University, Seoul, 04763, Republic of Korea
- Kyu-Baek Hwang
  - School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Republic of Korea
- Eunok Paek
  - Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
5
Hwang KB, Lee IH, Li H, Won DG, Hernandez-Ferrer C, Negron JA, Kong SW. Comparative analysis of whole-genome sequencing pipelines to minimize false negative findings. Sci Rep 2019; 9:3219. [PMID: 30824715 PMCID: PMC6397176 DOI: 10.1038/s41598-019-39108-2] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 07/10/2018] [Accepted: 01/16/2019] [Indexed: 12/30/2022] Open
Abstract
Comprehensive and accurate detection of variants from whole-genome sequencing (WGS) is a strong prerequisite for translational genomic medicine; however, low concordance between analytic pipelines is an outstanding challenge. We processed one European and one African WGS sample with 70 analytic pipelines comprising all combinations of 7 short-read aligners and 10 variant calling algorithms (VCAs), and observed remarkable differences in the number of variants called by different pipelines (max/min ratio: 1.3–3.4). The similarity between variant call sets was determined more by VCAs than by short-read aligners. Remarkably, reported minor allele frequency had a substantial effect on concordance between pipelines (concordance rate ratio: 0.11–0.92; Wald tests, P < 0.001), entailing more discordant results for rare and novel variants. We compared the performance of analytic pipelines and pipeline ensembles using gold-standard variant call sets and the catalog of variants from the 1000 Genomes Project. Notably, a single pipeline using BWA-MEM and GATK-HaplotypeCaller performed comparably to the pipeline ensembles for ‘callable’ regions (~97%) of the human reference genome. While a single pipeline is capable of analyzing common variants in most genomic regions, our findings demonstrated the limitations and challenges in analyzing rare or novel variants, especially for non-European genomes.
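One simple way to quantify similarity between two variant call sets is the Jaccard index. The sketch below is only an illustration under that assumption; the abstract does not state which concordance measure the study used, and the function name, variant tuple layout, and toy calls are all hypothetical.

```python
def jaccard(calls_a, calls_b):
    """Jaccard index of two variant call sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    # Two empty call sets are treated as fully concordant.
    return len(a & b) / len(union) if union else 1.0


# Hypothetical calls, each keyed as (chromosome, position, ref allele, alt allele).
pipeline_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
pipeline_b = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}

print(jaccard(pipeline_a, pipeline_b))  # 0.5
```

Computing this index for every pair of pipelines gives a similarity matrix that could then be clustered, for example to check whether pipelines group by variant caller or by aligner.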
Affiliation(s)
- Kyu-Baek Hwang
  - School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Korea
- In-Hee Lee
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA
- Honglan Li
  - School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Korea
- Dhong-Geon Won
  - School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Korea
- Carles Hernandez-Ferrer
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA
- Jose Alberto Negron
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA
- Sek Won Kong
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
6
Mun DG, Bhin J, Kim S, Kim H, Jung JH, Jung Y, Jang YE, Park JM, Kim H, Jung Y, Lee H, Bae J, Back S, Kim SJ, Kim J, Park H, Li H, Hwang KB, Park YS, Yook JH, Kim BS, Kwon SY, Ryu SW, Park DY, Jeon TY, Kim DH, Lee JH, Han SU, Song KS, Park D, Park JW, Rodriguez H, Kim J, Lee H, Kim KP, Yang EG, Kim HK, Paek E, Lee S, Lee SW, Hwang D. Proteogenomic Characterization of Human Early-Onset Gastric Cancer. Cancer Cell 2019; 35:111-124.e10. [PMID: 30645970 DOI: 10.1016/j.ccell.2018.12.003] [Citation(s) in RCA: 154] [Impact Index Per Article: 30.8] [Received: 01/24/2018] [Revised: 08/22/2018] [Accepted: 12/10/2018] [Indexed: 02/08/2023]
Abstract
We report proteogenomic analysis of diffuse gastric cancers (GCs) in young populations. Phosphoproteome data elucidated signaling pathways associated with somatic mutations based on mutation-phosphorylation correlations. Moreover, correlations between mRNA and protein abundances provided potential oncogenes and tumor suppressors associated with patient survival. Furthermore, integrated clustering of mRNA, protein, phosphorylation, and N-glycosylation data identified four subtypes of diffuse GCs. Distinguishing these subtypes was possible by proteomic data. Four subtypes were associated with proliferation, immune response, metabolism, and invasion, respectively; and associations of the subtypes with immune- and invasion-related pathways were identified mainly by phosphorylation and N-glycosylation data. Therefore, our proteogenomic analysis provides additional information beyond genomic analyses, which can improve understanding of cancer biology and patient stratification in diffuse GCs.
Affiliation(s)
- Dong-Gi Mun
  - Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
- Jinhyuk Bhin
  - Department of New Biology and Center for Plant Aging Research, Institute for Basic Science, DGIST, Daegu 711-873, Republic of Korea; Division of Molecular Pathology, Oncode Institute, the Netherlands Cancer Institute, 1066CX Amsterdam, the Netherlands
- Sangok Kim
  - Department of Life Science and Ewha Research Center for Systems Biology, Ewha Womans University, Seoul 120-750, Republic of Korea
- Hyunwoo Kim
  - Department of Computer Science and Engineering, Hanyang University, Seoul 133-791, Republic of Korea; Research Data Hub Center, Korea Institute of Science and Technology Information, Daejeon 34141, Republic of Korea
- Jae Hun Jung
  - Department of Applied Chemistry, College of Applied Sciences, Kyung Hee University, Yong-in 446-701, Republic of Korea
- Yeonjoo Jung
  - Department of Life Science and Ewha Research Center for Systems Biology, Ewha Womans University, Seoul 120-750, Republic of Korea
- Ye Eun Jang
  - Department of Life Science and Ewha Research Center for Systems Biology, Ewha Womans University, Seoul 120-750, Republic of Korea
- Jong Moon Park
  - Gachon Institute of Pharmaceutical Sciences, Gachon College of Pharmacy, Gachon University, Incheon 406-799, Republic of Korea
- Hokeun Kim
  - Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
- Yeonhwa Jung
  - Department of Life Science and Ewha Research Center for Systems Biology, Ewha Womans University, Seoul 120-750, Republic of Korea
- Hangyeore Lee
  - Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
- Jingi Bae
  - Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
- Seunghoon Back
  - Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
- Su-Jin Kim
  - Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
- Jieun Kim
  - Department of Life Science and Ewha Research Center for Systems Biology, Ewha Womans University, Seoul 120-750, Republic of Korea
- Heejin Park
  - Department of Computer Science and Engineering, Hanyang University, Seoul 133-791, Republic of Korea
- Honglan Li
  - School of Computer Science and Engineering, Soongsil University, Seoul 156-743, Republic of Korea
- Kyu-Baek Hwang
  - School of Computer Science and Engineering, Soongsil University, Seoul 156-743, Republic of Korea
- Young Soo Park
  - Department of Pathology, University of Ulsan College of Medicine, Asan Medical Center, Seoul 138-873, Republic of Korea
- Jeong Hwan Yook
  - Department of Surgery, University of Ulsan College of Medicine, Asan Medical Center, Seoul 138-873, Republic of Korea
- Byung Sik Kim
  - Department of Surgery, University of Ulsan College of Medicine, Asan Medical Center, Seoul 138-873, Republic of Korea
- Sun Young Kwon
  - Department of Surgery, Keimyung University School of Medicine, Daegu 700-712, Republic of Korea
- Seung Wan Ryu
  - Department of Surgery, Keimyung University School of Medicine, Daegu 700-712, Republic of Korea
- Do Youn Park
  - Department of Pathology, Pusan National University School of Medicine, Busan 602-739, Republic of Korea
- Tae Yong Jeon
  - Department of Surgery, Pusan National University School of Medicine, Busan 602-739, Republic of Korea
- Dae Hwan Kim
  - Department of Surgery, Pusan National University School of Medicine, Busan 602-739, Republic of Korea
- Jae-Hyuck Lee
  - Department of Pathology, Chonnam National University Medical School, Gwangju 501-746, Republic of Korea
- Sang-Uk Han
  - Department of Surgery, Ajou University School of Medicine, Suwon 443-380, Republic of Korea
- Kyu Sang Song
  - Department of Pathology, School of Medicine, Chungnam National University, Daejeon 301-747, Republic of Korea
- Dongmin Park
  - National Cancer Center, Goyang 410-769, Republic of Korea
- Jun Won Park
  - National Cancer Center, Goyang 410-769, Republic of Korea
- Henry Rodriguez
  - Office of Cancer Clinical Proteomics Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Jaesang Kim
  - Department of Life Science and Ewha Research Center for Systems Biology, Ewha Womans University, Seoul 120-750, Republic of Korea
- Hookeun Lee
  - Gachon Institute of Pharmaceutical Sciences, Gachon College of Pharmacy, Gachon University, Incheon 406-799, Republic of Korea
- Kwang Pyo Kim
  - Department of Applied Chemistry, College of Applied Sciences, Kyung Hee University, Yong-in 446-701, Republic of Korea
- Eun Gyeong Yang
  - Biomedical Research Institute, Korea Institute of Science and Technology, Seoul 136-791, Republic of Korea
- Hark Kyun Kim
  - National Cancer Center, Goyang 410-769, Republic of Korea
- Eunok Paek
  - Department of Computer Science and Engineering, Hanyang University, Seoul 133-791, Republic of Korea
- Sanghyuk Lee
  - Department of Life Science and Ewha Research Center for Systems Biology, Ewha Womans University, Seoul 120-750, Republic of Korea
- Sang-Won Lee
  - Department of Chemistry, Center for Proteogenome Research, Korea University, Seoul 136-701, Republic of Korea
- Daehee Hwang
  - Department of New Biology and Center for Plant Aging Research, Institute for Basic Science, DGIST, Daegu 711-873, Republic of Korea
7
Li H, Park J, Kim H, Hwang KB, Paek E. Systematic Comparison of False-Discovery-Rate-Controlling Strategies for Proteogenomic Search Using Spike-in Experiments. J Proteome Res 2017; 16:2231-2239. [PMID: 28452485 DOI: 10.1021/acs.jproteome.7b00033] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 01/28/2023]
Abstract
Proteogenomic searches are useful for novel peptide identification from tandem mass spectra. Usually, separate and multistage approaches are adopted to accurately control the false discovery rate (FDR) for proteogenomic search. Their performance on novel peptide identification has not been thoroughly evaluated, however, mainly due to the difficulty in confirming the existence of identified novel peptides. We simulated a proteogenomic search using a controlled, spike-in proteomic data set. After confirming that the results of the simulated proteogenomic search were similar to those of a real proteogenomic search using a human cell line data set, we evaluated the performance of six FDR control methods (global, separate, and multistage FDR estimation, each coupled to either a target-decoy search or a mixture model-based method) on novel peptide identification. The multistage approach showed the highest accuracy for FDR estimation. However, global and separate FDR estimation with the mixture model-based method showed higher sensitivities than the others at the same true FDR. Furthermore, the mixture model-based method performed equally well when applied without or with a reduced set of decoy sequences. Considering different prior probabilities for novel and known protein identification, we recommend using mixture model-based methods with separate FDR estimation for sensitive and reliable identification of novel peptides from proteogenomic searches.
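The difference between global and separate FDR estimation in a target-decoy search can be sketched in a few lines of plain Python. This is an illustration rather than the procedure evaluated in the paper: it assumes the simple estimator (decoy hits divided by target hits at a score threshold), and the function names, tuple layout, and toy peptide-spectrum matches (PSMs) are hypothetical.

```python
def estimate_fdr(psms, threshold):
    """Estimate FDR as decoy hits / target hits among PSMs scoring >= threshold."""
    targets = sum(1 for score, is_decoy, _ in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy, _ in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def separate_fdr(psms, threshold):
    """Estimate FDR within each peptide class (e.g. known vs. novel) independently."""
    classes = sorted({cls for _, _, cls in psms})
    return {cls: estimate_fdr([p for p in psms if p[2] == cls], threshold) for cls in classes}


# Hypothetical PSMs as (score, is_decoy, peptide_class).
psms = [
    (9.0, False, "known"), (8.5, False, "known"), (8.0, False, "known"),
    (7.5, False, "known"), (7.0, True, "known"),
    (8.2, False, "novel"), (7.8, True, "novel"),
    (7.2, False, "novel"), (7.1, True, "novel"),
]

print(estimate_fdr(psms, 7.0))  # 0.5
print(separate_fdr(psms, 7.0))  # {'known': 0.25, 'novel': 1.0}
```

In this toy example the global estimate sits between the two class-specific estimates, so a single global threshold would understate the error rate among novel peptides, which is the usual motivation for filtering known and novel identifications separately.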
Affiliation(s)
- Honglan Li
  - School of Computer Science and Engineering, Soongsil University, Seoul 06978, Republic of Korea
- Jonghun Park
  - Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
- Hyunwoo Kim
  - Scientific Data Research Center, Korea Institute of Science and Technology Information, Daejeon 34141, Republic of Korea
- Kyu-Baek Hwang
  - School of Computer Science and Engineering, Soongsil University, Seoul 06978, Republic of Korea
- Eunok Paek
  - Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
8
Li H, Joh YS, Kim H, Paek E, Lee SW, Hwang KB. Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification. BMC Genomics 2016; 17:1031. [PMID: 28155652 PMCID: PMC5259817 DOI: 10.1186/s12864-016-3327-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 11/16/2022] Open
Abstract
Background: Proteogenomics is a promising approach for various tasks ranging from gene annotation to cancer research. Databases for proteogenomic searches are often constructed by adding peptide sequences inferred from genomic or transcriptomic evidence to reference protein sequences. Such inflation of databases has the potential to identify novel peptides. However, it also raises concerns about sensitive and reliable peptide identification. Spurious peptides included in target databases may result in an underestimated false discovery rate (FDR). On the other hand, inflation of decoy databases could decrease the sensitivity of peptide identification due to the increased number of high-scoring random hits. Although several studies have addressed these issues, widely applicable guidelines for sensitive and reliable proteogenomic search have hardly been available.
Results: To systematically evaluate the effect of database inflation in proteogenomic searches, we constructed a variety of real and simulated proteogenomic databases for yeast and human tandem mass spectrometry (MS/MS) data, respectively. Against these databases, we tested two popular database search tools with various approaches to search result validation: the target-decoy search strategy (with and without a refined scoring metric) and a mixture model-based method. The effect of separate filtering of known and novel peptides was also examined. The results from real and simulated proteogenomic searches confirmed that separate filtering increases the sensitivity and reliability of proteogenomic search. However, no single method consistently identified the largest (or the smallest) number of novel peptides from real proteogenomic searches.
Conclusions: We propose using a set of search result validation methods with separate filtering for sensitive and reliable identification of peptides in proteogenomic search.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3327-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Honglan Li
  - School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Republic of Korea
- Yoon Sung Joh
  - Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
- Hyunwoo Kim
  - Scientific Data Research Center, Korea Institute of Science and Technology Information, Daejeon, 34141, Republic of Korea
- Eunok Paek
  - Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
- Sang-Won Lee
  - Department of Chemistry, Research Institute for Natural Sciences, Korea University, Seoul, 02841, Republic of Korea
- Kyu-Baek Hwang
  - School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Republic of Korea
9
Park JH, Cho B, Kwon H, Prilutsky D, Yun JM, Choi HC, Hwang KB, Lee IH, Kim JI, Kong SW. I148M variant in PNPLA3 reduces central adiposity and metabolic disease risks while increasing nonalcoholic fatty liver disease. Liver Int 2015; 35:2537-46. [PMID: 26148225 DOI: 10.1111/liv.12909] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 04/19/2015] [Accepted: 06/29/2015] [Indexed: 12/14/2022]
Abstract
BACKGROUND & AIMS: The I148M variant, caused by a C-to-G substitution in PNPLA3 (rs738409), is associated with an increased risk of nonalcoholic fatty liver disease (NAFLD). In the liver, the I148M variant reduces the hydrolytic function of PNPLA3, which results in hepatic steatosis; however, its association with other clinical phenotypes such as adiposity and metabolic diseases is not well established.
METHODS: To identify the impact of the I148M variant on clinical risk factors of NAFLD, we recruited 1363 generally healthy Korean males after excluding alcoholic and secondary causes of hepatic steatosis. Central adiposity was assessed by computed tomography, and hepatic steatosis was evaluated by abdominal ultrasonography.
RESULTS: The participants were predominantly middle-aged (49.0 ± 7.1 years; range 30-60 years), and the frequency of NAFLD was 44.2%. The rs738409-G allele carriers had a 1.19-fold increased risk for NAFLD (minor allele frequency 0.43; allelic odds ratio 1.38; P = 4.3 × 10^-5). Interestingly, the rs738409 GG carriers showed significantly lower levels of visceral and subcutaneous adiposity (P < 0.001 and P = 0.015, respectively), BMI (P < 0.001), triglycerides (P < 0.001) and insulin resistance (P = 0.002) compared to CC carriers. These negative associations between clinical risk factors and rs738409-G dosage were more prominent in the non-NAFLD group than in the NAFLD group.
CONCLUSIONS: The I148M variant, although increasing the risk of NAFLD, was associated with reduced levels of central adiposity, BMI, serum triglycerides and insulin resistance, suggesting differential roles in fat storage and distribution according to cell types and metabolic status.
Affiliation(s)
- Jin-Ho Park
  - Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Family Medicine, Seoul National University Hospital, Seoul, South Korea
- BeLong Cho
  - Department of Family Medicine, Seoul National University Hospital, Seoul, South Korea
- Hyuktae Kwon
  - Department of Family Medicine, Healthcare Research Institute, Seoul National University Hospital Healthcare System Gangnam Center, Seoul, South Korea
- Daria Prilutsky
  - Center for Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Jae Moon Yun
  - Department of Family Medicine, Seoul National University Hospital, Seoul, South Korea
- Ho Chun Choi
  - Department of Family Medicine, Seoul National University Hospital, Seoul, South Korea
- Kyu-Baek Hwang
  - School of Computer Science and Engineering, Soongsil University, Seoul, South Korea
- In-Hee Lee
  - Samsung Genome Institute, Institute for Refractory Cancer Research, Samsung Medical Center, Seoul, South Korea
- Jong-Il Kim
  - Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul, South Korea; Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul, South Korea
- Sek Won Kong
  - Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
10
Rhee JK, Li H, Joung JG, Hwang KB, Zhang BT, Shin SY. Survey of computational haplotype determination methods for single individual. Genes Genomics 2015. [DOI: 10.1007/s13258-015-0342-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/22/2022]
11
Seok HS, Song T, Kong SW, Hwang KB. An Efficient Search Algorithm for Finding Genomic-Range Overlaps Based on the Maximum Range Length. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:778-784. [PMID: 26357316 DOI: 10.1109/tcbb.2014.2369042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Efficient search algorithms for finding genomic-range overlaps are essential for various bioinformatics applications. Most fast algorithms for searching the overlaps between a query range (e.g., a genomic variant) and a set of N reference ranges (e.g., exons) have a time complexity of O(k + log N), where k denotes a term related to the length and location of the reference ranges. Here, we present a simple but efficient algorithm that reduces k, based on the maximum reference range length. Specifically, for a given query range and the maximum reference range length, the proposed method divides the reference range set into three subsets: always, potentially, and never overlapping. Search effort can therefore be reduced by excluding the never-overlapping subset. We demonstrate that the running time of the proposed algorithm is proportional to the size of the potentially overlapping subset, which in turn is proportional to the maximum reference range length if all other conditions are the same. Moreover, an implementation of our algorithm was 13.8 to 30.0 percent faster than one of the fastest range search methods available when tested on various genomic-range data sets. The proposed algorithm has been incorporated into a disease-linked variant prioritization pipeline for WGS (http://gnome.tchlab.org) and its implementation is available at http://ml.ssu.ac.kr/gSearch.
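The three-way split described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' gSearch implementation: the function names (`build_index`, `find_overlaps`), the closed-interval convention, and the use of binary search over sorted start positions are assumptions made for illustration. With references sorted by start and maximum length M, a reference starting inside the query always overlaps, one starting within M before the query start potentially overlaps (its end must be checked), and all others never overlap.

```python
from bisect import bisect_left, bisect_right


def build_index(refs):
    """Sort closed (start, end) ranges by start and record the maximum length.

    Hypothetical helper: returns (sorted_refs, starts, max_len).
    """
    sorted_refs = sorted(refs)
    starts = [s for s, _ in sorted_refs]
    max_len = max((e - s for s, e in sorted_refs), default=0)
    return sorted_refs, starts, max_len


def find_overlaps(index, q_start, q_end):
    """Return all reference ranges overlapping the closed query [q_start, q_end]."""
    sorted_refs, starts, max_len = index
    # References starting before q_start - max_len end before the query: never overlap.
    lo = bisect_left(starts, q_start - max_len)
    # References starting in [q_start - max_len, q_start) potentially overlap.
    mid = bisect_left(starts, q_start)
    # References starting in [q_start, q_end] always overlap; later starts never do.
    hi = bisect_right(starts, q_end)
    hits = [r for r in sorted_refs[lo:mid] if r[1] >= q_start]  # check ends only here
    hits.extend(sorted_refs[mid:hi])  # no end check needed
    return hits
```

Only the slice `sorted_refs[lo:mid]` requires a per-range check, and its size shrinks as the maximum reference length shrinks, which is the dependence the abstract reports.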
|
12
|
Hwang KB, Lee IH, Park JH, Hambuch T, Choe Y, Kim M, Lee K, Song T, Neu MB, Gupta N, Kohane IS, Green RC, Kong SW. Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods. Hum Mutat 2014; 35:936-44. [PMID: 24829188 DOI: 10.1002/humu.22587] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 10/15/2013] [Accepted: 04/29/2014] [Indexed: 12/29/2022]
Abstract
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false-positive DNM candidates.
Affiliation(s)
- Kyu-Baek Hwang
- Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology, Boston Children's Hospital, Boston, Massachusetts; School of Computer Science and Engineering, Soongsil University, Seoul, 156-743, South Korea
|
13
|
Li H, Hwang KB, Mun DG, Kim H, Lee H, Lee SW, Paek E. Estimating influence of cofragmentation on peptide quantification and identification in iTRAQ experiments by simulating multiplexed spectra. J Proteome Res 2014; 13:3488-97. [PMID: 24918111 DOI: 10.1021/pr500060d] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 01/06/2023]
Abstract
Isobaric tag-based quantification such as iTRAQ and TMT is a promising approach to mass spectrometry-based quantification in proteomics as it provides wide proteome coverage with greatly increased experimental throughput. However, it is known to suffer from inaccurate quantification and identification of a target peptide due to cofragmentation of multiple peptides, which likely leads to under-estimation of differentially expressed peptides (DEPs). A simple method of filtering out cofragmented spectra with less than 100% precursor isolation purity (PIP) would decrease the coverage of iTRAQ/TMT experiments. In order to estimate the impact of cofragmentation on quantification and identification of iTRAQ-labeled peptide samples, we generated multiplexed spectra with varying degrees of PIP by mixing the two MS/MS spectra of 100% PIP obtained in global proteome profiling experiments on gastric tumor-normal tissue pair proteomes labeled by 4-plex iTRAQ. Despite cofragmentation, the simulation experiments showed that more than 99% of multiplexed spectra with PIP greater than 80% were correctly identified by three different database search engines: MODa, MS-GF+, and Proteome Discoverer. Using the multiplexed spectra that have been correctly identified, we estimated the effect of cofragmentation on peptide quantification. In 74% of the multiplexed spectra, however, the cancer-to-normal expression ratio was compressed, and a fair number of spectra showed the "ratio inflation" phenomenon. On the basis of the estimated distribution of distortions on quantification, we were able to calculate cutoff values for DEP detection from cofragmented spectra, which were corrected according to a specific PIP and probability of type I (or type II) error. When we applied these corrected cutoff values to real cofragmented spectra with PIP larger than or equal to 70%, we were able to identify reliable DEPs by removing about 25% of DEPs, which are highly likely to be false positives.
Our experimental results provide useful insight into the effect of cofragmentation on isobaric tag-based quantification methods. The simulation procedure as well as the corrected cutoff calculation method could be adopted for quantifying the effect of cofragmentation and reducing false positives (or false negatives) in the DEP identification with general quantification experiments based on isobaric labeling techniques.
Affiliation(s)
- Honglan Li
- School of Computer Science and Engineering, Soongsil University, Seoul 156-743, Republic of Korea
|
14
|
Lee IH, Lee K, Hsing M, Choe Y, Park JH, Kim SH, Bohn JM, Neu MB, Hwang KB, Green RC, Kohane IS, Kong SW. Prioritizing disease-linked variants, genes, and pathways with an interactive whole-genome analysis pipeline. Hum Mutat 2014; 35:537-47. [PMID: 24478219 DOI: 10.1002/humu.22520] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 10/04/2013] [Accepted: 01/23/2014] [Indexed: 01/02/2023]
Abstract
Whole-genome sequencing (WGS) studies are uncovering disease-associated variants in both rare and nonrare diseases. Utilizing next-generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and the accuracy and reproducibility of annotation results are essential for clinical implementation. However, annotating WGS with up-to-date genomic information is still challenging for biomedical researchers. Here, we present one of the fastest and most scalable annotation, filtering, and analysis pipelines, gNOME, to prioritize phenotype-associated variants while minimizing false-positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype-associated variants, and the result summaries are provided at variant, gene, and genome levels. Moreover, the enrichment results of specific variants, genes, and gene sets between two groups, or compared with population-scale WGS datasets that are already integrated in the pipeline, can help the interpretation. We found a small number of discordant results between annotation software tools, in part due to different reporting strategies for the variants with complex impacts. Using two published whole-exome datasets of uveal melanoma and bladder cancer, we demonstrated gNOME's accuracy of variant annotation and the enrichment of loss-of-function variants in known cancer pathways. The gNOME Web server and source code are freely available to the academic community (http://gnome.tchlab.org).
Affiliation(s)
- In-Hee Lee
- Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology, Department of Medicine, Boston Children's Hospital, Boston, Massachusetts, 02115
|
15
|
Abstract
Identifying genes indispensable for an organism's life and their characteristics is one of the central questions in current biological research, and hence it would be helpful to develop computational approaches towards the prediction of essential genes. The performance of a predictor is usually measured by the area under the receiver operating characteristic curve (AUC). We propose a novel method by implementing genetic algorithms to maximize the partial AUC that is restricted to a specific interval of lower false positive rate (FPR), the region relevant to follow-up experimental validation. Our predictor uses various features based on sequence information, protein-protein interaction network topology, and gene expression profiles. A feature selection wrapper was developed to alleviate the over-fitting problem and to weigh each feature's relevance to prediction. We evaluated our method using the proteome of budding yeast. Our implementation of genetic algorithms maximizing the partial AUC below 0.05 or 0.10 of FPR outperformed other popular classification methods. [BMB Reports 2013; 46(1): 41-46]
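The partial AUC referred to in this abstract can be made concrete with a small Python sketch. It only computes the metric; it does not reproduce the genetic-algorithm optimization or the feature selection wrapper. The function name `partial_auc`, the unnormalized definition (area under the ROC curve for FPR between 0 and `max_fpr`), and the linear interpolation at the cutoff are assumptions chosen for illustration.

```python
def partial_auc(labels, scores, max_fpr):
    """Unnormalized area under the ROC curve for FPR in [0, max_fpr].

    labels: 1 for positive (e.g., essential gene), 0 for negative.
    scores: higher means more likely positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Build ROC points by lowering the threshold; tied scores move together.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j

    # Trapezoidal integration, clipping the last segment at max_fpr.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Under this definition a perfect ranking scores `max_fpr` (for example 0.05 or 0.10), so values are comparable only at the same cutoff.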
Affiliation(s)
- Kyu-Baek Hwang
- School of Computer Science and Engineering, Soongsil University, Seoul, Korea
|
16
|
Abstract
Background Multidimensional scaling (MDS) is a widely used approach to dimensionality reduction. It has been applied to feature selection and visualization in various areas. Among diverse MDS methods, the classical MDS is a simple and theoretically sound solution for projecting data objects onto a low dimensional space while preserving the original distances among them as much as possible. However, it is not trivial to apply it to genome-scale data (e.g., microarray gene expression profiles) on regular desktop computers, because of its high computational complexity. Results We implemented a highly-efficient software application, called CFMDS (CUDA-based Fast MultiDimensional Scaling), which produces an approximate solution of the classical MDS based on CUDA (compute unified device architecture) and the divide-and-conquer principle. CUDA is a parallel computing architecture exploiting the power of the GPU (graphics processing unit). The principle of divide-and-conquer was adopted for circumventing the small memory problem of usual graphics cards. Our application software has been tested on various benchmark datasets including microarrays and compared with the classical MDS algorithms implemented using C# and MATLAB. In our experiments, CFMDS was more than a hundred times faster for large data than such general solutions. Regarding the quality of dimensionality reduction, our approximate solutions were as good as those from the general solutions, as the Pearson's correlation coefficients between them were larger than 0.9. Conclusions CFMDS is an expeditious solution for the data dimensionality reduction problem. It is especially useful for efficient processing of genome-scale data consisting of several thousands of objects in several minutes.
Affiliation(s)
- Sungin Park
- School of Computer Science and Engineering, Soongsil University, Seoul 156-743, Korea
|
17
|
Abstract
BACKGROUND Various processes such as annotation and filtering of variants or comparison of variants in different genomes are required in whole-genome or exome analysis pipelines. However, processing different databases and searching among millions of genomic loci is not trivial. RESULTS gSearch compares sequence variants in the Genome Variation Format (GVF) or Variant Call Format (VCF) with a pre-compiled annotation or with variants in other genomes. Its search algorithms are subsequently optimized and implemented in a multi-threaded manner. The proposed method is not a stand-alone annotation tool with its own reference databases. Rather, it is a search utility that readily accepts public or user-prepared reference files in various formats including GVF, Generic Feature Format version 3 (GFF3), Gene Transfer Format (GTF), VCF and Browser Extensible Data (BED) format. Compared to existing tools such as ANNOVAR, gSearch runs more than 10 times faster. For example, it is capable of annotating 52.8 million variants with allele frequencies in 6 min. AVAILABILITY gSearch is available at http://ml.ssu.ac.kr/gSearch. It can be used as an independent search tool or can easily be integrated into existing pipelines through various programming environments such as Perl, Ruby and Python.
Affiliation(s)
- Taemin Song
- School of Computer Science and Engineering, Soongsil University, Seoul 156-743, South Korea
|
18
|
Abstract
MOTIVATION MicroRNAs (miRNAs) and mRNAs constitute an important part of gene regulatory networks, influencing diverse biological phenomena. Elucidating closely related miRNAs and mRNAs can be an essential first step towards the discovery of their combinatorial effects on different cellular states. Here, we propose a probabilistic learning method to identify synergistic miRNAs involving regulation of their condition-specific target genes (mRNAs) from multiple information sources, i.e. computationally predicted target genes of miRNAs and their respective expression profiles. RESULTS We used data sets consisting of miRNA-target gene binding information and expression profiles of miRNAs and mRNAs on human cancer samples. Our method allowed us to detect functionally correlated miRNA-mRNA modules involved in specific biological processes from multiple data sources by using a balanced fitness function and efficient searching over multiple populations. The proposed algorithm found two miRNA-mRNA modules, highly correlated with respect to their expression and biological function. Moreover, the mRNAs included in the same module showed much higher correlations when the related miRNAs were highly expressed, demonstrating our method's ability for finding coherent miRNA-mRNA modules. Most members of these modules have been reported to be closely related with cancer. Consequently, our method can provide a primary source of miRNA and target sets presumed to constitute closely related parts of gene regulatory pathways.
Affiliation(s)
- Je-Gun Joung
- Center for Bioinformation Technology, Seoul National University, Seoul, Korea
|
19
|
Hwang KB, Zhang BT. Bayesian Model Averaging of Bayesian Network Classifiers Over Multiple Node-Orders: Application to Sparse Datasets. IEEE Trans Syst Man Cybern B Cybern 2005; 35:1302-10. [PMID: 16366254 DOI: 10.1109/tsmcb.2005.850162] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
Abstract
Bayesian model averaging (BMA) can resolve the overfitting problem by explicitly incorporating the model uncertainty into the analysis procedure. Hence, it can be used to improve the generalization performance of Bayesian network classifiers. Until now, BMA of Bayesian network classifiers has only been performed in some restricted forms, e.g., the model is averaged given a single node-order, because of its heavy computational burden. However, it can be hard to obtain a good node-order when the available training dataset is sparse. To alleviate this problem, we propose BMA of Bayesian network classifiers over several distinct node-orders obtained using the Markov chain Monte Carlo sampling technique. The proposed method was examined using two synthetic problems and four real-life datasets. First, we show that the proposed method is especially effective when the given dataset is very sparse. The classification accuracy of averaging over multiple node-orders was higher in most cases than that achieved using a single node-order in our experiments. We also present experimental results for test datasets with unobserved variables, where the quality of the averaged node-order is more important. Through these experiments, we show that the difference in classification performance between the cases of multiple node-orders and single node-order is related to the level of noise, confirming the relative benefit of averaging over multiple node-orders for incomplete data. We conclude that BMA of Bayesian network classifiers over multiple node-orders has an apparent advantage when the given dataset is sparse and noisy, despite the method's heavy computational cost.
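The averaging step that BMA performs can be sketched in a few lines of Python. This is a generic illustration, not the node-order MCMC procedure of the paper: the function names (`model_posteriors`, `bma_predict`), the uniform prior over models, and the representation of each model by a log marginal likelihood plus a vector of class probabilities are all assumptions. Each model's prediction is weighted by its posterior probability, so no single structure (or node-order) dominates when the data are sparse.

```python
import math


def model_posteriors(log_evidences):
    """Posterior model weights from log marginal likelihoods, assuming a uniform prior."""
    top = max(log_evidences)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(v - top) for v in log_evidences]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def bma_predict(log_evidences, class_probs):
    """Average per-model class probabilities, weighted by model posterior.

    class_probs[m][c] is model m's predicted probability of class c for one input.
    """
    weights = model_posteriors(log_evidences)
    n_classes = len(class_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, class_probs))
        for c in range(n_classes)
    ]
```

When the models are samples drawn by MCMC from the posterior, as in the abstract, the weights would typically be uniform over the samples rather than computed from evidences.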
|
20
|
Chang JH, Hwang KB, Oh SJ, Zhang BT. Bayesian network learning with feature abstraction for gene-drug dependency analysis. J Bioinform Comput Biol 2005; 3:61-77. [PMID: 15751112 DOI: 10.1142/s0219720005000874] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/13/2004] [Revised: 06/23/2004] [Accepted: 07/02/2004] [Indexed: 12/20/2022]
Abstract
Combined analysis of the microarray and drug-activity datasets has the potential to reveal valuable knowledge about various relations among gene expressions and drug activities in the malignant cell. In this paper, we apply Bayesian networks, a tool for compact representation of the joint probability distribution, to such analysis. To alleviate the data dimensionality problem, the huge datasets were condensed using a feature abstraction technique. The proposed analysis method was applied to the NCI60 dataset (http://discover.nci.nih.gov) consisting of gene expression profiles and drug activity patterns on human cancer cell lines. The Bayesian networks, learned from the condensed dataset, identified most of the salient pairwise correlations and some known relationships among several features in the original dataset, confirming the effectiveness of the proposed feature abstraction method. Also, a survey of the recent literature confirms that several relationships appearing in the learned Bayesian network are biologically meaningful.
Affiliation(s)
- Jeong-Ho Chang
- Biointelligence Laboratory, School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea.
|
21
|
Kong SW, Hwang KB, Kim RD, Zhang BT, Greenberg SA, Kohane IS, Park PJ. CrossChip: a system supporting comparative analysis of different generations of Affymetrix arrays. Bioinformatics 2005; 21:2116-7. [PMID: 15684227 PMCID: PMC2819168 DOI: 10.1093/bioinformatics/bti288] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open
Abstract
SUMMARY To increase compatibility between different generations of Affymetrix GeneChip arrays, we propose a method of filtering probes based on their sequences. Our method is implemented as a web-based service for downloading necessary materials for converting the raw data files (*.CEL) for comparative analysis. The user can specify the appropriate level of filtering by setting the criteria for the minimum overlap length between probe sequences and the minimum number of usable probe pairs per probe set. Our website supports a within-species comparison for human and mouse GeneChip arrays. AVAILABILITY http://www.crosschip.org
Affiliation(s)
- Sek Won Kong
- Bauer Center for Genomics Research, Harvard University, Cambridge, MA, USA.
|
22
|
Hwang KB. [Primary health care services in medically underserved areas: since the start of operation 1 year ago]. Taehan Kanho 1982; 21:6-7. [PMID: 6922307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2023]
|
23
|
Hwang KB. [Training of nurses' aides in Korea]. Taehan Kanho 1977; 16:74-6. [PMID: 244674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
|