Reference Citation Analysis

Total Articles: 223 (from Reference Citation Analysis)
Article PDFs: 18
Cited by > 0: 146
Searched Name: Manabu Torii

Number	Citation Analysis
1	POS0544 INFLUENCE OF EATING HABITS ON FRAILTY AMONG PATIENTS WITH RHEUMATOID ARTHRITIS: KURAMA COHORT. Ann Rheum Dis 2021. [DOI: 10.1136/annrheumdis-2021-eular.2511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022] Abstract Background: Rheumatoid arthritis (RA) is a chronic inflammatory disorder that contributes to the acceleration of frailty, a clinical state of increased vulnerability due to declining physiological function. Although accumulating evidence suggests that nutritional therapy is important for frailty in the general population, there is little evidence on dietary recommendations for preventing frailty in patients with RA. Objectives: The present study aimed to reveal clinical associations between frailty status, eating habits, and RA disease activity. Methods: We conducted a cross-sectional study of 306 female outpatients enrolled from the KURAMA (Kyoto University Rheumatoid Arthritis Management Alliance) cohort database. The participants were classified into three groups (robust, prefrail, and frail) according to a simplified frailty scale (the SOF index), and dietary data were collected using a self-reported food frequency questionnaire, as previously reported. We performed multivariate logistic analyses for the presence of frailty/prefrailty with and without eating habits. Results: The frail group showed physical decline, such as decreased skeletal muscle index, hand grip strength, and walking speed, and DAS28-ESR in the frail group was significantly higher than in the other groups. In multivariate logistic analysis, the presence of frailty/prefrailty was correlated with DAS28-ESR (OR 1.71, p=0.00004) and methotrexate use (OR 0.47, p=0.0097). The Cochran-Armitage trend test also showed that the intake frequency of five ingredients (meat, fish, milk, fruits, and vegetables) was inversely associated with the prevalence of frailty/prefrailty.
In additional multivariate logistic analyses including dietary habits, habitual intake of fish (at least three times per week), rather than meat or other foods, was independently correlated with the presence of frailty/prefrailty (OR 0.33, p=0.00027). Conclusion: Our results suggest that habitual intake of fish, rather than meat or other foods, may be beneficial in preventing frailty among RA patients.
Table 1. Multivariate logistic analysis for RA patients with prefrailty or frailty, with variables including eating habits
Variable | Fish + Meat: OR (95% CI) | P value | All: OR (95% CI) | P value
DAS28-ESR | 1.78 (1.34-2.37) | 0.00003 | 1.73 (1.30-2.30) | 0.00009
MTX use | 0.43 (0.23-0.79) | 0.0055 | 0.42 (0.23-0.78) | 0.0050
Age (1 year) | 1.02 (1.00-1.05) | 0.037 | 1.03 (1.01-1.06) | 0.0015
PSL use | 1.23 (0.69-2.21) | 0.49 | 1.22 (0.67-2.20) | 0.51
Duration of RA (1 year) | 1.00 (0.98-1.02) | 0.72 | 1.00 (0.98-1.02) | 0.84
Body mass index | 1.00 (0.93-1.07) | 0.98 | 0.99 (0.92-1.07) | 0.85
Biological agents use | 1.02 (0.60-1.72) | 0.94 | 1.04 (0.62-1.77) | 0.87
Fish dish | 0.31 (0.17-0.55) | 0.00004 | 0.33 (0.18-0.61) | 0.00027
Meat dish | 0.86 (0.49-1.50) | 0.60 | 0.89 (0.51-1.57) | 0.69
Milk | - | - | 0.71 (0.41-1.24) | 0.23
Vegetable | - | - | 0.95 (0.47-1.93) | 0.89
Fruits | - | - | 0.77 (0.41-1.42) | 0.40
Figure 1. The prevalence of prefrailty or frailty among subjects, by intake frequency
Acknowledgements: We thank S. Nakagawa and M. Iida for technical assistance.
Disclosure of Interests:
Masao Katsushima: None declared.
Hiroto Minamino: None declared.
Mie Torii: None declared.
Motomu Hashimoto: Speakers bureau: M.H. receives grants and/or speaker fees from Bristol-Myers, Eisai, Eli Lilly, and Tanabe Mitsubishi. Grant/research support from: M.H. belongs to a department financially supported by Nagahama City (Shiga, Japan), Toyooka City (Hyogo, Japan), and five pharmaceutical companies (Tanabe-Mitsubishi, Chugai, UCB Japan, Ayumi, and Asahi-Kasei). The KURAMA cohort study is supported by a grant from Daiichi Sankyo Co. Ltd.
Wataru Yamamoto: None declared.
Ryu Watanabe: Grant/research support from: R.W. belongs to a department financially supported by Nagahama City (Shiga, Japan), Toyooka City (Hyogo, Japan), and five pharmaceutical companies (Tanabe-Mitsubishi, Chugai, UCB Japan, Ayumi, and Asahi-Kasei). The KURAMA cohort study is supported by a grant from Daiichi Sankyo Co. Ltd.
Kosaku Murakami: None declared.
Koichi Murata: Grant/research support from: K.M. belongs to a department financially supported by Nagahama City (Shiga, Japan), Toyooka City (Hyogo, Japan), and five pharmaceutical companies (Tanabe-Mitsubishi, Chugai, UCB Japan, Ayumi, and Asahi-Kasei). The KURAMA cohort study is supported by a grant from Daiichi Sankyo Co. Ltd.
Masao Tanaka: Grant/research support from: M.T. belongs to a department financially supported by Nagahama City (Shiga, Japan), Toyooka City (Hyogo, Japan), and five pharmaceutical companies (Tanabe-Mitsubishi, Chugai, UCB Japan, Ayumi, and Asahi-Kasei). The KURAMA cohort study is supported by a grant from Daiichi Sankyo Co. Ltd.
Hiromu Ito: Speakers bureau: H.I. receives a research grant and/or speaker fee from Bristol-Myers, Eisai, Mochida, Taisho, and Asahi-Kasei. Grant/research support from: H.I. belongs to a department financially supported by Nagahama City (Shiga, Japan), Toyooka City (Hyogo, Japan), and five pharmaceutical companies (Tanabe-Mitsubishi, Chugai, UCB Japan, Ayumi, and Asahi-Kasei). The KURAMA cohort study is supported by a grant from Daiichi Sankyo Co. Ltd.
Akio Morinobu: Speakers bureau and grant/research support: A.M. has received speaking fees and/or research grants from Eli Lilly Japan K.K., Ono Pharmaceutical Co., Pfizer Inc., UCB Japan, AbbVie G.K., Asahi Kasei Pharma, and Chugai Pharmaceutical Co. Ltd.
2	Extracting health-related causality from twitter messages using natural language processing. BMC Med Inform Decis Mak 2019;19:79. [PMID: 30943954 PMCID: PMC6448183 DOI: 10.1186/s12911-019-0785-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 11/22/2022] Abstract BACKGROUND Twitter messages (tweets) cover many topics in daily life, including health-related topics. Analysis of health-related tweets would help us understand the health conditions and concerns encountered in daily life. In this paper we evaluate an approach to extracting causalities from tweets using natural language processing (NLP) techniques. METHODS Lexico-syntactic patterns based on dependency parser outputs are used for causality extraction. We focused on three health-related topics: "stress", "insomnia", and "headache". A large dataset consisting of 24 million tweets is used. RESULTS The results show that the proposed approach achieved an average precision between 74.59% and 92.27% in comparison with human annotations. CONCLUSIONS Manual analysis of the extracted causalities reveals interesting findings about expressions on health-related topics posted by Twitter users. Key Words: Causal relationships; Causality; Cause-effect; Natural language processing (NLP); Twitter. MESH Headings: Causality; Datasets as Topic; Headache; Humans; Information Storage and Retrieval; Natural Language Processing; Sleep Initiation and Maintenance Disorders; Social Media; Stress, Psychological; Text Messaging.
3	Abstract P3-03-21: Usefulness of sentinel lymph node biopsy by indocyanine green fluorescence method for cN0 breast cancer patients. Cancer Res 2019. [DOI: 10.1158/1538-7445.sabcs18-p3-03-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022] Abstract Background. The indocyanine green (ICG) fluorescence method (ICG-f) has recently been widely used for sentinel lymph node (SLN) detection. Its advantages are that it involves no radiation exposure, that its use is not limited to high-volume medical centers because no radioisotope facility is required, and that lymph flow can be confirmed as a real-time image from outside the body. ICG-f identified an average of 2.3-3.4 SLNs with a detection rate of 99%, compared with 1.7-2 SLNs by radioisotope (RI) methods. Long-term outcomes after SLN biopsy (SNB) using ICG-f, including arm lymphedema as a complication of this method, have not been reported. We evaluated the usefulness of SNB for cN0 breast cancer patients using data from a multicenter cohort study on long-term results after negative SNB by ICG-f. Methods. A total of 1,132 women with histologically proven clinical stage T1-4, pN0, M0 primary invasive breast cancer who underwent SNB using ICG-f (ICG alone or combined with the RI/blue dye method), sparing axillary lymph node dissection, between May 2007 and December 2015 were enrolled. This was a retrospective, multicenter cohort study conducted at 6 centers in Japan. The primary endpoint was the axillary recurrence rate. We analyzed the correlation of axillary recurrence with adjuvant systemic therapy, adjuvant radiotherapy, and clinicopathological characteristics. The secondary endpoint was lymphedema. Results and Discussion. The median follow-up time was 41 (range 21-117) months, and axillary recurrence was found in 6 patients (0.53%). Five of the 6 patients did not receive standard adjuvant systemic therapy or adjuvant radiation therapy after breast-conserving surgery because of patient preference or old age. Lymphedema was identified in only 4 of 632 patients. Reported axillary recurrence rates after SNB are 0.3-1.65%, consistent with our result. Lymphedema was infrequent in patients who received SNB using ICG-f because SLNs are removed along with lymphatic ducts within a limited area of axillary adipose tissue. Conclusion. Axillary recurrence after negative SNB using ICG-f was comparable to that with the RI or blue dye method. Appropriate adjuvant drug therapy or radiation therapy may be important for preventing axillary recurrence after SNB using ICG-f. ICG-f after neoadjuvant chemotherapy should be investigated next: removing more than 2 SLNs has been reported to be associated with a lower false-negative rate in patients whose clinically node-positive disease converted to clinically node-negative after chemotherapy, and ICG-f, which identifies more SLNs on average than RI methods, might overcome this issue. Citation Format: Maeshima Y, Takahara S, Yamauchi A, Yamagami K, Sugie T, Yamashiro H, Kato H, Torii M, Takada M, Torii M. Usefulness of sentinel lymph node biopsy by indocyanine green fluorescence method for cN0 breast cancer patients [abstract]. In: Proceedings of the 2018 San Antonio Breast Cancer Symposium; 2018 Dec 4-8; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2019;79(4 Suppl):Abstract nr P3-03-21.
4	A Preliminary Study of Clinical Concept Detection Using Syntactic Relations. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018;2018:1028-1035. [PMID: 30815146 PMCID: PMC6371372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023] Abstract Concept detection is an integral step in natural language processing (NLP) applications in the clinical domain. Clinical concepts are detailed (e.g., "pain in left/right upper/lower arm/leg") and expressed in diverse phrase types (e.g., noun, verb, adjective, or prepositional phrases). The clinical domain has rich terminological resources that include many concept synonyms. Even with these resources, concept detection remains challenging because concept phrases may occur discontinuously and/or with permuted word order. To overcome this challenge, we investigated an approach that exploits syntactic information. Syntactic patterns of concept phrases were mined from continuous, non-permuted forms of synonyms, and these patterns were used to detect discontinuous and/or permuted concept phrases. Experiments on 790 de-identified clinical notes showed that the proposed approach can potentially boost the recall of concept detection. Challenges and limitations were also identified. In this paper, we report and discuss our preliminary analysis and findings. MESH Headings: Algorithms; Electronic Health Records; Humans; Natural Language Processing; Pattern Recognition, Automated; Semantics; Unified Medical Language System. Grants: U54 LM008748 NLM NIH HHS.
5	Abstract PD2-07: Real-time navigation for sentinel lymph node biopsy in breast cancer patients using projection mapping with indocyanine green fluorescence. Cancer Res 2018. [DOI: 10.1158/1538-7445.sabcs17-pd2-07] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Abstract Background) Sentinel lymph node (SLN) biopsy using the indocyanine green fluorescence (fICG) method has shown an identification rate equal to or better than that of the blue dye or radioisotope (RI) method. In the fICG method, lymphatic vessels draining into the SLNs can be seen through the skin or subcutaneous tissue using a near-infrared camera (Photodynamic Eye®: PDE), which makes the SLNs easy to find. However, whenever we observe the fluorescence images, we have to hold the PDE, turn off the operating light, and look at a monitor, because fluorescence images cannot be seen directly. The medical imaging projection system (MIPS; Panasonic Connected Solutions Company, Japan) is a new device that detects fluorescent emission from an organ and projects the image onto the location of the emission. The projected images can be adjusted to follow body movement or organ deformation. MIPS could therefore provide an option for real-time navigation during SLN biopsy. The aim of this study was to evaluate the clinical utility of MIPS. Patients and methods) Patients with clinically node-negative primary breast cancer underwent fICG SLN biopsy using MIPS. The primary endpoint was the identification rate of the fICG method using MIPS. The study was initially conducted as an interventional study because MIPS was an unapproved medical device; after MIPS was approved, it was conducted as an observational study. The study protocol was approved by the institutional review board at Kyoto University Hospital. All patients provided informed consent to participate in this study. Results) Between March 2016 and May 2017, 39 patients (40 procedures) underwent fICG SLN biopsy using MIPS. The median age was 55 years (range 32–74 years), and the median body mass index was 20.4 kg/m2 (range 17.7–27.7 kg/m2). More than half (58%) had T1 tumors, and 8 (20.0%) had ductal carcinoma in situ (DCIS). Eight procedures (20%) were performed after preoperative systemic therapy (PST). Because MIPS itself illuminates the operating field, SLN biopsy using MIPS was successfully performed without the operating light in all procedures. At least one SLN was detected using MIPS in every procedure, giving an identification rate of 100% (95% CI: 91–100%). The median number of SLNs detected by MIPS was 3 (range 1–9) for all procedures and 3 (range 2–8) for procedures after PST. MIPS detected two pathologically positive SLNs and one SLN containing isolated tumor cells. RI was also used in 25 procedures. Sixty-two of the 97 SLNs detected by MIPS (64%) were also detected by RI, and no SLNs were detected by RI alone. Conclusions) Although the RI method may not yet be avoidable, since 25/40 (62.5%) procedures required its combined use, fICG SLN biopsy using MIPS showed an SLN identification rate comparable to that of conventional methods and could be a useful tool for real-time navigation surgery. Acknowledgements) This study was supported by Acceleration Transformative research for Medical innovation, Japan Agency for Medical Research and Development (AMED). Citation Format: Takada M, Takeuchi M, Suzuki E, Sato F, Matsumoto Y, Torii M, Sakita-Kawaguchi N, Nakayama Y, Okuda T, Nishino H, Seo S, Hatano E, Toi M. Real-time navigation for sentinel lymph node biopsy in breast cancer patients using projection mapping with indocyanine green fluorescence [abstract]. In: Proceedings of the 2017 San Antonio Breast Cancer Symposium; 2017 Dec 5-9; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2018;78(4 Suppl):Abstract nr PD2-07.
6	Abstract P4-01-10: Development of photoacoustic vascular imaging system for breast cancer. Cancer Res 2017. [DOI: 10.1158/1538-7445.sabcs16-p4-01-10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Abstract Background: Tumor angiogenesis and hypoxia are associated with breast cancer growth and metastasis. Photoacoustic (PA) tomography is an optical imaging technology that visualizes the distribution and oxygenation status of hemoglobin with high spatial resolution. We initially developed a photoacoustic mammography (PAM) system with a flat scanning detector that could detect breast tumors. However, the flat detector array has the drawback of a limited view. Here we developed a novel PAM system with a hemispherical-shaped detector array (HDA), which allows microvasculature to be identified non-invasively and nearly isotropic three-dimensional images of blood vessels to be reconstructed. This non-invasive vascular imaging system may be able to characterize tumor angiogenesis and analyze the status of microcirculation. The aim of this study was to analyze the imaging findings of tumor-related vasculature in breast cancer patients. Patients and method: A PAM system with HDA was developed in a collaborative project between Canon Inc., Japan, and Kyoto University. Twenty-two primary breast cancer patients (5 with non-invasive cancer and 17 with invasive cancer) diagnosed between December 2014 and December 2015 underwent PAM imaging. We also applied a breast deformation algorithm that maps the breast shape in an MRI image to that in a PA image, creating a fusion image of the two modalities for the analysis. Features of peri- and intra-tumoral vasculature and their oxygenation status were evaluated. The study protocol was approved by the institutional review board at Kyoto University Hospital (UMIN000012251). All patients provided informed consent to participate in this study. Results: Abnormal peri-tumoral vasculature was detected in 86% of all cases, non-invasive and invasive. In invasive cancer cases, most tumor-related blood vessels were directed centripetally toward the tumor, and 93% of centripetal blood vessels appeared to be disrupted or rapidly narrowed at the tumor boundary. The centripetal blood vessel structure was observed more frequently in invasive than in non-invasive cancer (61% vs 35%). PA images before and after preoperative chemotherapy were obtained in one case, in which intra-tumoral blood vessels became finer after chemotherapy, reflecting normalization of intra-tumoral microcirculation induced by chemotherapy. Conclusions: A PAM system with HDA provided high-resolution vascular images of primary breast cancers. Morphological differences in peri-tumoral vasculature were observed between invasive and non-invasive disease. These results suggest the potential of PA imaging as a non-invasive tool for analyzing the tumor vasculature of human breast cancers, and it may be helpful for breast cancer diagnosis. (Acknowledgements) This work was partially supported by the Innovative Techno-Hub for Integrated Medical Bio-imaging Project of the Special Coordination Funds for Promoting Science and Technology from the Ministry of Education, Culture, Sports, Science, and Technology, Japan. Citation Format: Toi M, Asao Y, Takada M, Kataoka M, Endo T, Kawashima M, Yamaga I, Nakayama Y, Tokiwa M, Fakhrejahani E, Torii M, Kawaguchi-Sakita N, Kanao S, Matsumoto Y, Yagi T, Sakurai T, Togashi K, Shiina T. Development of photoacoustic vascular imaging system for breast cancer [abstract]. In: Proceedings of the 2016 San Antonio Breast Cancer Symposium; 2016 Dec 6-10; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2017;77(4 Suppl):Abstract nr P4-01-10.
7	Visualization of tumor-related blood vessels in human breast by photoacoustic imaging system with a hemispherical detector array. Sci Rep 2017;7:41970. [PMID: 28169313 PMCID: PMC5294462 DOI: 10.1038/srep41970] [Citation(s) in RCA: 147] [Impact Index Per Article: 21.0] [Received: 10/05/2016] [Accepted: 12/28/2016] [Indexed: 12/19/2022] Abstract Noninvasive measurement of the distribution and oxygenation state of hemoglobin (Hb) inside tissue is strongly required to analyze tumor-associated vasculature. We developed a photoacoustic imaging (PAI) system with a hemispherical-shaped detector array (HDA). Here, using deformation of the breast shape from MRI to PAI and the resulting fused image, we show that the PAI system with HDA revealed finer vasculature, more detailed blood-vessel branching structures, and more detailed morphological vessel characteristics than MRI. In a clinical study of 22 malignant cases, PAI revealed morphologically abnormal peritumoral blood vessel features in breast cancer tissues, including centripetal photoacoustic signals and disruption or narrowing of vessel signals, and detected intratumoral signals. It was also possible to analyze anticancer treatment-driven changes in vascular morphology and function, such as improved intratumoral blood perfusion and related changes in the oxygen saturation of intravascular hemoglobin. This clinical study indicates that PAI appears to be a promising tool for noninvasive analysis of human blood vessels and may contribute to improved cancer diagnosis.
MESH Headings: Adult; Aged; Aged, 80 and over; Algorithms; Blood Vessels/diagnostic imaging; Blood Vessels/pathology; Breast/blood supply; Breast/diagnostic imaging; Breast/pathology; Breast Neoplasms/blood supply; Breast Neoplasms/diagnostic imaging; Breast Neoplasms/pathology; Carcinoma, Ductal, Breast/blood supply; Carcinoma, Ductal, Breast/diagnostic imaging; Carcinoma, Ductal, Breast/pathology; Female; Humans; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Middle Aged; Photoacoustic Techniques/instrumentation; Photoacoustic Techniques/methods; Young Adult.
8	69P Cisplatin based preoperative chemotherapy regimens for basal-like breast cancer potentially improve prognosis even in patients without pCR: A retrospective analysis from a single-institution. Ann Oncol 2016. [DOI: 10.1016/s0923-7534(21)00229-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
9	69P Cisplatin based preoperative chemotherapy regimens for basal-like breast cancer potentially improve prognosis even in patients without pCR: A retrospective analysis from a single-institution. Ann Oncol 2016. [DOI: 10.1093/annonc/mdw575.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
10	Mining Health-Related Issues in Consumer Product Reviews by Using Scalable Text Analytics. BIOMEDICAL INFORMATICS INSIGHTS 2016;8:1-11. [PMID: 27375358 PMCID: PMC4915789 DOI: 10.4137/bii.s37791] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/15/2016] [Revised: 05/01/2016] [Accepted: 05/17/2016] [Indexed: 11/25/2022] Abstract In an era when most of our life activities are digitized and recorded, opportunities abound to gain insights into population health. Online product reviews are a unique data source that is currently underexplored. Health-related information, although scarce, can be systematically mined from online product reviews. Leveraging natural language processing and machine learning tools, we mined 1.3 million grocery product reviews for health-related information. The objectives of the study were to: (1) conduct quantitative and qualitative analyses of the types of health issues found in consumer product reviews; (2) develop a machine learning classifier to detect reviews that contain health-related issues; and (3) gain insights into the task characteristics and challenges of text analytics to guide future research. Key Words: big data; consumer health informatics; natural language processing; online product reviews; syndromic surveillance; text mining.
11	Abstract P4-03-03: Detection of the tumor vasculature and the hypoxic status of breast lesions using second-generation photoacoustic mammography: An exploratory study. Cancer Res 2016. [DOI: 10.1158/1538-7445.sabcs15-p4-03-03] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Abstract Background: Tumor angiogenesis and hypoxia are associated with breast cancer growth and metastasis. Photoacoustic mammography (PAM) non-invasively visualizes hemoglobin distribution inside the breast by detecting the thermoelastic waves that hemoglobin generates when irradiated with a near-infrared laser pulse. Oxygen saturation (SO2) can be calculated from photoacoustic (PA) signals obtained with laser pulses at two different wavelengths. We further improved the spatial resolution of PAM by approximately 1 mm and enhanced detectability by using a high-sensitivity detector. This new PAM technique can acquire PAM images and ultrasonography (US) images simultaneously. The aim of this study was to explore the clinical usefulness of this PAM technique. Patients and methods: Women who had breast lesions were eligible for this study. The participants' lesions were measured using the new PAM technique before they began treatment. The PAM images were evaluated by 5 physicians. First, the lesions were identified using only the PAM images. Second, US or contrast-enhanced magnetic resonance images (CE-MRI) were used to identify the locations of the lesions. Next, the PA signals were evaluated according to their locations. Peri-tumoral PA signals were defined as linear signals that congregated in the peri-tumoral area, boundary PA signals as peri-tumoral signals that were disrupted at the lesion's boundary, and intra-tumoral PA signals as any significant PA signals inside the tumor. SO2 was displayed using a color scale. The study protocol was approved by the institutional review board at Kyoto University Hospital, Japan (UMIN000007464). Results: PAM was performed on 48 breast lesions in 45 patients, including 36 invasive carcinoma lesions, 8 ductal carcinoma in situ (DCIS) lesions, and 4 benign lesions. Evaluation of PA signals according to lesion location, with confirmation from US or CE-MRI, was successfully performed for 38 lesions. Peri-tumoral PA signals were detected in 33 lesions (87%), disrupted boundary PA signals in 30 lesions (79%), and intra-tumoral PA signals in 25 lesions (66%). The detection rates for peri-tumoral, boundary, and intra-tumoral PA signals were 94%, 87%, and 65% for invasive carcinoma and 60%, 40%, and 80% for DCIS, respectively. Intra-tumoral PA signals tended to be weaker than peri-tumoral PA signals in invasive carcinoma lesions, and they often had a spotty rather than a linear shape. Intra-tumoral PA signals were observed to have lower SO2 levels than peri-tumoral PA signals in 95% of invasive carcinoma lesions and in 75% of DCIS lesions. Although peri-tumoral and boundary PA signals were also detected in a 38-mm fibroadenoma, its intra-tumoral PA signals displayed a diffuse pattern. Conclusions: We demonstrated that high spatial resolution, combined with US and CE-MRI, facilitates region-specific evaluation of PAM images. With enhanced sensitivity, PAM could become a useful tool for evaluating the hypoxic status of tumors. Citation Format: Takada M, Kawashima M, Kataoka M, Kanao S, Yamaga I, Torii M, Tokiwa M, Fakhrejahani E, Sakurai T, Asao Y, Haga H, Shiina T, Togashi K, Toi M. Detection of the tumor vasculature and the hypoxic status of breast lesions using second-generation photoacoustic mammography: An exploratory study [abstract]. In: Proceedings of the Thirty-Eighth Annual CTRC-AACR San Antonio Breast Cancer Symposium; 2015 Dec 8-12; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2016;76(4 Suppl):Abstract nr P4-03-03.
12	Risk factor detection for heart disease by applying text analytics in electronic medical records. J Biomed Inform 2015;58 Suppl:S164-S170. [PMID: 26279500 DOI: 10.1016/j.jbi.2015.08.011] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 02/14/2015] [Revised: 08/06/2015] [Accepted: 08/07/2015] [Indexed: 10/23/2022] Abstract In the United States, about 600,000 people die of heart disease every year. The annual cost of care services, medications, and lost productivity reportedly exceeds $108.9 billion. Effective disease risk assessment is critical to prevention, care, and treatment planning. Recent advances in text analytics have opened up new possibilities for using the rich information in electronic medical records (EMRs) to identify relevant risk factors. The 2014 i2b2/UTHealth Challenge brought together researchers and practitioners of clinical natural language processing (NLP) to tackle the identification of heart disease risk factors reported in EMRs. We participated in this track and developed an NLP system by leveraging existing tools and resources, both public and proprietary. Our system was a hybrid of several machine-learning and rule-based components. The system achieved an overall F1 score of 0.9185, with a recall of 0.9409 and a precision of 0.8972. Key Words: Medical records; Natural language processing; Risk assessment; Text classification.
13	P051 Photoacoustic imaging of breast cancer and histological markers of angiogenesis. Breast 2015. [DOI: 10.1016/s0960-9776(15)70101-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
14	Involvement of TRPV1 and AQP2 in hypertonic stress by xylitol in odontoblast cells. Connect Tissue Res 2015;56:44-9. [PMID: 25372661 DOI: 10.3109/03008207.2014.984804] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 02/03/2023] Abstract: AIM To examine the responses of mouse odontoblast-lineage cell line (OLC) cultures to xylitol-induced hypertonic stress. METHODOLOGY OLCs were treated with xylitol, sucrose, sorbitol, mannitol, arabinose, and lyxose. Cell viability was evaluated using the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium assay. The expression of transient receptor potential vanilloids (TRPV) 1, 3, and 4 was detected using a reverse transcriptase-polymerase chain reaction (RT-PCR) assay. The expression of aquaporin (AQP) 2 was detected using immunofluorescence and Western blotting analysis. The expression of interleukin-6 (IL-6) under xylitol-induced hypertonic stress was assessed using an enzyme-linked immunosorbent assay (ELISA). Small interfering ribonucleic acid (siRNA) for AQP-2 was used in an inhibition assay. RESULTS Xylitol-induced hypertonic stress did not decrease OLC viability, unlike the other sugars tested. OLCs expressed TRPV1, 3, and 4 as well as AQP2. Xylitol inhibited lipopolysaccharide (LPS)-induced IL-6 expression after 3 h of hypertonic stress. TRPV1 mRNA expression was upregulated by xylitol. Costimulation with HgCl2 (AQP inhibitor) and ruthenium red (TRPV1 inhibitor) decreased cell viability under xylitol stimulation. OLCs treated with siRNA against TRPV1 exhibited decreased cell viability under xylitol stimulation. CONCLUSION OLCs maintain high cell viability under xylitol-induced hypertonic stress, which may be associated with TRPV1 and AQP2 expression.
Key Words: AQP2; TRPV1; hypertonic stress; odontoblast cells; xylitol
15	RLIMS-P 2.0: A Generalizable Rule-Based Information Extraction System for Literature Mining of Protein Phosphorylation Information. IEEE/ACM Trans Comput Biol Bioinform 2015;12:17-29. [PMID: 26357075 PMCID: PMC4568560 DOI: 10.1109/tcbb.2014.2372765] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 05/05/2023] Abstract: We introduce RLIMS-P version 2.0, an enhanced rule-based information extraction (IE) system for mining kinase, substrate, and phosphorylation site information from scientific literature. Consisting of natural language processing and IE modules, the system has integrated several new features, including the capability of processing full-text articles and generalizability towards different post-translational modifications (PTMs). To evaluate the system, sets of abstracts and full-text articles containing a variety of textual expressions were annotated. On the abstract corpus, the system achieved F-scores of 0.91, 0.92, and 0.95 for kinases, substrates, and sites, respectively. The corresponding scores on the full-text corpus were 0.88, 0.91, and 0.92. It was additionally evaluated on the corpus of the 2013 BioNLP-ST GE task and achieved an F-score of 0.87 for the phosphorylation core task, improving upon the results previously reported on the corpus. Full-scale processing of all abstracts in MEDLINE and all articles in the PubMed Central Open Access Subset has demonstrated scalability for mining rich information in literature, enabling its adoption for biocuration and for knowledge discovery. The new system is generalizable and will be adapted to tackle other major PTM types. The RLIMS-P 2.0 online system is available at http://proteininformationresource.org/rlimsp/, and the developed corpora are available from iProLINK (http://proteininformationresource.org/iprolink/).
Key Words: biology and genetics; context analysis and indexing; natural language processing; text mining
MeSH Headings: Computational Biology/methods; Data Mining/methods; Databases, Protein; Natural Language Processing; Phosphoproteins/analysis; Phosphoproteins/chemistry; Phosphoproteins/classification; Phosphorylation; Software
Grants: G08 LM010720 NLM NIH HHS; R01 GM080646 NIGMS NIH HHS
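To make the idea of rule-based extraction of (kinase, substrate, site) triples concrete, here is a toy Python sketch using regular expressions. The patterns, the `Phosphorylation` record, and the `extract` helper are hypothetical illustrations; they are not RLIMS-P's actual rules, which per the abstract operate on the output of natural language processing modules rather than on raw strings:

```python
import re
from typing import List, NamedTuple, Optional


class Phosphorylation(NamedTuple):
    kinase: str
    substrate: str
    site: Optional[str]


# Toy, hypothetical patterns for illustration only.
ENTITY = r"\b[A-Z][A-Za-z0-9-]*"   # crude stand-in for a tagged protein name
SITE = r"(?:Ser|Thr|Tyr)-?\d+"     # e.g. "Ser133" or "Tyr-15"
PATTERNS = [
    # Active voice: "<kinase> phosphorylates <substrate> [at <site>]"
    re.compile(rf"(?P<kinase>{ENTITY}) phosphorylates (?P<substrate>{ENTITY})"
               rf"(?: at (?P<site>{SITE}))?"),
    # Passive voice: "<substrate> is/was phosphorylated [at <site>] by <kinase>"
    re.compile(rf"(?P<substrate>{ENTITY}) (?:is|was) phosphorylated"
               rf"(?: at (?P<site>{SITE}))? by (?P<kinase>{ENTITY})"),
]


def extract(sentence: str) -> List[Phosphorylation]:
    """Return every (kinase, substrate, site) triple matched by any pattern."""
    found = []
    for pattern in PATTERNS:
        for m in pattern.finditer(sentence):
            # An unmatched optional site group yields None.
            found.append(Phosphorylation(m["kinase"], m["substrate"], m["site"]))
    return found


print(extract("PKA phosphorylates CREB at Ser133."))
print(extract("CREB was phosphorylated at Ser-133 by PKA."))
```

Both sentences yield the same triple, which illustrates why a pattern-based system needs one rule per syntactic variant (or, as in the framework of entry 17, a way to generate such variants systematically).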
16	Signaling pathway of TNF-α-induced AQP3 expression in human gingiva: Implications in the pathogenesis of periodontitis. Inflamm Res 2014. [DOI: 10.1007/bf03354056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
17	A generalizable NLP framework for fast development of pattern-based biomedical relation extraction systems. BMC Bioinformatics 2014;15:285. [PMID: 25149151 PMCID: PMC4262219 DOI: 10.1186/1471-2105-15-285] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/09/2013] [Accepted: 08/15/2014] [Indexed: 11/18/2022] Abstract: BACKGROUND Text mining is increasingly used in the biomedical domain because of its ability to automatically gather information from large numbers of scientific articles. One important task in biomedical text mining is relation extraction, which aims to identify designated relations among biological entities reported in the literature. A high-performance relation extraction system is expensive to develop because of the substantial time and effort required for its design and implementation. Here, we report a novel framework to facilitate the development of pattern-based biomedical relation extraction systems. It has several unique design features: (1) leveraging the syntactic variations possible in a language to generate extraction patterns automatically and systematically, (2) applying sentence simplification to improve the coverage of extraction patterns, and (3) identifying referential relations between a syntactic argument of a predicate and the actual target expected in the relation extraction task. RESULTS A relation extraction system derived using the proposed framework achieved overall F-scores of 72.66% for the Simple events and 55.57% for the Binding events on the BioNLP-ST 2011 GE test set, comparing favorably with the top-performing systems that participated in the BioNLP-ST 2011 GE task. We obtained similar results on the BioNLP-ST 2013 GE test set (80.07% and 60.58%, respectively).
We conducted additional experiments on the training and development sets to provide a more detailed analysis of the system and its individual modules. This analysis indicates that, without increasing the number of patterns, simplification and referential relation linking play a key role in the effective extraction of biomedical relations. CONCLUSIONS In this paper, we present a novel framework for fast development of relation extraction systems. The framework requires only a list of triggers as input and does not need information from an annotated corpus. Thus, we reduce the involvement of domain experts, who would otherwise have to provide manual annotations and help with the design of handcrafted patterns. We demonstrate how our framework is used to develop a system that achieves state-of-the-art performance on a public benchmark corpus.
MeSH Headings: Biomedical Research/methods; Data Mining/methods; Language; Pattern Recognition, Automated/methods; Publications; Time Factors
Grants: G08 LM010720 NLM NIH HHS
18	RLIMS-P: an online text-mining tool for literature-based extraction of protein phosphorylation information. Database (Oxford) 2014;2014:bau081. [PMID: 25122463 PMCID: PMC4131691 DOI: 10.1093/database/bau081] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 01/05/2023] Abstract: Protein phosphorylation is central to the regulation of most aspects of cell function. Given its importance, it has been the subject of active research as well as the focus of curation in several biological databases. We have developed the Rule-based Literature Mining System for protein Phosphorylation (RLIMS-P), an online text-mining tool to help curators identify biomedical research articles relevant to protein phosphorylation. The tool presents information on protein kinases, substrates, and phosphorylation sites automatically extracted from the biomedical literature. The utility of the RLIMS-P Web site was evaluated by curators from Phospho.ELM, PhosphoGRID/BioGRID, and Protein Ontology as part of the BioCreative IV user interactive task (IAT). The system achieved F-scores of 0.76, 0.88, and 0.92 for the extraction of kinases, substrates, and phosphorylation sites, respectively, and a precision of 0.88 in the retrieval of relevant phosphorylation literature. The system also received highly favorable feedback from the curators in a user survey. Based on the curators’ suggestions, the Web site has been enhanced to improve its usability. In the RLIMS-P Web site, phosphorylation information can be retrieved by PubMed IDs or keywords, with an option for selecting targeted species. The result page displays a sortable table with phosphorylation information. The text evidence page displays the abstract with color-coded entity mentions and includes links to UniProtKB entries via normalization, i.e., the linking of entity mentions to database identifiers, facilitated by the GenNorm tool and by the links to the bibliography in UniProt. Login and editing capabilities are offered to any user interested in contributing to the validation of RLIMS-P results. Retrieved phosphorylation information can also be downloaded in CSV format, and the text evidence in the BioC format. RLIMS-P is freely available. Database URL: http://www.proteininformationresource.org/rlimsp/
19	iSimp in BioC standard format: enhancing the interoperability of a sentence simplification system. Database (Oxford) 2014;2014:bau038. [PMID: 24850848 PMCID: PMC4028706 DOI: 10.1093/database/bau038] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/30/2014] [Revised: 03/07/2014] [Accepted: 04/14/2014] [Indexed: 11/16/2022] Abstract: This article reports the use of the BioC standard format in our sentence simplification system, iSimp, and demonstrates its general utility. iSimp is designed to simplify complex sentences commonly found in biomedical text and has been shown to improve existing text mining applications that rely on the analysis of sentence structures. By adopting the BioC format, we aim to make iSimp readily interoperable with other applications in the biomedical domain. To examine the utility of iSimp in BioC, we implemented a rule-based relation extraction system that uses iSimp as a preprocessing module and BioC for data exchange. Evaluation on the training corpus of the BioNLP-ST 2011 GENIA Event Extraction (GE) task showed that iSimp sentence simplification improved recall by 3.2% without reducing precision. The iSimp simplification-annotated corpora, both our previously used corpus and the GE corpus in the current study, have been converted into the BioC format and made publicly available at the project's Web site: http://research.bioinformatics.udel.edu/isimp/. Database URL: http://research.bioinformatics.udel.edu/isimp/
MeSH Headings: Algorithms; Data Mining/methods; Internet; Natural Language Processing; Semantics
Grants: G08LM010720 NLM NIH HHS
20	Detecting concept mentions in biomedical text using hidden Markov model: multiple concept types at once or one at a time? J Biomed Semantics 2014;5:3. [PMID: 24438362 PMCID: PMC3908466 DOI: 10.1186/2041-1480-5-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 09/05/2012] [Accepted: 11/26/2013] [Indexed: 11/24/2022] Abstract: Background Identifying phrases that refer to particular concept types is a critical step in extracting information from documents. Provided with annotated documents as training data, supervised machine learning can automate this process. When building a machine learning model for this task, the model may be built to detect all types simultaneously (all-types-at-once) or it may be built for one or a few selected types at a time (one-type- or a-few-types-at-a-time). It is of interest to investigate which strategy yields better detection performance. Results Hidden Markov models using the different strategies were evaluated on a clinical corpus annotated with three concept types (i2b2/VA corpus) and a biology literature corpus annotated with five concept types (JNLPBA corpus). Ten-fold cross-validation tests were conducted and the experimental results showed that models trained for multiple concept types consistently yielded better performance than those trained for a single concept type. F-scores observed for the former strategies were higher than those observed for the latter by 0.9 to 2.6% on the i2b2/VA corpus and 1.4 to 10.1% on the JNLPBA corpus, depending on the target concept types. Improved boundary detection and reduced type confusion were observed for the all-types-at-once strategy. Conclusions The current results suggest that detection of concept phrases could be improved by simultaneously tackling multiple concept types.
This also suggests that multiple concept types should be annotated when developing a new corpus for machine learning models. Further investigation is expected to provide insight into the mechanism underlying the good performance achieved when multiple concept types are considered.
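The difference between the all-types-at-once and one-type-at-a-time strategies can be seen in the label sets each model is trained on. The sketch below assumes BIO-style token tags (the abstract does not state the encoding) and uses the i2b2/VA type names problem, test, and treatment; `restrict_to_types` is a hypothetical helper that derives single-type training labels from an all-types annotation by mapping every other type to "O":

```python
from typing import List, Set


def restrict_to_types(tags: List[str], keep: Set[str]) -> List[str]:
    """Relabel BIO tags so that only the concept types in `keep` remain.

    Mentions of any other type are mapped to "O", as when training a
    one-type- or a-few-types-at-a-time model from a corpus annotated
    with all types.
    """
    restricted = []
    for tag in tags:
        if tag == "O":
            restricted.append(tag)
            continue
        _, concept = tag.split("-", 1)  # "B-problem" -> ("B", "problem")
        restricted.append(tag if concept in keep else "O")
    return restricted


# Tags for "denies chest pain ; CT scan ordered" (illustrative, not from the corpus).
tags = ["O", "B-problem", "I-problem", "O", "B-test", "I-test", "O"]
print(restrict_to_types(tags, {"problem"}))
```

In the one-type view, the "CT scan" tokens become indistinguishable from ordinary background text, which is one intuitive reading of the reduced type confusion the authors observed for the all-types-at-once strategy.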
21	Abstract P2-03-09: Tissue hemoglobin oxygen saturation measured by photoacoustic mammography correlates with microvasculature properties assessed by histological image analysis, a preliminary study. Cancer Res 2013. [DOI: 10.1158/0008-5472.sabcs13-p2-03-09] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Abstract: Background: Photoacoustic mammography (PAM) is a new optical imaging technology with the potential to image tumor vasculature in breast cancer. The technique is applicable to the measurement of hemoglobin oxygen saturation (SO2). We have previously published an initial clinical result in breast cancer using a prototype machine (Canon Inc., Tokyo, Japan). We also presented the morphological characteristics of tumor vessels analyzed by automated image processing at the AACR Annual Meeting 2013. Here, we report the oxygenation data obtained by PAM in relation to histological assessment of tumor vasculature and hypoxia. Methods: Forty-four breast lesions were evaluated by PAM in this IRB-approved prospective study at Kyoto University Hospital, Japan. PAM evaluation was performed on both breasts whenever possible. SO2 was calculated in a region of interest after the radiologist confirmed that the signal was associated with the tumor location in MRI images. For the normal breast, signals obtained at the same depth as the tumor were selected. Eighty-one histological sections from the 20 invasive carcinoma tissues available at the time of this analysis were selected for immunohistochemical assessment of hypoxia by anti-carbonic anhydrase IX (CA IX) and for tumor vasculature image analysis using anti-CD31. Five 0.5-mm² areas of each cancer and 3 areas of normal mammary tissue associated with the same lesion were randomly selected from different sections.
Total vascular area in each square was calculated using Image-Pro Plus 7.0 software (Media Cybernetics, USA). The tumor-to-normal vascular area ratio (T/N VA) was calculated for each lesion as an index of tumor blood supply. Results: Patients' ages ranged from 36 to 83 years. Tumor-associated signals were detected by PAM in 18 of the 20 lesions for which tissues were available for histological examination. Tumor SO2 was 70.6% ± 13.2, compared with 83.3% ± 10.7 in the normal counterpart. T/N VA ranged from 0.11 to 1.14 and was about 3 times lower in lesions with CA IX-positive cytoplasmic membrane staining (0.21 vs 0.7, p = 0.021, Mann-Whitney test). Normalized tumor SO2 (tumor SO2/normal counterpart SO2) was significantly lower in the group with lower T/N VA (0.9 vs. 0.8, p = 0.045, Student's t-test). To better evaluate the accuracy of PAM in calculating SO2, 3780 tumor-associated and 2835 normal microvessels were analyzed with image analysis software. Tumor-associated vessels had significantly smaller areas (p < 0.001), and vessels with irregular lumens were more frequent in tumor (76.5% vs 19.6%, p < 0.001), consistent with lower SO2 in tumor areas. Conclusion: Although further results from our ongoing clinical studies of PAM measurement in breast cancer patients are needed, the strong correlation between histological evaluation of hypoxia and angiogenesis and PAM measurement of oxygenation suggests promising prospects for the clinical application of this new technology in breast cancer. Citation Information: Cancer Res 2013;73(24 Suppl): Abstract nr P2-03-09.
22	A framework for biomedical figure segmentation towards image-based document retrieval. BMC Syst Biol 2013;7 Suppl 4:S8. [PMID: 24565394 PMCID: PMC3856606 DOI: 10.1186/1752-0509-7-s4-s8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022] Abstract: The figures included in many biomedical publications play an important role in understanding the biological experiments and facts described within. Recent studies have shown that information extracted from figures can be integrated into classical document classification and retrieval tasks to improve their accuracy. One important observation about the figures included in biomedical publications is that they are often composed of multiple subfigures or panels, each describing different methodologies or results. The use of these multimodal figures is a common practice in bioscience, as experimental results are graphically validated via multiple methodologies or procedures. Thus, for a better use of multimodal figures in document classification or retrieval tasks, as well as for providing the evidence source for derived assertions, it is important to automatically segment multimodal figures into subfigures and panels. This is a challenging task, however, as different panels can contain similar objects (e.g., bar charts and line charts) with multiple layouts. Also, certain types of biomedical figures are text-heavy (e.g., images of DNA and protein sequences) and differ from traditional images. As a result, classical image segmentation techniques based on low-level image features, such as edges or color, are not directly applicable to robustly partitioning multimodal figures into single-modal panels.
In this paper, we describe a robust solution for automatically identifying and segmenting unimodal panels from a multimodal figure. Our framework starts by robustly harvesting figure-caption pairs from biomedical articles. We base our approach on the observation that the document layout can be used to identify encoded figures and figure boundaries within PDF files. Taking the document layout into consideration allows us to correctly extract figures from the PDF document and associate them with their corresponding captions. We combine pixel-level representations of the extracted images with information gathered from their corresponding captions to estimate the number of panels in the figure. Thus, our approach simultaneously identifies the number of panels and the layout of figures. To evaluate the approach described here, we applied our system to documents containing protein-protein interactions (PPIs) and compared the results against a gold standard annotated by biologists. Experimental results showed that our automatic figure segmentation approach surpasses pure caption-based and image-based approaches, achieving 96.64% accuracy. To allow efficient retrieval of information, and to provide a basis for integration into document classification and retrieval systems among others, we further developed a web-based interface that lets users easily retrieve panels containing the terms specified in their queries.
MeSH Headings: Biomedical Research; Computational Biology/methods; Computer Graphics; Image Processing, Computer-Assisted; Information Storage and Retrieval/methods
23	BioC: a minimalist approach to interoperability for biomedical text processing. Database (Oxford) 2013;2013:bat064. [PMID: 24048470 PMCID: PMC3889917 DOI: 10.1093/database/bat064] [Citation(s) in RCA: 100] [Impact Index Per Article: 9.1] [Indexed: 12/31/2022] Abstract: A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create human-labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store, and exchange data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible markup language (XML) format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented, including sentences, tokens, parts of speech, named entities such as genes or diseases, and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to XML files, and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net/
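As a rough illustration of the kind of document the BioC format describes, the sketch below builds a one-document collection with a single annotation using only Python's standard `xml.etree.ElementTree`. The element names (collection, source, date, key, document, passage, offset, annotation, infon, location) follow the BioC DTD as we understand it; all values, and the helper `minimal_bioc`, are hypothetical, and the BioC project's own code and DTD remain the authority:

```python
import xml.etree.ElementTree as ET


def minimal_bioc(doc_id: str, text: str, mention: str, concept: str) -> str:
    """Build a one-document, one-passage, one-annotation BioC-style collection."""
    collection = ET.Element("collection")
    ET.SubElement(collection, "source").text = "example"   # hypothetical values
    ET.SubElement(collection, "date").text = "20130101"
    ET.SubElement(collection, "key").text = "example.key"

    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id

    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "offset").text = "0"  # passage offset within the document
    ET.SubElement(passage, "text").text = text

    start = text.index(mention)  # raises ValueError if the mention is absent
    annotation = ET.SubElement(passage, "annotation", id="A1")
    ET.SubElement(annotation, "infon", key="type").text = concept
    # Character offset and length of the mention, as attributes of <location>.
    ET.SubElement(annotation, "location", offset=str(start), length=str(len(mention)))
    ET.SubElement(annotation, "text").text = mention
    return ET.tostring(collection, encoding="unicode")


print(minimal_bioc("doc1", "BRCA1 is mutated in breast cancer.", "breast cancer", "disease"))
```

Keeping annotations as stand-off offsets into the passage text, rather than inline markup, is what lets several tools add independent annotation layers to the same document.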
24	Evaluation of Oxygenation in Breast Cancers by Photoacoustic Mammography: Clinical and Histological Comparison. Ann Oncol 2013. [DOI: 10.1093/annonc/mdt144.6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
25	Continuously Rethinking the Definition of Influenza for Surveillance Systems. Med Decis Making 2013;33:860-8. [DOI: 10.1177/0272989x13478482] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022] Abstract: Objective. In the Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE), influenza was originally defined by a list of 29 diagnosis codes and later by a list of 12. This article describes a dependent Bayesian procedure designed to improve the ESSENCE system and exploit multiple sources of information without being biased by redundancy. Methods. We obtained 13,096 cases within the Armed Forces Health Longitudinal Technological Application electronic medical records that included an influenza laboratory test. A Dependent Bayesian Expert System (D-BESt) was used to predict influenza from diagnoses, symptoms, reason for visit, temperature, month of visit, category of enrollment, and demographics. For each case, D-BESt sequentially selects the most discriminating piece of information, calculates its likelihood ratio conditioned on previously selected information, and updates the case’s probability of influenza. Results. When the analysis was limited to definitions based on diagnoses and was applied to a sample of patients for whom laboratory tests had been ordered, the areas under the receiver operating characteristic curve (AUCs) for the previous (29-diagnosis) and current (12-diagnosis) ESSENCE lists and the D-BESt algorithm were, respectively, 0.47, 0.36, and 0.77. Including other sources of information further improved the AUC for D-BESt to 0.79. At the best cutoff point for D-BESt, where the receiver operating characteristic curve for D-BESt is farthest from the diagonal line, the D-BESt algorithm correctly classified 84% of cases (specificity = 88%, sensitivity = 62%).
In comparison, the current ESSENCE approach of using a list of 12 diagnoses correctly classified only 31% of this sample of cases (specificity = 29%, sensitivity = 42%). Conclusions. False alarms in ESSENCE surveillance systems can be reduced if a probabilistic dynamic learning system is used.
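The update step described for D-BESt (pick the most discriminating remaining finding, take its likelihood ratio conditioned on what has already been selected, and update the probability of influenza) can be sketched in odds form. This is a hedged illustration under stated assumptions, not the published algorithm: the `conditional_lr` callback is a hypothetical interface, "most discriminating" is assumed to mean the conditional likelihood ratio furthest from 1 on the log scale, and the likelihood ratios in `toy_lr` are made-up numbers rather than estimates from the study data:

```python
import math
from typing import Callable, List, Sequence, Tuple

# conditional_lr(finding, already_selected) -> likelihood ratio of `finding`
# given the findings selected so far (a hypothetical interface).
ConditionalLR = Callable[[str, Tuple[str, ...]], float]


def sequential_posterior(prior: float, findings: Sequence[str],
                         conditional_lr: ConditionalLR) -> float:
    """Greedy sequential Bayesian update in the spirit of D-BESt (a sketch)."""
    odds = prior / (1 - prior)
    selected: List[str] = []
    remaining = list(findings)
    while remaining:
        context = tuple(selected)
        # Most discriminating = conditional LR furthest from 1 on the log scale.
        best = max(remaining, key=lambda f: abs(math.log(conditional_lr(f, context))))
        odds *= conditional_lr(best, context)  # posterior odds = prior odds x LR
        selected.append(best)
        remaining.remove(best)
    return odds / (1 + odds)


def toy_lr(finding: str, context: Tuple[str, ...]) -> float:
    # Illustrative numbers only: cough is less informative once fever is known.
    if finding == "cough" and "fever" in context:
        return 1.5
    return {"fever": 4.0, "cough": 2.0}[finding]


print(sequential_posterior(0.2, ["cough", "fever"], toy_lr))
```

Here fever is selected first (LR 4), cough is then scored conditionally (LR 1.5 instead of 2), and the prior odds of 0.25 become 1.5, a posterior probability of 0.6. Conditioning each ratio on earlier selections is what keeps redundant findings from being counted twice.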
26	AOSP46 EVALUATION OF BREAST TISSUE MICROVASCULATURE WITH DIGITAL PATHOLOGY AND IMAGE ANALYSIS BRINGS NEW INSIGHTS INTO TUMOUR ANGIOGENESIS ASSESSMENT. Eur J Cancer 2013. [DOI: 10.1016/s0959-8049(13)70056-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/27/2022]
27	Pooling annotated corpora for clinical concept extraction. J Biomed Semantics 2013;4:3. [PMID: 23294871 PMCID: PMC3599895 DOI: 10.1186/2041-1480-4-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 06/26/2012] [Accepted: 01/02/2013] [Indexed: 11/12/2022] Abstract: Background The availability of annotated corpora has facilitated the application of machine learning algorithms to concept extraction from clinical notes. However, creating the annotations requires considerable expense and labor. A potential alternative is to reuse existing corpora from other institutions by pooling them with local corpora for training machine taggers. In this paper, we investigated this approach by pooling corpora from the 2010 i2b2/VA NLP challenge and Mayo Clinic Rochester to evaluate taggers for recognition of medical problems. The corpora were annotated for medical problems, but with different guidelines. The taggers were constructed using an existing tagging system, MedTagger, which consists of dictionary lookup, part-of-speech (POS) tagging, and machine learning for named entity prediction and concept extraction. We hope that our current work will be a useful case study for facilitating reuse of annotated corpora across institutions. Results We found that pooling was effective when the local corpus was small and after some of the guideline differences were reconciled. The benefits of pooling, however, diminished as more locally annotated documents were included in the training data. We examined the annotation guidelines to identify factors that determine the effect of pooling. Conclusions The effectiveness of pooling corpora depends on several factors, including compatibility of annotation guidelines, distribution of report types, and size of the local and foreign corpora.
Simple methods to rectify some of the guideline differences can facilitate pooling. Our findings need to be confirmed with further studies on different corpora. To facilitate the pooling and reuse of annotated corpora, we suggest that: i) the NLP community should develop a standard annotation guideline that addresses the potential areas of guideline differences partly identified in this paper; ii) corpora should be annotated with a two-pass method that focuses first on concept recognition, followed by normalization to existing ontologies; and iii) metadata such as report type should be created during the annotation process.
28	Synthesis and Biological Evaluation of a 6-Aminofuro[3,2-c]pyridin-3(2H)-one Series of GPR 119 Agonists. ACTA ACUST UNITED AC 2012;62:537-44. [DOI: 10.1055/s-0032-1323760] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
29	Coreference analysis in clinical notes: a multi-pass sieve with alternate anaphora resolution modules. J Am Med Inform Assoc 2012;19:867-74. [PMID: 22707745 DOI: 10.1136/amiajnl-2011-000766] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 11/04/2022] Abstract: OBJECTIVE This paper describes the coreference resolution system submitted by Mayo Clinic for the 2011 i2b2/VA/Cincinnati shared task Track 1C. The goal of the task was to construct a system that links the markables corresponding to the same entity. MATERIALS AND METHODS The task organizers provided progress notes and discharge summaries that were annotated with the markables of treatment, problem, test, person, and pronoun. We used a multi-pass sieve algorithm that applies deterministic rules in order of precision and simultaneously gathers information about the entities in the documents. Our system, MedCoref, also uses a state-of-the-art machine learning framework as an alternative to the final, rule-based pronoun resolution sieve. RESULTS The best system that uses a multi-pass sieve has an overall score of 0.836 (average of B³, MUC, BLANC, and CEAF F scores) for the training set and 0.843 for the test set. DISCUSSION A supervised machine learning system that typically uses a single function to find coreferents cannot accommodate irregularities encountered in data, especially given the insufficient number of examples. On the other hand, a completely deterministic system could lead to a decrease in recall (sensitivity) when the rules are not exhaustive. The sieve-based framework allows one to combine reliable machine learning components with rules designed by experts. CONCLUSION Using relatively simple rules, part-of-speech information, and semantic type properties, an effective coreference resolution system can be designed.
The source code of the system described is available at https://sourceforge.net/projects/ohnlp/files/MedCoref. Collapse Key Words Collapse MESH Headings Collapse Grants Collapse
30	Feasibility of pooling annotated corpora for clinical concept extraction. AMIA Jt Summits Transl Sci Proc 2012;2012:38. [PMID: 22779047 PMCID: PMC3392069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Abstract The availability of annotated corpora has facilitated the application of machine learning algorithms to concept extraction from clinical notes. However, preparing annotated corpora at individual institutions is expensive, and pooling annotated corpora from other institutions is a potential solution. In this paper we investigate whether pooling corpora from two different sources can improve the performance and portability of the resulting machine learning taggers for medical problem detection. Specifically, we pool corpora from the 2010 i2b2/VA NLP challenge and from Mayo Clinic Rochester to evaluate taggers for the recognition of medical problems. Contrary to our expectations, pooling the corpora was found to decrease the F1-score. We examine the annotation guidelines to identify the factors behind the incompatibility of the corpora and suggest that the clinical NLP community develop a standard annotation guideline to make annotated corpora compatible.
31	Bayesian Processing of Context-Dependent Text. Med Decis Making 2012;32:E1-9. [DOI: 10.1177/0272989x12439753] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022] Abstract Objective. This article aims to examine whether words listed in reasons for appointments could effectively predict laboratory-verified influenza cases in syndromic surveillance systems. Methods. Data were collected from the Armed Forces Health Longitudinal Technological Application medical record system. We used 2 algorithms to combine the impact of words within reasons for appointments: the Dependent (DBSt) and Independent (IBSt) Bayesian Systems. We used receiver operating characteristic curves to compare the accuracy of these 2 methods of processing reasons for appointments against current and previous lists of diagnoses used in the Department of Defense’s syndromic surveillance system. Results. We examined 13,096 cases for which influenza test results were available. Each reason for an appointment had an average of 3.5 words (standard deviation = 2.2 words). There was no difference in performance between the 2 algorithms: the area under the curve was 0.58 for IBSt and 0.56 for DBSt, and the difference was not statistically significant (McNemar statistic = 0.0054; P = 0.07). Conclusions. These data suggest that reasons for appointments can improve the accuracy of lists of diagnoses in predicting laboratory-verified influenza cases. This study recommends further exploration of the DBSt algorithm and reasons for appointments in predicting likely influenza cases.
32	A hybrid approach to sentiment sentence classification in suicide notes. Biomed Inform Insights 2012;5:43-50. [PMID: 22879759 PMCID: PMC3409488 DOI: 10.4137/bii.s8961] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/11/2022] Abstract This paper describes the sentiment classification system developed by the Mayo Clinic team for the 2011 i2b2/VA/Cincinnati Natural Language Processing (NLP) Challenge. The sentiment classification task is to assign any pertinent emotions to each sentence in suicide notes. We implemented three systems trained on suicide notes provided by the i2b2 challenge organizers: a machine learning system, a rule-based system, and a system combining both. Our machine learning system was trained on re-annotated data in which apparently inconsistent emotion assignments were adjusted. The machine learning methods (RIPPER and multinomial Naïve Bayes classifiers), the manual pattern-matching rules, and the combination of the two systems were then tested for determining the emotions within sentences. The combination of the machine learning and rule-based systems performed best, producing a micro-averaged F-score of 0.5640. Key Words: machine learning; natural language processing; sentiment classification; suicidal emotion.
33	A review of automated text classification in event-based biosurveillance. Emerg Health Threats J 2011. [DOI: 10.3402/ehtj.v4i0.11057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
34	The gene normalization task in BioCreative III. BMC Bioinformatics 2011;12 Suppl 8:S2. [PMID: 22151901 PMCID: PMC3269937 DOI: 10.1186/1471-2105-12-s8-s2] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.1] [Indexed: 11/10/2022] Abstract BACKGROUND We report the Gene Normalization (GN) challenge in BioCreative III, where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable of differentiating team performance. Moreover, the same algorithm was subsequently used for inferring ground truth based solely on team submissions. We report team performance on both the gold standard and the inferred ground truth using a newly proposed metric called Threshold Average Precision (TAP-k). RESULTS We received a total of 37 runs from 14 different teams for the task. When evaluated using the gold-standard annotations of the 50 manually annotated test articles, the highest TAP-k scores were 0.3297 (k=5), 0.3538 (k=10), and 0.3535 (k=20). Higher TAP-k scores of 0.4916 (k=5, 10, 20) were observed when evaluated using the inferred ground truth over the full test set. When combining team results using machine learning, the best composite system achieved TAP-k scores of 0.3707 (k=5), 0.4311 (k=10), and 0.4477 (k=20) on the gold standard, representing improvements of 12.4%, 21.8%, and 26.6% over the best team results, respectively. CONCLUSIONS By using full text and being species non-specific, the GN task in BioCreative III has moved closer to a real literature curation task than similar tasks in the past and presents additional challenges for the text mining community, as revealed in the overall team results. By evaluating teams using the gold standard, we show that the EM algorithm allows team submissions to be differentiated while keeping the manual annotation effort feasible. Using the inferred ground truth, we show measures of comparative performance between teams. Finally, by comparing team rankings on the gold standard vs. the inferred ground truth, we further demonstrate that the inferred ground truth is as effective as the gold standard for detecting good team performance. MeSH Headings: Algorithms; Animals; Data Mining/methods; Data Mining/standards; Genes; Humans; National Library of Medicine (U.S.); Periodicals as Topic; United States. Grants: 5R01 GM083649-03 NIGMS NIH HHS; 5R01LM009836 NLM NIH HHS; 1-R01-LM009959-01A1 NLM NIH HHS; 3T15 LM009451-03S1 NLM NIH HHS; 5R01 LM008111-05 NLM NIH HHS; 5R01 LM010120-02 NLM NIH HHS; 5R01LM010125 NLM NIH HHS; R01 LM009959 NLM NIH HHS; Intramural NIH HHS.
35	Using machine learning for concept extraction on clinical documents from multiple data sources. J Am Med Inform Assoc 2011;18:580-7. [PMID: 21709161 DOI: 10.1136/amiajnl-2011-000155] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.8] [Indexed: 11/04/2022] Abstract OBJECTIVE Concept extraction is the process of identifying phrases referring to concepts of interest in unstructured text. It is a critical component in automated text processing. We investigate the performance of machine learning taggers for clinical concept extraction, particularly the portability of taggers across documents from multiple data sources. METHODS We used BioTagger-GM, which we originally developed for the detection of gene/protein names in the biology domain, to train machine learning taggers. Trained taggers were evaluated using the annotated clinical documents made available in the 2010 i2b2/VA Challenge workshop, consisting of documents from four data sources. RESULTS As expected, the performance of a tagger trained on one data source degraded when evaluated on another source, but the degree of degradation varied depending on the data sources. A tagger trained on multiple data sources was robust, and it achieved an F score as high as 0.890 on one data source. The results also suggest that the performance of machine learning taggers is likely to improve if more annotated documents are available for training. CONCLUSION Our study shows how the performance of machine learning taggers is degraded when they are ported across clinical documents from different sources. The portability of taggers can be enhanced by training on datasets from multiple sources. The study also shows that BioTagger-GM can be easily extended to detect clinical concept mentions with good performance.
36	dbOGAP - an integrated bioinformatics resource for protein O-GlcNAcylation. BMC Bioinformatics 2011;12:91. [PMID: 21466708 PMCID: PMC3083348 DOI: 10.1186/1471-2105-12-91] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.5] [Received: 11/04/2010] [Accepted: 04/06/2011] [Indexed: 12/31/2022] Abstract Background Protein O-GlcNAcylation (or O-GlcNAc-ylation) is an O-linked glycosylation involving the transfer of β-N-acetylglucosamine to the hydroxyl group of serine or threonine residues of proteins. Growing evidence suggests that protein O-GlcNAcylation is common and is analogous to phosphorylation in modulating a broad range of biological processes. However, compared to phosphorylation, the amount of protein O-GlcNAcylation data is relatively limited and its annotation in databases is scarce. Furthermore, a bioinformatics resource for O-GlcNAcylation is lacking, and an O-GlcNAcylation site prediction tool is much needed. Description We developed a database of O-GlcNAcylated proteins and sites, dbOGAP, primarily based on literature published since O-GlcNAcylation was first described in 1984. The database currently contains ~800 proteins with experimental O-GlcNAcylation information, of which ~61% are human, and 172 proteins have a total of ~400 identified O-GlcNAcylation sites. The O-GlcNAcylated proteins are primarily nucleocytoplasmic, including proteins associated with membrane-bounded and non-membrane-bounded organelles. The known O-GlcNAcylated proteins exert a broad range of functions, including transcriptional regulation, macromolecular complex assembly, intracellular transport, translation, and regulation of cell growth or death. The database also contains ~365 potential O-GlcNAcylated proteins inferred from known O-GlcNAcylated orthologs. Additional annotations, including other protein posttranslational modifications, biological pathways, and disease information, are integrated into the database. We developed an O-GlcNAcylation site prediction system, OGlcNAcScan, based on a Support Vector Machine and trained using protein sequences with known O-GlcNAcylation sites from dbOGAP. The site prediction system achieved an area under the ROC curve of 74.3% in five-fold cross-validation. The dbOGAP website was developed to allow users to search and query O-GlcNAcylated proteins and associated literature, to browse by gene names, organisms, or pathways, and to download the database. Also available from the website, the OGlcNAcScan tool presents a list of predicted O-GlcNAcylation sites for given protein sequences. Conclusions dbOGAP is the first public bioinformatics resource to allow systematic access to O-GlcNAcylated proteins and related functional information and bibliography, as well as to an O-GlcNAcylation site prediction tool. The resource will facilitate research on O-GlcNAcylation and its proteomic identification.
37	An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics. Int J Med Inform 2011;80:56-66. [PMID: 21134784 PMCID: PMC3904285 DOI: 10.1016/j.ijmedinf.2010.10.015] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 05/26/2010] [Revised: 09/20/2010] [Accepted: 10/19/2010] [Indexed: 11/15/2022] Abstract PURPOSE Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles as well as labeled relevant articles. METHODS Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and the number of word features. The trained classifiers were applied to 15,000 articles published over 15 days. Top-ranked articles from each classifier were pooled, and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers. RESULTS Daily averages of areas under ROC curves (AUCs) over the 15-day evaluation period were 0.841 and 0.836 for the naïve Bayes and SVM classifiers, respectively. We referenced a database of disease outbreak reports to confirm that the evaluation data set resulting from the pooling method indeed covered the incidents recorded in the database during the evaluation period. CONCLUSIONS The proposed text classification framework utilizing randomly sampled unlabeled articles can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and articles in non-English languages. Key Words: natural language processing; information storage and retrieval; medical informatics applications; disease notification; disease outbreaks; biosurveillance; internet. MeSH Headings: Communicable Diseases/classification; Communicable Diseases/diagnosis; Communicable Diseases/epidemiology; Disease Outbreaks/statistics & numerical data; Humans; Internet; Mass Media/classification; Population Surveillance/methods; Public Health Informatics. Grants: R01 LM009959 NLM NIH HHS.
38	Document classification for mining host pathogen protein-protein interactions. Artif Intell Med 2010;49:155-60. [PMID: 20472411 DOI: 10.1016/j.artmed.2010.04.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 03/15/2009] [Revised: 03/19/2010] [Accepted: 03/23/2010] [Indexed: 10/19/2022] Abstract OBJECTIVE Scientific findings regarding human pathogens and their host responses are buried in the growing volume of biomedical literature, and there is an urgent need to mine information pertaining to pathogenesis-related proteins, especially host pathogen protein-protein interactions (HP-PPIs), from the literature. METHODS In this paper, we report our exploration of developing an automated system to identify MEDLINE abstracts referring to HP-PPIs. An annotated corpus consisting of 1360 MEDLINE abstracts was generated. With this corpus, we developed and evaluated document classification systems using support vector machines (SVMs). We also investigated the effects of three feature selection methods: information gain (IG), the χ² test, and specific mutual information (SI). Performance was measured using normalized discounted cumulative gain (NDCG) and positive predictive value (PPV), and all measures were obtained through 10-fold cross-validation. RESULTS NDCG measures for classification systems using all features or a subset of features selected using IG and the χ² test ranged from 0.83 to 0.89, while classification systems built on features selected using SI had relatively lower NDCG measures. The classification system achieved a PPV of 50.7% for the top 10% of ranked documents, compared with a baseline PPV of 10.0%. CONCLUSIONS Our results indicate that document classification systems can be constructed to efficiently retrieve HP-PPI-related documents. Feature selection was effective in reducing the dimensionality of features to build a compact system.
39	The relationship between CA/C ratio and individual differences in dynamic accommodative responses while viewing stereoscopic images. J Vis 2010. [DOI: 10.1167/7.15.65] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
40	The relationship between CA/C ratio and individual differences in dynamic accommodative responses while viewing stereoscopic images. J Vis 2009;9:21.1-13. [DOI: 10.1167/9.13.21] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 02/21/2009] [Accepted: 10/17/2009] [Indexed: 11/24/2022]
41	Support vector machine-based mucin-type O-linked glycosylation site prediction using enhanced sequence feature encoding. AMIA Annu Symp Proc 2009;2009:640-644. [PMID: 20351933 PMCID: PMC2815398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023] Abstract Glycosylation is a common and complex protein post-translational modification (PTM). In particular, mucin-type O-linked glycosylation is abundant and plays important biological roles. The number of determined glycosylation sites is still small, and there remains a need for accurate computational prediction for the annotation and functional understanding of proteins. PTM site prediction can be formulated as a machine learning task. An important step in applying machine learning to this task is encoding protein fragments as feature vectors. Here we assess existing encoding methods as well as an enhanced encoding method named composition of monomer spectrum (CMS) using support vector machines (SVMs). SVMs employing the existing encoding methods achieved AUCs (area under the ROC curve) of 90.3-91.3%, and those employing CMS achieved an AUC of 92.4%. Analysis of the different encoding methods suggests potential for further improving the prediction. MeSH Headings: Algorithms; Area Under Curve; Artificial Intelligence; Binding Sites; Computational Biology/methods; Glycosylation; Mucins/metabolism; Protein Processing, Post-Translational. Grants: R01 LM009959 NLM NIH HHS.
42	BioTagger-GM: a gene/protein name recognition system. J Am Med Inform Assoc 2008;16:247-55. [PMID: 19074302 DOI: 10.1197/jamia.m2844] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022] Abstract OBJECTIVES Biomedical named entity recognition (BNER) is a critical component in automated systems that mine biomedical knowledge in free text. Among the different types of entities in the domain, genes/proteins are arguably the most studied for BNER. Our goal is to develop a gene/protein name recognition system, BioTagger-GM, that exploits rich information in terminology sources using powerful machine learning frameworks and system combination. DESIGN BioTagger-GM consists of four main components: (1) dictionary lookup: gene/protein names in BioThesaurus and biomedical terms in the UMLS Metathesaurus are tagged in text; (2) machine learning: machine learning systems are trained using dictionary lookup results as one type of feature; (3) post-processing: heuristic rules are used to correct recognition errors; and (4) system combination: a voting scheme is used to combine recognition results from multiple systems. MEASUREMENTS The BioCreAtIvE II Gene Mention (GM) corpus was used to evaluate the proposed method. To test its general applicability, the method was also evaluated on the JNLPBA corpus modified for gene/protein name recognition. The performance of the systems was evaluated through cross-validation tests and measured using precision, recall, and F-Measure. RESULTS BioTagger-GM achieved an F-Measure of 0.8887 on the BioCreAtIvE II GM corpus, which is higher than that of the first-place system in the BioCreAtIvE II challenge. The applicability of the method was also confirmed on the modified JNLPBA corpus. CONCLUSION The results suggest that terminology sources, powerful machine learning frameworks, and system combination can be integrated to build an effective BNER system.
43	Overview of BioCreative II gene mention recognition. Genome Biol 2008;9 Suppl 2:S2. [PMID: 18834493 PMCID: PMC2559986 DOI: 10.1186/gb-2008-9-s2-s2] [Citation(s) in RCA: 193] [Impact Index Per Article: 12.1] [Indexed: 11/19/2022] Abstract Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task, participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used, and the results varied, with a highest achieved F₁ score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest-scoring submissions.
44	A comparison study on algorithms of detecting long forms for short forms in biomedical text. BMC Bioinformatics 2007;8 Suppl 9:S5. [PMID: 18047706 PMCID: PMC2217663 DOI: 10.1186/1471-2105-8-s9-s5] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022] Abstract MOTIVATION With more and more research dedicated to literature mining in the biomedical domain, more and more systems are available to choose from when building literature mining applications. In this study, we focus on one specific kind of literature mining task, i.e., detecting definitions of acronyms, abbreviations, and symbols in biomedical text. We denote acronyms, abbreviations, and symbols as short forms (SFs) and their corresponding definitions as long forms (LFs). The study was designed to answer the following questions: i) how well a system performs in detecting LFs from novel text, ii) how well various terminological knowledge bases cover SFs as synonyms of their LFs, and iii) how to combine results from various SF knowledge bases. METHOD We evaluated the following three publicly available systems for detecting LFs for SFs: i) ALICE, a handcrafted pattern/rule-based system by Ao and Takagi, ii) a machine learning system by Chang et al., and iii) a simple alignment-based program by Schwartz and Hearst. In addition, we investigated the conceptual coverage of two terminological knowledge bases: i) the UMLS (Unified Medical Language System), and ii) BioThesaurus (a thesaurus of names for all UniProt protein records). We also implemented a web interface that provides a virtual integration of various SF knowledge bases. RESULTS We found that the detection systems agree with each other in most cases, and the existing terminological knowledge bases have good coverage of synonymous relationships for frequently defined LFs. The web interface allows people to detect SF definitions from text and to search several SF knowledge bases. AVAILABILITY The web site is http://gauss.dbb.georgetown.edu/liblab/SFThesaurus.
45	Enhancing acronym/abbreviation knowledge bases with semantic information. AMIA Annu Symp Proc 2007;2007:731-735. [PMID: 18693933 PMCID: PMC2655902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2007] [Revised: 07/19/2007] [Accepted: 10/11/2007] [Indexed: 05/26/2023] Abstract OBJECTIVE In the biomedical domain, a terminology knowledge base that associates acronyms/abbreviations (denoted as SFs) with their definitions (denoted as LFs) is highly needed. Toward the construction of such a terminology knowledge base, we investigate the feasibility of building a system that automatically assigns semantic categories to LFs extracted from text. METHODS Given a collection of (SF,LF) pairs derived from text, we i) assess the coverage of LFs and (SF,LF) pairs in the UMLS and justify the need for a semantic category assignment system; and ii) automatically derive name phrases annotated with semantic categories and construct a system using machine learning. RESULTS Utilizing ADAM, an existing collection of (SF,LF) pairs extracted from MEDLINE, our system achieved an F-measure of 87% when assigning eight UMLS-based semantic groups to LFs. The system has been incorporated into a web interface that integrates SF knowledge from multiple SF knowledge bases. Web site: http://gauss.dbb.georgetown.edu/liblab/SFThesurus. MeSH Headings: Abbreviations as Topic; Artificial Intelligence; Internet; Knowledge Bases; MEDLINE; Semantics; Terminology as Topic; Unified Medical Language System. Grants: R01 LM009959 NLM NIH HHS.
46	Roxithromycin inhibits tumor necrosis factor-alpha-induced matrix metalloproteinase-1 expression through regulating mitogen-activated protein kinase phosphorylation and Ets-1 expression. J Periodontal Res 2007;42:53-61. [PMID: 17214640 DOI: 10.1111/j.1600-0765.2006.00914.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022] Abstract BACKGROUND AND OBJECTIVE In periodontitis, matrix metalloproteinases (MMPs) are upregulated in response to locally released inflammatory cytokines, resulting in pathologic processes. Roxithromycin is a 14-membered ring macrolide antibiotic with broad-spectrum antibacterial effects against oral pathogens and immunomodulatory effects. Recently, we reported that roxithromycin inhibits tumor necrosis factor (TNF)-alpha-induced vascular endothelial growth factor expression in human periodontal ligament (HPDL) cell cultures. In the present study, we examined the effect of roxithromycin on TNF-alpha-induced MMP-1 production by HPDL cells. MATERIAL AND METHODS Cultured cells were incubated with 1% fetal bovine serum for 24 h, followed by treatment with 10 ng/ml TNF-alpha, 10 µM roxithromycin, and mitogen-activated protein kinase inhibitor at various concentrations. Culture supernatants and sediments were collected at different time-points and used for enzyme-linked immunosorbent assays and for northern and western blot analyses. RESULTS In HPDL cell cultures, roxithromycin strongly inhibited TNF-alpha-induced MMP-1 mRNA expression and production. The inhibition of MMP-1 gene expression by roxithromycin was dependent on de novo protein synthesis and was regulated at the transcriptional level. Roxithromycin significantly inhibited TNF-alpha-induced c-Jun N-terminal kinase (JNK) activation and marginally inhibited extracellular signal-regulated kinase (ERK) 1/2 activation, but did not inhibit p38 mitogen-activated protein kinase activation. Furthermore, roxithromycin reduced the induction of Ets-1, one of the critical factors in MMP-1 transcription. CONCLUSION Roxithromycin inhibits TNF-alpha-mediated MMP-1 induction through the downregulation of ERK1/2 and JNK activation and the subsequent reduction of Ets-1, suggesting that roxithromycin may have therapeutic use in periodontitis and other chronic inflammatory conditions involving MMP-1 induction. MeSH Headings: Adult; Anti-Bacterial Agents/pharmacology; Cells, Cultured; Down-Regulation/drug effects; Enzyme Activation/drug effects; Gene Expression Regulation/drug effects; Humans; JNK Mitogen-Activated Protein Kinases/antagonists & inhibitors; Male; Matrix Metalloproteinase Inhibitors; Mitogen-Activated Protein Kinase 1/antagonists & inhibitors; Mitogen-Activated Protein Kinases/drug effects; Periodontal Ligament/cytology; Periodontal Ligament/drug effects; Phosphorylation/drug effects; Proto-Oncogene Protein c-ets-1/drug effects; Roxithromycin/pharmacology; Transcription, Genetic/drug effects; Tumor Necrosis Factor-alpha/antagonists & inhibitors; p38 Mitogen-Activated Protein Kinases/drug effects.
47	Malaria parasite induces tryptophan-related immune suppression in mice. Parasitology 2007;134:923-30. [PMID: 17316473 DOI: 10.1017/s0031182007002326] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Abstract Plasmodium spp. cause the worst parasitic diseases in humans and evade host immunity in complicated ways. Activated catabolism of tryptophan in dendritic cells is thought to suppress immunity, which is mediated by an inducible rate-limiting enzyme of tryptophan catabolism, indoleamine 2,3 dioxygenase (IDO), via both tryptophan depletion and production of toxic metabolites. In various infections, including malaria, IDO is known to be activated but its biological significance is unclear; therefore, we investigated whether malaria parasites induce IDO to suppress host immune responses. We found that enzymatic activity of IDO was elevated systematically in our mouse malaria model, and was abolished by in vivo IDO inhibition with 1-methyl tryptophan. Experimental infection with Plasmodium yoelii showed that IDO inhibition slightly suppressed parasite density in association with enhanced proliferation and IFN-gamma production by CD4+ T cells in response to malaria parasites. Our observations suggest that induction of IDO is one of the immune mechanisms of malaria parasites. 
MESH Headings: Animals; Antimalarials/pharmacology; CD4-Positive T-Lymphocytes/drug effects; CD4-Positive T-Lymphocytes/physiology; Chloroquine/pharmacology; Enzyme Activation; Enzyme Inhibitors/pharmacology; Erythrocytes/parasitology; Female; Indoleamine-Pyrrole 2,3,-Dioxygenase/antagonists & inhibitors; Indoleamine-Pyrrole 2,3,-Dioxygenase/drug effects; Indoleamine-Pyrrole 2,3,-Dioxygenase/immunology; Indoleamine-Pyrrole 2,3,-Dioxygenase/metabolism; Interferon-gamma/blood; Kynurenine/blood; Malaria/immunology; Malaria/metabolism; Mice; Mice, Inbred C57BL; Plasmodium yoelii/drug effects; Plasmodium yoelii/immunology; Time Factors; Tryptophan/analogs & derivatives; Tryptophan/blood; Tryptophan/immunology; Tryptophan/metabolism; Tryptophan/pharmacology
48	Sortal anaphora resolution in MEDLINE abstracts. Comput Intell 2007. [DOI: 10.1111/j.1467-8640.2007.00292.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/27/2022]
49	[S40]: Dissociation of corticothalamic and thalamocortical axon targeting by an ephA7‐mediated mechanism. Int J Dev Neurosci 2006. [DOI: 10.1016/j.ijdevneu.2006.09.048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
50	Quantitative assessment of dictionary-based protein named entity tagging. J Am Med Inform Assoc 2006;13:497-507. [PMID: 16799122 PMCID: PMC1561801 DOI: 10.1197/jamia.m2085] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022] Abstract OBJECTIVE Natural language processing (NLP) approaches have been explored to manage and mine information recorded in the biological literature. A critical step in biological literature mining is biological named entity tagging (BNET), which identifies names mentioned in text and normalizes them to entries in biological databases. The aim of this study was to provide a quantitative assessment of the complexity of BNET for protein entities using BioThesaurus, a thesaurus of gene/protein names for UniProt Knowledgebase (UniProtKB) entries compiled from online resources. METHODS We evaluated the complexity from several perspectives: ambiguity (i.e., the number of genes/proteins represented by one name), synonymy (i.e., the number of names associated with the same gene/protein), and coverage (i.e., the percentage of gene/protein names in text that are included in the thesaurus). We also normalized the names in BioThesaurus and obtained each measure twice, before and after normalization. RESULTS The current version of BioThesaurus contains over 2.6 million names (2.1 million after normalization) covering more than 1.8 million UniProtKB entries. The average synonymy is 3.53 (2.86 after normalization) and the average ambiguity is 2.31 (2.32 after normalization), while coverage is 94.0% on the BioCreAtIvE data set of MEDLINE abstracts containing genes/proteins. CONCLUSION The study indicated that gene/protein names are highly ambiguous and that there are usually multiple names for the same gene or protein.
It also demonstrated that most gene/protein names appearing in text can be found in BioThesaurus.
MESH Headings: Dictionaries as Topic; Genes; Names; Natural Language Processing; Proteins; Vocabulary, Controlled
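The three measures defined in the BioThesaurus abstract above (ambiguity, synonymy, coverage) can be illustrated on a toy name-to-entry mapping. This is a minimal sketch under stated assumptions, not the study's procedure: the name/entry pairs and the "E1".."E3" IDs are hypothetical placeholders, the `normalize` rule (lowercase, drop spaces and hyphens) is an invented stand-in for the paper's unspecified normalization, and averaging over all names and all entries is an assumption.

```python
from collections import defaultdict

# Hypothetical toy thesaurus of (name, entry ID) pairs. "E1".."E3" are
# placeholder IDs, not real UniProtKB accessions.
PAIRS = [
    ("p53", "E1"),
    ("TP53", "E1"),
    ("tumor protein p53", "E1"),
    ("CD4", "E2"),
    ("CD-4", "E2"),
    ("CD4", "E3"),  # same name attached to a second entry -> ambiguity
]


def normalize(name):
    """Hypothetical normalization rule: lowercase, drop spaces and hyphens."""
    return name.lower().replace(" ", "").replace("-", "")


def metrics(pairs, norm=None):
    """Return (average ambiguity, average synonymy, set of thesaurus names)."""
    name_to_entries = defaultdict(set)
    entry_to_names = defaultdict(set)
    for name, entry in pairs:
        key = norm(name) if norm else name
        name_to_entries[key].add(entry)
        entry_to_names[entry].add(key)
    # Ambiguity: entries per name, averaged over all names (assumption).
    ambiguity = sum(map(len, name_to_entries.values())) / len(name_to_entries)
    # Synonymy: names per entry, averaged over all entries (assumption).
    synonymy = sum(map(len, entry_to_names.values())) / len(entry_to_names)
    return ambiguity, synonymy, set(name_to_entries)


def coverage(mentions, names, norm=None):
    """Percentage of text mentions found in the thesaurus name set."""
    keys = [norm(m) if norm else m for m in mentions]
    return 100.0 * sum(k in names for k in keys) / len(keys)


if __name__ == "__main__":
    mentions = ["p53", "Cd4", "BRCA1"]  # hypothetical names found in text
    for label, fn in (("raw", None), ("normalized", normalize)):
        amb, syn, names = metrics(PAIRS, fn)
        cov = coverage(mentions, names, fn)
        print(f"{label}: ambiguity={amb:.2f} synonymy={syn:.2f} coverage={cov:.1f}%")
    # raw: ambiguity=1.20 synonymy=2.00 coverage=33.3%
    # normalized: ambiguity=1.25 synonymy=1.67 coverage=66.7%
```

In this toy case, merging spelling variants ("CD4" and "CD-4") under normalization lowers synonymy and slightly raises ambiguity, the same direction as the before/after figures reported in the abstract.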