1
Deshpande P, Rasin A. Correlation Aware Relevance-Based Semantic Index for Clinical Big Data Repository. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024:10.1007/s10278-024-01095-w. [PMID: 38653911 DOI: 10.1007/s10278-024-01095-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 03/07/2024] [Accepted: 03/14/2024] [Indexed: 04/25/2024]
Abstract
In this paper, we focus on indexing mechanisms for unstructured, integrated clinical big data repository systems. Clinical data is unstructured and heterogeneous, and it comes in different files and formats. Accessing this data efficiently and effectively is a critical challenge. Traditional indexing mechanisms are difficult to apply to unstructured data, especially when it comes to identifying correlation information between clinical data elements. In this research work, we developed a correlation-aware relevance-based index that retrieves clinical data by fetching the most relevant cases efficiently. In our previous work, we designed a methodology that categorizes medical data based on the semantics of data elements and merges them into an integrated repository. We developed a data integration system for medical data sources that combines heterogeneous medical data and provides different users with access to knowledge-based database repositories. In this research work, we designed an indexing system using semantic tags extracted from clinical data sources and medical ontologies that retrieves relevant data from database repositories and speeds up data retrieval. Our objective is to provide an integrated biomedical database repository that can be used by radiologists as a reference, for patient care, or by researchers. In this paper, we focus on designing a technique that performs data processing for data integration, learns the semantic properties of data elements, and develops a correlation-aware topic index that facilitates efficient data retrieval. We generated semantic tags by identifying key elements from integrated clinical cases using topic modeling techniques. We investigated a technique that identifies tags for merged categories and provides an index to fetch data from an integrated database repository. We developed a topic coherence matrix that shows how well a topic is supported by a corpus from clinical cases and medical ontologies.
We were able to find more relevant results using an annotation index from an integrated database repository, and there was a 61% increase in recall. We evaluated the results with the help of experts and compared them with a naive index (an index with all terms from the corpus). Our approach improved data retrieval quality by providing the most relevant results and reduced data retrieval time, as we applied the correlation-aware index to an integrated data repository. The topic indexing approach proposed in this research work identifies tags based on correlations between different data elements, improves data retrieval time, and returns the most relevant cases.
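For readers unfamiliar with the metric, the following is a minimal sketch of how recall and a relative increase in recall are typically computed. It is not the authors' evaluation code: the case IDs, the helper names `recall` and `relative_increase`, and the numbers are all invented for illustration, and the abstract does not say whether the reported 61% is a relative or an absolute change.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant items that appear in the retrieved set."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def relative_increase(baseline, improved):
    """Relative change from baseline to improved (1.0 means +100%)."""
    return (improved - baseline) / baseline


# Hypothetical data: five cases judged relevant by experts for one query.
relevant_cases = {"c1", "c2", "c3", "c4", "c5"}
naive_hits = ["c1", "c2", "c9"]              # results from a naive index
topic_hits = ["c1", "c2", "c3", "c4", "c8"]  # results from a topic index

naive_recall = recall(naive_hits, relevant_cases)
topic_recall = recall(topic_hits, relevant_cases)
gain = relative_increase(naive_recall, topic_recall)

print(f"naive recall = {naive_recall:.2f}")
print(f"topic recall = {topic_recall:.2f}")
print(f"relative increase = {gain:.0%}")
```

In this toy example the naive index finds 2 of 5 relevant cases and the topic index finds 4 of 5, so recall doubles. The same two functions apply to any pair of result lists judged against an expert-labelled relevant set.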
Affiliation(s)
- Priya Deshpande
- Department of Electrical and Computer Engineering, Marquette University, Milwaukee, WI, 53233, USA.
2
Deshpande P, Rasin A, Tchoua R, Furst J, Raicu D, Schinkel M, Trivedi H, Antani S. Biomedical heterogeneous data categorization and schema mapping toward data integration. Front Big Data 2023; 6:1173038. [PMID: 37139170 PMCID: PMC10149933 DOI: 10.3389/fdata.2023.1173038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 03/17/2023] [Indexed: 05/05/2023] Open
Abstract
Data integration is a well-motivated problem in the clinical data science domain. Availability of patient data, reference clinical cases, and datasets for research has the potential to advance the healthcare industry. However, the unstructured (text, audio, or video data) and heterogeneous nature of the data, the variety of data standards and formats, and patient privacy constraints make data interoperability and integration a challenge. The clinical text is further categorized into different semantic groups and may be stored in different files and formats. Even the same organization may store cases in different data structures, making data integration more challenging. With such inherent complexity, domain experts and domain knowledge are often necessary to perform data integration. However, expert human labor is time- and cost-prohibitive. To overcome the variability in the structure, format, and content of the different data sources, we map the text into common categories and compute similarity within those. In this paper, we present a method to categorize and merge clinical data by considering the underlying semantics behind the cases and use reference information about the cases to perform data integration. Evaluation shows that we were able to merge 88% of clinical data from five different sources.
Affiliation(s)
- Priya Deshpande
- Marquette University, Milwaukee, WI, United States
- *Correspondence: Priya Deshpande
- Jacob Furst
- DePaul University, Chicago, IL, United States
- Michiel Schinkel
- Center for Experimental and Molecular Medicine (CEMM), University of Amsterdam, Amsterdam, Netherlands
- Sameer Antani
- National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
3
Feric Z, Bohm Agostini N, Beene D, Signes-Pastor AJ, Halchenko Y, Watkins D, MacKenzie D, Karagas M, Manjourides J, Alshawabkeh A, Kaeli D. A Secure and Reusable Software Architecture for Supporting Online Data Harmonization. PROCEEDINGS : ... IEEE INTERNATIONAL CONFERENCE ON BIG DATA. IEEE INTERNATIONAL CONFERENCE ON BIG DATA 2021; 2021:2801-2812. [PMID: 35449545 PMCID: PMC9020435 DOI: 10.1109/bigdata52589.2021.9671538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Retrospective data harmonization across multiple research cohorts and studies is frequently done to increase statistical power, provide comparison analysis, and create a richer data source for data mining. However, when combining disparate data sources, harmonization projects face data management and analysis challenges. These include differences in the data dictionaries and variable definitions, privacy concerns surrounding health data representing sensitive populations, and lack of properly defined data models. With the availability of mature open-source web-based database technologies, developing a complete software architecture to overcome the challenges associated with the harmonization process can alleviate many roadblocks. By leveraging state-of-the-art software engineering and database principles, we can ensure data quality and enable cross-center online access and collaboration. This paper outlines a complete software architecture developed and customized using the Django web framework, leveraged to harmonize sensitive data collected from three NIH-supported birth cohorts. We describe our framework and show how we successfully overcame challenges faced when harmonizing data from these cohorts. We discuss our efforts in data cleaning, data sharing, data transformation, data visualization, and analytics, while reflecting on what we have learned to date from these harmonized datasets.
Affiliation(s)
- Zlatan Feric
- Dept. of Electrical and Computer Engineering, Northeastern University
- Daniel Beene
- Community Environmental Health Program, College of Pharmacy, Health Sciences Center, University of New Mexico
- Yuliya Halchenko
- Department of Epidemiology, Geisel School of Medicine at Dartmouth
- Deborah Watkins
- Environmental Health Sciences, School of Public Health, University of Michigan
- Debra MacKenzie
- Community Environmental Health Program, College of Pharmacy, Health Sciences Center, University of New Mexico
- Margaret Karagas
- Department of Epidemiology, Geisel School of Medicine at Dartmouth
- Akram Alshawabkeh
- Dept. of Civil and Environmental Engineering, Northeastern University
- David Kaeli
- Dept. of Electrical and Computer Engineering, Northeastern University
4
Courtot M, Gupta D, Liyanage I, Xu F, Burdett T. BioSamples database: FAIRer samples metadata to accelerate research data management. Nucleic Acids Res 2021; 50:D1500-D1507. [PMID: 34747489 PMCID: PMC8728232 DOI: 10.1093/nar/gkab1046] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/15/2021] [Revised: 10/13/2021] [Accepted: 10/14/2021] [Indexed: 12/04/2022] Open
Abstract
The BioSamples database at EMBL-EBI is the central institutional repository for sample metadata storage and connection to EMBL-EBI archives and other resources. The technical improvements to our infrastructure described in our last update have enabled us to scale and accommodate an increasing number of communities, resulting in a higher number of submissions and more heterogeneous data. The BioSamples database now has a valuable set of features and processes to improve data quality in BioSamples, and in particular enriching metadata content and following FAIR principles. In this manuscript, we describe how BioSamples in 2021 handles requirements from our community of users through exemplar use cases: increased findability of samples and improved data management practices support the goals of the ReSOLUTE project, how the plant community benefits from being able to link genotypic to phenotypic information, and we highlight how cumulatively those improvements contribute to more complex multi-omics data integration supporting COVID-19 research. Finally, we present underlying technical features used as pillars throughout those use cases and how they are reused for expanded engagement with communities such as FAIRplus and the Global Alliance for Genomics and Health. Availability: The BioSamples database is freely available at http://www.ebi.ac.uk/biosamples. Content is distributed under the EMBL-EBI Terms of Use available at https://www.ebi.ac.uk/about/terms-of-use. The BioSamples code is available at https://github.com/EBIBioSamples/biosamples-v4 and distributed under the Apache 2.0 license.
Affiliation(s)
- Mélanie Courtot
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Dipayan Gupta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Isuru Liyanage
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Fuqi Xu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
6
Gheita TA, Noor RA, Abualfadl E, Abousehly OS, El-Gazzar II, El Shereef RR, Senara S, Abdalla AM, Khalil NM, ElSaman AM, Tharwat S, Nasef SI, Mohamed EF, Noshy N, El-Essawi DF, Moshrif AH, Fawzy RM, El-Najjar AR, Hammam N, Ismail F, ElKhalifa M, Samy N, Hassan E, Abaza NM, ElShebini E, Fathi HM, Salem MN, Abdel-Fattah YH, Saad E, Abd Elazim MI, Eesa NN, El-Bahnasawy AS, El-Hammady DH, El-Shanawany AT, Ibrahim SE, Said EA, El-Saadany HM, Selim ZI, Fawzy SM, Raafat HA. Adult systemic lupus erythematosus in Egypt: The nation-wide spectrum of 3661 patients and world-wide standpoint. Lupus 2021; 30:1526-1535. [PMID: 33951965 DOI: 10.1177/09612033211014253] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 12/13/2022]
Abstract
OBJECTIVE The aim of this study was to present the epidemiology, clinical manifestations and treatment pattern of systemic lupus erythematosus (SLE) in Egyptian patients over the country and compare the findings to large cohorts worldwide. Objectives were extended to focus on the age at onset and gender driven influence on the disease characteristics. PATIENTS AND METHOD This population-based, multicenter, cross-sectional study included 3661 adult SLE patients from Egyptian rheumatology departments across the nation. Demographic, clinical, and therapeutic data were assessed for all patients. RESULTS The study included 3661 patients; 3296 females and 365 males (9.03:1) and the median age was 30 years (17-79 years), disease duration 4 years (0-75 years) while the median age at disease onset was 25 years (4-75 years). The overall estimated prevalence of adult SLE in Egypt was 6.1/100,000 population (1.2/100,000 males and 11.3/100,000 females). There were 316 (8.6%) juvenile-onset (Jo-SLE) and 3345 adult-onset (Ao-SLE). Age at onset was highest in South and lowest in Cairo (p < 0.0001). CONCLUSION SLE in Egypt had a wide variety of clinical and immunological manifestations, with some similarities with that in other nations and differences within the same country. The clinical characteristics, autoantibodies and comorbidities are comparable between Ao-SLE and Jo-SLE. The frequency of various clinical and immunological manifestations varied between gender. Additional studies are needed to determine the underlying factors contributing to gender and age of onset differences.
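The cohort figures quoted in the RESULTS section can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract; the variable names are chosen here for illustration.

```python
# Counts as reported in the abstract.
total = 3661
females, males = 3296, 365
juvenile_onset, adult_onset = 316, 3345

# Both splits should add up to the full cohort.
assert females + males == total
assert juvenile_onset + adult_onset == total

sex_ratio = females / males                    # female-to-male ratio
juvenile_share = 100 * juvenile_onset / total  # percent juvenile-onset

print(f"female:male = {sex_ratio:.2f}:1")
print(f"juvenile-onset = {juvenile_share:.1f}%")
```

Both derived values agree with the reported 9.03:1 ratio and 8.6% juvenile-onset share.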
Affiliation(s)
- Tamer A Gheita
- Rheumatology Department, Faculty of Medicine, Cairo University, Cairo, Egypt
- Rasha Abdel Noor
- Internal Medicine Department, Rheumatology Unit, Tanta University, Gharbia, Egypt
- Esam Abualfadl
- Rheumatology Department, Faculty of Medicine, Sohag University, Sohag, Egypt; Qena/Luxor Hospitals, Qena, Egypt
- Osama S Abousehly
- Rheumatology Department, Faculty of Medicine, Sohag University, Sohag, Egypt
- Iman I El-Gazzar
- Rheumatology Department, Faculty of Medicine, Cairo University, Cairo, Egypt
- Rawhya R El Shereef
- Rheumatology Department, Faculty of Medicine, Minia University, Minia, Egypt
- Soha Senara
- Rheumatology Department, Faculty of Medicine, Fayoum University, Fayoum, Egypt
- Ahmed M Abdalla
- Rheumatology Department, Faculty of Medicine, Aswan University, Aswan, Egypt
- Noha M Khalil
- Internal Medicine Department, Rheumatology Unit, Faculty of Medicine, Cairo University, Cairo, Egypt
- Ahmed M ElSaman
- Rheumatology Department, Faculty of Medicine, Sohag University, Sohag, Egypt
- Samar Tharwat
- Rheumatology Unit, Internal Medicine, Mansoura University, Dakahlia, Egypt
- Samah I Nasef
- Rheumatology Department, Faculty of Medicine, Suez-Canal University, Ismailia, Egypt
- Eman F Mohamed
- Internal Medicine Department, Rheumatology Unit, Faculty of Medicine (Girls), Al-Azhar University, Cairo, Egypt
- Nermeen Noshy
- Internal Medicine Department, Rheumatology Unit, Faculty of Medicine, Ain-Shams University, Cairo, Egypt
- Dina F El-Essawi
- Internal Medicine Department, Rheumatology Unit, Egyptian Atomic Energy Authority (EAEA), Cairo, Egypt
- Abdel Hafeez Moshrif
- Rheumatology Department, Faculty of Medicine, Al-Azhar University, Assuit, Egypt
- Rasha M Fawzy
- Rheumatology Department, Faculty of Medicine, Benha University, Kalubia, Egypt
- Amany R El-Najjar
- Rheumatology Department, Faculty of Medicine, Zagazig University, Sharkia, Egypt
- Nevin Hammam
- Rheumatology Department, Faculty of Medicine, Assuit University, Assuit, Egypt; Rheumatology Department, University of California San Francisco (UCSF), San Francisco, CA, USA
- Faten Ismail
- Rheumatology Department, Faculty of Medicine, Minia University, Minia, Egypt
- Marwa ElKhalifa
- Internal Medicine Department, Rheumatology Unit, Faculty of Medicine, Alexandria University, Alexandria, Egypt
- Nermeen Samy
- Internal Medicine Department, Rheumatology Unit, Faculty of Medicine, Ain-Shams University, Cairo, Egypt
- Eman Hassan
- Internal Medicine Department, Rheumatology Unit, Faculty of Medicine, Alexandria University, Alexandria, Egypt
- Nouran M Abaza
- Rheumatology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Emad ElShebini
- Internal Medicine Department, Rheumatology Unit, Menoufiya University, Menoufiya, Egypt
- Hanan M Fathi
- Rheumatology Department, Faculty of Medicine, Fayoum University, Fayoum, Egypt
- Mohamed N Salem
- Internal Medicine Department, Rheumatology Unit, Faculty of Medicine, Beni-Suef University, Beni-Suef, Egypt
- Yousra H Abdel-Fattah
- Rheumatology Department, Faculty of Medicine, Alexandria University, Alexandria, Egypt
- Ehab Saad
- Rheumatology Department, Faculty of Medicine, South Valley University, Qena, Egypt
- Mervat I Abd Elazim
- Rheumatology Department, Faculty of Medicine, Beni-Suef University, Beni-Suef, Egypt
- Nahla N Eesa
- Rheumatology Department, Faculty of Medicine, Cairo University, Cairo, Egypt
- Amany S El-Bahnasawy
- Rheumatology Department, Faculty of Medicine, Mansoura University, Dakahlia, Egypt
- Dina H El-Hammady
- Rheumatology Department, Faculty of Medicine, Helwan University, Cairo, Egypt
- Amira T El-Shanawany
- Rheumatology Department, Faculty of Medicine, Menoufiya University, Menoufiya, Egypt
- Soha E Ibrahim
- Rheumatology Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Emtethal A Said
- Rheumatology Department, Faculty of Medicine, Benha University, Kalubia, Egypt
- Hanan M El-Saadany
- Rheumatology Department, Faculty of Medicine, Tanta University, Gharbia, Egypt
- Zahraa I Selim
- Rheumatology Department, Faculty of Medicine, Assuit University, Assuit, Egypt
- Samar M Fawzy
- Rheumatology Department, Faculty of Medicine, Cairo University, Cairo, Egypt
- Hala A Raafat
- Rheumatology Department, Faculty of Medicine, Cairo University, Cairo, Egypt