1
Gayet-Ageron A, Ben Messaoud K, Richards M, Jaksic C, Gobeill J, Liyanapathirana J, Mottin L, Naderi N, Ruch P, Mariot Z, Calmy A, Friedman J, Leibovici L, Schroter S. Gender and geographical bias in the editorial decision-making process of biomedical journals: a case-control study. BMJ Evid Based Med 2024:bmjebm-2024-113083. [PMID: 39721743 DOI: 10.1136/bmjebm-2024-113083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/24/2024] [Indexed: 12/28/2024]
Abstract
OBJECTIVES To assess whether the gender (primary) and geographical affiliation (post-hoc) of the first and/or last authors are associated with publication decisions after peer review.
DESIGN Case-control study.
SETTING Biomedical journals.
PARTICIPANTS Original peer-reviewed manuscripts submitted between 1 January 2012 and 31 December 2019.
MAIN OUTCOME MEASURE Manuscripts accepted (cases) and rejected for publication (controls).
RESULTS Of 6213 included manuscripts, 5294 (85.2%) first and 5479 (88.1%) last authors' gender were identified; 2511 (47.4%) and 1793 (32.7%) were women, respectively. The proportion of women first and last authors was 48.4% (n=1314) and 32.2% (n=885) among cases and 46.4% (n=1197) and 33.2% (n=908) among controls. After adjustment, the association between the first author's gender and acceptance for publication remained non-significant 1.04 (0.92 to 1.17). Acceptance for publication was lower for first authors affiliated to Asia 0.58 (0.46 to 0.73), Africa 0.75 (0.41 to 1.36) and South America 0.68 (0.40 to 1.16) compared with Europe, and for first authors affiliated to upper-middle country-income 0.66 (0.47 to 0.95) and lower-middle/low 0.69 (0.46 to 1.03) compared with high country-income group. It was significantly higher when both first and last authors were affiliated to different countries from same geographical and income groups 1.35 (1.03 to 1.77), different countries and geographical but same income groups 1.50 (1.14 to 1.96) or different countries, geographical and income groups 1.78 (1.27 to 2.50) compared with authors from similar countries. The study funding was independently associated with the acceptance for publication (when compared with no funding, 1.40; 1.04 to 1.89 for funding by association & foundations, 2.76; 1.87 to 4.10 for international organisations, 1.30; 1.04 to 1.62 for non-profit & associations & foundations). The reviewers' recommendations of the original submitted version were significantly associated with the outcome (unadjusted 5.36; 4.98 to 5.78 for acceptance compared with rejection). Gender of the first author was not associated with reviewers' recommendations (adjusted 0.96, 0.87 to 1.06).
CONCLUSIONS We did not identify evidence of gender bias during the editorial decision-making process for papers sent out to peer review. However, the under-representation in manuscripts accepted for publication of first authors affiliated to Asia, Africa or South America and those affiliated to upper/lower-middle and low country-income group indicates poor representation of global scientists' opinion and supports growing demands for improving equity, diversity and inclusion in biomedical research. The more diverse the countries and incomes of the first and last authors, the greater the chances of the publication being accepted.
Affiliation(s)
- Angèle Gayet-Ageron
- Division of Clinical Epidemiology, University Hospitals Geneva, Geneva, Switzerland
- Department of Health and Community Medicine, University of Geneva, Geneva, Switzerland
- Khaoula Ben Messaoud
- Department of Health and Community Medicine, University of Geneva, Geneva, Switzerland
- Cyril Jaksic
- Department of Health and Community Medicine, University of Geneva, Geneva, Switzerland
- Julien Gobeill
- SIB Text Mining, Swiss Institute of Bioinformatics Geneva, Geneva, Switzerland
- Jeevanthi Liyanapathirana
- SIB Text Mining, Swiss Institute of Bioinformatics Geneva, Geneva, Switzerland
- BiTeM Group, Information Sciences, HES-SO Geneve, Le Lignon, Switzerland
- Luc Mottin
- SIB Text Mining, Swiss Institute of Bioinformatics Geneva, Geneva, Switzerland
- Nona Naderi
- CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, Université Paris-Saclay, Gif-sur-Yvette, Île-de-France, France
- Patrick Ruch
- SIB Text Mining, Swiss Institute of Bioinformatics Geneva, Geneva, Switzerland
- BiTeM Group, Information Sciences, HES-SO Geneve, Le Lignon, Switzerland
- Zoé Mariot
- Internal Medicine Division, Hôpital Riviera-Chablais, Rennaz, Switzerland
- Alexandra Calmy
- Division of Infectious Diseases, HIV Unit, Geneva University Hospitals, Geneva, Switzerland
- Julia Friedman
- Clinical Microbiology & Infection Editorial Office, Rabin Medical Center, Petah Tikva, Israel
- Leonard Leibovici
- Internal Medicine E, Rabin Medical Center, Petah Tikva, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Sara Schroter
- BMJ Publishing Group, London, UK
- Faculty of Public Health and Policy, London School of Hygiene and Tropical Medicine, London, UK
2
Knafou J, Haas Q, Borissov N, Counotte M, Low N, Imeri H, Ipekci AM, Buitrago-Garcia D, Heron L, Amini P, Teodoro D. Ensemble of deep learning language models to support the creation of living systematic reviews for the COVID-19 literature. Syst Rev 2023; 12:94. [PMID: 37277872 DOI: 10.1186/s13643-023-02247-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 04/24/2023] [Indexed: 06/07/2023] Open
Abstract
BACKGROUND The COVID-19 pandemic has led to an unprecedented amount of scientific publications, growing at a pace never seen before. Multiple living systematic reviews have been developed to assist professionals with up-to-date and trustworthy health information, but it is increasingly challenging for systematic reviewers to keep up with the evidence in electronic databases. We aimed to investigate deep learning-based machine learning algorithms to classify COVID-19-related publications to help scale up the epidemiological curation process.
METHODS In this retrospective study, five different pre-trained deep learning-based language models were fine-tuned on a dataset of 6365 publications manually classified into two classes, three subclasses, and 22 sub-subclasses relevant for epidemiological triage purposes. In a k-fold cross-validation setting, each standalone model was assessed on a classification task and compared against an ensemble, which takes the standalone model predictions as input and uses different strategies to infer the optimal article class. A ranking task was also considered, in which the model outputs a ranked list of sub-subclasses associated with the article.
RESULTS The ensemble model significantly outperformed the standalone classifiers, achieving an F1-score of 89.2 at the class level of the classification task. The difference between the standalone and ensemble models increases at the sub-subclass level, where the ensemble reaches a micro F1-score of 70% against 67% for the best-performing standalone model. For the ranking task, the ensemble obtained the highest recall@3, with a performance of 89%. Using a unanimity voting rule, the ensemble can provide predictions with higher confidence on a subset of the data, achieving detection of original papers with an F1-score up to 97% on a subset of 80% of the collection instead of 93% on the whole dataset.
CONCLUSION This study shows the potential of using deep learning language models to perform triage of COVID-19 references efficiently and support epidemiological curation and review. The ensemble consistently and significantly outperforms any standalone model. Fine-tuning the voting strategy thresholds is an interesting alternative to annotate a subset with higher predictive confidence.
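The unanimity voting rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `majority_vote` and `unanimity_vote`, the labels `original` and `review`, and the toy predictions are all hypothetical, chosen only to show how an ensemble might abstain unless every standalone model agrees, trading coverage for confidence.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def unanimity_vote(predictions: Sequence[str]) -> Optional[str]:
    """Return the shared label if every model agrees, otherwise None (abstain)."""
    first = predictions[0]
    return first if all(p == first for p in predictions) else None


# Hypothetical predictions from five standalone models for three articles.
per_article = [
    ["original", "original", "original", "original", "original"],
    ["original", "review", "original", "original", "review"],
    ["review", "review", "review", "review", "review"],
]

# Labels assigned on the whole collection (majority) and on the
# high-confidence subset only (unanimity; None means "left for manual review").
majority_labels = [majority_vote(p) for p in per_article]
confident_labels = [unanimity_vote(p) for p in per_article]

# Share of the collection that receives a unanimous label.
coverage = sum(label is not None for label in confident_labels) / len(per_article)
```

In this toy setting the second article is left unlabelled by the unanimity rule, so the covered subset is smaller than the full collection, which mirrors the trade-off the abstract reports between subset size and F1-score.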
Affiliation(s)
- Julien Knafou
- University of Applied Sciences and Arts of Western Switzerland (HES-SO), Rue de la Tambourine 17, 1227, Geneva, Switzerland
- Nikolay Borissov
- University of Applied Sciences and Arts of Western Switzerland (HES-SO), Rue de la Tambourine 17, 1227, Geneva, Switzerland
- CTU Bern, University of Bern, Bern, Switzerland
- Michel Counotte
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Wageningen Bioveterinary Research, Wageningen University & Research, Wageningen, The Netherlands
- Nicola Low
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Hira Imeri
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Aziz Mert Ipekci
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Leonie Heron
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Poorya Amini
- Risklick AG, Bern, Switzerland
- CTU Bern, University of Bern, Bern, Switzerland
- Douglas Teodoro
- University of Applied Sciences and Arts of Western Switzerland (HES-SO), Rue de la Tambourine 17, 1227, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
3
Caucheteur D, May Pendlington Z, Roncaglia P, Gobeill J, Mottin L, Matentzoglu N, Agosti D, Osumi-Sutherland D, Parkinson H, Ruch P. COVoc and COVTriage: novel resources to support literature triage. Bioinformatics 2023; 39:6895097. [PMID: 36511598 PMCID: PMC9825781 DOI: 10.1093/bioinformatics/btac800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2022] [Revised: 10/28/2022] [Accepted: 12/12/2022] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION Since early 2020, the coronavirus disease 2019 (COVID-19) pandemic has confronted the biomedical community with an unprecedented challenge. The rapid spread of COVID-19 and ease of transmission seen worldwide is due to increased population flow and international trade. Front-line medical care, treatment research and vaccine development also require rapid and informative interpretation of the literature and COVID-19 data produced around the world, with 177 500 papers published between January 2020 and November 2021, i.e. almost 8500 papers per month. To extract knowledge and enable interoperability across resources, we developed the COVID-19 Vocabulary (COVoc), an application ontology related to the research on this pandemic. The main objective of COVoc development was to enable seamless navigation from biomedical literature to core databases and tools of ELIXIR, a European-wide intergovernmental organization for life sciences.
RESULTS This collaborative work provided data integration into SIB Literature services, an application ontology (COVoc) and a triage service named COVTriage, based on annotation processing, to search for COVID-related information across pre-defined aspects with daily updates. Thanks to its interoperability potential, COVoc lends itself to wider applications, hopefully through further connections with other novel COVID-19 ontologies as has been established with Coronavirus Infectious Disease Ontology.
AVAILABILITY AND IMPLEMENTATION The data are available at https://github.com/EBISPOT/covoc and the service at https://candy.hesge.ch/COVTriage.
Affiliation(s)
- Zoë May Pendlington
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Paola Roncaglia
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Julien Gobeill
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva 1206, Switzerland
- BiTeM Group, Information Sciences, HES-SO/HEG Genève, Carouge 1227, Switzerland
- Luc Mottin
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva 1206, Switzerland
- BiTeM Group, Information Sciences, HES-SO/HEG Genève, Carouge 1227, Switzerland
- Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva 1205, Switzerland
- Nicolas Matentzoglu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Semanticly Ltd, London, WC2H 9JQ, UK
- Donat Agosti
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva 1206, Switzerland
- Plazi, Bern 3007, Switzerland
- David Osumi-Sutherland
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Patrick Ruch
- SIB Text Mining Group, Swiss Institute of Bioinformatics, Geneva 1206, Switzerland
- BiTeM Group, Information Sciences, HES-SO/HEG Genève, Carouge 1227, Switzerland
4
Pasche E, Mottaz A, Caucheteur D, Gobeill J, Michel PA, Ruch P. Variomes: a high recall search engine to support the curation of genomic variants. Bioinformatics 2022; 38:2595-2601. [PMID: 35274687 PMCID: PMC9048643 DOI: 10.1093/bioinformatics/btac146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Revised: 02/07/2022] [Accepted: 03/10/2022] [Indexed: 12/02/2022] Open
Abstract
MOTIVATION Identification and interpretation of clinically actionable variants is a critical bottleneck. Searching for evidence in the literature is mandatory according to ASCO/AMP/CAP practice guidelines; however, it is both labor-intensive and error-prone. We developed a system to perform triage of publications relevant to support an evidence-based decision. The system is also able to prioritize variants. Our system searches within pre-annotated collections such as MEDLINE and PubMed Central.
RESULTS We assess the search effectiveness of the system using three different experimental settings: literature triage; variant prioritization and comparison of Variomes with LitVar. Almost two-thirds of the publications returned in the top-5 are relevant for clinical decision-support. Our approach enabled identifying 81.8% of clinically actionable variants in the top-3. Variomes retrieves on average +21.3% more articles than LitVar and returns the same number of results or more results than LitVar for 90% of the queries when tested on a set of 803 queries; thus, establishing a new baseline for searching the literature about variants.
AVAILABILITY AND IMPLEMENTATION Variomes is publicly available at https://candy.hesge.ch/Variomes. Source code is freely available at https://github.com/variomes/sibtm-variomes. SynVar is publicly available at https://goldorak.hesge.ch/synvar.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Emilie Pasche
- SIB Text Mining Group, Swiss Institute of Bioinformatics, 1206 Geneva, Switzerland
- BiTeM Group, Information Sciences, HES-SO/HEG, 1227 Carouge, Switzerland
- Anaïs Mottaz
- SIB Text Mining Group, Swiss Institute of Bioinformatics, 1206 Geneva, Switzerland
- BiTeM Group, Information Sciences, HES-SO/HEG, 1227 Carouge, Switzerland
- Déborah Caucheteur
- SIB Text Mining Group, Swiss Institute of Bioinformatics, 1206 Geneva, Switzerland
- BiTeM Group, Information Sciences, HES-SO/HEG, 1227 Carouge, Switzerland
- Julien Gobeill
- SIB Text Mining Group, Swiss Institute of Bioinformatics, 1206 Geneva, Switzerland
- BiTeM Group, Information Sciences, HES-SO/HEG, 1227 Carouge, Switzerland
- Pierre-André Michel
- SIB Text Mining Group, Swiss Institute of Bioinformatics, 1206 Geneva, Switzerland
- BiTeM Group, Information Sciences, HES-SO/HEG, 1227 Carouge, Switzerland
- Patrick Ruch
- SIB Text Mining Group, Swiss Institute of Bioinformatics, 1206 Geneva, Switzerland
- BiTeM Group, Information Sciences, HES-SO/HEG, 1227 Carouge, Switzerland
5
Allot A, Lee K, Chen Q, Luo L, Lu Z. LitSuggest: a web-based system for literature recommendation and curation using machine learning. Nucleic Acids Res 2021; 49:W352-W358. [PMID: 33950204 DOI: 10.1093/nar/gkab326] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 03/01/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 01/02/2023] Open
Abstract
Searching and reading relevant literature is a routine practice in biomedical research. However, it is challenging for a user to design optimal search queries using all the keywords related to a given topic. As such, existing search systems such as PubMed often return suboptimal results. Several computational methods have been proposed as an effective alternative to keyword-based query methods for literature recommendation. However, those methods require specialized knowledge in machine learning and natural language processing, which can make them difficult for biologists to utilize. In this paper, we propose LitSuggest, a web server that provides an all-in-one literature recommendation and curation service to help biomedical researchers stay up to date with scientific literature. LitSuggest combines advanced machine learning techniques for suggesting relevant PubMed articles with high accuracy. In addition to innovative text-processing methods, LitSuggest offers multiple advantages over existing tools. First, LitSuggest allows users to curate, organize, and download classification results in a single interface. Second, users can easily fine-tune LitSuggest results by updating the training corpus. Third, results can be readily shared, enabling collaborative analysis and curation of scientific literature. Finally, LitSuggest provides an automated personalized weekly digest of newly published articles for each user's project. LitSuggest is publicly available at https://www.ncbi.nlm.nih.gov/research/litsuggest.
Affiliation(s)
- Alexis Allot
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kyubum Lee
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Qingyu Chen
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Ling Luo
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20894, USA