Kacer EO. Evaluating AI-based breastfeeding chatbots: quality, readability, and reliability analysis.
PLoS One 2025;
20:e0319782. [PMID:
40096051 PMCID:
PMC11913278 DOI:
10.1371/journal.pone.0319782]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2024] [Accepted: 02/08/2025] [Indexed: 03/19/2025] Open
Abstract
BACKGROUND
In recent years, expectant and breastfeeding mothers commonly use various breastfeeding-related social media applications and websites to seek breastfeeding-related information. At the same time, AI-based chatbots-such as ChatGPT, Gemini, and Copilot-have become increasingly prevalent on these platforms (or on dedicated websites), providing automated, user-oriented breastfeeding guidance.
AIM
The goal of our study is to understand the relative performance of three AI-based chatbots: ChatGPT, Gemini, and Copilot, by evaluating the quality, reliability, readability, and similarity of the breastfeeding information they provide.
METHODS
Two researchers evaluated the information provided by three different AI-based breastfeeding chatbots: ChatGPT version 3.5, Gemini, and Copilot. A total of 50 frequently asked questions about breastfeeding were identified and used in the study, divided into two categories (Baby-Centered Questions and Mother-Centered Questions), and evaluated using five scoring criteria, including the Quality Information Provision for Patients (EQIP) scale, the Simple Measure of Gobbledygook (SMOG) scale, the Similarity Index (SI), the Modified Dependability Scoring System (mDISCERN), and the Global Quality Scale (GQS).
RESULTS
The evaluation of AI chatbots' answers showed statistically significant differences across all criteria (p < 0.05). Copilot scored highest on the EQIP, SMOG, and SI scales, while Gemini excelled in mDISCERN and GQS evaluations. No significant difference was found between Copilot and Gemini for mDISCERN and GQS scores. All three chatbots demonstrated high reliability and quality, though their readability required university-level education. Notably, ChatGPT displayed high originality, while Copilot exhibited the greatest similarity in responses.
CONCLUSION
AI chatbots provide reliable answers to breastfeeding questions, but the information can be hard to understand. While more reliable than other online sources, their accuracy and usability are still in question. Further research is necessary to facilitate the integration of advanced AI in healthcare.
Collapse