Institut Català de la Salut
[Amidei J, Kaltenbrunner A, Ferreira De Sá JG] AI and Data for Society Research Group, Internet Interdisciplinary Institute, Universitat Oberta de Catalunya, Barcelona, Spain. [Nieto R] eHealth Lab Research Group, Faculty of Psychology and Educational Sciences, Universitat Oberta de Catalunya, Barcelona, Spain. [Serrat M] Unitat d’Expertesa en Síndromes de Sensibilització Central, Servei de Reumatologia, Vall d’Hebron Hospital Universitari, Barcelona, Spain. Escola Universitària de Fisioteràpia, Escoles Universitàries Gimbernat, Barcelona, Spain. [Albajes K] Psyclinic Mental Health, Barcelona, Spain
Vall d'Hebron Barcelona Hospital Campus
2025-05-08T11:09:14Z
2025-05-08T11:09:14Z
2025
Automated assessment; Chronic pain; Large language models
Avaluació automatitzada; Dolor crònic; Models de llenguatge extens
Evaluación automatizada; Dolor crónico; Modelos de lenguaje extenso
Background: Chronic pain, affecting more than 20% of the global population, has an enormous pernicious impact on individuals as well as economic ramifications at both the health and social levels. Accordingly, tools that enhance pain assessment can considerably impact people suffering from pain and society at large. In this context, assessment methods based on individuals’ personal experiences, such as written narratives (WNs), offer relevant insights into understanding pain from a personal perspective. This approach can uncover subjective, intricate, and multifaceted aspects that standardized questionnaires can overlook. However, WNs can be time-consuming for clinicians to evaluate. Therefore, a tool that uses WNs while reducing the time required for their evaluation could have a significantly beneficial impact on people's pain assessment. Objective: This study is the first evaluation of the potential of applying large language models (LLMs) to assist clinicians in assessing patients’ pain expressed through WNs. Methods: We performed an experiment based on 43 WNs written by people with fibromyalgia and qualitatively evaluated in a prior study. Focusing on pain severity and disability, we prompted GPT-4 (with the temperature parameter set to 0 or 1) to assign scores to these WNs and to explain each score. We then quantitatively compared the GPT-4 scores with experts’ scores of the same narratives, using statistical measures such as Pearson correlations, root mean squared error, the weighted version of the Gwet agreement coefficient, and Krippendorff α. Additionally, 2 experts specialized in chronic pain conducted a qualitative analysis of the scores’ explanations to assess their accuracy and the potential applicability of GPT’s analysis for future pain narrative evaluations. Results: Our analysis reveals that GPT-4’s performance in assessing pain narratives yielded promising results.
GPT-4 was comparable in terms of agreement with experts (with a weighted percentage agreement higher than 0.95), correlations with standardized measurements (for example, between 0.43 and 0.49 for the Revised Fibromyalgia Impact Questionnaire and GPT-4 with temperature 1), and low error rates (root mean squared error of 1.20 for severity and 1.44 for disability). Moreover, experts generally deemed the ratings provided by GPT-4, as well as the scores’ explanations, to be adequate. However, we observed that GPT-4 has a slight tendency to overestimate pain severity and disability, with a lower SD than expert estimates. Conclusions: These findings underline the potential of LLMs in facilitating the assessment of WNs of people with fibromyalgia, offering a novel approach to understanding and evaluating patient pain experiences. Integrating automated assessments through LLMs presents opportunities for streamlining and enhancing the assessment process, paving the way for improved patient care and tailored interventions in the chronic pain management field.
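The quantitative comparison named in the abstract (Pearson correlation, root mean squared error, and a weighted percentage agreement between expert and model scores) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the function names, the 0 to 10 scoring scale, the linear weighting scheme, and the toy score lists are hypothetical and are not taken from the study, whose exact scale, weights, and data are not given in this record.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def rmse(x, y):
    """Root mean squared error between two equal-length score lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def weighted_percent_agreement(x, y, scale_min=0, scale_max=10):
    """Mean linear-weighted agreement: 1 for identical scores,
    0 for scores at opposite ends of the scale.
    The 0-10 scale and linear weights are assumptions, not the study's settings."""
    span = scale_max - scale_min
    return sum(1 - abs(a - b) / span for a, b in zip(x, y)) / len(x)

# Hypothetical toy scores, NOT data from the study.
expert_scores = [3, 5, 7, 6, 8]
model_scores = [4, 5, 8, 7, 8]

r = pearson(expert_scores, model_scores)
err = rmse(expert_scores, model_scores)
agreement = weighted_percent_agreement(expert_scores, model_scores)
print(f"Pearson r = {r:.3f}, RMSE = {err:.3f}, weighted agreement = {agreement:.3f}")
```

Chance-corrected coefficients such as the Gwet agreement coefficient and Krippendorff α additionally adjust the observed agreement for agreement expected by chance, which this sketch does not attempt.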
Universitat Oberta de Catalunya supported this study by covering the fees associated with the publication of this manuscript.
Article
Published version
English
Qüestionaris; Algorismes; Fibromiàlgia; Dolor crònic; Dolor - Mesurament; PHENOMENA AND PROCESSES::Mathematical Concepts::Algorithms; DISEASES::Pathological Conditions, Signs and Symptoms::Signs and Symptoms::Neurologic Manifestations::Pain::Chronic Pain; DISEASES::Musculoskeletal Diseases::Muscular Diseases::Fibromyalgia; ANALYTICAL, DIAGNOSTIC AND THERAPEUTIC TECHNIQUES, AND EQUIPMENT::Diagnosis::Diagnostic Techniques and Procedures::Physical Examination::Neurologic Examination::Pain Measurement; ANALYTICAL, DIAGNOSTIC AND THERAPEUTIC TECHNIQUES, AND EQUIPMENT::Investigative Techniques::Epidemiologic Methods::Data Collection::Surveys and Questionnaires; FENÓMENOS Y PROCESOS::conceptos matemáticos::algoritmos; ENFERMEDADES::afecciones patológicas, signos y síntomas::signos y síntomas::manifestaciones neurológicas::dolor::dolor crónico; ENFERMEDADES::enfermedades musculoesqueléticas::enfermedades musculares::fibromialgia; TÉCNICAS Y EQUIPOS ANALÍTICOS, DIAGNÓSTICOS Y TERAPÉUTICOS::diagnóstico::técnicas y procedimientos diagnósticos::exploración física::exploración neurológica::medida del dolor; TÉCNICAS Y EQUIPOS ANALÍTICOS, DIAGNÓSTICOS Y TERAPÉUTICOS::técnicas de investigación::métodos epidemiológicos::recopilación de datos::encuestas y cuestionarios
JMIR Publications
Journal of Medical Internet Research;27
https://doi.org/10.2196/65903
Attribution 4.0 International
http://creativecommons.org/licenses/by/4.0/
Articles científics - HVH [3439]