Exploring the Capacity of Large Language Models to Assess the Chronic Pain Experience: Algorithm Development and Validation

Other authors

Institut Català de la Salut

[Amidei J, Kaltenbrunner A, Ferreira De Sá JG] AI and Data for Society Research Group, Internet Interdisciplinary Institute, Universitat Oberta de Catalunya, Barcelona, Spain. [Nieto R] eHealth Lab Research Group, Faculty of Psychology and Educational Sciences, Universitat Oberta de Catalunya, Barcelona, Spain. [Serrat M] Unitat d’Expertesa en Síndromes de Sensibilització Central, Servei de Reumatologia, Vall d’Hebron Hospital Universitari, Barcelona, Spain. Escola Universitària de Fisioteràpia, Escoles Universitàries Gimbernat, Barcelona, Spain. [Albajes K] Psyclinic Mental Health, Barcelona, Spain

Vall d'Hebron Barcelona Hospital Campus

Publication date

2025-05-08T11:09:14Z

2025



Abstract

Automated assessment; Chronic pain; Large language models



Background: Chronic pain, which affects more than 20% of the global population, has a severe impact on individuals and carries economic consequences at both the health and social levels. Tools that improve pain assessment can therefore considerably benefit people living with pain and society at large. In this context, assessment methods based on individuals’ personal experiences, such as written narratives (WNs), offer relevant insights into pain from a personal perspective. This approach can uncover subjective, intricate, and multifaceted aspects that standardized questionnaires can overlook. However, evaluating WNs can be time-consuming for clinicians. A tool that uses WNs while reducing the time required for their evaluation could therefore significantly benefit people’s pain assessment.

Objective: This study is the first evaluation of the potential of applying large language models (LLMs) to assist clinicians in assessing patients’ pain expressed through WNs.

Methods: We performed an experiment based on 43 WNs written by people with fibromyalgia and qualitatively evaluated in a prior study. Focusing on pain severity and disability, we prompted GPT-4 (with the temperature parameter set to 0 or 1) to assign scores, along with explanations of those scores, to these WNs. We then quantitatively compared GPT-4 scores with experts’ scores of the same narratives using statistical measures such as Pearson correlations, root mean squared error, the weighted version of the Gwet agreement coefficient, and Krippendorff α. Additionally, 2 experts specialized in chronic pain conducted a qualitative analysis of the scores’ explanations to assess their accuracy and the potential applicability of GPT’s analysis to future pain narrative evaluations.

Results: GPT-4’s performance in assessing pain narratives was promising. GPT-4 was comparable in terms of agreement with experts (with a weighted percentage agreement higher than 0.95), correlations with standardized measurements (for example, in the range of 0.43 to 0.49 between the Revised Fibromyalgia Impact Questionnaire and GPT-4 with temperature 1), and low error rates (root mean squared error of 1.20 for severity and 1.44 for disability). Moreover, experts generally deemed the ratings provided by GPT-4, as well as the scores’ explanations, to be adequate. However, we observed that GPT has a slight tendency to overestimate pain severity and disability, with a lower SD than expert estimates.

Conclusions: These findings underline the potential of LLMs in facilitating the assessment of WNs of people with fibromyalgia, offering a novel approach to understanding and evaluating patient pain experiences. Integrating automated assessments through LLMs presents opportunities for streamlining and enhancing the assessment process, paving the way for improved patient care and tailored interventions in the chronic pain management field.
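
The quantitative comparison described in the Methods can be illustrated with a minimal sketch. It covers only three of the quantities mentioned in the abstract (Pearson correlation, root mean squared error, and the direction of any over- or underestimation); the Gwet coefficient and Krippendorff α are not shown. The function names and the two score lists are hypothetical and chosen purely for illustration; they are not the study's code or data.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def rmse(reference, predicted):
    """Root mean squared error of predicted scores against reference scores."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)

def mean_bias(reference, predicted):
    """Mean signed difference; a positive value means overestimation."""
    n = len(reference)
    return sum(p - r for r, p in zip(reference, predicted)) / n

# Hypothetical toy scores (NOT from the study), one value per narrative.
expert_scores = [3, 5, 7, 6, 8]
model_scores = [4, 5, 8, 7, 8]

print("Pearson r:", round(pearson(expert_scores, model_scores), 3))
print("RMSE:", round(rmse(expert_scores, model_scores), 3))
print("Mean bias:", round(mean_bias(expert_scores, model_scores), 3))
```

In this toy example the model scores are never below the expert scores, so the mean bias is positive, which is the kind of pattern the Results describe as a slight tendency to overestimate.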


Universitat Oberta de Catalunya supported this study by covering the fees associated with the publication of this manuscript.

Document Type

Article


Published version

Language

English

Publisher

JMIR Publications

Related items

Journal of Medical Internet Research;27

https://doi.org/10.2196/65903

Rights

Attribution 4.0 International

http://creativecommons.org/licenses/by/4.0/
