Synthetic ECG generation for data augmentation and transfer learning in arrhythmia classification

Núñez Rodríguez, José Fernando; Arjona Martínez, Jamie; Béjar Alonso, Javier

dc.contributor

Universitat Politècnica de Catalunya. Doctorat en Intel·ligència Artificial

dc.contributor

Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa

dc.contributor

Universitat Politècnica de Catalunya. Departament de Ciències de la Computació

dc.contributor

Universitat Politècnica de Catalunya. IMP - Information Modeling and Processing

dc.contributor

Universitat Politècnica de Catalunya. IDEAI-UPC - Intelligent Data sciEnce and Artificial Intelligence Research Group

dc.contributor.author

Núñez Rodríguez, José Fernando

dc.contributor.author

Arjona Martínez, Jamie

dc.contributor.author

Béjar Alonso, Javier

dc.date.issued

2024-11-27

dc.identifier

Nuñez, J.; Arjona, J.; Bejar, J. Synthetic ECG generation for data augmentation and transfer learning in arrhythmia classification. 2024. DOI 10.48550/arXiv.2411.18456.

dc.identifier

https://arxiv.org/abs/2411.18456

dc.identifier

https://hdl.handle.net/2117/427708

dc.identifier

10.48550/arXiv.2411.18456

dc.description.abstract

Deep learning models need sufficient data to find the hidden patterns in it. Generative modeling aims to learn the data distribution, which allows us to sample additional data and augment the original dataset. Physiological data, and electrocardiogram (ECG) data in particular, is sensitive and expensive to collect, so generative models can be used to enlarge existing datasets and improve downstream tasks, in our case the classification of heart rhythm. In this work, we explore the usefulness of synthetic data generated with different deep generative models, namely DiffWave, Time-Diffusion and Time-VQVAE, for obtaining better classification results on two open-source multivariate ECG datasets. We also investigate the effects of transfer learning by fine-tuning a synthetically pre-trained model while progressively adding increasing proportions of real data. We conclude that, although the synthetic samples resemble the real ones, the classification improvement from simply augmenting the real dataset is barely noticeable on individual datasets; when both datasets are merged, however, the classifiers improve across all metrics when synthetic samples are used as augmented data. In the fine-tuning results, the Time-VQVAE generative model proved superior to the others, but not powerful enough to achieve results close to those of a classifier trained with real data only. In addition, methods and metrics for measuring the closeness between synthetic and real data were explored as a by-product of the main research questions of this study.

dc.description.abstract

Preprint

dc.format

23 p.

dc.format

application/pdf

dc.language

eng

dc.rights

Open Access

dc.subject

UPC subject areas::Biomedical engineering

dc.subject

UPC subject areas::Health sciences

dc.subject

Synthetic data

dc.subject

Transfer learning

dc.subject

Time series

dc.subject

Physiological signals

dc.subject

ECG

dc.title

Synthetic ECG generation for data augmentation and transfer learning in arrhythmia classification

dc.type

External research report

This item appears in the following collection(s)

E-prints [73124]