Synthetic ECG generation for data augmentation and transfer learning in arrhythmia classification

dc.contributor
Universitat Politècnica de Catalunya. Doctoral Programme in Artificial Intelligence
dc.contributor
Universitat Politècnica de Catalunya. Department of Statistics and Operations Research
dc.contributor
Universitat Politècnica de Catalunya. Department of Computer Science
dc.contributor
Universitat Politècnica de Catalunya. IMP - Information Modeling and Processing
dc.contributor
Universitat Politècnica de Catalunya. IDEAI-UPC - Intelligent Data sciEnce and Artificial Intelligence Research Group
dc.contributor.author
Núñez Rodríguez, José Fernando
dc.contributor.author
Arjona Martínez, Jamie
dc.contributor.author
Béjar Alonso, Javier
dc.date.issued
2024-11-27
dc.identifier
Núñez, J.; Arjona, J.; Béjar, J. Synthetic ECG generation for data augmentation and transfer learning in arrhythmia classification. 2024. DOI 10.48550/arXiv.2411.18456.
dc.identifier
https://arxiv.org/abs/2411.18456
dc.identifier
https://hdl.handle.net/2117/427708
dc.identifier
10.48550/arXiv.2411.18456
dc.description.abstract
Deep learning models need a sufficient amount of data to find the hidden patterns in it. The purpose of generative modeling is to learn the data distribution, which allows us to sample additional data and augment the original dataset. For physiological data, and electrocardiogram (ECG) data in particular, whose content is sensitive and whose collection is expensive, generative models can be exploited to enlarge existing datasets and improve downstream tasks, in our case heart rhythm classification. In this work, we explore the usefulness of synthetic data generated with different deep generative models, namely DiffWave, Time-Diffusion and Time-VQVAE, for obtaining better classification results on two open-source multivariate ECG datasets. Moreover, we investigate the effects of transfer learning by fine-tuning a synthetically pre-trained model while progressively adding increasing proportions of real data. We conclude that although the synthetic samples resemble the real ones, the classification improvement from simply augmenting the real dataset is barely noticeable on the individual datasets; when both datasets are merged, however, the classifiers improve across all metrics when synthetic samples are used as augmented data. In the fine-tuning experiments, the Time-VQVAE generative model proved superior to the others, but not powerful enough to achieve results close to those of a classifier trained on real data only. In addition, methods and metrics for measuring the closeness between synthetic and real data were explored as a by-product of the main research questions of this study.
dc.description.abstract
Preprint
dc.format
23 p.
dc.format
application/pdf
dc.language
eng
dc.rights
Open Access
dc.subject
UPC subject areas::Biomedical engineering
dc.subject
UPC subject areas::Health sciences
dc.subject
Synthetic data
dc.subject
Transfer learning
dc.subject
Time series
dc.subject
Physiological signals
dc.subject
ECG
dc.title
Synthetic ECG generation for data augmentation and transfer learning in arrhythmia classification
dc.type
External research report


This item appears in the following collection(s):

E-prints [73124]