Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa
Universitat Politècnica de Catalunya. IDEAI-UPC - Intelligent Data sciEnce and Artificial Intelligence Research Group
2023-03
This paper shows the added value of using existing domain-specific knowledge to generate new derived variables that complement a target dataset, and the benefits of including these new variables in further data analysis methods. The main contribution of the paper is a methodology to generate these new variables as part of preprocessing, under a double approach: creating 2nd generation knowledge-driven variables, which capture the experts' criteria used for reasoning in the field, or 3rd generation data-driven indicators, which are created by clustering original variables. Data Mining and Artificial Intelligence techniques such as Clustering or Traffic Light Panels help to obtain successful results. Some results of the INSESS-COVID19 project are presented. Basic descriptive analysis gives simple results; even though these are useful to support basic policy-making, especially in health, a much richer global perspective is acquired after including derived variables. When 2nd generation variables are available and can be introduced in the method for creating 3rd generation data, added value is obtained from both basic analysis and building new data-driven indicators.
Peer Reviewed
Postprint (author's final draft)
Article
English
Àrees temàtiques de la UPC::Matemàtiques i estadística::Matemàtica aplicada a les ciències; Àrees temàtiques de la UPC::Matemàtiques i estadística::Estadística matemàtica::Anàlisi multivariant; Artificial intelligence; COVID-19 (Disease); Multivariate analysis; Data science; Intelligent decision support; Health; COVID19; Mental health; Traffic light panels; Preprocessing; Explainable AI; Intel·ligència artificial; COVID-19 (Malaltia); Anàlisi multivariable; Classificació AMS::68 Computer science::68T Artificial intelligence; Classificació AMS::62 Statistics::62H Multivariate analysis
https://www.worldscientific.com/doi/abs/10.1142/S0218213023400110
Open Access