Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
Martínez Parra, Conrado
2022-01-26
In many data stream applications we need to perform some analysis in a "window" or subsequence of contiguous elements, quite often the last M elements seen or the elements seen in the last X time units. For example, we might be interested in obtaining a random sample of the distinct elements seen in the last 10 minutes, or estimate how many distinct elements have been processed among the last 100000 processed items. Given the restrictions in processing time and memory available, exact solutions become unfeasible and we seek for randomized algorithms which are fast, have low memory requirements and provide probabilistic guarantees. In this project we will implement some of the algorithms available in the literature and conduct extensive experiments to assess their performance and compare their relative merits; we will also develop novel and original algorithms or variants of existing algorithm to compare them with the state-of-the-art solutions. We will mostly focus in algorithms to obtain random samples, a fundamental task for more complex statistical inference: detecting outliers, finding frequent items, detecting unusual patterns, etc.
Bachelor thesis
Anglès
Àrees temàtiques de la UPC::Informàtica::Enginyeria del software; Random data (Statistics); Algorithms; Probabilistic structures; affirmative sampling; random sampling; Dades aleatòries (Estadística); Algorismes
Universitat Politècnica de Catalunya
Open Access
Treballs acadèmics [82541]