Data stream analysis in sliding windows: random sampling and other problems

Other authors

Universitat Politècnica de Catalunya. Departament de Ciències de la Computació

Martínez Parra, Conrado

Publication date

2022-01-26

Abstract

In many data stream applications we need to analyze a "window", that is, a subsequence of contiguous elements, most often the last M elements seen or the elements seen in the last X time units. For example, we might be interested in obtaining a random sample of the distinct elements seen in the last 10 minutes, or in estimating how many distinct elements occur among the last 100,000 processed items. Given the restrictions on processing time and available memory, exact solutions become infeasible, so we seek randomized algorithms that are fast, have low memory requirements, and provide probabilistic guarantees. In this project we will implement some of the algorithms available in the literature and conduct extensive experiments to assess their performance and compare their relative merits; we will also develop novel algorithms, or variants of existing algorithms, and compare them with the state-of-the-art solutions. We will mostly focus on algorithms for obtaining random samples, a fundamental task underlying more complex statistical inference: detecting outliers, finding frequent items, detecting unusual patterns, etc.
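
To make the windowed sampling task concrete, here is a minimal Python sketch of one classic technique from the literature, priority sampling over a count-based window of the last M elements. It is an illustration only and is not necessarily one of the algorithms implemented in the thesis. The class name `SlidingWindowSampler` and its methods are hypothetical. The sketch samples uniformly among the last M stream positions, so repeated values are more likely to be drawn; it does not sample among distinct elements.

```python
import random
from collections import deque


class SlidingWindowSampler:
    """Uniform sample of size 1 from the last `window_size` stream elements.

    Hypothetical illustration of priority sampling: each arriving element
    gets an independent random priority, and the sample is the element with
    the highest priority among those still inside the window.
    """

    def __init__(self, window_size, rng=None):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.rng = rng or random.Random()
        self.count = 0  # number of elements seen so far
        # Candidates as (index, priority, item); priorities strictly
        # decrease from front to back, indices strictly increase.
        self._candidates = deque()

    def add(self, item):
        priority = self.rng.random()
        # Older elements with a lower priority can never be the sample
        # again, because the new element outlives them in the window.
        while self._candidates and self._candidates[-1][1] <= priority:
            self._candidates.pop()
        self._candidates.append((self.count, priority, item))
        self.count += 1
        # Drop candidates that have slid out of the window. The element
        # just appended is always inside it, so the deque never empties.
        oldest_valid = self.count - self.window_size
        while self._candidates[0][0] < oldest_valid:
            self._candidates.popleft()

    def sample(self):
        if not self._candidates:
            raise LookupError("no elements seen yet")
        return self._candidates[0][2]


if __name__ == "__main__":
    sampler = SlidingWindowSampler(window_size=100)
    for value in range(1000):
        sampler.add(value)
    print(sampler.sample())  # some value among the last 100 elements
```

Only elements that could still become the sample are stored, so for independent random priorities the expected number of stored candidates grows logarithmically with M rather than linearly.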

Document type

Bachelor thesis

Language

English

Published by

Universitat Politècnica de Catalunya

Rights

Open Access
