Abstract:
|
This project develops an alternative cost-based pushdown policy for querying in multidimensional environments. The integration of Qbeast, a novel index, into the Cassandra distributed database created the need for frameworks such as Spark to be aware of the index and act accordingly. We present three approaches, the last of them only in theoretical terms: filter pushdown, sampling, and a speculative physical data strategy. Each implementation is detailed in the document, along with an explanation of the classes modified. The solutions were tested with varying data volumes to determine in which cases this path is efficient. Results show that with few rows the new behaviour performs on par with the default, but in data-intensive cases (starting with one-gigabyte files) the speed-up begins to grow. |