Abstract:
|
This thesis presents the solution to a data mining classification problem. The problem was posed in the 2013 Data Mining Competition, which is organized yearly by Prudsys AG (a leading company in the data mining field). The task was centered on e-commerce, and the objective was to predict whether a customer is going to place an order in a certain web-shop. The thesis follows the structure of the CRISP-DM process. In the first part (Business Understanding) I describe the background of the problem. E-commerce has become a cutting-edge industry and has grown rapidly throughout the last decade. This growth has been driven by the development of the internet and of devices such as tablets and cell phones. Nowadays, nearly every shop has a web-shop where you can buy exactly the same products, so leaving your sofa to go shopping is up to you. The middle part of the thesis corresponds to the Data Understanding and Data Preparation phases. In this part I explain how to deal with the given data set, how to handle its large number of missing values, and how to choose the right attributes for the purchase prediction. Particular emphasis is placed on the preparation of the data set, as it is a cornerstone of successful data mining work. The next part of the thesis covers the Modeling and Evaluation phases. To learn more about different modeling techniques, I use four different algorithms, which I then evaluate with two types of criteria: numerical (F-measure) and visual (lift chart). The lift chart allows me to compare my models to a random selection method. Finally, I draw the conclusions of the thesis, evaluate whether the previously set objectives were achieved, and mention possible future tasks that could improve on this work. |