Analysis of the user queries of an e-commerce bookstore in terms of the Library of Congress classification and key publishers

Publication date

2020-03-05T15:20:07Z

2013

Abstract

Introduction. A key aspect of data mining, and of its success in extracting useful knowledge, is the way in which the data are represented. In this paper we propose representing the relations inherent in an e-commerce bookstore search log as a graph, which allows us to apply and customize graph metrics and algorithms to identify structures and key elements. This approach complements traditional transactional mining by facilitating the identification of underlying structural relationships.

Method. The data are pre-processed and represented as a graph, which permits the calculation of three descriptive metrics: hubs, bridges and community modularity. These metrics are then interpreted in terms of the book topics (Library of Congress Classification) and publishers.

Analysis. The relations between users, books and publishers are studied. We calculate statistics based on the graph metrics and visualize the communities and structure of the graphs. We then identify the top publishers and categories in terms of the community, hub and bridge structures of the graph.

Results. We have successfully represented the Web activity data log as a graph, defining the relations between books and users based on activity; analysed the graphs using the specific graph metrics of communities, hubs and bridges; and evaluated the utility of the analysis by using the graph structure to identify the key information of interest in terms of top publishers and book categories.

Conclusion. We have defined a graph-based method for analysing transactional data which complements traditional transactional mining techniques, in order to obtain business knowledge that can be used immediately for cross-selling and recommendation or, in the medium term, for book catalogue organization.
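As a rough illustration of the graph representation described in the Method, the following Python sketch builds a small book graph from a made-up query log and computes two of the three metrics, hubs and bridges. Everything in it is an assumption made for illustration: the log format, the linking rule (two books are connected when the same user queried both), the use of node degree as the hub score, and the strict graph-theoretic definition of a bridge (an edge whose removal disconnects the graph). The paper's own definitions and data may differ, and community modularity is not covered here.

```python
from collections import defaultdict
from itertools import combinations


def build_book_graph(log):
    """Link two books when the same user queried both (hypothetical rule)."""
    books_by_user = defaultdict(set)
    for user, book in log:
        books_by_user[user].add(book)
    graph = defaultdict(set)
    for books in books_by_user.values():
        for book in books:
            graph[book]  # touch the key so isolated books still appear as nodes
        for a, b in combinations(sorted(books), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def top_hubs(graph, k=1):
    """Rank nodes by degree (ties broken alphabetically) and keep the top k."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:k]


def find_bridges(graph):
    """Return edges whose removal disconnects the graph (DFS low-link method)."""
    order, low, bridges = {}, {}, []

    def visit(node, parent):
        order[node] = low[node] = len(order)
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in order:
                # Back edge: nxt was reached earlier in the search.
                low[node] = min(low[node], order[nxt])
            else:
                visit(nxt, node)
                low[node] = min(low[node], low[nxt])
                if low[nxt] > order[node]:
                    bridges.append(tuple(sorted((node, nxt))))

    for node in graph:
        if node not in order:
            visit(node, None)
    return sorted(bridges)


# Made-up log of (user, book) pairs; not data from the paper.
log = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "C"), ("u2", "D"),
    ("u3", "D"), ("u3", "E"), ("u3", "F"),
]
graph = build_book_graph(log)
print(top_hubs(graph, k=2))   # ['C', 'D']
print(find_bridges(graph))    # [('C', 'D')]
```

In this toy log, books C and D are the best-connected nodes, and the single edge between them is the only link joining the two groups of books, which is the kind of structural element the abstract interprets in terms of categories and publishers. The recursive search is adequate for small graphs; a real search log would likely need an iterative version or a graph library.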


We would like to thank Sandra Alvarez García of the University of La Coruña, Spain, for the data pre-processing of the book information using their Library of Congress catalogue API. This research is partially supported by the Spanish MEC (project HIPERGRAPH TIN2009-14560-C03-01).

Document Type

Article


Published version

Language

English

Publisher

University of Borås

Related items

Information Research. 2013 Dec;18(4)

info:eu-repo/grantAgreement/ES/3PN/TIN2009-14560-C03-01

Rights

This document is published under a Creative Commons License https://creativecommons.org/licenses/by-nc-nd/3.0/
