Chodak G, Suchacka G, Chawla Y. HTTP-level e-commerce data based on server access logs for an online store. Computer Networks 2020; 183:107589. [PMID: 35023998 PMCID: PMC7540248 DOI: 10.1016/j.comnet.2020.107589]
[Received: 08/30/2020] [Revised: 09/30/2020] [Accepted: 10/05/2020] [Indexed: 06/14/2023]
Abstract
Web server logs have been extensively used as a source of data on the characteristics of Web traffic and users' navigational patterns. In particular, Web bot detection and online purchase prediction using methods from artificial intelligence (AI) are currently key areas of research. In practice, however, it is hard to obtain logs from actual online stores, and there is no common dataset that can be used across different studies. Moreover, few studies explore Web traffic over a longer period of time, because long-term data from server logs are unavailable. The need to develop reliable models of Web traffic, Web user navigation, and e-customer behaviour calls for an up-to-date, large-volume e-commerce dataset on Web traffic. Similarly, AI problems require a sufficient amount of solid, real-life data to train and validate new models and methods. Thus, to meet the demand for a publicly available long-term e-commerce dataset, we collected access log data describing the operation of an online store over a six-month period. Using a program written in the C# language, the data were aggregated, transformed, and anonymized. As a result, we release the EClog dataset in CSV format, which covers 183 days of HTTP-level e-commerce traffic. The data will be beneficial for research in many areas, including computer science, data science, management, and sociology.