Volume 14 Number 2 (Feb. 2019)
JCP 2019 Vol.14(2): 125-133 ISSN: 1796-203X
doi: 10.17706/jcp.14.2.125-133

A Systematic Model of Big Data Analytics for Clustering Browsing Records into Sessions Based on Web Log Data

Chung Yung1, Chia-Ching Chen1, Yu-Lan Yuan2, Ching Li3
1Dept. of Computer Science and Information Engineering, National Dong Hwa University, Taiwan, R.O.C.
2Dept. of Landscape Architecture, Tunghai University, Taiwan, R.O.C.
3Graduate Institute of Sport, Leisure and Hospitality Management, National Taiwan Normal University, R.O.C.
….

Abstract—This paper presents a systematic model of big data analytics for clustering browsing records into sessions based on web log data. With the rapid development of Internet and World Wide Web technologies, the behavior of web users has become increasingly complicated, and analyzing web log data can reveal how users browse. Because the identification of browsing sessions strongly affects the effectiveness of web log analysis, especially the precision with which user behavior is described, we develop a systematic model for clustering browsing records into sessions. First, we present a five-phase architecture that we developed for big data analytics; we have built a computing environment based on this architecture and implemented several big data analytics methods on it. Then, we propose the new systematic model, called the EDCP model, of big data analytics for clustering browsing records into sessions based on web log data. Since analyses of web log data with different goals may impose distinct criteria for clustering browsing records into sessions, the EDCP model is designed to be easily adapted to such criteria. We demonstrate the EDCP model with session criteria provided by a research group in the tourism and recreation area. We report experiments applying the EDCP model to web log data from the official website of the Taiwan Tourism Bureau, with the goal of clustering the browsing sessions of web users of the 2018 Taiwan Lantern Festival. In total, the web log data contain 344,963,578 browsing records, of which 55,318,326 are related to the 2018 Taiwan Lantern Festival. Our systematic model successfully clusters these records into 307,154 browsing sessions.

Index Terms—Big data analytics, network user behavior, web log mining.
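
The abstract describes clustering browsing records into sessions under criteria that can be swapped to suit different analysis goals. The following Python sketch illustrates that general idea only; it is not the EDCP model, whose details are not given here. The `Record` fields, the `cluster_sessions` and `inactivity_timeout` names, and the 30-minute inactivity gap used in the usage note are all assumptions chosen for illustration (the inactivity gap is a common web-analytics heuristic, not the session criteria used in the paper).

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable, Iterable, NamedTuple


class Record(NamedTuple):
    """One browsing record (hypothetical fields, not the paper's schema)."""
    visitor: str         # assumed visitor key, e.g. client IP plus user agent
    timestamp: datetime
    url: str


# A criterion decides whether `current` continues the session ending at `previous`.
Criterion = Callable[[Record, Record], bool]


def inactivity_timeout(limit: timedelta) -> Criterion:
    """Assumed criterion: same session if the gap between records is within `limit`."""
    def same_session(previous: Record, current: Record) -> bool:
        return current.timestamp - previous.timestamp <= limit
    return same_session


def cluster_sessions(records: Iterable[Record], criterion: Criterion) -> list[list[Record]]:
    """Group records per visitor, order them by time, and split where the criterion fails."""
    by_visitor: dict[str, list[Record]] = defaultdict(list)
    for record in records:
        by_visitor[record.visitor].append(record)

    sessions: list[list[Record]] = []
    for visitor_records in by_visitor.values():
        visitor_records.sort(key=lambda r: r.timestamp)
        current = [visitor_records[0]]
        for record in visitor_records[1:]:
            if criterion(current[-1], record):
                current.append(record)
            else:
                # The criterion rejected this record, so a new session starts here.
                sessions.append(current)
                current = [record]
        sessions.append(current)
    return sessions
```

Calling `cluster_sessions(records, inactivity_timeout(timedelta(minutes=30)))` would split each visitor's records wherever more than 30 minutes pass between consecutive requests. Passing a different criterion function changes the session definition without touching the clustering loop, which is one way to realize the kind of adaptability the abstract mentions.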

Cite: Chung Yung, Chia-Ching Chen, Yu-Lan Yuan, Ching Li, "A Systematic Model of Big Data Analytics for Clustering Browsing Records into Sessions Based on Web Log Data," Journal of Computers, vol. 14, no. 2, pp. 125-133, 2019.
