Automatically identifying coherent web sessions from browser logs

Ye, Chaoyu (2019) Automatically identifying coherent web sessions from browser logs. PhD thesis, University of Nottingham.

PDF (Thesis - as examined), 13MB - Repository staff only

Abstract

Due to the increasing diversity in both users’ behaviour and the types of web tasks performed, many studies in information retrieval (IR) are turning towards session-based retrieval rather than single URL-query pairs. However, extracting meaningful session data from raw, discrete logs remains a significant challenge. Most prior studies have been based on datasets where the logs of each user’s web history were simply divided by fixed periods of inactivity, such as 5, 15, or 30 minutes [52,31]. There have also been some attempts to go beyond these simplistic fixed timeouts [91], but rather than covering all web activities, they focus on search-related activities only. Consequently, it is necessary to find a meaningful way to cluster all activities in a web browser, including both searching and browsing. The goal of this study is to find a better way to automatically segment users’ web activity into sessions. There are three research stages: 1) how people understand session segmentation in terms of their own mental models, 2) how these self-identified sessions appear in web logs captured in practice, and 3) how these sessions can be identified algorithmically from browser activity, and how each algorithm performs. To answer these questions, a qualitative study was first conducted, which produced a taxonomy of six factors related to user-identified sessions. A Chrome extension was then built to capture user-identified sessions in practice, together with comprehensive web logs including both user interaction and visit details. This provided a ground-truth dataset to support further evaluation. Finally, several algorithmic approaches to automatically clustering web activities into groups closer to user-identified sessions were evaluated.
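
The fixed-inactivity baseline mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `split_by_inactivity`, the sample timestamps, and the choice to treat a gap exactly equal to the timeout as part of the same session are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def split_by_inactivity(timestamps, timeout_minutes=30):
    """Group visit timestamps into sessions (hypothetical helper).

    A new session starts whenever the gap since the previous visit
    exceeds the timeout. A gap equal to the timeout stays in the
    same session (an assumption of this sketch).
    """
    timeout = timedelta(minutes=timeout_minutes)
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            # Close enough to the previous visit: same session.
            sessions[-1].append(ts)
        else:
            # First visit, or the inactivity gap exceeded the timeout.
            sessions.append([ts])
    return sessions


# Made-up visit times for one user (illustrative data only).
visits = [
    datetime(2019, 1, 1, 9, 0),
    datetime(2019, 1, 1, 9, 10),
    datetime(2019, 1, 1, 9, 50),
    datetime(2019, 1, 1, 9, 55),
]

sessions_30 = split_by_inactivity(visits, timeout_minutes=30)
sessions_5 = split_by_inactivity(visits, timeout_minutes=5)
```

With these sample visits, the same log splits into a different number of sessions depending on the timeout chosen, which is one way to see why a fixed threshold may not match the sessions users themselves would identify.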

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Wilson, Max
Rodden, Tom
Keywords: web sessions, browsers, internet, information retrieval
Subjects: Q Science > QA Mathematics > QA 75 Electronic computers. Computer science
Faculties/Schools: UK Campuses > Faculty of Science > School of Computer Science
Item ID: 59119
Depositing User: Ye, Chaoyu
Date Deposited: 15 Jul 2020 10:14
Last Modified: 15 Jul 2020 10:15
URI: http://eprints.nottingham.ac.uk/id/eprint/59119
