Stylistic structures: a computational approach to text classification

Forsyth, Richard (1996) Stylistic structures: a computational approach to text classification. PhD thesis, University of Nottingham.

Abstract

The problem of authorship attribution has received attention both in the academic world (e.g. did Shakespeare or Marlowe write Edward III?) and outside (e.g. is this confession really the words of the accused or was it made up by someone else?). Previous studies by statisticians and literary scholars have sought "verbal habits" that characterize particular authors consistently. By and large, this has meant looking for distinctive rates of usage of specific marker words -- as in the classic study by Mosteller and Wallace of the Federalist Papers.

The present study is based on the premiss that authorship attribution is just one type of text classification and that advances in this area can be made by applying and adapting techniques from the field of machine learning.

Five different trainable text-classification systems are described, which differ from current stylometric practice in a number of ways, in particular by using a wider variety of marker patterns than customary and by seeking such markers automatically, without being told what to look for. A comparison of the strengths and weaknesses of these systems, when tested on a representative range of text-classification problems, confirms the importance of paying more attention than usual to alternative methods of representing distinctive differences between types of text.

The thesis concludes with suggestions on how to make further progress towards the goal of a fully automatic, trainable text-classification system.

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Clarke, David
Elliman, David
Keywords: computational stylometry, authorship attribution, text classification, machine learning
Subjects: Q Science > Q Science (General)
P Language and literature > P Philology. Linguistics
Faculties/Schools: UK Campuses > Faculty of Science > School of Psychology
Item ID: 13445
Depositing User: EP, Services
Date Deposited: 17 Jul 2013 09:20
Last Modified: 20 Dec 2017 13:53
URI: https://eprints.nottingham.ac.uk/id/eprint/13445
