Deep neural networks for spam classification

Kasmani, Mohamed Khizer (2013) Deep neural networks for spam classification. [Dissertation (University of Nottingham only)]


Abstract

This project describes the development of a spam filtering method using deep neural networks. A classification model employing algorithms such as Error Back Propagation (EBP) and Restricted Boltzmann Machines (RBM) is used to distinguish spam from non-spam emails. A spam classification system based on these deep neural network algorithms is developed and tested on the Enron email dataset, with the aim of helping users manage large volumes of email and their email folders. The data used for this study, collected from Enron business users, comprises 158 users and 200,399 emails, at an average of 757 emails per user. Most users use folders to classify their emails, with some employing fewer folders than others.

Developing a spam classifier with deep neural networks involves a sequence of steps. The data was split chronologically into two halves, and a flat approach was used instead of a hierarchical one. A support vector machine (SVM) was trained for each user on each field, and for each folder across all of that user's emails. A combination of weights was then found for each folder using regression, and thresholding was applied to obtain the optimal score. The project is written in Java and implemented in two training phases. The deep neural network code is run on a sample of the Enron spam-ham email dataset, and the results are presented to compare actual spam emails against legitimate emails.
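
As an illustration of the chronological split described above, the following is a minimal sketch in Python, not the dissertation's Java implementation. The function name `chronological_half_split`, the `date` and `id` fields, and the toy messages are all hypothetical, and treating the older half as training data is an assumption that the abstract does not state.

```python
from datetime import datetime


def chronological_half_split(emails):
    """Sort emails by timestamp and return (older half, newer half)."""
    ordered = sorted(emails, key=lambda e: e["date"])
    midpoint = len(ordered) // 2
    return ordered[:midpoint], ordered[midpoint:]


# Hypothetical toy messages; field names are illustrative only.
emails = [
    {"id": "c", "date": datetime(2001, 3, 1)},
    {"id": "a", "date": datetime(2001, 1, 1)},
    {"id": "d", "date": datetime(2001, 4, 1)},
    {"id": "b", "date": datetime(2001, 2, 1)},
]

# Assumption: the older half is used for training, the newer half for testing.
train, test = chronological_half_split(emails)
```

Splitting by time rather than at random keeps later messages out of the training half, so the evaluation resembles classifying mail that arrives after the model was trained.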

Item Type: Dissertation (University of Nottingham only)
Keywords: Deep Neural Networks, Spam Classification, Neural Networks, Deep Belief Networks, Boltzmann Algorithm.
Depositing User: Gonzalez-Orbegoso, Mrs Carolina
Date Deposited: 25 Nov 2015 12:26
Last Modified: 22 Sep 2016 15:14
URI: http://eprints.nottingham.ac.uk/id/eprint/30908
