Examining the impact of stratified sampling on model performance in automated image caption: a topic modelling approach

Nguyen, Anh (2020) Examining the impact of stratified sampling on model performance in automated image caption: a topic modelling approach. [Dissertation (University of Nottingham only)]

Abstract

Deep learning's recent rapid development has prompted scientists to investigate a wide range of complex data problems. Among them, automated image captioning has drawn increasing attention from researchers because its architecture, which combines image processing with text analytics, is both challenging and intriguing. It has also attracted business investment thanks to practical applications including image retrieval, support for visually impaired users, product tagging and autonomous driving.

This dissertation investigates the use of a stratified sample split in evaluating the performance of an automated image captioning model. Despite its popularity in machine learning, stratification has not been directly applied in previous image captioning work, because (1) researchers often use the pre-defined sample splits supplied by data providers, and (2) images and annotations are unstructured data, which require more novel clustering methods than structured data do.

By applying topic modelling to the images' annotations, this dissertation validated the positive impact of stratified sampling on prediction results compared with a simple random split. The findings also identified the range of sample sizes in which this strategy delivered the best performance and revealed the reason behind this phenomenon. Finally, the study provided a more comprehensive understanding of the problem, with insights into the behaviour of different supporting techniques in topic modelling and image encoding.
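The general idea of a topic-stratified split can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `stratified_split`, its parameters, and the toy topic labels are assumptions made for illustration. It also assumes that each image has already been assigned a single dominant topic by some topic model applied to its annotations.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split item indices so each label keeps roughly the same share in both parts.

    `labels` holds one (hypothetical) dominant-topic label per image.
    Returns (train_indices, test_indices), both sorted.
    """
    rng = random.Random(seed)

    # Group item indices by their topic label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)
        # Take the same fraction from every topic group.
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data: ten images, six with topic 0 and four with topic 1 (made-up labels).
topics = [0] * 6 + [1] * 4
train_idx, test_idx = stratified_split(topics, test_fraction=0.5)
```

A simple random split would draw the test set from all ten indices at once, so the topic mix of the test set could drift from that of the full data. The stratified version fixes the per-topic share in both parts by construction.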

Item Type: Dissertation (University of Nottingham only)
Keywords: automated image caption, deep learning, topic modelling, stratified sampling, convolutional neural network
Depositing User: Nguyen, Anh
Date Deposited: 20 Apr 2023 08:42
Last Modified: 20 Apr 2023 08:42
URI: https://eprints.nottingham.ac.uk/id/eprint/66357
