Forced Attention for Image Captioning

Devarapalli, Hemanth

doi:10.25394/PGS.7408883.v1

Hemanth_Thesis_Final.pdf (1.28 MB)

thesis

posted on 2019-01-17, 14:27 authored by Hemanth Devarapalli

Automatic caption generation for images is an active research area in artificial intelligence. Architectures have evolved from classical machine learning applied to image metadata to neural networks. Two styles of neural architecture have emerged for image captioning: the Encoder-Attention-Decoder architecture and the Transformer architecture. This study attempts to modify the attention mechanism so that any object can be specified. An archetypal Encoder-Attention-Decoder architecture, Show, Attend, and Tell (Xu et al., 2015), serves as the baseline, and a modification of that architecture is proposed. Both architectures are evaluated on the MSCOCO dataset (Lin et al., 2014) using seven metrics: BLEU-1, BLEU-2, BLEU-3, and BLEU-4 (Papineni, Roukos, Ward & Zhu, 2002), METEOR (Banerjee & Lavie, 2005), ROUGE-L (Lin, 2004), and CIDEr (Vedantam, Zitnick & Parikh, 2015). Finally, the statistical significance of the results is assessed with paired t-tests.
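The abstract does not describe how the paired t-tests were set up. As a rough illustration only, the sketch below computes a paired t statistic from per-image metric scores for two models evaluated on the same images. The function name `paired_t_statistic` and the score lists are hypothetical and the numbers are made up for the example; they are not results from the thesis.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two score lists.

    Each position in the two lists must refer to the same test image.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two pairs")

    # The test operates on the per-image differences, not on the raw scores.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")

    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-image scores for a single metric (illustrative values only).
baseline_scores = [0.50, 0.60, 0.55, 0.70]
modified_scores = [0.52, 0.65, 0.54, 0.75]

t, df = paired_t_statistic(modified_scores, baseline_scores)
print(f"t = {t:.3f}, degrees of freedom = {df}")
```

A p-value would then be read from the t distribution with `df` degrees of freedom. One such test per metric would be one plausible reading of the abstract, though the thesis itself is the authority on the exact procedure.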

Degree Type

Master of Science

Department

Computer and Information Technology

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Dr. Julia Rayz

Additional Committee Member 2

Dr. Baijian Yang

Additional Committee Member 3

Dr. John Springer

Keywords

Artificial intelligence; Natural language processing; Image Captioning; Deep Learning; Artificial Intelligence and Image Processing

Licence

CC BY 4.0
