From Pixels to Phrases: Enhancing Image Captioning with LSTM Model
Author(s):
Pulkit Dwivedi
Keywords:
Image captioning, Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNNs), Natural language processing (NLP), Deep learning
Abstract
Generating natural language captions for images is an important task that requires understanding and identifying the objects within an image. However, the effectiveness of existing approaches to image caption generation has not been thoroughly established. To address this gap, we propose a novel approach that combines Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models to generate image captions. Our approach comprises two sub-models: an Object Identification model and a Localization model, which extract information about objects and their spatial relationships from images. We then use LSTM models to process the extracted text data, encoding the text input sequence as a fixed-length output vector. Finally, we integrate the image vector outputs with the corresponding descriptions to train the image caption generator model. We compare the performance of our LSTM-based model with other dense models, including VGG-16 and Transformer-based models, on the Flickr8k dataset. Our experimental results demonstrate that our LSTM-based approach outperforms previous VGG- and Transformer-based models, as well as state-of-the-art image captioning models. By integrating image and text data using LSTM models, our approach provides a new benchmark for image caption generation, advancing the state of the art in this critical area of research.
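The pipeline the abstract describes, a CNN that produces an image feature vector and an LSTM that encodes the caption tokens into a fixed-length vector before the two are merged, can be sketched in plain NumPy. All layer sizes, weights, and the random toy inputs below are illustrative assumptions, not the paper's actual configuration or trained parameters:

```python
import numpy as np

# Illustrative dimensions only (not taken from the paper).
EMBED_DIM, HIDDEN_DIM, IMG_DIM = 8, 16, 16

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell: one step of the gated recurrence."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per gate, applied to [x; h].
        shape = (hidden_dim, input_dim + hidden_dim)
        self.Wi, self.Wf, self.Wo, self.Wg = (
            rng.standard_normal(shape) * 0.1 for _ in range(4)
        )
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = np.concatenate([x, h])
        i = sigmoid(self.Wi @ z)   # input gate
        f = sigmoid(self.Wf @ z)   # forget gate
        o = sigmoid(self.Wo @ z)   # output gate
        g = np.tanh(self.Wg @ z)   # candidate cell state
        c = f * c + i * g          # updated cell state
        h = o * np.tanh(c)         # new hidden state
        return h, c

def encode_caption(cell, token_embeddings):
    """Run the LSTM over a token sequence; the final hidden state is
    the fixed-length encoding of the caption."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    for x in token_embeddings:
        h, c = cell.step(x, h, c)
    return h

# Toy example: a 5-token caption and a stand-in for CNN image features.
rng = np.random.default_rng(1)
caption = rng.standard_normal((5, EMBED_DIM))
image_vec = rng.standard_normal(IMG_DIM)

cell = LSTMCell(EMBED_DIM, HIDDEN_DIM)
text_vec = encode_caption(cell, caption)  # fixed length, regardless of caption length

# Merge step: concatenate image and text vectors; in a full model this
# merged vector would feed a decoder that predicts the next caption word.
merged = np.concatenate([image_vec, text_vec])
print(merged.shape)  # (32,)
```

The point of the sketch is the shape behavior: however many tokens the caption has, `encode_caption` returns a vector of size `HIDDEN_DIM`, which is what makes the fixed-size merge with the CNN image vector possible.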
Article Details
Unique Paper ID: 159666
Publication Volume & Issue: Volume 9, Issue 12
Page(s): 486 - 492