From Pixels to Phrases: Enhancing Image Captioning with LSTM Model
Author(s):
Pulkit Dwivedi
Keywords:
Image captioning, Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNNs), Natural language processing (NLP), Deep learning
Abstract
Generating natural language captions for images is an important task that requires understanding and identifying the objects within an image. However, the effectiveness of existing approaches to image caption generation has not been thoroughly established. To address this gap, we propose a novel approach that combines Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models to generate image captions. Our approach comprises two sub-models: an Object Identification model and a Localization model, which extract information about objects and their spatial relationships from images. We then use LSTM models to process the extracted text data, encoding the text input sequence as a fixed-length output vector. Finally, we integrate the image vector outputs with the corresponding descriptions to train the image caption generator model. We compare the performance of our LSTM-based model with other dense models, including VGG-16 and Transformer-based models, on the Flickr8k dataset. Our experimental results demonstrate that our LSTM-based approach outperforms previous VGG- and Transformer-based models, as well as state-of-the-art image captioning models. By integrating image and text data using LSTM models, our approach provides a new benchmark for image caption generation, advancing the state of the art in this critical area of research.
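The pipeline the abstract describes, a CNN that produces an image feature vector and an LSTM that encodes the caption tokens into a fixed-length vector before the two are merged, can be sketched in plain NumPy. All layer sizes, weights, and the random toy inputs below are illustrative assumptions, not the paper's actual configuration or trained parameters:

```python
import numpy as np

# Illustrative dimensions only (not taken from the paper).
EMBED_DIM, HIDDEN_DIM, IMG_DIM = 8, 16, 16

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell: one step of the gated recurrence."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per gate, applied to [x; h].
        shape = (hidden_dim, input_dim + hidden_dim)
        self.Wi, self.Wf, self.Wo, self.Wg = (
            rng.standard_normal(shape) * 0.1 for _ in range(4)
        )
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = np.concatenate([x, h])
        i = sigmoid(self.Wi @ z)   # input gate
        f = sigmoid(self.Wf @ z)   # forget gate
        o = sigmoid(self.Wo @ z)   # output gate
        g = np.tanh(self.Wg @ z)   # candidate cell state
        c = f * c + i * g          # updated cell state
        h = o * np.tanh(c)         # new hidden state
        return h, c

def encode_caption(cell, token_embeddings):
    """Run the LSTM over a token sequence; the final hidden state is
    the fixed-length encoding of the caption."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    for x in token_embeddings:
        h, c = cell.step(x, h, c)
    return h

# Toy example: a 5-token caption and a stand-in for CNN image features.
rng = np.random.default_rng(1)
caption = rng.standard_normal((5, EMBED_DIM))
image_vec = rng.standard_normal(IMG_DIM)

cell = LSTMCell(EMBED_DIM, HIDDEN_DIM)
text_vec = encode_caption(cell, caption)  # fixed length, regardless of caption length

# Merge step: concatenate image and text vectors; in a full model this
# merged vector would feed a decoder that predicts the next caption word.
merged = np.concatenate([image_vec, text_vec])
print(merged.shape)  # (32,)
```

The point of the sketch is the shape behavior: however many tokens the caption has, `encode_caption` returns a vector of size `HIDDEN_DIM`, which is what makes the fixed-size merge with the CNN image vector possible.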
Article Details
Unique Paper ID: 159666
Publication Volume & Issue: Volume 9, Issue 12
Page(s): 486 - 492