Realistic Human Face Synthesis Using DCGAN on CelebA Dataset with Text-to-Image Extension

  • Unique Paper ID: 192069
  • Volume: 12
  • Issue: 9
  • PageNo: 105-110

Abstract

The synthesis of realistic human imagery has long been a benchmark challenge in the field of computer vision and generative modeling due to the intricate spatial hierarchies and high-dimensional features of the human face. This project explores the application of Deep Convolutional Generative Adversarial Networks (DCGAN) to bridge the gap between random noise and high-fidelity facial synthesis. By leveraging the CelebA dataset, which contains over 200,000 celebrity images, the study focuses on training a robust adversarial framework consisting of two competing neural networks: a Generator and a Discriminator. The Generator is designed to upsample a 100-dimensional latent noise vector through a series of transposed convolutional layers to produce a 64×64×3 pixel image. Simultaneously, the Discriminator utilizes standard convolutional layers to distinguish between authentic images from the dataset and synthetic images produced by the Generator. To ensure architectural stability and mitigate common GAN failures, such as mode collapse and vanishing gradients, the project implements specific design strategies, including the use of the Adam optimizer and batch normalization. Experimental results demonstrate the model’s efficacy, achieving a training accuracy of 96%. The evolution of the training process shows a clear trajectory where the loss functions stabilize as the Generator learns to replicate complex facial attributes, including variations in lighting, pose, and expression. Furthermore, the project extends its scope to include Text-to-Image synthesis, integrating CNN and LSTM architectures to generate visual content from descriptive natural language prompts. The findings confirm that DCGANs are highly effective for unsupervised representation learning and image synthesis. While the current implementation successfully generates diverse 64×64-resolution faces, future iterations aim to incorporate StyleGAN or Progressive Growing GANs (PGGAN) to achieve higher resolutions and finer control over specific facial attributes such as age, gender, and accessories.
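
Architecture Sketch

The abstract specifies the key architectural facts: a 100-dimensional latent vector, transposed convolutions with batch normalization in the Generator, standard convolutions in the Discriminator, a 64×64×3 output, and the Adam optimizer. The following minimal PyTorch sketch assembles those pieces for illustration only. The paper does not state its framework, and the feature-map widths (ngf = ndf = 64) and Adam hyperparameters (lr = 2e-4, betas = (0.5, 0.999)) are the common DCGAN defaults from Radford et al., assumed here rather than taken from the article.

import torch
import torch.nn as nn

latent_dim = 100   # 100-dimensional latent noise vector (stated in the abstract)
ngf = ndf = 64     # feature-map widths: assumed standard DCGAN values, not from the paper

# Generator: upsamples a latent vector to a 64x64x3 image through
# transposed convolutions with batch normalization, per the abstract.
generator = nn.Sequential(
    nn.ConvTranspose2d(latent_dim, ngf * 8, 4, 1, 0, bias=False),  # 1x1 -> 4x4
    nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),     # 4x4 -> 8x8
    nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),     # 8x8 -> 16x16
    nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),         # 16x16 -> 32x32
    nn.BatchNorm2d(ngf), nn.ReLU(True),
    nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),               # 32x32 -> 64x64
    nn.Tanh(),                                                     # pixels in [-1, 1]
)

# Discriminator: standard convolutions that downsample a 64x64x3 image
# to a single real/fake probability.
discriminator = nn.Sequential(
    nn.Conv2d(3, ndf, 4, 2, 1, bias=False),                        # 64x64 -> 32x32
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),                  # 32x32 -> 16x16
    nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),              # 16x16 -> 8x8
    nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),              # 8x8 -> 4x4
    nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),                    # 4x4 -> 1x1 score
    nn.Sigmoid(),
)

# Adam is named in the abstract; lr and betas follow the usual DCGAN
# recipe (Radford et al.) and are assumptions here.
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Shape check: a batch of 16 noise vectors becomes 16 fake RGB faces.
z = torch.randn(16, latent_dim, 1, 1)
fake = generator(z)
print(fake.shape)                 # torch.Size([16, 3, 64, 64])
print(discriminator(fake).shape)  # torch.Size([16, 1, 1, 1])

Because the Generator ends in Tanh, CelebA images would be normalized to [-1, 1] during training, and the Discriminator's sigmoid output would feed the usual binary cross-entropy adversarial objective.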

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{192069,
  author   = {Shivaprasad Satla},
  title    = {Realistic Human Face Synthesis Using DCGAN on CelebA Dataset with Text-to-Image Extension},
  journal  = {International Journal of Innovative Research in Technology},
  year     = {2026},
  volume   = {12},
  number   = {9},
  pages    = {105--110},
  issn     = {2349-6002},
  url      = {https://ijirt.org/article?manuscript=192069},
  abstract = {The synthesis of realistic human imagery has long been a benchmark challenge in the field of computer vision and generative modeling due to the intricate spatial hierarchies and high-dimensional features of the human face. This project explores the application of Deep Convolutional Generative Adversarial Networks (DCGAN) to bridge the gap between random noise and high-fidelity facial synthesis. By leveraging the CelebA dataset, which contains over 200,000 celebrity images, the study focuses on training a robust adversarial framework consisting of two competing neural networks: a Generator and a Discriminator. The Generator is designed to upsample a 100-dimensional latent noise vector through a series of transposed convolutional layers to produce a $64 \times 64 \times 3$ pixel image. Simultaneously, the Discriminator utilizes standard convolutional layers to distinguish between authentic images from the dataset and synthetic images produced by the Generator. To ensure architectural stability and mitigate common GAN failures, such as mode collapse and vanishing gradients, the project implements specific design strategies, including the use of the Adam optimizer and batch normalization. Experimental results demonstrate the model's efficacy, achieving a training accuracy of 96%. The evolution of the training process shows a clear trajectory where the loss functions stabilize as the Generator learns to replicate complex facial attributes, including variations in lighting, pose, and expression. Furthermore, the project extends its scope to include Text-to-Image synthesis, integrating CNN and LSTM architectures to generate visual content from descriptive natural language prompts. The findings confirm that DCGANs are highly effective for unsupervised representation learning and image synthesis. While the current implementation successfully generates diverse $64 \times 64$ resolution faces, future iterations aim to incorporate StyleGAN or Progressive Growing GANs (PGGAN) to achieve higher resolutions and finer control over specific facial attributes such as age, gender, and accessories.},
  keywords = {Deep Convolutional Generative Adversarial Network (DCGAN), Generative Adversarial Networks (GAN), Face Synthesis, CelebA Dataset, Image Generation, Latent Space, Generator--Discriminator, Adversarial Training, Batch Normalization, Adam Optimizer, Mode Collapse},
  month    = {January},
}
