Text And Depth Controllable Video Generation Using AI

  • Unique Paper ID: 201013
  • PageNo: 165-171
  • Abstract: Text- and depth-guided controllable video generation is an AI-based approach that synthesizes videos from textual descriptions and depth information. The proposed system converts textual scripts into visually coherent videos with minimal user intervention. It uses Natural Language Processing (NLP) techniques to analyze, preprocess, and segment user-provided scripts into meaningful scenes. Each scene is then rendered as a video segment using latent diffusion models that were pre-trained for still-image synthesis and adapted to video generation through temporal modules.
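
The script-segmentation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `segment_script`, the `max_sentences` parameter, and the rule of treating blank lines as scene boundaries are all assumptions made for illustration, and the sentence splitter is deliberately naive.

```python
import re


def segment_script(script: str, max_sentences: int = 2) -> list[str]:
    """Split a script into scene-sized chunks (hypothetical helper).

    Assumption: blank lines mark hard scene boundaries, and each scene
    holds at most `max_sentences` sentences.
    """
    scenes = []
    for paragraph in re.split(r"\n\s*\n", script.strip()):
        # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        sentences = [
            s.strip()
            for s in re.split(r"(?<=[.!?])\s+", paragraph)
            if s.strip()
        ]
        # Group consecutive sentences into scenes of bounded length.
        for i in range(0, len(sentences), max_sentences):
            scenes.append(" ".join(sentences[i:i + max_sentences]))
    return scenes


script = "A boat drifts at dawn. Fog covers the lake.\n\nA child waves from the shore."
scenes = segment_script(script)
for number, scene in enumerate(scenes, start=1):
    print(f"Scene {number}: {scene}")
```

Each resulting scene string would then serve as the text prompt for one video segment; a real system would likely replace the regular expressions with a proper NLP sentence and topic segmenter.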

Copyright & License

Copyright © 2026. Authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{201013,
        author = {Nithya B and Deepika S and Sharmila S and Shamitha S},
        title = {Text And Depth Controllable Video Generation Using AI},
        journal = {International Journal of Innovative Research in Technology},
        year = {2026},
        volume = {12},
        number = {no},
        pages = {165-171},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=201013},
        abstract = {Text- and depth-guided controllable video generation is an AI-based approach that synthesizes videos from textual descriptions and depth information. The proposed system converts textual scripts into visually coherent videos with minimal user intervention. It uses Natural Language Processing (NLP) techniques to analyze, preprocess, and segment user-provided scripts into meaningful scenes. Each scene is then rendered as a video segment using latent diffusion models that were pre-trained for still-image synthesis and adapted to video generation through temporal modules.},
        keywords = {Text-to-Video Generation, Depth Estimation, Latent Diffusion Models, Natural Language Processing, Temporal Consistency, AI Video Synthesis, Script Analysis, Scene Generation},
        month = {May},
        }

Cite This Article

B, N., S, D., S, S., & S, S. (2026). Text And Depth Controllable Video Generation Using AI. International Journal of Innovative Research in Technology (IJIRT), 12, 165–171.
