Video Comprehension

Toward a World Model: Building a Foundational Video Understanding Model

Summary


This project develops algorithms for intelligent video comprehension that enable efficient search and retrieval of specific events in large, complex video datasets, combining multimodal learning (vision, language, and knowledge graphs) with Retrieval-Augmented Generation (RAG).
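The retrieval idea can be sketched in miniature: describe each video segment with a textual caption, embed the captions and the query, and rank segments by similarity. The sketch below is a toy illustration only, using a bag-of-words vector in place of the project's learned multimodal embeddings; the function names and sample captions are hypothetical.

```python
from collections import Counter
import math

def embed(text):
    """Bag-of-words vector; a hypothetical stand-in for a learned embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_segments(query, segment_captions, top_k=1):
    """Rank video segments by how well their captions match a text query."""
    q = embed(query)
    scored = [(seg, cosine(q, embed(cap))) for seg, cap in segment_captions.items()]
    return [seg for seg, _ in sorted(scored, key=lambda x: -x[1])[:top_k]]

# Illustrative per-segment dense captions for a short video.
segments = {
    "00:00-00:10": "a dog runs across the park",
    "00:10-00:20": "a car stops at a red traffic light",
    "00:20-00:30": "children play soccer on a field",
}
print(retrieve_segments("car at traffic light", segments))  # → ['00:10-00:20']
```

In a full RAG pipeline, the retrieved segments (rather than the whole video) would then be passed to a language model to answer the user's query.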

Funding Agency:
South Dakota State University | Startup Fund

Team:
Chulwoo Pack (PI) | McComish Dept. of EECS, SDSU
Harsh Dubey (M.S. Student) | McComish Dept. of EECS, SDSU
Muktiar Ali (Ph.D. Student) | McComish Dept. of EECS, SDSU
Sugam Mishura (M.S. Student) | McComish Dept. of EECS, SDSU
Omeshamisu Anigala (Ph.D. Student) | McComish Dept. of EECS, SDSU

Duration:
2023-2026

Total Funding:
$73,000


External Resources:

  • Video Comprehension Score (VCS)
  • Dense Caption Dataset (CLIP-CC)
  • Dense Caption Generator (forthcoming)
  • Multimodal Video Anomaly Detection (forthcoming)

Related Publications:

2025

  1. Leveraging Textual Memory and Key Frame Reasoning for Full Video Understanding Using Off-the-Shelf LLMs and VLMs (Student Abstract)
    Harsh Dubey and Chulwoo Pack
    In Proceedings of the AAAI Conference on Artificial Intelligence, 2025
  2. PEARL: Perceptual and Analytical Representation Learning for Video Anomaly Detection
    Omeshamisu Anigala, Kwanghee Won, and Chulwoo Pack
    SIGAPP Appl. Comput. Rev., Apr 2025