I am a Ph.D. student at IDCOM in the School of Engineering, University of Edinburgh. I am a member of the Vision Group and VIOS, working under the supervision of Dr. Steven McDonagh and Dr. Laura Sevilla. My interests lie in Multimodal Learning, Spatial-Temporal Understanding in Foundation Models, and Generative AI.
Before moving to the UK, I spent a wonderful year in Germany building lip-syncing and synthetic media generation models. I also spent three months at the Visual Computing & Artificial Intelligence group at the Technical University of Munich with Prof. Matthias Nießner.
I completed my MS by Research at CVIT, IIIT Hyderabad under the guidance of Prof. C.V. Jawahar and Prof. Vinay P. Namboodiri. My graduate research focused on Lip-Sync, Talking Head Generation, and Face Reenactment, along with their optimization for real-world problems. Additionally, I worked on high-accuracy Table Detection in Document Images under the supervision of Prof. C.V. Jawahar and Dr. Ajoy Mondal. Prior to this, I worked as a Data Scientist and team lead at several companies, broadly in the domains of Facial Recognition, AI-based Video Surveillance, and Document Image Processing.
Google Scholar | LinkedIn | CV