My research interests lie at the intersection of 3D vision, Generative AI, and computer vision, with a particular focus on optical flow, stereo and depth estimation, and 3D reconstruction.
Currently working with models such as Vision Transformers (ViT-16, Swin), DINO, diffusion models, NeRF, and 3D Gaussian Splatting.
Developing a unified model for optical flow, stereo, and depth estimation using continual learning to enhance cross-task knowledge transfer and efficiency.
Extending Freestyle Layout-to-Image Synthesis (FLIS) to 3D by integrating spatial masks and multi-view consistency cues for generating coherent 3D structures or multi-view 2D renderings.
Optimized contrastive learning under small-batch constraints using HyperOpt with the ASHA scheduler. SogCLR with AdamW outperformed CLIP and CyCLIP,
achieving 28.45% Mean Recall on COCO and 55.99% Top@10 accuracy on ImageNet.
Developed a multimodal predictive framework for calorie estimation by integrating meal images, CGM data, and Viome features.
Used ResNet50, BERT, and cyclic encoding, training with a custom RMSRE loss function and achieving a score of 0.2294.
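As an illustration of the two less common ingredients named above, the following is a minimal NumPy sketch, not the project's actual code. It assumes RMSRE means root mean squared relative error and that cyclic encoding maps a periodic feature such as hour of day to a sine/cosine pair; the function names `rmsre` and `cyclic_encode`, the `eps` guard, and the 24-hour period are hypothetical choices made for this example.

```python
import numpy as np


def rmsre(y_true, y_pred, eps=1e-8):
    """Root mean squared relative error (assumed definition).

    eps is a hypothetical guard against division by zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    relative_error = (y_pred - y_true) / (np.abs(y_true) + eps)
    return float(np.sqrt(np.mean(relative_error ** 2)))


def cyclic_encode(value, period=24.0):
    """Map a periodic value (e.g. hour of day) to a (sin, cos) pair.

    Values one full period apart receive the same encoding.
    """
    angle = 2.0 * np.pi * np.asarray(value, dtype=float) / period
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)


if __name__ == "__main__":
    # Two meals with true calories 100 and 200, each predicted 10% off.
    print(rmsre([100.0, 200.0], [110.0, 180.0]))
    # Midnight expressed as hour 0 and hour 24 encodes identically.
    print(cyclic_encode([0.0, 24.0]))
```

A relative error keeps small and large meals on the same scale, and the sine/cosine pair avoids the artificial jump between 23:00 and 00:00 that a raw hour value would introduce.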
Developed two key modules for MRI Quality Control analysis: High Contrast Spatial Resolution and Low Contrast Object Detectability.
Used image preprocessing, normalized cross-correlation (NCC), multi-Otsu thresholding, and the Hough Transform to extract and analyze MRI phantom features.
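For reference, the matching score behind NCC can be sketched in a few lines of NumPy. This is a simplified illustration, not the modules' implementation: it assumes NCC here refers to zero-mean normalized cross-correlation between an image patch and a template of the same shape, and the function name `ncc` is hypothetical.

```python
import numpy as np


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally shaped arrays.

    Returns a value in [-1, 1]; 0.0 is returned if either input is constant.
    """
    a = np.asarray(patch, dtype=float)
    b = np.asarray(template, dtype=float)
    if a.shape != b.shape:
        raise ValueError("patch and template must have the same shape")
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    if denom == 0.0:
        return 0.0
    return float((a * b).sum() / denom)


if __name__ == "__main__":
    template = np.array([[0.0, 1.0], [2.0, 3.0]])
    # A brightness and contrast change leaves the score unchanged.
    print(ncc(2.0 * template + 5.0, template))
```

Because the score is invariant to linear intensity changes, it suits locating phantom features across scans acquired with different gain or contrast settings.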