My research interests lie at the intersection of machine learning, LLMs/VLMs, and computer vision. I am particularly interested in dense prediction problems such as optical flow, stereo, and depth estimation, as well as perception tasks involving VLMs and LLMs.
My current work involves models including Vision Transformers (ViT-16 and Swin), DINOv2, and PaliGemma-2.
Developing a unified model for optical flow, stereo, and depth estimation using continual learning to enhance cross-task knowledge transfer and efficiency.
Extending Freestyle Layout-to-Image Synthesis (FLIS) to 3D by integrating spatial masks and multi-view consistency cues for generating coherent 3D structures or multi-view 2D renderings.
Optimized contrastive learning under small-batch constraints using HyperOpt with the ASHA scheduler. SogCLR with AdamW outperformed CLIP and CyCLIP, achieving 28.45% Mean Recall on COCO and 55.99% Top@10 accuracy on ImageNet.
Developed a multimodal predictive framework for calorie estimation by integrating meal images, continuous glucose monitoring (CGM) data, and Viome features.
Used ResNet50, BERT, and cyclic encoding, and trained with a custom RMSRE loss function to achieve a score of 0.2294.
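For illustration only, a minimal sketch of the two less common pieces named here, under stated assumptions: RMSRE is taken to be the usual root mean squared relative error, and cyclic encoding is taken to be the standard sine/cosine mapping of a periodic value such as hour of day. The function names `rmsre` and `cyclic_encode` are hypothetical, and the project's actual custom loss may differ.

```python
import math


def rmsre(y_true, y_pred):
    """Root mean squared relative error (assumed definition):
    sqrt(mean(((pred - true) / true) ** 2)). Targets must be nonzero."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(((p - t) / t) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(total / len(y_true))


def cyclic_encode(value, period):
    """Map a periodic value onto the unit circle so that the two ends
    of the period (e.g. hour 0 and hour 24) receive the same encoding."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Two meals with true calories 100 and 200, each predicted 10% off.
error = rmsre([100.0, 200.0], [110.0, 180.0])
# 06:00 on a 24-hour clock sits a quarter of the way around the circle.
sin_h, cos_h = cyclic_encode(6, 24)
```

A relative error of this kind weights a 10% miss equally on small and large meals, which is one plausible reason to prefer it over plain RMSE for calorie targets.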
Developed two key modules for MRI Quality Control analysis: High Contrast Spatial Resolution and Low Contrast Object Detectability.
Used image preprocessing, normalized cross-correlation (NCC), multi-Otsu thresholding, and the Hough transform to extract and analyze MRI phantom features.
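As a rough illustration of the NCC step, the sketch below computes the zero-mean normalized cross-correlation between two equally sized patches given as flat lists of pixel intensities. It assumes the zero-mean variant of NCC; the function name `ncc` is hypothetical, and the actual modules may well rely on a library routine and slide the template across the whole phantom image rather than comparing a single pair of patches.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equal-length patches.
    Returns a value in [-1, 1]; returns 0.0 if either patch is constant."""
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and of equal length")
    mean_a = sum(patch_a) / len(patch_a)
    mean_b = sum(patch_b) / len(patch_b)
    dev_a = [x - mean_a for x in patch_a]
    dev_b = [y - mean_b for y in patch_b]
    denom = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0.0:
        return 0.0  # a constant patch carries no structure to correlate
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


# A brighter copy of the same pattern still matches perfectly,
# while the reversed pattern is perfectly anti-correlated.
same_pattern = ncc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
reversed_pattern = ncc([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Because the score is invariant to brightness offset and contrast scaling, it suits template matching across scans with differing intensity ranges.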