Object Segmentation in the Wild with Foundation Models
Gaze-guided SAM adaptation for vision-assisted neuro-prostheses in cluttered real-world scenes.
This project adapts Segment Anything (SAM) to an egocentric assistive-robotics setting using gaze-fixation prompts, temporal gaze projection, and outlier filtering. Fine-tuning on domain data improves mask quality in object-grasping scenarios.
Key outcomes:
- Implemented gaze-to-prompt pipeline with frame projection and DBSCAN filtering.
- Demonstrated mIoU improvements of up to +0.50 after fine-tuning.
- Example class improvement: blue bowl from 0.41 to 0.86 mIoU.
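The gaze-to-prompt idea can be sketched as follows: keep only gaze samples that lie in a dense neighborhood (a DBSCAN-style core-point test), then collapse the survivors into a single point prompt for SAM. This is a minimal illustration in plain NumPy; the `eps`/`min_samples` values and function names are hypothetical, not the project's actual parameters.

```python
import numpy as np

def filter_gaze_outliers(points, eps=30.0, min_samples=3):
    """DBSCAN-style outlier filter: keep points with at least
    min_samples neighbors (including self) within eps pixels.
    Parameters are illustrative, not the project's tuned values."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all gaze samples.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    core = (dists <= eps).sum(axis=1) >= min_samples
    return points[core]

def gaze_to_prompt(points):
    """Collapse the filtered fixations into one (x, y) point prompt,
    which would then be fed to SAM's point-prompt interface."""
    kept = filter_gaze_outliers(points)
    return kept.mean(axis=0)

# Three clustered fixations plus one stray saccade sample.
fixations = [(100, 102), (98, 99), (101, 100), (400, 50)]
print(gaze_to_prompt(fixations))  # centroid of the dense cluster only
```

In practice DBSCAN proper (e.g. `sklearn.cluster.DBSCAN`) also labels multiple clusters, which matters when gaze drifts between objects; the core-point test above captures just the outlier-rejection step.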
Tech stack:
- Python
- PyTorch
- Segment Anything (ViT-B)
- Homography estimation
- DBSCAN
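Temporal gaze projection via homography can be sketched as below: a 3x3 homography `H` (in practice estimated from feature matches between consecutive frames) maps a gaze point from a previous frame into the current frame's coordinates. The matrix here is a toy pure-translation example, not an estimate from project data.

```python
import numpy as np

def project_gaze(point, H):
    """Map a gaze point (x, y) through a 3x3 homography H using
    homogeneous coordinates, then dehomogenize."""
    x, y = point
    v = H @ np.array([x, y, 1.0])
    return v[:2] / v[2]

# Toy homography: pure translation by (+5, -3) pixels between frames.
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0, 1.0]])
print(project_gaze((100.0, 200.0), H))  # gaze point in current-frame coords
```

Projecting fixations from several recent frames into the current one accumulates more prompt candidates per object, which is what makes the DBSCAN filtering step worthwhile.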
Links: