Object Segmentation in the Wild with Foundation Models

Gaze-guided SAM adaptation for vision-assisted neuro-prostheses in cluttered real-world scenes.

This project adapts the Segment Anything Model (SAM) to an egocentric assistive-robotics setting using gaze-fixation prompts, temporal gaze projection, and outlier filtering. Fine-tuning on domain data improves mask quality in object-grasping scenarios.
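
The pipeline has three steps: project recent gaze fixations into the current frame, discard outliers, and feed the surviving points to SAM as prompts. The sketch below illustrates this under stated assumptions — ORB features for the homography, the DBSCAN eps/min_samples values, and the checkpoint path are all placeholders, not the project's actual choices.

```python
import cv2
import numpy as np
from sklearn.cluster import DBSCAN
from segment_anything import SamPredictor, sam_model_registry


def project_gaze(prev_frame, curr_frame, gaze_xy):
    """Carry a gaze fixation from a previous frame into the current one
    via a feature-based homography (ORB matches + RANSAC)."""
    gray1 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(gray1, None)
    kp2, des2 = orb.detectAndCompute(gray2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    pt = np.float32([[gaze_xy]])  # (1, 1, 2), as perspectiveTransform expects
    return cv2.perspectiveTransform(pt, H)[0, 0]


def filter_fixations(points, eps=25.0, min_samples=3):
    """Reject outlier gaze points by keeping only the largest DBSCAN cluster."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    inliers = labels[labels != -1]
    if inliers.size == 0:
        return points  # no cluster found; fall back to the raw points
    return points[labels == np.bincount(inliers).argmax()]


def segment_from_gaze(predictor, frame_rgb, gaze_points):
    """Prompt SAM with the filtered fixations as positive point prompts."""
    prompts = filter_fixations(np.asarray(gaze_points, dtype=np.float32))
    predictor.set_image(frame_rgb)  # RGB uint8, HxWx3
    masks, scores, _ = predictor.predict(
        point_coords=prompts,
        point_labels=np.ones(len(prompts)),  # 1 marks a foreground point
        multimask_output=True,
    )
    return masks[scores.argmax()]  # keep the highest-scoring candidate mask


# Usage (checkpoint path is an assumption):
# sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
# mask = segment_from_gaze(SamPredictor(sam), frame_rgb, projected_points)
```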

Key outcomes:

  • Implemented a gaze-to-prompt pipeline with homography-based frame projection and DBSCAN outlier filtering (sketched above).
  • Demonstrated mIoU improvements of up to +0.50 after fine-tuning (see the training sketch after this list).
  • Example per-class improvement: blue bowl, from 0.41 to 0.86 mIoU.

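The exact fine-tuning recipe behind these gains isn't spelled out here; a common approach, assumed in the sketch below, freezes SAM's image and prompt encoders and updates only the lightweight mask decoder with a BCE + dice loss. The learning rate, equal loss weighting, and checkpoint path are placeholders, not the project's confirmed settings.

```python
import torch
import torch.nn.functional as F
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # path assumed
sam.image_encoder.requires_grad_(False)   # freeze the heavy ViT backbone
sam.prompt_encoder.requires_grad_(False)  # train only the mask decoder
optimizer = torch.optim.AdamW(sam.mask_decoder.parameters(), lr=1e-5)


def train_step(image, point_coords, point_labels, gt_mask):
    """One decoder-only update.

    image:        preprocessed (3, 1024, 1024) float tensor
    point_coords: (N, 2) float tensor in the 1024x1024 input frame
    point_labels: (N,) int tensor, 1 = foreground
    gt_mask:      (1, 256, 256) binary float target matching SAM's
                  low-res mask output
    """
    with torch.no_grad():
        embedding = sam.image_encoder(image.unsqueeze(0))
        sparse, dense = sam.prompt_encoder(
            points=(point_coords.unsqueeze(0), point_labels.unsqueeze(0)),
            boxes=None,
            masks=None,
        )
    low_res_masks, _ = sam.mask_decoder(
        image_embeddings=embedding,
        image_pe=sam.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse,
        dense_prompt_embeddings=dense,
        multimask_output=False,
    )
    pred = low_res_masks.squeeze(1)  # (1, 256, 256) logits
    bce = F.binary_cross_entropy_with_logits(pred, gt_mask)
    probs = torch.sigmoid(pred)
    dice = 1 - (2 * (probs * gt_mask).sum() + 1) / (probs.sum() + gt_mask.sum() + 1)
    loss = bce + dice  # equal weighting assumed
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Since the encoder is frozen, image embeddings can also be precomputed once per frame and cached, which makes decoder-only fine-tuning cheap even on modest GPUs.
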
Tech stack:

  • Python
  • PyTorch
  • Segment Anything (ViT-B)
  • Homography estimation
  • DBSCAN

Links: