Are Vision Foundation Models Foundational for Electron Microscopy Image Segmentation?
Published in Learning Meaningful Representations of Life (LMRL) Workshop at ICLR 2026, 2026
Recommended citation: Caterina Fuster-Barceló, Virginie Uhlmann. "Are Vision Foundation Models Foundational for Electron Microscopy Image Segmentation?" Learning Meaningful Representations of Life (LMRL) Workshop at ICLR 2026, 2026. https://openreview.net/forum?id=BbaIt2S2mU&noteId=BbaIt2S2mU
Accepted at: Learning Meaningful Representations of Life (LMRL) Workshop at ICLR 2026
Final version: OpenReview
Preprint: arXiv
Authors: Caterina Fuster-Barceló, Virginie Uhlmann
Abstract
This paper studies whether vision foundation models provide latent representations general enough to support effective transfer across heterogeneous electron microscopy (EM) datasets for mitochondria segmentation.
Using the Lucchi++ and VNC datasets together with DINOv2, DINOv3, and OpenCLIP, the work evaluates both frozen-backbone adaptation and parameter-efficient fine-tuning (PEFT) via LoRA.
Across models, training on a single EM dataset produced strong segmentation performance, and LoRA consistently improved in-domain results. In contrast, training jointly on multiple EM datasets led to severe degradation, with only marginal gains from PEFT.
Analysis of the latent representation space revealed a persistent domain mismatch between the two datasets despite their visual similarity, suggesting that current PEFT strategies are insufficient to obtain a single robust model across heterogeneous EM domains without additional domain-alignment mechanisms.
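The two adaptation regimes in the abstract can be made concrete. The sketch below is a minimal, hypothetical setup rather than the authors' released code: it loads a DINOv2 backbone via torch.hub, freezes it behind a patch-wise linear segmentation head, and then attaches LoRA adapters with the HuggingFace `peft` library. The head design, the `segment` helper, and the LoRA hyperparameters (r=8, alpha=16) are illustrative assumptions.

```python
# Minimal sketch of frozen-backbone adaptation vs. LoRA PEFT; assumes torch
# and the HuggingFace `peft` package. All hyperparameters are illustrative.
import torch
import torch.nn as nn
from peft import LoraConfig, get_peft_model

# 1) Frozen-backbone adaptation: a DINOv2 ViT-S/14 with all weights frozen.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
for p in backbone.parameters():
    p.requires_grad = False

# A patch-wise head mapping each patch token to a mitochondria logit;
# in the frozen regime, only these weights are trained.
head = nn.Conv2d(backbone.embed_dim, 1, kernel_size=1)

def segment(images: torch.Tensor, model: nn.Module) -> torch.Tensor:
    """images: (B, 3, H, W) with H, W divisible by the 14-px patch size.
    Returns (B, 1, H/14, W/14) logits; upsample for full-resolution masks."""
    tokens = model.forward_features(images)["x_norm_patchtokens"]  # (B, N, C)
    b, n, c = tokens.shape
    side = int(n ** 0.5)  # assumes a square patch grid
    grid = tokens.permute(0, 2, 1).reshape(b, c, side, side)
    return head(grid)

logits = segment(torch.randn(2, 3, 224, 224), backbone)  # frozen-regime pass
print(logits.shape)  # torch.Size([2, 1, 16, 16])

# 2) PEFT: inject LoRA adapters into the attention projections (the "qkv"
# linear layers of DINOv2) while the base weights stay frozen.
lora_backbone = get_peft_model(
    backbone, LoraConfig(r=8, lora_alpha=16, target_modules=["qkv"])
)
lora_backbone.print_trainable_parameters()  # only the low-rank adapters train
```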
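The domain-mismatch finding can likewise be probed with a simple diagnostic. The function below is a hypothetical stand-in for the paper's latent-space analysis, not the authors' exact procedure: it trains a linear classifier to predict which dataset a frozen embedding came from. Cross-validated accuracy near 0.5 means the two domains mix in latent space, while accuracy near 1.0 indicates the kind of persistent separation the abstract reports. The function name and the synthetic arrays are assumptions.

```python
# Hedged sketch: quantify a latent-space domain gap via a linear domain probe.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def domain_gap_score(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """feats_*: (N, C) frozen-backbone embeddings from each EM dataset.
    Returns cross-validated accuracy of a dataset-of-origin classifier."""
    X = np.concatenate([feats_a, feats_b])
    y = np.concatenate([np.zeros(len(feats_a)), np.ones(len(feats_b))])
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=5).mean()

# Example with synthetic stand-ins for Lucchi++ and VNC embeddings:
rng = np.random.default_rng(0)
print(domain_gap_score(rng.normal(0.0, 1.0, (100, 384)),
                       rng.normal(0.5, 1.0, (100, 384))))
```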