Vision encoders pretrained with captioning can match or even outperform vision encoders trained contrastively when finetuned downstream! Check out our latest study led by @mtschannen