What You See is What You Read? Improving Text-Image Alignment Evaluation

Google Research, The Hebrew University of Jerusalem
*Equal Contribution

NeurIPS 2023
arXiv · Code · 🤗 Test Dataset · 📄 Train Dataset · 🖼 Train Images


Focusing on image-text alignment, we introduce SeeTRUE, a comprehensive benchmark, together with two effective evaluation methods: a zero-shot VQA-based approach and a model fine-tuned on synthetic data. Both improve performance on alignment evaluation tasks and on text-to-image reordering.
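As a rough illustration of the zero-shot VQA-based idea (derive yes/no questions from the caption, then score the image by how confidently a VQA model answers them), here is a minimal sketch. The `vqa_yes_prob` callable and the mean aggregation are assumptions for illustration only, not the paper's exact formulation or released code.

```python
from typing import Callable, List


def alignment_score(
    image: object,
    questions: List[str],
    vqa_yes_prob: Callable[[object, str], float],
) -> float:
    """Score text-image alignment as the mean probability that a VQA
    model answers 'yes' to caption-derived yes/no questions.

    Illustrative aggregation only; a real system would plug in an
    actual VQA model and question-generation step here.
    """
    if not questions:
        raise ValueError("need at least one question")
    probs = [vqa_yes_prob(image, q) for q in questions]
    return sum(probs) / len(probs)


# Toy stand-in for a VQA model: pretends the image shows "a dog on a beach".
def toy_vqa(image: object, question: str) -> float:
    return 0.9 if ("dog" in question or "beach" in question) else 0.1


score = alignment_score(
    "img.png",
    ["Is there a dog?", "Is the scene on a beach?"],
    toy_vqa,
)
```

With the toy model above, both caption-derived questions are answered "yes" with probability 0.9, so the image receives a high alignment score; a caption mentioning objects the model cannot confirm would drag the mean down.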

BibTeX

@article{yarom2023you,
  title={What You See is What You Read? Improving Text-Image Alignment Evaluation},
  author={Yarom, Michal and Bitton, Yonatan and Changpinyo, Soravit and Aharoni, Roee and Herzig, Jonathan and Lang, Oran and Ofek, Eran and Szpektor, Idan},
  journal={arXiv preprint arXiv:2305.10400},
  year={2023}
}