ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Arizona State University

CVPR 2024

Improving the parameter and data efficiency of text-to-image priors for unCLIP-family models.

Method

CLIP contrastive learning is enough to achieve a SOTA text-to-image prior without a diffusion process. This allows us to train a SOTA prior with only 33M parameters and 0.6M image-text pairs.
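
A minimal PyTorch sketch of this idea: the prior directly regresses the ground-truth CLIP image embedding from the text embedding (a projection loss) and additionally aligns its predictions with the paired text embeddings via a CLIP-style contrastive loss. The function name, loss weight lam, and temperature below are illustrative placeholders, not the paper's exact values.

import torch
import torch.nn.functional as F

def eclipse_loss(pred_img_emb, img_emb, txt_emb, temperature=0.07, lam=0.2):
    # Projection objective: regress the ground-truth CLIP image embedding.
    proj = F.mse_loss(pred_img_emb, img_emb)

    # CLIP-style contrastive objective: align each predicted image embedding
    # with its paired text embedding against in-batch negatives.
    z = F.normalize(pred_img_emb, dim=-1)
    t = F.normalize(txt_emb, dim=-1)
    logits = z @ t.T / temperature  # (B, B) similarity matrix
    labels = torch.arange(z.size(0), device=z.device)
    cls = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

    return proj + lam * cls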

Examples

ECLIPSE (with the Kandinsky v2.2 diffusion image decoder) trained on 5M image-text pairs using only 200 GPU hours.
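
For reference, inference follows the standard two-stage unCLIP flow in Hugging Face diffusers: the prior maps the prompt to a CLIP image embedding, and the Kandinsky v2.2 decoder turns that embedding into pixels. The sketch below uses the stock Kandinsky v2.2 prior and decoder hub ids as placeholders; the released ECLIPSE prior checkpoint would be loaded in place of the stock prior (its exact hub id is not shown here).

import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline

# Stage 1: text prompt -> CLIP image embedding (swap in the ECLIPSE prior here).
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16
).to("cuda")

# Stage 2: image embedding -> pixels via the Kandinsky v2.2 diffusion decoder.
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of an astronaut riding a horse"
image_embeds, negative_image_embeds = prior(prompt).to_tuple()

image = decoder(
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    height=512, width=512,
).images[0]
image.save("eclipse_sample.png")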

Relevant Projects

WOUAF

Weight Modulation for User Attribution and Fingerprinting in T2I Models.

ConceptBed

Evaluating Concept Learning Abilities of T2I Models.