I’ve been on a VQGAN+CLIP craze lately, so here’s a list of all the VQGAN+CLIP implementations I found on the internet. (The 🔰 symbol means it’s suitable even for non-programmers — if you don’t know where to start, start with these.)
| Name | Author | Description / Features |
| --- | --- | --- |
| VQGAN+CLIP (codebook sampling method) | @RiversHaveWings | The original VQGAN+CLIP notebook by Katherine Crowson (@RiversHaveWings). |
| AI Art Machine | @hillelogram | 🔰 A very accessible Colab notebook. Its advanced options are explained at a beginner-friendly level. |
| Create realistic AI-Generated Images with VQGAN+CLIP | @minimaxir | 🔰 Has good UI affordances and more descriptive explanations of the parameters. Has options for deterministic output by using icon-based input/target images. |
| VQGAN+CLIP (with pooling and quantize method) | @ak92501 | Has an optional Gradio demo for a more streamlined experience. |
| Zoetrope 5 | @classpectanon | Has advanced parameters for more controlled AI art generation. I haven’t tried this yet, but it may be good for fleshing out your artwork. |
| VQGAN+CLIP Python command-line interface | @nerdyrodent | Not a Google Colab notebook, but a GitHub repo that you can fork and run locally. Provides a command-line interface to generate AI art on the fly. |
| VQGAN+CLIP (z+quantize method with augmentations) | @somewheresy | Seems to be the first English translation of Katherine Crowson’s (@RiversHaveWings) notebook. |
| CLIPIT PixelDraw | @dribnet | A very interesting fork of the VQGAN+CLIP notebooks that uses PixelDraw to generate pixel art from a prompt. |
| Nightcafe Studio | NightCafe Studio | Not a Colab notebook, but a managed service where you need to set up an account. I can’t comment on how different the outputs are compared to the Colab notebooks. |
| Kapwing AI Video Generator | Kapwing | A web-hosted version of VQGAN+CLIP that generates videos after processing. It’s not as customizable, but the processing time is relatively fast! |
## CLIP-guided art generators
These aren’t necessarily VQGAN implementations, but they can produce AI art nonetheless:
| Name | Author | Description / Features |
| --- | --- | --- |
| The Big Sleep: BigGAN x CLIP | @advadnoun | Uses a CLIP-guided BigGAN generator. I can’t comment on the quality of the outputs, but this is exciting to try as well! |
| Aleph-Image | @advadnoun | Uses a CLIP-guided DALL-E decoder. Try it out for more interesting results! |
| CLIP Guided Diffusion HQ 512x512 | @RiversHaveWings | Uses OpenAI’s 512x512 class-conditional ImageNet diffusion model with CLIP. The resolution is fixed at 512x512, but a 256x256 version is also available. |
The common denominator across these works is that they are all guided by OpenAI’s CLIP so that the generated image matches the text description. For more CLIP-guided projects, check out this Reddit post from February.
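To give a flavor of what “guided by CLIP” means: these generators repeatedly nudge an image (or its latent code) so that its CLIP embedding becomes more similar to the CLIP embedding of the text prompt, typically via gradient ascent on cosine similarity. Here is a toy, self-contained sketch of that optimization loop using plain NumPy vectors in place of real CLIP embeddings — the vectors and step sizes are made up for illustration, not taken from any of the notebooks above:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def clip_style_guidance(image_vec, text_vec, steps=200, lr=0.1):
    """Nudge a stand-in 'image embedding' toward a 'text embedding'
    by gradient ascent on cosine similarity -- the same objective a
    CLIP-guided generator optimizes (there, over image pixels or a
    VQGAN latent, back-propagated through the image encoder)."""
    v = image_vec.astype(float).copy()
    for _ in range(steps):
        nv, nt = np.linalg.norm(v), np.linalg.norm(text_vec)
        cos = (v @ text_vec) / (nv * nt)
        # Analytic gradient of cosine similarity w.r.t. v
        grad = text_vec / (nv * nt) - cos * v / (nv ** 2)
        v += lr * grad
    return v

# Toy demo with random stand-in embeddings
rng = np.random.default_rng(0)
img = rng.normal(size=8)   # pretend CLIP image embedding
txt = rng.normal(size=8)   # pretend CLIP text embedding

before = cosine_sim(img, txt)
after = cosine_sim(clip_style_guidance(img, txt), txt)
print(f"similarity before: {before:.3f}, after: {after:.3f}")
```

In the real notebooks the “image vector” is produced by CLIP’s image encoder from the current canvas, so each gradient step changes the picture itself rather than an abstract vector, but the objective being climbed is the same.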
If you wish to learn more about VQGAN and CLIP, I suggest reading the following:
- Alien Dreams: An Emerging Art Scene by Charlie Snell: gives a good overview and history of the recent AI art scene, tracing its roots from the introduction of CLIP to its pairing with VQGAN today.
- The Illustrated VQGAN: by yours truly; here I tried to explain how VQGAN works at a conceptual level. It starts with how images are “perceived” and ends with the whole VQGAN system.
Of course, nothing beats reading the original papers themselves:
- Esser, P., Rombach, R. and Ommer, B., 2021. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12873-12883).
- Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J. and Krueger, G., 2021. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020.
Did I miss anything? Just comment below!
- 08-22-2021: Added Kapwing and PixelDraw
- 08-21-2021: This blog post was featured in Comet’s newsletter!