I’ve been on a VQGAN+CLIP craze lately, so here’s a list of all the VQGAN+CLIP implementations I found on the internet. (The 🔰 symbol means it’s perfect for non-programmers; if you don’t know where to start, start with these.)

VQGAN+CLIP implementations

| Name | Author | Description / Features |
| --- | --- | --- |
| VQGAN+CLIP (codebook sampling method) | @RiversHaveWings | The original VQGAN+CLIP notebook by Katherine Crowson (@RiversHaveWings). |
| AI Art Machine | @hillelogram | 🔰 Very accessible Colab notebook. Has advanced options that are explained at a beginner-friendly level. |
| Create realistic AI-Generated Images with VQGAN+CLIP | @minimaxir | 🔰 Has good UI affordances and more descriptive explanations of parameters. Has options for deterministic output using icon-based input/target images. |
| VQGAN+CLIP (with pooling and quantize method) | @ak92501 | Has an optional Gradio demo for a more streamlined experience. |
| Zoetrope 5 | @classpectanon | Has advanced parameters for more controlled AI art generation. I haven’t tried this yet, but it may be good for fleshing out your artwork. |
| VQGAN+CLIP Python command-line interface | @nerdyrodent | Not a Google Colab notebook, but a GitHub repo that you can fork and run locally. Provides a command-line interface to generate AI art on the fly. |
| VQGAN+CLIP (z+quantize method with augmentations) | @somewheresy | Seems to be the first English translation of Katherine Crowson’s (@RiversHaveWings) notebook. |
| CLIPIT PixelDraw | @dribnet | A very interesting fork of the VQGAN+CLIP notebooks that uses PixelDraw to generate pixel art from a prompt. |
| Nightcafe Studio | NightCafe Studio | Not a Colab notebook, but a managed service where you need to set up an account. I can’t comment on how different the outputs are compared to the Colab notebooks. |
| Kapwing AI Video Generator | Kapwing | A web-hosted version of VQGAN+CLIP that generates videos after processing. It’s not as customizable, but the processing time is relatively fast! |

CLIP-guided art generators

These aren’t necessarily VQGAN implementations, but can produce AI art nonetheless:

| Name | Author | Description / Features |
| --- | --- | --- |
| The Big Sleep: BigGAN x CLIP | @advadnoun | Uses a CLIP-guided BigGAN generator. I can’t comment on the quality of the outputs, but this is exciting to try as well! |
| Aleph-Image | @advadnoun | Uses a CLIP-guided DALL-E decoder. Try it out for more interesting results! |
| CLIP Guided Diffusion HQ 512x512 | @RiversHaveWings | Uses OpenAI’s 512x512 class-conditional ImageNet diffusion model with CLIP. It is fixed at 512x512, but a 256x256 version also exists. |

The common denominator across these works is that they are guided by OpenAI’s CLIP so that the image matches the text description. For more CLIP-guided projects, check out this Reddit post from February.
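In all of these, the guidance boils down to the same loop: iteratively nudge the generator’s latent so that the generated image’s CLIP embedding moves closer (in cosine similarity) to the text prompt’s embedding. Here’s a minimal sketch of that idea with toy stand-ins for CLIP and the generator — a random vector for the text embedding and a random linear map for generator-plus-image-encoder, not the real models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not the real models): a random "CLIP text
# embedding", and a random linear map W playing the role of
# generator + CLIP image encoder, mapping a latent z to an image embedding.
dim_z, dim_e = 4, 8
W = rng.normal(size=(dim_e, dim_z))
text_emb = rng.normal(size=dim_e)

def cosine(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

z = rng.normal(size=dim_z)  # the latent we optimize
lr = 0.5
history = []
for _ in range(300):
    img_emb = W @ z
    history.append(cosine(img_emb, text_emb))
    # Analytic gradient of cosine similarity w.r.t. the image embedding,
    # chained back through W to the latent.
    ne, nt = np.linalg.norm(img_emb), np.linalg.norm(text_emb)
    grad_e = text_emb / (ne * nt) - img_emb * (img_emb @ text_emb) / (ne**3 * nt)
    z = z + lr * (W.T @ grad_e)  # gradient ascent on the similarity

print(f"similarity: {history[0]:.3f} -> {history[-1]:.3f}")
```

The real notebooks do the same thing with autograd instead of a hand-derived gradient, a frozen CLIP instead of a random matrix, and VQGAN (or BigGAN, or a diffusion model) instead of `W` — which is exactly why CLIP slots into so many different generators.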


If you wish to learn more about VQGAN and CLIP, I suggest reading the following:

  • Alien Dreams: An Emerging Art Scene by Charlie Snell: gives a good overview and history of the recent AI art scene, tracing its roots from the introduction of CLIP to its pairing with VQGAN today.
  • The Illustrated VQGAN: by yours truly, where I tried to explain how VQGAN works at a conceptual level. It starts with how images are “perceived” and ends with the whole VQGAN system.
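To give a quick flavor of the “VQ” in VQGAN: the encoder’s continuous latent vectors get snapped to their nearest entries in a learned codebook, turning an image into a grid of discrete tokens that a transformer can then model. A toy numpy sketch of just that quantization step (random codebook and tiny dimensions for illustration, not the real model’s sizes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy codebook: 8 embedding vectors of dimension 4. (Real VQGAN codebooks
# are far larger -- on the order of thousands of entries.)
codebook = rng.normal(size=(8, 4))

def quantize(latents, codebook):
    """Replace each latent vector with its nearest codebook entry."""
    # Pairwise squared distances between latents (N, d) and codebook (K, d).
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = d2.argmin(axis=1)        # discrete token ids, shape (N,)
    return codebook[codes], codes

latents = rng.normal(size=(5, 4))    # e.g. a flattened 5-position latent grid
quantized, codes = quantize(latents, codebook)
print(codes)                         # the discrete sequence a transformer models
```

The quantized vectors go to the decoder to reconstruct the image, while the integer `codes` are what the transformer half of VQGAN learns to predict.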

Of course, nothing beats reading the original papers themselves:

  1. Esser, P., Rombach, R. and Ommer, B., 2021. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12873-12883).
  2. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J. and Krueger, G., 2021. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020.

Did I miss anything? Just comment below!


  • 08-22-2021: Added Kapwing and PixelDraw
  • 08-21-2021: This blog post was featured in Comet’s newsletter!