I’ve been on a VQGAN+CLIP craze lately, so here’s a list of all the VQGAN+CLIP implementations I found on the internet. (The 🔰 symbol means it’s suitable even for non-programmers — if you don’t know where to start, start with these.)
| Name | Author | Description / Features |
| --- | --- | --- |
| VQGAN+CLIP (codebook sampling method) | @RiversHaveWings | The original VQGAN+CLIP notebook by Katherine Crowson (@RiversHaveWings). |
| AI Art Machine | @hillelogram | 🔰 A very accessible Colab notebook. Its advanced options are explained at a beginner-friendly level. |
| Create realistic AI-Generated Images with VQGAN+CLIP | @minimaxir | 🔰 Has good UI affordances and more descriptive explanations of the parameters. Has options for deterministic output by using icon-based input/target images. |
| VQGAN+CLIP (with pooling and quantize method) | @ak92501 | Has an optional Gradio demo for a more streamlined experience. |
| Zoetrope 5 | @classpectanon | Has advanced parameters for more controlled AI art generation. I haven’t tried this yet, but it may be good for fleshing out your artwork. |
| VQGAN+CLIP Python command-line interface | @nerdyrodent | Not a Google Colab notebook, but a GitHub repo that you can fork and run locally. Provides a command-line interface to generate AI art on the fly. |
| VQGAN+CLIP (z+quantize method with augmentations) | @somewheresy | Seems to be the first English translation of Katherine Crowson’s (@RiversHaveWings) notebook. |
| CLIPIT PixelDraw | @dribnet | A very interesting fork of the VQGAN+CLIP notebooks that uses PixelDraw to generate pixel art from a prompt. |
| Nightcafe Studio | NightCafe Studio | Not a Colab notebook, but a managed service where you need to set up an account. I can’t comment on how different the outputs are compared to the Colab notebooks. |
| Kapwing AI Video Generator | Kapwing | A web-hosted version of VQGAN+CLIP that generates videos after processing. It’s not as customizable, but the processing time is relatively fast! |
## CLIP-guided art generators
These aren’t necessarily VQGAN implementations, but they can produce AI art nonetheless:
| Name | Author | Description / Features |
| --- | --- | --- |
| The Big Sleep: BigGAN x CLIP | @advadnoun | Uses a CLIP-guided BigGAN generator. I can’t comment on the quality of the outputs, but this is exciting to try as well! |
| Aleph-Image | @advadnoun | Uses a CLIP-guided DALL-E decoder. Try it out for more interesting results! |
| CLIP Guided Diffusion HQ 512x512 | @RiversHaveWings | Uses OpenAI’s 512x512 class-conditional ImageNet diffusion model with CLIP. The resolution is fixed at 512x512, but a 256x256 version is also available. |
The common denominator across these works is that they are all guided by OpenAI’s CLIP so that the generated image matches the text description. For more CLIP-guided projects, check out this Reddit post from February.
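To give a flavor of what “guided by CLIP” means: these generators repeatedly nudge an image (or its latent code) so that its CLIP embedding becomes more similar to the CLIP embedding of the text prompt, typically via gradient ascent on cosine similarity. Here is a toy, self-contained sketch of that optimization loop using plain NumPy vectors in place of real CLIP embeddings — the vectors and step sizes are made up for illustration, not taken from any of the notebooks above:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def clip_style_guidance(image_vec, text_vec, steps=200, lr=0.1):
    """Nudge a stand-in 'image embedding' toward a 'text embedding'
    by gradient ascent on cosine similarity -- the same objective a
    CLIP-guided generator optimizes (there, over image pixels or a
    VQGAN latent, back-propagated through the image encoder)."""
    v = image_vec.astype(float).copy()
    for _ in range(steps):
        nv, nt = np.linalg.norm(v), np.linalg.norm(text_vec)
        cos = (v @ text_vec) / (nv * nt)
        # Analytic gradient of cosine similarity w.r.t. v
        grad = text_vec / (nv * nt) - cos * v / (nv ** 2)
        v += lr * grad
    return v

# Toy demo with random stand-in embeddings
rng = np.random.default_rng(0)
img = rng.normal(size=8)   # pretend CLIP image embedding
txt = rng.normal(size=8)   # pretend CLIP text embedding

before = cosine_sim(img, txt)
after = cosine_sim(clip_style_guidance(img, txt), txt)
print(f"similarity before: {before:.3f}, after: {after:.3f}")
```

In the real notebooks the “image vector” is produced by CLIP’s image encoder from the current canvas, so each gradient step changes the picture itself rather than an abstract vector, but the objective being climbed is the same.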
If you wish to learn more about VQGAN and CLIP, I suggest reading the following:
- Alien Dreams: An Emerging Art Scene by Charlie Snell: gives a good overview and history of the recent AI art scene, tracing its roots from the introduction of CLIP to its pairing with VQGAN today.
- The Illustrated VQGAN: by yours truly; here I tried to explain how VQGAN works at a conceptual level. It starts with how images are “perceived” and ends with the whole VQGAN system.
Of course, nothing beats reading the original papers themselves:
- Esser, P., Rombach, R. and Ommer, B., 2021. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12873-12883).
- Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J. and Krueger, G., 2021. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020.
Did I miss anything? Just comment below!
- 08-22-2021: Added Kapwing and PixelDraw
- 08-21-2021: This blog post was featured in Comet’s newsletter!