List of VQGAN+CLIP Implementations
I’ve been on a VQGAN+CLIP craze lately, so here’s a list of all the VQGAN+CLIP implementations I’ve found on the internet. (The 🔰 symbol marks implementations that are friendly to non-programmers; if you don’t know where to start, start with these.)
VQGAN+CLIP implementations
Name | Author | Description / Features |
---|---|---|
VQGAN+CLIP (codebook sampling method) | @RiversHaveWings | The original VQGAN+CLIP notebook by Katherine Crowson (@RiversHaveWings). |
AI Art Machine | @hillelogram | 🔰 Very accessible Colab notebook. Has advanced options that are explained at a beginner-friendly level. |
Create realistic AI-Generated Images with VQGAN+CLIP | @minimaxir | 🔰 Has good UI affordances and more descriptive explanations of parameters. Has options for deterministic output by using icon-based input/target images. |
VQGAN+CLIP (with pooling and quantize method) | @ak92501 | Has an optional Gradio demo for a more streamlined experience. |
Zoetrope 5 | @classpectanon | Has advanced parameters for more controlled AI art generation. I haven’t tried this yet, but it may be good for fleshing out your artwork. |
VQGAN+CLIP Python command-line interface | @nerdyrodent | Not a Google Colab notebook, but a GitHub repo that you can fork and run locally. Provides a command-line interface to generate AI art on the fly. |
VQGAN+CLIP (z+quantize method with augmentations) | @somewheresy | Appears to be the first English translation of Katherine Crowson’s (@RiversHaveWings) original notebook. |
CLIPIT PixelDraw | @dribnet | A very interesting fork of the VQGAN+CLIP notebooks that uses PixelDraw to generate pixel art given a prompt. |
Nightcafe Studio | NightCafe Studio | Not a Colab notebook, but rather a managed service where you need to set up an account. I can’t comment on how different the outputs are compared to the Colab notebooks. |
Kapwing AI Video Generator | Kapwing | A web-hosted version of VQGAN+CLIP that generates videos after processing. It’s not as customizable, but the processing time is relatively fast! |
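If you’re curious what these notebooks share under the hood, here is a heavily simplified sketch of the common optimization loop: decode a latent grid into an image, score it against the prompt with CLIP, and nudge the latents to increase the similarity. The CLIP calls follow OpenAI’s released clip package, but the decoder, latent shapes, and prompt are hypothetical stand-ins for the actual pretrained VQGAN from taming-transformers, so treat this as an illustration rather than a working generator:

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load CLIP and freeze it; only the latent grid below gets optimized.
model, preprocess = clip.load("ViT-B/32", device=device)
model = model.float().eval()
for p in model.parameters():
    p.requires_grad_(False)

# Encode the prompt once; it stays fixed during optimization.
# The prompt is just an example.
text = clip.tokenize(["a watercolor painting of a lighthouse"]).to(device)
with torch.no_grad():
    text_feat = model.encode_text(text).float()
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Hypothetical stand-in for a pretrained VQGAN decoder: in the real
# notebooks this is taming-transformers' VQModel, and `z` is its latent
# grid. A random upsampling conv keeps the sketch runnable end to end.
decoder = torch.nn.Sequential(
    torch.nn.Upsample(scale_factor=14, mode="nearest"),  # 16x16 -> 224x224
    torch.nn.Conv2d(256, 3, kernel_size=3, padding=1),
    torch.nn.Sigmoid(),  # map pixels to [0, 1]
).to(device)
decoder.requires_grad_(False)
z = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=0.1)

# CLIP's published input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(200):
    image = decoder(z)  # latents -> 224x224 RGB image
    img_feat = model.encode_image((image - mean) / std)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)

    # Maximize cosine similarity between the decoded image and the prompt.
    loss = -(img_feat * text_feat).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The real notebooks add a lot on top of this (random crops and other augmentations before scoring, multiple prompts, codebook quantization of `z`), but the decode-score-update loop is the skeleton they all share.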
CLIP-guided art generators
These aren’t necessarily VQGAN implementations, but they can produce AI art nonetheless:
Name | Author | Description / Features |
---|---|---|
The Big Sleep: BigGAN x CLIP | @advadnoun | Uses a CLIP-guided BigGAN generator. I can’t comment on the quality of the outputs, but this is exciting to try as well! |
Aleph-Image | @advadnoun | Uses a CLIP-guided DALL-E decoder. Try it out for more interesting results! |
CLIP Guided Diffusion HQ 512x512 | @RiversHaveWings | Uses OpenAI’s 512x512 class-conditional ImageNet diffusion model with CLIP. The output size is fixed at 512x512, but a 256x256 version is also available. |
The common denominator across these works is that they are guided by OpenAI’s CLIP so that the image matches the text description. For more CLIP-guided projects, check out this Reddit post from February.
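To make that common denominator concrete, here’s how CLIP scores an image against candidate text descriptions, adapted from the usage example in OpenAI’s CLIP repository (the image filename and the prompts are just placeholders):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "generated.png" is a placeholder for any image you want to score.
image = preprocess(Image.open("generated.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a sunset over the ocean", "a city at night"]).to(device)

with torch.no_grad():
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

# The guided generators iteratively push the image toward the prompt
# with the highest similarity; here we only print the scores.
print(probs)
```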
Resources
If you wish to learn more about VQGAN and CLIP, I suggest reading the following:
- Alien Dreams: An Emerging Art Scene by Charlie Snell: gives a good overview and history of the recent AI art scene, tracing its roots from the introduction of CLIP to its pairing with VQGAN today.
- The Illustrated VQGAN: by yours truly; here I tried to explain how VQGAN works at a conceptual level, starting with how images are “perceived” and ending with the whole VQGAN system (a toy sketch of its core codebook lookup follows this list).
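Since that post centers on vector quantization, here is the toy sketch promised above: the encoder’s continuous latents are snapped to their nearest entries in a learned codebook before decoding. The codebook here is random and all shapes are illustrative, not VQGAN’s actual configuration:

```python
import torch

# Hypothetical toy codebook: 512 entries of dimension 256, standing in
# for VQGAN's learned codebook.
codebook = torch.randn(512, 256)

def quantize(z: torch.Tensor) -> torch.Tensor:
    """Snap each latent vector to its nearest codebook entry.

    z: (N, 256) encoder outputs; returns (N, 256) quantized vectors.
    """
    dists = torch.cdist(z, codebook)  # (N, 512) pairwise distances
    indices = dists.argmin(dim=1)     # nearest entry per latent
    return codebook[indices]

# A 16x16 grid of latents, flattened to (256, 256), as the decoder
# would receive it.
z = torch.randn(16 * 16, 256)
z_q = quantize(z)
print(z_q.shape)  # torch.Size([256, 256])
```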
Of course, nothing beats reading the original papers themselves:
- Esser, P., Rombach, R. and Ommer, B., 2021. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12873-12883).
- Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J. and Krueger, G., 2021. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020.
Did I miss anything? Just comment below!
Changelog
- 08-22-2021: Added Kapwing and PixelDraw
- 08-21-2021: This blogpost was featured in Comet’s newsletter!