Long-form text-to-images generation, using a pipeline of deep generative models (GPT-3 and Stable Diffusion)
e.g. story -> Stable Diffusion -> illustrations
Right now, Stable Diffusion can only take in a short prompt. What if you want to illustrate a full story? Cue Long Stable Diffusion, a pipeline of generative models to do just that with just a bash script!
Yep! We just published Never Hire a Herd of Goats to Mow your Lawn, an AI-generated story illustrated by this repo.
Prompts are phrased in "prompt-English", e.g. with suffixes like trending on art station, for better results. The images and their prompts are then dumped into a .docx, for easy copy-pasting.

I made this to automate myself, i.e. prompt AI for illustrations to accompany AI-generated stories, for the Stories by AI podcast. Come check us out! And please suggest ways to improve: comments and pull requests are always welcome 😃
This was also just a weekend hackathon project to reward myself for doing a lot of work the past couple of months, and for feeling guilty about not using my wonderful and beautiful Titan RTXs to their full potential.
This bash script runs what you need. It assumes 2 GPUs with 24GB memory each. See the note above, under Steps, to change this assumption for your compute needs. I had too much fun with multiprocessing and making it faster.
bash run.sh -f three_little_pigs
To run your own text, replace three_little_pigs with the name of your new .txt file, placed in the texts/ folder.
bash run.sh -f <name_of_txtfile_in_texts_dir>
export OPENAI_TOKEN=<your_token>
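As a rough sketch of what these two commands imply, the snippet below maps the name passed via -f to a file under texts/ and reads the token from the OPENAI_TOKEN environment variable. The helper names resolve_text_path and get_openai_token are hypothetical and only for illustration; run.py may handle both differently.

```python
import os
from pathlib import Path


def resolve_text_path(name, texts_dir="texts"):
    """Map a bare name such as 'three_little_pigs' to texts/three_little_pigs.txt."""
    return Path(texts_dir) / f"{name}.txt"


def get_openai_token(env=None):
    """Read the token from OPENAI_TOKEN, failing early if it is missing."""
    env = os.environ if env is None else env
    token = env.get("OPENAI_TOKEN")
    if not token:
        raise RuntimeError("OPENAI_TOKEN is not set; run: export OPENAI_TOKEN=<your_token>")
    return token


if __name__ == "__main__":
    print(resolve_text_path("three_little_pigs"))
```

Failing early on a missing token avoids discovering the problem only after the script has started.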
Put your text in a .txt file in the texts/ folder.

Currently two methods for generating the image prompts from text are supported. One splits the .txt file into smaller chronological bits of text, and then generates an image prompt for each bit of text.

Additional methods yet to be implemented include summarizing the .txt file, then prompting GPT-3 to generate image prompts from the summary.

Currently one type of output is supported: a .docx file with the images and their prompts. Additional output formats are yet to be implemented.
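To illustrate the chunking method mentioned earlier (splitting a text into smaller chronological bits), here is a minimal sketch. It assumes paragraphs are separated by blank lines and groups them in order under a word budget. The function name split_chronologically and the max_words parameter are hypothetical; the repo's actual splitting rule may be different.

```python
def split_chronologically(text, max_words=200):
    """Group paragraphs, in their original order, into chunks of at most
    roughly max_words words. A single oversized paragraph becomes its own chunk."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, count = [], [], 0
    for paragraph in paragraphs:
        n_words = len(paragraph.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += n_words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each returned chunk could then be sent to GPT-3 for one image prompt, which keeps the illustrations in story order.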
run_two_gpus.sh: This is the main entry script into the program to parallelize across GPUs easily.
run.py: Where most of the magic happens: getting image prompts from GPT-3, making images from those prompts (using Stable Diffusion, multithreading), saving all those and also dumping those images and prompts to a docx file. This is what run_two_gpus.sh calls.
stable_diffusion.py: Just runs Stable Diffusion if you want to use it by itself (I do). run.py calls it.
dump_docx.py: Just dumps image prompts and images into a single docx for a particular text. Again, it’s useful if you want to use it by itself on the saved images and prompts. I do, because I’m actually overwriting the file when multiprocessing and sometimes will just use this as a postprocessing step. Yes, you can join those and change that, but I don’t really care, since sometimes my GPUs misbehave and I’ll need to rerun it anyway.
texts/: Folder to put your texts in, as a .txt file.
image_prompts/: Image prompts generated by GPT-3 based on your text.
images/: Images generated by Stable Diffusion based on GPT-3’s image prompts.
docx/: Microsoft Word document for a text with images and their prompts all in one.
clean_lexica.py: Preprocessing step for Stable Diffusion prompts from Lexica: cleans up the prompts and puts them into a single file.
effective_prompts_fs.txt: Effective “prompt-English” to use for few-shot translation from English GPT-3 prompts to prompt-English (1884 tokens).
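The few-shot translation step can be pictured as prepending the contents of effective_prompts_fs.txt to a new English idea and letting GPT-3 complete the prompt-English version. The sketch below only builds that prompt string. The "English:" and "Prompt-English:" labels and the function name build_few_shot_prompt are assumptions for illustration, not the format the repo necessarily uses.

```python
def build_few_shot_prompt(few_shot_examples, idea):
    """Combine few-shot examples with a new English idea, leaving the
    final 'Prompt-English:' line open for the language model to complete."""
    return (
        f"{few_shot_examples.strip()}\n\n"
        f"English: {idea.strip()}\n"
        "Prompt-English:"
    )


if __name__ == "__main__":
    # Hypothetical example text; the real file holds the repo's own examples.
    examples = (
        "English: a goat on a lawn\n"
        "Prompt-English: a goat on a lawn, trending on art station"
    )
    print(build_few_shot_prompt(examples, "three pigs building a house"))
```

The resulting string would be sent to GPT-3 as the completion prompt, and the model's continuation used as the Stable Diffusion prompt.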
Multi-processing is optimized for 2 Titan RTXs, with 24GB RAM each. Changing the number of GPUs to parallelize on is a simple edit in run_two_gpus.sh: just copy the first line and change CUDA_VISIBLE_DEVICES to the appropriate GPU id.
Changing the number of processes for each GPU is an argument that can be passed in through run_two_gpus.sh as -n <num_processes_per_gpu> for each run. This is an int used in run.py. I’ve found that my GPUs can handle 3, but are happier with 2.
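One way to picture the GPU and process layout described above is as a set of (GPU id, process index) workers, each pinned to its GPU through CUDA_VISIBLE_DEVICES and given a share of the image prompts. The sketch below is an assumption about how such a split could work, with hypothetical helper names assign_prompts and worker_env. It is not the logic in run_two_gpus.sh or run.py.

```python
import os


def assign_prompts(prompts, gpu_ids, procs_per_gpu):
    """Distribute prompts round-robin over (gpu_id, process_index) workers."""
    workers = [(gpu, proc) for gpu in gpu_ids for proc in range(procs_per_gpu)]
    assignment = {worker: [] for worker in workers}
    for i, prompt in enumerate(prompts):
        assignment[workers[i % len(workers)]].append(prompt)
    return assignment


def worker_env(gpu_id, base_env=None):
    """Copy the environment and pin the worker to one GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


if __name__ == "__main__":
    # 2 GPUs with 2 processes each, matching the setup the author prefers.
    plan = assign_prompts([f"prompt {i}" for i in range(5)], gpu_ids=[0, 1], procs_per_gpu=2)
    for (gpu, proc), items in plan.items():
        print(gpu, proc, items)
```

An environment built by worker_env could be passed as the env argument of subprocess.Popen when launching each worker.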