Running Stable Diffusion locally on Windows

Published 26.08.2022 • Last modified 24.09.2023

Stable Diffusion samples

ATTENTION: there are now several forks of SD that offer web GUIs and can be installed on Windows in one click (most notably sd-webgui). If you want to run Stable Diffusion programmatically and don’t mind some extra steps, read on.

Prerequisites

To run Stable Diffusion locally on Windows, you will need an NVIDIA GPU with at least 8 GB of VRAM and the latest Game Ready drivers. We will use WSL2 (Windows Subsystem for Linux) to run Stable Diffusion.

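On up-to-date Windows 10 and Windows 11 installations, WSL2 can be installed by running this command in a Command Prompt or PowerShell window opened as administrator (restart your PC when it finishes):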

wsl --install

Getting started

After the installation is complete, you should have an Ubuntu installation, which you can enter by typing wsl into the command prompt.
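If you are ever unsure whether a Python session is running inside WSL or on Windows itself, a rough check is to look at the kernel release string. The sketch below assumes that WSL kernels report a release containing "microsoft" (for example 5.15.x-microsoft-standard-WSL2); this is a heuristic based on how the kernels are named, not an official API, and running_in_wsl is a hypothetical helper name.

```python
import platform


def running_in_wsl() -> bool:
    """Heuristic: WSL kernels report a release string containing 'microsoft'.

    Assumption based on WSL kernel naming, not an official API.
    """
    return "microsoft" in platform.uname().release.lower()


if __name__ == "__main__":
    print("Inside WSL:", running_in_wsl())
```

Run it with python3 from the Ubuntu shell; on Windows itself the release string is just the Windows version, so the check is False there.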

Now we’re ready to install the following dependencies:

sudo apt update
sudo apt upgrade
sudo apt install git-lfs python3-pip

# PyTorch with CUDA support
pip install torch --extra-index-url https://download.pytorch.org/whl/cu116

pip install diffusers==0.3.0 transformers scipy ftfy

It’s time to write some Python. Create a directory called stable_diffusion in your home directory, with an empty main.py inside it:

# wsl starts in the current Windows folder, so switch to the Linux home first
cd ~
mkdir stable_diffusion
cd stable_diffusion
touch main.py

You can use any text editor, but I strongly recommend VS Code with its Remote - WSL extension (since renamed to simply WSL), which lets you open folders inside the Ubuntu installation and install extensions there. Installing the Python extension in WSL gives you autocomplete and syntax highlighting. Then add the following to main.py:

import torch
from diffusers import StableDiffusionPipeline

# get your token at https://huggingface.co/settings/tokens
token = "TOKEN"

# revision="fp16" downloads the half-precision weights,
# torch_dtype=torch.float16 loads them as 16-bit floats
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=token,
)

# move the model to the GPU
pipe = pipe.to("cuda")

You will need a Hugging Face account, since the model is downloaded from Hugging Face. Create an access token and paste it in place of TOKEN.
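If you would rather not hard-code the token in main.py, one option is to read it from an environment variable. This is a minimal sketch; the variable name HF_TOKEN and the helper get_hf_token are choices made for this example, not something the script above requires.

```python
import os


def get_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Read a Hugging Face access token from an environment variable.

    HF_TOKEN is the name assumed in this sketch; set it in the WSL shell
    with `export HF_TOKEN=...` before running the script.
    """
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; create a token at "
            "https://huggingface.co/settings/tokens and export it."
        )
    return token
```

In main.py you would then write token = get_hf_token() instead of token = "TOKEN", and pass it to from_pretrained as before.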

We set revision="fp16" to download the half-precision weights and torch_dtype=torch.float16 to load them as 16-bit floats. This saves VRAM and lets the model run on GPUs with less than 10 GB of VRAM, with little to no visible effect on the output images. If your GPU has enough VRAM, you can leave both arguments out.
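A back-of-the-envelope check of why half precision helps: the memory needed for the weights is the parameter count times the bytes per parameter (4 for float32, 2 for float16). The parameter counts below are approximate round figures for Stable Diffusion v1 (about 860M for the UNet, 123M for the text encoder and 84M for the VAE) and are assumptions for illustration. The module_bytes helper measures the real figure on a loaded model, e.g. module_bytes(pipe.unet). Activations during sampling need VRAM on top of this.

```python
# Approximate parameter counts for Stable Diffusion v1 components
# (assumed round figures for illustration, not measured here).
APPROX_PARAMS = {
    "unet": 860_000_000,
    "text_encoder": 123_000_000,
    "vae": 84_000_000,
}

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weights_gib(num_params: int, dtype: str) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


def module_bytes(module) -> int:
    """Actual weight size of a loaded torch.nn.Module, e.g. pipe.unet.

    Sums numel() * element_size() over all of the module's parameters.
    """
    return sum(p.numel() * p.element_size() for p in module.parameters())


total = sum(APPROX_PARAMS.values())
for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weights_gib(total, dtype):.2f} GiB of weights")
```

With these assumed counts, that works out to roughly 4 GiB of weights in float32 versus roughly 2 GiB in float16.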

With the token in place, run the code above with python3 main.py. If it throws HTTP error 403, you need to accept the model license on the CompVis/stable-diffusion-v1-4 page on Hugging Face.

It should now download the model from Hugging Face and exit. If it doesn’t, make sure that you have the right pip packages installed and a valid token. The download is cached, so later runs won’t repeat it.
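To check the pip packages, a small sketch using the standard library's importlib.metadata (Python 3.8 and later) can list which of the packages from the install step are present and at which version; diffusers should report 0.3.0. installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed in the dependency step above
REQUIRED = ["torch", "diffusers", "transformers", "scipy", "ftfy"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:14} {ver or 'NOT INSTALLED'}")
```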

On my installation of Windows 11 with the latest GeForce drivers, CUDA works out of the box without any extra installation, but this might not be the case for you. If a CUDA error is thrown, install CUDA following NVIDIA's CUDA on WSL documentation (Option 1). It installs CUDA 11.7, which should be backwards compatible with the CUDA 11.6 build of PyTorch installed above.
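Before digging into CUDA errors, it can help to confirm what PyTorch itself sees. The sketch below uses only standard PyTorch calls (torch.cuda.is_available, torch.cuda.get_device_properties, torch.version.cuda); the cuda_report function and the shape of its output are our own choices. It also shows the GPU's total VRAM, which you can compare against the 8 GB minimum.

```python
def cuda_report() -> dict:
    """Summarize whether PyTorch can see a CUDA GPU inside WSL."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds)
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        report["gpu_name"] = props.name
        # total_memory is reported in bytes
        report["vram_gib"] = round(props.total_memory / 1024**3, 1)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False while torch is installed, the problem is the driver or CUDA setup rather than your Stable Diffusion code.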

Now we’re finally ready to generate some images.

import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

# get your token at https://huggingface.co/settings/tokens
token = "TOKEN"
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=token,
)
pipe = pipe.to("cuda")

prompt = "a photograph of an astronaut riding a horse"

# autocast runs the pipeline in mixed precision on the GPU
with autocast("cuda"):
    output = pipe(prompt)

# the pipeline returns a list of PIL images; take the first one
image = output.images[0]
# name the file after the prompt: spaces become underscores, commas are removed
file = prompt.replace(" ", "_").replace(",", "")
image.save(f"{file}.png")

The code above generates one 512x512 image from the prompt and saves it as a PNG named after the prompt. Depending on how much VRAM your GPU has, it might throw RuntimeError: CUDA out of memory. With 8 GB of VRAM you should be able to generate at least a 512x512 image; if you run out of memory, try closing other GPU-intensive applications.
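The filename is derived from the prompt by replacing spaces and dropping commas, so characters such as / or ? in a prompt would still produce a broken path. A slightly more defensive version might look like the sketch below; prompt_to_filename and its max_len limit are hypothetical, not part of the original script.

```python
import re


def prompt_to_filename(prompt: str, max_len: int = 100) -> str:
    """Turn a prompt into a filesystem-friendly base name (hypothetical helper).

    Like the script above, spaces become underscores and commas are dropped;
    in addition, any character other than letters, digits, '_', '-' and '.'
    is removed so that characters such as '/' or '?' cannot break the path.
    """
    name = prompt.strip().replace(" ", "_").replace(",", "")
    name = re.sub(r"[^A-Za-z0-9_.-]", "", name)
    # very long prompts could exceed filename length limits, so truncate
    return name[:max_len] or "image"
```

It would be used as image.save(f"{prompt_to_filename(prompt)}.png").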

Collage of generated artwork inspired by Firewatch

"Beautiful landscape with mountains, inspired by Firewatch artwork, sunset"

Each image takes about 10 seconds to generate on my GPU, but more time is spent loading the model into memory than generating the images. Since the pipeline can be reused for multiple images, the fix is to load pipe once and keep it in memory.
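Outside VS Code, the same idea works in a plain script: load pipe once, then loop over a list of prompts. In this sketch the loop lives in generate_all, a hypothetical helper that only assumes pipe(prompt) returns an object with an .images list, as StableDiffusionPipeline does; main() loads the model with the same from_pretrained call as above. Keeping the heavy imports inside main() is a design choice so the loop can be tested without a GPU.

```python
from pathlib import Path


def generate_all(pipe, prompts, out_dir="."):
    """Generate one image per prompt with an already-loaded pipeline.

    The expensive model load happens once, outside this function;
    every prompt reuses the same pipe.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for prompt in prompts:
        image = pipe(prompt).images[0]
        path = out / (prompt.replace(" ", "_").replace(",", "") + ".png")
        image.save(path)
        saved.append(path)
    return saved


def main():
    # Heavy imports live here so generate_all can be used without them
    import torch
    from torch import autocast
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        revision="fp16",
        torch_dtype=torch.float16,
        use_auth_token="TOKEN",  # replace with your Hugging Face token
    ).to("cuda")

    prompts = [
        "a photograph of an astronaut riding a horse",
        "Beautiful landscape with mountains, inspired by Firewatch artwork, sunset",
    ]
    with autocast("cuda"):
        for path in generate_all(pipe, prompts):
            print("saved", path)
```

Add a call to main() at the bottom of main.py to run it; the model is loaded once no matter how many prompts are in the list.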

Improved script

VS Code has a neat trick for splitting a script into cells: each line starting with # %% begins a new cell. Here is the script split into two cells:

# %%
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

# get your token at https://huggingface.co/settings/tokens
token = "TOKEN"
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=token,
)
pipe = pipe.to("cuda")

# %%
prompt = "a photograph of an astronaut riding a horse"
steps = 50    # number of denoising steps
width = 512   # must be a multiple of 8
height = 512  # must be a multiple of 8

file = prompt.replace(" ", "_").replace(",", "")
with autocast("cuda"):
    # generate 4 images one after another, reusing the loaded pipe
    for i in range(4):
        output = pipe(prompt, width=width, height=height, num_inference_steps=steps)
        image = output.images[0]
        image.save(f"{file}-{i}.png")

Assuming you have the Python extension installed (VS Code may prompt you to install the Jupyter extension and the ipykernel package the first time you run a cell), you can run the first cell once to initialize pipe, then run the prompt cell as many times as you want. This is much faster than rerunning the whole file for every prompt. The script generates 4 images sequentially and saves them. You can increase width and height to generate higher-resolution images, as long as both are multiples of 8 and you have enough VRAM.
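Why higher resolutions need more VRAM: Stable Diffusion v1 denoises a 4-channel latent image that is 8 times smaller than the output in each dimension, which is also why width and height must be multiples of 8. The sketch below (latent_shape and relative_size are hypothetical helpers) computes that latent shape and compares the number of latent positions to the 512x512 default. Activation memory grows with this count, and self-attention layers grow faster than linearly with it.

```python
LATENT_CHANNELS = 4  # Stable Diffusion v1 latent channels
DOWNSAMPLE = 8       # VAE downsampling factor per spatial dimension


def latent_shape(width: int, height: int) -> tuple:
    """Shape (channels, height, width) of the latent the UNet denoises."""
    if width % DOWNSAMPLE or height % DOWNSAMPLE:
        raise ValueError(f"width and height must be multiples of {DOWNSAMPLE}")
    return (LATENT_CHANNELS, height // DOWNSAMPLE, width // DOWNSAMPLE)


def relative_size(width: int, height: int, base: int = 512) -> float:
    """Number of latent positions relative to a base x base image."""
    _, h, w = latent_shape(width, height)
    _, bh, bw = latent_shape(base, base)
    return (h * w) / (bh * bw)


for w, h in [(512, 512), (768, 512), (768, 768)]:
    print(f"{w}x{h}: latent {latent_shape(w, h)}, "
          f"{relative_size(w, h):.2f}x the positions of 512x512")
```

For example, a 768x768 image has 2.25 times as many latent positions as a 512x512 one.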

Screenshot of VS Code running in Jupyter Notebook mode

Sources