SD-Latent-Upscaler

★ 168

Latent-space upscaling · Efficient upsampling · Multi-stage chaining · Detail preservation

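Upscales Stable Diffusion latent representations (latents), using a small neural network to raise resolution efficiently while preserving as much detail as possible. Multiple upscaling stages can be chained.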

💡 Upscale latents at high quality within a workflow to raise the final output resolution

🍴 13 Forks · 💻 Python · 🔄 2024-05-22



📄 README

SD-Latent-Upscaler

Upscaling stable diffusion latents using a small neural network.

Very similar to my latent interposer, this small model can be used to upscale latents in a way that doesn’t ruin the image. I mostly explain some of the issues with upscaling latents in this issue. Think of this as an ESRGAN for latents, except severely undertrained.

Currently, SDXL has some minimal hue shift issues. Because of course it does.

Installation

ComfyUI

To install it, clone this repo into your custom_nodes folder by running the following command from your ComfyUI directory:

git clone https://github.com/city96/SD-Latent-Upscaler custom_nodes/SD-Latent-Upscaler

Alternatively, you can download just the comfy_latent_upscaler.py file into your ComfyUI/custom_nodes folder. You may need to install the huggingface-hub package by running pip install huggingface-hub inside your venv.

If you need the model weights for something else, they are hosted on Hugging Face under the same Apache 2.0 license as the rest of the repo.

Auto1111

Currently not supported, but it should be possible to use it in the hires-fix step.

Local models

The node pulls the required files from the Hugging Face Hub by default. If you have a flaky connection or prefer to work completely offline, you can create a models folder and place the model files there; the node will then load them locally instead. The path should be: ComfyUI/custom_nodes/SD-Latent-Upscaler/models

Alternatively, just clone the entire HF repo to it: git clone https://huggingface.co/city96/SD-Latent-Upscaler custom_nodes/SD-Latent-Upscaler/models
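The local-first lookup described above can be sketched roughly as follows. This is a minimal illustration, not the node's actual code: the function name resolve_weights is hypothetical, and the only real library call is hf_hub_download from the huggingface-hub package.

```python
from pathlib import Path

# Hugging Face repo named in this README.
REPO_ID = "city96/SD-Latent-Upscaler"


def resolve_weights(filename, models_dir):
    """Return a path to `filename`, preferring `models_dir` over the hub.

    Hypothetical helper for illustration; the real node may differ.
    """
    local = Path(models_dir) / filename
    if local.is_file():
        # Offline path: the file is already on disk, so no network is needed.
        return str(local)
    # Imported lazily so the offline path works without huggingface-hub.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=filename)
```

Calling it with the models path given above and a file name from the HF repo returns the local copy when one exists, and otherwise downloads the file into the Hugging Face cache.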

Usage/principle

Usage is fairly simple: use it anywhere you would otherwise upscale a latent. If you need a higher scale factor (e.g. x4), simply chain two of the upscalers.
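To make the chaining arithmetic concrete, here is a small sketch of how latent shapes grow across stages. It assumes each stage scales by x2 and uses the usual Stable Diffusion layout of 4 latent channels at 1/8 of the pixel resolution; the function name is made up for this example.

```python
VAE_FACTOR = 8  # assumption: one latent position per 8x8 pixel block


def upscaled_latent_shape(shape, scale=2, stages=1):
    """Shape of a (batch, channels, height, width) latent after chained upscales."""
    batch, channels, height, width = shape
    factor = scale ** stages  # chained stages multiply: 2 * 2 = 4
    return (batch, channels, height * factor, width * factor)


# A 512x512 image corresponds to a 64x64 latent.
latent = (1, 4, 512 // VAE_FACTOR, 512 // VAE_FACTOR)
x4 = upscaled_latent_shape(latent, scale=2, stages=2)

print(x4)                                       # (1, 4, 256, 256)
print(x4[2] * VAE_FACTOR, x4[3] * VAE_FACTOR)   # 2048 2048
```

Two chained x2 stages therefore turn a 512x512 generation into a 2048x2048 one after decoding.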

As part of a workflow – notice how the second stage works despite the low denoise of 0.2. The image remains relatively unchanged.

Training

Upscaler v2.0

I decided to do some more research and change the network architecture altogether. This one is just a bunch of Conv2d layers with an Upsample at the beginning, similar to before, except I reduced the kernel size/padding and added more layers instead.
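The trade-off between fewer large kernels and more small ones can be illustrated with some quick arithmetic. The kernel sizes (5 and 3) and the channel width (64) below are hypothetical values chosen for the example; the README does not state the actual ones.

```python
def conv2d_params(in_ch, out_ch, kernel):
    """Number of weights plus biases in one Conv2d layer."""
    return in_ch * out_ch * kernel * kernel + out_ch


def same_padding(kernel):
    """Padding that preserves height/width for an odd kernel at stride 1."""
    return kernel // 2


def receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


CH = 64  # hypothetical channel width

wide = conv2d_params(CH, CH, 5)        # one 5x5 layer
deep = 2 * conv2d_params(CH, CH, 3)    # two stacked 3x3 layers

print(receptive_field(5, 1), same_padding(5), wide)   # 5 2 102464
print(receptive_field(3, 2), same_padding(3), deep)   # 5 1 73856
```

Under these assumptions, two 3x3 layers cover the same 5x5 neighbourhood as one 5x5 layer with fewer parameters and smaller padding, and they allow an extra nonlinearity in between.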

Trained for 1M iterations on DIV2K + Flickr2K. I changed to AdamW + L1 loss (from SGD and MSE loss) and added a OneCycleLR scheduler.
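For readers unfamiliar with the two losses mentioned above, the following plain-Python sketch shows how they differ. It is only an illustration with made-up numbers; the README does not say why the switch was made, and real training would use the framework's built-in loss functions.

```python
def l1_loss(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse_loss(pred, target):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


target = [0.0, 0.0, 0.0, 0.0]
small_errors = [0.5, 0.5, 0.5, 0.5]   # every value slightly off
one_outlier = [0.0, 0.0, 0.0, 2.0]    # one value far off

print(l1_loss(small_errors, target), mse_loss(small_errors, target))  # 0.5 0.25
print(l1_loss(one_outlier, target), mse_loss(one_outlier, target))    # 0.5 1.0
```

L1 scores both cases the same, while MSE penalizes the single large error four times as heavily as the evenly spread small ones.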

Upscaler v1.0

This version was still relatively undertrained. Mostly a proof-of-concept.

Trained for 1M iterations on DIV2K + Flickr2K.

Loss graphs for v1.0 models

(Left is training loss, right is validation loss.)