🖥️ GPU Scaling Filter

This is a simple filter that reduces the number of GPU layers Ollama uses when it detects that Ollama has crashed, which it infers from an empty response arriving in OpenWebUI. Right now the logic is very basic: it uses static numbers to reduce the GPU layer count. It does not take into account the number of layers in a model, nor does it dynamically monitor VRAM use.
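
The empty-response check described above might look roughly like the following. This is only a sketch, not the filter's actual code: the function name `looks_like_crash` is hypothetical, and the `body` layout (a `messages` list of dicts with `role` and `content` keys) is an assumption about what the filter receives.

```python
from typing import Any


def looks_like_crash(body: dict[str, Any]) -> bool:
    """Return True when the most recent assistant message is empty.

    Hypothetical helper: assumes `body` carries a "messages" list of
    {"role": ..., "content": ...} dicts.
    """
    messages = body.get("messages") or []
    for message in reversed(messages):
        if message.get("role") == "assistant":
            content = message.get("content")
            # Missing, non-string, or whitespace-only content counts as empty.
            return not (isinstance(content, str) and content.strip())
    # No assistant message at all: nothing to judge, so report no crash.
    return False
```

An empty reply is only a heuristic: a model can return nothing for reasons other than a crash, so a check like this can produce false positives.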

There are three settings:

Initial Reduction: Number of layers to immediately set when an Ollama crash is detected. Defaults to 20.
Scaling Step: Number of layers to reduce by on subsequent crashes (down to a minimum of 0, i.e. 100% CPU inference). Defaults to 5.
Show Status: Whether to inform the user that the conversation is running more slowly because of GPU layer downscaling.
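
To make the arithmetic behind the two numeric settings concrete, here is a minimal sketch of how the layer count could evolve across repeated crashes. The names `ScalingSettings` and `next_gpu_layers` are hypothetical and not taken from the filter's source. The sketch also assumes that the resulting number is what would be sent to Ollama as its `num_gpu` option, which this page does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingSettings:
    # Defaults as documented above; the class itself is illustrative.
    initial_reduction: int = 20  # layer count set on the first detected crash
    scaling_step: int = 5        # layers removed on each subsequent crash


def next_gpu_layers(current: Optional[int], settings: ScalingSettings) -> int:
    """Return the GPU layer count to use after a detected crash.

    `current` is None when no reduction has been applied yet.
    """
    if current is None:
        # First crash: jump straight to the configured initial value.
        return max(settings.initial_reduction, 0)
    # Later crashes: step down, never going below 0 (pure CPU inference).
    return max(current - settings.scaling_step, 0)


# Simulate six consecutive crashes with the default settings.
layers: Optional[int] = None
history = []
for _ in range(6):
    layers = next_gpu_layers(layers, ScalingSettings())
    history.append(layers)
print(history)  # [20, 15, 10, 5, 0, 0]
```

With the defaults, the fifth consecutive crash already brings the count to 0, after which every further crash leaves it at 0.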
