LOCAL LLM NODE
A ComfyUI node that connects to your local vLLM server and generates Stable Diffusion prompts on the fly. Wildcard expansion, prefix tags, live preview. No cloud. No API keys. Runs entirely on your hardware.
Installation
cd ComfyUI/custom_nodes
git clone https://github.com/OATH-Studio/comfy-vLLM
Dependency: requests
vllm serve ./models/Qwen2.5-3B \
  --host 0.0.0.0 \
  --port 8765 \
  --served-model-name Qwen2.5-3B
# Find the node under utils/llm
# Wire: vLLM Prompt → CLIPTextEncode → KSampler
Workflow Connection
The single combined_prompt output is already assembled as prefix, generated_text — wire it straight into CLIPTextEncode.
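The assembly itself is plain comma-joined concatenation. A minimal sketch (the combine helper and its edge-case handling are illustrative, not the node's actual code):

```python
def combine(prefix: str, generated_text: str) -> str:
    """Join prefix tags and generated tags into one prompt string.

    Hypothetical helper showing how combined_prompt is assembled;
    the node's real implementation may differ in details.
    """
    prefix = prefix.strip().rstrip(",")
    generated_text = generated_text.strip()
    if not prefix:
        return generated_text
    return f"{prefix}, {generated_text}"

print(combine("masterpiece, best quality", "blue dragon, deep ocean"))
# masterpiece, best quality, blue dragon, deep ocean
```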
Features
Auto Model Detection
Queries /v1/models at runtime. No model name to configure — swap models in vLLM and the node picks it up automatically.
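Under the OpenAI-compatible schema, that query and the "first model wins" extraction look roughly like this (server address taken from the install example; function names are illustrative, and the standard library is used here even though the node itself depends on requests):

```python
import json
import urllib.request

def first_model_id(payload: dict) -> str:
    # OpenAI-compatible shape: {"object": "list", "data": [{"id": ...}, ...]}
    return payload["data"][0]["id"]

def detect_model(base_url: str = "http://localhost:8765") -> str:
    """Query the server's /v1/models endpoint and return the first model id."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        return first_model_id(json.load(resp))
```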
Wildcard Expansion
Use {red|blue|green} syntax anywhere in your prompt. Multiple wildcards resolve independently before hitting the model.
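A minimal sketch of that expansion, assuming single-level (non-nested) groups, each resolved independently:

```python
import random
import re

# Matches one {a|b|c} group; [^{}] also spans newlines, so groups
# may be written across multiple lines as in the template example.
WILDCARD = re.compile(r"\{([^{}]*)\}")

def expand_wildcards(template: str, rng=None) -> str:
    """Replace each {a|b|c} group with one randomly chosen option."""
    rng = rng or random.Random()
    return WILDCARD.sub(lambda m: rng.choice(m.group(1).split("|")), template)

print(expand_wildcards("A {red|blue|green} dragon"))
```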
Prefix Tags
A dedicated prefix field for quality anchors like "masterpiece, best quality". Prepended to output, never sent to the model.
Live Preview
The node face shows a three-section breakdown: prefix, raw generated text, and the final combined string after each run.
Retry Logic
Configurable retry count handles empty responses, timeouts, and transient vLLM errors without failing the whole workflow.
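The retry loop amounts to something like the following sketch; the function name, defaults, and empty-response check are illustrative, not the node's exact implementation:

```python
import time

def generate_with_retry(call, retries: int = 3, delay: float = 1.0) -> str:
    """Invoke a zero-argument generation function, retrying on failure.

    Empty responses and raised exceptions (timeouts, transient server
    errors) both count as retryable. Returns "" after exhausting retries
    instead of raising, so the workflow is not aborted.
    """
    for attempt in range(retries + 1):
        try:
            text = call().strip()
            if text:  # non-empty response: done
                return text
        except Exception:  # timeout or transient vLLM error
            pass
        if attempt < retries:
            time.sleep(delay)
    return ""
```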
Completions Format
Uses structured completion prompts with stop sequences so the model returns tags only — no preamble, no sign-off, no filler.
Wildcard Syntax
Wildcards are resolved before the prompt reaches vLLM. Each run picks a different combination, so you can generate a wide variety of prompts from a single template.
Input Prompt
A {red|blue|green} dragon,
{breathing fire into the sky|
coiled around a mountain peak in a storm|
diving into a glowing ocean abyss|
rearing up against a blood moon}
After Expansion → vLLM
A blue dragon, diving into a glowing ocean abyss
Combined Output
masterpiece, best quality, highres, blue dragon, deep ocean, bioluminescent glow, ancient scales, cinematic underwater light rays, epic fantasy
Node Inputs
Model Guide
Tested with the Qwen2.5 family. The node works with any model vLLM can serve, but instruction-following quality varies significantly with size.
Completion Prompt Format
### Stable Diffusion prompt tags (comma separated, no sentences):
Input: <your expanded prompt>
Output: ← model continues here, stops at first newline
Stop sequences [\n, ###, Input:] prevent the model from running past a single line. If output is still conversational, lower temperature to 0.3–0.5 or upgrade to a 3B+ model.
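Putting the format and stop sequences together, a /v1/completions payload might be built like this (the temperature and max_tokens values are illustrative defaults, not the node's settings):

```python
def build_completion_request(model: str, expanded_prompt: str) -> dict:
    """Assemble an OpenAI-style /v1/completions payload using the
    structured prompt format shown above."""
    prompt = (
        "### Stable Diffusion prompt tags (comma separated, no sentences):\n"
        f"Input: {expanded_prompt}\n"
        "Output:"
    )
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 128,
        "temperature": 0.4,
        # Stop sequences keep the model on a single tag line
        "stop": ["\n", "###", "Input:"],
    }

# Sent with e.g. requests.post("http://localhost:8765/v1/completions",
#                              json=build_completion_request("Qwen2.5-3B", prompt))
# and read back from resp.json()["choices"][0]["text"].
```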
FAQ
Do I need an API key or internet connection?
No. Everything runs locally. The node connects to your vLLM instance over localhost. No data leaves your machine.
Which vLLM version is required?
vLLM 0.4 or later. The node uses the standard OpenAI-compatible /v1/completions endpoint, which has been stable since 0.4.
Why not use the chat completions endpoint?
The completions endpoint with a structured prompt and stop sequences gives more predictable tag-only output from small models. Chat completions work well with larger models, but smaller ones tend to slip into conversational replies in that format.
Can I run multiple models?
The node queries /v1/models and uses the first result. If you run multiple vLLM instances on different ports, add multiple nodes to your workflow — one per port.
Why does the model still return conversational responses?
The model is likely too small. Qwen2.5-0.5B and 1.5B struggle with strict format following. Try 3B or larger, and drop temperature to 0.3–0.5.
Does the node cache responses?
No. IS_CHANGED returns NaN, so ComfyUI re-runs the node every execution. Each generation calls vLLM fresh. If you want deterministic output, lower temperature toward 0.
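The cache-busting hook is a standard ComfyUI pattern. A stripped-down sketch (class name illustrative; the required node attributes such as INPUT_TYPES and RETURN_TYPES are omitted):

```python
class VLLMPromptNode:
    """Skeleton showing only the cache-busting hook."""

    @classmethod
    def IS_CHANGED(cls, **kwargs):
        # NaN never compares equal to itself, so ComfyUI's change check
        # always sees a "changed" node and re-executes it on every run.
        return float("NaN")
```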
Need Custom AI Tooling?
This node is a small example of what we build. We design and develop custom AI pipelines, local inference tooling, ComfyUI integrations, and production workflows for studios and independent creators who want control over their stack.
- Local LLM integration and prompt engineering
- Custom ComfyUI nodes and workflow automation
- vLLM / Ollama deployment and optimisation
- End-to-end AI image and video pipelines
- On-premise — your hardware, your data
Free · Open Source · MIT License
YOUR MODEL. YOUR PROMPTS.
Built by OATH Studio. We make open tools for AI artists and studios, and take on custom development work for teams who need something specific. No cloud dependencies. No subscriptions.