Open Source · ComfyUI Custom Node
$ vllm-prompt-node --version 1.0.0

LOCAL
LLM
NODE

A ComfyUI node that connects to your local vLLM server and generates Stable Diffusion prompts on the fly. Wildcard expansion, prefix tags, live preview. No cloud. No API keys. Runs entirely on your hardware.

Installation

01 · Clone into custom_nodes
cd ComfyUI/custom_nodes
git clone https://github.com/OATH-Studio/comfy-vLLM
02 · Install the dependency
pip install requests
03 · Start vLLM
vllm serve ./models/Qwen2.5-3B \
  --host 0.0.0.0 \
  --port 8765 \
  --served-model-name Qwen2.5-3B
04 · Restart ComfyUI
# Find the node under utils/llm
# Wire: vLLM Prompt → CLIPTextEncode → KSampler

Workflow Connection

vLLM Prompt Node → CLIPTextEncode [positive] → KSampler

The single combined_prompt output is already assembled as prefix, generated_text — wire it straight into CLIPTextEncode.
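As a sketch, the assembly amounts to joining the non-empty parts with a comma (`combine_prompt` is a hypothetical name, not necessarily the node's actual function):

```python
def combine_prompt(prefix: str, generated_text: str) -> str:
    """Assemble the combined_prompt string: prefix first, then model output."""
    # Skip empty parts so a blank prefix doesn't leave a dangling comma.
    parts = [p.strip() for p in (prefix, generated_text) if p and p.strip()]
    return ", ".join(parts)
```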

Features

01 · Auto Model Detection

Queries /v1/models at runtime. No model name to configure — swap models in vLLM and the node picks it up automatically.
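A minimal sketch of that detection, assuming the standard OpenAI-compatible /v1/models response shape (`first_model_id` and `detect_model` are hypothetical names):

```python
def first_model_id(payload: dict) -> str:
    """Pick the first model id from a /v1/models response body."""
    data = payload.get("data", [])
    if not data:
        raise RuntimeError("vLLM reported no served models")
    return data[0]["id"]

def detect_model(host: str = "localhost", port: int = 8765) -> str:
    """Ask the running vLLM server which model it is serving."""
    import requests  # the node's single dependency
    resp = requests.get(f"http://{host}:{port}/v1/models", timeout=5)
    resp.raise_for_status()
    return first_model_id(resp.json())
```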

02 · Wildcard Expansion

Use {red|blue|green} syntax anywhere in your prompt. Multiple wildcards resolve independently before hitting the model.
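The resolution can be sketched with Python's re module, matching innermost groups first so nested wildcards also resolve (`expand_wildcards` is a hypothetical name, not the node's actual function):

```python
import random
import re

# Matches the innermost {a|b|c} group: no nested braces inside the match.
_WILDCARD = re.compile(r"\{([^{}]*)\}")

def expand_wildcards(text: str, rng=None) -> str:
    """Resolve every {a|b|c} choice before the prompt is sent to vLLM."""
    rng = rng or random.Random()
    while True:
        m = _WILDCARD.search(text)
        if m is None:
            return text
        choice = rng.choice(m.group(1).split("|")).strip()
        text = text[: m.start()] + choice + text[m.end():]
```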

03 · Prefix Tags

A dedicated prefix field for quality anchors like "masterpiece, best quality". Prepended to output, never sent to the model.

04 · Live Preview

The node face shows a three-section breakdown: prefix, raw generated text, and the final combined string after each run.

05 · Retry Logic

Configurable retry count handles empty responses, timeouts, and transient vLLM errors without failing the whole workflow.
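A sketch of such a retry loop, treating empty strings and exceptions the same way (`generate_with_retries` is a hypothetical helper, not the node's actual code):

```python
import time

def generate_with_retries(call, retries: int = 3, backoff: float = 1.0) -> str:
    """Invoke call() up to `retries` times, retrying on errors and empty output."""
    last_err = None
    for attempt in range(retries):
        try:
            text = call()
            if text and text.strip():
                return text.strip()
            last_err = ValueError("empty response")
        except Exception as err:  # timeouts, connection errors, transient vLLM failures
            last_err = err
        time.sleep(backoff * (attempt + 1))  # simple linear backoff between attempts
    raise RuntimeError(f"no usable response after {retries} attempts") from last_err
```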

06 · Completions Format

Uses structured completion prompts with stop sequences so the model returns tags only — no preamble, no sign-off, no filler.

Wildcard Syntax

Wildcards are resolved before the prompt reaches vLLM. Each run picks a different combination, so you can generate a wide variety of prompts from a single template.

Input Prompt

A {red|blue|green} dragon,
{breathing fire into the sky|
 coiled around a mountain peak in a storm|
 diving into a glowing ocean abyss|
 rearing up against a blood moon}

After Expansion → vLLM

A blue dragon, diving
into a glowing ocean abyss

Combined Output

masterpiece, best quality, highres,
blue dragon, deep ocean, bioluminescent
glow, ancient scales, cinematic
underwater light rays, epic fantasy

Node Inputs

Input         Type     Default         Description
prompt        STRING   (none)          Instruction sent to vLLM. Supports {wild|card} syntax.
prefix        STRING   masterpiece…    Quality tags prepended to output. Not sent to the model.
host          STRING   localhost       vLLM server host.
port          INT      8765            vLLM server port.
max_tokens    INT      128             Maximum tokens to generate.
temperature   FLOAT    0.7             Sampling temperature. Lower = more consistent.
retries       INT      3               Retry attempts on empty or failed responses.
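These inputs map onto a /v1/completions request roughly like this (a sketch; `build_payload` and `request_completion` are hypothetical names):

```python
def build_payload(prompt: str, model: str, max_tokens: int = 128,
                  temperature: float = 0.7) -> dict:
    """JSON body for POST /v1/completions, mirroring the node inputs above."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stop": ["\n", "###", "Input:"],
    }

def request_completion(host: str, port: int, payload: dict,
                       timeout: float = 30.0) -> str:
    """Send the payload to vLLM and return the generated text."""
    import requests  # the node's single dependency
    resp = requests.post(f"http://{host}:{port}/v1/completions",
                         json=payload, timeout=timeout)
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]
```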

Model Guide

Tested with the Qwen2.5 family. The node works with any model vLLM can serve, but instruction-following quality varies significantly with size.

Qwen2.5-0.5B   Too small — unreliable instruction following
Qwen2.5-1.5B   Usable — occasional filler
Qwen2.5-3B     Recommended — clean output, reliable format
Qwen2.5-32B    Best quality — overkill for most workflows

Completion Prompt Format

### Stable Diffusion prompt tags (comma separated, no sentences):
Input: <your expanded prompt>
Output: ← model continues here, stops at first newline

Stop sequences [\n, ###, Input:] prevent the model from running past a single line. If output is still conversational, lower temperature to 0.3–0.5 or upgrade to a 3B+ model.
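Building that completion prompt is a simple template fill; this sketch assumes the wildcards have already been expanded (`build_completion_prompt` is a hypothetical name):

```python
def build_completion_prompt(expanded_prompt: str) -> str:
    """Wrap the expanded prompt in the completion template; the model
    continues after 'Output:' and the stop sequences cut it off at one line."""
    return (
        "### Stable Diffusion prompt tags (comma separated, no sentences):\n"
        f"Input: {expanded_prompt}\n"
        "Output:"
    )
```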

FAQ

Do I need an API key or internet connection?

No. Everything runs locally. The node connects to your vLLM instance over localhost. No data leaves your machine.

Which vLLM version is required?

vLLM 0.4 or later. The node uses the standard OpenAI-compatible /v1/completions endpoint which has been stable since 0.4.

Why not use the chat completions endpoint?

The completions endpoint with a structured prompt and stop sequences gives more predictable tag-only output from small models. Chat completions work well with larger models, but small models tend to drift into conversational replies when prompted that way.

Can I run multiple models?

The node queries /v1/models and uses the first result. If you run multiple vLLM instances on different ports, add multiple nodes to your workflow — one per port.

Why does the model still return conversational responses?

The model is likely too small. Qwen2.5-0.5B and 1.5B struggle with strict format following. Try 3B or larger, and drop temperature to 0.3–0.5.

Does the node cache responses?

No. IS_CHANGED returns NaN so ComfyUI re-runs the node every execution. Each generation calls vLLM fresh. If you want deterministic output, lower temperature toward 0.
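The trick works because NaN never compares equal to itself, so ComfyUI's change check fires on every run. A minimal sketch (the class name is hypothetical; the real node's name may differ):

```python
import math

class VLLMPromptNode:  # hypothetical class name
    @classmethod
    def IS_CHANGED(cls, *args, **kwargs):
        # NaN != NaN, so ComfyUI always sees a "changed" value and
        # re-executes the node instead of returning a cached result.
        return math.nan
```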

OATH Studio

Need Custom
AI Tooling?

This node is a small example of what we build. We design and develop custom AI pipelines, local inference tooling, ComfyUI integrations, and production workflows for studios and independent creators who want control over their stack.

  • Local LLM integration and prompt engineering
  • Custom ComfyUI nodes and workflow automation
  • vLLM / Ollama deployment and optimisation
  • End-to-end AI image and video pipelines
  • On-premise — your hardware, your data
Get In Touch

This project uses

Runtime            vLLM 0.4+
Endpoint           /v1/completions
Model detection    /v1/models
Stop sequences     \n · ### · Input:
Wildcard engine    Python re · recursive
ComfyUI hooks      IS_CHANGED · OUTPUT_NODE
Dependencies       requests

Free · Open Source · MIT License

YOUR MODEL.
YOUR PROMPTS.

Built by OATH Studio. We make open tools for AI artists and studios, and take on custom development work for teams who need something specific. No cloud dependencies. No subscriptions.