Qwen3.5-9B-Uncensored (GGUF)
An uncensored GGUF merge of Qwen 3.5 9B, ready for local deployment with Ollama, llama.cpp, or any GGUF-compatible runtime.
Background
This model is built upon HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive. The original base model has the following known issues:
- Ollama deployment failure: the base model cannot be deployed directly via Ollama due to architecture/format incompatibilities. This GGUF version resolves the issue by converting and merging the weights into a single GGUF file that Ollama can load natively.
- Broken multimodal input: the base model's packaging causes multimodal (e.g., image) input to malfunction. Although the underlying Qwen 3.5 architecture supports vision, the way the original model was packaged breaks multimodal inference.
This repo provides a Q4_K_M quantized GGUF version that fixes the Ollama deployment issue while keeping the model compact and efficient.
Quick Start
Ollama (Recommended)
- Create a Modelfile:
FROM ./Qwen3.5-9B-Uncensored-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 16384
TEMPLATE """{{- if .System }}{{ .System }}{{ end }}
{{- range .Messages }}
<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
SYSTEM "You are a helpful assistant."
- Build and run:
# Download the GGUF file
huggingface-cli download LEONW24/Qwen3.5-9B-Uncensored Qwen3.5-9B-Uncensored-Q4_K_M.gguf --local-dir .
# Create the Ollama model
ollama create qwen35-uncensored -f Modelfile
# Run
ollama run qwen35-uncensored
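Besides the interactive CLI, the created model can be queried programmatically through Ollama's REST API (default port 11434). Below is a minimal sketch using only the Python standard library; the model name matches the `ollama create` step above:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default chat endpoint

def build_chat_payload(model: str, prompt: str) -> dict:
    """Assemble a non-streaming request body for Ollama's /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a token stream
    }

def chat(model: str, prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_chat_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Requires a running Ollama instance with the model created above:
# print(chat("qwen35-uncensored", "Hello, who are you?"))
```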
llama.cpp
# Download
huggingface-cli download LEONW24/Qwen3.5-9B-Uncensored Qwen3.5-9B-Uncensored-Q4_K_M.gguf --local-dir .
# Run with llama-cli
llama-cli -m Qwen3.5-9B-Uncensored-Q4_K_M.gguf -p "Hello, who are you?" -n 256 -ngl 99
llama-cpp-python (OpenAI-compatible API)
pip install llama-cpp-python[server]
python -m llama_cpp.server \
    --model Qwen3.5-9B-Uncensored-Q4_K_M.gguf \
    --n_gpu_layers 99 \
    --chat_format chatml
Then call http://localhost:8000/v1/chat/completions with any OpenAI-compatible client.
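For instance, a minimal client using only the Python standard library (server address as configured above; the `model` value is a placeholder since the server hosts a single model):

```python
import json
import urllib.request

def build_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the local server."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt: str) -> str:
    """Send one user message and return the assistant's reply text."""
    req = build_request("http://localhost:8000", "local-model", prompt)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Requires the llama_cpp.server process from above to be running:
# print(chat("Hello, who are you?"))
```

Any OpenAI SDK can target the same endpoint by pointing its base URL at `http://localhost:8000/v1`.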
Python (llama-cpp-python)
from llama_cpp import Llama
llm = Llama(
    model_path="Qwen3.5-9B-Uncensored-Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to GPU
    n_ctx=16384,
)
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(output["choices"][0]["message"]["content"])
Model Details
| Property | Value |
|---|---|
| Architecture | Qwen 3.5 |
| Parameters | ~9B |
| Format | GGUF (Q4_K_M quantization) |
| File size | ~6.3 GB |
| Context window | Up to 131072 tokens |
| Languages | English, Chinese, multilingual |
| License | Apache 2.0 |
Used In: PhdBooster
This model serves as the local fallback vision model in PhdBooster, an AI-powered browsing assistant that helps PhD students optimize their short-video feeds.
What is PhdBooster?
PhD life is stressful. You open Douyin or Xiaohongshu to relax, but all you get is ads and news. PhdBooster is built on OpenClaw: it scrolls through short-video platforms while you write papers, uses vision models to actually "see" every video, and automatically likes and bookmarks content matching your taste, training the recommendation algorithm to serve you better.
You're writing a paper
-> PhdBooster is browsing videos for you
-> AI "sees" each video
-> Matches your taste? Auto like & bookmark
-> Platform algorithm learns your preferences
-> You open your phone: the feed is perfect
How this model fits in
PhdBooster uses a two-stage filtering funnel:
- Text quick-filter: parse the title, hashtags, and author info to skip obvious non-targets (~60% filtered out)
- Visual deep-filter: screenshot the video and send it to a vision model for analysis against your preference policy
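The funnel above can be sketched as follows; the function and field names are illustrative, not PhdBooster's actual API:

```python
def should_engage(video: dict, skip_words: set, vision_check) -> bool:
    """Two-stage filter: a cheap text check first, the vision model only
    for videos that survive it (hypothetical sketch)."""
    # Stage 1: text quick-filter on title and hashtags
    text = " ".join([video["title"], *video["hashtags"]]).lower()
    if any(word in text for word in skip_words):
        return False  # obvious non-target, no vision call needed
    # Stage 2: visual deep-filter on a screenshot of the video
    return vision_check(video["screenshot"])
```

Running the text stage first keeps most videos away from the comparatively slow and costly vision model.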
The vision analysis uses a dual fallback strategy:
- Primary: Kimi 2.5 (Moonshot AI), fast and high quality
- Fallback: this model (Qwen3.5-9B-Uncensored) via local Ollama, completely free, works offline, no content filtering
The uncensored nature of this model is important for PhdBooster's use case: it needs to analyze all types of visual content without refusals or safety-triggered false negatives.
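The fallback strategy amounts to a simple try/except chain; a hypothetical sketch, where `primary` and `fallback` stand in for the Kimi API call and the local Ollama call:

```python
def analyze_frame(image_bytes: bytes, primary, fallback):
    """Try the primary vision API first; on any failure (network error,
    rate limit, refusal raised as an exception), retry with the local model."""
    try:
        return primary(image_bytes)
    except Exception:
        return fallback(image_bytes)  # offline local model, never rate-limited
```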
Tech Stack
| Component | Choice |
|---|---|
| Agent Framework | OpenClaw |
| Browser Automation | OpenClaw Browser (Chrome CDP) |
| Primary LLM | step-3.5-flash:free (OpenRouter) |
| Primary Vision | Kimi 2.5 (Moonshot AI) |
| Fallback Vision | This model (Ollama) |
| Platforms | Douyin, Xiaohongshu |
For more details, see the PhdBooster README.
Notes
- This is an uncensored model: it has reduced safety filters compared to the official release. Use responsibly.
- GGUF format enables efficient CPU + GPU inference without requiring the full PyTorch/transformers stack.
- For multi-GPU setups with Ollama, set `OLLAMA_NUM_PARALLEL` and adjust `num_gpu` as needed.