Qwen3.5-9B-Uncensored (GGUF)

An uncensored GGUF merge of Qwen 3.5 9B, ready for local deployment with Ollama, llama.cpp, or any GGUF-compatible runtime.

Background

This model is built upon HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive. The original base model has the following known issues:

  1. Ollama deployment failure: The base model cannot be directly deployed via Ollama due to architecture/format incompatibilities. This GGUF version resolves the issue by converting and merging the weights into a single GGUF file that Ollama can load natively.
  2. Broken multimodal input: The base model's packaging causes multimodal (e.g., image) input to malfunction. Although the underlying Qwen 3.5 architecture supports vision capabilities, the way the original model was packaged breaks multimodal inference.

This repo provides a Q4_K_M quantized GGUF version that fixes the Ollama deployment issue while keeping the model compact and efficient.

Quick Start

Ollama (Recommended)

  1. Create a Modelfile:
FROM ./Qwen3.5-9B-Uncensored-Q4_K_M.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 16384

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

SYSTEM "You are a helpful assistant."
  2. Build and run:
# Download the GGUF file
huggingface-cli download LEONW24/Qwen3.5-9B-Uncensored Qwen3.5-9B-Uncensored-Q4_K_M.gguf --local-dir .

# Create the Ollama model
ollama create qwen35-uncensored -f Modelfile

# Run
ollama run qwen35-uncensored

llama.cpp

# Download
huggingface-cli download LEONW24/Qwen3.5-9B-Uncensored Qwen3.5-9B-Uncensored-Q4_K_M.gguf --local-dir .

# Run with llama-cli
llama-cli -m Qwen3.5-9B-Uncensored-Q4_K_M.gguf -p "Hello, who are you?" -n 256 -ngl 99

llama-cpp-python (OpenAI-compatible API)

pip install "llama-cpp-python[server]"

python -m llama_cpp.server \
  --model Qwen3.5-9B-Uncensored-Q4_K_M.gguf \
  --n_gpu_layers 99 \
  --chat_format chatml

Then call http://localhost:8000/v1/chat/completions with any OpenAI-compatible client.
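As an illustration, a request payload for that endpoint can be built like this. The shape follows the standard OpenAI chat-completions schema; the model name is an assumption (llama-cpp-python's server serves whatever model it was launched with, regardless of the name in the request):

```python
import json
import urllib.request

def build_chat_request(prompt: str) -> dict:
    # Standard OpenAI-style chat-completions payload.
    return {
        "model": "Qwen3.5-9B-Uncensored-Q4_K_M",  # assumed name; the server ignores it
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 256,
    }

payload = build_chat_request("Hello, who are you?")
print(json.dumps(payload, indent=2))

# Uncomment once the server from the previous step is running:
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["message"]["content"])
```

The same payload works with the official `openai` Python client by pointing `base_url` at `http://localhost:8000/v1`.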

Python (llama-cpp-python)

from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.5-9B-Uncensored-Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to GPU
    n_ctx=16384,
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(output["choices"][0]["message"]["content"])

Model Details

| Property | Value |
|---|---|
| Architecture | Qwen 3.5 |
| Parameters | ~9B |
| Format | GGUF (Q4_K_M quantization) |
| File size | ~6.3 GB |
| Context window | Up to 131,072 tokens |
| Languages | English, Chinese, multilingual |
| License | Apache 2.0 |

Used In: PhdBooster

This model serves as the local fallback vision model in PhdBooster, an AI-powered browsing assistant that helps PhD students optimize their short video feeds.

What is PhdBooster?

PhD life is stressful. You open Douyin or Xiaohongshu to relax, but all you get is ads and news. PhdBooster is built on OpenClaw: it scrolls through short video platforms while you write papers, uses vision models to actually "see" every video, and automatically likes & bookmarks content matching your taste, training the recommendation algorithm to serve you better.

You're writing a paper
  -> PhdBooster is browsing videos for you
    -> AI "sees" each video
      -> Matches your taste? Auto like & bookmark
        -> Platform algorithm learns your preferences
          -> You open your phone โ€” feed is perfect

How this model fits in

PhdBooster uses a two-stage filtering funnel:

  1. Text quick-filter: Parse the title, hashtags, and author info to skip obvious non-targets (~60% filtered out)
  2. Visual deep-filter: Screenshot the video and send it to a vision model for analysis against your preference policy
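A toy version of the stage-1 text filter might look like this. The keyword list and token matching are illustrative assumptions, not PhdBooster's actual configuration:

```python
# Illustrative skip list; PhdBooster's real filter parses title,
# hashtags, and author info against a user-defined policy.
SKIP_KEYWORDS = {"ad", "promo", "news"}

def text_quick_filter(title: str, hashtags: list[str]) -> bool:
    """Stage 1: cheap metadata check. True means the video survives
    to the (more expensive) visual deep-filter stage."""
    tokens = set(title.lower().split())
    tokens |= {tag.lstrip("#").lower() for tag in hashtags}
    return not (tokens & SKIP_KEYWORDS)

print(text_quick_filter("Cute cat compilation", ["#cats"]))  # survives
print(text_quick_filter("Huge promo sale today", ["#ad"]))   # filtered out
```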

The vision analysis uses a dual fallback strategy:

  • Primary: Kimi 2.5 (Moonshot AI) - fast, high quality
  • Fallback: This model (Qwen3.5-9B-Uncensored) via local Ollama - completely free, works offline, no content filtering

The uncensored nature of this model is important for PhdBooster's use case: it needs to analyze all types of visual content without refusals or safety-triggered false negatives.
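The primary/fallback strategy above can be sketched roughly as follows. Both analysis functions are hypothetical placeholders, not PhdBooster's actual code:

```python
def analyze_with_kimi(screenshot: bytes) -> str:
    # Placeholder for the primary call to Kimi 2.5 via Moonshot's API.
    # Here it always fails, to demonstrate the fallback path.
    raise ConnectionError("primary vision model unavailable")

def analyze_with_local_qwen(screenshot: bytes) -> str:
    # Placeholder for a call to this model through local Ollama
    # (e.g. POSTing the image to the local Ollama HTTP API).
    return "local analysis result"

def analyze_screenshot(screenshot: bytes) -> str:
    """Try the primary vision model first; on any failure
    (network, rate limit, refusal) fall back to the free,
    offline, uncensored local model."""
    try:
        return analyze_with_kimi(screenshot)
    except Exception:
        return analyze_with_local_qwen(screenshot)

print(analyze_screenshot(b"\x89PNG..."))
```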

Tech Stack

| Component | Choice |
|---|---|
| Agent Framework | OpenClaw |
| Browser Automation | OpenClaw Browser (Chrome CDP) |
| Primary LLM | step-3.5-flash:free (OpenRouter) |
| Primary Vision | Kimi 2.5 (Moonshot AI) |
| Fallback Vision | This model (Ollama) |
| Platforms | Douyin, Xiaohongshu |

For more details, see the PhdBooster README.


Notes

  • This is an uncensored model with reduced safety filters compared to the official release. Use responsibly.
  • GGUF format enables efficient CPU + GPU inference without requiring the full PyTorch/transformers stack.
  • For multi-GPU setups with Ollama, set OLLAMA_NUM_PARALLEL and adjust num_gpu as needed.