Qwen3.5-9B-Uncensored (GGUF)

An uncensored GGUF merge of Qwen 3.5 9B, ready for local deployment with Ollama, llama.cpp, or any GGUF-compatible runtime.

Background

This model is built upon HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive. The original base model has the following known issues:

  1. Ollama deployment failure: The base model cannot be directly deployed via Ollama due to architecture/format incompatibilities. This GGUF version resolves the issue by converting and merging the weights into a single GGUF file that Ollama can load natively.
  2. Broken multimodal input: The base model's packaging causes multimodal (e.g., image) input to malfunction. Although the underlying Qwen 3.5 architecture supports vision capabilities, the way the original model was packaged breaks multimodal inference.

This repo provides a Q4_K_M quantized GGUF version that fixes the Ollama deployment issue while keeping the model compact and efficient.

Quick Start

Ollama (Recommended)

  1. Create a Modelfile:
FROM ./Qwen3.5-9B-Uncensored-Q4_K_M.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 16384

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

SYSTEM "You are a helpful assistant."
  2. Build and run:
# Download the GGUF file
huggingface-cli download LEONW24/Qwen3.5-9B-Uncensored Qwen3.5-9B-Uncensored-Q4_K_M.gguf --local-dir .

# Create the Ollama model
ollama create qwen35-uncensored -f Modelfile

# Run
ollama run qwen35-uncensored

llama.cpp

# Download
huggingface-cli download LEONW24/Qwen3.5-9B-Uncensored Qwen3.5-9B-Uncensored-Q4_K_M.gguf --local-dir .

# Run with llama-cli
llama-cli -m Qwen3.5-9B-Uncensored-Q4_K_M.gguf -p "Hello, who are you?" -n 256 -ngl 99

llama-cpp-python (OpenAI-compatible API)

pip install "llama-cpp-python[server]"

python -m llama_cpp.server \
  --model Qwen3.5-9B-Uncensored-Q4_K_M.gguf \
  --n_gpu_layers 99 \
  --chat_format chatml

Then call http://localhost:8000/v1/chat/completions with any OpenAI-compatible client.
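As an illustration, a request payload for that endpoint can be built like this. The shape follows the standard OpenAI chat-completions schema; the model name is an assumption (llama-cpp-python's server serves whatever model it was launched with, regardless of the name in the request):

```python
import json
import urllib.request

def build_chat_request(prompt: str) -> dict:
    # Standard OpenAI-style chat-completions payload.
    return {
        "model": "Qwen3.5-9B-Uncensored-Q4_K_M",  # assumed name; the server ignores it
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 256,
    }

payload = build_chat_request("Hello, who are you?")
print(json.dumps(payload, indent=2))

# Uncomment once the server from the previous step is running:
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["message"]["content"])
```

The same payload works with the official `openai` Python client by pointing `base_url` at `http://localhost:8000/v1`.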

Python (llama-cpp-python)

from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.5-9B-Uncensored-Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to GPU
    n_ctx=16384,
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(output["choices"][0]["message"]["content"])

Model Details

| Property | Value |
|---|---|
| Architecture | Qwen 3.5 |
| Parameters | ~9B |
| Format | GGUF (Q4_K_M quantization) |
| File size | ~6.3 GB |
| Context window | Up to 131,072 tokens |
| Languages | English, Chinese, multilingual |
| License | Apache 2.0 |

Used In: PhdBooster

This model serves as the local fallback vision model in PhdBooster, an AI-powered browsing assistant that helps PhD students optimize their short video feeds.

What is PhdBooster?

PhD life is stressful. You open Douyin or Xiaohongshu to relax, but all you get is ads and news. PhdBooster is built on OpenClaw: it scrolls through short video platforms while you write papers, uses vision models to actually "see" every video, and automatically likes & bookmarks content matching your taste, training the recommendation algorithm to serve you better.

You're writing a paper
  -> PhdBooster is browsing videos for you
    -> AI "sees" each video
      -> Matches your taste? Auto like & bookmark
        -> Platform algorithm learns your preferences
          -> You open your phone โ€” feed is perfect

How this model fits in

PhdBooster uses a two-stage filtering funnel:

  1. Text quick-filter: Parse the title, hashtags, and author info to skip obvious non-targets (~60% filtered out)
  2. Visual deep-filter: Screenshot the video and send it to a vision model for analysis against your preference policy
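A toy version of the stage-1 text filter might look like this. The keyword list and token matching are illustrative assumptions, not PhdBooster's actual configuration:

```python
# Illustrative skip list; PhdBooster's real filter parses title,
# hashtags, and author info against a user-defined policy.
SKIP_KEYWORDS = {"ad", "promo", "news"}

def text_quick_filter(title: str, hashtags: list[str]) -> bool:
    """Stage 1: cheap metadata check. True means the video survives
    to the (more expensive) visual deep-filter stage."""
    tokens = set(title.lower().split())
    tokens |= {tag.lstrip("#").lower() for tag in hashtags}
    return not (tokens & SKIP_KEYWORDS)

print(text_quick_filter("Cute cat compilation", ["#cats"]))  # survives
print(text_quick_filter("Huge promo sale today", ["#ad"]))   # filtered out
```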

The vision analysis uses a dual fallback strategy:

  • Primary: Kimi 2.5 (Moonshot AI) - fast, high quality
  • Fallback: This model (Qwen3.5-9B-Uncensored) via local Ollama - completely free, works offline, no content filtering

The uncensored nature of this model is important for PhdBooster's use case: it needs to analyze all types of visual content without refusals or safety-triggered false negatives.
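The primary/fallback strategy above can be sketched roughly as follows. Both analysis functions are hypothetical placeholders, not PhdBooster's actual code:

```python
def analyze_with_kimi(screenshot: bytes) -> str:
    # Placeholder for the primary call to Kimi 2.5 via Moonshot's API.
    # Here it always fails, to demonstrate the fallback path.
    raise ConnectionError("primary vision model unavailable")

def analyze_with_local_qwen(screenshot: bytes) -> str:
    # Placeholder for a call to this model through local Ollama
    # (e.g. POSTing the image to the local Ollama HTTP API).
    return "local analysis result"

def analyze_screenshot(screenshot: bytes) -> str:
    """Try the primary vision model first; on any failure
    (network, rate limit, refusal) fall back to the free,
    offline, uncensored local model."""
    try:
        return analyze_with_kimi(screenshot)
    except Exception:
        return analyze_with_local_qwen(screenshot)

print(analyze_screenshot(b"\x89PNG..."))
```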

Tech Stack

| Component | Choice |
|---|---|
| Agent Framework | OpenClaw |
| Browser Automation | OpenClaw Browser (Chrome CDP) |
| Primary LLM | step-3.5-flash:free (OpenRouter) |
| Primary Vision | Kimi 2.5 (Moonshot AI) |
| Fallback Vision | This model (Ollama) |
| Platforms | Douyin, Xiaohongshu |

For more details, see the PhdBooster README.


Notes

  • This is an uncensored model with reduced safety filters compared to the official release. Use responsibly.
  • GGUF format enables efficient CPU + GPU inference without requiring the full PyTorch/transformers stack.
  • For multi-GPU setups with Ollama, set OLLAMA_NUM_PARALLEL and adjust num_gpu as needed.