Qwen3.5-9B GGUF
This repository contains a GGUF export based on unsloth/Qwen3.5-9B, prepared for local inference with llama.cpp and compatible runtimes such as LM Studio, Ollama, and Jan.
Although the fine-tuning data mixture included FunctionGemma-related function-calling data, this is still a Qwen 3.5 model family export. The GGUF file format, architecture family, tokenizer family, and runtime compatibility remain Qwen-based. The added data affects model behavior, not the underlying model format.
Model Summary
- Base model: unsloth/Qwen3.5-9B
- Format: GGUF
- Intended runtime: llama.cpp and compatible UIs/runtimes
- Quantization: Q4_K_M
- Primary use case: local text generation / chat inference
Training / Source Workflow
The source workflow for this export was configured in a separate training repo using:
- LoRA SFT via ms-swift
- Base model: unsloth/Qwen3.5-9B
- Training datasets: nohurry/Opus-4.6-Reasoning-3000x-filtered, google/mobile-actions
- Export path: merged checkpoint converted to GGUF with llama.cpp
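The workflow above can be sketched as a sequence of shell commands. This is a hypothetical reconstruction, not the exact commands used: the checkpoint paths and output names are illustrative, and the ms-swift and llama.cpp flags reflect recent releases of those tools, so verify them against your installed versions.

```shell
# Hypothetical workflow sketch; paths and output names are illustrative.

# 1) LoRA SFT with ms-swift (flags per recent ms-swift releases).
swift sft \
  --model unsloth/Qwen3.5-9B \
  --train_type lora \
  --dataset nohurry/Opus-4.6-Reasoning-3000x-filtered google/mobile-actions \
  --output_dir ./sft-output

# 2) Merge the LoRA adapter into the base weights
#    (adapter path below is a placeholder).
swift export --adapters ./sft-output/<checkpoint-dir> --merge_lora true

# 3) Convert the merged checkpoint to GGUF and quantize with llama.cpp.
python convert_hf_to_gguf.py ./merged-checkpoint --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```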
Effect of the FunctionGemma-Related Data
The FunctionGemma-related portion of the training mix affects how the model responds, especially for function calling, action-oriented instructions, and structured assistant behavior.
It does not change:
- the GGUF file format
- the base architecture family
- the tokenizer family
- llama.cpp compatibility
It may change:
- function-calling style
- structured output tendencies
- tool-use prompting behavior
- action-planning and instruction-following behavior
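To illustrate the function-calling style these behaviors target, here is a sketch of a tool-use prompt. The tool name (`get_weather`) and the JSON reply shape are hypothetical examples, not a fixed schema this model was trained on; adapt them to your runtime's tool-calling conventions.

```shell
# Build an illustrative function-calling prompt; the tool schema and
# expected reply format below are hypothetical, not model-specific.
PROMPT=$(cat <<'EOF'
You are a helpful assistant with access to this tool:
{"name": "get_weather", "parameters": {"city": "string"}}
When a tool is needed, reply with a single JSON object:
{"tool": "<name>", "arguments": {...}}

User: What's the weather in Oslo?
Assistant:
EOF
)
printf '%s\n' "$PROMPT"
```

You would then pass `$PROMPT` to your runtime (for example, via `llama-cli -p "$PROMPT"`) and parse the model's JSON reply.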
Files
This repository contains a GGUF export quantized as:
*Q4_K_M.gguf
Prompting
This model is based on the Qwen 3.5 chat family, so use a normal chat-style prompt format in your inference runtime.
Example:
You are a helpful assistant.
User: Explain how function calling works in a local LLM runtime.
Assistant:
If your runtime supports native Qwen chat templates, prefer that.
llama.cpp Usage
Example:
./llama-cli \
-m ./model.gguf \
-c 4096 \
-ngl 99 \
-p "You are a helpful assistant.\n\nUser: Write a short explanation of GGUF.\nAssistant:"
Notes
- GGUF files are for inference, not further fine-tuning.
- Memory usage depends on quantization level, context length, and runtime settings.
- Performance and output quality will vary across runtimes and prompt formatting.
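As a back-of-envelope illustration of the memory note above: assuming roughly 4.8 effective bits per weight for Q4_K_M (an approximation; real GGUF sizes vary by tensor layout), a 9B-parameter model needs on the order of 5 GiB for weights alone, before KV cache, which grows with context length.

```shell
# Rough weight-memory estimate (assumption: ~4.8 bits/weight effective
# for Q4_K_M; actual GGUF file sizes vary by tensor layout).
params=9000000000            # 9B parameters
bits_x10=48                  # 4.8 bits per weight, x10 for integer math
bytes=$(( params * bits_x10 / 10 / 8 ))
gib=$(( bytes / 1024 / 1024 / 1024 ))
echo "approx weight memory: ${gib} GiB"   # prints "approx weight memory: 5 GiB"
```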
Limitations
- Quantized models can lose some accuracy relative to higher-precision checkpoints.
- This model may hallucinate facts, tools, or function-call arguments.
- Verify important outputs before production use.
License
This repository follows the license and usage constraints of the base model and any training data used in the workflow. Review:
- unsloth/Qwen3.5-9B
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- google/mobile-actions
Credits
- Base model: Qwen / Unsloth
- Training stack: ms-swift
- GGUF conversion: llama.cpp