Qwen3.5-9B GGUF

This repository contains a GGUF export based on unsloth/Qwen3.5-9B, prepared for local inference with llama.cpp and compatible runtimes such as LM Studio, Ollama, and Jan.

Although the fine-tuning data mixture included FunctionGemma-related function-calling data, this export remains part of the Qwen 3.5 model family: the GGUF file format, architecture family, tokenizer family, and runtime compatibility are all Qwen-based. The added data changes model behavior, not the underlying model format.

Model Summary

  • Base model: unsloth/Qwen3.5-9B
  • Format: GGUF
  • Intended runtime: llama.cpp and compatible UIs/runtimes
  • Quantization: Q4_K_M
  • Primary use case: local text generation / chat inference

Training / Source Workflow

The source workflow for this export was configured in a separate training repo using:

  • LoRA SFT via ms-swift
  • Base model: unsloth/Qwen3.5-9B
  • Training datasets:
    • nohurry/Opus-4.6-Reasoning-3000x-filtered
    • google/mobile-actions
  • Export path: merged checkpoint converted to GGUF with llama.cpp
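
The export path above can be sketched with llama.cpp's stock conversion tools. This is a hedged illustration, not the exact commands used for this repository: the `./merged` checkpoint directory and output filenames are assumptions, and script locations vary by llama.cpp version.

```shell
# Convert a merged Hugging Face checkpoint to a full-precision GGUF file,
# then quantize it to Q4_K_M with llama.cpp's quantize tool.
python convert_hf_to_gguf.py ./merged --outfile ./model-f16.gguf
./llama-quantize ./model-f16.gguf ./model-Q4_K_M.gguf Q4_K_M
```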

Effect of the FunctionGemma-Related Data

The FunctionGemma-related portion of the training mix affects how the model responds, especially for function-calling, action-oriented instructions, and structured assistant behavior.

It does not change:

  • the GGUF file format
  • the base architecture family
  • the tokenizer family
  • llama.cpp compatibility

It may change:

  • function-calling style
  • structured output tendencies
  • tool-use prompting behavior
  • action-planning and instruction-following behavior
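
When the model does emit structured tool calls, a minimal parsing sketch looks like the following. The `<tool_call>...</tool_call>` tag convention is an assumption (Qwen-family chat templates commonly use it, but check your runtime's template), and the `open_app` call is a hypothetical example output.

```python
import json
import re

def extract_tool_calls(text):
    """Extract JSON objects wrapped in <tool_call> tags (a common Qwen-style convention)."""
    calls = []
    for match in re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL):
        try:
            calls.append(json.loads(match))
        except json.JSONDecodeError:
            pass  # skip malformed calls rather than crash
    return calls

# Hypothetical model output, for illustration only
output = 'Sure.\n<tool_call>\n{"name": "open_app", "arguments": {"app": "maps"}}\n</tool_call>'
calls = extract_tool_calls(output)
print(calls)  # [{'name': 'open_app', 'arguments': {'app': 'maps'}}]
```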

Files

This repository contains a GGUF export quantized as:

  • *Q4_K_M.gguf

Prompting

This model is based on the Qwen 3.5 chat family, so use a normal chat-style prompt format in your inference runtime.

Example:

You are a helpful assistant.

User: Explain how function calling works in a local LLM runtime.
Assistant:

If your runtime supports native Qwen chat templates, prefer that.
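
If your runtime does not apply the template for you, a minimal sketch of building a ChatML-style prompt by hand (the format Qwen-family chat models typically use; the exact special tokens are an assumption, so always prefer the runtime's built-in template when available):

```python
def build_chatml_prompt(system, user):
    """Assemble a ChatML-style prompt string with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Explain how function calling works in a local LLM runtime.",
)
print(prompt)
```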

llama.cpp Usage

Example:

./llama-cli \
  -m ./model.gguf \
  -c 4096 \
  -ngl 99 \
  -e \
  -p "You are a helpful assistant.\n\nUser: Write a short explanation of GGUF.\nAssistant:"

The -e flag tells llama-cli to process escape sequences such as \n in the prompt.
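
As an alternative to llama-cli, llama.cpp's llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint. A hedged sketch of calling it from Python using only the standard library (the port and URL assume defaults; start the server first, e.g. `./llama-server -m ./model.gguf -c 4096`):

```python
import json
import urllib.request

# The "model" field is required by the API shape; llama-server serves
# whichever model it was started with, so the name here is nominal.
payload = {
    "model": "local",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a short explanation of GGUF."},
    ],
    "temperature": 0.7,
}

def chat(url="http://127.0.0.1:8080/v1/chat/completions"):
    """POST the chat payload and return the assistant message text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Calling `chat()` requires a running llama-server instance; the payload itself follows the standard OpenAI chat-completions shape.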

Notes

  • GGUF files are for inference, not further fine-tuning.
  • Memory usage depends on quantization level, context length, and runtime settings.
  • Performance and output quality will vary across runtimes and prompt formatting.
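
For a rough sense of the weight-memory part of that footprint: Q4_K_M averages roughly 4.8-5 effective bits per weight (mixed 4-bit and 6-bit blocks plus scales), so a back-of-envelope estimate can be sketched as below. The bits-per-weight figure is an approximation, and KV-cache memory for the context window comes on top of this.

```python
def weight_memory_gib(params_billion, bits_per_weight=4.85):
    """Ballpark GGUF weight-file size: parameter count times average
    bits per weight, converted to GiB. Ignores KV cache and runtime overhead."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

print(f"{weight_memory_gib(9):.1f} GiB")  # roughly 5 GiB for a 9B model at Q4_K_M
```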

Limitations

  • Quantized models can lose some accuracy relative to higher-precision checkpoints.
  • This model may hallucinate facts, tools, or function-call arguments.
  • Verify important outputs before production use.

License

This repository follows the license and usage constraints of the base model and any training data used in the workflow. Review:

  • unsloth/Qwen3.5-9B
  • nohurry/Opus-4.6-Reasoning-3000x-filtered
  • google/mobile-actions

Credits

  • Base model: Qwen / Unsloth
  • Training stack: ms-swift
  • GGUF conversion: llama.cpp