Qwen3.5-9B GGUF
This repository contains a GGUF export based on unsloth/Qwen3.5-9B, prepared for local inference with llama.cpp and compatible runtimes such as LM Studio, Ollama, and Jan.
Although the fine-tuning data mixture included FunctionGemma-related function-calling data, this is still a Qwen 3.5 model family export. The GGUF file format, architecture family, tokenizer family, and runtime compatibility remain Qwen-based. The added data affects model behavior, not the underlying model format.
Model Summary
- Base model: unsloth/Qwen3.5-9B
- Format: GGUF
- Intended runtime: llama.cpp and compatible UIs/runtimes
- Quantization: Q4_K_M
- Primary use case: local text generation / chat inference
Training / Source Workflow
The source workflow for this export was configured in a separate training repo using:
- LoRA SFT via ms-swift
- Base model: unsloth/Qwen3.5-9B
- Training datasets: nohurry/Opus-4.6-Reasoning-3000x-filtered, google/mobile-actions
- Export path: merged checkpoint converted to GGUF with llama.cpp
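The workflow above can be sketched as a sequence of shell commands. This is a hypothetical reconstruction, not the exact commands used: the checkpoint paths and output names are illustrative, and the ms-swift and llama.cpp flags reflect recent releases of those tools, so verify them against your installed versions.

```shell
# Hypothetical workflow sketch; paths and output names are illustrative.

# 1) LoRA SFT with ms-swift (flags per recent ms-swift releases).
swift sft \
  --model unsloth/Qwen3.5-9B \
  --train_type lora \
  --dataset nohurry/Opus-4.6-Reasoning-3000x-filtered google/mobile-actions \
  --output_dir ./sft-output

# 2) Merge the LoRA adapter into the base weights
#    (adapter path below is a placeholder).
swift export --adapters ./sft-output/<checkpoint-dir> --merge_lora true

# 3) Convert the merged checkpoint to GGUF and quantize with llama.cpp.
python convert_hf_to_gguf.py ./merged-checkpoint --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```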
Effect of the FunctionGemma-Related Data
The FunctionGemma-related portion of the training mix affects how the model responds, especially for function calling, action-oriented instructions, and structured assistant behavior.
It does not change:
- the GGUF file format
- the base architecture family
- the tokenizer family
- llama.cpp compatibility
It may change:
- function-calling style
- structured output tendencies
- tool-use prompting behavior
- action-planning and instruction-following behavior
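To illustrate the function-calling style these behaviors target, here is a sketch of a tool-use prompt. The tool name (`get_weather`) and the JSON reply shape are hypothetical examples, not a fixed schema this model was trained on; adapt them to your runtime's tool-calling conventions.

```shell
# Build an illustrative function-calling prompt; the tool schema and
# expected reply format below are hypothetical, not model-specific.
PROMPT=$(cat <<'EOF'
You are a helpful assistant with access to this tool:
{"name": "get_weather", "parameters": {"city": "string"}}
When a tool is needed, reply with a single JSON object:
{"tool": "<name>", "arguments": {...}}

User: What's the weather in Oslo?
Assistant:
EOF
)
printf '%s\n' "$PROMPT"
```

You would then pass `$PROMPT` to your runtime (for example, via `llama-cli -p "$PROMPT"`) and parse the model's JSON reply.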
Files
This repository contains a GGUF export quantized as:
*Q4_K_M.gguf
Prompting
This model is based on the Qwen 3.5 chat family, so use a normal chat-style prompt format in your inference runtime.
Example:
You are a helpful assistant.
User: Explain how function calling works in a local LLM runtime.
Assistant:
If your runtime supports native Qwen chat templates, prefer that.
llama.cpp Usage
Example:
./llama-cli \
-m ./model.gguf \
-c 4096 \
-ngl 99 \
-p "You are a helpful assistant.\n\nUser: Write a short explanation of GGUF.\nAssistant:"
Notes
- GGUF files are for inference, not further fine-tuning.
- Memory usage depends on quantization level, context length, and runtime settings.
- Performance and output quality will vary across runtimes and prompt formatting.
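As a back-of-envelope illustration of the memory note above: assuming roughly 4.8 effective bits per weight for Q4_K_M (an approximation; real GGUF sizes vary by tensor layout), a 9B-parameter model needs on the order of 5 GiB for weights alone, before KV cache, which grows with context length.

```shell
# Rough weight-memory estimate (assumption: ~4.8 bits/weight effective
# for Q4_K_M; actual GGUF file sizes vary by tensor layout).
params=9000000000            # 9B parameters
bits_x10=48                  # 4.8 bits per weight, x10 for integer math
bytes=$(( params * bits_x10 / 10 / 8 ))
gib=$(( bytes / 1024 / 1024 / 1024 ))
echo "approx weight memory: ${gib} GiB"   # prints "approx weight memory: 5 GiB"
```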
Limitations
- Quantized models can lose some accuracy relative to higher-precision checkpoints.
- This model may hallucinate facts, tools, or function-call arguments.
- Verify important outputs before production use.
License
This repository follows the license and usage constraints of the base model and any training data used in the workflow. Review:
- unsloth/Qwen3.5-9B
- nohurry/Opus-4.6-Reasoning-3000x-filtered
- google/mobile-actions
Credits
- Base model: Qwen / Unsloth
- Training stack: ms-swift
- GGUF conversion: llama.cpp