Qwen3-4B-Kimi2.5-Reasoning-Distilled: GGUF
Qwen3-4B-Kimi2.5-Reasoning-Distilled is a fine-tuned language model optimized for structured, long-form reasoning. It is derived from the Qwen3-4B-Thinking-2507 base model and fine-tuned on a specialized distillation dataset generated by Kimi-2.5-Thinking.
This model is designed to bring the complex reasoning capabilities typically found in much larger models to the small, efficient 0.6B–4B range. It excels at breaking down problems, self-correcting, and providing detailed analytical answers.
- Base Model: Qwen3-4B-Thinking-2507
- Training Technique: Unsloth + QLoRA
Available Model files:
- qwen3-4b-thinking-2507.BF16.gguf
- qwen3-4b-thinking-2507.Q8_0.gguf
- qwen3-4b-thinking-2507.Q6_K.gguf
- qwen3-4b-thinking-2507.Q4_K_M.gguf
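To run one of these files directly, a minimal sketch with llama-cpp-python might look like the following. The file name comes from the list above; the context size, GPU offload setting, and prompt are illustrative assumptions, not requirements of this repo:

```python
# Minimal sketch: loading the Q4_K_M quant with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-4b-thinking-2507.Q4_K_M.gguf",  # file from this repo
    n_ctx=8192,        # assumed; long-form reasoning benefits from a large context
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

result = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Prove that the sum of two odd numbers is even."}
    ],
    max_tokens=1024,
)
print(result["choices"][0]["message"]["content"])
```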
Ollama
An Ollama Modelfile is included for easy deployment.
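As a sketch of one possible workflow: after registering the model locally with something like `ollama create qwen3-4b-kimi2.5 -f Modelfile`, you can query it from Python with the official `ollama` client. The local model name used here is an assumption, not fixed by this repo:

```python
# Minimal sketch using the official `ollama` Python client.
# "qwen3-4b-kimi2.5" is an assumed local name from `ollama create`.
import ollama

response = ollama.chat(
    model="qwen3-4b-kimi2.5",
    messages=[{"role": "user", "content": "Walk me through solving 3x + 7 = 22."}],
)
print(response["message"]["content"])
```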
Provided Quants
(sorted by size, not necessarily quality; IQ-quants are often preferable over similarly sized non-IQ quants)
| Type | Size/GB | Notes |
|---|---|---|
| Q4_K_M | 2.5 | fast, recommended |
| Q6_K | 3.3 | very good quality |
| Q8_0 | 4.2 | fast, best quality |
| BF16 | 8.0 | 16 bpw, overkill |
ikawrakow has published a handy graph comparing some lower-quality quant types (lower is better), and Artefact2's thoughts on the matter are at https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9
Dataset
The model was fine-tuned on the khazarai/kimi-2.5-high-reasoning-250x dataset; a rough training sketch follows the composition summary below.
Dataset Composition:
- Total Samples: 250
- Total Tokens: 1,114,407
- Teacher Model: Kimi-2.5-Thinking
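For reference, a QLoRA run along these lines could be set up with Unsloth roughly as shown below. This is a hedged sketch, not the authors' training script: the dataset column name, LoRA rank, and all hyperparameters are assumptions.

```python
# Minimal QLoRA distillation sketch with Unsloth. Column names, LoRA rank,
# and hyperparameters here are assumptions, not the authors' actual recipe.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

# Load the base model in 4-bit (QLoRA keeps the base weights quantized).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Thinking-2507",
    max_seq_length=8192,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # assumed rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# The 250 Kimi-2.5-Thinking traces; "text" is an assumed column name.
dataset = load_dataset("khazarai/kimi-2.5-high-reasoning-250x", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        num_train_epochs=3,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```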
Acknowledgements
- Unsloth for the incredibly fast and memory-efficient training framework.