HERETIC – Mistral-Nemo 2407 12B Thinking (GGUF)

HERETIC is a reasoning-oriented variant of the Mistral-Nemo 2407 12B architecture distributed in GGUF format for efficient local inference.
The model is intended for users who want a flexible conversational assistant capable of analytical reasoning, long-form explanations, and open-ended dialogue while running entirely on local hardware.

This repository provides quantized versions optimized for llama.cpp–based runtimes and other compatible inference tools.


Model Details

Model Name: Mistral-Nemo-2407-12B-Thinking-Claude-Gemini-GPT5.2-Uncensored-HERETIC
Architecture: Mistral-Nemo (12B parameters)
Format: GGUF
Base Model: Mistral-Nemo-2407
Distribution: Quantized builds for local inference
Primary Capability: Instruction-following with extended reasoning and conversational flexibility

HERETIC focuses on encouraging multi-step reasoning and detailed responses while maintaining a natural conversational style.


Intended Use

This model is designed primarily for local deployments and experimentation.

Typical use cases include:

  • Personal AI assistants
  • Coding help and technical explanations
  • Analytical reasoning tasks
  • Brainstorming and creative writing
  • Prompt engineering and LLM experimentation
  • Offline or privacy-focused AI workflows

Out-of-Scope Use

The model should not be relied upon for:

  • Legal advice
  • Medical advice
  • Safety-critical decision making
  • Automated moderation systems

Outputs may contain inaccuracies or biased information.


Prompt Format

The model works best with structured role-based prompts.

Example conversation template:


<|system|>
You are a helpful AI assistant.

<|user|>
Explain how neural networks learn.

<|assistant|>

Some interfaces automatically apply a compatible chat template.
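For tools that do not apply a template automatically, the role-based format above can be assembled by hand. A minimal sketch in Python (the role tokens follow the example above; they may differ from any chat template embedded in the GGUF metadata):

```python
def format_prompt(messages):
    """Render a list of {role, content} dicts into the role-token
    template shown above, ending with an open assistant turn."""
    parts = [f"<|{m['role']}|>\n{m['content']}\n" for m in messages]
    parts.append("<|assistant|>\n")
    return "\n".join(parts)

prompt = format_prompt([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain how neural networks learn."},
])
print(prompt)
```

The resulting string can be passed directly as the prompt to any completion-style runtime.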


Running the Model

This model uses the GGUF format, making it compatible with several local inference tools.

llama.cpp

Example command (in current llama.cpp builds the CLI binary is named llama-cli; older builds used main):

./llama-cli -m Mistral-Nemo-2407-12B-Thinking-Claude-Gemini-GPT5.2-Uncensored-HERETIC_Q4_K_M.gguf -p "Explain quantum computing in simple terms."
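Choosing a quantization is mostly a memory trade-off. A rough back-of-the-envelope estimate of on-disk size from parameter count and average bits per weight (the bits-per-weight figures below are approximations; real GGUF files also carry embeddings and metadata, so actual sizes vary):

```python
def approx_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough on-disk size: parameters * bits per weight, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Approximate average bits per weight per quantization level (illustrative).
for name, bpw in [("Q4_K_M", 4.85), ("Q8_0", 8.5), ("F16", 16.0)]:
    print(f"{name}: ~{approx_gguf_size_gb(12e9, bpw):.1f} GB")
```

For this 12B model, a 4-bit K-quant lands around 7 GB, which fits comfortably on consumer GPUs or in system RAM.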

Limitations

Like most large language models:

  • The model can generate incorrect information.
  • It may hallucinate facts or citations.
  • Output quality depends heavily on prompt design.
  • Responses reflect biases present in training data.

Users should critically evaluate outputs before relying on them.


Acknowledgements

This model builds on contributions from several open-source projects:

  • The Mistral research team for the underlying architecture
  • The llama.cpp project for enabling efficient local inference
  • The developers of the GGUF format for optimized model distribution
  • The broader open-source community building tools for local LLM deployment

Disclaimer

This model is provided for research, experimentation, and local use. Users are responsible for ensuring that deployments comply with applicable laws and the licensing terms of the underlying base model.

Available Quantizations

Quantized builds are provided at 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, and 16-bit precision levels.

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including Andycurrent/Mistral-Nemo-2407-12B-Thinking-Claude-Gemini-GPT5.2-Uncensored-HERETIC-GGUF