LCO-Embedding-Omni-3B-GGUF

GGUF quantizations of LCO-Embedding/LCO-Embedding-Omni-3B for use with llama.cpp.

Converted using ht-llama.cpp, a fork with added support for the Qwen2_5OmniThinkerForConditionalGeneration architecture.

About the model

LCO-Embedding-Omni-3B is a multimodal embedding model based on the Thinker component of Qwen 2.5 Omni, fine-tuned with LoRA and contrastive learning to produce 2048-dimensional embeddings from text, images, audio, and video. It uses last-token pooling.
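To make "last-token pooling" concrete, here is a minimal sketch in plain Python, independent of llama.cpp. It assumes per-token hidden states given as a list of vectors and a right-padded attention mask. The names `last_token_pool` and `l2_normalize`, and the normalization step itself, are illustrative assumptions rather than the model's actual implementation.

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last non-padding token.

    hidden_states: list of per-token vectors (lists of floats).
    attention_mask: list of 0/1 ints, 1 marks a real token (right-padded).
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have equal length")
    real_positions = [i for i, m in enumerate(attention_mask) if m == 1]
    if not real_positions:
        raise ValueError("sequence has no real tokens")
    return list(hidden_states[real_positions[-1]])


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


# Toy input: 4 positions, 3-dim hidden states, the last position is padding.
states = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [3.0, 0.0, 4.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
pooled = l2_normalize(last_token_pool(states, mask))
print(pooled)
```

Mean pooling would instead average all real-token vectors, so the two strategies generally give different embeddings for the same input.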

See Scaling Language-Centric Omnimodal Representation Learning (NeurIPS 2025) for details.

Available files

Standard quantizations

| File | Quant | Size | Description |
|---|---|---|---|
| LCO-Embedding-Omni-3B-BF16.gguf | BF16 | -- | Full precision, no quality loss |
| LCO-Embedding-Omni-3B-Q8_0.gguf | Q8_0 | -- | Near-lossless quantization |
| LCO-Embedding-Omni-3B-Q4_K_M.gguf | Q4_K_M | -- | Good balance of quality and size |
| LCO-Embedding-Omni-3B-Q3_K_M.gguf | Q3_K_M | -- | Smaller, some quality loss |
| LCO-Embedding-Omni-3B-Q2_K.gguf | Q2_K | -- | Smallest, more quality loss |

Importance matrix (imatrix) quantizations

Quantized with an importance matrix computed from WikiText-2 calibration data for improved quality at low bit widths.

| File | Quant | Size | Description |
|---|---|---|---|
| LCO-Embedding-Omni-3B-IQ4_XS.gguf | IQ4_XS | -- | 4.25 bpw, imatrix-optimized |
| LCO-Embedding-Omni-3B-IQ3_M.gguf | IQ3_M | -- | 3.66 bpw, imatrix-optimized |
| LCO-Embedding-Omni-3B-IQ3_XS.gguf | IQ3_XS | -- | 3.3 bpw, imatrix-optimized |
| LCO-Embedding-Omni-3B-IQ2_M.gguf | IQ2_M | -- | 2.7 bpw, imatrix-optimized |
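The bits-per-weight (bpw) figures give a rough idea of file size via params × bpw / 8. The sketch below works through that arithmetic under a loud assumption: it uses the nominal "3B" parameter count, not an exact one. Real GGUF files will differ, since some tensors are typically stored at a different precision and the file carries metadata. The function name `estimate_size_gb` is hypothetical.

```python
def estimate_size_gb(n_params, bits_per_weight):
    """Rough file size in GB (10^9 bytes): params * bpw / 8."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 3e9  # assumption: nominal "3B" parameter count, not an exact figure

for quant, bpw in [("IQ4_XS", 4.25), ("IQ3_M", 3.66), ("IQ3_XS", 3.3), ("IQ2_M", 2.7)]:
    print(f"{quant}: ~{estimate_size_gb(N_PARAMS, bpw):.2f} GB (estimate)")
```

Treat the output as an order-of-magnitude guide for choosing a quant, not as the actual download size.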

Multimodal projection

| File | Quant | Size | Description |
|---|---|---|---|
| mmproj-LCO-Embedding-Omni-3b-F16.gguf | F16 | -- | Vision + audio projection (required for multimodal) |

For text-only embedding, you only need one of the text model GGUFs. For multimodal (image/audio/video), you also need the mmproj file.
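As a small illustration of the rule above, the helper below assembles a `llama-server` argument list that includes `--mmproj` only when a projection file is supplied. It uses only the flags shown in this README's Usage section. The function name `server_command` and the default binary path are assumptions made for the sketch.

```python
def server_command(model_path, mmproj_path=None, binary="./build/bin/llama-server"):
    """Build the llama-server argument list for text-only or multimodal use."""
    cmd = [binary, "-m", model_path]
    if mmproj_path is not None:
        # Only needed for image/audio/video inputs.
        cmd += ["--mmproj", mmproj_path]
    cmd += ["--embedding", "--pooling", "last"]
    return cmd


text_only = server_command("LCO-Embedding-Omni-3B-Q8_0.gguf")
multimodal = server_command(
    "LCO-Embedding-Omni-3B-Q8_0.gguf",
    mmproj_path="mmproj-LCO-Embedding-Omni-3b-F16.gguf",
)
print(" ".join(text_only))
print(" ".join(multimodal))
```

The resulting list can be passed to `subprocess.Popen` to start the server from a script.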

Quantization quality

Quality is measured on 8 diverse text sentences (2048-dimensional embeddings), with the BF16 model as the reference.

Embedding quality vs BF16

Results will be added after quantization testing.
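One plausible way to run this kind of comparison is to embed the same sentences with BF16 and with a quantized file, then average the per-sentence cosine similarity. The sketch below assumes that metric; it is not necessarily the procedure used for this card. The function names and the toy 3-dimensional vectors (standing in for 2048-dimensional embeddings) are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_similarity_to_reference(reference, candidate):
    """Average per-sentence cosine similarity between two embedding sets."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("need two non-empty embedding sets of equal size")
    sims = [cosine_similarity(r, c) for r, c in zip(reference, candidate)]
    return sum(sims) / len(sims)


# Toy data: each row is one sentence embedding.
bf16 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
quant = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1]]
score = mean_similarity_to_reference(bf16, quant)
print(round(score, 4))
```

A score close to 1.0 means the quantized embeddings point in nearly the same direction as the BF16 ones.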

pgvector retrieval quality (query with quant, corpus in BF16)

Results will be added after quantization testing.
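A retrieval check of this shape can be sketched without a database: rank a BF16 corpus by cosine distance once with the BF16 query embedding and once with the quantized query embedding, then measure how much the top-k sets overlap. The brute-force ranking below stands in for a pgvector cosine-distance query. The overlap metric and all function names are assumptions for illustration, not the method behind the pending results.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, corpus, k):
    """Indices of the k corpus vectors nearest to the query."""
    ranked = sorted(range(len(corpus)), key=lambda i: cosine_distance(query, corpus[i]))
    return ranked[:k]


def recall_at_k(reference_query, quant_query, corpus, k):
    """Fraction of the reference top-k that the quantized query also retrieves."""
    expected = set(top_k(reference_query, corpus, k))
    got = set(top_k(quant_query, corpus, k))
    return len(expected & got) / k


# Toy 2-dim corpus (BF16 embeddings) and one query embedded two ways.
corpus = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
bf16_query = [1.0, 0.1]
quant_query = [0.95, 0.2]
recall = recall_at_k(bf16_query, quant_query, corpus, k=2)
print(recall)
```

Averaging this value over many queries gives a single retrieval-agreement number per quantization.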

Usage

Build llama.cpp

git clone https://github.com/heiervang-technologies/ht-llama.cpp
cd ht-llama.cpp
cmake -B build
cmake --build build --target llama-embedding llama-server -j$(nproc)

Text embeddings (CLI)

./build/bin/llama-embedding \
  -m LCO-Embedding-Omni-3B-Q8_0.gguf \
  --pooling last \
  -p "Your text here"

Text embeddings (server)

./build/bin/llama-server \
  -m LCO-Embedding-Omni-3B-Q8_0.gguf \
  --embedding --pooling last

curl -s http://localhost:8080/embeddings \
  -H "Content-Type: application/json" \
  -d '{"content": "Your text here"}'
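The same request can be sent from Python with only the standard library. This is a minimal sketch: the URL and payload mirror the curl call above, while the response handling is an assumption. It expects a list of objects with an `embedding` field that may be either a flat vector or a one-element nested list, so check it against what your server build actually returns. The helper names are hypothetical.

```python
import json
import urllib.request


def build_request(text, base_url="http://localhost:8080"):
    """Create the POST request for the server's /embeddings endpoint."""
    body = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_vector(response_json):
    """Pull one embedding vector out of the decoded reply.

    Assumption: the reply is a list of {"index", "embedding"} objects, or a
    single such object; "embedding" is flat or nested one level deep.
    """
    first = response_json[0] if isinstance(response_json, list) else response_json
    emb = first["embedding"]
    if emb and isinstance(emb[0], list):
        emb = emb[0]
    return emb


def embed(text, base_url="http://localhost:8080"):
    """Send text to a running llama-server and return its embedding."""
    with urllib.request.urlopen(build_request(text, base_url), timeout=60) as resp:
        return extract_vector(json.load(resp))


# Requires a running server, so it is left commented out:
# vector = embed("Your text here")
# print(len(vector))
```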

Multimodal embeddings (vision + audio)

Requires the mmproj file:

./build/bin/llama-server \
  -m LCO-Embedding-Omni-3B-Q8_0.gguf \
  --mmproj mmproj-LCO-Embedding-Omni-3b-F16.gguf \
  --embedding --pooling last

# Image embedding (base64-encoded image)
curl -s http://localhost:8080/embeddings \
  -H "Content-Type: application/json" \
  -d '{"content": [{"prompt_string": "<__media__>", "multimodal_data": ["<base64-image-data>"]}]}'

# Audio embedding (base64-encoded WAV)
curl -s http://localhost:8080/embeddings \
  -H "Content-Type: application/json" \
  -d '{"content": [{"prompt_string": "<__media__>", "multimodal_data": ["<base64-audio-data>"]}]}'
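Producing the base64 payload by hand is error-prone, so a small helper can build it from raw file bytes. The sketch below reproduces the JSON shape used in the curl examples above. The function names are hypothetical, and the demo uses placeholder bytes rather than a real image.

```python
import base64
import json

MEDIA_MARKER = "<__media__>"  # placeholder string taken from the curl examples


def build_media_payload(media_bytes, prompt=MEDIA_MARKER):
    """Wrap raw image or audio bytes in the request body shape shown above."""
    encoded = base64.b64encode(media_bytes).decode("ascii")
    return {"content": [{"prompt_string": prompt, "multimodal_data": [encoded]}]}


def payload_from_file(path):
    """Read a media file from disk and build the request body for it."""
    with open(path, "rb") as f:
        return build_media_payload(f.read())


# Demo with in-memory placeholder bytes instead of a real file.
payload = build_media_payload(b"\x89PNG placeholder bytes")
print(json.dumps(payload)[:60])
```

The resulting dictionary can be serialized with `json.dumps` and posted to the `/embeddings` endpoint of a server started with `--mmproj`.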

JSON output (for programmatic use)

./build/bin/llama-embedding \
  -m LCO-Embedding-Omni-3B-Q8_0.gguf \
  --pooling last \
  --embd-output-format json \
  -p "Your text here"
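For scripting, the CLI can be wrapped with `subprocess` and its JSON output parsed. The sketch below assumes the JSON is an OpenAI-style object with a `data` list of `{"index", "embedding"}` entries and that it arrives on stdout with logs on stderr; verify both against your build. The function names and the sample string are illustrative, not captured output.

```python
import json
import subprocess


def embedding_command(model_path, text, binary="./build/bin/llama-embedding"):
    """Argument list matching the CLI invocation above."""
    return [
        binary,
        "-m", model_path,
        "--pooling", "last",
        "--embd-output-format", "json",
        "-p", text,
    ]


def parse_embeddings(stdout_text):
    """Return the embedding vectors ordered by index.

    Assumption: stdout holds {"data": [{"index": ..., "embedding": [...]}, ...]}.
    """
    doc = json.loads(stdout_text)
    items = sorted(doc["data"], key=lambda d: d["index"])
    return [d["embedding"] for d in items]


def run_embedding(model_path, text):
    """Run llama-embedding and return the parsed vectors."""
    result = subprocess.run(
        embedding_command(model_path, text),
        capture_output=True, text=True, check=True,
    )
    return parse_embeddings(result.stdout)


# Hand-written sample in the assumed format (not real model output).
sample = '{"object": "list", "data": [{"object": "embedding", "index": 0, "embedding": [0.1, -0.2, 0.3]}]}'
vectors = parse_embeddings(sample)
print(vectors)
```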

Notes

  • This is a quantization of LCO-Embedding/LCO-Embedding-Omni-3B. See the original model card for benchmarks, training details, and licensing.
  • The --pooling last flag is required because this model uses last-token pooling, not mean pooling.
  • Embedding dimensions: 2048
  • Contributions and bug reports are welcome at ht-llama.cpp.

Citations

LCO-Embedding

@article{xiao2025scaling,
  title={Scaling Language-Centric Omnimodal Representation Learning},
  author={Xiao, Chenghao and Chan, Hou Pong and Zhang, Hao and Xu, Weiwen and Aljunied, Mahani and Rong, Yu},
  journal={arXiv preprint arXiv:2510.11693},
  year={2025}
}

Qwen 2.5 Omni

@article{Qwen2.5-Omni,
  title={Qwen2.5-Omni Technical Report},
  author={Jin Xu and Zhifang Guo and Jinzheng He and Hangrui Hu and Ting He and Shuai Bai and Keqin Chen and Jialin Wang and Yang Fan and Kai Dang and Bin Zhang and Xiong Wang and Yunfei Chu and Junyang Lin},
  journal={arXiv preprint arXiv:2503.20215},
  year={2025}
}