
Wraith 8B (GGUF imatrix quants)

Hugging Face · January 19, 2026 · mradermacher/wraith-8b-i1-GGUF

mradermacher/wraith-8b-i1-GGUF is an imatrix-weighted GGUF quant pack built from vanta-research/wraith-8b. If you run llama.cpp-compatible tooling (or anything else that consumes GGUF), the value here is packaging: a wide menu of ready-to-download quant variants, so you can trade memory for speed and quality without doing your own conversion.

The repository includes an imatrix file (useful if you want to generate your own quants) plus many prebuilt IQ* and Q* options (GGUF quant variants with different size/quality tradeoffs), with terse notes like “for the desperate” on the smallest ones. There’s also a link to a separate “static quants” repo and an external overview/download list.
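If you do want to roll your own quants from that imatrix file, llama.cpp's `llama-quantize` tool accepts it via `--imatrix`. A minimal sketch, assuming you already have a full-precision GGUF conversion of the base model (the filenames below are placeholders, not the repo's actual file names):

```shell
# Quantize a full-precision GGUF down to Q4_K_M, guided by the
# importance matrix so the most sensitive weights keep more precision.
llama-quantize --imatrix wraith-8b.imatrix \
    wraith-8b-f16.gguf \
    wraith-8b-Q4_K_M.gguf \
    Q4_K_M
```

The imatrix mainly pays off at aggressive sizes (IQ2/IQ1), where unweighted quantization tends to degrade noticeably faster.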

What to try first: start with a mid-range quant (Q4_K_M or Q4_K_S are usually good first checks), run a small prompt suite that includes both reasoning and instruction-following tasks, and watch for increased hallucinations, failure to follow instructions, or brittle behavior on edge cases as you move to more aggressive IQ2/IQ1 sizes.

Quick stats from the listing feed at the time of writing: 3 likes · 3111 downloads.

View on Hugging Face

Source listing: https://huggingface.co/models?sort=modified