---
title: "mradermacher i1/imatrix quant card (Phi-4-reasoning-plus): static vs weighted quants"
type: summary
tags: [quantization, imatrix, accuracy, community, intermediate]
created: 2026-05-30
updated: 2026-05-30
sources: ["raw/community/community-mradermacher-imatrix.md"]
confidence: medium
llama_build: "n/a (community source, date unknown)"
source_url: "https://huggingface.co/mradermacher/Phi-4-reasoning-plus-i1-GGUF"
---

# mradermacher i1/imatrix quant card (Phi-4-reasoning-plus): static vs weighted quants

## Key Points
- mradermacher publishes two parallel repos per model: **`-i1-GGUF`** = weighted/imatrix quants (this card); **`-GGUF`** (no i1) = static quants. Every imatrix quant carries an `i1-` prefix on its quant label (e.g. `i1-Q4_K_M`).
- Standing guidance line on every card: "(sorted by size, not necessarily quality. **IQ-quants are often preferable over similar sized non-IQ quants**)."
- The full prose "static vs weighted/imatrix" FAQ is NOT inline in this card — it lives on the mradermacher profile / `model_requests` page. Inline, the static-vs-imatrix and IQ-vs-K tradeoffs are encoded in the per-row Notes column.
- **Provided Quants table (Phi-4-reasoning-plus, i1/imatrix), size in GB + Notes:**
  - i1-IQ1_S 3.4 "for the desperate"; i1-IQ1_M 3.7 "mostly desperate"
  - i1-IQ2_XXS 4.2; i1-IQ2_XS 4.6; i1-IQ2_S 4.8; i1-IQ2_M 5.2
  - i1-Q2_K_S 5.3 "very low quality"; i1-Q2_K 5.6 "IQ3_XXS probably better"
  - i1-IQ3_XXS 5.9 "lower quality"; i1-IQ3_XS 6.3; i1-IQ3_S 6.6 "beats Q3_K*"; i1-Q3_K_S 6.6 "IQ3_XS probably better"; i1-IQ3_M 7.0; i1-Q3_K_M 7.5 "IQ3_S probably better"; i1-Q3_K_L 8.0 "IQ3_M probably better"
  - i1-IQ4_XS 8.0; i1-IQ4_NL 8.5 "prefer IQ4_XS"; i1-Q4_0 8.5 "fast, low quality"; i1-Q4_K_S 8.5 "optimal size/speed/quality"; i1-Q4_K_M 9.2 "fast, recommended"; i1-Q4_1 9.4
  - i1-Q5_K_S 10.3; i1-Q5_K_M 10.7; i1-Q6_K 12.1 "practically like static Q6_K"
- Rules of thumb encoded in the Notes column: IQ3_S "beats Q3_K*", and at a similar size the card steers from the K-quant to an IQ-quant (Q2_K → prefer IQ3_XXS; Q3_K_S → prefer IQ3_XS; Q3_K_M → prefer IQ3_S; Q3_K_L → prefer IQ3_M). Within the IQ family, IQ4_NL → prefer IQ4_XS. Sweet spots: **Q4_K_S = "optimal size/speed/quality"**, **Q4_K_M = "fast, recommended."**
- At Q6_K the card's note reads "practically like static Q6_K", i.e. the imatrix adds little at that size. This fits the general pattern that imatrix gains concentrate at low bpw.
- Card cites the same two external references the community standardizes on: ikawrakow's PPL-vs-quant graph (nethype.de/quantpplgraph.png) and Artefact2's gist.
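
The table and rules of thumb above can be turned into a small lookup. The following is a minimal sketch, not anything the card itself provides: the names `QUANTS`, `PREFERRED`, `prefer`, and `candidates` are hypothetical, and the "budget" is compared against file size only, which understates real memory needs (context/KV cache come on top).

```python
from typing import List

# (quant label, file size in GB, card note) transcribed from the Key Points
# above, in the card's own size order.
QUANTS = [
    ("i1-IQ1_S", 3.4, "for the desperate"),
    ("i1-IQ1_M", 3.7, "mostly desperate"),
    ("i1-IQ2_XXS", 4.2, ""),
    ("i1-IQ2_XS", 4.6, ""),
    ("i1-IQ2_S", 4.8, ""),
    ("i1-IQ2_M", 5.2, ""),
    ("i1-Q2_K_S", 5.3, "very low quality"),
    ("i1-Q2_K", 5.6, "IQ3_XXS probably better"),
    ("i1-IQ3_XXS", 5.9, "lower quality"),
    ("i1-IQ3_XS", 6.3, ""),
    ("i1-IQ3_S", 6.6, "beats Q3_K*"),
    ("i1-Q3_K_S", 6.6, "IQ3_XS probably better"),
    ("i1-IQ3_M", 7.0, ""),
    ("i1-Q3_K_M", 7.5, "IQ3_S probably better"),
    ("i1-Q3_K_L", 8.0, "IQ3_M probably better"),
    ("i1-IQ4_XS", 8.0, ""),
    ("i1-IQ4_NL", 8.5, "prefer IQ4_XS"),
    ("i1-Q4_0", 8.5, "fast, low quality"),
    ("i1-Q4_K_S", 8.5, "optimal size/speed/quality"),
    ("i1-Q4_K_M", 9.2, "fast, recommended"),
    ("i1-Q4_1", 9.4, ""),
    ("i1-Q5_K_S", 10.3, ""),
    ("i1-Q5_K_M", 10.7, ""),
    ("i1-Q6_K", 12.1, "practically like static Q6_K"),
]

# Notes in which the card points from one quant to a preferred alternative.
PREFERRED = {
    "Q2_K": "IQ3_XXS",
    "Q3_K_S": "IQ3_XS",
    "Q3_K_M": "IQ3_S",
    "Q3_K_L": "IQ3_M",
    "IQ4_NL": "IQ4_XS",
}

PREFIX = "i1-"


def prefer(label: str) -> str:
    """Return the alternative the card's note suggests, or the label itself."""
    base = label[len(PREFIX):] if label.startswith(PREFIX) else label
    if base in PREFERRED:
        return PREFIX + PREFERRED[base]
    return label


def candidates(budget_gb: float) -> List[str]:
    """Quants whose file fits the budget, minus those the card steers away from."""
    return [
        label
        for label, size_gb, _note in QUANTS
        if size_gb <= budget_gb and label[len(PREFIX):] not in PREFERRED
    ]
```

With this data, `candidates(6.0)` ends at `i1-IQ3_XXS`, and `prefer("i1-Q3_K_M")` returns `i1-IQ3_S`. Quants with purely descriptive notes (such as Q2_K_S "very low quality") stay in the list, since the card does not name a replacement for them.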

## Relevant Concepts
- [[concepts/imatrix]] — the weighted-vs-static distinction is the whole point of the `-i1-` repos; imatrix gains concentrate at low bpw.
- [[concepts/quantization]] — IQ-vs-K-at-equal-size ranking and the per-quant quality ladder.
- [[concepts/gguf-format]] — multi-part GGUF concatenation referenced via TheBloke README.
- [[entities/binary-imatrix]] — produces the importance matrix backing these i1 quants.
- [[entities/binary-llama-quantize]] — consumes the imatrix to emit the i1-* files.
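
The last two entries describe a two-step flow: compute an importance matrix, then feed it to the quantizer. A minimal sketch of that flow follows, assuming the binary names `llama-imatrix` and `llama-quantize` from recent llama.cpp builds. All file names are hypothetical, and the card does not describe mradermacher's actual pipeline or calibration data. The helpers only build argument lists and do not run anything.

```python
from typing import List


def imatrix_cmd(model: str, calibration_text: str, out: str = "imatrix.dat") -> List[str]:
    """Arguments for computing an importance matrix from a calibration text file."""
    return ["llama-imatrix", "-m", model, "-f", calibration_text, "-o", out]


def quantize_cmd(imatrix: str, src: str, dst: str, qtype: str) -> List[str]:
    """Arguments for quantizing `src` to `qtype`, weighted by the importance matrix."""
    return ["llama-quantize", "--imatrix", imatrix, src, dst, qtype]


# Hypothetical file names, for illustration only.
step1 = imatrix_cmd("model-f16.gguf", "calibration.txt")
step2 = quantize_cmd("imatrix.dat", "model-f16.gguf", "model.i1-IQ3_S.gguf", "IQ3_S")
```

Each list could be passed to `subprocess.run(..., check=True)` on a machine where those binaries are on `PATH`. Omitting `--imatrix` from the second command would correspond to a static quant.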

## Source Metadata
- Type: community (HF model card)
- Author/platform: mradermacher / Hugging Face (nethype GmbH servers; nicoboss supercomputer access)
- Date: unknown (the card is undated); Phi-4-reasoning-plus era (~2025). FLAG: the standard FAQ prose is on a separate page, not this card.
- URL: https://huggingface.co/mradermacher/Phi-4-reasoning-plus-i1-GGUF
