mradermacher i1/imatrix quant card (Phi-4-reasoning-plus): static vs weighted quants
Key Points
- mradermacher publishes two parallel repos per model: `-i1-GGUF` = weighted/imatrix quants (this card); `-GGUF` (no i1) = static quants. The `i1-` filename prefix marks every imatrix quant.
- Standing guidance line on every card: "(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)."
- The full prose "static vs weighted/imatrix" FAQ is NOT inline in this card; it lives on the mradermacher profile / `model_requests` page. Inline, the static-vs-imatrix and IQ-vs-K tradeoffs are encoded in the per-row Notes column.
- Provided Quants table (Phi-4-reasoning-plus, i1/imatrix), size in GB + Notes:
- i1-IQ1_S 3.4 "for the desperate"; i1-IQ1_M 3.7 "mostly desperate"
- i1-IQ2_XXS 4.2; i1-IQ2_XS 4.6; i1-IQ2_S 4.8; i1-IQ2_M 5.2
- i1-Q2_K_S 5.3 "very low quality"; i1-Q2_K 5.6 "IQ3_XXS probably better"
- i1-IQ3_XXS 5.9 "lower quality"; i1-IQ3_XS 6.3; i1-IQ3_S 6.6 "beats Q3_K*"; i1-Q3_K_S 6.6 "IQ3_XS probably better"; i1-IQ3_M 7.0; i1-Q3_K_M 7.5 "IQ3_S probably better"; i1-Q3_K_L 8.0 "IQ3_M probably better"
- i1-IQ4_XS 8.0; i1-IQ4_NL 8.5 "prefer IQ4_XS"; i1-Q4_0 8.5 "fast, low quality"; i1-Q4_K_S 8.5 "optimal size/speed/quality"; i1-Q4_K_M 9.2 "fast, recommended"; i1-Q4_1 9.4
- i1-Q5_K_S 10.3; i1-Q5_K_M 10.7; i1-Q6_K 12.1 "practically like static Q6_K"
- Key encoded rules of thumb: IQ3_S "beats Q3_K*"; at the same or a nearby size, the Notes column steers from the K-quant to an IQ-quant (Q2_K → prefer IQ3_XXS; Q3_K_S → prefer IQ3_XS; Q3_K_M → prefer IQ3_S; Q3_K_L → prefer IQ3_M) and, between two IQ-quants, from IQ4_NL to IQ4_XS. Sweet spots: Q4_K_S = "optimal size/speed/quality", Q4_K_M = "fast, recommended."
- At Q6_K the imatrix benefit vanishes: "practically like static Q6_K" (imatrix matters most at low bpw).
- Card cites the same two external references the community standardizes on: ikawrakow's PPL-vs-quant graph (nethype.de/quantpplgraph.png) and Artefact2's gist.
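The Notes column can be read as a small decision table. The sketch below transcribes the rows listed above and applies one hypothetical selection rule: take the largest file that fits a size budget, skipping rows whose note names a better alternative or says "low quality". The names `Quant`, `preferred_alternative`, and `pick` are illustrative assumptions, not anything the card or llama.cpp provides, and the rule is only one plausible reading of the notes.

```python
import re
from typing import NamedTuple, Optional


class Quant(NamedTuple):
    name: str       # quant type without the "i1-" prefix
    size_gb: float  # size in GB as listed on the card
    note: str       # Notes column; empty when the card gives none


# Rows transcribed from the list above (Phi-4-reasoning-plus, i1/imatrix).
QUANTS = [
    Quant("IQ1_S", 3.4, "for the desperate"),
    Quant("IQ1_M", 3.7, "mostly desperate"),
    Quant("IQ2_XXS", 4.2, ""),
    Quant("IQ2_XS", 4.6, ""),
    Quant("IQ2_S", 4.8, ""),
    Quant("IQ2_M", 5.2, ""),
    Quant("Q2_K_S", 5.3, "very low quality"),
    Quant("Q2_K", 5.6, "IQ3_XXS probably better"),
    Quant("IQ3_XXS", 5.9, "lower quality"),
    Quant("IQ3_XS", 6.3, ""),
    Quant("IQ3_S", 6.6, "beats Q3_K*"),
    Quant("Q3_K_S", 6.6, "IQ3_XS probably better"),
    Quant("IQ3_M", 7.0, ""),
    Quant("Q3_K_M", 7.5, "IQ3_S probably better"),
    Quant("Q3_K_L", 8.0, "IQ3_M probably better"),
    Quant("IQ4_XS", 8.0, ""),
    Quant("IQ4_NL", 8.5, "prefer IQ4_XS"),
    Quant("Q4_0", 8.5, "fast, low quality"),
    Quant("Q4_K_S", 8.5, "optimal size/speed/quality"),
    Quant("Q4_K_M", 9.2, "fast, recommended"),
    Quant("Q4_1", 9.4, ""),
    Quant("Q5_K_S", 10.3, ""),
    Quant("Q5_K_M", 10.7, ""),
    Quant("Q6_K", 12.1, "practically like static Q6_K"),
]


def preferred_alternative(note: str) -> Optional[str]:
    """Return the quant a note points to ("X probably better" / "prefer X")."""
    m = re.fullmatch(r"(\w+) probably better|prefer (\w+)", note)
    return (m.group(1) or m.group(2)) if m else None


def pick(budget_gb: float) -> Optional[Quant]:
    """Hypothetical rule: largest file within budget, skipping rows the card
    steers away from (a named alternative or a "low quality" note)."""
    candidates = [
        q for q in QUANTS
        if q.size_gb <= budget_gb
        and preferred_alternative(q.note) is None
        and "low quality" not in q.note
    ]
    return max(candidates, key=lambda q: q.size_gb, default=None)


# Map each steered-away quant to the alternative its note names.
REDIRECTS = {
    q.name: alt for q in QUANTS
    if (alt := preferred_alternative(q.note)) is not None
}
```

For example, `pick(8.5)` skips IQ4_NL and Q4_0 and lands on Q4_K_S, while `pick(6.6)` returns IQ3_S rather than the equally sized Q3_K_S. Size is only a proxy here; the card itself warns that the table is sorted by size, not necessarily quality.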
Relevant Concepts
- imatrix: the weighted-vs-static distinction is the whole point of the `-i1-` repos; imatrix gains concentrate at low bpw.
- quantization: IQ-vs-K-at-equal-size ranking and the per-quant quality ladder.
- gguf format: multi-part GGUF concatenation referenced via TheBloke README.
- binary imatrix: produces the importance matrix backing these i1 quants.
- binary llama quantize: consumes the imatrix to emit the i1-* files.
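The last two concepts form a two-step pipeline: `llama-imatrix` writes an importance matrix, then `llama-quantize` reads it via `--imatrix`. A minimal sketch that only builds the two command lines follows. The file names and the helper functions `imatrix_cmd` and `quantize_cmd` are hypothetical, and the card does not state which calibration text or exact options mradermacher uses, so this is not a reproduction of that pipeline.

```python
import shlex
from pathlib import Path
from typing import List


def imatrix_cmd(model: Path, calibration: Path, out: Path) -> List[str]:
    """argv for llama-imatrix: -m model, -f calibration text, -o output file."""
    return ["llama-imatrix", "-m", str(model), "-f", str(calibration), "-o", str(out)]


def quantize_cmd(imatrix: Path, src: Path, dst: Path, qtype: str) -> List[str]:
    """argv for llama-quantize: --imatrix FILE, then input, output, quant type."""
    return ["llama-quantize", "--imatrix", str(imatrix), str(src), str(dst), qtype]


# Hypothetical file names, for illustration only.
model = Path("model-f16.gguf")
calibration = Path("calibration.txt")
imatrix = Path("imatrix.dat")

step1 = imatrix_cmd(model, calibration, imatrix)
step2 = quantize_cmd(imatrix, model, Path("model.i1-IQ3_S.gguf"), "IQ3_S")

# Print the commands instead of running them; pass the lists to
# subprocess.run(..., check=True) to execute once the binaries are on PATH.
print(shlex.join(step1))
print(shlex.join(step2))
```

Building argv lists rather than shell strings keeps paths with spaces safe. A static quant would use the same second command without the `--imatrix` pair.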
Source Metadata
- Type: community (HF model card)
- Author/platform: mradermacher / Hugging Face (nethype GmbH servers; nicoboss supercomputer access)
- Date: unknown; Phi-4-reasoning-plus era (~2025). FLAG: undated; note the standard FAQ prose is on a separate page, not this card.
- URL: https://huggingface.co/mradermacher/Phi-4-reasoning-plus-i1-GGUF
