---
title: "Importance Matrix (imatrix)"
type: concept
tags: [imatrix, quantization, accuracy, advanced]
created: 2026-05-30
updated: 2026-05-30
sources: [raw/imatrix-readme.md, raw/quantize-readme.md]
confidence: high
llama_build: "master (~2026-05)"
---

# Importance Matrix (imatrix)

## Definition

An importance matrix (imatrix) is a set of importance statistics derived from activations. It is gathered by running a high-precision model (typically f16) over a body of calibration text and recording how strongly the inputs to each tensor are activated, which indicates which weights matter most. During [[concepts/quantization]], [[entities/binary-llama-quantize]] uses these statistics to preserve the most important weights, which markedly improves quality at low bit widths, especially for the IQ (i-quant) types.
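
As a rough illustration of why importance statistics help, the toy sketch below picks a quantization scale for one row of weights twice: once minimizing plain squared error, and once minimizing error weighted by importance. It is not the algorithm `llama-quantize` uses; the weights, importance values, candidate scales, and function names are all made up for the example.

```python
def quantize_row(weights, scale, levels=3):
    """Round each weight to the nearest multiple of `scale`, clamped to +/-levels steps."""
    out = []
    for w in weights:
        q = max(-levels, min(levels, round(w / scale)))
        out.append(q * scale)
    return out


def weighted_error(weights, dequantized, importance):
    """Sum of squared errors, each scaled by the importance of its column."""
    return sum(i * (w - d) ** 2 for w, d, i in zip(weights, dequantized, importance))


def best_scale(weights, importance, candidates, levels=3):
    """Pick the candidate scale with the lowest importance-weighted error."""
    return min(
        candidates,
        key=lambda s: weighted_error(weights, quantize_row(weights, s, levels), importance),
    )


# Toy data (assumption): columns 2 and 4 are the "important" ones.
weights = [0.9, -0.05, 0.31, 0.02, -0.47]
importance = [0.1, 0.1, 5.0, 0.1, 4.0]
uniform = [1.0] * len(weights)
candidates = [0.05 * k for k in range(1, 21)]  # 0.05 .. 1.00

scale_plain = best_scale(weights, uniform, candidates)
scale_imx = best_scale(weights, importance, candidates)

# Judge both choices by the importance-weighted error.
err_plain = weighted_error(weights, quantize_row(weights, scale_plain), importance)
err_imx = weighted_error(weights, quantize_row(weights, scale_imx), importance)
print(f"plain scale={scale_plain:.2f} weighted error={err_plain:.4f}")
print(f"imx   scale={scale_imx:.2f} weighted error={err_imx:.4f}")
```

Because both searches draw from the same candidate set, the importance-aware choice can never score worse on the weighted error; how much it helps depends on how unevenly the importance is distributed.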

## How It Works

The matrix is produced by [[entities/binary-imatrix]] running over a calibration corpus, and is later consumed by [[entities/binary-llama-quantize]] via `--imatrix`.
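
The two-step flow can be sketched as a pair of command lines. The snippet only builds and prints the argument lists; it does not run anything. The binary name `llama-imatrix`, the `-m`/`-f`/`-o` flags, the positional order for `llama-quantize`, and every file name are assumptions based on common llama.cpp usage, so check them against `--help` for your build.

```python
import shlex


def imatrix_command(model, calibration_text, output):
    """Argument list for computing an imatrix from a calibration file."""
    return ["llama-imatrix", "-m", model, "-f", calibration_text, "-o", output]


def quantize_command(imatrix, model_in, model_out, quant_type):
    """Argument list for quantizing with a previously computed imatrix."""
    return ["llama-quantize", "--imatrix", imatrix, model_in, model_out, quant_type]


# Hypothetical file names, for illustration only.
steps = [
    imatrix_command("model-f16.gguf", "calibration.txt", "imatrix.gguf"),
    quantize_command("imatrix.gguf", "model-f16.gguf", "model-iq2_xs.gguf", "IQ2_XS"),
]

for argv in steps:
    print(shlex.join(argv))
```

The imatrix is computed once per model and can then be reused for every quantization type produced from that model.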

Statistics are computed on **squared activations**. Reported quantities (available via `--show-statistics`) include:

- **Sum(Act^2)** — sum of squared activations.
- **%Active** — fraction of activations above a `1e-5` threshold.
- **Entropy / E(norm)** — activation entropy.
- **ZD Score** — see arXiv 2406.17415.
- **CosSim** — cosine similarity versus the prior layer.
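
To make the quantities above more concrete, here is a simplified sketch that computes similar summaries from a list of per-channel squared-activation values. These are textbook definitions chosen for illustration, not the exact formulas in the tool; the function names and the toy inputs are hypothetical, and ZD Score is left out.

```python
import math


def summarize(sq_acts, threshold=1e-5):
    """Toy per-tensor summary from per-channel squared-activation values."""
    total = sum(sq_acts)
    active = sum(1 for v in sq_acts if v > threshold)
    # Treat the normalized values as a probability distribution for the entropy.
    probs = [v / total for v in sq_acts if v > 0] if total > 0 else []
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(sq_acts)) if len(sq_acts) > 1 else 1.0
    return {
        "sum_act2": total,
        "pct_active": 100.0 * active / len(sq_acts),
        "entropy_bits": entropy,
        "entropy_norm": entropy / max_entropy,
    }


def cosine_similarity(a, b):
    """Cosine similarity between two equally sized vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy values for the same tensor in two consecutive layers (assumption).
layer0 = [4.0, 0.0, 1.0, 3.0]
layer1 = [3.5, 0.0, 1.5, 3.0]

stats = summarize(layer1)
similarity = cosine_similarity(layer1, layer0)
print(stats)
print(f"cos_sim_vs_prior_layer={similarity:.4f}")
```

A tensor with low normalized entropy concentrates its activation energy in a few channels, which is the situation where importance-aware quantization has the most to preserve.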

The default output format is GGUF. A legacy `dat` format is available via `--output-format dat` or by using a non-`.gguf` extension, and conversion is bidirectional. Multiple matrices can be merged by passing `--in-file` repeatedly.
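
Conceptually, merging can be pictured as adding up running totals. The sketch below assumes each matrix stores, per tensor, a list of accumulated squared activations plus a sample count, so that a merge adds both; the real on-disk layout and merge logic may differ, and the tensor names and numbers are illustrative only.

```python
def merge_imatrices(matrices):
    """Merge toy imatrices of the form {tensor_name: (sums, count)}."""
    merged = {}
    for matrix in matrices:
        for name, (sums, count) in matrix.items():
            if name not in merged:
                # First time this tensor is seen: copy it as-is.
                merged[name] = (list(sums), count)
                continue
            prev_sums, prev_count = merged[name]
            if len(prev_sums) != len(sums):
                raise ValueError(f"shape mismatch for tensor {name!r}")
            # Accumulate both the per-channel sums and the sample count.
            merged[name] = ([a + b for a, b in zip(prev_sums, sums)], prev_count + count)
    return merged


# Two hypothetical runs over different calibration files.
run_a = {"blk.0.attn_q.weight": ([1.0, 2.0], 10)}
run_b = {
    "blk.0.attn_q.weight": ([3.0, 0.5], 30),
    "blk.0.ffn_up.weight": ([0.2, 0.4], 30),
}

combined = merge_imatrices([run_a, run_b])
print(combined)
```

Keeping the count alongside the sums means a run over more calibration data naturally carries more weight in the merged result.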

## Key Parameters

- **`--imatrix FILE`** (on `llama-quantize`): path to the imatrix file to apply during quantization.
- **`--process-output`** (on the imatrix tool, default `false`): whether to collect statistics for `output.weight`. It is usually better NOT to use the imatrix when quantizing that tensor, hence the default.
- **`--output-format {gguf,dat}`** (on the imatrix tool): output format selection; GGUF is the default.
- **`--in-file`** (on the imatrix tool): an existing imatrix file to load; repeat the flag to merge multiple matrices.

## When To Use

Compute an imatrix before quantizing to a low-bit type — it is effectively required for good IQ results and helps minimize both Perplexity (ppl) and KL-Divergence (kld). Larger, more representative calibration data yields a better matrix; a few hundred KB of varied text is a common choice.

## Risks & Pitfalls

- A small or unrepresentative calibration corpus produces a weaker matrix — use varied text.
- Applying the imatrix to `output.weight` is usually counterproductive; leave `--process-output` at its default of `false` unless you have a reason not to.

## Related Concepts

- [[concepts/quantization]] — the process that consumes the imatrix.
- [[entities/binary-imatrix]] — the tool that produces the matrix.
- [[entities/binary-llama-quantize]] — the tool that applies it via `--imatrix`.

## Sources

- [[summaries/imatrix-readme]]
- [[summaries/quantize-readme]]
