# Hugging Face — full corpus


<!-- ===== huggingface/README.md ===== -->

# LLM Wiki

An open-source template for building LLM-powered knowledge bases, following [Andrej Karpathy's "LLM Wiki" pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).

You provide raw sources. The LLM reads them, writes structured wiki pages, cross-links everything, and maintains it over time. You never edit the wiki directly — you curate sources and ask questions.

## How It Works

The system has three layers:

```
raw/              Sources you collect (articles, transcripts, notes, PDFs)
wiki/             LLM-written & maintained pages (summaries, concepts, entities, syntheses)
CLAUDE.md         Schema that tells the LLM how to structure everything
```

Three operations drive the workflow:

| Operation | Trigger | What happens |
|-----------|---------|--------------|
| **Ingest** | "ingest raw/my-source.txt" | LLM reads the source, creates a summary page, creates/updates concept and entity pages, adds cross-links, updates the index and log |
| **Query** | Ask any question | LLM searches the wiki, synthesizes an answer with citations, optionally creates a synthesis page for novel insights |
| **Lint** | "lint" or "health check" | LLM audits all pages for orphans, contradictions, missing links, incomplete sections, and low-confidence claims — fixes what it can, reports the rest |
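
As one illustration of what the orphan check in the Lint operation involves, the sketch below scans page bodies for `[[wikilinks]]` and reports pages that nothing links to. The function name `find_orphans`, the in-memory `pages` mapping, and the choice to exempt `index` are assumptions made for this example; the template itself delegates the audit to the LLM through `CLAUDE.md`.

```python
import re

# Matches [[target]] and [[target|alias]]; captures only the target part.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(pages, exempt=("index",)):
    """Return page names that no other page links to (hypothetical helper).

    pages: dict mapping a page name (e.g. "concepts/spaces") to its Markdown text.
    """
    linked = set()
    for name, text in pages.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target != name:  # a self-link does not rescue a page
                linked.add(target)
    return sorted(p for p in pages if p not in linked and p not in exempt)

pages = {
    "index": "- [[concepts/spaces]]",
    "concepts/spaces": "See [[concepts/model-cards|cards]].",
    "concepts/model-cards": "Self link: [[concepts/model-cards]]",
    "concepts/lonely": "No one links here.",
}
print(find_orphans(pages))
```

In a real vault the `pages` mapping would be built by reading the files under `wiki/`.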

## Quick Start

1. **Clone this repo**
   ```bash
   git clone https://github.com/YOUR_USERNAME/llm-wiki.git my-knowledge-base
   cd my-knowledge-base
   ```

2. **Customize CLAUDE.md** for your domain
   - Update the Purpose section with your topic
   - Replace the placeholder tagging taxonomy with your own categories
   - Adjust confidence level descriptions if needed
   - Everything else (workflows, page formats, linking rules) works as-is

3. **Drop sources into `raw/`**
   - Text files, transcripts, articles, notes — any plain text
   - These are immutable once added; the LLM never modifies them

4. **Tell the LLM to ingest**
   ```
   ingest raw/my-first-source.txt
   ```
   The LLM will create summary pages, concept pages, entity pages, cross-links, and update the index.

5. **Ask questions**
   ```
   What are the key differences between X and Y?
   ```
   The LLM answers from the wiki, citing specific pages.

6. **Run health checks**
   ```
   lint
   ```
   The LLM audits the wiki and fixes issues.

## Directory Structure

```
.
├── CLAUDE.md                      # Schema — the LLM's instructions
├── raw/                           # Your source documents (immutable)
└── wiki/
    ├── index.md                   # Master catalog of all pages
    ├── log.md                     # Append-only activity log
    ├── dashboard.md               # Dataview dashboard (Obsidian)
    ├── analytics.md               # Charts View analytics (Obsidian)
    ├── flashcards.md              # Spaced repetition cards
    ├── summaries/                 # One page per source document
    ├── concepts/                  # Concept and framework pages
    ├── entities/                  # People, tools, organizations, etc.
    ├── syntheses/                 # Cross-cutting analyses and comparisons
    ├── journal/                   # Research/session journal entries
    │   └── template.md            # Journal entry template
    └── presentations/             # Marp slide decks
```

## Enhancements

This template includes several extras beyond the core wiki pattern:

### Dataview Dashboard (`wiki/dashboard.md`)
Live queries that surface low-confidence pages, recent updates, concepts by tag, and pages with the most sources. Requires the [Dataview](https://github.com/blacksmithgu/obsidian-dataview) Obsidian plugin.

### Charts View Analytics (`wiki/analytics.md`)
Visual analytics with pie charts, bar charts, and word clouds. Requires the [Charts View](https://github.com/caronchen/obsidian-chartsview-plugin) Obsidian plugin.

### Mermaid Diagrams
Use Mermaid code blocks in any wiki page to create flowcharts, sequence diagrams, or concept maps. Obsidian and GitHub both render Mermaid natively.

### Marp Slides (`wiki/presentations/`)
Create slide decks from markdown using [Marp](https://marp.app/). Drop presentation files in this directory.

### Research Journal (`wiki/journal/`)
Track your research sessions, experiments, or applied work with the included template. The LLM can reference journal entries when answering queries.

### Spaced Repetition (`wiki/flashcards.md`)
Flashcards in the format used by the [Spaced Repetition](https://github.com/st3v3nmw/obsidian-spaced-repetition) Obsidian plugin. Ask the LLM to generate flashcards from any wiki page.

### MCP Server
This repo works with Claude Code's MCP server capabilities. Point an MCP-compatible client at this repo and the LLM can read/write the wiki programmatically.

## Customizing for Your Domain

The schema in `CLAUDE.md` is domain-agnostic. To adapt it:

1. **Purpose** — Describe your knowledge domain in one paragraph
2. **Tagging taxonomy** — Replace placeholder categories with your own (e.g., for a cooking KB: `cuisine`, `technique`, `ingredient`, `equipment`)
3. **Confidence levels** — Adjust the descriptions to match your domain's evidence standards
4. **Entity types** — Update the entity page description to match what entities mean in your domain (people, tools, companies, etc.)
5. **Journal template** — Customize `wiki/journal/template.md` for your workflow

Everything else — page format, linking conventions, workflows, rules — is universal and works across domains.

## Example Domains

This template works for any knowledge-intensive topic:

- **Research notes** — papers, experiments, methodologies
- **Book analysis** — themes, characters, author techniques
- **Competitive analysis** — companies, products, market trends
- **Course notes** — lectures, readings, key concepts
- **Personal development** — frameworks, habits, book summaries
- **Technical documentation** — APIs, architectures, design patterns
- **Hobby deep-dives** — any subject you want to master

## License

MIT


<!-- ===== huggingface/wiki/index.md ===== -->

---
title: "Hugging Face KB — Master Index"
type: index
updated: 2026-06-23
---

# Hugging Face KB — Master Index

**Domain:** Hugging Face — the Hub and core ecosystem: finding/using models, datasets, and Spaces; repositories and model cards; the `huggingface_hub` library and `hf` CLI; authentication; Transformers and Datasets basics; and Inference (Providers + Endpoints).
**Corpus:** 187 provenance-stamped sources in `raw/` — the Hub docs (`huggingface/hub-docs`), the `huggingface_hub` library guides, and orientation/tutorial slices of `transformers` and `datasets`.
**Pages:** 16 (12 concepts · 2 entities · 1 summary · 1 synthesis) — the user ring plus the operator/developer ring.

## Concepts (core ideas + operational how-tos)

- [[concepts/what-is-hugging-face]] — the platform: the Hub (models/datasets/Spaces) and the library ecosystem
- [[concepts/the-hub-and-repositories]] — git + git-LFS/Xet repos, the three repo types, cloning, PRs & discussions
- [[concepts/authentication-and-tokens]] — access token types, creating tokens, `hf auth login`, `HF_TOKEN`, gated repos
- [[concepts/finding-and-downloading-models]] — search/filter, `hf_hub_download`/`snapshot_download`, `hf download`, the local cache
- [[concepts/uploading-and-sharing]] — `upload_file`/`upload_folder`, `hf upload`, `push_to_hub`, creating a repo
- [[concepts/model-cards]] — the README card, the YAML metadata block, tags/license, eval results
- [[concepts/huggingface-hub-cli]] — the `hf` CLI command set (auth, download, upload, repo, cache)
- [[concepts/transformers-basics]] — `pipeline()`, `AutoModel`/`AutoTokenizer`, `from_pretrained`
- [[concepts/running-llms-with-transformers]] — text generation with `generate()`, chat templates, basic optimization
- [[concepts/datasets-basics]] — `load_dataset`, splits, streaming, accessing rows
- [[concepts/inference-providers-and-endpoints]] — serverless Inference Providers (`InferenceClient`) vs dedicated Inference Endpoints
- [[concepts/spaces]] — Spaces SDKs (Gradio/Streamlit/Docker/static), the README config block, hardware tiers

## Entities

- [[entities/huggingface-hub-library]] — the `huggingface_hub` Python library: `HfApi`, download/upload helpers, `HfFileSystem`
- [[entities/transformers-library]] — the `transformers` library: tasks, Auto classes, install

## Summaries

- [[summaries/hub-and-libraries-catalog]] — map of the full Hub feature surface (Collections, Jobs, Webhooks, Enterprise, DOI, GGUF…) and the integrated-library ecosystem — mapped, not paged

## Syntheses (decisions)

- [[syntheses/choosing-local-vs-inference]] — run locally vs Inference Providers vs Inference Endpoints vs Spaces: cost / control / scale

## Statistics

- **Total pages**: 16
- **Concepts**: 12 · **Entities**: 2 · **Summaries**: 1 · **Syntheses**: 1
- **Sources ingested**: 187 (raw/, immutable)
- **High confidence**: 15 · **Medium confidence**: 1 · **Low confidence**: 0

## Coverage notes

Strong: the Hub (repos, cards, search, download/upload, auth/tokens), the `hf` CLI and `huggingface_hub` library, Transformers and Datasets basics, Inference Providers vs Endpoints, and Spaces. No single user-facing product version exists → freshness = source fetch date (2026-06-23).

Mapped, not paged (see [[summaries/hub-and-libraries-catalog]]): the full Transformers/Datasets/Diffusers API references, the per-task model pages, the 30+ integrated libraries, and Hub surfaces like Jobs, Webhooks, Enterprise/SSO, and Data Studio. For the exhaustive API reference and post-date changes, use huggingface.co/docs.


<!-- ===== huggingface/wiki/concepts/authentication-and-tokens.md ===== -->

---
title: "Authentication and Tokens"
type: concept
tags: [authentication, tokens, hf-token, security, gated]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-hub-security-tokens-md.md, raw/github_doc-docs-hub-security-md.md, raw/github_doc-docs-hub-models-gated-md.md, raw/github_doc-docs-source-en-guides-cli-md.md]
---

# Authentication and Tokens

**User Access Tokens** are the preferred way to authenticate to Hugging Face services. Manage them at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens): click **New token**, then pick a role and a name.

## Token roles (scopes)

- **`fine-grained`**: scoped access to specific resources (e.g. a model or org). Recommended for production — reduced impact if leaked.
- **`read`**: read access to repos you can read (public + your/your-org private). For downloading private models or inference.
- **`write`**: adds write access to repos you can write to (create/push content, edit a model card).

Best practice: one token per app, prefer **fine-grained** for production. Some Team/Enterprise orgs require fine-grained tokens; read/write tokens are then rejected with `403` against that org's resources.

## Logging in

The CLI uses a browser device flow (it prints a URL and a code), or you can paste or pass a token:

```bash
hf auth login
hf auth login --force                                   # re-login / switch tokens
hf auth login --token $HF_TOKEN --add-to-git-credential # scripted, non-interactive
hf auth whoami
hf auth logout
```

Tokens are saved to `~/.cache/huggingface/token` (and `stored_tokens`). In Python, use `from huggingface_hub import login; login()`.

## The HF_TOKEN environment variable

Set `HF_TOKEN` to authenticate without interactive login. Note: `hf auth logout` will **not** log you out when logged in via `HF_TOKEN` — unset the env var. A token can also be passed to most loading methods (`AutoModel.from_pretrained("private/model", token="hf_...")`), used in place of a git password, or as a **bearer token** for Inference Providers.
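
A minimal sketch of the environment-variable path, using only the standard library: it reads `HF_TOKEN` and builds the `Authorization: Bearer` header that the bearer-token usage above refers to. The helper name `auth_headers` is hypothetical and is not part of `huggingface_hub`.

```python
import os

def auth_headers(token=None):
    """Build HTTP headers carrying a Hugging Face token as a bearer token.

    Falls back to the HF_TOKEN environment variable when no token is passed.
    (Hypothetical helper for illustration only.)
    """
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("No token: pass one explicitly or set HF_TOKEN")
    return {"Authorization": f"Bearer {token}"}

print(auth_headers("hf_example"))  # placeholder value, not a real token
```

Reading the token from the environment keeps it out of source files, which fits the one-token-per-app practice described earlier.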

## Gated repos

A **gated repo** requires **access requests** before users download files. Users agree to share username and email (browser-only, while logged in); approval is **automatic** (instant) or **manual** (author approves from settings or via API with a `write` token). To download in a script, authenticate first (`hf auth login` / `login()`) or pass `token=` to `from_pretrained` / `hf_hub_download` / `load_dataset`.

Authors customize the request form via model-card metadata: `extra_gated_fields`, `extra_gated_prompt`, `extra_gated_heading`, `extra_gated_eu_disallowed: true` (activates only when `gated: true`).

Other Hub security features: 2FA, Git over SSH, GPG-signed commits, malware/secrets scanning. For how tokens are used when downloading, see [[concepts/finding-and-downloading-models]].


<!-- ===== huggingface/wiki/concepts/datasets-basics.md ===== -->

---
title: "Datasets Basics: load_dataset and streaming"
type: concept
tags: [datasets, load_dataset, splits, streaming, preprocessing]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-source-load-hub-mdx.md, raw/github_doc-docs-source-quickstart-mdx.md, raw/github_doc-docs-source-stream-mdx.md, raw/github_doc-docs-source-use-dataset-mdx.md]
---

# Datasets Basics: load_dataset and streaming

The `datasets` library downloads and prepares datasets from the Hub with one function: `load_dataset`. Install: `pip install datasets` (optional modalities `datasets[audio]`, `datasets[vision]`).

## Loading, splits, configurations

Loading a single `split` returns a `Dataset`; omitting `split` returns a `DatasetDict`. Inspect without downloading via `load_dataset_builder`; list splits/configs with `get_dataset_split_names` / `get_dataset_config_names`.

```py
from datasets import load_dataset, load_dataset_builder, get_dataset_split_names, get_dataset_config_names

dataset = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="train")  # omit split -> DatasetDict
ds_builder = load_dataset_builder("cornell-movie-review-data/rotten_tomatoes")
ds_builder.info.description  # free-text description of the dataset
ds_builder.info.features     # column names and types
get_dataset_split_names("cornell-movie-review-data/rotten_tomatoes")  # ['train', 'validation', 'test']

# some datasets have configurations (subsets) you must select
get_dataset_config_names("PolyAI/minds14")
dataset = load_dataset("PolyAI/minds14", "en-US", split="train")
dataset[0]["audio"]          # index a row, then a column
```

## Streaming

Set `streaming=True` to use a dataset without downloading it (streams as you iterate; useful for huge datasets, e.g. fineweb is 45TB). Returns an `IterableDataset` (no random access). Parquet datasets support column/filter pushdown.

```py
dataset = load_dataset('HuggingFaceFW/fineweb', split='train', streaming=True)
print(next(iter(dataset)))
dataset = load_dataset('HuggingFaceFW/fineweb', split='train', streaming=True, columns=["url", "date"])
dataset = load_dataset('HuggingFaceFW/fineweb', split='train', streaming=True, filters=[("language_score", ">=", 0.99)])
```
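
Because an `IterableDataset` has no random access, peeking at the first few rows means iterating. The sketch below shows the pattern with `itertools.islice` on a stand-in generator so that it runs offline. `fake_stream` and `head` are hypothetical names; the same `head` call should work on a streamed dataset, since it relies only on iteration.

```python
from itertools import islice

def head(stream, n):
    """Return the first n rows of any iterable without consuming the rest."""
    return list(islice(stream, n))

def fake_stream():
    """Stand-in for a streamed dataset: yields rows lazily, one at a time."""
    i = 0
    while True:  # endless on purpose; only the requested rows are produced
        yield {"url": f"https://example.com/{i}", "id": i}
        i += 1

rows = head(fake_stream(), 3)
print([row["id"] for row in rows])
```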

## Tokenizing (preprocess)

Use `map` with `batched=True` to tokenize the whole dataset, then `set_format` for your framework.

```py
from transformers import AutoTokenizer
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
dataset = load_dataset("cornell-movie-review-data/rotten_tomatoes", split="train")

def tokenization(example):
    return tokenizer(example["text"])

dataset = dataset.map(tokenization, batched=True)
dataset.set_format(type="torch", columns=["input_ids", "token_type_ids", "attention_mask", "label"])
```
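
With `batched=True`, the mapped function receives a dict of column lists rather than a single row, and returns a dict of lists of the same length. The toy below mimics that contract in plain Python to make the shapes visible. `toy_map` and the whitespace "tokenizer" are stand-ins invented for this example, not the `datasets` implementation.

```python
def toy_map(columns, fn, batch_size=2):
    """Apply fn to slices of a column-oriented table and merge the new columns."""
    n_rows = len(next(iter(columns.values())))
    out = {key: [] for key in columns}
    for start in range(0, n_rows, batch_size):
        batch = {k: v[start:start + batch_size] for k, v in columns.items()}
        result = fn(batch)  # fn sees lists, one entry per row in the batch
        for key, values in {**batch, **result}.items():
            out.setdefault(key, []).extend(values)
    return out

def tokenization(batch):
    # Whitespace split stands in for a real tokenizer call.
    return {"tokens": [text.split() for text in batch["text"]]}

table = {"text": ["a fine film", "dull", "worth a watch"], "label": [1, 0, 1]}
mapped = toy_map(table, tokenization)
print(mapped["tokens"])
```

The original columns are kept and the returned columns are added alongside them, which is why `set_format` above can select both `label` and the tokenizer outputs.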

Related: [[concepts/transformers-basics]], [[concepts/finding-and-downloading-models]], [[concepts/the-hub-and-repositories]].


<!-- ===== huggingface/wiki/concepts/finding-and-downloading-models.md ===== -->

---
title: "Finding and Downloading Models"
type: concept
tags: [download, search, cache, hf-hub-download, snapshot-download]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-hub-models-downloading-md.md, raw/github_doc-docs-hub-search-md.md, raw/github_doc-docs-source-en-guides-download-md.md, raw/github_doc-docs-source-en-guides-search-md.md, raw/github_doc-docs-hub-local-cache-md.md, raw/github_doc-docs-source-en-guides-manage-cache-md.md]
---

# Finding and Downloading Models

## Searching the Hub

Full-text search (model cards, dataset cards, Spaces `app.py`) is at [huggingface.co/search](https://huggingface.co/search). Filter by models/datasets/spaces; filters are encoded in the URL (e.g. `?q=llama&type=space`).

Programmatically:

```python
from huggingface_hub import HfApi
api = HfApi()
models = api.list_models(filter=["image-classification", "pytorch", "imagenet"])
models = api.list_models(num_parameters="min:6B,max:128B")
```

CLI:

```bash
hf models ls --search "llama" --sort downloads --limit 5
hf datasets ls --author Qwen
hf spaces ls --search "3d"
hf models info Lightricks/LTX-2
```

## Downloading with the CLI

```bash
hf download gpt2 config.json                    # single file
hf download HuggingFaceH4/zephyr-7b-beta        # entire repo
hf download gpt2 config.json model.safetensors  # multiple files
hf download stabilityai/stable-diffusion-xl-base-1.0 --include "*.safetensors" --exclude "*.fp16.*"
hf download bigcode/the-stack --repo-type dataset --revision v1.1
hf download adept/fuyu-8b --local-dir fuyu      # to a folder instead of cache
hf download gpt2 config.json --token=hf_****    # private/gated
```

The last printed line is always the local path. Use `--dry-run` to preview a download. An `hf://` URI shorthand is also accepted: `hf download hf://datasets/bigcode/the-stack@v1.1`.

## Downloading in Python

```python
from huggingface_hub import hf_hub_download, snapshot_download

# single file -> returns local cached path
hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
hf_hub_download(repo_id="google/fleurs", filename="fleurs.py", repo_type="dataset")

# entire repo (concurrent), with filtering
snapshot_download(repo_id="gpt2", allow_patterns=["*.md", "*.json"], ignore_patterns="vocab.json")
```

`revision` accepts a branch, tag, `refs/pr/N`, or full commit hash. `snapshot_download` uses `hf_hub_download` internally, so files are cached. Integrated libraries (Transformers, etc.) download automatically via `from_pretrained`.

## The local cache

Default directory `~/.cache/huggingface/hub`. Override with `HF_HUB_CACHE` (direct path, takes priority) or `HF_HOME` (cache lives at `$HF_HOME/hub`). Each repo gets a flat folder `{type}s--{repo_id_with_/_as_--}` (e.g. `models--julien-c--EsperBERTo-small`). Inside:

- `blobs/` — file contents, content-addressed (SHA-1 for git, SHA-256 for LFS); identical files stored once.
- `snapshots/<commit>/` — per-revision views; files are **symlinks** into `../../blobs/{hash}`.
- `refs/` — maps branch/tag names to commit hashes.
- `.no_exist/` — empty markers for files known not to exist.

Do not modify cached files (corrupts the cache). On Windows without symlink support the cache runs degraded (file copies in `snapshots/`, no `blobs/`); enable Developer Mode or run as admin for symlinks.
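
The naming and override rules above can be sketched in a few lines of standard-library Python. The helper names are hypothetical, and the sketch only reproduces the conventions stated in this section; it does not inspect or modify a real cache.

```python
import os
from pathlib import Path

def hub_cache_dir(env=None):
    """Resolve the cache directory: HF_HUB_CACHE wins, then $HF_HOME/hub, then the default."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]) / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"

def repo_folder_name(repo_id, repo_type="model"):
    """Flatten a repo id into its cache folder name, e.g. models--org--name."""
    return f"{repo_type}s--" + repo_id.replace("/", "--")

print(repo_folder_name("julien-c/EsperBERTo-small"))
print(hub_cache_dir({"HF_HOME": "/data/hf"}) / repo_folder_name("gpt2"))
```

Passing a plain dict as `env` makes the precedence easy to check without touching the real environment.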

Faster transfers use **`hf_xet`** (default with `huggingface_hub` ≥ 0.32.0); set `HF_XET_HIGH_PERFORMANCE=1` on high-RAM machines. `hf_transfer` is deprecated.

See [[concepts/authentication-and-tokens]] for gated/private access and [[concepts/uploading-and-sharing]] to push files.


<!-- ===== huggingface/wiki/concepts/huggingface-hub-cli.md ===== -->

---
title: "The hf CLI (huggingface_hub)"
type: concept
tags: [cli, hf, download, upload, cache, authentication]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-source-en-guides-cli-md.md, raw/github_doc-docs-source-en-guides-cli-extensions-md.md]
---

# The hf CLI (huggingface_hub)

The `huggingface_hub` package ships a built-in CLI `hf` (formerly `huggingface-cli`) for logging in, downloading/uploading files, and managing repos and the local cache. Install: `pip install -U "huggingface_hub"` (or run via `uvx hf ...` / `brew install hf`). Verify with `hf --help`; upgrade with `hf update`.

## Authentication

```bash
hf auth login                                            # browser flow (URL + short code)
hf auth login --token $HF_TOKEN --add-to-git-credential
hf auth whoami
hf auth logout
```

See [[concepts/authentication-and-tokens]].

## Download

`hf download` wraps `hf_hub_download` / `snapshot_download` and prints the local path on the last line. Use `--dry-run` to preview, `--token` for private/gated.

```bash
hf download gpt2 config.json                             # single file
hf download HuggingFaceH4/zephyr-7b-beta                 # entire repo
hf download stabilityai/stable-diffusion-xl-base-1.0 --include "*.safetensors" --exclude "*.fp16.*"
hf download bigcode/the-stack --repo-type dataset --revision v1.1
hf download adept/fuyu-8b --local-dir fuyu               # to a folder instead of cache
```

See [[concepts/finding-and-downloading-models]].

## Upload

`hf upload [repo_id] [local_path] [path_in_repo]` wraps `upload_file` / `upload_folder`.

```bash
hf upload my-cool-model . .
hf upload Wauplin/my-cool-model ./models/model.safetensors
hf upload Wauplin/my-cool-dataset ./data /train --repo-type=dataset
hf upload bigcode/the-stack . . --repo-type dataset --revision refs/pr/104   # PR
```

For very large folders use `hf upload-large-folder` (resumable). See [[concepts/uploading-and-sharing]].

## Repos and cache

```bash
hf repos create Wauplin/my-cool-model
hf repos delete Wauplin/my-cool-model
hf cache ls          # inspect local cache
hf cache rm model/gpt2
hf cache prune       # delete unreferenced revisions
```

`hf env` prints machine/setup info for bug reports.

## Extensions

Community commands installed from GitHub repos named `hf-<name>` appear as top-level `hf` commands: `hf extensions install <owner>/hf-<name>`, `hf extensions list`, `hf extensions remove <name>`.

Related: [[entities/huggingface-hub-library]], [[concepts/the-hub-and-repositories]].


<!-- ===== huggingface/wiki/concepts/inference-providers-and-endpoints.md ===== -->

---
title: "Inference Providers and Inference Endpoints"
type: concept
tags: [inference, inference-providers, inference-endpoints, inferenceclient, deployment]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-source-en-guides-inference-md.md, raw/github_doc-docs-source-en-guides-inference-endpoints-md.md, raw/github_doc-docs-hub-models-inference-md.md]
---

# Inference Providers and Inference Endpoints

Two ways to run inference on hosted models, plus a path to local servers, all via a unified `InferenceClient` (`huggingface_hub`). **Inference Providers** is serverless, pay-as-you-go access to many models through third-party partners; **Inference Endpoints** is a dedicated, managed deployment you provision and control.

## Inference Providers (serverless)

"Streamlined, unified access to hundreds of machine learning models, powered by our serverless inference partners." Model pages expose pay-as-you-go inference (free tier) via widgets, the [Inference Playground](https://huggingface.co/playground), and search filtering. The provider formerly "Inference API (serverless)" is now **HF Inference**, one provider among many.

```python
from huggingface_hub import InferenceClient

client = InferenceClient(provider="replicate", api_key="my_replicate_api_key")
image = client.text_to_image("A flying car crossing a futuristic cityscape.", model="black-forest-labs/FLUX.1-schnell")
```

Key rules:

- **You must specify a model**, by its id *on the Hugging Face Hub*, not the provider's id.
- Default `provider="auto"` selects the first available provider, sorted by your order at https://hf.co/settings/inference-providers.
- Supported routed providers (verbatim): Black Forest Labs, Cerebras, Clarifai, Cohere, DeepInfra, fal-ai, Featherless AI, Fireworks AI, Groq, HF Inference, Hyperbolic, Nebius AI Studio, Novita AI, Nscale, NVIDIA, OVHcloud AI Endpoints, Public AI, Replicate, Sambanova, Scaleway, Together, Wavespeed, Zai. Not every provider supports every task.

The `chat_completion` task follows the OpenAI spec; output `ChatCompletionOutput` is OpenAI-compatible, and `InferenceClient.chat.completions.create` aliases `InferenceClient.chat_completion`. Also: `AsyncInferenceClient`, experimental `MCPClient`, and a higher-level `Agent` / `tiny-agents`.
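
A hedged sketch of the OpenAI-style call path through `InferenceClient.chat.completions.create`. The helper names `build_messages` and `ask` and the `max_tokens=128` value are choices made for this example. Calling `ask` needs network access, a valid token, and a model id that an enabled provider actually serves, so only the message-building step runs here.

```python
def build_messages(user_text, system_text=None):
    """Assemble an OpenAI-style message list."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return messages

def ask(user_text, model, api_key=None):
    """Send one chat turn through Inference Providers and return the reply text.

    `model` must be a Hub model id; with api_key=None the stored login is used.
    """
    from huggingface_hub import InferenceClient  # lazy import: optional dependency

    client = InferenceClient(api_key=api_key)  # provider defaults to "auto"
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(user_text),
        max_tokens=128,
    )
    return response.choices[0].message.content

print(build_messages("Hello!", system_text="Be brief."))
```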

### Authentication and billing

- **Routed through Hugging Face** — pass your HF token as `api_key="hf_****"` (or a stored login); bills to your HF account.
- **Direct access to provider** — pass the provider's own key, e.g. `api_key="r8_****"` for Replicate.

Monthly inference credits vary by account type (Free, PRO, Enterprise Hub). Bill an org with `bill_to="<your_org_name>"` (requires Enterprise Hub).

## Inference Endpoints (dedicated)

"A product to easily deploy models to production" — dedicated, fully managed, autoscaling infrastructure on a cloud of your choice. Supports `transformers`, `sentence-transformers`, `diffusers` models from a Hub repo. Manage with `huggingface_hub` (minimum `v0.19.0`):

```python
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-endpoint-name", repository="gpt2", framework="pytorch", task="text-generation",
    accelerator="cpu", vendor="aws", region="us-east-1", type="authenticated",
    instance_size="x2", instance_type="intel-icl",
)
```

CLI equivalent: `hf endpoints deploy ...`, with a curated catalog (`hf endpoints catalog deploy --repo openai/gpt-oss-120b`). Returns an `InferenceEndpoint` dataclass (`name`, `repository`, `status`, `url`); status moves through `pending`/`initializing` to `running`, and `endpoint.wait()` blocks until deployed. Pass a `custom_image` (e.g. TGI) for custom serving. **Lifecycle controls** cut cost: `pause()` (needs explicit `resume()`), `scale_to_zero()` (auto-restarts with a cold start), `update()` (change model, replicas, hardware), `delete()` (non-revertible).

## Connecting the client to an Endpoint or local server

Reuse the *same* `InferenceClient` — only change `model` to the target URL (an `InferenceEndpoint` also exposes `.client` / `.async_client`). It also targets **local endpoints** (llama.cpp, Ollama, vLLM, LiteLLM, TGI, mlx) when the API is OpenAI-compatible. You cannot specify both a URL and a provider — mutually exclusive.

```python
client = InferenceClient(model="https://uu149rez6gw9ehej.eu-west-1.aws.endpoints.huggingface.cloud/deepfloyd-if")
client = InferenceClient(model="http://localhost:8080")  # local OpenAI-compatible server
```

## See also

- [[concepts/authentication-and-tokens]], [[concepts/running-llms-with-transformers]], [[syntheses/choosing-local-vs-inference]], [[summaries/hub-and-libraries-catalog]]


<!-- ===== huggingface/wiki/concepts/model-cards.md ===== -->

---
title: "Model Cards"
type: concept
tags: [model-cards, readme, metadata, yaml, eval-results]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-hub-model-cards-md.md, raw/github_doc-docs-hub-model-cards-components-md.md, raw/github_doc-docs-source-en-guides-model-cards-md.md, raw/github_doc-docs-hub-eval-results-md.md]
---

# Model Cards

A **Model Card** is the `README.md` in a model repo: Markdown with a YAML metadata block at the top, essential for discoverability, reproducibility, and sharing. A good card describes the model, intended uses and limitations (biases, ethics), training params, datasets, and eval results. Add or edit it by uploading `README.md`, clicking **Edit model card** (opens a metadata UI), or via the `huggingface_hub` library.

## The YAML metadata block

Open/close with three dashes (`---`) at the top of `README.md`. Common keys: `language:`, `tags:`, `license:` ("any valid license identifier"), `datasets:` (e.g. `stanfordnlp/imdb`), `base_model:`. Metadata powers filtering at [huggingface.co/models](https://huggingface.co/models) and links datasets/base models on the model page.
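
To make the shape of the block concrete, here is a small sketch that separates the metadata from the Markdown body by locating the two `---` delimiters. `split_front_matter` is a hypothetical helper; it returns the raw YAML text rather than parsing it, so a YAML parser would still be needed to read individual keys.

```python
def split_front_matter(readme_text):
    """Split README text into (yaml_block, body); yaml_block is None if absent."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, readme_text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":  # closing delimiter found
            yaml_block = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return yaml_block, body
    return None, readme_text  # opening delimiter was never closed

card = """---
license: mit
tags:
  - text-classification
---

# My model
"""
meta, body = split_front_matter(card)
print(meta)
print(body)
```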

## Key metadata fields

- **`library_name`** — declare the supported library (recommended for non-`transformers` models, e.g. `library_name: flair`). Since August 2024, `config.json` alone no longer implies `transformers`; set `library_name: transformers` explicitly.
- **`pipeline_tag`** — the task (e.g. `text-to-image`); drives the widget and inference APIs. Auto-inferred for `transformers` from `config.json`.
- **`base_model`** — single ID or a list (merges). The Hub infers the relation (`adapter`, `merge`, `quantized`, `finetune`); override with `base_model_relation:`.
- **`license`** — common licenses via the UI dropdown; for custom, set `license: other` plus `license_name:` and `license_link:`.
- **`datasets`**, **`new_version`**, **`tags`** (custom tags allowed, e.g. `not-for-all-audiences`).
- A link to a Paper page or arXiv adds an `arxiv:<ID>` tag automatically.

## Evaluation results

Two formats. **Legacy** `model-index` (Papers-with-code-based), embedded in card metadata:

```yaml
model-index:
  - name: Yi-34B
    results:
      - task: {type: text-generation}
        dataset: {name: ai2_arc, type: ai2_arc}
        metrics:
          - name: AI2 Reasoning Challenge (25-Shot)
            type: AI2 Reasoning Challenge (25-Shot)
            value: 64.59
        source:
          name: Open LLM Leaderboard
          url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
```

**New** format: YAML files in the repo's `.eval_results/` folder, aggregated into benchmark-dataset leaderboards. Minimal required form:

```yaml
- dataset:
    id: Idavidrein/gpqa     # Hub dataset ID (must be a Benchmark)
    task_id: gpqa_diamond
  value: 0.412
```

Optional fields include `verifyToken`, `date`, `source`, `notes`. Badges: `verified` (valid `verifyToken`, ran in HF Jobs with inspect-ai), `community` (submitted via open PR), `leaderboard`, `source`.

## Card components

Inject special components into the Markdown, e.g. `<Gallery />` to showcase generated images/videos (driven by `widget:` metadata with `text`/`output.url` pairs). Cards also support KaTeX LaTeX (`$$ ... $$` for display mode) and theme-specific images via `#hf-light-mode-only` / `#hf-dark-mode-only`.

See [[concepts/uploading-and-sharing]] and [[concepts/the-hub-and-repositories]].


<!-- ===== huggingface/wiki/concepts/running-llms-with-transformers.md ===== -->

---
title: "Running LLMs with Transformers"
type: concept
tags: [llm, generate, chat-template, text-generation, quantization]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-source-en-llm-tutorial-md.md, raw/github_doc-docs-source-en-conversations-md.md]
---

# Running LLMs with Transformers

The `generate()` API handles text generation for all generative models.

## generate()

Load a causal LM, tokenize (LLMs need `padding_side="left"`), call `generate()`, then decode.

```py
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")

model_inputs = tokenizer(["A list of colors: red, blue"], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# "A list of colors: red, blue, green, yellow, orange, purple, pink,"
```

`generate()` returns up to 20 tokens by default — set `max_new_tokens` for length. Decoder-only models return the prompt plus generated tokens. Common options (e.g. `model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)`):

| Option | Notes |
|---|---|
| `max_new_tokens` | Max generation length; always set it. |
| `do_sample` | `True` samples, `False` (default) is greedy. |
| `temperature` | Higher = more random; requires `do_sample=True`. |
| `num_beams` | `>1` enables beam search. |
| `repetition_penalty` | `>1.0` reduces repetition. |

All settings live in `GenerationConfig`; defaults come from the model's `generation_config.json`.
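
The effect of `temperature` can be seen on a toy distribution: logits are divided by the temperature before the softmax, so values above 1 flatten the distribution and values below 1 sharpen it. The sketch is plain Python with made-up logits; it illustrates the standard temperature-scaling formula and is not the library's sampling code.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

The top token's probability shrinks as the temperature rises, which is why higher values produce more varied samples when `do_sample=True`.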

## Chat models and templates

Chat ("instruct") models take a list of `{"role", "content"}` messages. Format with the tokenizer's chat template (`tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")`) before `generate()` — a raw string gives suboptimal output.

### TextGenerationPipeline (chat mode)

The pipeline enters chat mode automatically for conversational models, taking the same message list directly.

```py
from transformers import pipeline

pipeline = pipeline(task="text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct", dtype="auto", device_map="auto")
chat = [
    {"role": "system", "content": "You are a helpful science assistant."},
    {"role": "user", "content": "Hey, can you explain gravity to me?"},
]
response = pipeline(chat, max_new_tokens=512)
print(response[0]["generated_text"][-1]["content"])
```

Continue by appending the model's reply (`response[0]["generated_text"]` holds the full history) plus a new `user` message.
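
A minimal sketch of that continuation step, assuming the response has the shape described above. The `response` literal is a hand-written stand-in for real pipeline output, and `next_turn` is a hypothetical helper, not part of Transformers.

```python
def next_turn(response, user_text):
    """Build the message list for the next call from the previous response."""
    # The full history, including the model's latest reply.
    history = list(response[0]["generated_text"])
    history.append({"role": "user", "content": user_text})
    return history

# Stand-in for what the pipeline returned on the first call.
response = [{
    "generated_text": [
        {"role": "system", "content": "You are a helpful science assistant."},
        {"role": "user", "content": "Hey, can you explain gravity to me?"},
        {"role": "assistant", "content": "Gravity is the attraction between masses."},
    ]
}]

chat = next_turn(response, "Does it work the same way on the Moon?")
# `chat` can now be passed to the pipeline again, e.g. pipeline(chat, max_new_tokens=512)
```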

## Optimization

Models load in `float32` by default (~32GB for an 8B model). Reduce memory with:

- `dtype="auto"` — usually `bfloat16` where supported.
- `device_map="auto"` — Big Model Inference dispatch across devices.
- Quantization via bitsandbytes (`pip install -U transformers bitsandbytes`):

```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map="auto", quantization_config=quantization_config)
```
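
The memory figures above follow from simple arithmetic. The sketch below estimates the size of the weights alone, assuming 4 bytes per parameter for `float32`, 2 for `bfloat16`, and about 0.5 for 4-bit quantization. It ignores activations, the KV cache, and quantization metadata, so treat the results as lower bounds rather than measurements. `weight_memory_gb` is a hypothetical helper.

```python
# Approximate storage cost per parameter, in bytes (assumed values).
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "4bit": 0.5}

def weight_memory_gb(num_params, dtype):
    """Estimate weight memory in GB (1 GB = 1e9 bytes); weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

params_8b = 8e9
fp32_gb = weight_memory_gb(params_8b, "float32")   # matches the ~32GB figure above
bf16_gb = weight_memory_gb(params_8b, "bfloat16")
int4_gb = weight_memory_gb(params_8b, "4bit")
```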

Foundations: [[concepts/transformers-basics]]. Library overview: [[entities/transformers-library]]. Hosted alternative: [[concepts/inference-providers-and-endpoints]], [[syntheses/choosing-local-vs-inference]].


<!-- ===== huggingface/wiki/concepts/spaces.md ===== -->

---
title: "Spaces"
type: concept
tags: [spaces, gradio, docker, gpu, deployment, demos]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-hub-spaces-md.md, raw/github_doc-docs-hub-spaces-overview-md.md, raw/github_doc-docs-hub-spaces-config-reference-md.md, raw/github_doc-docs-hub-spaces-sdks-gradio-md.md, raw/github_doc-docs-hub-spaces-sdks-docker-md.md, raw/github_doc-docs-hub-spaces-gpus-md.md]
---

# Spaces

[Hugging Face Spaces](https://huggingface.co/spaces) "offer a simple way to host ML demo apps directly on your profile or your organization's profile." Each Space is a **git repository** (using `git` and `git-xet`) — every pushed commit triggers an automatic rebuild and restart.

## SDKs

When creating a Space you choose an **SDK**: **Gradio**, **Docker**, or **static HTML** (Streamlit and custom Python also supported). The choice sets the `sdk` property in the README YAML block.

- **Gradio** — sets `sdk: gradio`, uses the latest Gradio version (override with `sdk_version`). Dependencies in `requirements.txt`; app in `app.py` (build a `gr.Interface` and call `.launch()`).
- **Docker** — custom containers (FastAPI, Go, etc.). Set `sdk: docker` and write a `Dockerfile`. Default exposed port is `7860`, changeable via `app_port`. The container runs as user ID 1000, so create a user and set `WORKDIR` before `COPY`. Disk is lost on restart — attach a Storage Bucket to persist.
- **Static HTML** — simple HTML/CSS/JavaScript pages.

## The README.md YAML config block

A Space is configured through the `YAML` block at the top of the root **README.md**. Selected keys (verbatim):

```yaml
---
title: Basic Docker SDK Space
emoji: 🐳
colorFrom: purple
colorTo: gray
sdk: docker            # gradio | docker | static
sdk_version: ...       # Gradio version
python_version: "3.10" # defaults to 3.10
app_file: app.py       # main app file (gradio python or static html)
app_port: 7860         # docker only; default 7860
app_build_command: npm run build  # static only
pinned: false
models:
  - reach-vb/musicgen-large-fp16-endpoint
datasets:
  - mozilla-foundation/common_voice_13_0
---
```

Other notable keys: `suggested_hardware`, `suggested_storage`, `base_path`, `fullWidth`, `header` (`mini`/`default`), `short_description`, `thumbnail`, `tags`, `hf_oauth` (+ `hf_oauth_scopes`, `hf_oauth_expiration_minutes`, `hf_oauth_authorized_org`), `disable_embedding`, `startup_duration_timeout`, `custom_headers`, and `preload_from_hub`. The `models`/`datasets` lists are parsed automatically from your code if not specified.

## Hardware tiers

By default each Space gets 16GB RAM, 2 CPU cores, and 50GB of (non-persistent) disk at no cost (the CPU Basic tier); upgrade from the Settings tab. Hardware flavors (`suggested_hardware` values):

- CPU: `"cpu-basic"` (FREE), `"cpu-upgrade"` ($0.03/hr)
- GPU: `"t4-small"` ($0.40), `"t4-medium"` ($0.60), `"l4x1"` ($0.80), `"l4x4"` ($3.80), `"l40sx1"` ($1.80), `"l40sx4"` ($8.30), `"l40sx8"` ($23.50), `"a10g-small"` ($1.00), `"a10g-large"` ($1.50), `"a10g-largex2"` ($3.00), `"a10g-largex4"` ($5.00), `"a100-large"` ($2.50), `"a100x4"` ($10.00), `"a100x8"` ($20.00)

(H100 flavors were removed December 2025.) Billing is per minute, only while `Starting` or `Running` — no cost during build. For dynamic GPU on demand, see **ZeroGPU**.
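
The per-minute billing rule can be sketched as a small calculation. The hourly rates are copied from the list above; the state name `Building` and the helper `estimate_cost` are assumptions for illustration, and actual invoices may round differently.

```python
# Hourly rates in USD, taken from the hardware list above.
HOURLY_RATE_USD = {"cpu-basic": 0.0, "cpu-upgrade": 0.03, "t4-small": 0.40, "a10g-small": 1.00}

# Only these states are billed; build time is free.
BILLABLE_STATES = {"Starting", "Running"}

def estimate_cost(flavor, minutes_by_state):
    """Estimate the cost of a Space from minutes spent in each state."""
    billable_minutes = sum(
        minutes for state, minutes in minutes_by_state.items()
        if state in BILLABLE_STATES
    )
    return round(HOURLY_RATE_USD[flavor] / 60 * billable_minutes, 4)

# 10 minutes building (free), then 2 minutes starting and 88 minutes running.
usage = {"Building": 10, "Starting": 2, "Running": 88}
t4_cost = estimate_cost("t4-small", usage)
```

Here 90 billable minutes on `t4-small` come to 1.5 hours at $0.40, and the 10 build minutes add nothing.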

## Visibility, secrets, and lifecycle

Three visibility levels: **public** (source + app open, clonable), **protected** (source private, app via embed URL — PRO/Team & Enterprise), **private** (owner-only; `404` to others).

Never hard-code secrets. In Settings add **Variables** (non-sensitive, public, copied to duplicated Spaces) or **Secrets** (private, not readable after set). Most SDKs expose both as env vars (`os.getenv('MODEL_REPO_ID')`); static Spaces read from `window.huggingface.variables`. Built-in env vars include `SPACE_ID`, `SPACE_HOST`, `SPACE_AUTHOR_NAME`, `ACCELERATOR`.
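
A minimal sketch of reading Variables and Secrets from Python, assuming they are exposed as environment variables as described above. `get_setting` is a hypothetical helper and the fallback values are placeholders; only `MODEL_REPO_ID` and `SPACE_ID` come from the text.

```python
import os

def get_setting(name, default=None, required=False):
    """Read a Space Variable or Secret from the environment."""
    value = os.getenv(name, default)
    if required and value is None:
        # Fail early with a clear message instead of crashing later.
        raise RuntimeError(f"Missing required setting: {name}")
    return value

# Falls back to a placeholder when running outside a configured Space.
model_repo_id = get_setting("MODEL_REPO_ID", default="username/my-model")
space_id = get_setting("SPACE_ID", default="(not running in a Space)")
```

Marking a secret as `required=True` makes a misconfigured Space fail at startup rather than on the first request that needs the value.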

On free hardware a Space sleeps after inactivity (currently 48 hours); upgraded Spaces run indefinitely (custom sleep time allowed). Network requests are limited to ports 80, 443, and 8080. Scale paid Spaces with multiple **replicas** (each billed independently). **Duplicating** templates a new demo with its own hardware.

## See also

- [[concepts/the-hub-and-repositories]], [[concepts/inference-providers-and-endpoints]], [[syntheses/choosing-local-vs-inference]], [[summaries/hub-and-libraries-catalog]]


<!-- ===== huggingface/wiki/concepts/the-hub-and-repositories.md ===== -->

---
title: "The Hub and Repositories"
type: concept
tags: [repositories, git, xet, pull-requests, branches]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-hub-repositories-md.md, raw/github_doc-docs-hub-repositories-getting-started-md.md, raw/github_doc-docs-hub-repositories-next-steps-md.md, raw/github_doc-docs-hub-repositories-pull-requests-discussions-md.md, raw/github_doc-docs-source-en-guides-repository-md.md]
---

# The Hub and Repositories

Models, Spaces, and Datasets are hosted as **Git repositories**. Three repo types: **model** (default), **dataset**, **space**. Optimized for large ML files via **Xet** (chunk-level deduplication, faster transfers).

## Creating a repository

CLI (`hf` from `huggingface_hub`):

```bash
hf repos create lysandre/test-model
hf repos create lysandre/test-dataset --repo-type dataset
hf repos create my-cool-dataset --repo-type dataset --private
hf repos create my-gradio-space --repo-type space --space-sdk gradio
```

Python:

```python
from huggingface_hub import create_repo
create_repo("lysandre/test-model")                          # model by default
create_repo("lysandre/test-dataset", repo_type="dataset")
create_repo("lysandre/test-private", visibility="private")
```

Delete with `delete_repo(repo_id=..., repo_type=...)` / `hf repos delete`. `repo_id` is `namespace/repo_name`. Also creatable via the web UI at [huggingface.co/new](http://huggingface.co/new).

## Cloning, committing, pushing (git)

```bash
git clone https://huggingface.co/<username>/<model-name>
# datasets are namespaced under /datasets/
git clone https://huggingface.co/datasets/<username>/<dataset-name>
# over SSH (add your key at https://huggingface.co/settings/keys)
git clone git@hf.co:<username>/<model-name>
```

Track large files (>10MB) with `git-xet` (`git xet install`), then `git add` / `commit` / `push` like any git repo. `.gitattributes` is auto-populated with common large-file extensions; add more with `git xet track "*.your_extension"`. Every commit is tracked, diffs viewable in the UI. To authenticate pushes, run `hf auth login` — see [[concepts/authentication-and-tokens]].

## Branches and tags

```python
from huggingface_hub import create_branch, create_tag, list_repo_refs
create_branch("Matthijs/speecht5-tts-demo", repo_type="space", branch="handle-dog-speaker")
create_tag("bigcode/the-stack", repo_type="dataset", revision="v0.1-release", tag="v0.1.1")
list_repo_refs("bigcode/the-stack", repo_type="dataset")
```

## Pull requests and discussions

PRs and discussions work the same for all repo types. **No forks are involved**: contributors push to a special `ref` branch (e.g. `refs/pr/42`) directly on the source repo. Manage a PR locally:

```bash
git fetch origin refs/pr/42:pr/42
git checkout pr/42
# make changes, then:
git push origin pr/42:refs/pr/42
```

Comments support Markdown and LaTeX (KaTeX, delimiters `$$ ... $$` for display mode).

## Other operations

- Rename/move: `move_repo(from_id=..., to_id=...)` / `hf repos move`. You can move a repo to an organization but not to another user.
- Visibility: `update_repo_settings(repo_id=..., private=True)` / `hf repos settings <id> --private true`.
- Duplicate: `duplicate_repo(...)` (preserves full git history for Spaces; squashed for models/datasets).
- Copy files server-side: `copy_files(...)` / `hf cp`.

See also [[concepts/finding-and-downloading-models]] and [[concepts/uploading-and-sharing]].


<!-- ===== huggingface/wiki/concepts/transformers-basics.md ===== -->

---
title: "Transformers Basics: pipeline() and Auto classes"
type: concept
tags: [transformers, pipeline, autoclass, from_pretrained, inference]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-source-en-quicktour-md.md, raw/github_doc-docs-source-en-pipeline-tutorial-md.md]
---

# Transformers Basics: pipeline() and Auto classes

Transformers exposes three classes for instantiating a model (config, model, preprocessor) and two inference APIs: the high-level `Pipeline`, or the lower-level `AutoModel` + `AutoTokenizer` pair.

## pipeline()

`pipeline()` is the most convenient way to run inference. Pick a task; a default model is downloaded and cached, or pass `model=`. It supports many tasks across modalities — pass an appropriate input (text, an audio/image URL or path) and it handles the rest.

```py
from transformers import pipeline

generator = pipeline(task="text-generation", model="google/gemma-2-2b")
generator("the secret to baking a really good cake is ")
# [{'generated_text': 'the secret to baking a really good cake is 1. the right ingredients 2. the'}]

# use a separate name per pipeline so the imported `pipeline()` factory is not shadowed
transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-large-v3")
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
# other tasks: e.g. task="image-classification", model="google/vit-base-patch16-224"
```

Task-specific pipelines like `TextGenerationPipeline` load via the `task` identifier. Pass a list for batched inputs. Default device is CPU (`device=-1`); select hardware with `device` (`device=0` first GPU, `device="mps"` Apple silicon) or let Accelerate place weights with `device_map="auto"`. `batch_size=N` enables batch inference (off by default).

## Auto classes + from_pretrained

`AutoModel*` / `AutoTokenizer` infer the architecture from the model name; `from_pretrained` loads weights and config from the Hub. `device_map="auto"` allocates weights to the fastest device first; `dtype="auto"` uses the stored dtype (PyTorch defaults to `torch.float32`).

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# tokenize -> generate -> decode
model_inputs = tokenizer(["The secret to baking a good cake is "], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_length=30)
tokenizer.batch_decode(generated_ids)[0]
```

For LLM text generation, chat templates, and optimization, see [[concepts/running-llms-with-transformers]]. For the library overview and `Trainer`, see [[entities/transformers-library]].


<!-- ===== huggingface/wiki/concepts/uploading-and-sharing.md ===== -->

---
title: "Uploading and Sharing"
type: concept
tags: [upload, push-to-hub, upload-folder, sharing, repositories]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-hub-models-uploading-md.md, raw/github_doc-docs-source-en-guides-upload-md.md, raw/github_doc-docs-source-en-model-sharing-md.md]
---

# Uploading and Sharing

Upload to a personal namespace or an organization. Always [[concepts/authentication-and-tokens|log in]] first (`hf auth login`).

## Upload with the CLI

```bash
# Usage:  hf upload [repo_id] [local_path] [path_in_repo]
hf upload Wauplin/my-cool-model ./models/model.safetensors model.safetensors
hf upload my-cool-model . .                      # upload current dir to repo root
hf upload Wauplin/my-cool-dataset ./data /train --repo-type=dataset
hf upload bigcode/the-stack . . --repo-type dataset --revision refs/pr/104  # to a PR
hf upload Wauplin/my-cool-model ./models . --commit-message="Epoch 34/50"
```

If the repo doesn't exist, it is created automatically. Use `--include`/`--exclude` to filter, `--delete` to remove remote files, `--create-pr` to open a PR, and `--every=10` to push every 10 minutes. For hundreds of GB / TB, use the resumable `hf upload-large-folder`.

## Upload in Python

```python
from huggingface_hub import HfApi
api = HfApi()
api.upload_file(path_or_fileobj="./README.md", path_in_repo="README.md", repo_id="username/test-dataset", repo_type="dataset")
api.upload_folder(folder_path="./local/space", repo_id="username/my-cool-space", repo_type="space")
```

`upload_folder` respects a root-level `.gitignore`, or use `allow_patterns` / `ignore_patterns` / `delete_patterns`. With `hf_xet` (default), uploads are streamed, deduplicated, and **resumable** — re-run to skip already-committed files. For very large folders use `upload_large_folder` (must set `repo_type` explicitly). Set `HF_XET_HIGH_PERFORMANCE=1` for max throughput.
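
To show how `allow_patterns` and `ignore_patterns` interact, here is an approximate sketch using glob-style matching from the standard library. `filter_paths` is a hypothetical helper written for this page, not the library's own filtering code, and the file names are invented.

```python
from fnmatch import fnmatch

def filter_paths(paths, allow_patterns=None, ignore_patterns=None):
    """Keep paths that match an allow pattern (if given) and no ignore pattern."""
    kept = []
    for path in paths:
        if allow_patterns is not None and not any(fnmatch(path, p) for p in allow_patterns):
            continue  # not explicitly allowed
        if ignore_patterns is not None and any(fnmatch(path, p) for p in ignore_patterns):
            continue  # explicitly ignored
        kept.append(path)
    return kept

files = ["model.safetensors", "README.md", "logs/run1.txt", "checkpoints/step1.bin"]

without_logs = filter_paths(files, ignore_patterns=["logs/*", "*.bin"])
only_docs = filter_paths(files, allow_patterns=["*.md"])
```

In this sketch an ignore pattern wins over an allow pattern when both match the same path.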

## push_to_hub (Transformers and custom models)

Libraries with Hub integration expose `push_to_hub` / `from_pretrained`:

```python
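from transformers import BertConfig, BertModel
model = BertModel(BertConfig())                                    # randomly initialized model
model.push_to_hub("nielsr/my-awesome-bert-model")                  # upload weights and config
model = BertModel.from_pretrained("nielsr/my-awesome-bert-model")  # reload from the Hub
# `tokenizer` is assumed to be a tokenizer you loaded or trained earlier
tokenizer.push_to_hub("my-awesome-model")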
```

The `Trainer` pushes after training with `TrainingArguments(output_dir=..., push_to_hub=True)` then `trainer.push_to_hub()`, auto-adding hyperparameters and results to the model card.

For a custom PyTorch model, inherit `PyTorchModelHubMixin` to add `from_pretrained` / `push_to_hub` (and download metrics) to any `nn.Module`:

```python
from huggingface_hub import PyTorchModelHubMixin
import torch.nn as nn

class MyModel(nn.Module, PyTorchModelHubMixin, pipeline_tag="text-to-image", license="mit"):
    ...

model = MyModel()  # instantiate first; pass your own constructor arguments here
model.save_pretrained("my-awesome-model")
model.push_to_hub("your-hf-username/my-awesome-model")
```

Constructor (`__init__`) arguments are serialized to `config.json` (used for download counts). Models need **not** be Transformers/Diffusers-compatible to get download metrics.

After uploading, add a [[concepts/model-cards|Model Card]] (a `README.md`). You can also upload via the web UI ([huggingface.co/new](http://huggingface.co/new) → Files → Add file) or plain git — see [[concepts/the-hub-and-repositories]].


<!-- ===== huggingface/wiki/concepts/what-is-hugging-face.md ===== -->

---
title: "What is Hugging Face?"
type: concept
tags: [huggingface, hub, models, datasets, spaces, ecosystem]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-hub-index-md.md, raw/github_doc-docs-hub-models-the-hub-md.md, raw/github_doc-docs-hub-datasets-overview-md.md, raw/github_doc-docs-hub-spaces-overview-md.md]
---

# What is Hugging Face?

The **Hugging Face Hub** is the reference AI platform for open ML, hosting over 2M models, 1.5M datasets, and 1.5M AI apps (Spaces), all open and publicly available, and doubling as a collaboration platform for private teams.

The Hub hosts **Git-based repositories** — version-controlled folders of any files — with commit history, diffs, branches, and over a dozen library integrations. All build on **Xet**, a storage backend that splits files into unique chunks to accelerate transfers. (For non-versioned, mutable object storage, see Storage Buckets.) See [[concepts/the-hub-and-repositories]].

## The three repository types

- **Models**: state-of-the-art checkpoints for LLM, text, vision, and audio. Each repo has a [[concepts/model-cards|Model Card]] documenting tasks, languages, and eval results.
- **Datasets**: data across NLP, Computer Vision, and Audio, each a Git repo, with Dataset Cards and Data Studio (in-browser viewer). See [[concepts/datasets-basics]].
- **Spaces**: interactive ML demo apps (Gradio, Docker, static HTML SDKs); a git repo that rebuilds on each push. See [[concepts/spaces]].

## The library ecosystem

- **`huggingface_hub`** — client library + `hf` CLI for download/upload/manage. See [[entities/huggingface-hub-library]], [[concepts/huggingface-hub-cli]].
- **`transformers`** — load/fine-tune models with `from_pretrained` / `push_to_hub`. See [[entities/transformers-library]], [[concepts/transformers-basics]].
- **`datasets`** — access and stream datasets programmatically.
- **`diffusers`** — diffusion models for image/audio generation.

For production, serve models via **Inference Providers** (serverless) or **Inference Endpoints**: [[concepts/inference-providers-and-endpoints]]. See also [[concepts/authentication-and-tokens]], [[concepts/finding-and-downloading-models]], [[concepts/uploading-and-sharing]], [[summaries/hub-and-libraries-catalog]].


<!-- ===== huggingface/wiki/entities/huggingface-hub-library.md ===== -->

---
title: "huggingface_hub (Python library)"
type: entity
tags: [huggingface_hub, hfapi, hffilesystem, download, upload]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-source-en-index-md.md, raw/github_doc-docs-source-en-quick-start-md.md, raw/github_doc-docs-source-en-installation-md.md, raw/github_doc-docs-source-en-guides-overview-md.md, raw/github_doc-docs-source-en-guides-hf-file-system-md.md]
---

# huggingface_hub (Python library)

`huggingface_hub` is the official Python client for the [Hugging Face Hub](https://hf.co). It creates/manages repositories, downloads/uploads files, fetches model/dataset metadata, and runs inference — all from Python. It also ships the [[concepts/huggingface-hub-cli]] (`hf`). Install: `pip install --upgrade huggingface_hub`; optional extras `pip install 'huggingface_hub[mcp,torch]'`.

## Core API

`HfApi` is the main programmatic interface; `hf_hub_download` fetches and caches a single file (next access loads from cache). Creating repos or uploading requires a `write` token (see [[concepts/authentication-and-tokens]]).

```py
from huggingface_hub import HfApi, hf_hub_download, login
login()  # or `hf auth login`; HF_TOKEN env var / Space secret takes priority over the stored token

hf_hub_download(repo_id="google/pegasus-xsum", filename="config.json", revision="4d33b01d79672f27f001f6abade33f22d993b151")

api = HfApi()
api.create_repo(repo_id="super-cool-model", private=True)
api.upload_file(path_or_fileobj="./README.md", path_in_repo="README.md", repo_id="lysandre/test-model")
```

## HfFileSystem

`HfFileSystem` is a pythonic, fsspec-compatible interface to the Hub (`ls`, `glob`, `cp`, `open`, ...). It builds on `HfApi` but adds overhead — prefer `HfApi` methods where possible. Because it integrates fsspec, libraries like pandas can read/write `hf://` URLs directly:

```py
from huggingface_hub import hffs
hffs.glob("datasets/my-username/my-dataset-repo/**/*.csv")

import pandas as pd
df = pd.read_csv("hf://datasets/my-username/my-dataset-repo/train.csv")
```

URL scheme: `hf://[<repo_type_prefix>]<repo_id>[@<revision>]/<path/in/repo>` (`datasets/` for datasets, `spaces/` for Spaces, none for models).
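
The scheme above can be assembled with a few lines of string formatting. `hf_url` is a hypothetical helper for illustration only, and it assumes the revision needs no escaping (revisions containing `/` are not handled here).

```python
# Prefix per repo type, as listed above: none for models.
REPO_TYPE_PREFIX = {"model": "", "dataset": "datasets/", "space": "spaces/"}

def hf_url(repo_id, path_in_repo, repo_type="model", revision=None):
    """Build an hf:// URL following the scheme described above."""
    revision_part = f"@{revision}" if revision else ""
    return f"hf://{REPO_TYPE_PREFIX[repo_type]}{repo_id}{revision_part}/{path_in_repo}"

csv_url = hf_url("my-username/my-dataset-repo", "train.csv", repo_type="dataset")
pinned_url = hf_url("google/pegasus-xsum", "config.json", revision="main")
```

`csv_url` reproduces the string passed to `pd.read_csv` in the example above.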

Related: [[concepts/huggingface-hub-cli]], [[concepts/finding-and-downloading-models]], [[concepts/uploading-and-sharing]], [[concepts/the-hub-and-repositories]].


<!-- ===== huggingface/wiki/entities/transformers-library.md ===== -->

---
title: "transformers (Python library)"
type: entity
tags: [transformers, pipeline, trainer, autoclass, models]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-source-en-index-md-2.md, raw/github_doc-docs-source-en-installation-md-2.md, raw/github_doc-docs-source-en-quicktour-md.md]
---

# transformers (Python library)

`transformers` is the model-definition framework for state-of-the-art ML across text, vision, audio, video, and multimodal — inference and training. It centralizes model definitions so a supported model works across the ecosystem: training frameworks (Axolotl, Unsloth, DeepSpeed, FSDP, ...), inference engines (vLLM, SGLang, TGI, ...), and adjacent libraries (llama.cpp, mlx, ...). 1M+ checkpoints on the Hub.

## Install

Tested on Python 3.10+ and PyTorch 2.4+. `pip install transformers` (or `uv pip install transformers`). Verify:

```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('hugging face is the best'))"
# [{'label': 'POSITIVE', 'score': 0.9998704791069031}]
```

## Three base classes

Every pretrained model is built from three classes:

- `PreTrainedConfig` — model attributes (attention heads, vocab size, ...).
- `PreTrainedModel` — the architecture (e.g. `LlamaModel` vs `LlamaForCausalLM`).
- Preprocessor — converts raw inputs to tensors (e.g. `PreTrainedTokenizer`, `ImageProcessingMixin`).

The **AutoClass** API (`AutoModelFor*`, `AutoTokenizer`) infers the architecture from the model name and loads it with `from_pretrained`. See [[concepts/transformers-basics]].

## Two APIs: Pipeline and Trainer

**Pipeline** — simple, optimized inference for many tasks (text generation, image segmentation, ASR, document QA, ...). **Trainer** — a complete PyTorch training/eval loop (mixed precision, torch.compile, FlashAttention, distributed). Provide a model, dataset, preprocessor, and data collator; configure with `TrainingArguments`.

```py
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="distilbert-rotten-tomatoes",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=2,
    push_to_hub=True,
)
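# `model` and `dataset` are assumed to be defined earlier
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    # ... remaining arguments elided in the source
)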
trainer.train()
trainer.push_to_hub()
```

For LLM/VLM text generation (`generate`, streaming, decoding strategies) see [[concepts/running-llms-with-transformers]].

Related: [[concepts/transformers-basics]], [[entities/huggingface-hub-library]], [[concepts/datasets-basics]], [[syntheses/choosing-local-vs-inference]].


<!-- ===== huggingface/wiki/log.md ===== -->

---
title: "Activity Log"
type: log
---

# Activity Log

Append-only record of all wiki changes.

## Format

Each entry follows this format:
```
### YYYY-MM-DD HH:MM — [Action Type]
- **Source/Trigger**: what initiated the action
- **Pages created**: list of new pages
- **Pages updated**: list of updated pages
- **Notes**: any contradictions flagged, decisions made
```

---

### 2026-04-08 00:00 — Setup

- **Source/Trigger**: Repository initialized
- **Pages created**: index.md, log.md, dashboard.md, analytics.md, flashcards.md
- **Pages updated**: none
- **Notes**: Empty knowledge base ready for first source ingestion

---

### 2026-06-23 — Initial curation (factory build)

- **Source/Trigger**: `new_wiki.py init huggingface` — 187 sources gathered (huggingface/hub-docs, huggingface_hub library guides, orientation slices of transformers + datasets). A targeted second fetcher was added to recover Hub pages dropped by the alphabetical cap (Spaces, security-tokens, repositories).
- **Pages created**: 16 — 12 concepts (what-is-hugging-face, the-hub-and-repositories, authentication-and-tokens, finding-and-downloading-models, uploading-and-sharing, model-cards, huggingface-hub-cli, transformers-basics, running-llms-with-transformers, datasets-basics, inference-providers-and-endpoints, spaces), 2 entities (huggingface-hub-library, transformers-library), 1 summary (hub-and-libraries-catalog), 1 synthesis (choosing-local-vs-inference)
- **Pages updated**: index.md (master catalog + stats), log.md
- **Notes**: Curated to the medium rung per RECIPE; the duplicated transformers/huggingface_hub/datasets `index`/`installation`/`quicktour` files were disambiguated by `source_url`. The full library API references, per-task model pages, and Hub surfaces (Jobs/Webhooks/Enterprise/Data Studio) are mapped in the catalog, not paged.


<!-- ===== huggingface/wiki/summaries/hub-and-libraries-catalog.md ===== -->

---
title: "Hub Feature & Library Ecosystem Catalog"
type: summary
tags: [hub, catalog, libraries, ecosystem, integrations, enterprise]
updated: 2026-06-23
confidence: high
sources: [raw/github_doc-docs-hub-index-md.md, raw/github_doc-docs-hub-datasets-libraries-md.md, raw/github_doc-docs-hub-models-adding-libraries-md.md, raw/github_doc-docs-source-en-guides-integrations-md.md, raw/github_doc-docs-hub-jobs-md.md, raw/github_doc-docs-hub-collections-md.md, raw/github_doc-docs-hub-enterprise-md.md, raw/github_doc-docs-hub-gguf-md.md, raw/github_doc-docs-hub-billing-md.md, raw/github_doc-docs-hub-local-apps-md.md]
---

# Hub Feature & Library Ecosystem Catalog

A map of the Hub surface and library ecosystem — *what exists and where to look*. The Hub is "the reference AI platform for open ML," hosting over 2M models, 1.5M datasets, and 1.5M AI apps (Spaces), all on **Xet** storage for efficient large-file Git.

## The three core repository types

- **Models** — Model Cards, metadata, inference widgets, serverless inference. See [[concepts/the-hub-and-repositories]], [[concepts/finding-and-downloading-models]], [[concepts/model-cards]].
- **Datasets** — Dataset Cards, Data Studio (in-browser), streaming; 🤗 `datasets` for programmatic access. See [[concepts/datasets-basics]].
- **Spaces** — hosted ML demo apps. See [[concepts/spaces]].

Plus **Storage Buckets**: S3-like, mutable, content-addressable object storage (distinct from git repos), for checkpoints, logs, and large file collections.

## Hub feature surface (where to look)

| Area | Features (source page) |
| --- | --- |
| Repositories | Getting Started, Settings, Storage Limits, **Xet** backend, Local Cache, Pull Requests & Discussions, Notifications, Licenses |
| Models | The Model Hub, Model Cards, Eval Results, Gated Models, Up/Downloading, Libraries, Tasks, Widgets, **Inference Providers**, Download Stats |
| Datasets | Cards, Gated, Uploading, Ingesting, Streaming, Editing, Libraries, **Data Studio**, Data files Configuration |
| **Collections** | Group Models/Datasets/Spaces/Papers on a dedicated page; can also **gate** a group of models/datasets (Team & Enterprise) |
| **Jobs** | Run compute (fine-tune, GPU inference, data processing) via `hf` CLI, `huggingface_hub`, or HTTP API; UV & Docker-like interface; pay-per-second on CPU→A100s & TPUs |
| **Webhooks** | Automation triggers on repo events (listed under Repositories and Jobs) |
| **Enterprise / SSO** | Team, Enterprise, Enterprise Plus plans add SSO, Audit Logs, Storage Regions, Resource Groups, SCIM, Advanced Security, Tokens Management, Network Security |
| **DOI** | Digital Object Identifier minting for repos |
| **GGUF** | Built-in support for the GGUF binary format (tensors + metadata); filter at `library=gguf`, in-browser metadata/tensor viewer, `gguf-my-repo` quantization tool, `@huggingface/gguf` JS parser; many quant types (Q4_K, Q5_K, Q8_0, IQ-series, MXFP4, …) |
| **Local apps** | Run Hub models locally for privacy/speed/control/cost — llama.cpp, Ollama, Jan, LM Studio; enable in Local Apps settings, copy command from "Use this model" |
| Agents | CLI for AI Agents, MCP Server, Agent Skills, HF SDK, Local Agents, Agent Libraries |
| Other | Organizations, **Billing**, Security, Moderation, Paper Pages, Academia Hub, Search, Hub API Endpoints, Sign in with HF (OAuth) |

### Billing in brief

Subscriptions: PRO, Team $20/user/mo, Enterprise from $50/user/mo. Compute (Spaces, Inference Endpoints, Inference Providers, Jobs, ZeroGPU, private storage) is **usage-based pay-as-you-go**, billed via credit card (Stripe), invoiced monthly. Private storage above the included 1TB is **$18/TB/month**.
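
As a rough worked example of the storage rule, the sketch below charges only for usage above the included 1TB. It assumes linear proration for fractional terabytes, which the source does not state, and `private_storage_overage_usd` is a hypothetical helper.

```python
def private_storage_overage_usd(used_tb, included_tb=1.0, rate_per_tb=18.0):
    """Estimate the monthly charge for private storage above the included quota."""
    overage_tb = max(0.0, used_tb - included_tb)  # nothing is owed below the quota
    return overage_tb * rate_per_tb

within_quota = private_storage_overage_usd(0.5)
over_quota = private_storage_overage_usd(3.5)  # 2.5 TB above the included 1 TB
```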

## Library ecosystem

The Hub "supports over a dozen libraries" — users download/upload directly from each.

**Core ML / training libraries**: 🤗 **Transformers** ([[entities/transformers-library]]), **datasets**, **diffusers**, **sentence-transformers**, **timm**, **spaCy**, **OpenCLIP**, **Asteroid**, **ESPnet**, **fastai**, **Keras/KerasNLP**, plus **TRL**, **Axolotl**, **LlamaFactory**, **Unsloth** for fine-tuning. The glue is [[entities/huggingface-hub-library]] (`huggingface_hub`): `hf_hub_download`, `snapshot_download`, `create_repo`, `upload_file`, `upload_folder`, and the `ModelHubMixin` base class (`PyTorchModelHubMixin`) for adding `from_pretrained` / `push_to_hub` to any model class.

**Data libraries** (download/stream/push, optimized Parquet): Argilla, Daft, Dask, Data Designer, Datasets, Distilabel, DuckDB, Embedding Atlas, Fenic, FiftyOne, Lance, Pandas, Polars, PyArrow, Spark, WebDataset. The Hub auto-converts the first 5GB of any dataset to **Parquet** for the Dataset Viewer.

To integrate a new library: add `huggingface_hub` as a dependency, implement download/upload (helpers or `ModelHubMixin`), generate model cards, then **register** it (PR to `huggingface.js` `model-libraries.ts`) for a label, doc links, and snippets on model pages.

## See also

- [[concepts/huggingface-hub-cli]], [[concepts/uploading-and-sharing]], [[concepts/inference-providers-and-endpoints]], [[syntheses/choosing-local-vs-inference]]


<!-- ===== huggingface/wiki/syntheses/choosing-local-vs-inference.md ===== -->

---
title: "Choosing: Local vs Inference Providers vs Endpoints vs Spaces"
type: synthesis
tags: [decision-guide, inference, deployment, local, spaces, cost]
updated: 2026-06-23
confidence: medium
sources: [raw/github_doc-docs-source-en-guides-inference-md.md, raw/github_doc-docs-source-en-guides-inference-endpoints-md.md, raw/github_doc-docs-hub-spaces-overview-md.md, raw/github_doc-docs-hub-models-downloading-md.md]
---

# Choosing: Local vs Inference Providers vs Endpoints vs Spaces

Four broad targets for *running* a Hub model. The `huggingface_hub` library gives a **unified interface** (`InferenceClient`) across most, so switching often means changing only a `model` or `provider` argument. See cited pages for detail.

## The four options at a glance

| Option | What it is | Best for | Cost model |
| --- | --- | --- | --- |
| **Local** | Download model + run in-process (transformers) or a local server (llama.cpp, Ollama, vLLM, TGI) | Privacy, full control, no per-call fee, offline | Your hardware |
| **Inference Providers** | Serverless API to many models via partner providers | Prototyping, trying many models fast, low/variable volume | Pay-as-you-go per request (free tier + credits) |
| **Inference Endpoints** | Dedicated, autoscaling managed deployment on a chosen cloud | Production, steady traffic, predictable latency, private API | Pay for provisioned hardware (pause/scale-to-zero to save) |
| **Spaces** | Hosted ML app/demo (Gradio/Docker/static) | Showcasing, portfolios, interactive demos, stakeholders | Free CPU tier; per-minute paid hardware |

## When to run locally

Local gives **Privacy**, **Speed**, **Control**, and **Cost** (no API fee). Download via CLI (`hf download HuggingFaceH4/zephyr-7b-beta`), `hf_hub_download`, `snapshot_download`, Git+git-xet, or lazy-mount with `hf-mount`. Files cache under `~/.cache/huggingface/hub`; load with `from_pretrained`. For local *serving*, point `InferenceClient` at any OpenAI-compatible server (llama.cpp, vLLM, LiteLLM, TGI, mlx), e.g. `InferenceClient(model="http://localhost:8080")`; GGUF quantizations make this practical on modest hardware. Choose local when data sensitivity, offline use, or avoiding per-call billing matters and you have the hardware. See [[concepts/finding-and-downloading-models]], [[concepts/running-llms-with-transformers]], [[summaries/hub-and-libraries-catalog]].

## When to use Inference Providers (serverless)

Ideal "for prototyping and testing things quickly": compare models or serve low/spiky volume without managing infrastructure. Billing combines monthly credits (Free/PRO/Enterprise) with pay-as-you-go. Tradeoffs: you depend on a third-party provider's availability and pricing, you must specify a Hub model id, not every provider supports every task, and the API exposes only a simple parameter set. See [[concepts/inference-providers-and-endpoints]].
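One way to compare several models on the same prompt, sketched under assumptions: `compare_models` is a hypothetical helper, `client` is expected to behave like an `InferenceClient` (only `chat_completion` is used), and failures are recorded rather than raised because not every provider supports every model or task.

```python
def compare_models(client, model_ids, prompt, max_tokens=64):
    """Send one prompt to several Hub model ids and collect the replies.

    `client` is any object exposing an InferenceClient-style `chat_completion`.
    """
    results = {}
    for model_id in model_ids:
        try:
            out = client.chat_completion(
                messages=[{"role": "user", "content": prompt}],
                model=model_id,
                max_tokens=max_tokens,
            )
            results[model_id] = out.choices[0].message.content
        except Exception as exc:  # e.g. model or task not served by any provider
            results[model_id] = f"<failed: {exc}>"
    return results
```

With a real client this might be called as `compare_models(InferenceClient(), [...], "Summarize X")`; each call is billed per request, so keeping `max_tokens` small is a sensible default while exploring.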

## When to use Inference Endpoints (dedicated)

"Once you're ready to deploy your model to production, you'll need a dedicated infrastructure." Choose Endpoints for steady production traffic, predictable performance, a private API on a chosen cloud/region, custom images (e.g. TGI), and autoscaling. Cost lever: an endpoint that is paused (`pause()`) or scaled to zero (`scale_to_zero()`) is not billed while idle (scale-to-zero auto-restarts with a cold start). Migration is cheap: the *same* `InferenceClient` code works once you swap `model` to the endpoint URL.

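A small sketch of the idle-cost lever for Inference Endpoints. `park_endpoint` and `park_by_name` are hypothetical helpers; they rely only on the `pause()` and `scale_to_zero()` methods of the endpoint object and on `get_inference_endpoint` from `huggingface_hub`.

```python
def park_endpoint(endpoint, wake_on_request: bool) -> str:
    """Stop paying for an idle endpoint.

    wake_on_request=True  -> scale to zero (restarts on the next call, with a cold start)
    wake_on_request=False -> pause (stays down until explicitly resumed)
    """
    if wake_on_request:
        endpoint.scale_to_zero()
        return "scaled_to_zero"
    endpoint.pause()
    return "paused"


def park_by_name(name: str, wake_on_request: bool = True) -> str:
    """Look up an existing endpoint by name and park it."""
    from huggingface_hub import get_inference_endpoint  # lazy import; needs a token

    return park_endpoint(get_inference_endpoint(name), wake_on_request)
```

Scale-to-zero suits endpoints that still receive occasional traffic; pausing suits endpoints that should stay off until someone deliberately brings them back.
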
## When to use Spaces

For **apps and demos**, not raw throughput: "build your ML portfolio, showcase your projects... and work collaboratively." Pick Spaces when the deliverable is an interactive UI (Gradio), a container (Docker), or a static site, with zero-ops hosting (free CPU tier, optional GPU). For on-demand GPU without paying for idle time, consider ZeroGPU. See [[concepts/spaces]].

## Decision heuristics

- **Sensitive data / offline / no per-call cost** → Local.
- **Exploring or comparing many models, low/variable volume** → Inference Providers.
- **Production, steady load, private dedicated API** → Inference Endpoints.
- **Interactive demo or portfolio app** → Spaces.

These compose: prototype on Providers, demo on Spaces, then graduate the hot path to an Endpoint — reusing the unified client throughout. Confidence is **medium**: the right choice is workload-specific (volume, latency, privacy, budget) and sources describe capabilities more than thresholds.
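The heuristics above can be written down as a tiny function. This is only a sketch: `choose_target` and its flags are hypothetical, and the order in which the checks run (privacy first, then demo, then production) is an assumption, since the sources do not rank the criteria.

```python
def choose_target(
    *,
    sensitive_or_offline: bool = False,
    interactive_demo: bool = False,
    steady_production: bool = False,
) -> str:
    """Map workload traits to one of the four options.

    The precedence of the checks is an assumption made for this sketch.
    """
    if sensitive_or_offline:
        return "Local"
    if interactive_demo:
        return "Spaces"
    if steady_production:
        return "Inference Endpoints"
    # Default: exploring or comparing models, low/variable volume.
    return "Inference Providers"
```

For instance, `choose_target(steady_production=True)` returns `"Inference Endpoints"`, while calling it with no flags falls through to `"Inference Providers"`.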

## See also

- [[concepts/inference-providers-and-endpoints]], [[concepts/spaces]], [[concepts/finding-and-downloading-models]], [[summaries/hub-and-libraries-catalog]]