Google Vertex AI — Gemini Provider — Hermes

type: entity
confidence: high
updated: 2026-07-01
hermes_version: v0.18.0
sources: 2

Overview

The Google Vertex AI provider gives Hermes access to Gemini models on Google Cloud Vertex AI. Requests go through Vertex's OpenAI-compatible endpoint, not the native Vertex SDK or protocol. It shipped as a first-class provider in version v0.18.0, salvaging and modernizing an earlier community PR (#8427). Provider id: vertex. Unlike provider google ai studio, which sends a static GOOGLE_API_KEY to generativelanguage.googleapis.com, Vertex has no static API key for the standard endpoint: Hermes authenticates with OAuth2, minting and auto-refreshing short-lived access tokens. Choose Vertex when you want Gemini usage to draw on enterprise-grade rate limits and GCP billing/credits rather than an AI Studio key.

Characteristics

Provider id: vertex
No static API key: every request needs a short-lived OAuth2 access token (≈1 hour TTL, cloud-platform scope). Hermes mints and refreshes these tokens automatically. Pasting a temporary token into a custom provider's api_key field is not a workable substitute, because the token expires mid-session.
Credential resolution order: VERTEX_CREDENTIALS_PATH → GOOGLE_APPLICATION_CREDENTIALS → Application Default Credentials (ADC). Omit both env vars to fall back to ADC.
Prerequisites: a Google Cloud project with the Vertex AI API enabled and billing active; either a service-account JSON key file with the roles/aiplatform.user role, or ADC via gcloud auth application-default login (or the metadata server on a GCP VM).
google-auth dependency — installed automatically the first time you select Vertex (lazy install), or explicitly with pip install 'hermes-agent[vertex]'.
Token refresh mechanics: Hermes caches the minted token and refreshes it when it's within 5 minutes of expiry. If a session outlives the token and a request returns 401, Hermes re-mints the token and retries automatically. On a long-running gateway, if ADC's refresh token has itself expired, Hermes falls back to the service-account JSON when one is configured.
Endpoint: Hermes passes the token to a standard OpenAI client pointed at https://aiplatform.googleapis.com/v1beta1/projects/{project}/locations/{region}/endpoints/openapi. That host serves the global location; regional locations use a {region}-aiplatform.googleapis.com host instead.
Model IDs require the google/ vendor prefix. Available models: google/gemini-3.1-pro-preview, google/gemini-3-pro-preview, google/gemini-3-flash-preview, google/gemini-3.1-flash-lite-preview, google/gemini-2.5-pro, google/gemini-2.5-flash.
Gemini 3.x previews require the global region: these preview models are served through the global endpoint, and regional endpoints (e.g. us-central1) may return 404 for them.
Config split by sensitivity: the credential path is a secret pointer and lives in ~/.hermes/.env; project ID and region are non-secret routing settings and live in ~/.hermes/config.yaml. The VERTEX_PROJECT_ID and VERTEX_REGION env vars take precedence over vertex.project_id and vertex.region in config.yaml, which allows per-shell overrides.
Reasoning/thinking support — Vertex exposes Gemini's thinking budget through the OpenAI-compatible surface; Hermes maps its reasoning_effort setting onto extra_body.google.thinking_config automatically, so reasoning_effort works the same as on other Gemini surfaces.
Diagnostics: hermes doctor reports whether Vertex credentials can be resolved (service-account path or ADC) and whether the provider is configured.
Released alongside a v0.18.0 provider cleanup that removed the google-gemini-cli and google-antigravity OAuth providers.
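
The token refresh mechanics described above (cache the token, refresh within 5 minutes of expiry, re-mint and retry once on a 401) can be sketched roughly as follows. This is an illustrative sketch, not Hermes's actual implementation: TokenCache, call_with_retry, and the injected mint and send callables are hypothetical names. In a real setup, mint would presumably wrap a google-auth credentials refresh and return the token with its expiry time.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Refresh margin taken from the text: refresh within 5 minutes of expiry.
REFRESH_MARGIN_S = 5 * 60


@dataclass
class TokenCache:
    """Hypothetical cache around a token-minting callable."""

    # mint() returns (access_token, expiry as epoch seconds). Assumption:
    # a real mint would refresh google-auth credentials under the hood.
    mint: Callable[[], Tuple[str, float]]
    clock: Callable[[], float] = time.time
    _token: Optional[str] = None
    _expiry: float = 0.0

    def get(self, force: bool = False) -> str:
        """Return the cached token, minting a new one when needed."""
        remaining = self._expiry - self.clock()
        if force or self._token is None or remaining <= REFRESH_MARGIN_S:
            self._token, self._expiry = self.mint()
        return self._token


def call_with_retry(cache: TokenCache, send: Callable[[str], Tuple[int, object]]):
    """Send once; on HTTP 401, force a fresh token and retry a single time."""
    status, body = send(cache.get())
    if status == 401:
        status, body = send(cache.get(force=True))
    return status, body
```

Injecting the clock and the mint function keeps the refresh logic testable without network access or real credentials.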

How to Use

# Option A — service account JSON (recommended for servers / gateways)
echo "VERTEX_CREDENTIALS_PATH=/path/to/service-account.json" >> ~/.hermes/.env

# Option B — Application Default Credentials (good for local dev)
gcloud auth application-default login

# Select Vertex as your provider
hermes model
# → Choose "More providers..." → "Google Vertex AI"
# → Enter your GCP project ID (or leave blank to use the one in your credentials)
# → Choose a region (default: global)
# → Select a Gemini model

# Start chatting
hermes chat

~/.hermes/.env — credential path (one of these; checked in this order, omit both to use ADC):

VERTEX_CREDENTIALS_PATH=/path/to/service-account.json
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
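
As a rough illustration of the lookup order stated above, the following sketch picks the credential source from an environment mapping. The function name resolve_credential_source is hypothetical, and the sketch assumes the entries from ~/.hermes/.env have already been loaded into the environment. Treating an empty value as unset is also an assumption.

```python
import os
from typing import Mapping, Optional, Tuple

# Documented order: VERTEX_CREDENTIALS_PATH first, then
# GOOGLE_APPLICATION_CREDENTIALS, then Application Default Credentials.
_ENV_ORDER = ("VERTEX_CREDENTIALS_PATH", "GOOGLE_APPLICATION_CREDENTIALS")


def resolve_credential_source(
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, Optional[str]]:
    """Return (source, path); path is None when falling back to ADC."""
    for var in _ENV_ORDER:
        # Assumption: blank values count as "not set".
        path = env.get(var, "").strip()
        if path:
            return var, path
    return "ADC", None
```

With both variables set, the first one wins; with neither set, the result is ("ADC", None), matching the "omit both to use ADC" rule.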

~/.hermes/config.yaml — non-secret routing settings:

model:
  default: google/gemini-3-flash-preview
  provider: vertex
vertex:
  project_id: my-gcp-project   # blank → use the project embedded in the credentials
  region: global               # "global" is required for the Gemini 3.x previews
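
To make the routing rules concrete, here is a minimal sketch of how the project and region could be resolved and turned into the OpenAI-compatible base URL. The helper names resolve_routing and vertex_base_url are hypothetical, not Hermes functions. The sketch assumes env vars take precedence over the vertex: section of config.yaml and that region defaults to global. It does not cover the fallback to a project embedded in the credentials.

```python
import os
from typing import Mapping, Tuple


def resolve_routing(
    vertex_cfg: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Env vars win over config.yaml values; region defaults to "global"."""
    project = env.get("VERTEX_PROJECT_ID") or vertex_cfg.get("project_id") or ""
    region = env.get("VERTEX_REGION") or vertex_cfg.get("region") or "global"
    return project, region


def vertex_base_url(project: str, region: str = "global") -> str:
    """Build the endpoint URL in the shape given in this page."""
    if not project:
        # Assumption: the credential-embedded project fallback is out of scope.
        raise ValueError("project id is required in this sketch")
    # The global location uses the bare host; regional ones are prefixed.
    if region == "global":
        host = "aiplatform.googleapis.com"
    else:
        host = f"{region}-aiplatform.googleapis.com"
    return (
        f"https://{host}/v1beta1/projects/{project}"
        f"/locations/{region}/endpoints/openapi"
    )
```

The resulting URL is what an OpenAI-compatible client would take as its base URL, with the minted access token supplied in place of an API key.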

Switch models mid-session (does not collect new credentials — configure Vertex with hermes model first):

/model google/gemini-3-pro-preview
/model google/gemini-3-flash-preview

Troubleshooting:

"Vertex AI credentials could not be resolved" — set VERTEX_CREDENTIALS_PATH in ~/.hermes/.env, or run gcloud auth application-default login. If the project isn't embedded in the credentials, set vertex.project_id in config.yaml.
google-auth not installed — pip install 'hermes-agent[vertex]' (also lazy-installed on first Vertex selection).
404 on Gemini 3.x models — you're probably on a regional endpoint; set region: global in the vertex: section of config.yaml (or unset VERTEX_REGION).
403 / permission denied — the service account (or ADC identity) needs the roles/aiplatform.user role on the project, and the Vertex AI API must be enabled for that project.

Related Entities

provider google ai studio — sibling Gemini provider; AI Studio uses a static GOOGLE_API_KEY against generativelanguage.googleapis.com, while Vertex uses OAuth2-minted, auto-refreshed access tokens against Vertex's OpenAI-compatible endpoint, drawing on GCP billing/credits instead of an AI Studio key
version v0.18.0 — release that shipped Google Vertex AI as a first-class provider
provider openrouter — sibling aggregator path to Gemini and other frontier models
model switching — the hermes model picker and /model mid-session switching used to configure and select Vertex
configuration reference — ~/.hermes/.env vs ~/.hermes/config.yaml split for secrets vs. routing settings