OpenSRE is provider-agnostic: bring your own model. Selection is controlled by the LLM_PROVIDER environment variable, with per-provider API key and model overrides. Defaults are tracked in app/config.py and routing lives in app/services/llm_client.py.

Quick reference

| Provider | LLM_PROVIDER | Auth | Reasoning model default | Toolcall model default |
| --- | --- | --- | --- | --- |
| Anthropic | anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-6 | claude-haiku-4-5-20251001 |
| OpenAI | openai | OPENAI_API_KEY | gpt-5.4-mini | gpt-5.4-mini |
| OpenRouter | openrouter | OPENROUTER_API_KEY | openrouter/auto | openrouter/auto |
| DeepSeek | deepseek | DEEPSEEK_API_KEY | deepseek-v4-pro | deepseek-v4-flash |
| Google Gemini | gemini | GEMINI_API_KEY | gemini-3.1-pro-preview | gemini-3.1-flash-lite-preview |
| NVIDIA NIM | nvidia | NVIDIA_API_KEY | meta/llama-3.1-405b-instruct | meta/llama-3.1-8b-instruct |
| MiniMax | minimax | MINIMAX_API_KEY | MiniMax-M3 | MiniMax-M2.7-highspeed |
| Amazon Bedrock | bedrock | AWS IAM (AWS_REGION) | us.anthropic.claude-sonnet-4-6 | us.anthropic.claude-haiku-4-5-20251001-v1:0 |
| Ollama (local) | ollama | None (local daemon) | llama3.2 | llama3.2 |
| OpenAI Codex CLI | codex | codex login (CLI) | Codex CLI default | Codex CLI default |
| Claude Code CLI | claude-code | claude login (CLI) | Claude Code CLI default | Claude Code CLI default |
| GitHub Copilot CLI | copilot | copilot login or gh auth login (CLI) | Copilot CLI default | Copilot CLI default |
| Google Antigravity CLI | antigravity-cli | agy (browser OAuth, OS keyring) | Whatever the local agy config is set to (switch via /models inside agy) | Same as reasoning model |

OpenSRE distinguishes two model slots per provider:
  • Reasoning model — full-capability model used for diagnosis, claim validation, and multi-step analysis.
  • Toolcall model — lightweight, lower-cost model used for tool selection and routing.

Selecting a provider

Set LLM_PROVIDER (default: anthropic) in your environment or .env file:
export LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
Or run the onboarding wizard, which writes the same values to .env:
opensre onboard
In the interactive shell, /model shows curated quick-pick choices for common models. Providers with fast-changing or account-gated catalogs (OpenAI, OpenRouter, Gemini, NVIDIA, Bedrock, local CLIs, Ollama, and DeepSeek) also accept custom model IDs:
/model set openai gpt-5.5
/model set openai gpt-5.5 --toolcall-model gpt-5.4-mini
Override the default model for a slot via env vars:
export OPENAI_REASONING_MODEL=gpt-5.4-mini
export OPENAI_TOOLCALL_MODEL=gpt-5.4-mini
A shared LLM_MAX_TOKENS (default 4096) controls the response token budget for every provider.

API providers

Anthropic

export LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
# Optional overrides:
export ANTHROPIC_REASONING_MODEL=claude-sonnet-4-6
export ANTHROPIC_TOOLCALL_MODEL=claude-haiku-4-5-20251001
The default. Uses the Anthropic Python SDK directly. Get an API key at console.anthropic.com.

OpenAI

export LLM_PROVIDER=openai
export OPENAI_API_KEY=sk-...
# Optional overrides:
export OPENAI_REASONING_MODEL=gpt-5.4-mini
export OPENAI_TOOLCALL_MODEL=gpt-5.4-mini
Uses the OpenAI SDK. Reasoning models (o1, o3, o4, gpt-5*) automatically use max_completion_tokens instead of max_tokens.
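The parameter switch described above can be pictured with a small helper. This is a minimal sketch, not OpenSRE's implementation: the function name `token_limit_kwargs` and the prefix-matching rule are assumptions made for illustration; only the model families (o1, o3, o4, gpt-5*), the two parameter names, and the 4096 default come from this page.

```python
def token_limit_kwargs(model: str, max_tokens: int = 4096) -> dict:
    """Hypothetical helper: choose the token-limit parameter for an OpenAI model."""
    # Assumption: reasoning models are recognised by model-ID prefix.
    reasoning_prefixes = ("o1", "o3", "o4", "gpt-5")
    if model.startswith(reasoning_prefixes):
        return {"max_completion_tokens": max_tokens}
    return {"max_tokens": max_tokens}


print(token_limit_kwargs("gpt-5.4-mini"))
print(token_limit_kwargs("gpt-4o-mini"))
```

The returned dictionary would be merged into the request arguments, so the rest of the call does not need to know which family the model belongs to.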

OpenRouter

export LLM_PROVIDER=openrouter
export OPENROUTER_API_KEY=sk-or-...
# Optional override (single value applies to both slots if set):
export OPENROUTER_MODEL=openrouter/auto
# Or per-slot:
export OPENROUTER_REASONING_MODEL=anthropic/claude-sonnet-4-6
export OPENROUTER_TOOLCALL_MODEL=openai/gpt-4o-mini
OpenAI-compatible proxy — pick any model on openrouter.ai/models. Base URL: https://openrouter.ai/api/v1.

DeepSeek

export LLM_PROVIDER=deepseek
export DEEPSEEK_API_KEY=sk-...
# Optional override (single value applies to both slots if set):
export DEEPSEEK_MODEL=deepseek-v4-pro
# Or per-slot:
export DEEPSEEK_REASONING_MODEL=deepseek-v4-pro
export DEEPSEEK_TOOLCALL_MODEL=deepseek-v4-flash
Uses DeepSeek’s official OpenAI-compatible API endpoint at https://api.deepseek.com.
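For providers that accept both a single-value override and per-slot overrides, one plausible resolution order is sketched below. The helper `resolve_model` is hypothetical, and the precedence (per-slot variable, then provider-wide variable, then built-in default) is an assumption; this page does not state which variable wins when both are set, so check app/config.py for the actual behaviour.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the quick-reference table (two providers only).
DEFAULTS = {
    "deepseek": {"reasoning": "deepseek-v4-pro", "toolcall": "deepseek-v4-flash"},
    "minimax": {"reasoning": "MiniMax-M3", "toolcall": "MiniMax-M2.7-highspeed"},
}


def resolve_model(provider: str, slot: str,
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: pick the model for a slot ("reasoning" or "toolcall")."""
    env = os.environ if env is None else env
    prefix = provider.upper()
    # Assumed precedence: per-slot variable first, then the provider-wide one.
    for name in (f"{prefix}_{slot.upper()}_MODEL", f"{prefix}_MODEL"):
        value = env.get(name, "").strip()
        if value:
            return value
    return DEFAULTS[provider][slot]


print(resolve_model("deepseek", "toolcall", env={}))
```

Passing `env` explicitly keeps the helper easy to test; with no argument it reads the process environment.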

Google Gemini

export LLM_PROVIDER=gemini
export GEMINI_API_KEY=...
# Optional override:
export GEMINI_MODEL=gemini-3.1-pro-preview
# Or per-slot:
export GEMINI_REASONING_MODEL=gemini-3.1-pro-preview
export GEMINI_TOOLCALL_MODEL=gemini-3.1-flash-lite-preview
Uses Google’s OpenAI-compatible endpoint at https://generativelanguage.googleapis.com/v1beta/openai/. Get an API key at aistudio.google.com.

NVIDIA NIM

export LLM_PROVIDER=nvidia
export NVIDIA_API_KEY=nvapi-...
# Optional override:
export NVIDIA_MODEL=meta/llama-3.1-405b-instruct
# Or per-slot:
export NVIDIA_REASONING_MODEL=meta/llama-3.1-405b-instruct
export NVIDIA_TOOLCALL_MODEL=meta/llama-3.1-8b-instruct
Uses NVIDIA’s OpenAI-compatible API at https://integrate.api.nvidia.com/v1. Browse available models on build.nvidia.com.

MiniMax

export LLM_PROVIDER=minimax
export MINIMAX_API_KEY=...
# Optional override (single value applies to both slots if set):
export MINIMAX_MODEL=MiniMax-M3
# Or per-slot:
export MINIMAX_REASONING_MODEL=MiniMax-M3
export MINIMAX_TOOLCALL_MODEL=MiniMax-M2.7-highspeed
OpenAI-compatible endpoint at https://api.minimax.io/v1. Temperature is fixed to 1.0 to match MiniMax recommendations.

Amazon Bedrock

export LLM_PROVIDER=bedrock
export AWS_REGION=us-east-1
# Optional overrides:
export BEDROCK_REASONING_MODEL=us.anthropic.claude-sonnet-4-6
export BEDROCK_TOOLCALL_MODEL=us.anthropic.claude-haiku-4-5-20251001-v1:0
No API key — auth uses the AWS credential chain (environment variables, shared credentials file, or IAM role). Your principal needs permission to invoke the model IDs you configure (for example Bedrock InvokeModel / Converse access scoped to those resources in IAM). Model routing:
  • Anthropic Claude on Bedrock (anthropic.claude-*, us.anthropic.claude-*, and foundation-model ARNs that contain anthropic.claude) use the existing AnthropicBedrock SDK path.
  • Other Bedrock foundation models (for example Mistral, Meta Llama, Amazon Titan IDs you enable in your account) use the Bedrock Converse API via boto3, so you can set BEDROCK_REASONING_MODEL to a non-Claude model ID when your use case requires it.
  • Application inference profile ARNs (…:application-inference-profile/…) do not encode the vendor in the ID; those are always sent through Converse, which works for any backing model in the profile.
Defaults in app/config.py are US cross-region inference profile IDs for Anthropic Claude; override with IDs or ARNs that are inference-access enabled in your account and region.
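The routing rules above reduce to a small classifier. The sketch below is illustrative only: `bedrock_route` and its return labels are hypothetical names, the substring checks are an assumption about how the IDs are matched, and the example IDs and account number are placeholders.

```python
def bedrock_route(model_id: str) -> str:
    """Hypothetical classifier: which client path a Bedrock model ID would take."""
    # Application inference profile ARNs do not encode the vendor,
    # so they always go through Converse.
    if ":application-inference-profile/" in model_id:
        return "converse"
    # Claude model IDs, cross-region profile IDs, and foundation-model ARNs
    # that contain "anthropic.claude" take the AnthropicBedrock SDK path.
    if "anthropic.claude" in model_id:
        return "anthropic_bedrock"
    # Everything else (Mistral, Meta Llama, Amazon Titan, ...) uses Converse.
    return "converse"


for example in (
    "us.anthropic.claude-sonnet-4-6",
    "meta.llama3-1-8b-instruct-v1:0",
    "arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/abc123",
):
    print(example, "->", bedrock_route(example))
```

Checking the application-inference-profile case first matters: such an ARN says nothing about the backing model, so it should not fall through to the vendor check.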

Ollama (local)

export LLM_PROVIDER=ollama
# Optional overrides:
export OLLAMA_HOST=http://localhost:11434
export OLLAMA_MODEL=llama3.2
Run any local model exposed by an Ollama daemon. No API key required — OpenSRE talks to Ollama’s OpenAI-compatible endpoint at ${OLLAMA_HOST}/v1.

CLI providers (subprocess)

CLI-backed providers shell out to a vendor CLI instead of an HTTP API. They authenticate via the vendor’s own login command; OpenSRE detects the binary on PATH (or via an explicit env var) and reuses the existing session.
Investigation timeouts: each ReAct turn runs one full CLI subprocess with the system prompt, tool schemas, and conversation history. The shared default subprocess budget is 300 seconds (Python adds a small buffer). Override per provider when needed, for example GEMINI_CLI_TIMEOUT_SECONDS, CLAUDE_CODE_TIMEOUT_SECONDS, or ANTIGRAVITY_CLI_TIMEOUT_SECONDS (clamped 30–600 where the adapter supports it).
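The timeout clamp can be sketched as follows. `cli_timeout_seconds` is a hypothetical helper; the 300-second default and the 30 to 600 range come from this page, while falling back to the default on a non-numeric value is an assumption.

```python
import os
from typing import Mapping, Optional


def cli_timeout_seconds(var_name: str,
                        env: Optional[Mapping[str, str]] = None,
                        default: int = 300,
                        low: int = 30,
                        high: int = 600) -> int:
    """Hypothetical helper: read a per-provider timeout and clamp it."""
    env = os.environ if env is None else env
    raw = env.get(var_name, "").strip()
    try:
        value = int(raw) if raw else default
    except ValueError:
        # Assumption: an unparseable value falls back to the default.
        value = default
    return max(low, min(high, value))


print(cli_timeout_seconds("CLAUDE_CODE_TIMEOUT_SECONDS",
                          env={"CLAUDE_CODE_TIMEOUT_SECONDS": "900"}))
```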

OpenAI Codex

export LLM_PROVIDER=codex
# Authenticate the Codex CLI separately:
codex login
# Optional overrides (all blank-by-default):
export CODEX_MODEL=
export CODEX_BIN=
Requires the OpenAI Codex CLI. If CODEX_MODEL is unset, OpenSRE omits -m so codex exec uses the CLI’s currently configured model. If CODEX_BIN is unset, the binary is resolved via PATH and known install locations.
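Binary resolution for a CLI provider might look like the sketch below. It is not the adapter's code: `resolve_cli_binary` is a hypothetical name, the "known install locations" are passed in by the caller because this page does not list them, and returning the override without checking that it exists is an assumption.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable, Mapping, Optional


def resolve_cli_binary(name: str, override_var: str,
                       env: Optional[Mapping[str, str]] = None,
                       extra_dirs: Iterable[str] = ()) -> Optional[str]:
    """Hypothetical helper: explicit override, then PATH, then extra directories."""
    env = os.environ if env is None else env
    override = env.get(override_var, "").strip()
    if override:
        return override  # e.g. CODEX_BIN=/opt/codex/bin/codex
    found = shutil.which(name, path=env.get("PATH"))
    if found:
        return found
    # Caller-supplied stand-in for the "known install locations".
    for directory in extra_dirs:
        candidate = Path(directory) / name
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)
    return None


print(resolve_cli_binary("codex", "CODEX_BIN"))
```

A `None` result would mean the CLI is not installed where the lookup can see it.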

Claude Code

export LLM_PROVIDER=claude-code
# Authenticate the Claude Code CLI separately:
claude login
# Optional overrides (all blank-by-default):
export CLAUDE_CODE_MODEL=
export CLAUDE_CODE_BIN=
Requires the Claude Code CLI (npm i -g @anthropic-ai/claude-code). If CLAUDE_CODE_MODEL is unset, OpenSRE omits the --model flag and the CLI uses its configured default. If CLAUDE_CODE_BIN is unset, the binary is resolved via PATH and known install locations.

GitHub Copilot

export LLM_PROVIDER=copilot
# Authenticate the Copilot CLI separately. Either flow works — the adapter
# detects both. The interactive `/login` slash command inside `copilot` writes
# to the platform credential store; `gh auth login` is an equivalent path that
# Copilot CLI delegates to automatically.
copilot login          # OAuth device flow; preferred CLI-first onboarding
# or:
gh auth login          # logs you into the gh CLI; Copilot will use that token
# Optional overrides (all blank-by-default):
export COPILOT_MODEL=
export COPILOT_BIN=
# Optional auth bypass for automation (only used when no CLI login is detected):
# export COPILOT_GITHUB_TOKEN=
# export GH_TOKEN=
# export GITHUB_TOKEN=
Requires the GitHub Copilot CLI (npm i -g @github/copilot). Login uses the interactive /login slash command or copilot login.
OpenSRE detects auth in this order:
  • First: COPILOT_GITHUB_TOKEN / GH_TOKEN / GITHUB_TOKEN in the environment.
  • Then: gh auth status when gh is on PATH. A match is any of ✓ Logged in to github.com account …, - Active account: true, or a - Token: line with a supported prefix (gho_, github_pat_, or ghu_ per the Copilot docs; ghp_ is not supported). When COPILOT_GH_HOST or GH_HOST targets a non-github.com host, the probe runs gh auth status --hostname … instead.
The probe does not read plaintext $COPILOT_HOME/config.json: keychain-backed installs may omit it, and mis-parsing arbitrary JSON risks false positives. If nothing matches, detection reports logged_in=None and the runner verifies at invoke time.
If COPILOT_MODEL is unset, OpenSRE omits --model. Invocations run as copilot -p PROMPT --no-color --no-ask-user --silent so they never block on user input.
BYOK / COPILOT_OFFLINE: GitHub auth may be unnecessary; a None probe can still be fine if Copilot is configured for offline or external providers only.
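The detection order can be sketched as a pure function over the environment and the captured output of gh auth status. This is a rough illustration, not the adapter's code: `detect_copilot_login` is a hypothetical name, the string matching is an assumption about how the output is parsed, and treating any non-empty token variable as authenticated is also an assumption.

```python
from typing import Mapping, Optional

TOKEN_VARS = ("COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN")
SUPPORTED_PREFIXES = ("gho_", "github_pat_", "ghu_")  # ghp_ is not supported


def detect_copilot_login(env: Mapping[str, str],
                         gh_status: Optional[str] = None) -> Optional[bool]:
    """Hypothetical probe: True when auth is detected, None when unknown."""
    # Step 1: explicit token variables in the environment.
    if any(env.get(name, "").strip() for name in TOKEN_VARS):
        return True
    # Step 2: output of `gh auth status`, captured by the caller.
    for line in (gh_status or "").splitlines():
        text = line.strip()
        if "Logged in to" in text or text == "- Active account: true":
            return True
        if text.startswith("- Token:"):
            token = text[len("- Token:"):].strip()
            if token.startswith(SUPPORTED_PREFIXES):
                return True
    # Nothing matched: report "unknown" and let the runner verify at invoke time.
    return None


print(detect_copilot_login({}, "  - Token: gho_************"))
print(detect_copilot_login({}, "  - Token: ghp_************"))
```

Returning `None` rather than `False` mirrors the logged_in=None behaviour described above: an inconclusive probe is not treated as a failed login.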

Google Antigravity CLI

export LLM_PROVIDER=antigravity-cli
# Authenticate the Antigravity CLI separately (browser OAuth on first run):
agy                       # interactive launch triggers Google Sign-In; token cached by OS keyring
# Stay current — 1.0.0 had OAuth hangs (fixed in 1.0.1):
agy update
# Optional overrides (all blank-by-default):
export ANTIGRAVITY_CLI_BIN=
export ANTIGRAVITY_CLI_TIMEOUT_SECONDS=300   # default 300; clamped 30–600; maps to `--print-timeout {N}s`
# Note: ANTIGRAVITY_CLI_MODEL is registered for forward-compat but currently no-op
# (agy v1.0.2 does not expose --model in headless `-p` mode). Each invocation uses
# whatever model is persisted in agy's local config; switch it interactively with
# `/models` inside the `agy` REPL. The wizard's model picker is a forward-compat
# catalog: once Google ships `--model` in headless, picking a value here will start
# being forwarded to agy via a one-line change in the adapter.
Antigravity CLI (agy) is Google’s successor to Gemini CLI. Install via curl -fsSL https://antigravity.google/cli/install.sh | bash, then run agy install to configure your shell PATH. The minimum tested version is 1.0.1; on older builds the probe logs a warning and directs you to agy update.
Why two Google providers? Google’s transition announcement states that on 2026-06-18 Gemini CLI stops serving Pro/Ultra and free users. Paid Gemini Code Assist licences keep Gemini CLI indefinitely. OpenSRE keeps both gemini-cli (deprecated alias with a probe-time notice) and antigravity-cli so either group can run without surprises.
As a best-effort fallback, the probe treats explicit GEMINI_API_KEY / GOOGLE_API_KEY / GOOGLE_APPLICATION_CREDENTIALS env credentials as authenticated (mirroring the Gemini CLI adapter), so users migrating across the two CLIs can keep their existing env-var-based auth without re-running the browser flow.
Invocations run as agy -p PROMPT --print-timeout {N}s. The adapter never passes --continue / --conversation / --sandbox / --dangerously-skip-permissions, keeping every opensre call ephemeral.

xAI Grok Build CLI

export LLM_PROVIDER=grok-cli
# Authenticate the Grok Build CLI separately. Either path works:
grok login                 # OAuth sign-in with a SuperGrok / X Premium+ account
# ...or, for headless / CI runs, use an API key instead of a browser login:
export XAI_API_KEY=xai-...  # get one from the xAI console
# Optional overrides (all blank-by-default):
export GROK_CLI_MODEL=          # e.g. grok-build; unset → CLI configured default
export GROK_CLI_BIN=            # explicit path to the `grok` binary
export GROK_CLI_TIMEOUT_SECONDS=300   # default 300; clamped 30-600
Requires the xAI Grok Build CLI (binary: grok). Install with curl -fsSL https://x.ai/cli/install.sh | bash (macOS/Linux) or irm https://x.ai/cli/install.ps1 | iex (Windows). If GROK_CLI_MODEL is unset, OpenSRE omits -m and the CLI uses its configured default. The wizard populates the model list live from grok models at onboarding time so newly released models appear without an OpenSRE update.
Invocations run as grok -p PROMPT --output-format plain, so each opensre call is a single non-interactive turn. The adapter deliberately omits --always-approve: OpenSRE drives its own tools, so Grok is used purely as a text responder and never auto-executes shell commands or file edits.
Auth detection: auth is probed via grok models (~0.5 s, no LLM call), which prints “You are logged in” on success. XAI_API_KEY is treated as an authenticated fallback for headless / CI runs even when the probe result is unclear. XAI_API_KEY is forwarded only to the Grok subprocess (never via the shared CLI env allowlist), so it cannot leak into other CLI adapters.
Not to be confused with groq. The grok-cli provider is xAI’s Grok Build CLI. The separate groq provider is the Groq HTTP API (a different company); the two are unrelated.
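The Grok invocation described above, including the optional model flag, could be assembled roughly like this. `build_grok_argv` is a hypothetical helper, and the position of -m within the argument list is an assumption; the flags themselves are the ones named on this page.

```python
from typing import List, Optional


def build_grok_argv(prompt: str, model: Optional[str] = None,
                    binary: str = "grok") -> List[str]:
    """Hypothetical helper: argv for one non-interactive Grok turn."""
    argv = [binary, "-p", prompt, "--output-format", "plain"]
    if model:
        # Only add -m when GROK_CLI_MODEL is set; otherwise the CLI default applies.
        argv += ["-m", model]
    # --always-approve is never added: Grok is used purely as a text responder.
    return argv


print(build_grok_argv("Summarise the alert"))
print(build_grok_argv("Summarise the alert", model="grok-build"))
```

Building the command as a list rather than a shell string avoids quoting problems when the prompt contains spaces or shell metacharacters.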
See app/integrations/llm_cli/AGENTS.md for the adapter pattern used to add new CLI providers.

Reasoning effort (interactive shell)

In the TTY REPL (opensre with no subcommand), /effort stores a session preference for how strongly reasoning models should think before answering. It applies only when LLM_PROVIDER is openai (HTTP API) or codex (Codex CLI); other providers ignore the setting and the shell notes that.
| Input | Sent to the model |
| --- | --- |
| low, medium, high, xhigh | same string |
| max | xhigh |
Run /effort alone to show the current choice (or (default) when unset) and the usage line. /new starts a fresh session but keeps /effort (and trust mode), consistent with other session prefs. Outside the REPL, optional defaults use the environment variable:
export OPENSRE_REASONING_EFFORT=high   # low | medium | high | xhigh
Session /effort overrides this for interactive runs. Implementation: app/llm_reasoning_effort.py.
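The mapping and the session-over-environment precedence can be sketched as below. Both function names are hypothetical, and lowercasing input, accepting max from the environment variable, and ignoring unknown values are assumptions; see app/llm_reasoning_effort.py for the real logic.

```python
from typing import Mapping, Optional

VALID_EFFORTS = ("low", "medium", "high", "xhigh")


def normalize_effort(value: str) -> Optional[str]:
    """Hypothetical helper: map user input to the string sent to the model."""
    cleaned = value.strip().lower()
    if cleaned == "max":
        return "xhigh"  # "max" is an alias for the highest level
    return cleaned if cleaned in VALID_EFFORTS else None


def effective_effort(session_choice: Optional[str],
                     env: Mapping[str, str]) -> Optional[str]:
    """Session /effort wins over OPENSRE_REASONING_EFFORT; None means default."""
    for candidate in (session_choice, env.get("OPENSRE_REASONING_EFFORT")):
        if candidate:
            normalized = normalize_effort(candidate)
            if normalized:
                return normalized
    return None


print(effective_effort("max", {"OPENSRE_REASONING_EFFORT": "low"}))
```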

Provider fallback and diagnostics

If the provider you set in LLM_PROVIDER is missing its API key, OpenSRE does not fail outright — it falls back to the next configured provider (by default it tries openai, then anthropic) so a partially configured machine still works. The trade-off is that calls can quietly go to a different provider than you intended, and you would otherwise only find out via a confusing error naming the fallback provider (for example “Anthropic credit balance too low” when you actually configured OpenAI). To make this visible:
  • A warning is logged the first time a fallback happens, naming the configured provider, the missing key, and the provider actually used:
    Configured LLM provider 'openai' is unusable (OPENAI_API_KEY is not set);
    falling back to 'anthropic'. Set OPENAI_API_KEY or change LLM_PROVIDER to use it.
    
  • /status (in the interactive shell) shows the resolved provider and flags a fallback inline, instead of just echoing LLM_PROVIDER:
    provider   anthropic (fallback from 'openai': OPENAI_API_KEY not set)
    
  • Provider errors in the interactive shell are prefixed with which provider served the request and whether it was a fallback, so the message is actionable:
    [LLM provider: anthropic — fell back from configured 'openai' (OPENAI_API_KEY not set)]
    Anthropic request rejected (HTTP 400): Your credit balance is too low ...
    
To remove a fallback, either set the missing key for your configured provider or change LLM_PROVIDER to a provider you have credentials for.
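The fallback behaviour can be summarised in a few lines. This is a simplified sketch under stated assumptions: `resolve_provider` is a hypothetical name, only three key-based providers are modelled, and raising an error when no provider is usable is an assumption, since this page does not describe that case.

```python
from typing import Mapping, Optional, Tuple

REQUIRED_KEY = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}
FALLBACK_ORDER = ("openai", "anthropic")  # default order described above


def resolve_provider(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Hypothetical resolver: return (provider actually used, fallback note)."""
    configured = env.get("LLM_PROVIDER", "anthropic")
    key = REQUIRED_KEY[configured]
    if env.get(key):
        return configured, None
    for candidate in FALLBACK_ORDER:
        if candidate != configured and env.get(REQUIRED_KEY[candidate]):
            return candidate, f"fallback from '{configured}': {key} not set"
    # Assumption: with no usable provider at all, fail loudly.
    raise RuntimeError(f"{key} is not set and no fallback provider is configured")


print(resolve_provider({"LLM_PROVIDER": "openai", "ANTHROPIC_API_KEY": "sk-ant-example"}))
```

Returning the note alongside the provider is what lets a status line or an error prefix name both the configured provider and the one that served the request.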

Switching providers at runtime

OpenSRE caches LLM clients on first use. To switch providers within a single process (tests, benchmarks), call reset_llm_singletons() from app.services.llm_client after updating the env vars; otherwise a fresh process picks up the new LLM_PROVIDER automatically.
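To see why a reset is needed, consider a toy cached client. The names `get_client` and `reset_clients` below are stand-ins invented for this illustration, and using `functools.lru_cache` is an assumption about the caching mechanism; in OpenSRE itself the call to make is reset_llm_singletons().

```python
import os
from functools import lru_cache


@lru_cache(maxsize=None)
def get_client() -> dict:
    """Stand-in for a cached LLM client: reads LLM_PROVIDER once, on first use."""
    return {"provider": os.environ.get("LLM_PROVIDER", "anthropic")}


def reset_clients() -> None:
    """Stand-in for reset_llm_singletons(): drop the cached client."""
    get_client.cache_clear()


os.environ["LLM_PROVIDER"] = "openai"
reset_clients()
first = get_client()["provider"]

os.environ["LLM_PROVIDER"] = "gemini"
stale = get_client()["provider"]   # still the cached client

reset_clients()
fresh = get_client()["provider"]   # rebuilt from the new environment

print(first, stale, fresh)
```

In a test or benchmark, the same pattern applies: update the env vars first, then reset, then make the next call.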

Where this lives in the code