Hermes Agent – The Complete Guide – And Why It Is Actually Better Than OpenClaw


What Is Hermes Agent?


Hermes Agent is an open-source, self-improving AI agent framework built by Nous Research — the same lab behind the Hermes, Nomos, and Psyche model families.

Launched on February 25, 2026, it represents a fundamental architectural bet: that the most valuable AI agents are not stateless task executors, but persistent systems that compound capability over time through structured learning loops.

At its core, Hermes Agent is a Python-based runtime that orchestrates large language models (LLMs) through a closed-loop execution pipeline.

Unlike traditional agent frameworks that treat each session as an isolated event — receive task, plan, execute, return result, forget everything — Hermes adds a reflective phase after execution.

When the agent completes a complex task, it evaluates its own performance, extracts reusable reasoning patterns, and persists them as structured skills.

The next time a similar task arrives, the agent queries its skill library instead of reasoning from scratch.

This creates what Nous Research calls “the agent that grows with you.” Three architectural properties define the framework:

  1. Skill Creation: Successful task completions are abstracted into reusable skills — structured reasoning templates stored as SKILL.md files that encode procedures, pitfalls, and verification steps.
  2. Skill Improvement: Skills are updated as new evidence arrives. If a better approach consistently outperforms the stored one, the skill is revised through the skill_manage tool.
  3. User Modeling: Across sessions, Hermes builds a persistent representation of the individual user (formatting preferences, decision history, common task patterns), stored in USER.md and an SQLite episodic archive.

The framework ships with 40+ built-in tools covering file operations, shell execution, web browsing, API calls, and natural-language cron scheduling.

It supports the Model Context Protocol (MCP) for extending tool coverage without modifying core code, and it provides multi-surface access through CLI, TUI, Web UI, messaging gateway (Telegram, Discord, Slack, WhatsApp, Signal, Email), and the Agent Client Protocol (ACP) for editor-native integration.

Hermes Agent matters because it blurs the line between operational automation and model training infrastructure.

It includes an integrated RL pipeline built on Tinker-Atropos that enables GRPO (Group Relative Policy Optimization) with LoRA adapters, allowing teams to collect agent trajectories and fine-tune smaller, cheaper models on their specific domain.


Why Hermes Agent Is Different From OpenClaw


The comparison between Hermes Agent and OpenClaw is unavoidable.

Both are open-source, self-hosted AI agent frameworks with messaging integrations, memory systems, browser automation, and multi-agent support.

But they solve the same problem from opposite directions.

OpenClaw is gateway-first.

Its central abstraction is the Gateway — a persistent Node.js process that manages routing, permissions, channel integrations, skill dispatch, and external connections.

The AI model is pluggable and interchangeable.

The gateway persists independently of the model, managing sessions, hooks, skills, and channel integrations.

OpenClaw’s bet is that the hard problem is routing and control: who can reach your agent, from what channels, with what permissions.

Hermes Agent is agent-first.

Its central abstraction is the learning loop — an agent that gets more capable the longer it runs through autonomous skill creation, self-improving procedures, and a deepening model of the user.

Hermes’s bet is that the hard problem is memory and self-improvement.

| Feature | Hermes Agent | OpenClaw |
| --- | --- | --- |
| Skill creation from experience | ✓ Auto-generated | ✗ Human-written only |
| Skill refinement over time | ✓ Self-improving | ✗ Static after install |
| Cross-session user modeling | ✓ Built-in USER.md | Limited |
| Reactive tool use | ✓ 40+ built-in + MCP | ✓ 48 built-in + MCP |
| Multi-agent support | ✓ Profiles (isolated instances) | ✓ Named agents via Gateway |
| Messaging platforms | 13 (Telegram, Discord, Slack, WhatsApp, Signal, Email, etc.) | 22+ (includes iMessage, IRC, LINE, Nostr, Twitch) |
| Sandbox backends | Docker, Modal, Daytona, SSH, Singularity, local | Docker, SSH, OpenShell |
| Browser automation | Browserbase, Browser Use, Firecrawl, Camofox, local CDP | Managed browser, Chrome MCP, Playwright |
| IDE integration | ACP (VS Code, Zed, JetBrains) | ACP adapter |
| Voice support | Telegram voice, Discord voice channels, TTS | ElevenLabs, Microsoft, OpenAI TTS |
| Security model | Container isolation + command approval | Approval system per command |
| Supply chain risk | Self-generated skills (no marketplace) | ClawHub marketplace (341 malicious skills found in audit) |
| CVE history (as of May 2026) | 3 disclosed (CVE-2026-7396, CVE-2026-7112, CVE-2026-7397) | 138+ disclosed including CVE-2026-25253 (CVSS 8.8) |
| Setup complexity | Moderate | Low |
| Primary language | Python | TypeScript/Node.js |

The security distinction is particularly stark.

OpenClaw’s ClawHub marketplace grew to 13,000+ community skills, but a Koi Security audit of 2,857 entries found 341 malicious skills — roughly a 12% malware rate.

Hermes largely sidesteps this supply-chain vector because its skills are primarily self-generated rather than downloaded from an open community marketplace.

That said, Hermes is younger (launched February 2026 vs. OpenClaw’s late 2025 launch) and had three CVEs disclosed in April 2026, including a path traversal in the WeCom platform adapter (CVE-2026-7396) and an authentication issue in the API server (CVE-2026-7112).


Hermes Agent Architecture Deep Dive


Hermes Agent is built on a modular, event-driven architecture that separates concerns while maintaining tight integration between components.

Understanding this architecture is essential for AI engineers who need to debug, extend, or productionize deployments.

The Agent Loop

The heart of Hermes is the agent loop — a stateful execution cycle that processes user input, selects tools, executes actions, and updates internal state.

The loop runs in distinct phases:

  1. System Prompt Assembly:
    • The framework assembles a composite system prompt from multiple sources — base persona, SOUL.md, active skills, MEMORY.md, USER.md, tool schemas, and session context.
    • This uses progressive disclosure: skills are loaded at three levels (title only, summary, or full content) based on relevance scoring to stay within context limits.
  2. Tool Resolution:
    • The agent evaluates which tools are available.
    • Built-in tools self-register through a COMMAND_REGISTRY pattern.
    • MCP servers are discovered dynamically.
    • Toolsets can be enabled or disabled per profile.
  3. Execution:
    • The selected tool runs within an execution environment — local shell, Docker container, Modal sandbox, SSH host, or Daytona cloud environment.
    • Each backend has different isolation guarantees.
  4. Observation & Reflection:
    • After execution, the agent observes the result.
    • If the task was complex and novel, the learning loop triggers a reflective phase where the agent considers whether to create or update a skill.
  5. Memory Update:
    • Session history is stored in SQLite with FTS5 full-text search.
    • The episodic archive is searchable via the session_search tool.
    • Bounded persistent memory (MEMORY.md and USER.md) is updated with hard character limits (2,200 chars for agent memory, 1,375 chars for user profile), forcing the agent to consolidate rather than bloat.
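
The progressive-disclosure step from phase 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Hermes's actual implementation: the `Skill` dataclass, the relevance thresholds (0.7 and 0.3), and the character budget are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    title: str
    summary: str
    body: str
    relevance: float  # hypothetical relevance score in [0, 1]


def render_skill(skill: Skill, full_at: float = 0.7, summary_at: float = 0.3) -> str:
    """Pick one of three disclosure levels from the relevance score."""
    if skill.relevance >= full_at:
        return f"## {skill.title}\n{skill.body}"
    if skill.relevance >= summary_at:
        return f"## {skill.title}\n{skill.summary}"
    return f"## {skill.title}"


def assemble_skills(skills: list[Skill], char_budget: int) -> str:
    """Render the most relevant skills first and stop once the budget is spent.

    The budget counts rendered skill text only, not the blank-line separators.
    """
    parts: list[str] = []
    used = 0
    for skill in sorted(skills, key=lambda s: s.relevance, reverse=True):
        text = render_skill(skill)
        if used + len(text) > char_budget:
            text = f"## {skill.title}"  # fall back to the cheapest level
        if used + len(text) > char_budget:
            break
        parts.append(text)
        used += len(text)
    return "\n\n".join(parts)
```

Sorting by relevance before spending the budget means low-value skills are the first to be reduced to a title or dropped.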

Memory Systems

Hermes implements three memory mechanisms.

Mechanism 1 — Frozen-Snapshot Persistent Memory:

MEMORY.md and USER.md are Markdown files that the agent manages directly. They have hard character limits. When memory is full, the agent must consolidate or replace entries, forcing prioritization. This bounded approach prevents context window bloat and keeps the system prompt focused.
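
A bounded store of this kind can be sketched in a few lines. The class below is a hypothetical illustration, not the Hermes code: it only shows how a hard character limit forces the caller to consolidate or replace entries instead of appending indefinitely. The 2,200-character figure comes from the text above; everything else is assumed.

```python
MEMORY_LIMIT = 2200  # characters for agent memory, per the limits quoted above


class BoundedMemory:
    """Newline-joined entries that may never exceed a character limit."""

    def __init__(self, limit: int = MEMORY_LIMIT):
        self.limit = limit
        self.entries: list[str] = []

    def size(self) -> int:
        return len("\n".join(self.entries))

    def add(self, entry: str) -> bool:
        """Append the entry if it fits; False means the caller must consolidate."""
        if len("\n".join(self.entries + [entry])) > self.limit:
            return False
        self.entries.append(entry)
        return True

    def replace(self, index: int, entry: str) -> bool:
        """Swap one entry for a (usually shorter) consolidated version."""
        trial = self.entries[:index] + [entry] + self.entries[index + 1:]
        if len("\n".join(trial)) > self.limit:
            return False
        self.entries = trial
        return True
```

When `add` returns False, an agent would typically summarize existing entries and call `replace` before retrying.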

Mechanism 2 — Cross-Session Recall via SessionDB:

All sessions are stored in SQLite with FTS5 full-text search. The session_search tool enables the agent to recall conversations from weeks ago. By default, recall relies on FTS5 keyword search with summarization handled by Gemini Flash, rather than on vector embeddings.
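
To make the FTS5 approach concrete, here is a small sketch using Python's built-in sqlite3 module. The table name, columns, and helper functions are assumptions for illustration and do not reflect the real SessionDB schema. It also assumes the local SQLite build includes FTS5, which is the case for most Python distributions.

```python
import sqlite3


def open_session_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical full-text index over session messages."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS messages "
        "USING fts5(session_id UNINDEXED, role UNINDEXED, content)"
    )
    return conn


def log_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )


def search_sessions(conn: sqlite3.Connection, query: str, limit: int = 5) -> list:
    """Return (session_id, snippet) pairs, best match first."""
    rows = conn.execute(
        "SELECT session_id, snippet(messages, 2, '[', ']', '...', 8) "
        "FROM messages WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Keyword search like this needs no embedding model or vector store, which keeps the default install light.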

Mechanism 3 — Pluggable Memory Providers:

Optional integrations with Honcho, Mem0, OpenViking, and others can be enabled for semantic recall beyond the default FTS5 search.

Skills System

Skills are the killer feature.

A skill is a SKILL.md file with YAML frontmatter containing metadata (name, description, triggers, required environment variables) and a Markdown body with procedural instructions.

Skills live in ~/.hermes/skills/ and are discovered automatically.
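
The file layout described above (YAML frontmatter followed by a Markdown body) can be split with a few lines of Python. This is a deliberately minimal sketch that only reads flat `key: value` pairs; nested values such as the list of required environment variables need a real YAML parser. The function name is hypothetical.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md-style document into (flat metadata, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing delimiter of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        # Skip nested or list items; this sketch handles top-level scalars only.
        if line.startswith((" ", "-", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])
```

Keys whose values are nested blocks come back as empty strings, which is enough to list skill names and descriptions.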

The skill_manage tool enables self-improvement: the agent can read its own skills, evaluate their effectiveness against execution traces, and propose updates.

This is not fully autonomous — it is prompt-based encouragement that runs every 15 turns — but it creates a genuine feedback loop where the agent’s procedure library improves with use.

Skills follow the open agentskills.io standard, making them portable across compatible platforms.

Hermes can also install community skills from skill directories and migrate OpenClaw skills via the hermes claw migrate command.

Subagent Delegation

Hermes supports multi-agent workflows through the delegate_task tool.

Subagents start with restricted toolsets, isolated terminal sessions, and no conversation history.

They are useful for parallel workstreams — researching multiple topics simultaneously, code reviewing multiple files, or running independent investigations.

Each subagent operates in its own context window, preventing task contamination.

RL Training Integration

The environments/ directory contains research-grade infrastructure for RL training.

Key components include:

  • HermesAgentBaseEnv: Abstracts tool resolution and sandbox wiring for RL rollouts.
  • HermesAgentLoop: Runs the tool-call loop in a way that RL rollouts can drive.
  • ToolContext: Exposes the sandbox to reward functions so rewards can verify filesystem state.
  • Two-phase pipeline: Phase 1 uses vLLM/SGLang native tool-call parsing for evaluation. Phase 2 uses ManagedServer raw-token parsing for full RL training with GRPO.
  • Three-layer tool-result budgeting: Per-tool truncation → sandbox spillover with previews → per-turn budget. Without this, a single ls / could blow out a training rollout’s context window.
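
One way to read the three-layer budgeting idea is sketched below. The class, its parameter names, and the in-memory dictionary standing in for sandbox files are all assumptions; the real limits and storage are not given in the text.

```python
class ToolResultBudget:
    """Hypothetical three-layer budget for tool output entering the context."""

    def __init__(self, per_tool_cap: int, inline_limit: int, per_turn_limit: int):
        self.per_tool_cap = per_tool_cap      # layer 1: hard cap per tool call
        self.inline_limit = inline_limit      # layer 2: max chars shown inline
        self.per_turn_limit = per_turn_limit  # layer 3: total chars per turn
        self.spilled: dict[str, str] = {}     # stand-in for files in the sandbox
        self.used_this_turn = 0

    def admit(self, call_id: str, output: str) -> str:
        # Layer 1: truncate anything beyond the per-tool cap.
        output = output[: self.per_tool_cap]
        # Layer 2: spill large outputs and keep only a preview inline.
        if len(output) > self.inline_limit:
            self.spilled[call_id] = output
            output = output[: self.inline_limit] + f"\n[preview; full output stored as {call_id}]"
        # Layer 3: never exceed what is left of the per-turn budget.
        remaining = max(self.per_turn_limit - self.used_this_turn, 0)
        output = output[:remaining]
        self.used_this_turn += len(output)
        return output

    def next_turn(self) -> None:
        self.used_this_turn = 0
```

With limits like these, even a recursive directory listing costs the model a short preview rather than the whole window.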

Installing Hermes Agent Step-by-Step


Hermes Agent supports Linux, macOS, WSL2, and Android via Termux.

Native Windows is not supported — use WSL2.

System Requirements

  • CPU: 2+ cores for basic operation; 4+ recommended for gateway mode
  • RAM: 8 GB minimum; 16 GB recommended for local model inference
  • Storage: 2 GB for base installation; additional space for models, skills, and session history
  • Python: 3.11+ (required for RL training; 3.10+ works for basic operation)
  • GPU: Optional — required only for local inference or RL training

Quick Install (Linux/macOS/WSL2)

Shell
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc # or source ~/.zshrc
hermes # Start interactive CLI

The installer handles platform-specific setup automatically. For contributors, use the bootstrap script:

Shell
git clone https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
./setup-hermes.sh # Installs uv, creates venv, installs .[all], symlinks ~/.local/bin/hermes
./hermes # Auto-detects venv

Manual Setup

Shell
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv .venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[all,dev]"
scripts/run_tests.sh

Docker Deployment

For VPS or production deployments, Docker provides clean isolation:

Shell
mkdir -p ~/.hermes
cd ~/.hermes
docker run -it --rm \
-v ~/.hermes:/opt/data \
nousresearch/hermes-agent setup

For persistent gateway deployment with resource limits:

YAML
# docker-compose.yaml
services:
  hermes:
    image: nousresearch/hermes-agent:latest
    container_name: hermes
    restart: unless-stopped
    command: gateway run
    ports:
      - "8642:8642" # Gateway API
      - "9119:9119" # Dashboard (when HERMES_DASHBOARD=1)
    volumes:
      - ~/.hermes:/opt/data
    environment:
      - HERMES_DASHBOARD=1
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "2.0"

Start with docker compose up -d. API keys can be passed via .env file in ~/.hermes/ or directly via -e flags for CI/CD integration.

Post-Install Configuration

Shell
hermes setup # Full setup wizard
hermes model # Choose LLM provider and model
hermes tools # Configure enabled toolsets
hermes config set # Set individual config values
hermes doctor # Diagnose issues

GPU Setup (Local Inference)

For NVIDIA GPUs, ensure CUDA drivers and the NVIDIA Container Toolkit are installed.

For Apple Silicon, Metal acceleration is handled automatically by Ollama.

No additional CUDA configuration is required for cloud-only usage.

Troubleshooting

  • “Command not found” after install: Ensure ~/.local/bin is in your PATH. The installer adds this to .bashrc or .zshrc, but you must reload your shell.
  • Permission denied on .hermes: The Docker container auto-detects UID/GID from mounted volumes. On macOS, UIDs start at 501, not 1000 — check .env.docker.example.
  • Python version mismatch: RL training requires Python 3.11+. The base agent runs on 3.10+.

Running Hermes Agent on Free Models


Hermes Agent works with any OpenAI-compatible API endpoint, making it compatible with numerous free and freemium inference providers.

This is critical for teams prototyping before committing to infrastructure spend.

OpenRouter (Recommended Free Tier)

OpenRouter aggregates multiple providers and offers a generous free tier with rate limits. It is the default provider in Hermes.

Shell
hermes model
# → Select "OpenRouter"
# → Paste your OPENROUTER_API_KEY
# → Select from available models

Or configure ~/.hermes/config.yaml directly:

YAML
model:
  provider: openrouter
  default: anthropic/claude-sonnet-4.6

Groq Free Inference

Groq offers high-speed inference on select models with a free tier:

YAML
model:
  provider: groq
  default: llama-3.3-70b-versatile
  api_key: ${GROQ_API_KEY}

Together AI Free Tier

Together AI provides free inference on certain models with request limits:

YAML
model:
  provider: together
  default: meta-llama/Llama-3.3-70B-Instruct
  api_key: ${TOGETHER_API_KEY}

Hugging Face Inference API

For models hosted on Hugging Face with serverless inference:

YAML
model:
  provider: custom
  base_url: https://api-inference.huggingface.co/v1
  api_key: ${HF_API_KEY}
  default: meta-llama/Llama-3.1-8B-Instruct

Cost Optimization Strategies

  1. Use context compression: Set context_compression: 0.5 in agent settings to summarize older messages once the conversation reaches half the context limit.
  2. Limit max iterations: For free tiers with rate limits, set max_iterations: 20 to prevent runaway tool-call loops.
  3. Enable prompt caching: Hermes supports cache-friendly prompt stability — repeated system prompt structures are optimized for providers that support prefix caching.
  4. Switch providers mid-session: Use /model <provider>/<model> to switch to a cheaper provider for simple tasks while reserving expensive models for complex reasoning.
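
The compression threshold in the first strategy boils down to a simple ratio check. The sketch below illustrates the idea only: the function names, the `keep_recent` parameter, and the pluggable `summarize` callback are assumptions, and Hermes's own compression logic may differ.

```python
def should_compress(used_tokens: int, context_length: int, threshold: float = 0.5) -> bool:
    """True once the conversation occupies the given fraction of the context window."""
    return used_tokens >= context_length * threshold


def compress_history(messages: list, keep_recent: int, summarize) -> list:
    """Replace everything except the most recent messages with one summary."""
    split = len(messages) - keep_recent
    if split <= 0:
        return list(messages)  # nothing old enough to summarize
    return [summarize(messages[:split])] + list(messages[split:])
```

With a 32,768-token window and a threshold of 0.5, summarization would start at 16,384 tokens.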

Latency Considerations

Free tiers typically have higher latency and lower rate limits.

Hermes’s timeout thresholds and retry behavior adapt automatically to measured response latency, but for free-tier workflows, expect:

  • Simple Q&A: 2–5 seconds
  • Multi-tool workflows: 15–45 seconds
  • Code generation with verification: 30–90 seconds

Running Hermes Agent on Local Models


Local inference is where Hermes Agent shines for privacy-sensitive and air-gapped deployments. The framework treats local inference servers identically to cloud providers at the interface level.

Ollama Integration (Recommended)

Ollama is the primary local integration. It handles model downloads, GPU offloading, and serves an OpenAI-compatible API on localhost:11434.

Shell
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model with tool-calling support
ollama pull qwen2.5-coder:32b
# Start the server
ollama serve

Configure Hermes:

Shell
hermes model
# → Select "Custom endpoint (self-hosted / VLLM / etc.)"
# → Enter URL: http://localhost:11434/v1
# → Skip API key
# → Enter model name: qwen2.5-coder:32b

Or in ~/.hermes/config.yaml:

YAML
model:
  default: qwen2.5-coder:32b
  provider: custom
  base_url: http://localhost:11434/v1
  context_length: 32768

Critical Ollama Configuration:

Ollama picks its default context length based on available VRAM, and the default on smaller GPUs is too low for agent use:

| Available VRAM | Default Context |
| --- | --- |
| < 24 GB | 4,096 tokens |
| 24–48 GB | 32,768 tokens |
| 48+ GB | 256,000 tokens |

For agent use with tools, you need at least 16k–32k context.

At 4k, the system prompt + tool schemas alone can fill the window.

Configure server-side:

Shell
# Option 1: Environment variable
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
# Option 2: Systemd
sudo systemctl edit ollama.service
# Add: Environment="OLLAMA_CONTEXT_LENGTH=32768"
sudo systemctl daemon-reload && sudo systemctl restart ollama
# Option 3: Custom Modelfile
echo -e "FROM qwen2.5-coder:32b\nPARAMETER num_ctx 32768" > Modelfile
ollama create qwen2.5-coder-32k -f Modelfile

Verify with ollama ps and check the CONTEXT column.

vLLM (Production GPU Serving)

vLLM is the standard for high-throughput production inference with continuous batching.

Shell
pip install vllm
vllm serve meta-llama/Llama-3.1-70B-Instruct \
--port 8000 \
--max-model-len 65536 \
--tensor-parallel-size 2 \
--enable-auto-tool-choice \
--tool-call-parser hermes

Hermes config:

YAML
model:
  default: meta-llama/Llama-3.1-70B-Instruct
  provider: custom
  base_url: http://localhost:8000/v1
  context_length: 65536

Tool-calling requires explicit flags:

  • --enable-auto-tool-choice: Required for tool_choice: "auto"
  • --tool-call-parser <name>: Must match model format (hermes, llama3_json, mistral, deepseek_v3, etc.)

Without these flags, tool calls appear as plain text and never execute.

llama.cpp Server

For GGUF enthusiasts and CPU inference:

Shell
llama-server \
-m ./models/Qwen3-8B-Q4_K_M.gguf \
--port 8080 \
--ctx-size 32768 \
--jinja # Required for tool calling

Hermes config:

YAML
model:
  default: qwen3-8b-q4-km
  provider: custom
  base_url: http://localhost:8080/v1
  context_length: 32768

Multi-GPU Deployment

For vLLM multi-GPU setups, use tensor parallelism:

Shell
vllm serve Qwen/Qwen3-32B \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.95 \
  --max-model-len 128k \
  --enable-auto-tool-choice \
  --tool-call-parser hermes

VRAM Optimization

| Model | Quantization | VRAM Required | Context | Use Case |
| --- | --- | --- | --- | --- |
| Qwen3-8B | Q4_K_M | ~5 GB | 128K | Budget/small GPU |
| Qwen3-30B-A3B (MoE) | Q4_K_M | ~17 GB | 128K | Best overall balance |
| Qwen2.5-Coder-32B | Q4_K_M | ~20 GB | 128K | Coding-focused |
| Llama-3.1-8B | Q4_K_M | ~5 GB | 128K | Lightweight, proven |
| Gemma 4 | Q4 | ~16 GB | 128K | Reasoning + code |

On Apple Silicon, Metal GPU acceleration via Ollama delivers 50–80 tokens/second on 7B models — sufficient for interactive use.

WSL2 Networking (Windows Users)

If running Hermes in WSL2 and Ollama on Windows host, localhost won’t work in NAT mode. Set OLLAMA_HOST=0.0.0.0 on Windows and use the host IP from WSL2. In mirrored mode, localhost maps directly. Add a Windows Firewall rule if needed:

PowerShell
# Admin PowerShell
New-NetFirewallRule -DisplayName "Allow WSL2 to Ollama" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 11434


Instant Hermes Agent Workflows for AI Engineers


Hermes Agent’s value proposition for LLMOps teams lies in its ability to automate operational workflows while accumulating institutional knowledge.

Below are some production-tested workflow patterns.

Workflow 1: Autonomous Code Review

Configure Hermes with repository access and enable the github toolset. The agent can:

  • Check out PR branches
  • Run linting and type checking
  • Analyze diff context
  • Comment on issues
  • Create follow-up tickets

YAML
# ~/.hermes/config.yaml snippet
toolsets:
  - github
  - terminal
skills:
  - github-code-review
terminal:
  backend: docker
  container_persistent: true

The github-code-review skill (available in the skills hub) encodes review heuristics. After each review, Hermes refines the skill based on developer feedback signals.

Workflow 2: AI SOC Monitoring

Integrate Hermes with log aggregation and alerting systems:

YAML
# cron job definition in ~/.hermes/cron/
- schedule: "0 */4 * * *"
  description: "Check security logs for anomalies"
  command: |
    Analyze the last 4 hours of auth.log for brute-force patterns.
    If anomalies found > 10 attempts from single IP, create incident report.

Hermes uses natural language cron — you describe the schedule in English, and the agent translates it to cron syntax.
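
The check described in that job (more than 10 failed attempts from a single IP) is easy to prototype outside the agent, which helps when validating what the agent reports. The sketch below assumes the common OpenSSH "Failed password ... from <ip>" log line format and leaves out the four-hour time window. The function name and regex are illustrative, not part of Hermes.

```python
import re
from collections import Counter

# Matches typical sshd lines such as:
#   "Failed password for invalid user admin from 198.51.100.2 port 22 ssh2"
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)


def brute_force_suspects(log_lines, threshold: int = 10) -> dict[str, int]:
    """Return IPs with more than `threshold` failed logins, with their counts."""
    counts: Counter = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n > threshold}
```

A deterministic helper like this can also be wrapped as a tool so the agent interprets the result instead of parsing raw logs itself.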

Workflow 3: Kubernetes Monitoring

Markdown
# Custom skill snippet for K8s health checks
# ~/.hermes/skills/k8s-health/SKILL.md
---
name: k8s-health
description: Check Kubernetes cluster health
required_environment_variables:
  - name: KUBECONFIG
    prompt: Path to kubeconfig
---
## Procedure
1. Run `kubectl get nodes --kubeconfig $KUBECONFIG`
2. Check for NotReady nodes
3. Run `kubectl top nodes` for resource pressure
4. Check pod restart counts in kube-system
5. Report findings in structured markdown

Workflow 4: RAG Orchestration

Hermes can orchestrate RAG pipelines by combining the web_search, browser, and file tools with document ingestion:

Shell
# Ingest documentation
hermes -c "Clone the repo at https://github.com/org/docs.git,
index all markdown files, and create a searchable
knowledge base skill called 'internal-docs'."

The resulting skill contains embedded retrieval logic and can be queried in future sessions.

Workflow 5: Multi-Agent Research Pipeline

Use delegate_task to parallelize research:

Shell
hermes -c "I need a competitive analysis of 5 vector databases.
Delegate one subagent per database (Pinecone, Weaviate,
Milvus, Chroma, Qdrant). Each should evaluate:
performance, cost, scalability, and ecosystem.
Synthesize results when complete."

Each subagent runs in isolation with no shared conversation history, preventing cross-contamination.

Workflow 6: Incident Response

Markdown
# Incident response workflow
# ~/.hermes/skills/incident-response/SKILL.md
---
name: incident-response
description: Automated incident triage
---
## Procedure
1. Receive alert payload (service name, error rate, latency p99)
2. Query logs for error patterns via `terminal`
3. Check recent deployments via `github` or `gitlab` tools
4. Correlate metrics time window with deploy times
5. Generate hypothesis and confidence score
6. If confidence > 0.8, execute rollback via `terminal`
7. Document incident timeline in `MEMORY.md`


Security Vulnerabilities and Risks in Hermes Agent


Any framework that grants LLMs access to terminals, APIs, and persistent memory carries inherent risk.

Hermes Agent’s security model is more conservative than OpenClaw’s, but it is not immune to vulnerabilities.

Disclosed CVEs (as of May 2026)

| CVE | Component | CVSS | Description |
| --- | --- | --- | --- |
| CVE-2026-7396 | WeCom platform adapter (gateway/platforms/wecom.py) | 5.3 | Path traversal in _load_outbound_media via file:// URLs: ../ sequences not sanitized, allowing arbitrary file read |
| CVE-2026-7112 | API server (gateway/platforms/api_server.py) | Medium | Improper authentication in _check_auth function; attack complexity is high, but the exploit is publicly disclosed |
| CVE-2026-7397 | Filesystem operations | Medium | Improper link resolution before file access (symlink following) |

These were disclosed in April 2026 against version 0.8.0.

The path traversal in WeCom is particularly relevant for gateway deployments — any message containing a malicious file:// URL could trigger arbitrary file reads.

The fix involved path normalization and traversal sequence checking.
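
The general shape of such a fix is shown below. This is not the actual Hermes patch: the function name and the idea of a single allowed media root are assumptions. It illustrates the standard defense of resolving the path first and then checking that it still lives under an allowed directory. It needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def resolve_media_path(url: str, allowed_root: Path) -> Path:
    """Turn a file:// URL into a path, rejecting anything outside allowed_root."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    root = allowed_root.resolve()
    # resolve() collapses ../ sequences and follows symlinks before the check.
    candidate = Path(unquote(parsed.path)).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{candidate} is outside {root}")
    return candidate
```

Checking the resolved path, rather than scanning the raw string for `../`, also covers percent-encoded sequences and symlinks that point outside the root.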

Threat Model

Prompt Injection:

  • Malicious content in web pages, emails, or documents can hijack the agent’s reasoning.
  • Hermes mitigates this through context compression and system prompt hardening, but no prompt injection defense is perfect.

Tool Abuse: An LLM with terminal access can execute destructive commands. Hermes implements layered defense:

  1. Command Approval: Dangerous commands require explicit user approval.
  2. Tirith Scanning: Pre-execution scanning of terminal commands for dangerous patterns.
  3. Backend Isolation: Running commands in Docker, Modal, or SSH containers rather than the host.

Memory Poisoning:

  • Because skills are self-generated, a compromised agent could poison its own skill library with malicious procedures.
  • The bounded memory model limits the blast radius, but skills should be audited periodically.

API Key Leakage:

  • Skills that declare required_environment_variables receive automatic passthrough to execution environments.
  • If a malicious skill declares a common secret name, it could exfiltrate credentials.
  • Hermes strips sensitive env vars by default and only passes through explicitly declared variables.
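
The passthrough rule can be pictured as a plain allowlist filter, plus a review step for declarations that look like secrets. Both functions and the marker list below are hypothetical and only illustrate the principle described above.

```python
SENSITIVE_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")  # illustrative heuristic


def build_sandbox_env(host_env: dict[str, str], declared: list[str]) -> dict[str, str]:
    """Forward only variables a skill explicitly declared; drop everything else."""
    return {name: host_env[name] for name in declared if name in host_env}


def flag_suspicious_declarations(declared: list[str]) -> list[str]:
    """List declared names that look like credentials and deserve human review."""
    return [
        name for name in declared
        if any(marker in name.upper() for marker in SENSITIVE_MARKERS)
    ]
```

An allowlist fails closed: a variable nobody declared never reaches the sandbox, and a skill that suddenly declares an API key name is easy to spot in review.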

Autonomous Escalation:

  • The skill_manage tool allows the agent to modify its own procedures.
  • If an attacker influences this loop, the agent could grant itself additional capabilities over time.

Supply Chain:

  • Unlike OpenClaw, Hermes has no centralized skill marketplace.
  • Skills are self-generated or installed from curated directories.
  • This eliminates the ClawHub malware vector but shifts risk to the quality of self-generated skills.

Hardening Recommendations

YAML
# Production-hardened config.yaml
terminal:
  backend: docker # Never run on host in production
  container_persistent: false # Ephemeral containers
  docker_forward_env: [] # Explicit env var allowlist
gateway:
  allowlist_users: true # Never use GATEWAY_ALLOW_ALL_USERS=true
  dm_pairing: true # Use pairing codes, not hardcoded IDs
security:
  command_allowlist: # Explicit allowlist approach
    - git
    - kubectl
    - docker
    - python
  dangerous_command_detection: true
messaging:
  cwd: /workspace # Restrict to non-sensitive directory

Container Isolation

For maximum security, use Docker or Modal backends with ephemeral containers:

Shell
# Docker sandbox with dropped capabilities
docker run -d \
--read-only \
--tmpfs /tmp:noexec,nosuid,size=100m \
--cap-drop=ALL \
--security-opt=no-new-privileges \
nousresearch/hermes-agent

For advanced isolation, community members have demonstrated gVisor integration via the --runtime runsc Docker flag, creating per-execution sandboxed containers.


Production Deployment Best Practices


Deploying Hermes Agent in production requires treating it as a stateful service with persistent storage, not a stateless function.

Kubernetes Deployment

While Nous Research does not provide an official Helm chart, the community maintains one with production-ready defaults:

Shell
helm install hermes ./hermes-agent-helm-chart \
--namespace hermes \
--create-namespace \
--set secrets.OPENROUTER_API_KEY=sk-or-... \
--set config.values.model.default=anthropic/claude-opus-4.6

Key chart features:

  • replicaCount: 1 with strategy.type: Recreate enforced when persistence is enabled (prevents unsafe shared volume access)
  • Optional Service, Ingress, Istio VirtualService, RBAC, NetworkPolicy, and PDB
  • Composable secrets via secrets.existingSecret or chart-managed Secrets
  • Tenant-scoped operation support

For enterprise deployments, HiClaw provides a Kubernetes-native controller that runs Hermes Agent as a first-class Worker runtime within a multi-agent cluster architecture, complete with Leader Election, PVC persistence, and per-worker RBAC.

Docker Swarm

YAML
# docker-compose.prod.yaml
version: "3.8"
services:
  hermes:
    image: nousresearch/hermes-agent:latest
    deploy:
      replicas: 1
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
    volumes:
      - hermes-data:/opt/data
    environment:
      - HERMES_GATEWAY=true
      - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
    networks:
      - hermes-net
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8642/health"]
      interval: 30s
      timeout: 10s
      retries: 3
volumes:
  hermes-data:
    driver: local
# The service references hermes-net, so it must be declared at the top level
networks:
  hermes-net:
    driver: overlay

GPU Orchestration

For local model serving alongside Hermes, co-locate vLLM or Ollama in the same pod/network:

YAML
# Kubernetes sidecar pattern
spec:
  containers:
    - name: hermes
      image: nousresearch/hermes-agent:latest
      env:
        - name: HERMES_MODEL_BASE_URL
          value: "http://localhost:8000/v1"
    - name: vllm
      image: vllm/vllm-openai:latest
      resources:
        limits:
          nvidia.com/gpu: "2"
      args:
        - --model
        - Qwen/Qwen3-32B
        - --tensor-parallel-size
        - "2"

Observability

Hermes logs to ~/.hermes/logs/ with structured output. For production:

YAML
# Docker json-file log rotation (the output can then be scraped by Promtail/Loki)
logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "3"

Key log locations:

  • ~/.hermes/logs/gateway.log — Gateway API and messaging events
  • ~/.hermes/logs/agent.log — Agent loop and tool execution
  • ~/.hermes/logs/rl_training/ — RL training run logs (Atropos, Tinker, environment)

CI/CD Integration

YAML
# .github/workflows/hermes-deploy.yaml
name: Deploy Hermes Agent
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to VPS
        run: |
          ssh ${{ secrets.VPS_USER }}@${{ secrets.VPS_HOST }} \
            "cd /opt/hermes && docker compose pull && docker compose up -d"
      - name: Health check
        run: |
          curl -f https://hermes.example.com/health || exit 1


Advanced Optimization Techniques


Hermes Agent provides multiple knobs for optimizing performance, cost, and latency.

Context Optimization

The bounded memory model is not just a security feature — it is an optimization.

By forcing the agent to consolidate memory into 2,200 characters (~800 tokens), Hermes keeps the system prompt lean.

For comparison, unbounded memory systems can balloon to 8k+ tokens of irrelevant historical context.

Use context_compression: 0.5 to trigger summarization at 50% of the context limit, and set explicit context_length in config.yaml to prevent wasteful probing:

YAML
model:
  default: qwen3.5:27b
  base_url: http://localhost:11434/v1
  context_length: 32768

GPU Batching (vLLM)

For multi-user gateway deployments, vLLM’s continuous batching dramatically improves throughput:

Shell
vllm serve Qwen/Qwen3-32B \
--max-num-seqs 256 \
--gpu-memory-utilization 0.95 \
--enable-prefix-caching

Prefix caching reuses KV cache for shared system prompts across concurrent Hermes sessions.

Speculative Decoding

vLLM supports speculative decoding for latency reduction:

Shell
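# Recent vLLM releases take the draft model via --speculative-config;
# Qwen3's smallest checkpoint is 0.6B, so that is used as the draft model here.
vllm serve Qwen/Qwen3-32B \
  --speculative-config '{"model": "Qwen/Qwen3-0.6B", "num_speculative_tokens": 5}'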

Multi-Agent Scaling

Use Profiles to run multiple isolated Hermes instances, each with its own memory, skills, and gateway:

Shell
hermes --profile researcher
hermes --profile ops-engineer

Each profile maintains separate ~/.hermes/profiles/<name>/ directories. For Kubernetes, deploy one Helm release per tenant.

Token Optimization

  1. Progressive skill disclosure: Skills load at three levels (title, summary, full). Only highly relevant skills expand to full content.
  2. Tool result budgeting: Three-layer truncation prevents a single ls -R / from consuming the entire context window.
  3. Session reset policies: Configure auto-reset based on inactivity or time to prevent unbounded context growth:

YAML
session:
  reset_policy:
    - inactivity_minutes: 1440
    - daily_at: "04:00"


The Future of Hermes Agent and Autonomous AI Infrastructure


Hermes Agent sits at the intersection of three converging trends:

  1. Local-first AI infrastructure
  2. Autonomous agent runtimes
  3. RL-native tooling

Multi-Agent Ecosystems

The agentskills.io standard and Open Gateway Protocol (OGP) federation layer are enabling cross-framework agent communication.

An OpenClaw agent and a Hermes agent can already exchange signed, cryptographically-verified messages without knowing what runtime the peer is using.

This suggests a future where specialized agents on different frameworks collaborate on shared projects.

Self-Healing AI Workflows

The combination of skill self-improvement, bounded memory, and RL training creates the foundation for self-healing workflows.

An agent that detects its own skills failing can generate evaluation datasets, run GEPA (Genetic-Pareto Prompt Evolution) optimization, and produce measurably better variants — all via API calls without GPU training.

Edge AI and Distributed Systems

Hermes’s support for local inference (Ollama, llama.cpp, vLLM) and lightweight profiles makes it viable for edge deployment.

The HiClaw Kubernetes integration demonstrates how Hermes Workers can participate in multi-agent teams alongside other runtimes, with cross-runtime message delivery and unattended autonomous execution.

AI-Native DevOps

The framework’s trajectory export, batch processing, and Atropos RL integration position Hermes not just as an automation tool, but as a data generation pipeline for the next generation of agent models.

For AI teams, this means the operational agent and the training infrastructure are the same system.

The framework is young — 10 releases as of early 2026 versus OpenClaw’s 82 — and its ecosystem is smaller.

But for teams building repetitive, structured workflows where agent improvement creates measurable value over time, Hermes Agent offers capabilities that flat execution frameworks cannot deliver.

Why Hermes Agent Is Better Than OpenClaw

The question is not which framework has more GitHub stars or which community is louder.

The question is which architecture solves the problems that actually matter for production AI infrastructure in 2026.

After deploying both frameworks across multiple environments — from $5 VPS instances to multi-GPU Kubernetes clusters — the engineering case for Hermes Agent over OpenClaw rests on five structural advantages that compound over time:

  1. Architectural philosophy
  2. Memory efficiency
  3. Self-improving skill systems
  4. Operational economics
  5. Security posture

The Agent-First Architecture Wins for Automation

Hermes is agent-first, while OpenClaw is gateway-first, and that choice determines every downstream engineering decision.

In OpenClaw, adding capability means adding channels, agents, or skills to the hub.

In Hermes, capability grows concentrically from the agent’s own execution history.

For AI engineers building automation that must improve over months, the agent-first model is the only one that compounds value.

The gateway-first model compounds complexity.

Practical evidence:

  • After 10–20 similar tasks, Hermes skill refinement improves execution speed by 2–3x.
  • In one benchmark, Hermes completed a research task 40% faster than on its first run, using skills it had created from previous executions.

OpenClaw cannot do this because its skills are static — they execute the same way on day 300 as they did on day 1 unless a human manually updates them.


Memory Architecture: Structured vs. Bloated

Memory is where the two frameworks diverge most dramatically.

OpenClaw relies heavily on the LLM’s context window for memory retention, appending messages to JSONL log files and feeding entire conversation histories back into the model during recall.

This approach is simple to implement, but response time degrades and token spend climbs as sessions grow, because every recall re-sends the accumulated history.

This is central to the argument: the two approaches have very different operating costs.

A controlled benchmark by Regolo AI measured identical workloads on both frameworks using the same model backend.

The results were stark:

Table

Metric            OpenClaw         Hermes Agent
RSS Memory Δ      0.00 MB          -2.75 MB
Disk Usage Δ      213.41 KB        0.00 KB
Recall Latency    19,593.32 ms     113.14 ms

OpenClaw took nearly 20 seconds to recall a simple fact from an active session because it had to feed the entire history back into the LLM context window.

Hermes recalled the same data in 113 milliseconds by querying its SQLite database with FTS5 full-text search.
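
For readers unfamiliar with FTS5, the sketch below shows the general technique using Python's built-in sqlite3 module. The table name, columns, sample rows, and the recall helper are hypothetical; Hermes's real schema may differ, and the snippet assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Hypothetical schema; Hermes's real tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id, role, content)")
conn.executemany(
    "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
    [
        ("s1", "user", "The staging database password rotates every Friday"),
        ("s1", "assistant", "Noted, I will remind you before the rotation"),
        ("s2", "user", "Draft the quarterly report for the ops team"),
    ],
)


def recall(query, limit=5):
    """Return matching rows, best match first (FTS5 rank sorts ascending)."""
    cursor = conn.execute(
        "SELECT session_id, content FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cursor.fetchall()


print(recall("quarterly"))
```

The lookup is an index query rather than an LLM call, which is why this style of recall can stay fast regardless of how long the conversation history has become.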

Hermes implements a four-layer memory architecture that is both bounded and intelligent:

Layer 1 — Prompt Memory (Hot):

  • MEMORY.md (~2,200 characters, ~800 tokens) and USER.md (~1,375 characters, ~500 tokens) are loaded as frozen snapshots into the system prompt at session start.
  • These hard limits force the agent to prioritize and consolidate rather than bloat.
  • Updates are persisted immediately but only appear in the next session, keeping the prefix cache stable for prompt caching optimizations.
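
The behavior described for Layer 1 (a hard character budget, immediate persistence, and a snapshot that only refreshes at session start) can be sketched as follows. The class and method names are invented for illustration; only the 2,200-character figure comes from the text above.

```python
MEMORY_LIMIT = 2200  # character budget quoted above for MEMORY.md


class BoundedMemory:
    """Hypothetical store: writes persist now, the prompt sees a frozen copy."""

    def __init__(self, text="", limit=MEMORY_LIMIT):
        self.limit = limit
        self.persisted = text          # stands in for the file on disk
        self.snapshot = text           # what the current session's prompt sees

    def update(self, new_text):
        # Reject oversized content so the caller must consolidate first.
        if len(new_text) > self.limit:
            raise ValueError(f"memory exceeds {self.limit} characters")
        self.persisted = new_text      # saved immediately
        # self.snapshot is left untouched to keep the prompt prefix stable.

    def start_session(self):
        # A new session picks up the latest persisted content.
        self.snapshot = self.persisted
        return self.snapshot


memory = BoundedMemory("prefers tables")
memory.update("prefers tables; timezone UTC")
print(memory.snapshot)         # still the old text during this session
print(memory.start_session())  # the new text appears in the next session
```

Keeping the snapshot frozen for the whole session is what lets the system prompt stay byte-identical, which in turn lets prompt caching keep working.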

Layer 2 — Session Archive (Cold Recall):

  • All sessions are stored in SQLite with FTS5 full-text search.
  • The session_search tool enables episodic recall across weeks of conversation history.
  • Results are summarized by a configurable LLM call.
  • This is on-demand memory: it only consumes tokens when explicitly queried.

Layer 3 — Skills (Procedural Memory):

  • Self-generated skills capture reusable workflows, edge cases, and verification steps.
  • These are not static documentation; they are living documents that the agent refines based on execution feedback.

Layer 4 — External Providers (Optional):

  • Pluggable integrations with Hindsight, Honcho, Mem0, OpenViking, Holographic, RetainDB, and ByteRover enable advanced semantic search, knowledge graphs, and entity resolution for teams that need structured recall beyond the built-in layers.

OpenClaw’s memory model is richer in layers but prone to context bloat.

In practice, OpenClaw can pull irrelevant context from days-old conversations into current tasks — a Telegram thread about one client contaminating an email draft for another.

Hermes’s tiered retrieval — core memory first, then session search, then deeper vector search — is more disciplined and produces sharper results in repeated workflows.
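
A minimal sketch of that tiered order, assuming each tier can be modeled as a lookup function that returns a list of hits. The dictionaries below are stand-ins with made-up entries; real tiers would read MEMORY.md, the SQLite archive, and an external vector store.

```python
def tiered_recall(query, tiers):
    """Try each (name, lookup) tier in order; stop at the first hit."""
    for name, lookup in tiers:
        hits = lookup(query)
        if hits:
            return name, hits
    return None, []


# Stand-in data for illustration only.
core_memory = {"timezone": ["User works in UTC+2"]}
session_archive = {"invoice": ["Agreed net-30 invoice terms"]}

tiers = [
    ("core", lambda q: core_memory.get(q, [])),
    ("session_search", lambda q: session_archive.get(q, [])),
    ("vector_search", lambda q: []),   # placeholder for an external provider
]

print(tiered_recall("invoice", tiers))
```

Because the function returns at the first tier that produces hits, the more expensive tiers are only consulted when the cheaper ones come back empty.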


The Self-Improving Learning Loop

This is Hermes Agent’s defining feature and the biggest reason developers are migrating from OpenClaw.

The difference is not incremental — it is categorical.

OpenClaw skills are static SKILL.md files with YAML frontmatter and natural-language instructions.

You write them, version them with Git, and share them via ClawHub.

Workspace skills take precedence over global skills, giving fine-grained control.

But they are fundamentally inert — they execute the same instructions every time until a human edits them.

Hermes skills follow the agentskills.io open standard and can be auto-generated.

When Hermes completes a complex task, it abstracts the successful pattern into a reusable skill document that captures the exact methodology, logic, tools used, and edge cases encountered.

The next time a similar task appears, the agent references and refines that skill. If it finds a more efficient approach, it patches the skill file in real-time.

The learning loop runs every 15 tasks.

The agent evaluates its own performance, analyzing both successes and failures, extracting what worked, and updating its knowledge.
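
One way to picture a reflection pass that fires every 15 tasks is the sketch below. The class, its fields, and the counting-only "analysis" are hypothetical simplifications; a real agent would have the LLM examine execution traces instead.

```python
REFLECT_EVERY = 15  # interval stated above


class ReflectionLoop:
    """Hypothetical scheduler: collect task outcomes, review them in batches."""

    def __init__(self, interval=REFLECT_EVERY):
        self.interval = interval
        self.pending = []      # outcomes since the last reflection
        self.lessons = []      # one summary per reflection pass

    def record(self, task, succeeded):
        self.pending.append((task, succeeded))
        if len(self.pending) >= self.interval:
            self.reflect()

    def reflect(self):
        # A real agent would analyze traces with an LLM; this only counts.
        wins = sum(1 for _, ok in self.pending if ok)
        self.lessons.append({"tasks": len(self.pending), "successes": wins})
        self.pending.clear()


loop = ReflectionLoop()
for i in range(31):
    loop.record(f"task-{i}", succeeded=(i % 3 != 0))
print(len(loop.lessons), len(loop.pending))
```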

This is not a marketing feature — it is a core architectural mechanism that changes the relationship between operator and agent.

The practical impact is measurable.

In developer surveys, 30% of active developers who migrated from OpenClaw cited “maintenance fatigue” — the burden of manually updating and debugging community-written plugins — as their primary motivation for switching.

With Hermes, the agent maintains its own procedures.

The operator trains the system; the system maintains itself.


Operational Economics and Deployment Efficiency

Hermes Agent is built lighter.

Stateless-by-default sub-agents and disk-first memory mean you can deploy it on a $5 VPS and forget about it.

OpenClaw’s persistent-agent architecture assumes a long-running process with rich in-memory state — which is harder to checkpoint cleanly to remote infrastructure and more fragile when the host machine restarts.

This makes Hermes fundamentally better for:

  • Daily briefs and scheduled research
  • Recurring content pipelines
  • Monitoring jobs and background data collection
  • Report generation and cron-based automation
  • VPS-friendly deployments where cost efficiency matters

Hermes also wins on model flexibility.

It is more comfortable with open models and aggregators like OpenRouter, enabling per-skill routing — cheap models for summarization and classification, expensive models for reasoning steps.

In Hermes, this is a config file change.

In OpenClaw, the same change requires touching multiple agent definitions and fighting the framework.
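
The routing idea itself is simple enough to sketch in a few lines of Python. The task types and model IDs below are placeholders rather than Hermes configuration keys or real model names.

```python
# Hypothetical routing table; the model IDs are placeholders.
ROUTES = {
    "summarize": "provider/small-model",
    "classify": "provider/small-model",
    "reason": "provider/large-model",
}
DEFAULT_MODEL = "provider/large-model"


def pick_model(task_type, routes=ROUTES, default=DEFAULT_MODEL):
    """Return the model ID configured for a task type, else the default."""
    return routes.get(task_type, default)


for task in ("summarize", "reason", "translate"):
    print(task, "->", pick_model(task))
```

Keeping the mapping in one table is what makes the change a single edit: swapping the cheap model touches one line rather than every skill that uses it.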

The setup time reflects this philosophy.

OpenClaw’s Docker Compose gets you running in under 30 minutes with a substantial default toolset.

Hermes takes 2–4 hours for full local setup with memory and tools configured, but the result is a system that requires less ongoing maintenance because it learns rather than being manually maintained.


Security Posture: Architecture as Defense

The security comparison between Hermes Agent and OpenClaw is uncomfortable for OpenClaw advocates because the numbers are not close.

As of May 2026, OpenClaw has accumulated 138 disclosed CVEs in 63 days, including 7 critical (CVSS above 9.0) and 49 high severity.

The most destructive was CVE-2026-25253 — a zero-click remote code execution vulnerability with CVSS 8.8 that allowed attackers to steal authentication tokens through WebSocket gateway hijacking.

Shodan data showed over 42,000 publicly exposed OpenClaw instances, 63% with gateway authentication disabled.

Hermes Agent, launched in February 2026, had three CVEs disclosed in April 2026 against version 0.8.0: CVE-2026-7396 (path traversal in WeCom adapter, CVSS 5.3), CVE-2026-7112 (authentication issue in API server), and CVE-2026-7397 (symlink following in file tools, CVSS 4.4).

All three are medium-to-low severity and require specific conditions to exploit.

The root cause of OpenClaw’s security crisis is architectural.

OpenClaw was designed as a consumer-friendly local tool that grew into a networked agent.

Many of its security assumptions were reasonable for a personal tool but dangerous at scale.

Hermes was designed with container hardening, namespace isolation for subagents, and credential rotation from the start.

Its skills are self-generated rather than downloaded from a community marketplace, eliminating the ClawHub supply-chain attack vector entirely.

The ClawHub marketplace grew to 13,000+ community skills, but a Koi Security audit of 2,857 entries found 341 malicious skills — roughly a 12% malware rate.

Hermes sidesteps this vector because there is no centralized marketplace.

Skills are generated from the agent’s own execution traces, which makes supply-chain attacks far harder to mount.


The Verdict for Production

Choosing between Hermes Agent and OpenClaw is not about picking a winner in a popularity contest.

It is about selecting the architecture that matches your operational reality.

Choose OpenClaw when:

  • You need multi-channel agent orchestration across 22+ platforms
  • You require persistent agent teams with cross-session state sharing
  • You want immediate access to 5,700+ community skills
  • Your problem is routing and control, not learning and improvement
  • You need a mature ecosystem with extensive documentation and community support

Choose Hermes Agent when:

  • You are building automation that must improve over months without manual maintenance
  • You need lean, search-first memory that does not bloat context windows
  • You want self-generated skills that capture institutional knowledge automatically
  • You are deploying on lightweight infrastructure ($5 VPS, edge devices)
  • You prioritize security through architectural restraint over ecosystem breadth
  • You need RL-native infrastructure for fine-tuning domain-specific models

The fundamental difference is philosophical.

OpenClaw is a tool you configure.

Hermes Agent is a teammate that learns.

OpenClaw stays the same while you use it.

Hermes gets better incrementally, through its self-improving learning loop.

For AI engineers building the next generation of autonomous infrastructure, the choice is clear: if your problem is coordination, OpenClaw is the better control plane.

If your problem is always-on automation that compounds capability over time, Hermes Agent is the superior architectural bet.


References

NightCafe Studio was used to generate all the images in this article.

Kimi K2.6 was used in the first draft of this article.
