Unblockable web access for your AI agent

An AI agent without internet is useless for anything time-sensitive. It can’t look things up, can’t verify claims, and confidently tells you outdated nonsense. I needed my agents to actually browse the web.

The problem: the web fights back. Cloudflare, Akamai, PerimeterX, Datadome — most major sites now block headless browsers, datacenter IPs, and anything that smells automated. Your agent’s web_extract call returns a 403 and now it’s guessing.

I run five Hermes Agent instances on my homelab. Over the past few months I’ve built a web access stack that handles every bot detector I’ve thrown at it.

The stack

Three tools, one proxy chain:

web_search goes through Firecrawl to SearXNG (self-hosted, multi-engine)
web_extract goes through Firecrawl
browser_* goes through Camofox (anti-detection Firefox fork)

SearXNG isn’t a separate tool Hermes calls directly. Firecrawl owns both search and extraction, and SearXNG is its search backend.

All three paths share the same proxy chain:

Privoxy (:8118) bridges HTTP to SOCKS5
SSH SOCKS5 tunnel (:1080) connects to a residential network
Raspberry Pi on a home connection, exit IP x.x.x.x

Every request leaves through a residential IP. Bot detectors see a real user on a home network.

1. SearXNG — Firecrawl’s search backend

SearXNG is a meta-search engine that queries Google, Bing, DuckDuckGo, Brave, and others without tracking you. In my setup it’s not standalone — it’s the engine behind Firecrawl’s search API. When Hermes calls web_search, Firecrawl routes the query to SearXNG.

Why not DuckDuckGo?

Hermes ships with DuckDuckGo as the default search backend. It works, but:

DDG rate-limits hard. A busy agent hits limits within hours.
Results are often thinner than Google or Bing.
You can’t pick which engines to query.
No control over result format.

Firecrawl swaps DDG for SearXNG. You get multi-engine aggregation, no single engine's rate limit to run into, and full control over which engines are queried and how results are formatted. The agent doesn’t know the difference.

Deployment

SearXNG runs inside the Firecrawl Docker Compose stack (section 2 below), not as its own service. The default Firecrawl compose doesn’t include SearXNG, so you add these services to the same docker-compose.yml:

services:
  searxng-core:
    image: searxng/searxng:latest
    restart: unless-stopped
    volumes:
      - ./settings.yml:/etc/searxng/settings.yml
      - ./limiter.toml:/etc/searxng/limiter.toml
    networks:
      - backend

  searxng:
    image: nginx:alpine
    restart: unless-stopped
    ports:
      - "8888:8080"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf
    depends_on:
      - searxng-core
    networks:
      - backend

The nginx proxy in front is critical. More on that in the pitfalls.

Key configuration (`settings.yml`)

search:
  formats:
    - html
    - json    # <- without this, the API returns 403

server:
  secret_key: "your-random-secret-here"
  limiter: true

outgoing:
  proxies:
    all://:
      - http://172.17.0.1:8118    # Privoxy on the host; 127.0.0.1 here would be the container itself

Pitfalls I hit

JSON format returns 403. SearXNG defaults to HTML-only. If you try the JSON API (/search?q=test&format=json) without adding json to search.formats, you get a 403 Forbidden.
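Once json is listed in search.formats, a quick check from Python might look like the sketch below. It assumes the nginx front end is published on localhost:8888 as in the compose snippet above. The helper names are made up for illustration, and the results/title/url fields are the ones SearXNG's JSON output normally carries.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARXNG_URL = "http://localhost:8888"  # assumption: the nginx port from the compose file


def build_search_url(query: str, base_url: str = SEARXNG_URL) -> str:
    """Build the JSON API URL (/search?q=...&format=json)."""
    return f"{base_url}/search?{urlencode({'q': query, 'format': 'json'})}"


def parse_results(payload: str) -> list[tuple[str, str]]:
    """Pull (title, url) pairs out of a SearXNG JSON response body."""
    data = json.loads(payload)
    return [(r.get("title", ""), r.get("url", "")) for r in data.get("results", [])]


def search(query: str) -> list[tuple[str, str]]:
    """Run one query; urlopen raises HTTPError (403) if json is not an enabled format."""
    with urlopen(build_search_url(query), timeout=15) as resp:
        return parse_results(resp.read().decode("utf-8"))
```

Calling search("test") against a running stack returns the aggregated hits. If it raises a 403 instead, the formats list is the first thing to check.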

Bot detection blocks API clients. Even with limiter: false, SearXNG blocks requests from curl, wget, and python-requests based on User-Agent strings. Fix: put nginx in front and rewrite the User-Agent:

server {
    listen 8080;
    location / {
        proxy_pass http://searxng-core:8080;
        proxy_set_header User-Agent "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36";
    }
}

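Docker networks need trust. With nginx in front, every request reaches SearXNG from a Docker bridge address (172.16.0.0/12). If you enable the rate limiter, list those ranges as trusted proxies so it reads the client IP from the forwarded headers instead of counting every request against the proxy's address. Add to limiter.toml:

[botdetection]
trusted_proxies = ["172.16.0.0/12", "192.168.0.0/16"]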
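To check whether a given container or gateway address is covered by those ranges, the standard library's ipaddress module is enough. This is a small sketch; is_trusted is an illustrative helper, not part of SearXNG.

```python
import ipaddress

TRUSTED = ["172.16.0.0/12", "192.168.0.0/16"]  # same ranges as in limiter.toml


def is_trusted(addr: str, trusted: list[str] = TRUSTED) -> bool:
    """True if addr falls inside any of the trusted proxy networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in ipaddress.ip_network(net) for net in trusted)


print(is_trusted("172.17.0.1"))  # Docker's default bridge gateway: True
print(is_trusted("172.32.0.1"))  # just past the end of the /12: False
```

The /12 spans 172.16.0.0 through 172.31.255.255, which is why one entry covers the default bridge and any compose-created networks allocated from that block.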

SearXNG doesn’t need its own Hermes config — Firecrawl handles it. The Hermes integration is in the Firecrawl section below.

2. Firecrawl — self-hosted web extraction

Firecrawl converts web pages to clean markdown. It handles JavaScript rendering, anti-bot bypassing, and content extraction. The cloud version costs money. Self-hosting it is free and gives you unlimited scraping.

Architecture

Five Docker containers:

Container	Role	Port
`firecrawl-api-1`	Main API + workers	3002
`firecrawl-playwright-service-1`	Headless browser for JS rendering	internal
`firecrawl-redis-1`	Caching + rate limiting	internal
`firecrawl-rabbitmq-1`	Job queue	internal
`firecrawl-nuq-postgres-1`	Job persistence + pg_cron	internal

Deployment

# docker-compose.yml (simplified)
services:
  api:
    image: ghcr.io/mendableai/firecrawl-api:latest
    ports:
      - "3002:3002"
    env_file: .env
    depends_on:
      - redis
      - rabbitmq
      - nuq-postgres
    restart: unless-stopped

  playwright-service:
    image: ghcr.io/mendableai/firecrawl-playwright-service:latest
    env_file: .env
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    restart: unless-stopped

  rabbitmq:
    image: rabbitmq:3-management-alpine
    restart: unless-stopped

  nuq-postgres:
    build: src/apps/nuq-postgres  # MUST build locally
    restart: unless-stopped

The nuq-postgres problem

Don’t use the GHCR pre-built image. It has a pg_cron config mismatch where cron.database_name doesn’t match the init script database. Crashes on startup with "can only create extension in database postgres".

Clone the Firecrawl repo so the src/apps/nuq-postgres directory is available for the compose build:

git clone --depth 1 https://github.com/mendableai/firecrawl.git /root/firecrawl/src
cd /root/firecrawl
docker compose build nuq-postgres

The compose file references build: src/apps/nuq-postgres relative to itself, so the repo needs to be cloned into the src/ subdirectory. The local build picks up the correct postgresql.conf.sample with cron.database_name = 'postgres'.

Proxy configuration

All Firecrawl traffic routes through the residential proxy. In .env:

# proxy for all outbound requests
HTTP_PROXY=http://172.17.0.1:8118
HTTPS_PROXY=http://172.17.0.1:8118

# job queue (mandatory)
NUQ_RABBITMQ_URL=amqp://rabbitmq:5672

# disable auth (self-hosted, no API key needed)
USE_DB_AUTHENTICATION=false

172.17.0.1 is the gateway of Docker's default bridge (docker0), which is the host as seen from a container. Privoxy listens there on :8118 and forwards through the SSH SOCKS5 tunnel to the residential IP.

Hermes integration

# ~/.hermes/.env
FIRECRAWL_API_URL=http://localhost:3002

# ~/.hermes/config.yaml
web:
  extract_backend: firecrawl
  search_backend: firecrawl

No API key needed. Self-hosted Firecrawl skips auth when USE_DB_AUTHENTICATION=false.
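To test the Firecrawl side without going through Hermes, you can post to the API directly. The sketch below assumes the v1 scrape endpoint (/v1/scrape) and its usual success/data/markdown response shape. Check the path against the Firecrawl version you built, since the function names here are illustrative.

```python
import json
from urllib.request import Request, urlopen

FIRECRAWL_API_URL = "http://localhost:3002"  # same value as in ~/.hermes/.env


def build_scrape_request(url: str, base_url: str = FIRECRAWL_API_URL) -> Request:
    """Build a POST to /v1/scrape asking for markdown (endpoint path is an assumption)."""
    body = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    return Request(
        f"{base_url}/v1/scrape",
        data=body,
        headers={"Content-Type": "application/json"},  # no Authorization header sent
        method="POST",
    )


def extract_markdown(response_body: str) -> str:
    """Return the markdown field, or raise if Firecrawl reported a failure."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError(f"scrape failed: {payload.get('error', 'unknown error')}")
    return payload["data"]["markdown"]


def scrape(url: str) -> str:
    """Fetch one page as markdown from the self-hosted instance."""
    with urlopen(build_scrape_request(url), timeout=60) as resp:
        return extract_markdown(resp.read().decode("utf-8"))
```

scrape("https://example.com/article") should come back as markdown. An authentication error here usually means USE_DB_AUTHENTICATION is not being picked up from .env.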

Resource usage

The limits look worse than reality. Actual numbers:

Container	CPU limit	RAM limit	Actual RAM
firecrawl-api	3.0	5G	~2.2G
playwright-service	2.0	3G	~192M
searxng (nginx)	—	128M	~8M
searxng-core	—	512M	~156M
redis	—	256M	~6M
rabbitmq	—	512M	~198M
nuq-postgres	—	512M	~104M
Total	5.0	~9.9G	~2.9G

Limits total ~9.9G but in practice it sits around 3G. If you’re tight on resources, the cloud Firecrawl API (500 pages/month free tier) is probably a better starting point.
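If you change a limit, the column totals are easy to re-add with a few lines of Python. The figures below are copied from the table rows; the sketch counts 1G as 1000M and treats a missing CPU limit as 0, both of which are simplifications.

```python
# name: (CPU limit, RAM limit in MB, actual RAM in MB); 1G counted as 1000M
CONTAINERS = {
    "firecrawl-api":      (3.0, 5000, 2200),
    "playwright-service": (2.0, 3000, 192),
    "searxng (nginx)":    (0.0, 128, 8),    # no CPU limit set, counted as 0
    "searxng-core":       (0.0, 512, 156),
    "redis":              (0.0, 256, 6),
    "rabbitmq":           (0.0, 512, 198),
    "nuq-postgres":       (0.0, 512, 104),
}


def totals(containers: dict) -> tuple[float, float, float]:
    """Return (CPU limit, RAM limit in GB, actual RAM in GB), RAM rounded to 0.1."""
    cpu = sum(c for c, _, _ in containers.values())
    limit_gb = sum(lim for _, lim, _ in containers.values()) / 1000
    actual_gb = sum(act for _, _, act in containers.values()) / 1000
    return cpu, round(limit_gb, 1), round(actual_gb, 1)


print(totals(CONTAINERS))
```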

3. Camofox — anti-detection browser

Camofox is a Firefox fork with C++-level fingerprint spoofing. It’s not a headless browser pretending to be real. It is a real browser with randomized fingerprints that make each session look like a different person.

Why not Playwright or Puppeteer?

Headless Chromium has tells:

navigator.webdriver is true
WebGL renderer shows “SwiftShader” (Google’s software renderer)
Canvas fingerprint is consistent across sessions
Audio context fingerprint is detectable
Plugins list is empty

Camofox randomizes all of these. With a residential IP, it’s indistinguishable from a real user.

Deployment

docker run -d \
  --name camofox-browser \
  --network host \
  --restart unless-stopped \
  -e CAMOFOX_PORT=9377 \
  -e PROXY_HOST=localhost \
  -e PROXY_PORT=8118 \
  ghcr.io/jo-inc/camofox-browser:latest

A few things worth noting:

--network host because Camofox needs direct access to localhost for Privoxy
Proxy is configured via ENV vars, not config.json
uBlock Origin comes built into the image
Health check: curl http://localhost:9377/health

Hermes integration

# ~/.hermes/.env
CAMOFOX_URL=http://localhost:9377

With CAMOFOX_URL set, Hermes routes all browser_* calls through Camofox instead of the default agent-browser:

Tool	Routes through	IP used
`browser_navigate`, `browser_snapshot`	Camofox -> Privoxy -> SOCKS5	Residential
`web_extract`	Firecrawl -> Privoxy -> SOCKS5	Residential
`web_search`	Firecrawl -> SearXNG -> Privoxy -> SOCKS5	Residential

What it beats

I’ve tested Camofox + residential IP against known aggressive bot detection:

Site	Protection	Result
ESPNcricinfo	Proprietary	Full page (was “Access Denied” without)
Ticketmaster	Akamai + Queue-It	Full page
Nike	Akamai	Full page
Instacart	Datadome	Full page
Amazon	Custom	Full page (was blocked on datacenter IP)
Cloudflare	JS challenge	Passes automatically

4. The residential proxy chain

This is the part everything else depends on. A residential IP makes bot detection mostly irrelevant. Datacenter IPs get flagged by default, but home IPs don’t.

You need a machine on a home network that stays online. A Raspberry Pi works (what I use), but so does an old laptop, a mini PC, or even a phone running Termux + Tailscale. The only requirement: it has a residential ISP connection and can hold an SSH tunnel open.

Architecture

Host VM (homelab, running Privoxy on :8118)
    |
    v
SSH SOCKS5 Tunnel (:1080)
    | encrypted, auto-reconnect via systemd
    v
Residential Machine (Raspberry Pi, home network)
    |
    v
Internet (x.x.x.x — residential ISP)

SSH tunnel service

# /etc/systemd/system/socks5-residential.service
[Unit]
Description=SSH SOCKS5 Residential Proxy
After=network.target

[Service]
Type=simple
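# No -f: with Type=simple, ssh has to stay in the foreground or systemd treats the fork as an exit.
ExecStart=/usr/bin/ssh -D 1080 -N -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -o ExitOnForwardFailure=yes -p 6464 user@<TAILSCALE_IP>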
Restart=on-failure
RestartSec=15

[Install]
WantedBy=multi-user.target

The residential machine is a Raspberry Pi on a home network, connected via Tailscale. SSH key auth only.

Privoxy (HTTP to SOCKS5 bridge)

Firecrawl and Camofox speak HTTP, not SOCKS5. Privoxy sits in between and forwards everything to the SSH tunnel.

Install it:

apt install privoxy

Edit /etc/privoxy/config and replace its contents with these two lines:

listen-address  0.0.0.0:8118
forward-socks5  /  127.0.0.1:1080  .

That’s the entire config. listen-address binds to all interfaces so Docker containers can reach it. forward-socks5 sends every request to the SSH tunnel on port 1080.

Enable and start it:

systemctl enable --now privoxy

One thing that burned me: Privoxy defaults to listen-address 127.0.0.1:8118. Docker containers on bridge networks can’t reach the host’s loopback, so all your proxy requests fail with ECONNREFUSED. The 0.0.0.0 binding is what fixes it.

Verify it works:

curl -s --proxy http://localhost:8118 https://api.ipify.org
# should show your residential IP, not your server's IP
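The same check can be scripted, for example to run before the agents start. This is a minimal sketch using only the standard library. The function names are illustrative, and it compares the proxied exit IP against a direct request, so it assumes the host is allowed to reach api.ipify.org directly.

```python
from urllib.request import OpenerDirector, ProxyHandler, build_opener

PRIVOXY_URL = "http://localhost:8118"


def proxied_opener(proxy_url: str = PRIVOXY_URL) -> OpenerDirector:
    """Opener that sends both http and https requests through Privoxy."""
    return build_opener(ProxyHandler({"http": proxy_url, "https": proxy_url}))


def direct_opener() -> OpenerDirector:
    """Opener that ignores any proxy environment variables."""
    return build_opener(ProxyHandler({}))


def exit_ip(opener: OpenerDirector) -> str:
    """Ask api.ipify.org which IP the request arrived from."""
    with opener.open("https://api.ipify.org", timeout=15) as resp:
        return resp.read().decode("ascii").strip()


def leaks_server_ip(proxied_ip: str, server_ip: str) -> bool:
    """True if the proxied request still left from the server's own address."""
    return proxied_ip == server_ip


def check_chain() -> None:
    """Raise if traffic through Privoxy exits from the same IP as direct traffic."""
    proxied, direct = exit_ip(proxied_opener()), exit_ip(direct_opener())
    if leaks_server_ip(proxied, direct):
        raise RuntimeError(f"proxy chain not in effect: both exit via {direct}")
```

check_chain() stays silent when the two addresses differ and raises when they match, which makes it easy to wire into a startup script.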

Privoxy uses about 6MB RAM. You won’t even notice it’s there.

How it all fits together

When the agent needs to look something up:

Search:

Agent calls web_search("proxmox zfs encryption")
    -> Firecrawl receives the query
    -> Firecrawl routes to SearXNG (queries Google, Bing, DuckDuckGo)
    -> Returns aggregated results via Firecrawl API
    -> Agent gets titles, URLs, and snippets

Extract:

Agent calls web_extract("https://example.com/article")
    -> Firecrawl receives the URL
    -> Playwright renders the JS-heavy page (through residential proxy)
    -> Returns clean markdown
    -> Agent parses the content

Browser:

Agent calls browser_navigate("https://protected-site.com/data")
    -> Camofox launches Firefox with randomized fingerprints
    -> Requests go through residential proxy
    -> Cloudflare JS challenge passes automatically
    -> Agent inspects the page with browser_snapshot

Every path goes through the residential IP. Every request looks like a real user on a home network.

Gotchas

The tunnel is a single point of failure

When the residential machine goes offline:

SSH tunnel down -> SOCKS5 :1080 dead
    -> Privoxy: CONNECTION REFUSED
    -> Camofox browser launch FAILS
    -> Firecrawl extracts FAIL
    -> All web access: broken

There’s no silent fallback to the datacenter IP. That’s on purpose: a failed request is safer than a request leaking from the wrong IP. With RestartSec=15, systemd retries the tunnel every 15 seconds, so access comes back shortly after the residential machine does.
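When everything is failing at once, it helps to know which hop broke first. The sketch below probes the two local ports in the order traffic crosses them. The port numbers come from the configs in this post, the function names are illustrative, and an open port only shows that the process is listening, so the ipify check remains the end-to-end test.

```python
import socket
from typing import Callable, Optional

# Hops in the order traffic crosses them
CHAIN = [
    ("SSH SOCKS5 tunnel", "127.0.0.1", 1080),
    ("Privoxy", "127.0.0.1", 8118),
]


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_broken_hop(probe: Callable[[str, int], bool] = port_open) -> Optional[str]:
    """Return the name of the first hop that refuses connections, or None if all accept."""
    for name, host, port in CHAIN:
        if not probe(host, port):
            return name
    return None
```

The probe argument exists so the logic can be exercised without real sockets. In normal use, first_broken_hop() with no arguments is all you need.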

Resource requirements

Actual usage is modest:

Firecrawl + SearXNG: ~2.9G RAM (9.9G limit), 5 CPU cores
Camofox: ~435MB RAM
Privoxy: ~6MB RAM
SSH tunnel: negligible

Total: ~3.3G RAM. The limits are set high for headroom (Firecrawl’s API container can spike under heavy load), but day to day it sits well under 4G.

Residential IP stability

Home IPs can change. If your ISP assigns dynamic IPs, the SSH tunnel breaks when the IP changes. Fixes:

Dynamic DNS on the residential machine
Tailscale (what I use): the tunnel connects to a Tailscale IP, not the public IP
Static IP if your ISP offers one

Geo-specific results

Search results and page content will reflect the residential IP’s location. Mine’s in India, so Google returns India-specific results. Usually fine for technical content, but worth knowing.

Cost

Everything here is open source. The only thing you’re paying for is the residential IP, which is your existing home internet. Total cost: $0/month beyond what you already have.

Cloud alternatives run $600+/month for the same coverage: Firecrawl Cloud at $100+, Bright Data at $500+, ScrapingBee at $50-200.

What I’m still working on

Automatic failover: if SearXNG is down, fall back to DuckDuckGo; if Firecrawl times out, fall back to Jina Reader
Search result caching with Redis to reduce load
Multi-residential-IP rotation across different home networks
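The failover item could take roughly the following shape. This is a sketch of one possible approach, not something running in the stack described here. The backend callables are hypothetical stand-ins for whatever wraps SearXNG, DuckDuckGo, Firecrawl, or Jina Reader.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_success(backends: Sequence[tuple[str, Callable[[], T]]]) -> tuple[str, T]:
    """Try each (name, call) pair in order; return the first result that does not raise."""
    errors = []
    for name, call in backends:
        try:
            return name, call()
        except Exception as exc:  # a timeout or refused connection moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Returning the backend name alongside the result lets the caller log when a fallback was used. Every fallback should still go out through the residential proxy so the fail-closed property is preserved.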
