AI Infrastructure · June 23, 2026 · 11 min read

I Gave My AI Agent Unblockable Internet — Firecrawl, Camofox & a Raspberry Pi

How I gave my AI agents unblockable web access using Firecrawl, Camofox, a residential proxy chain, and SearXNG — all self-hosted on a homelab.

An AI agent without internet is useless for anything time-sensitive. It can’t look things up, can’t verify claims, and confidently tells you outdated nonsense. I needed my agents to actually browse the web.

The problem: the web fights back. Most major sites now sit behind Cloudflare, Akamai, PerimeterX, or DataDome, which block headless browsers, datacenter IPs, and anything that smells automated. Your agent’s web_extract call returns a 403 and now it’s guessing.

I run five Hermes Agent instances on my homelab. Over the past few months I’ve built a web access stack that handles every bot detector I’ve thrown at it.

The stack

Three tools, one proxy chain:

SearXNG isn’t a separate tool Hermes calls directly. Firecrawl owns both search and extraction, and SearXNG is its search backend.

All three paths share the same proxy chain:

  1. Privoxy (:8118) bridges HTTP to SOCKS5
  2. SSH SOCKS5 tunnel (:1080) connects to a residential network
  3. Raspberry Pi on a home connection, exit IP x.x.x.x

Every request leaves through a residential IP. Bot detectors see a real user on a home network.

1. SearXNG — Firecrawl’s search backend

SearXNG is a meta-search engine that queries Google, Bing, DuckDuckGo, Brave, and others without tracking you. In my setup it’s not standalone — it’s the engine behind Firecrawl’s search API. When Hermes calls web_search, Firecrawl routes the query to SearXNG.

Why not DuckDuckGo?

Hermes ships with DuckDuckGo as the default search backend. It works, but:

Firecrawl swaps DDG for SearXNG. You get multi-engine aggregation, no rate limits, and full control. The agent doesn’t know the difference.

Deployment

SearXNG runs inside the Firecrawl Docker Compose stack (section 2 below), not as its own service. The default Firecrawl compose file doesn’t include SearXNG, so you add these services to the same docker-compose.yml:

services:
  searxng-core:
    image: searxng/searxng:latest
    restart: unless-stopped
    volumes:
      - ./settings.yml:/etc/searxng/settings.yml
      - ./limiter.toml:/etc/searxng/limiter.toml
    networks:
      - backend

  searxng:
    image: nginx:alpine
    restart: unless-stopped
    ports:
      - "8888:8080"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf
    depends_on:
      - searxng-core
    networks:
      - backend

The nginx proxy in front is critical. More on that in the pitfalls.

Key configuration (settings.yml)

search:
  formats:
    - html
    - json    # <- without this, the API returns 403

server:
  secret_key: "your-random-secret-here"
  limiter: true

outgoing:
  proxies:
    all://:
      - http://172.17.0.1:8118    # route through Privoxy on the Docker host (127.0.0.1 here would be the container itself)

Pitfalls I hit

JSON format returns 403. SearXNG defaults to HTML-only. If you try the JSON API (/search?q=test&format=json) without adding json to search.formats, you get a 403 Forbidden.
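
A minimal Python sketch for checking the JSON API once json is enabled. It assumes the nginx front end is published on localhost:8888 as in the compose snippet above; the function names are hypothetical, and the results/title/url keys are the fields SearXNG returns in its JSON output.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the nginx front end from the compose file is published on localhost:8888.
SEARXNG_URL = "http://localhost:8888"


def build_search_url(query, base_url=SEARXNG_URL):
    """Build the /search URL that asks SearXNG for JSON instead of HTML."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url}/search?{params}"


def summarize(payload, limit=5):
    """Reduce a SearXNG JSON payload to (title, url) pairs."""
    results = payload.get("results", [])
    return [(r.get("title", ""), r.get("url", "")) for r in results[:limit]]


def search(query):
    """Run one query. urlopen raises HTTPError (403) if json is not in search.formats."""
    with urllib.request.urlopen(build_search_url(query), timeout=15) as resp:
        return summarize(json.load(resp))
```

Calling search("test") against a running instance should return a short list of pairs. A 403 here usually means json is still missing from search.formats.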

Bot detection blocks API clients. Even with limiter: false, SearXNG blocks requests from curl, wget, and python-requests based on User-Agent strings. Fix: put nginx in front and rewrite the User-Agent:

server {
    listen 8080;
    location / {
        proxy_pass http://searxng-core:8080;
        proxy_set_header User-Agent "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36";
    }
}

Docker networks need trust. If you enable the rate limiter, SearXNG needs to know that requests from Docker bridge networks (172.16.0.0/12) aren’t spoofed. Add to limiter.toml:

[botdetection.ip_limit]
trusted_proxies = ["172.16.0.0/12", "192.168.0.0/16"]
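
To see which container addresses those two ranges actually cover, here is a small Python sketch using the standard ipaddress module. The helper name and the sample addresses in the usage note are made up for illustration.

```python
import ipaddress

# The two ranges from limiter.toml above.
TRUSTED = [ipaddress.ip_network(n) for n in ("172.16.0.0/12", "192.168.0.0/16")]


def is_trusted(addr):
    """Return True if addr falls inside one of the trusted proxy ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED)
```

For example, is_trusted("172.18.0.3") is true because 172.16.0.0/12 spans 172.16.x.x through 172.31.x.x, while is_trusted("10.0.0.5") is false. If your Docker networks use a 10.x range, that range would need to be listed as well.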

SearXNG doesn’t need its own Hermes config — Firecrawl handles it. The Hermes integration is in the Firecrawl section below.

2. Firecrawl — self-hosted web extraction

Firecrawl converts web pages to clean markdown. It handles JavaScript rendering, anti-bot bypassing, and content extraction. The cloud version costs money. Self-hosting it is free and gives you unlimited scraping.

Architecture

Five Docker containers:

| Container | Role | Port |
|---|---|---|
| firecrawl-api-1 | Main API + workers | 3002 |
| firecrawl-playwright-service-1 | Headless browser for JS rendering | internal |
| firecrawl-redis-1 | Caching + rate limiting | internal |
| firecrawl-rabbitmq-1 | Job queue | internal |
| firecrawl-nuq-postgres-1 | Job persistence + pg_cron | internal |

Deployment

# docker-compose.yml (simplified)
services:
  api:
    image: ghcr.io/mendableai/firecrawl-api:latest
    ports:
      - "3002:3002"
    env_file: .env
    depends_on:
      - redis
      - rabbitmq
      - nuq-postgres
    restart: unless-stopped

  playwright-service:
    image: ghcr.io/mendableai/firecrawl-playwright-service:latest
    env_file: .env
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    restart: unless-stopped

  rabbitmq:
    image: rabbitmq:3-management-alpine
    restart: unless-stopped

  nuq-postgres:
    build: src/apps/nuq-postgres  # MUST build locally
    restart: unless-stopped

The nuq-postgres problem

Don’t use the GHCR pre-built image. It has a pg_cron config mismatch: cron.database_name doesn’t match the database the init script uses, so the container crashes on startup with "can only create extension in database postgres".

Clone the Firecrawl repo so the src/apps/nuq-postgres directory is available for the compose build:

git clone --depth 1 https://github.com/mendableai/firecrawl.git /root/firecrawl/src
cd /root/firecrawl
docker compose build nuq-postgres

The compose file references build: src/apps/nuq-postgres relative to itself, so the repo needs to be cloned into the src/ subdirectory. The local build picks up the correct postgresql.conf.sample with cron.database_name = 'postgres'.

Proxy configuration

All Firecrawl traffic routes through the residential proxy. In .env:

# proxy for all outbound requests
HTTP_PROXY=http://172.17.0.1:8118
HTTPS_PROXY=http://172.17.0.1:8118

# job queue (mandatory)
NUQ_RABBITMQ_URL=amqp://rabbitmq:5672

# disable auth (self-hosted, no API key needed)
USE_DB_AUTHENTICATION=false

172.17.0.1 is the Docker bridge gateway. It routes to Privoxy on the host at :8118, which forwards through the SSH SOCKS5 tunnel to the residential IP.
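
Inside a container, localhost is the container itself, so a proxy variable pointing at 127.0.0.1 would never reach Privoxy on the host. A rough Python sketch of a pre-flight check for the .env file follows; the function names are hypothetical and the parser only handles simple KEY=VALUE lines.

```python
from urllib.parse import urlparse

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def loopback_proxies(env):
    """Return the proxy variables whose host would resolve to the container itself."""
    bad = []
    for key in ("HTTP_PROXY", "HTTPS_PROXY"):
        host = urlparse(env.get(key, "")).hostname
        if host in LOOPBACK_HOSTS:
            bad.append(key)
    return bad
```

Feeding it the contents of .env and getting back an empty list means both variables point somewhere other than loopback. It does not prove that Privoxy is actually listening there.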

Hermes integration

# ~/.hermes/.env
FIRECRAWL_API_URL=http://localhost:3002

# ~/.hermes/config.yaml
web:
  extract_backend: firecrawl
  search_backend: firecrawl

No API key needed. Self-hosted Firecrawl skips auth when USE_DB_AUTHENTICATION=false.
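
Before pointing Hermes at it, the endpoint can be smoke-tested by hand. The sketch below assumes the instance exposes Firecrawl’s v1 scrape route (POST /v1/scrape) and that a successful response carries the page under data.markdown; newer Firecrawl releases may use a different route version, so treat the path as an assumption. The helper names are hypothetical.

```python
import json
import urllib.request

# Same value as FIRECRAWL_API_URL above.
FIRECRAWL_API_URL = "http://localhost:3002"


def build_scrape_request(url, api_url=FIRECRAWL_API_URL):
    """Build a POST to /v1/scrape asking for markdown output (no auth header)."""
    body = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{api_url}/v1/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_markdown(payload):
    """Pull the markdown out of a scrape response, or raise if the call failed."""
    if not payload.get("success"):
        raise RuntimeError(f"scrape failed: {payload.get('error', 'unknown error')}")
    return payload["data"]["markdown"]


def scrape(url):
    """Send the request and return the page as markdown."""
    with urllib.request.urlopen(build_scrape_request(url), timeout=60) as resp:
        return extract_markdown(json.load(resp))
```

Running scrape("https://example.com") against a healthy stack should return a few lines of markdown. A connection error points at the api container, and a failed scrape usually points at the proxy chain.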

Resource usage

The limits look worse than reality. Actual numbers:

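| Container | CPU limit | RAM limit | Actual RAM |
|---|---|---|---|
| firecrawl-api | 3.0 | 5G | ~2.2G |
| playwright-service | 2.0 | 3G | ~192M |
| searxng (nginx) | - | 128M | ~8M |
| searxng-core | - | 512M | ~156M |
| redis | - | 256M | ~6M |
| rabbitmq | - | 512M | ~198M |
| nuq-postgres | - | 512M | ~104M |
| Total | 5.0 | ~9.6G | ~2.9G |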

Limits total ~9.6G but in practice it sits around 3G. If you’re tight on resources, the cloud Firecrawl API (500 pages/month free tier) is probably a better starting point.

3. Camofox — anti-detection browser

Camofox is a Firefox fork with C++-level fingerprint spoofing. It’s not a headless browser pretending to be real. It is a real browser with randomized fingerprints that make each session look like a different person.

Why not Playwright or Puppeteer?

Headless Chromium has tells:

Camofox randomizes all of these. Combined with a residential IP, its traffic is very hard to tell apart from a real user’s.

Deployment

docker run -d \
  --name camofox-browser \
  --network host \
  --restart unless-stopped \
  -e CAMOFOX_PORT=9377 \
  -e PROXY_HOST=localhost \
  -e PROXY_PORT=8118 \
  ghcr.io/jo-inc/camofox-browser:latest

A few things worth noting:

Hermes integration

# ~/.hermes/.env
CAMOFOX_URL=http://localhost:9377

With CAMOFOX_URL set, Hermes routes all browser_* calls through Camofox instead of the default agent-browser:

| Tool | Routes through | IP used |
|---|---|---|
| browser_navigate, browser_snapshot | Camofox -> Privoxy -> SOCKS5 | Residential |
| web_extract | Firecrawl -> Privoxy -> SOCKS5 | Residential |
| web_search | Firecrawl -> SearXNG -> Privoxy -> SOCKS5 | Residential |

What it beats

I’ve tested Camofox + residential IP against known aggressive bot detection:

| Site | Protection | Result |
|---|---|---|
| ESPNcricinfo | Proprietary | Full page (was “Access Denied” without) |
| Ticketmaster | Akamai + Queue-It | Full page |
| Nike | Akamai | Full page |
| Instacart | DataDome | Full page |
| Amazon | Custom | Full page (was blocked on datacenter IP) |
| Cloudflare | JS challenge | Passes automatically |
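
To repeat this kind of test against your own targets, a rough heuristic for spotting a block page might look like the sketch below. The status codes and marker strings are illustrative assumptions, not something the bot detection vendors document, and a real check would need tuning per site.

```python
# Status codes that commonly accompany a block or challenge page (assumption).
BLOCK_STATUSES = {403, 429, 503}
# Hypothetical marker strings; extend them with whatever your targets actually return.
BLOCK_MARKERS = ("access denied", "captcha", "verify you are human")


def looks_blocked(status, body):
    """Guess whether a response is a block page rather than real content."""
    if status in BLOCK_STATUSES:
        return True
    text = body.lower()
    return any(marker in text for marker in BLOCK_MARKERS)
```

A 200 response whose body contains "Access Denied" is flagged just like a plain 403, which matters because some protections serve their block page with a success status.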

4. The residential proxy chain

This is the part everything else depends on. A residential IP makes bot detection mostly irrelevant. Datacenter IPs get flagged by default, but home IPs don’t.

You need a machine on a home network that stays online. A Raspberry Pi works (what I use), but so does an old laptop, a mini PC, or even a phone running Termux + Tailscale. The only requirement: it has a residential ISP connection and can hold an SSH tunnel open.

Architecture

the host VM (homelab)
    |
    v
SSH SOCKS5 Tunnel (:1080)
    | encrypted, auto-reconnect via systemd
    v
Residential Machine (Raspberry Pi, home network)
    |
    v
Internet (x.x.x.x — residential ISP)

SSH tunnel service

# /etc/systemd/system/socks5-residential.service
[Unit]
Description=SSH SOCKS5 Residential Proxy
After=network.target

[Service]
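Type=simple
# No -f here: with Type=simple, ssh has to stay in the foreground or systemd treats the tunnel as exited.
ExecStart=/usr/bin/ssh -D 1080 -N -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -o ExitOnForwardFailure=yes -p 6464 user@<TAILSCALE_IP>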
Restart=on-failure
RestartSec=15

[Install]
WantedBy=multi-user.target

The residential machine is a Raspberry Pi on a home network, connected via Tailscale. SSH key auth only.
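
A quick way to confirm the tunnel is listening is to send it a SOCKS5 greeting. The sketch below is a minimal Python version, assuming the tunnel listens on 127.0.0.1:1080 as in the unit file and accepts the no-authentication method; the function name is hypothetical.

```python
import socket


def socks5_ready(host="127.0.0.1", port=1080, timeout=3.0):
    """Send a SOCKS5 greeting and check that the server accepts the no-auth method."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # VER=5, NMETHODS=1, METHOD=0x00 (no authentication), per RFC 1928.
            sock.sendall(b"\x05\x01\x00")
            reply = sock.recv(2)
    except OSError:
        # Refused, timed out, or reset: treat all of these as "not ready".
        return False
    return reply == b"\x05\x00"
```

This only proves that the local listener speaks SOCKS5. It says nothing about which IP the traffic finally leaves from, so it complements an exit IP check and does not replace one.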

Privoxy (HTTP to SOCKS5 bridge)

Firecrawl and Camofox expect an HTTP proxy, not a SOCKS5 one. Privoxy sits in between and forwards everything to the SSH tunnel.

Install it:

apt install privoxy

Edit /etc/privoxy/config and strip out everything except these two lines:

listen-address  0.0.0.0:8118
forward-socks5  /  127.0.0.1:1080  .

That’s the entire config. listen-address binds to all interfaces so Docker containers can reach it. forward-socks5 sends every request to the SSH tunnel on port 1080.

Enable and start it:

systemctl enable --now privoxy

One thing that burned me: Privoxy defaults to listen-address 127.0.0.1:8118. Docker containers on bridge networks can’t reach the host’s loopback, so all your proxy requests fail with ECONNREFUSED. The 0.0.0.0 binding is what fixes it.

Verify it works:

curl -s --proxy http://localhost:8118 https://api.ipify.org
# should show your residential IP, not your server's IP
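
The same check can be scripted so it compares the direct and proxied exit IPs in one go. This is a sketch under the same assumptions as the curl command (Privoxy on localhost:8118, api.ipify.org as the echo service); the function names are hypothetical.

```python
import urllib.request

IP_ECHO_URL = "https://api.ipify.org"  # same endpoint as the curl check
PRIVOXY = "http://localhost:8118"


def fetch_ip(proxy=None, timeout=15):
    """Return the public IP seen by the echo service, optionally via an HTTP proxy."""
    # An empty mapping disables proxies entirely, so the direct call ignores
    # any HTTP_PROXY/HTTPS_PROXY variables set in the environment.
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    with opener.open(IP_ECHO_URL, timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()


def exits_differ(direct_ip, proxied_ip):
    """True when the proxied exit IP is not the server's own IP."""
    return bool(proxied_ip) and direct_ip != proxied_ip
```

Calling exits_differ(fetch_ip(), fetch_ip(PRIVOXY)) should return True when the chain works. If it returns False, traffic is leaving from the server’s own connection.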

Privoxy uses about 6MB RAM. You won’t even notice it’s there.

How it all fits together

When the agent needs to look something up:

Search:

Agent calls web_search("proxmox zfs encryption")
    -> Firecrawl receives the query
    -> Firecrawl routes to SearXNG (queries Google, Bing, DuckDuckGo)
    -> Returns aggregated results via Firecrawl API
    -> Agent gets titles, URLs, and snippets

Extract:

Agent calls web_extract("https://example.com/article")
    -> Firecrawl receives the URL
    -> Playwright renders the JS-heavy page (through residential proxy)
    -> Returns clean markdown
    -> Agent parses the content

Browser:

Agent calls browser_navigate("https://protected-site.com/data")
    -> Camofox launches Firefox with randomized fingerprints
    -> Requests go through residential proxy
    -> Cloudflare JS challenge passes automatically
    -> Agent inspects the page with browser_snapshot

Every path goes through the residential IP. Every request looks like a real user on a home network.

Gotchas

The tunnel is a single point of failure

When the residential machine goes offline:

SSH tunnel down -> SOCKS5 :1080 dead
    -> Privoxy: CONNECTION REFUSED
    -> Camofox browser launch FAILS
    -> Firecrawl extracts FAIL
    -> All web access: broken

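There’s no silent fallback to the datacenter IP. That’s on purpose: a failed request is safer than a request leaking from the wrong IP. Once the residential machine is reachable again, the systemd service brings the tunnel back on its next restart attempt, which it makes 15 seconds after each failure (RestartSec=15).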
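
The same fail-closed idea can be applied in your own scripts. Here is a minimal Python sketch, assuming Privoxy on 127.0.0.1:8118; ProxyDown, require_proxy, and fetch_fail_closed are hypothetical names. The opener is built with an explicit proxy mapping, so it has no direct route to fall back on, and the pre-check exists mainly to give a clearer error.

```python
import socket
import urllib.request


class ProxyDown(RuntimeError):
    """Raised instead of falling back to a direct connection."""


def require_proxy(host="127.0.0.1", port=8118, timeout=2.0):
    """Raise ProxyDown unless something accepts TCP connections on host:port."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
    except OSError as exc:
        raise ProxyDown(f"proxy {host}:{port} unreachable: {exc}") from exc


def fetch_fail_closed(url, host="127.0.0.1", port=8118, timeout=30):
    """Fetch url through the proxy only; never retry over a direct connection."""
    require_proxy(host, port)
    proxy = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

If Privoxy is up but the tunnel behind it is dead, the request should still fail with an error from the proxy and not leave from the server’s IP.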

Resource requirements

Actual usage is modest:

Total: ~3.3G RAM. The limits are set high for headroom (Firecrawl’s API container can spike under heavy load), but day to day it sits well under 4G.

Residential IP stability

Home IPs can change. If your ISP assigns dynamic IPs, the SSH tunnel breaks when the IP changes. Fixes:

Geo-specific results

Search results and page content will reflect the residential IP’s location. Mine’s in India, so Google returns India-specific results. Usually fine for technical content, but worth knowing.

Cost

Everything here is open source. The only thing you’re paying for is the residential IP, which is your existing home internet. Total cost: $0/month beyond what you already have.

Cloud alternatives run $600+/month for the same coverage: Firecrawl Cloud at $100+, Bright Data at $500+, ScrapingBee at $50-200.

What I’m still working on
