Compare commits

42 Commits

Author SHA1 Message Date
381eb3b8f5 docs: WAF engine migration feasibility analysis (Coraza+CRS via HAProxy SPOA) (ref #662)
2026-06-18 22:31:12 +02:00
f9affe1e8b docs: record #662 Phase 7 — Python R3 decommissioned + nft persistence (epic complete) 2026-06-18 22:19:59 +02:00
eea4632642 fix(toolbox): persist R3 fanout to the Go engine ports 809x (was 808x Python) (ref #662)
The nft drop-in is what nftables.service re-applies at boot; pointing it at
the Go workers makes the #662 cutover survive a reboot. Rollback = 809x→808x.
Live /etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft already updated + dry-run
validated (nft -c -f exit 0).
2026-06-18 22:08:37 +02:00
CyberMind
c7d354a153
Merge pull request #672 from CyberMind-FR/fix/662-no-follow-redirect
fix(#662): relay upstream redirects instead of following them
2026-06-18 22:02:09 +02:00
8e009e0aa6 fix(toolbox-ng): relay upstream 3xx instead of following them (ref #662) 2026-06-18 22:01:08 +02:00
CyberMind
e0cd433485
Merge pull request #671 from CyberMind-FR/fix/662-gzip-banner
fix(#662): inject banner into compressed HTML (gzip decode/re-encode)
2026-06-18 19:38:41 +02:00
8ffe54ee0d chore: changelog 0.1.3 — gzip banner inject (ref #662) 2026-06-18 19:37:43 +02:00
449b28f8a1 fix(toolbox-ng): inject banner into gzip HTML, not just identity (ref #662)
The Go MITM engine's transparency banner only appeared on UNCOMPRESSED
HTML. Browsers send `Accept-Encoding: gzip, br`, so most pages came back
gzip/brotli-compressed; the engine passed the compressed body straight
through and injectLoader (which scans for <head>/<body>) silently no-oped
on the binary blob. Proven on-board: identity HTML → banner present;
gzip HTML → banner absent.

Two-part fix, stdlib-only (compress/gzip; brotli/zstd are not in the
stdlib, which is why we constrain the wire to gzip):

1. mitmPipeline now pins the upstream request to `Accept-Encoding: gzip`
   (Set, not Del — Del would make Go's Transport auto-decompress and lose
   wire compression to the client for ALL resources). This guarantees
   every response is gzip or identity. Applies to both CONNECT and
   transparent paths (shared pipeline).

2. New gzip.go inject helper: in the existing 2xx + text/html gate,
   injectIntoBody gunzips → injectLoader → re-gzips when Content-Encoding
   is gzip (keeping the client transfer compressed), injects directly on
   identity, and fails open (original bytes untouched) on corrupt/unknown
   encoding or a decompression bomb (32MiB inflate cap). Content-Length /
   resp.ContentLength are updated to match the served bytes so the grown
   body is not truncated.

Non-HTML / non-2xx responses still pass through byte-for-byte (possibly
still gzip). Poison Set-Cookie + anonymize unchanged. Idempotency guard
stays inside injectLoader.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 19:37:03 +02:00
4ef6d3aa76 docs: record #662 R3 cutover to Go engine + banner port (PR #670) 2026-06-18 19:20:21 +02:00
af76e33b45 docs: record #662 P5-prep + P6-prep (PRs #668, #669) in HISTORY 2026-06-18 19:19:38 +02:00
CyberMind
8df8f4d181
Merge pull request #670 from CyberMind-FR/feat/662-cutover-fix
feat(#662): R3 cutover to the Go MITM engine — unit fix, R3-CA loadCA, banner port
2026-06-18 19:19:23 +02:00
70d35eb7f2 feat(toolbox-ng): port real banner inject + /__toolbox portal reverse-proxy (ref #662) 2026-06-18 19:16:27 +02:00
73795bb3c3 feat(toolbox-ng): port transparency-banner loader inject + /__toolbox/* portal proxy (ref #662)
The Go MITM engine now injects the REAL visible transparency-banner loader
(replacing the invisible `<!-- sbx-ng banner -->` marker regression), mirroring
the authoritative Python inject_banner.py with stream_inject ON.

- banner.go: injectLoader() builds the guarded loader <script src="/__toolbox/
  loader.js" data-mh=.. data-wg=.. async> exactly like Python _loader_script;
  placement mirrors _LoaderInjector (after <head>'s '>', else before <body>,
  else unchanged); bannerGuard idempotency matches _GUARD; data-mh ascii-stripped.
- /__toolbox/loader.js + /__toolbox/bundle short-circuited in BOTH the CONNECT
  mitmPipeline and the transparent path, reverse-proxied to the portal
  (--portal, default http://127.0.0.1:8088). Startswith match (query-aware),
  fail-open to 204 so a banner asset never 502s the navigation.
- mitmPipeline threads `wg bool`: transparent path derives it from the
  10.99.1.0/24 peer IP (R3 WG), CONNECT passes false. Injection tightened to
  2xx text/html (Python skips non-200). injectMarker/Policy.Inject kept for the
  existing PoC tests.
- banner_test.go: guard idempotency, <head>/<body>/neither placement, wg + mh
  attributes, non-ascii stripping, path-detection + portal URL construction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 19:12:28 +02:00
03fdc8fe14 fix(toolbox-ng): cutover-ready worker unit — live R3 CA, transparent 10.99.1.1 bind, combined-PEM loadCA (ref #662) 2026-06-18 18:54:41 +02:00
CyberMind
223f81ac63
Merge pull request #669 from CyberMind-FR/feat/662-transparent-machash
feat(#662 Phase 6-prep): transparent SO_ORIGINAL_DST accept + mac_hash persona (DARK)
2026-06-18 18:41:17 +02:00
bf022f618f fix(toolbox-ng): transparent upstream must verify cert against SNI, not bare IP (ref #662)
The transparent mitm/allow path set req.URL.Host = ip:port, so http.Client
TLS-dialed the captured original-dst and verified the cert against the bare IP
→ guaranteed SNI/cert-name mismatch. Add a per-request transparentTransport
whose DialContext pins the TCP dial to the captured ip:port for every
connection while TLSClientConfig.ServerName = the SNI host, so the upstream is
reached at the real IP yet the cert is verified by hostname. req.URL.Host now
carries the SNI host (correct Host header + SNI); verification stays ON (no
InsecureSkipVerify). The CONNECT path is unchanged (dialHost == "" → it still
dials by req.URL.Host exactly as before).
2026-06-18 18:37:35 +02:00
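The idea in bf022f618f (dial the captured ip:port, verify the certificate by hostname) might look roughly like the sketch below. The constructor name `newPinnedTransportSketch` and the 10 s dial timeout are assumptions; the commit only states that `DialContext` pins the dial and `TLSClientConfig.ServerName` carries the SNI host.

```go
package main

import (
	"context"
	"crypto/tls"
	"net"
	"net/http"
	"time"
)

// newPinnedTransportSketch (hypothetical name) returns a Transport whose
// every TCP dial goes to origDst ("ip:port"), whatever address the request
// URL names, while the upstream certificate is verified against sniHost.
func newPinnedTransportSketch(origDst, sniHost string) *http.Transport {
	d := &net.Dialer{Timeout: 10 * time.Second} // timeout is an assumption
	return &http.Transport{
		// Ignore the address derived from req.URL.Host; always dial the
		// captured original destination.
		DialContext: func(ctx context.Context, network, _ string) (net.Conn, error) {
			return d.DialContext(ctx, network, origDst)
		},
		// Verification stays on: InsecureSkipVerify keeps its zero value.
		TLSClientConfig: &tls.Config{ServerName: sniHost},
	}
}
```

With this shape, `req.URL.Host` can carry the SNI host (so the `Host` header is correct) while the connection still reaches the real pre-DNAT address.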
9df984c73f fix(toolbox-ng): transparent splice must not decrypt — peek+replay ClientHello (ref #662)
handleTransparent previously forged a cert and terminated TLS BEFORE Decide, so
a splice host was already MITM'd and the splice branch then io.Copy'd decrypted
plaintext into a cleartext dial — a broken relay of a host policy says to pass
through untouched (cert-pinned apps, own media infra).

Now: peek the ClientHello off the raw conn without consuming it (recordingReader
tees the bytes), parse SNI with a new pure stdlib sniFromClientHello (fully
bounds-checked, never panics), and Decide on the peeked SNI with NO decryption.
splice → dial the ORIGINAL dst, replay the buffered ClientHello upstream, pipe
raw TCP both ways, NEVER tls.Server. allow/mitm/block → re-present the buffered
ClientHello to tls.Server via a prefixConn (Read drains the prefix then
delegates) and run the shared pipeline as before.

Adds table tests for sniFromClientHello (hand-assembled ClientHello with/without
SNI, non-handshake, truncated, not-ClientHello → ("",false)), a no-panic
truncation sweep, and prefixConn replay tests.
2026-06-18 18:37:13 +02:00
5acfdb17c6 fix(toolbox-ng): non-linux build regression in transparent dispatch (ref #662)
main.go (untagged) referenced px.handleTransparent + the transparent accept
loop unconditionally, so the linux-only transparent.go made `GOOS=darwin
go build ./...` fail. Move the accept loop into a linux-tagged runTransparent
helper and add a non-linux transparent_stub.go that log.Fatals; main.go now
calls runTransparent only when --transparent. Verified GOOS=linux/arm64,
linux/amd64 and darwin all build.
2026-06-18 18:36:46 +02:00
364b8c4a30 feat(toolbox-ng): transparent SO_ORIGINAL_DST accept path (build only, DARK) (ref #662)
Add cmd/sbxmitm/transparent.go (//go:build linux): parseOrigDst decodes a raw
sockaddr_in/sockaddr_in6 blob (endianness-robust family, big-endian port) into
host:port — PURE, fully unit-tested. origDst recovers the pre-DNAT destination
via getsockopt(SO_ORIGINAL_DST=80) using syscall.Syscall6 on the raw fd
(stdlib-only). handleTransparent recovers origDst, terminates TLS by SNI,
splices raw TCP to the REAL captured dst or runs mitmPipeline dialling it.

transparent_test.go table-tests parseOrigDst (v4/v6, both family endiannesses,
BE port, short-blob errors). End-to-end getsockopt capture needs nft DNAT and
is validated at Phase 5 shadow on the board, not in unit tests (documented).
2026-06-18 18:24:57 +02:00
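The pure decoding step that 364b8c4a30 attributes to `parseOrigDst` could look something like this sketch. It assumes the standard Linux `sockaddr_in` / `sockaddr_in6` layouts and the Linux values `AF_INET=2` and `AF_INET6=10`; the function name, error strings, and exact bounds checks are illustrative, not the repository's.

```go
package main

import (
	"encoding/binary"
	"errors"
	"net"
	"strconv"
)

const (
	afInet  = 2  // AF_INET on Linux
	afInet6 = 10 // AF_INET6 on Linux
)

// parseOrigDstSketch decodes a raw sockaddr blob into "host:port".
// The family field is host-endian, so both byte orders are accepted;
// the port is always big-endian (network order).
func parseOrigDstSketch(raw []byte) (string, error) {
	if len(raw) < 8 {
		return "", errors.New("sockaddr too short")
	}
	le := binary.LittleEndian.Uint16(raw[0:2])
	be := binary.BigEndian.Uint16(raw[0:2])
	port := strconv.Itoa(int(binary.BigEndian.Uint16(raw[2:4])))

	switch {
	case le == afInet || be == afInet:
		// sockaddr_in: family(2) port(2) addr(4) pad(8)
		return net.JoinHostPort(net.IP(raw[4:8]).String(), port), nil
	case le == afInet6 || be == afInet6:
		// sockaddr_in6: family(2) port(2) flowinfo(4) addr(16) scope(4)
		if len(raw) < 24 {
			return "", errors.New("sockaddr_in6 too short")
		}
		return net.JoinHostPort(net.IP(raw[8:24]).String(), port), nil
	}
	return "", errors.New("unknown address family")
}
```

Keeping this function free of syscalls is what makes it table-testable; the `getsockopt(SO_ORIGINAL_DST)` call that produces the blob needs a real DNAT'd socket, as the commit notes.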
ba933a6ec3 refactor(toolbox-ng): extract shared post-TLS MITM pipeline + add --transparent flag (ref #662)
Factor handleConnect's post-handshake logic (read request, apply verdict,
anonymize, proxy upstream, poison, inject, write) into mitmPipeline so the
CONNECT and transparent accept paths can't drift. dialHost param lets the
transparent path dial the captured original-dst instead of the SNI. Add a
--transparent bool flag: when set, a raw net.Listen accept loop dispatches each
conn to handleTransparent; default keeps the CONNECT http.Server EXACTLY.
CONNECT path + its tests unchanged.
2026-06-18 18:24:57 +02:00
67e85ba4dd feat(toolbox-ng): wire mac_hash into clientHashFromConn + Python parity (ref #662)
clientHashFromConn now resolves the peer IP via macHashOf (WG persona hash,
byte-identical to Python for 10.99.1.0/24), falling back to the raw peer IP for
non-WG/test conns so poison stays deterministic. Updated the TODO block: WG
mac_hash wiring DONE; remaining gap is only the transparent original-dst
plumbing (Deliverable 2) and the intentionally-out-of-scope R0-R2 ARP path.

test_machash_parity.py drives _common.mac_hash_of on the SAME fixtures; both
engines agree. Anti-rig verified on the Python side too.
2026-06-18 18:21:45 +02:00
5fb67f5b88 feat(toolbox-ng): port WG persona mac_hash to Go with cross-engine parity (ref #662)
Port _common._wg_hash_of / mac_hash_of to cmd/sbxmitm/machash.go: WG peers on
10.99.1.0/24 resolve to sha256(peer_pubkey)[:16], mtime-cached behind a mutex
(Go is concurrent; Python relied on the GIL). Off-subnet / R0-R2 ARP path is
out of scope for the R3 transparent engine; any error fails open to "".

Parity fixtures (testdata/wg-peers-fixture.json + machash-fixtures.json) carry
Python-authored expected values; machash_test.go asserts macHashOf matches.
Anti-rig verified: [:16]->[:15] fails the test.
2026-06-18 18:21:39 +02:00
CyberMind
c870b6362b
Merge pull request #668 from CyberMind-FR/feat/662-phase5prep-pkg
feat(#662 Phase 5-prep): wire Decide+jar+anonymize+poison into handlers + DARK debian package
2026-06-18 18:06:31 +02:00
de15a18c30 feat(#662 Phase 5-prep B): debian packaging for sbxmitm (DISABLED, dark)
New packages/secubox-toolbox-ng/debian/ producing the secubox-toolbox-ng
binary package (Architecture: arm64):
  - control: Maintainer Gerald KERMA; B-D golang-go; Depends
    only (static CGO_ENABLED=0 binary → no shlib deps). Compat 13 via
    debhelper-compat build-dep (debhelper rejects compat both ways).
  - changelog: 0.1.0-1~bookworm1.
  - rules: dh; GOOS=linux GOARCH=arm64 CGO_ENABLED=0 GOPROXY=off go build
    (pure stdlib, offline). dh_installsystemd --no-enable --no-start so the
    unit is shipped but NEVER enabled/started.
  - secubox-toolbox-ng-worker@.service: systemd template mirroring the Python
    mitm-wg worker@ but running sbxmitm on 127.0.0.1:809%i (distinct from the
    Python 808%i so both fleets coexist during cutover). Reads the ca-wg CA.
    DISABLED BY DESIGN — header documents Phase-6-cutover-only enablement.
  - postinst: daemon-reload only; explicitly NO enable/start; NO nft.

Built locally for arm64: dpkg-deb verified — ships /usr/sbin/sbxmitm (arm64
static ELF) + the disabled template; postinst contains ZERO deb-systemd-helper
enable lines. .gitignore extended for in-tree build artifacts. DARK: install
changes no runtime behaviour (no service start, no DNAT, no live-R3 wiring).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 18:03:40 +02:00
f65af3355c feat(#662 Phase 5-prep A): wire ported policy+jar into proxy handlers
Replace the hardcoded action() short-circuit in handleConnect with the
ported Decide(host,sni) + always-on anonymize + Set-Cookie poison:

  - allow/own-infra  → clean MITM (anonymize only, NO block/poison)
  - splice           → raw passthrough (unchanged)
  - block            → 204 (unchanged)
  - mitm + tracker   → poison tracking-id Set-Cookies via the HMAC jar

New pure, unit-testable helpers (privacy.go):
  - anonymizeRequest(http.Header): drop operator/carrier + re-id headers
    (mirrors privacy_guard._STRIP), pin DNT:1 + Sec-GPC:1.
  - isTrackingCookieName / poisonSetCookies: replace tracking-id cookie
    values with fakeID(clientHash,host,name,jarKey); attrs preserved,
    benign cookies untouched, fail-closed-to-clean when no key/clientHash.
  - Policy.isTracker / Policy.shouldPoison: poison ONLY on MITM'd tracker
    flows, never on allow/own-infra (same dark safety as the block path).
  - clientHashFromConn: PoC peer-IP stub, TODO(#662 P6) mac_hash via
    SO_ORIGINAL_DST + WG-peer map.

writeResponse (util.go) preserves multi-valued Set-Cookie headers.
Poison gated behind --poison (default on) AND a loaded --jar-key.
DARK: nothing wired to live R3. +8 tests (14→22), all green; vet clean;
arm64 cross-build OK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 18:00:26 +02:00
CyberMind
7355e606ca
Merge pull request #667 from CyberMind-FR/feat/662-phase4-jar
feat(#662 Phase 4): anti-track HMAC jar port (byte-exact) + sidecar emit
2026-06-18 17:52:12 +02:00
e594f681a4 doc(#662 Phase 4): fix stale fakeID comment — clarify it MUST use registrableJar (anti-consolidation footgun guard) 2026-06-18 17:52:03 +02:00
0db96a8beb fix(#662 Phase 4): jar uses privacy-flavored registrableJar (not ad_ghost) — byte-parity on gov.uk/IP trackers; + divergence-guard fixtures 2026-06-18 17:49:05 +02:00
667d8a09e0 feat(#662 Phase 4): sidecar emit helper (fire-and-forget unix-socket POST)
Add sidecar.go (package main, stdlib only): emit(socketPath, route, payload)
relays a signal to a SecuBox module's unix socket in a detached goroutine —
never blocks the proxy flow, never raises into the caller (mirrors
_common.fire_forget_post + queue_async). emitSync is the same-package,
test-observable synchronous form under a 2s timeout (mirrors httpx timeout=2).

Documents the addon→socket mapping the live engine will use
(cookies/dpi/avatar/ja4/soc_relay → /run/secubox/*.sock; social_graph is
in-process). NOT wired into the live path — transport only (Phase 5+ wiring).

sidecar_test.go: delivery over a throwaway unix socket, dead-socket
no-panic/no-block, empty-route defaulting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 17:45:52 +02:00
170619053f feat(#662 Phase 4): port anti-track HMAC fake-identity jar to Go, byte-exact
Port privacy.py's _jar_key / _shape / fake_id into jar.go (package main,
stdlib only): loadJarKey (read+TrimSpace, empty->nil), shape (GA1/fb uint64
big-endian modulo math, uuid via rune-length >=32 branch, hex[:32] default),
fakeID (HMAC-SHA256 over client|registrable(tracker)|cookie, reuses the
Phase-3 registrable()). Returns ("",false) where Python returns None.

Cross-engine parity proven: testdata/jar-fixtures.json (expect values
GENERATED by privacy.fake_id with a FIXED test key, not the real /etc key)
covers _ga, _ga_<prop> GA4, _fbp, uuid, _pk_id, name>=32, generic hex, and a
doubleclick.net subdomain-folding case. jar_test.go (Go) and
tests/test_jar_parity.py (Python) load the SAME fixtures+key and both pass ->
byte-exact. No int-math or rune-length divergence found.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 17:44:14 +02:00
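The core of the jar in 170619053f, a keyed hash over client, registrable tracker domain, and cookie name, can be illustrated in simplified form. This sketch covers only the "generic hex" default (`hex[:32]`); the GA1/fb/uuid shaping is omitted, the literal `|` separator is an assumption read off the commit's `client|registrable(tracker)|cookie` notation, and the caller is assumed to have already reduced the tracker host to its registrable domain. It is not byte-compatible with `privacy.fake_id` unless those assumptions happen to hold.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// fakeIDSketch (hypothetical name) derives a stable fake identifier for a
// (client, tracker, cookie) triple. It returns ("", false) when there is
// no key or no client hash, mirroring the "returns None" case in the
// commit message.
func fakeIDSketch(key []byte, clientHash, registrableTracker, cookie string) (string, bool) {
	if len(key) == 0 || clientHash == "" {
		return "", false
	}
	mac := hmac.New(sha256.New, key)
	// Separator is an assumption; see the lead-in.
	mac.Write([]byte(clientHash + "|" + registrableTracker + "|" + cookie))
	return hex.EncodeToString(mac.Sum(nil))[:32], true
}
```

The property that matters is determinism: the same client always gets the same fake value for a given tracker and cookie, while a different client gets an unrelated one.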
CyberMind
25f6c19586
Merge pull request #666 from CyberMind-FR/feat/662-phase3-parity
feat(#662 Phase 3): Go block/splice decision engine + cross-engine parity harness
2026-06-18 17:38:07 +02:00
6dcf978e66 test(#662 Phase 3): harden parity harness — own-infra false-prefix + comment-kept learned-trackers fixtures (lock the loadLinesRaw vs loadLines divergence) 2026-06-18 17:37:45 +02:00
df052796d9 test(#662 Phase 3): cross-engine parity harness — Python side (source of truth)
tests/test_engine_parity.py loads the SAME parity-fixtures.json + testdata
config from ../secubox-toolbox-ng/testdata, monkeypatches ad_ghost paths
(_ALLOW_PATH/_LEARNED_PATH/_SELF_REGS + cache resets) at the snapshot, and
drives the production decision logic — ad_ghost._allowed + _AD_HOST + the
learned-trackers check composed with splice.should_splice — under the SAME
precedence as Go's Decide. Asserts action == expect for every fixture.

Parity proven: this run caught a real Go↔Python divergence — Go had
comment-stripped learned-trackers, but ad_ghost._learned_set does not; Go was
fixed (loadLinesRaw) to match Python. Python is the source of truth.

test_fixtures_present guards that all four action classes are exercised.
2026-06-18 17:32:38 +02:00
5fc8785d68 test(#662 Phase 3): cross-engine parity harness — Go side + fixtures
testdata/parity-fixtures.json + testdata/config/ snapshot: a FIXED config both
engines load identically. Fixtures cover every action class — static ad host,
learned-tracker, pure-tracker that is also a splice candidate (never wins →
block), own-infra secubox.in subdomain (allow), allowlisted host (allow),
splice-seed + splice-learned hosts (splice), fortknox site in never-set (mitm),
no-false-suffix negative (notdoubleclick.net → mitm), plain site (mitm).

policy_test.go: TestParityDecide loads the fixtures+config and asserts
Decide == expect; TestPolicyActionVerbs checks the legacy action() surface;
TestRegistrable exercises the _registrable port incl. 2-level TLDs.
2026-06-18 17:32:31 +02:00
25a3afaff1 feat(#662 Phase 3): port toolbox BLOCK/SPLICE logic into Go core
Add cmd/sbxmitm/policy.go: a LoadPolicy() layer that reads the SAME on-disk
config the Python addons use (ad-allowlist.txt, learned-trackers.txt,
tls-splice-seed.conf, splice-learned.txt, pure-trackers.txt) with the same
env overrides, plus a unified Decide(host, sni) -> {allow,block,splice,mitm}.

Ports, byte-for-byte against the Python source of truth:
  - _AD_HOST regex (RE2-safe → Go (?i) inline flag, no fallback needed)
  - _registrable incl. the _2L two-level-TLD list
  - splice.host_matches / should_splice (never wins; then seed∪learned)
  - ad_ghost._allowed (own-infra + allowlist ALWAYS win first)

Loader nuance preserved: ad_ghost._learned_set does NOT comment-strip
(machine-generated file), unlike the splice/allowlist loaders — mirrored via
loadLinesRaw vs loadLines so a '#' in learned-trackers is kept verbatim.

Decide precedence: allow > splice (never-set excludes trackers) > block > mitm.

Wire the loaded policy into the PoC CONNECT proxy (replacing the hardcoded
AdHosts/SpliceHosts); action() keeps the legacy 3-verb surface (allow→mitm).
Old TestActionDecision removed (drove the removed hardcoded fields); coverage
moves to the parity harness.
2026-06-18 17:32:20 +02:00
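The precedence stated in 25a3afaff1 (allow > splice, with the never-set excluded > block > mitm) reduces to a small switch. The sketch below uses exact-host map lookups purely for illustration: the real `Decide(host, sni)` matches by suffix, registrable domain, and the `_AD_HOST` regex, and every type and field name here is hypothetical.

```go
package main

// actionSketch is an illustrative stand-in for the four verdicts.
type actionSketch string

const (
	actAllow  actionSketch = "allow"
	actSplice actionSketch = "splice"
	actBlock  actionSketch = "block"
	actMitm   actionSketch = "mitm"
)

// policySketch holds exact-host sets; the real loader reads the on-disk
// list files and matches far more loosely than a map lookup.
type policySketch struct {
	allowed  map[string]bool // own-infra + allowlist
	never    map[string]bool // hosts that must never be spliced
	splice   map[string]bool // seed ∪ learned splice hosts
	trackers map[string]bool // ad hosts + learned/pure trackers
}

// decide applies the documented precedence: allow always wins; splice
// applies only when the host is not in the never-set; then block; and
// everything else is MITM'd.
func (p *policySketch) decide(host string) actionSketch {
	switch {
	case p.allowed[host]:
		return actAllow
	case p.splice[host] && !p.never[host]:
		return actSplice
	case p.trackers[host]:
		return actBlock
	default:
		return actMitm
	}
}
```

Under this ordering a pure tracker that is also a splice candidate is blocked rather than passed through, which matches the fixture described in 5fc8785d68.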
CyberMind
84f0a37fdf
Merge pull request #665 from CyberMind-FR/feat/662-phase2b-bench
feat(#662 Phase 2b): multi-core throughput bench (3.4x at 4 cores)
2026-06-18 17:23:35 +02:00
ca9b38b175 feat(#662 Phase 2b): parallel handshake bench — Go core scales 3.4x at 4 cores (multi-core gate settled) 2026-06-18 17:23:23 +02:00
8a4996d14c docs(#662): Phase 2 bench results — Go PoC proven on arm64 (CA-compat/204/inject/JA4/12MB); throughput gate deferred to controlled bench 2026-06-18 17:19:01 +02:00
CyberMind
da71515d79
Merge pull request #664 from CyberMind-FR/fix/662-restore-ng-source
fix(#662): restore Go PoC source lost to .gitignore
2026-06-18 17:13:42 +02:00
73e79b85b4 fix(#662): restore Go PoC source — .gitignore 'sbxmitm' wrongly ignored cmd/sbxmitm/ dir (anchored to /sbxmitm) 2026-06-18 17:13:28 +02:00
CyberMind
56d1bee9fb
Merge pull request #663 from CyberMind-FR/feature/662-epic-migrate-toolbox-mitm-engine-off-pyt
epic: migrate toolbox MITM engine off Python mitmproxy (gomitmproxy/hudsucker/Squid analysis + phased switch)
2026-06-18 17:09:26 +02:00
6daacb1987 feat(#662 Phase 1): MITM-engine migration analysis + phased plan + compiled/tested Go forging-MITM PoC
Analysis: gomitmproxy (unmaintained, dropped) vs martian/goproxy (Go) vs hudsucker
(Rust) vs Squid+ICAP, mapped to the 18-addon capability set. Recommendation: Go
hot-path core + retained Python analysis sidecars. Phased plan with shadow-run +
nft-DNAT-flip rollback (no big-bang cutover). Phase-1 PoC (packages/secubox-toolbox-ng,
stdlib-only): forge from ca-wg CA, 204-block, body-inject, SNI-splice, ClientHello/JA4
capture — go vet clean, tests green, arm64 cross-compile OK. NOT wired to live R3.
2026-06-18 17:07:48 +02:00
50 changed files with 5050 additions and 9 deletions


@@ -3,6 +3,70 @@
---
## 2026-06-18 — #662 Phase 7: Python R3 engine DECOMMISSIONED + nft persistence
- **nft persistence** (master `eea46326`): the boot re-apply source is the drop-in
`/etc/nftables.d/zz-secubox-toolbox-wg-fanout.nft` (loaded by nftables.service). Edited
it `808x→809x` (live already 809x → zero disruption), `nft -c -f` validated reboot-safe;
patched the repo source `packages/secubox-toolbox/nftables.d/secubox-toolbox-wg-fanout.nft`.
- **Python decommissioned**: `disable --now secubox-toolbox-mitm-wg-worker@{1..4}` +
`-mitm-wg-dynreload.path` → 8081-8084 free, **~240M RAM freed**. Units kept (disabled)
for emergency rollback. **Kept** `secubox-toolbox-mitm.service` (R2 captive-AP mitm on
10.99.0.1:8080 — a different path; the cutover was R3-only). Also pointed the board's
`/usr/share/.../secubox-toolbox-wg-fanout.nft` → 809x so a postinst re-run can't revert
to dead ports.
- **Verified self-sufficient with Python gone**: banner injects on gzip HTML, ads 204,
redirects relayed 301.
- Deliberately did NOT rebuild and reinstall the secubox-toolbox .deb: doing so would cause a
portal-restart blip and a board-wide nft reload for no gain. The repo source is already 809x,
so the next natural build closes the installed-payload drift. **#662 epic complete: Go engine
sole R3 MITM, fast, ~64MB vs ~280-470MB, persistent, ad-block + banner + redirects all correct.**
## 2026-06-18 — #662 R3 CUTOVER to the Go MITM engine (PR #670) — LIVE + banner ported
- **Cutover executed and live.** The Go engine now serves **100% of R3 traffic**,
replacing the Python mitmproxy workers. Found + fixed 4 blockers that made the dark
package unable to serve the live path: (1) it forged with the wrong CA (ca-wg "WG CA"
vs the "R3 CA" clients trust) → now uses the mitmproxy confdir bundle; (2) root-only
key vs non-root user → R3 CA bundle is group-readable; (3) bound 127.0.0.1 vs the
10.99.1.1 DNAT target → now binds 10.99.1.1; (4) ran CONNECT vs transparent → now
`--transparent`. `loadCA` scans PEM blocks by type (combined cert+key bundle).
- **Validated on real arm64 hardware** then rolled out gated: localhost forge against
the real R3 CA → scoped-DNAT transparent capture → **canary slot 3 (~25%, dead-man
armed)** → **widen to 100%**. At 100%: 0 restarts, 0 errors, ~64MB total
(vs Python ~280-470MB), even round-robin, 142 distinct SNIs/75s.
- **Banner ported** (the one regression the user caught — "no more banner but fast").
Go now injects the real loader `<script src="/__toolbox/loader.js" data-mh=.. data-wg=..>`
(guard-idempotent, R3 wg flag, mac_hash identity) and reverse-proxies
`/__toolbox/loader.js`+`/__toolbox/bundle` to the portal (127.0.0.1:8088, fail-open),
keeping bundle/level logic in Python. Verified live: loader injected + assets 200.
- **Rollback** = one `nft replace` (Python workers kept warm). **Persistence gap**: the
nft flip is a live edit, not yet in the drift-managed generator → reboot safely falls
back to Python (workers enabled, banner intact). Phase 7 (decommission Python +
persist nft) deferred to a soak'd follow-up.
## 2026-06-18 — #662 MITM engine migration: P5-prep + P6-prep (PRs #668, #669, all DARK)
- **P5-prep (PR #668).** Wired the ported `Decide`+jar into the Go engine's request/
response handlers: `handleConnect` runs allow/splice/block/mitm; `anonymizeRequest`
(strip operator/re-id headers + DNT/GPC) on every MITM'd flow; cookie-poison gated
to mitm+tracker only (never allow/own-infra; fail-closed-to-clean; benign cookies +
Set-Cookie attrs preserved). New `secubox-toolbox-ng` debian pkg builds an arm64
`.deb` shipping `/usr/sbin/sbxmitm` + a **DISABLED** `worker@.service` on `:809%i`
(no enable/start, no nft). 22 Go tests, reviewed APPROVED.
- **P6-prep (PR #669).** No-traffic build-out of the live transparent path, still DARK.
`machash.go` ports `mac_hash_of`/`_wg_hash_of` (WG peers → `sha256(pubkey)[:16]`,
mtime-cached, fail-open) wired into `clientHashFromConn`, cross-engine parity vs
Python (anti-rig verified). Transparent `SO_ORIGINAL_DST` accept (`--transparent`,
default off): peeks ClientHello SNI WITHOUT decrypting → Decide → **splice = true raw
passthrough** (never `tls.Server`) / else forge via replayable `prefixConn`; upstream
TLS verifies by SNI, pins captured ip:port. Two-stage review caught + fixed a
splice-decrypt defect. Builds linux/arm64+amd64+darwin, vet clean, race green, Python
parity 10 passed. CONNECT path + poison gate byte-unchanged.
- **Engine now functionally complete + packaged, entirely DARK.** Remaining work =
the production DEPLOYMENT phases (shadow → cutover → decommission), which touch live
R3 traffic and are deferred to a deliberate watched session — NOT chained off "go".
## 2026-06-18 — #656 Ad Intelligence (PR #657, toolbox 2.6.56) + splice reverted
- **Ad Intelligence — learn/act/measure.** `ad_ghost` now records every


@@ -0,0 +1,55 @@
# Toolbox MITM engine migration — phased plan (#662)
> Engine: **Go hot-path core + retained Python analysis sidecars** (see analysis doc).
> Discipline: shadow-run before cutover; nft-DNAT flip = instant rollback at every step; NEVER big-bang. This is a multi-PR epic — each phase is its own PR with a gate.
## Invariants (must hold every phase)
- Reuse the existing CA `/etc/secubox/toolbox/ca-wg/{ca.pem,key.pem}` (what R3 clients already trust) — no new CA, no client re-enroll.
- Live R3 keeps running on the Python mitmproxy workers (8081-8084) until the final cutover. The Go core runs on **separate ports (8090-8093)**, no DNAT, until Phase 6.
- Ad-blocking + anti-track must never regress (the whole point of the appliance).
- arm64; one static Go binary; systemd `secubox-toolbox-ng-worker@N`.
## Phase 1 — PoC (THIS PR) — GATE: compiles + smoke test passes
**packages/secubox-toolbox-ng/** (Go module). NOT wired to live R3.
- `go.mod`, `cmd/sbxmitm/main.go`: a forging MITM that loads `ca-wg/{ca.pem,key.pem}`, listens on a port, and demonstrates the discriminating capabilities:
- request short-circuit **204** for a sample ad host (proves ad_ghost block),
- response **body inject** of a marker (proves banner/ad CSS),
- **SNI splice** passthrough for a sample host (proves tls_splice),
- **JA4 ClientHello capture** via a `crypto/tls` shim logging cipher suites/exts (proves the Go JA4 gap is closable).
- Smoke test (`make test` / a shell script): build for host, run, `curl -x`/transparent a request through it, assert the 204 + the injected marker + a JA4 line.
- `README.md`: build (`GOOS=linux GOARCH=arm64 go build`), the capability map, and the phase roadmap.
- **No deb packaging, no board deploy, no DNAT.** Pure de-risking spike.
## Phase 2 — arm64 build + board bench (no traffic) — GATE: forge+throughput ≥ mitmproxy
- CI/build: cross-compile arm64 static binary; debian packaging stub `secubox-toolbox-ng` (binary + systemd unit, unit DISABLED).
- Deploy the binary to gk2, run on :8090 (no DNAT). Bench: cert-forge latency (cold/warm), req/s, multi-core CPU under synthetic load vs a mitmproxy worker. Confirm it reuses ca-wg certs (client trusts forged leaf).
## Phase 3 — hot-path feature parity — GATE: parity tests green
Port the cheap per-request rewrites into the Go core, reading the SAME data files:
- block 204 from `_AD_HOST`-equivalent + learned-trackers.txt + pure-trackers.txt, with `ad-allowlist.txt` + own-infra guard (#658) honored.
- header/cookie strip (utiq/protective/anonymize), XFF.
- serve `/__toolbox/loader.js` + `/__toolbox/bundle`; banner inject (buffer + streaming).
- SNI splice from the media seed + learned-splice (the safe, no-auto-promote version).
- Parity harness: feed recorded request/response fixtures to both engines, diff the block/inject/strip decisions.
## Phase 4 — analysis sidecars + anti-track poison — GATE: sidecar contract tests
- Go core fires unix-socket events (fire-and-forget) to the EXISTING Python services for social-graph / dpi / cookies / avatar / soc / ja4-scoring — reuse their socket contracts; they stay Python, off the hot path.
- Port the deterministic anti-track **HMAC jar + Set-Cookie forge** to Go (small, security-critical → exhaustive tests vs the Python `privacy.py` jar output for identical inputs).
- Contextual ad metrics (ad_block_stats / per-visitor) written by a sidecar or the Go core's bg writer.
## Phase 5 — SHADOW run — GATE: N-day output parity, zero client breakage
- Run the Go core on :8090-8093. Mirror a SMALL fraction of R3 (e.g. one fanout slot, or a passive tee) to it; compare its would-block/would-inject/recorded against the live mitmproxy for the same flows. Do NOT serve clients from it yet.
- Soak; review divergences; fix; repeat until parity.
## Phase 6 — CUTOVER — GATE: soak, instant rollback ready
- Flip the nft `numgen inc mod 4` fanout from 8081-8084 (mitmproxy) → 8090-8093 (Go core). Keep the mitmproxy workers RUNNING (stopped from receiving DNAT, but up) so rollback = flip the map back (seconds).
- Soak under real load; watch ad-blocking, banner, anti-track, JA4, latency, CPU.
## Phase 7 — decommission — GATE: stable post-cutover window
- Stop/disable the mitmproxy workers; keep the package installed (rollback) for one release, then remove.
## Rollback
At every phase the live path is the mitmproxy workers until Phase 6's DNAT flip; Phase 6 rollback is an nft map edit (seconds). No phase removes the fallback until Phase 7.
## Effort/risk (honest)
Weeks across 7 PRs. Highest-risk areas: JA4-in-Go (de-risked in Phase 1), the anti-track poison port (Phase 4, exhaustively tested), and the cutover (Phase 6, shadow-gated + instant rollback). Recommend pausing after each gate for review.


@@ -0,0 +1,120 @@
# Toolbox MITM engine migration — analysis (gomitmproxy / martian·goproxy / hudsucker / Squid+ICAP)
- **Date:** 2026-06-18 · **Issue:** #662 · **Status:** analysis + recommendation
## Why
The R3 path runs Python **mitmproxy**: GIL-bound, ~1 core total across 4 workers,
the tunnel's CPU/latency ceiling (#646). Goal: a multi-core engine **without
losing the 18-addon feature set**. TLS termination was never the bottleneck —
the single-thread L7 work is — so a bare TLS proxy is a non-starter (loses every
feature). The only worthwhile target is a faster **L7 engine** that re-implements
the inline logic.
## The real requirement: our 18 addons' capabilities
| # | Addon | Capability it needs |
|---|-------|---------------------|
| 1 | inject_xff | requestheaders: set XFF from real peer IP |
| 2 | utiq_defense | requestheaders: detect/strip operator (Utiq) headers; short-circuit |
| 3 | protective_mode | requestheaders: strip tracker headers/cookies, spoof |
| 4 | privacy_guard (anti-track v2) | **request 204 / forge Set-Cookie (HMAC jar) / strip headers**; classify; file+key reads |
| 5 | ad_ghost | request **204** + candidate/per-visitor capture; response **CSS body inject**; allowlist; bg SQLite |
| 6 | media_cache | response synthesis from disk cache (range) |
| 7 | local_store | **tls_clienthello** read + async SQLite |
| 8 | social_graph | response cookie-id correlation + **body peek** + SQLite |
| 9 | inject_banner | request short-circuit **serve** /__toolbox/*; **streaming** body inject + buffered inject; CSP detect |
| 10 | dpi | async fire-and-forget POST (unix socket) |
| 11 | cookies | response Set-Cookie read → async POST |
| 12 | avatar | UA → async POST |
| 13 | ja4 | **raw TLS ClientHello** (cipher suites, extensions, ALPN) |
| 14 | soc_relay | events → async POST |
| 15 | cert_pin_detect | **TLS handshake-error** hook → learn ignore_hosts |
| 16 | media_stats | response headers → stats |
| 17 | tls_splice | **tls_clienthello SNI → connection passthrough** (ignore_connection) |
| 18 | (dpi dup/util) | — |
Capability buckets that discriminate the engines:
- **(C)** request short-circuit (return 204/synth without upstream) — ad_ghost, privacy_guard, inject_banner, media_cache.
- **(E)** **streaming** response body rewrite (inject into first chunk, no buffering) — inject_banner TTFB path.
- **(G)** **raw ClientHello introspection** for JA4 — ja4, local_store.
- **(H)** **TLS-layer SNI passthrough/splice** — tls_splice, cert_pin_detect, bypass list.
- **(I)** TLS handshake-error hook — cert_pin_detect.
- **(J)** async side-effects (socket POST / bg SQLite) — 7 addons.
## Engine assessment
### gomitmproxy (Go, AdguardTeam) — DROP
Purpose-built for ad-blocking MITM, but **last release v0.2.1 (2021), effectively
unmaintained**. Reusing an abandoned TLS-handling core for a security appliance
is the wrong bet. Cross off.
### martian (Google) / goproxy (elazarl) — Go, maintained
- Strong on **B/C/D/F/J** (modifier/handler APIs return custom responses, modify
headers/cookies/body; goroutines for async). Easy **arm64 cross-compile**
(`GOOS=linux GOARCH=arm64`), single static binary — great fit for the appliance.
- **Gaps:** **(G) JA4** — both abstract TLS at the HTTP layer; raw ClientHello
isn't exposed by the modifier API. *Workaround:* wrap the listener with our own
`crypto/tls` `Config.GetConfigForClient`/`GetCertificate` to capture the
ClientHello before handing to the proxy — feasible, extra code. **(E) streaming
inject** is manual (wrap the response body reader). **(H/I)** host-level
splice/cert-error handling is doable at the CONNECT layer.
- Verdict: pragmatic, lowest-friction toolchain, but JA4 + streaming need custom
glue.
### hudsucker (Rust, omjadas + ideamans fork) — maintained
- **Best technical coverage:** tokio/hyper async (**multi-core**), `HttpHandler`
(C/D/F), **streaming bodies (E)** native, WebSocket. Critically, **rustls
exposes the ClientHello** (Acceptor/`ClientHello` peek pre-handshake) → **JA4
(G) is clean**, and SNI-based **splice (H)** is natural.
- **Costs:** Rust **arm64 cross-compile friction** (no toolchain here; needs
`cross`/musl setup), and porting 18 addons + the anti-track HMAC-jar/classify
brain to Rust is the **highest re-implementation + re-validation effort**.
- Verdict: technically the strongest (only one covering JA4 + streaming cleanly),
but the heaviest port + ops.
### Squid + ssl-bump + ICAP — mature C, multi-process
- **Native wins:** ssl-bump forges from one root key (A), **peek-and-splice (H)
is literally tls_splice + the bypass list**, native cert-error handling (I),
multi-process scaling. ICAP REQMOD/RESPMOD covers **C/D/F** (204, body rewrite,
header/cookie mod) — ad_ghost/banner-buffer/poison can live in an ICAP service.
- **Gaps:** **(E) streaming** inject — ICAP buffers, no first-chunk inject.
**(G) JA4** — ICAP is post-decrypt HTTP; ClientHello isn't exposed to ICAP
(Squid logs its own TLS details, not via ICAP). Heavy **ops/config**; each ICAP
call is a round-trip; the anti-track HMAC-jar/poison + social-graph logic in an
ICAP service is awkward (still Python, still off-core for analysis).
- Verdict: least *custom proxy* code + native splice/cert handling, but loses
JA4 + streaming-banner and trades Python addons for Squid-config + an ICAP
service. Good if we drop JA4/streaming; otherwise a poor fit.
## Recommendation — **Go hot-path core + retained Python analysis sidecars** (hybrid)
Single-engine "rewrite everything in Rust" is the highest risk; Squid loses JA4 +
streaming. The lowest-risk path to multi-core that **preserves the
security-validated Python brain**:
1. **Go core** (goproxy/martian or a thin `net/http`+`crypto/tls` forging proxy)
owns the **hot path**: TLS forge (reusing `ca-wg`), SNI splice (H), the cheap
per-request rewrites — block 204 (ad_ghost/privacy_guard), header/cookie strip
(utiq/protective/anonymize), banner inject (E via body-reader wrap), serve
/__toolbox/*. Multi-core, one static arm64 binary.
2. **JA4 (G)** in Go via a `crypto/tls` ClientHello-capture shim (no Python).
3. **Heavy/off-path analysis stays Python sidecars** the Go core feeds
fire-and-forget over unix sockets (J): social-graph correlation, classify,
DB/report writers, SOC/DPI relays. These are already async + off the hot path,
so they don't need to be fast — and we DON'T re-validate the anti-track
HMAC-jar/poison + cookie-graph security logic in a new language.
4. The anti-track **poison** (forge Set-Cookie from the HMAC jar) is hot-path +
security-critical → port the *deterministic* jar/forge to Go (small, testable),
keep classify (which list a host is on) as data the Go core reads from the
learned/pure files (already file-based).
This gets multi-core on the hot path, keeps the risky brain in validated Python,
and only ports the small, mechanical, hot pieces. If JA4-in-Go proves painful, the
fallback is **hudsucker** (Rust) for the core (clean JA4) at higher port cost.
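The fire-and-forget feed to the sidecars (bucket J) might look roughly like the sketch below. It assumes a bounded in-memory queue and newline-delimited events written to a unix socket; the `Emitter` type, its methods and the framing are hypothetical, since the analysis only says "POST over unix socket". The point it illustrates is that the request path never waits on a sidecar: a full queue drops the event.

```go
package main

import (
	"net"
	"time"
)

// Emitter queues events for an analysis sidecar without blocking the hot path.
// Hypothetical type, for illustration only.
type Emitter struct {
	queue chan []byte
}

// NewEmitter returns an Emitter that buffers at most capacity events.
func NewEmitter(capacity int) *Emitter {
	return &Emitter{queue: make(chan []byte, capacity)}
}

// Emit enqueues ev and returns immediately. It reports false when the queue is
// full, in which case the event is dropped instead of delaying the client.
func (e *Emitter) Emit(ev []byte) bool {
	select {
	case e.queue <- ev:
		return true
	default:
		return false
	}
}

// Run drains the queue into the unix socket at path, one short-lived
// connection per event. Delivery errors are ignored on purpose.
func (e *Emitter) Run(path string) {
	for ev := range e.queue {
		conn, err := net.DialTimeout("unix", path, time.Second)
		if err != nil {
			continue // sidecar down: drop the event, never stall the proxy
		}
		_ = conn.SetWriteDeadline(time.Now().Add(time.Second))
		_, _ = conn.Write(ev)
		_, _ = conn.Write([]byte{'\n'})
		_ = conn.Close()
	}
}
```

A worker would call `go emitter.Run(socketPath)` once at startup and `emitter.Emit(event)` from the request handlers.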
## Honest effort/risk
- **Weeks, multi-PR.** 18 addons; security-critical; production board.
- Must **shadow-run** the new core alongside mitmproxy (mirror a fraction of R3
traffic) and compare before any cutover. **Never** big-bang.
- Rollback = the nft fanout still points at the mitmproxy workers until the final
cutover flips the DNAT to the Go core's ports.
See the phased plan: `docs/superpowers/plans/2026-06-18-mitm-engine-migration.md`.


@ -0,0 +1,62 @@
# MITM engine migration — Phase 2 bench results (#662)
- **Date:** 2026-06-18 · ran the Phase-1 Go PoC on gk2 (arm64), `127.0.0.1:8090`,
**no DNAT** (zero impact on live R3, which stayed on the mitmproxy workers).
## Proven on the real arm64 board (with the live `ca-wg` CA)
| Check | Result |
|-------|--------|
| Static arm64 binary | 5.4 MB, `ELF aarch64`, CGO_ENABLED=0 — runs natively on gk2 |
| **CA-compat forging** | `curl -x :8090 --cacert ca.pem https://example.com/` → **200**; the forged leaf (signed by the existing `ca-wg` CA) is trusted, so R3 clients would trust it without re-enrolling |
| **MITM + body inject** | injected `<!-- sbx-ng banner -->` marker present in the HTML |
| **204 block** | `https://doubleclick.net/` → **204** (ad_ghost path) |
| **JA4 capture** | live: `t0304_c31_ah2_sni=example.com` (TLS1.3 / 31 ciphers / ALPN h2 / SNI) — the `ja4` addon's material is reachable in Go on arm64 |
| **Footprint** | **~12 MB RSS** vs Python mitmproxy ~70-117 MB per worker |
So every **discriminating capability** the analysis flagged (CA-compat, request-204,
body-inject, SNI-splice, JA4) works on the actual hardware, at ~1/6th the memory.
## Gate: "forge + throughput ≥ mitmproxy" — PARTIAL
- **Forge:** ✅ proven (CA-compat, cached per host, fast).
- **Footprint:** ✅ ~12 MB (far below mitmproxy).
- **Throughput / multi-core:** ⚠️ **not cleanly measured.** The instantaneous-CPU
sample was cut short by (a) a transient `wg-admin` ssh blip and (b) a
`pkill -f sbxmitm` self-match bug (the kill matched its own ssh shell). Multi-core
is **structurally guaranteed** — Go runs `GOMAXPROCS=4` with no GIL, vs Python
mitmproxy capped ~1 core/worker — but a rigorous throughput-vs-mitmproxy
comparison must be done in a **controlled load environment**, NOT by hammering
the production board.
## Phase 2b — controlled multi-core throughput bench (SETTLES the gate)
`BenchmarkHandshake` (cmd/sbxmitm/bench_test.go) drives full client↔proxy forged
TLS handshakes in parallel at `-cpu=1,2,4,8` (dev box, warm forge cache):
| Cores | ns/handshake | handshakes/s | scaling |
|-------|--------------|--------------|---------|
| 1 | 398,895 | ~2,510 | 1.00× |
| 2 | 204,116 | ~4,900 | **1.95×** |
| 4 | 117,307 | ~8,520 | **3.40×** |
| 8 | 86,999 | ~11,490 | 4.58× |
Near-linear to 2 cores and **3.40× at 4 cores** (gk2's core count, i.e. 85%
parallel efficiency): the Go core's throughput **scales with cores**, whereas a
GIL-bound Python mitmproxy worker stays ~1 core regardless. Note that the 3.4×
figure is relative to the Go core's own single-core rate on the dev box; a Python
worker was not benchmarked here. Even single-core, ~2,510 handshakes/s dwarfs the
toolbox's real load (a few clients).
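The derived columns of the table follow directly from the measured ns/handshake. The helper names below are hypothetical and only restate the arithmetic: rate is 10^9 divided by ns/op, scaling is the single-core time divided by the N-core time, and efficiency is scaling divided by core count.

```go
package main

import "fmt"

// perSecond converts a benchmark's ns/op into operations per second.
func perSecond(nsPerOp float64) float64 { return 1e9 / nsPerOp }

// speedup is the scaling factor relative to the single-core run.
func speedup(baseNs, ns float64) float64 { return baseNs / ns }

// efficiency is the fraction of ideal linear scaling actually achieved.
func efficiency(baseNs, ns float64, cores int) float64 {
	return speedup(baseNs, ns) / float64(cores)
}

// row formats one line of the scaling table from raw ns/handshake figures.
func row(cores int, baseNs, ns float64) string {
	return fmt.Sprintf("%d cores: ~%.0f handshakes/s, %.2fx, %.0f%% efficiency",
		cores, perSecond(ns), speedup(baseNs, ns), 100*efficiency(baseNs, ns, cores))
}
```

Calling `row(4, 398895, 117307)` reproduces the 4-core line of the table to rounding, and `row(8, 398895, 86999)` shows efficiency falling off past the board's core count.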
## Conclusion (Phase 2 + 2b)
Migration premise **validated on real hardware**: CA-compat + all L7/TLS
discriminators + ~12 MB footprint (arm64) + **multi-core throughput scaling**
(3.4× at 4 cores). The big unknowns are answered; what remains is
mechanical-but-large porting (Phase 3+) + a gated cutover.
## Ops note
The PoC was localhost-only (`127.0.0.1:8090`), no DNAT, cleaned up (`fuser -k
8090/tcp` + binary removed). LESSON: never `pkill -f <name>` over ssh when `<name>`
appears in the remote command line — it kills its own shell; use `fuser -k
<port>/tcp` or `pgrep | grep -v $$` + kill-by-PID.
## Next
Phase 2 + 2b gates PASSED. → **Phase 3** (hot-path feature parity: port block/
inject/strip/splice reading the real data files, parity harness vs the Python
addons). Pause for review before committing to the port — see the phased plan.


@ -0,0 +1,173 @@
# WAF engine migration — feasibility analysis (#662 follow-on)
> Status: ANALYSIS ONLY. No code, no plan, nothing touched on the live WAF.
> Question asked: *"can the #662 Go-engine technique be adapted to the WAF?"*
> Date: 2026-06-18. Sibling of `2026-06-18-mitm-engine-migration-analysis.md`.
## TL;DR
Technically yes — and the hardest part of #662 (cert forging / transport / CA
trust) **does not exist** for the WAF, because HAProxy already terminates TLS and
hands mitmproxy cleartext. But the right move is **NOT** to hand-roll a Go WAF the
way we hand-rolled the R3 engine. The WAF's decision logic is security-critical and
synchronous (block-before-forward), which is exactly where bespoke code is most
dangerous. The recommendation is to **ADOPT** a vetted engine (OWASP Coraza + CRS v4)
rather than port our bespoke regex rules, and — if the non-WAF addons can be
relocated — to **retire the in-path mitmproxy entirely** via HAProxy's SPOA, which
also eliminates the WAF's worst failure mode (the single-backend SPOF that "downs all
inspected vhosts").
Crucially, **the perf premise is weaker than #662's.** #662 had a measured CPU/latency
ceiling on the R3 tunnel. The WAF is *not* currently throughput-bound. So the
justification here is **resilience + security coverage + fewer band-aids**, not raw
speed. Be honest about that when deciding whether it's worth the risk.
---
## 1. What the WAF actually is (grounded, repo + live board)
- **Reverse-proxy inspector**, not a transparent/forward MITM like R3. Path:
external client → **HAProxy `*:443 ssl` (TLS 1.3 termination)** → cleartext HTTP →
**mitmproxy `--mode regular` in the `mitmproxy` LXC (`10.100.0.60:8080`)**
backend vhosts. HAProxy rewrites to absolute-form (`set-uri http://Host/path`) so
the forward-proxy accepts it.
- **No TLS / no cert machinery on the WAF side.** mitmproxy never decrypts, never
forges, holds no CA. (This removes the entire hard half of the #662 port.)
- **Hot path (every request), deterministic:** host→backend dict lookup
(live-reloaded from `/srv/mitmproxy/haproxy-routes.json`, 255 entries, 187 routed
through inspection), then a single linear **regex scan** over
`path+query+body+UA` against `waf-rules.json` (~90+ patterns: sqli/xss/cmdi/
traversal/ssrf/xxe/log4shell/scanners/cve…), first-match-wins. Block = set
`flow.response` to short-circuit → **synchronous, decide-before-forward**.
- **Enforcement is graduated and mostly soft:** 1st/2nd hit → 403 *warning page*;
3rd hit in 300 s (`BAN_THRESHOLD=3`) → ban via **CrowdSec LAPI** (`POST /v1/alerts`,
JWT watcher) → `crowdsec-firewall-bouncer` drops at nft. The CrowdSec POST is a
**synchronous `urllib` call (~up to 4 s) inside the request hook** — the clearest
GIL/latency smell, trivially a goroutine in Go.
- **Stateful bits are small:** per-IP sliding-window dict (in-memory, lost on
restart; hit 1500+ entries under attack). Everything else is stateless.
- **Three NON-WAF addons ride the same proxy:** `media_cache.py` (#607 disk cache for
owned-vhost media), `cookie_audit.py` (RGPD Set-Cookie ledger, observational),
and CDN **banner injection** (`response` hook, injects `<script>` before `</body>`
on owned vhosts). These do **traffic transformation / caching** — a verdict-only
WAF (SPOA) would not cover them; their fate must be decided (relocate, drop, or
keep a thin in-path component).
- **Two synced package copies:** `packages/secubox-mitmproxy/` (canonical, 1193-line
addon, CrowdSec bridge + watchdog + FastAPI control) and the legacy
`packages/secubox-waf/` (968-line, ships `wafctl` + the LXC unit). Sync-lag is a
known liability (`.claude/TODO.md`).
## 2. Live performance — the decisive datum
| Metric (gk2, read-only) | Value |
|---|---|
| mitmproxy | 11.0.2 / Py 3.11 / **single process, single asyncio loop** (no multi-core) |
| Request volume | **~3.6 req/s** sustained (mostly internet scanner probing) |
| WAF CPU | **~17-53% of ONE core** (clean Δ ≈ 17%); ~5050 CPU-s over 12 d, niced |
| Board load avg | ~3.5 on 4 cores — board near-saturated overall, WAF a minority |
| Inspected vhosts | 187 of 255 routes, **one `mitmproxy_inspector` backend** |
| Hardening band-aids | `MemoryMax=512M`, `RuntimeMaxSec=21600` (6 h forced restart), `http2=false`, loop-guard, `Connection: close` (FD-leak fix), nft pre-rate-limit, watchdog (lxc-restart on 3 probe fails) |
**Conclusion:** at today's load a rewrite is **not justified by throughput** — the
WAF isn't pegging its core. The real motivations are: (1) the **single-threaded
ceiling under attack/burst** (saturates ~7-10 req/s on the inspected path; a scan
flood serializes through one loop), (2) the **single-backend SPOF** — with
`waf_enabled`, *all* vhosts + the default route funnel through one inspector, so its
death = board-wide 503 (the watchdog only turns a multi-hour outage into a ~3-min
one), (3) the **resource pathologies** (FD/conn-pool leak, HTTP/2 memory drift)
papered over by restarts. The project's own `.claude/PHASE-7-WAF-ROADMAP.md` already
says it: *"mitmproxy is NOT a WAF tool… ModSec ~5× throughput of Python mitm."*
## 3. Why the #662 playbook only half-applies
| #662 (R3 anti-track) | WAF |
|---|---|
| Forward/transparent MITM, forges certs, CA trust, SO_ORIGINAL_DST — **hard** | Reverse proxy, **HAProxy already terminates TLS**, cleartext in — **easy** |
| Decisions can be **async** (poison cookies fire-and-forget) | Decisions are **synchronous** (block before forward) — can't sidecar the verdict |
| Feature-set was **bespoke** → hand-port justified | Detection is **generic WAF rules** → a vetted CRS exists → **adopt, don't port** |
| Bug = degraded browsing (annoying) | Bug = **outage of all vhosts OR a security bypass** — far higher bar |
| Clear measured perf ceiling drove it | **Not throughput-bound today** — weaker perf case |
So: transport is easier, but the part #662 deliberately kept in Python (the "risky
brain") **is** the WAF's core and is on the synchronous critical path. The lesson is
inverted: for R3 we built; for the WAF we should **adopt the engine** and only write
thin glue.
## 4. Options (build-vs-adopt)
**Option A — HAProxy + `coraza-spoa` + CRS v4 (RECOMMENDED, if addons relocatable).**
Keep HAProxy as-is; attach OWASP **Coraza** (CRS v4) as a **SPOA/SPOE agent**.
HAProxy sends each request to the agent, **blocks for the verdict**, applies
`http-request deny deny_status 403 if { var(txn.coraza.action) -m str deny }`. Pure-Go, clean
arm64 (`CGO_ENABLED=0`). **Retires the in-path mitmproxy → eliminates the SPOF**
(traffic no longer flows *through* the inspector; the agent is out-of-band, in-line
only for the verdict). Adopts a community-vetted ruleset instead of our bespoke
regex. *Gaps:* SPOA returns a **verdict only — no traffic transformation**, so
banner-injection / media-cache / cookie-audit must move elsewhere or be dropped.
*Risks:* `coraza-spoa` is **0.x (v0.7.2, 2026-05)**, no named prod adopters → pin +
benchmark on arm64; **HAProxy 3.1+ requires `mode spop`** for the SPOA backend →
check the board's HAProxy version before wiring.
**Option B — Go reverse-proxy embedding Coraza (`coraza/v3` `http.WrapHandler`).**
A single Go binary replaces mitmproxy *in-path* (`net/http/httputil.ReverseProxy` +
Coraza). Keeps the in-path model → can still do banner/cache/transformation, and
gets multi-core + bounded memory + no FD leak. Still **adopts** the engine + CRS;
only the proxy glue is bespoke. *Cost:* ReverseProxy footguns (bounded body
buffering, Content-Length resync, error/upgrade handling) need a real PoC test
suite; still an in-path component (SPOF remains, but a robust Go one).
**Option C — CrowdSec AppSec component (Coraza inline).** CrowdSec's AppSec
component *is* Coraza inline; since we already integrate CrowdSec (LAPI bridge), this
could deliver the inline WAF as a CrowdSec component and unify the stack. Worth
scoping against A.
**Option D — REJECT: hand-roll a Go WAF engine / port the bespoke regex rules.** The
"don't roll your own crypto" rule applies to WAF rulesets. Bespoke signatures miss
generic/0-day-class detection that CRS anomaly-scoring is built for, and carry a
permanent FP-tuning + CVE-tracking burden. Also reject the dead `spoa-modsecurity`
(ModSecurity v2, EOL 2024).
## 5. CSPN angle
The project targets ANSSI CSPN. Adopting **OWASP CRS v4** (a flagship, test-suite-
covered ruleset) is far more defensible for certification than bespoke regex, and a
formal SPOA verdict + an explicit **fail-open vs fail-close** SPOE policy is a clean,
auditable security-decision boundary. (Current bespoke WAF = warn-pages + 3-strike
CrowdSec ban; CRS gives graduated anomaly scoring with documented paranoia levels.)
## 6. Recommendation + gated next steps (NOT started)
**Recommendation:** ADOPT Coraza + CRS v4. Prefer **Option A (SPOA, retire mitmproxy,
kill the SPOF)** if banner/cache/cookie-audit can be relocated; fall back to
**Option B (in-path Go + embedded Coraza)** if traffic transformation must stay
in-path. Do **not** hand-roll the engine or port the regex rules.
Proposed gated plan, more conservative than #662 (security-critical + SPOF):
1. **Decide the addon fate** (banner / media-cache / cookie-audit): relocate, drop,
or keep a thin in-path component → this picks A vs B.
2. **Check the board's HAProxy version** (SPOE 2.x vs 3.1 `mode spop`).
3. **PoC, detect-only, SHADOW:** run coraza-spoa (or the Go+Coraza proxy) in
**detection-only** mode against a mirror/copy of real traffic; **compare its
verdicts to the current regex WAF** on the same requests (false-pos / false-neg
delta). Serve no clients.
4. **arm64 benchmark** (latency added per request, body-size cost, burst behaviour).
5. **CRS tuning pass** on real traffic in detect-only (FP elimination, paranoia
level) before any blocking.
6. **Canary ONE low-risk vhost** through the new path with the old WAF as instant
fallback; watch; widen; then retire the mitmproxy inspector.
**Honest framing for the go/no-go:** if the goal is "the WAF is slow," the data says
it isn't (yet) — don't take the risk. If the goal is **resilience (kill the SPOF,
end the FD-leak/memory restarts, multi-core burst headroom) + better/auditable
detection coverage (CRS) for CSPN**, then Coraza+CRS via SPOA is a strong, mostly-
*adopt* move with a contained bespoke surface — a very different risk profile from
the #662 hand-roll.
## Sources
Repo: `packages/secubox-mitmproxy/addons/secubox_waf.py`, `data/waf-rules.json`,
`packages/secubox-haproxy/sbin/haproxyctl`, `packages/secubox-waf/systemd/
mitmproxy.service`, `.claude/PHASE-7-WAF-ROADMAP.md`. Live: gk2 read-only
(mitmproxy 11.0.2, 3.6 req/s, ~17-53% of one core, 255 routes/187 inspected, HAProxy
TLS-term → cleartext). External (2025-26): OWASP Coraza v3.7 / coraza-spoa v0.7.2 /
coraza-coreruleset (CRS v4.25 LTS), HAProxy SPOE + 3.1 `mode spop`, CrowdSec AppSec
in-band/out-of-band, ngrok in-process Coraza.

packages/secubox-toolbox-ng/.gitignore vendored Normal file

@ -0,0 +1,12 @@
/sbxmitm
*.test
cmd/sbxmitm/sbxmitm
# Debian build artifacts (rules builds the binary + go caches in-tree)
/_gocache/
/_gopath/
/debian/.debhelper/
/debian/files
/debian/*.substvars
/debian/secubox-toolbox-ng/
/debian/debhelper-build-stamp
/debian/*.debhelper.log


@ -0,0 +1,60 @@
# secubox-toolbox-ng — Go MITM engine (migration spike, #662 Phase 1)
De-risking PoC for migrating the R3 toolbox MITM engine off Python **mitmproxy**
(GIL-bound, ~1 core) onto a multi-core **Go** core, **without losing the
18-addon feature set**. See:
- Analysis: `docs/superpowers/specs/2026-06-18-mitm-engine-migration-analysis.md`
- Phased plan: `docs/superpowers/plans/2026-06-18-mitm-engine-migration.md`
> **Status: Phase 1 — PoC only. NOT wired into the live R3 path.** The live
> tunnel still runs on the Python mitmproxy workers (8081-8084). This binary is
> a standalone CONNECT-proxy spike that proves the risky capabilities.
## What the PoC proves (the discriminating risks from the analysis)
- **CA-compat forging** — loads the *existing* `ca-wg/{ca.pem,key.pem}` and forges
per-host leaf certs the R3 clients already trust (no re-enroll). Cached per host.
- **request 204** — short-circuit block (ad_ghost / privacy_guard).
- **response body inject** — marker before `</head>`/`</body>` (banner / ad-CSS).
- **SNI splice** — raw passthrough, no MITM, by SNI suffix (tls_splice).
- **JA4 material capture**: `crypto/tls` `GetCertificate` receives the
`ClientHelloInfo` (SNI, cipher suites, ALPN, TLS versions) → proves the `ja4`
addon's handshake fingerprint is reachable in Go (full JA4 extension-hash needs
a raw-ClientHello peek — Phase 4).
All stdlib (no external modules → builds offline). Tests are network-free
(localhost handshake + temp self-signed CA).
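As an illustration of the "by SNI suffix" splice decision, a matcher could look like the sketch below. It is not the PoC's code: `shouldSplice` is a hypothetical name, and matching on label boundaries (so `bank.example` does not also match `notbank.example`) is an assumption about the intended semantics.

```go
package main

import "strings"

// shouldSplice reports whether the connection for sni should be passed through
// untouched (no MITM) because the host equals, or is a subdomain of, one of
// the configured suffixes.
func shouldSplice(sni string, suffixes []string) bool {
	host := strings.ToLower(strings.TrimSuffix(sni, "."))
	if host == "" {
		return false
	}
	for _, suf := range suffixes {
		suf = strings.ToLower(strings.Trim(suf, "."))
		if suf == "" {
			continue // ignore empty entries rather than matching everything
		}
		if host == suf || strings.HasSuffix(host, "."+suf) {
			return true
		}
	}
	return false
}
```

The check runs on the SNI read from the ClientHello, before any certificate is forged, so a spliced host never sees the proxy's CA.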
## Build & test
```sh
cd packages/secubox-toolbox-ng
go test ./... # network-free PoC tests
GOOS=linux GOARCH=arm64 go build -o sbxmitm ./cmd/sbxmitm # appliance target
```
## Try it (CONNECT proxy, against the board CA)
```sh
./sbxmitm --ca-cert /etc/secubox/toolbox/ca-wg/ca.pem \
--ca-key /etc/secubox/toolbox/ca-wg/key.pem --listen :8090
curl -x localhost:8090 --cacert /etc/secubox/toolbox/ca-wg/ca.pem https://doubleclick.net/ # → 204
curl -x localhost:8090 --cacert /etc/secubox/toolbox/ca-wg/ca.pem https://example.com/ # → body has the sbx-ng marker
# logs print `ja4 t0304_cNN_a... sni=...` per handshake
```
## Capability → engine map (recap)
Go covers request-204 / body-rewrite / header-cookie-mod / splice / async-sidecars
cleanly; JA4 needs the ClientHello shim (proven here); streaming inject + the
anti-track HMAC-jar/poison port land in Phase 3/4. Heavy analysis (social-graph,
classify, DB/report writers) stays in the existing Python sidecars, fed
fire-and-forget over unix sockets.
## Roadmap (do NOT cut over without the gates)
1. ✅ PoC (this) — forge + 204 + inject + splice + ClientHello capture, compiled + tested.
2. arm64 packaging + board bench on :8090 (no DNAT) — forge/throughput vs mitmproxy.
3. Hot-path feature parity (block lists + allowlist + own-infra guard, header/cookie strip, banner, splice) — parity harness vs the Python addons.
4. Analysis sidecars (unix-socket fire-and-forget) + anti-track HMAC-jar/forge port (exhaustively tested vs `privacy.py`).
5. **Shadow run** — mirror a fraction of R3, compare outputs. No client served yet.
6. **Cutover** — flip nft `numgen` fanout 8081-8084 → 8090-8093; mitmproxy stays up for instant rollback.
7. Decommission mitmproxy after a stable soak.
Rollback is an nft DNAT-map edit at every step; the Python engine is the live
path until Phase 6.


@ -0,0 +1,167 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: transparency-banner loader inject (#662)
//
// Ports the LIVE transparency-banner injection from the authoritative Python
// addon (../secubox-toolbox/mitmproxy_addons/inject_banner.py) into the Go
// engine. With stream_inject ON the Python addon injects a tiny LOADER
// <script src="/__toolbox/loader.js" data-mh=.. data-wg=.. async></script> and
// SERVES /__toolbox/loader.js + /__toolbox/bundle itself for ANY origin (the
// injected same-origin URL resolves to whatever MITM'd host the client is on).
//
// To avoid re-porting the bundle/level business logic to Go, this engine
// REVERSE-PROXIES /__toolbox/* to the portal (default http://127.0.0.1:8088),
// which already serves both endpoints. The injection (injectLoader) mirrors the
// Python _loader_script + _LoaderInjector byte-for-byte on the tag shape and
// placement; the guard makes it idempotent (matches Python _GUARD).
//
// Pure standard library — no external modules.
package main
import (
	"bytes"
	"io"
	"log"
	"net/http"
	"strings"
	"time"
)
// bannerGuard matches the Python _GUARD ("__GONDWANA_MITM_BANNER__"): an HTML
// comment marker that makes injection idempotent across stream chunks / repeat
// passes. If the body already contains it, we never inject again.
const bannerGuard = "__GONDWANA_MITM_BANNER__"
// asciiOnly drops every non-ASCII byte from s, mirroring the Python
// `s.encode("ascii", "ignore")` used on the client hash before it lands in the
// data-mh attribute. The clientHash is normally a hex mac_hash (already ASCII),
// but a non-WG fallback could carry odd bytes — strip defensively.
func asciiOnly(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for i := 0; i < len(s); i++ {
		if s[i] < 0x80 {
			b.WriteByte(s[i])
		}
	}
	return b.String()
}
// loaderScript builds the loader <script> tag EXACTLY like the Python
// _loader_script: a guard comment followed by the same-origin loader.js tag
// carrying the client identity (data-mh) + WG flag (data-wg). wg → "1" else "0";
// clientHash is ascii-sanitised. The src is same-origin so it resolves to the
// MITM'd host and is intercepted by the /__toolbox/* short-circuit.
func loaderScript(clientHash string, wg bool) []byte {
	wgVal := "0"
	if wg {
		wgVal = "1"
	}
	mh := asciiOnly(clientHash)
	tag := `<script src="/__toolbox/loader.js" data-mh="` + mh +
		`" data-wg="` + wgVal + `" async></script>`
	return []byte("<!-- " + bannerGuard + " -->" + tag)
}
// injectLoader inserts the loader <script> into an HTML body once. Placement
// mirrors the Python _LoaderInjector.__call__:
// - guard idempotency: if the body already contains bannerGuard → unchanged.
// - find the first (case-insensitive) "<head"; if present, find the next ">"
// after it and insert the tag right after that ">".
// - else find the first "<body" and insert the tag right BEFORE it.
// - if neither is present → return the body unchanged (no inject).
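func injectLoader(body []byte, clientHash string, wg bool) []byte {
	if bytes.Contains(body, []byte(bannerGuard)) {
		return body
	}
	script := loaderScript(clientHash, wg)
	// ASCII-only lowercase copy, like Python's bytes.lower(): len(low) always
	// equals len(body), so offsets found in low are valid in body.
	// bytes.ToLower is Unicode-aware and can change the length on non-ASCII or
	// invalid UTF-8 input, which would skew (or overrun) those offsets.
	low := make([]byte, len(body))
	for i, c := range body {
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A'
		}
		low[i] = c
	}
	if h := bytes.Index(low, []byte("<head")); h >= 0 {
		if j := bytes.IndexByte(body[h:], '>'); j >= 0 {
			at := h + j + 1
			out := make([]byte, 0, len(body)+len(script))
			out = append(out, body[:at]...)
			out = append(out, script...)
			out = append(out, body[at:]...)
			return out
		}
	}
	if b := bytes.Index(low, []byte("<body")); b >= 0 {
		out := make([]byte, 0, len(body)+len(script))
		out = append(out, body[:b]...)
		out = append(out, script...)
		out = append(out, body[b:]...)
		return out
	}
	return body
}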
// ── /__toolbox/* reverse-proxy to the portal ─────────────────────────────────
// isToolboxAssetPath reports whether a request path is one of the banner assets
// the engine must serve itself (by reverse-proxying to the portal) for ANY
// origin. STARTSWITH (not exact) is REQUIRED: the path includes the query
// string and the bundle is fetched as /__toolbox/bundle?mh=..&wg=.. — an exact
// match would never fire. Mirrors the Python request() p.startswith(...) checks.
func isToolboxAssetPath(path string) bool {
	return strings.HasPrefix(path, "/__toolbox/loader.js") ||
		strings.HasPrefix(path, "/__toolbox/bundle")
}
// portalTargetURL builds the absolute portal URL for an intercepted asset
// request: <portal-base> + the original request path (which already includes
// the query string). The portal base's trailing slash is trimmed so the result
// never doubles the leading "/" of the path.
func portalTargetURL(portal, pathWithQuery string) string {
	return strings.TrimRight(portal, "/") + pathWithQuery
}
// portalClient is the short-timeout HTTP client used to fetch banner assets from
// the portal. Shared (stdlib http.Client is goroutine-safe) so we don't churn
// connections per request.
var portalClient = &http.Client{
	Timeout: 5 * time.Second,
	// Never follow redirects: the portal is a fixed loopback base, so not
	// following 3xx means a misbehaving/compromised portal can't steer the
	// worker into fetching an arbitrary outbound host (SSRF hygiene). The 3xx
	// is relayed to the client as-is.
	CheckRedirect: func(*http.Request, []*http.Request) error { return http.ErrUseLastResponse },
}
// servePortalAsset reverse-proxies a /__toolbox/* request to the portal and
// writes the portal's response (status + Content-Type + Cache-Control + body)
// back to the client over the already-established (TLS) conn. It returns true
// once it has written a response — the caller MUST NOT then forward upstream.
//
// Fail-open: if the portal request fails (portal down, timeout, or a body read
// error) we serve a minimal 204 No Content so the navigation is never broken,
// and log at most a warning. We never 502 the whole page over a banner asset.
func servePortalAsset(w io.Writer, portal, pathWithQuery string) bool {
	target := portalTargetURL(portal, pathWithQuery)
	resp, err := portalClient.Get(target)
	if err != nil {
		log.Printf("portal asset fetch failed for %s: %v", target, err)
		writeRaw(w, 204, "No Content", nil, nil)
		return true
	}
	defer resp.Body.Close()
	body, rerr := io.ReadAll(io.LimitReader(resp.Body, 8<<20))
	if rerr != nil {
		log.Printf("portal asset read failed for %s: %v", target, rerr)
		writeRaw(w, 204, "No Content", nil, nil)
		return true
	}
	headers := map[string]string{}
	if ct := resp.Header.Get("Content-Type"); ct != "" {
		headers["Content-Type"] = ct
	}
	if cc := resp.Header.Get("Cache-Control"); cc != "" {
		headers["Cache-Control"] = cc
	}
	// writeRaw formats "HTTP/1.1 <code> <status>"; pass only the reason phrase
	// (not resp.Status, which already embeds the code → would double it).
	writeRaw(w, resp.StatusCode, http.StatusText(resp.StatusCode), headers, body)
	return true
}


@ -0,0 +1,132 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: transparency-banner loader inject tests (#662)
//
// Mirrors the authoritative Python tests of inject_banner._loader_script /
// _LoaderInjector / the /__toolbox/* request() short-circuit. The portal
// reverse-proxy integration (a live portal) is validated on-board, NOT here;
// these unit tests cover the pure injection logic + the path/url helpers.
package main
import (
	"strings"
	"testing"
)

func TestInjectLoaderGuardIdempotent(t *testing.T) {
	// Body already carrying the guard → returned byte-for-byte unchanged.
	body := []byte("<html><head><!-- " + bannerGuard + " --><script></script></head><body>hi</body></html>")
	out := injectLoader(body, "abc123", false)
	if string(out) != string(body) {
		t.Fatalf("guarded body must be unchanged.\n got: %s", out)
	}
}

func TestInjectLoaderHeadInsertion(t *testing.T) {
	body := []byte(`<html><head lang="en"><title>x</title></head><body>hi</body></html>`)
	out := string(injectLoader(body, "deadbeef", true))
	// The tag must land right AFTER the first <head ...>'s closing '>'.
	headOpen := `<head lang="en">`
	idx := strings.Index(out, headOpen)
	if idx < 0 {
		t.Fatalf("head open lost: %s", out)
	}
	after := out[idx+len(headOpen):]
	wantTag := `<!-- ` + bannerGuard + ` --><script src="/__toolbox/loader.js" data-mh="deadbeef" data-wg="1" async></script>`
	if !strings.HasPrefix(after, wantTag) {
		t.Fatalf("tag not inserted right after <head>'s '>'.\n got: %s", after)
	}
	// <title> must still follow the injected tag (we inserted, not replaced).
	if !strings.Contains(out, wantTag+`<title>x</title>`) {
		t.Fatalf("original head content displaced: %s", out)
	}
}
func TestInjectLoaderBodyFallback(t *testing.T) {
// No <head> → insert right BEFORE the first <body>.
body := []byte(`<html><body class="x">hi</body></html>`)
out := string(injectLoader(body, "cafe", false))
wantTag := `<!-- ` + bannerGuard + ` --><script src="/__toolbox/loader.js" data-mh="cafe" data-wg="0" async></script>`
if !strings.Contains(out, wantTag+`<body class="x">`) {
t.Fatalf("tag not inserted right before <body>.\n got: %s", out)
}
}
func TestInjectLoaderNeitherHeadNorBody(t *testing.T) {
body := []byte(`<p>just a fragment</p>`)
out := injectLoader(body, "x", true)
if string(out) != string(body) {
t.Fatalf("no head/body → must be unchanged.\n got: %s", out)
}
}
func TestInjectLoaderWGAttr(t *testing.T) {
cases := []struct {
wg bool
want string
}{
{true, `data-wg="1"`},
{false, `data-wg="0"`},
}
for _, c := range cases {
out := string(injectLoader([]byte(`<head></head>`), "mh1", c.wg))
if !strings.Contains(out, c.want) {
t.Fatalf("wg=%v: want %q in %s", c.wg, c.want, out)
}
}
}
func TestInjectLoaderNonASCIIHashStripped(t *testing.T) {
// Non-ascii bytes in the client hash are dropped (Python .encode("ascii","ignore")).
out := string(injectLoader([]byte(`<head></head>`), "abécÿ12", false))
if !strings.Contains(out, `data-mh="abc12"`) {
t.Fatalf("non-ascii bytes not stripped: %s", out)
}
}
func TestInjectLoaderHeadCaseInsensitive(t *testing.T) {
body := []byte(`<HTML><HEAD></HEAD><BODY>hi</BODY></HTML>`)
out := string(injectLoader(body, "z", false))
if !strings.Contains(out, `<HEAD><!-- `+bannerGuard) {
t.Fatalf("case-insensitive <HEAD> match failed: %s", out)
}
}
func TestIsToolboxAssetPath(t *testing.T) {
cases := []struct {
path string
want bool
}{
{"/__toolbox/loader.js", true},
{"/__toolbox/loader.js?v=2", true},
{"/__toolbox/bundle", true},
{"/__toolbox/bundle?mh=abc&wg=1", true},
{"/__toolbox/other", false},
{"/index.html", false},
{"/", false},
{"", false},
{"/__toolboxbundle", false},
}
for _, c := range cases {
if got := isToolboxAssetPath(c.path); got != c.want {
t.Errorf("isToolboxAssetPath(%q) = %v, want %v", c.path, got, c.want)
}
}
}
func TestPortalTargetURL(t *testing.T) {
cases := []struct {
portal, path, want string
}{
{"http://127.0.0.1:8088", "/__toolbox/loader.js", "http://127.0.0.1:8088/__toolbox/loader.js"},
{"http://127.0.0.1:8088", "/__toolbox/bundle?mh=abc&wg=1", "http://127.0.0.1:8088/__toolbox/bundle?mh=abc&wg=1"},
// Trailing slash on the portal base must not double up.
{"http://127.0.0.1:8088/", "/__toolbox/loader.js", "http://127.0.0.1:8088/__toolbox/loader.js"},
}
for _, c := range cases {
if got := portalTargetURL(c.portal, c.path); got != c.want {
t.Errorf("portalTargetURL(%q,%q) = %q, want %q", c.portal, c.path, got, c.want)
}
}
}
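
The two table-driven tests above pin down the behaviour of the path and URL helpers without showing their bodies. The following is one possible implementation that satisfies those tables. It is an assumption for illustration only: sketchIsAssetPath and sketchTargetURL are hypothetical names, and the real isToolboxAssetPath / portalTargetURL may be written differently.

```go
package main

import (
	"fmt"
	"strings"
)

// sketchIsAssetPath reports whether p (path plus optional query string) names
// one of the two banner assets. Hypothetical: mirrors the test table only.
func sketchIsAssetPath(p string) bool {
	if i := strings.IndexByte(p, '?'); i >= 0 {
		p = p[:i] // the query string does not take part in the match
	}
	return p == "/__toolbox/loader.js" || p == "/__toolbox/bundle"
}

// sketchTargetURL joins the portal base and the request path, dropping any
// trailing slash on the base so the separator is not doubled.
func sketchTargetURL(portal, path string) string {
	return strings.TrimRight(portal, "/") + path
}

func main() {
	for _, p := range []string{"/__toolbox/bundle?mh=abc&wg=1", "/__toolboxbundle", "/"} {
		fmt.Printf("%-32q asset=%v\n", p, sketchIsAssetPath(p))
	}
	fmt.Println(sketchTargetURL("http://127.0.0.1:8088/", "/__toolbox/loader.js"))
}
```

Exact matching on the path (rather than a prefix test) is what keeps "/__toolbox/other" and "/__toolboxbundle" out.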

@@ -0,0 +1,95 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// #662 Phase 2b: controlled multi-core throughput bench. Drives full
// client↔proxy TLS handshakes (forge + ClientHello capture) in parallel. Run
// with `-cpu=1,2,4,8` to show the multi-core scaling that Python mitmproxy's
// GIL prevents:
//
//	go test -run x -bench BenchmarkHandshake -benchmem -cpu=1,2,4,8 ./cmd/sbxmitm
package main
import (
"crypto/ecdsa"
"crypto/elliptic"
"crypto/rand"
"crypto/tls"
"crypto/x509"
"crypto/x509/pkix"
"encoding/pem"
"math/big"
"net"
"os"
"path/filepath"
"testing"
"time"
)
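// benchCA writes a throwaway P-256 CA (certificate + PKCS#8 key, both PEM)
// into a temp dir and returns the two paths. Any setup failure aborts the
// benchmark instead of being silently ignored.
func benchCA(b *testing.B) (string, string) {
	b.Helper()
	dir := b.TempDir()
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		b.Fatal(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1), Subject: pkix.Name{CommonName: "Bench CA"},
		NotBefore: time.Now().Add(-time.Hour), NotAfter: time.Now().Add(24 * time.Hour),
		IsCA: true, KeyUsage: x509.KeyUsageCertSign, BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, key.Public(), key)
	if err != nil {
		b.Fatal(err)
	}
	kder, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		b.Fatal(err)
	}
	cp := filepath.Join(dir, "ca.pem")
	kp := filepath.Join(dir, "key.pem")
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: kder})
	if err := os.WriteFile(cp, certPEM, 0o600); err != nil {
		b.Fatal(err)
	}
	if err := os.WriteFile(kp, keyPEM, 0o600); err != nil {
		b.Fatal(err)
	}
	return cp, kp
}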
// BenchmarkHandshake measures steady-state forged-cert TLS handshakes under
// parallel load (warm forge cache). Handshakes/s should rise roughly linearly
// with -cpu, since there is no GIL to serialize the workers.
func BenchmarkHandshake(b *testing.B) {
cp, kp := benchCA(b)
ca, err := loadCA(cp, kp)
if err != nil {
b.Fatal(err)
}
px := &Proxy{ca: ca}
if _, err := ca.forge("example.com"); err != nil { // warm cache
b.Fatal(err)
}
ln, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
b.Fatal(err)
}
defer ln.Close()
cfg := px.serverTLSConfig()
go func() {
for {
c, err := ln.Accept()
if err != nil {
return
}
go func() {
s := tls.Server(c, cfg)
s.Handshake()
s.Close()
}()
}
}()
pool := x509.NewCertPool()
pool.AddCert(ca.cert)
addr := ln.Addr().String()
ccfg := &tls.Config{ServerName: "example.com", RootCAs: pool, MinVersion: tls.VersionTLS12}
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
conn, err := tls.Dial("tcp", addr, ccfg)
if err != nil {
b.Error(err)
return
}
conn.Close()
}
})
}
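
The benchmark above relies on b.RunParallel to spread work over GOMAXPROCS goroutines. The sketch below shows the same mechanism outside `go test`, using testing.Benchmark. It is not the project's benchmark: ECDSA signing is used here as an assumed CPU-bound stand-in for the handshake cost, and measure is a hypothetical helper name.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
	"runtime"
	"testing"
)

// measure runs a parallel ECDSA-signing loop with GOMAXPROCS set to procs and
// returns the iteration count and ns/op reported by the testing package.
func measure(procs int) (int, int64) {
	testing.Init() // registers testing flags; needed when not run via `go test`
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	digest := sha256.Sum256([]byte("bench"))

	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev)

	res := testing.Benchmark(func(b *testing.B) {
		// RunParallel starts GOMAXPROCS goroutines that share b.N iterations.
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				if _, err := ecdsa.SignASN1(rand.Reader, key, digest[:]); err != nil {
					b.Error(err)
					return
				}
			}
		})
	})
	return res.N, res.NsPerOp()
}

func main() {
	for _, procs := range []int{1, 2, 4} {
		n, ns := measure(procs)
		fmt.Printf("GOMAXPROCS=%d: %d ops, %d ns/op\n", procs, n, ns)
	}
}
```

On a multi-core machine the ns/op figure would be expected to drop as GOMAXPROCS grows; the actual numbers depend on the hardware, so none are quoted here.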

@@ -0,0 +1,109 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: gzip-aware banner injection (#662)
//
// The transparency-banner inject (injectLoader) scans the HTML body for
// <head>/<body>. Browsers send `Accept-Encoding: gzip, br`, so most upstream
// responses come back COMPRESSED — and a compressed body has no plaintext
// <head>/<body> for injectLoader to find, so it silently no-ops (the banner
// vanished on every gzip page). mitmPipeline now pins the upstream request to
// `Accept-Encoding: gzip` (dropping br/zstd/deflate we cannot decode with the
// stdlib), so every response is either gzip or identity.
//
// This file holds the gzip helpers + the single inject-path transform that
// decompresses (if gzip) → injectLoader → recompresses, fail-open on any error
// so a banner asset never breaks the page.
//
// Pure standard library — compress/gzip only; no external modules (brotli/zstd
// are NOT in the stdlib, which is exactly why we constrain the wire to gzip).
package main
import (
"bytes"
"compress/gzip"
"io"
"strings"
)
// gunzipCap bounds the decompressed output so a maliciously-crafted gzip body
// (a "decompression bomb") cannot blow the worker's memory. The upstream body
// itself is already read under an 8MiB LimitReader; 32MiB of inflated HTML is a
// generous ceiling for a single page. Exceeding it → treated as an error
// (caller fails open and serves the original compressed bytes).
const gunzipCap = 32 << 20
// gunzipBytes inflates a gzip-compressed body. It is defensive on two axes:
// - a malformed/non-gzip input returns an error (caller fails open),
// - the decompressed output is capped at gunzipCap; if the stream would
// exceed it, that is reported as an error too (decompression-bomb guard).
func gunzipBytes(in []byte) ([]byte, error) {
zr, err := gzip.NewReader(bytes.NewReader(in))
if err != nil {
return nil, err
}
defer zr.Close()
// Read up to gunzipCap+1 so we can tell "exactly at the cap" (fine) from
// "the stream is bigger than the cap" (bomb → error).
out, err := io.ReadAll(io.LimitReader(zr, gunzipCap+1))
if err != nil {
return nil, err
}
if len(out) > gunzipCap {
return nil, errGunzipTooLarge
}
return out, nil
}
// errGunzipTooLarge is returned by gunzipBytes when the decompressed stream
// exceeds gunzipCap (decompression-bomb guard).
var errGunzipTooLarge = errString("gunzip output exceeds cap")
// errString is a tiny stdlib-only error type (avoids importing errors/fmt for
// one sentinel).
type errString string
func (e errString) Error() string { return string(e) }
// gzipBytes compresses in with the default gzip level. It never errors: the
// gzip.Writer only writes into an in-memory bytes.Buffer, which cannot fail.
func gzipBytes(in []byte) []byte {
var buf bytes.Buffer
zw := gzip.NewWriter(&buf)
_, _ = zw.Write(in)
_ = zw.Close()
return buf.Bytes()
}
// injectIntoBody runs the transparency-banner injection over a (possibly
// gzip-compressed) HTML body. It returns the body bytes to serve and ok, which
// reports whether the body went through the inject path. ok=true does not
// guarantee a visible change: injectLoader returns its input unchanged for an
// already-guarded body or one with neither <head> nor <body>.
//
//   - encoding == "" (no Content-Encoding): injectLoader runs directly on body;
//     the result is returned (ok=true). The caller MUST update Content-Length
//     to len(out).
//   - encoding == "gzip" (case-insensitive, surrounding whitespace ignored): the
//     body is gunzipped, injected, then RE-gzipped so the client transfer stays
//     compressed (the tunnel is perf-sensitive). The caller keeps
//     Content-Encoding: gzip and sets Content-Length to len(out); the
//     recompressed bytes can differ from the upstream bytes even when
//     injectLoader changed nothing.
//   - any other encoding (br/zstd/deflate, which should not occur after the
//     upstream Accept-Encoding pin, and also a literal "identity" token): pass
//     through untouched, ok=false.
//
// Fail-open: if gunzip fails (corrupt / not-actually-gzip / bomb), the ORIGINAL
// bytes are returned with ok=false so the page is never broken.
//
// Idempotency and placement live entirely inside injectLoader (unchanged).
func injectIntoBody(body []byte, encoding, clientHash string, wg bool) (out []byte, ok bool) {
switch strings.ToLower(strings.TrimSpace(encoding)) {
case "":
return injectLoader(body, clientHash, wg), true
case "gzip":
plain, err := gunzipBytes(body)
if err != nil {
return body, false // fail open: serve the original compressed bytes
}
injected := injectLoader(plain, clientHash, wg)
return gzipBytes(injected), true
default:
return body, false // unknown encoding we cannot decode → pass through
}
}
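
The doc comment on injectIntoBody leaves the Content-Length update to the caller. The sketch below shows, under stated assumptions, what that caller-side bookkeeping could look like: markHead is a hypothetical stand-in for injectLoader, gunz omits the size cap for brevity, and rewrite is an illustrative name rather than the pipeline's actual call site.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// markHead is a hypothetical stand-in for injectLoader: it inserts a marker
// right after the first literal "<head>".
func markHead(html []byte) []byte {
	return bytes.Replace(html, []byte("<head>"), []byte("<head><!-- marker -->"), 1)
}

// gz compresses in with the default gzip level into memory.
func gz(in []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	_, _ = zw.Write(in)
	_ = zw.Close()
	return buf.Bytes()
}

// gunz inflates a gzip body (no size cap here, to keep the sketch short).
func gunz(in []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(in))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// rewrite returns the body to serve and the Content-Length value that must
// accompany it. Unknown encodings and broken gzip pass through unchanged.
func rewrite(encoding string, body []byte) ([]byte, int) {
	switch strings.ToLower(strings.TrimSpace(encoding)) {
	case "":
		body = markHead(body)
	case "gzip":
		if plain, err := gunz(body); err == nil {
			body = gz(markHead(plain))
		}
	}
	return body, len(body)
}

func main() {
	upstream := gz([]byte("<html><head></head><body>hi</body></html>"))
	out, n := rewrite("gzip", upstream)
	plain, _ := gunz(out)
	fmt.Printf("upstream Content-Length:  %d\n", len(upstream))
	fmt.Printf("rewritten Content-Length: %d\n", n)
	fmt.Printf("decoded body: %s\n", plain)
}
```

The point of returning the length alongside the body is that the upstream Content-Length is stale as soon as the body is rewritten, for gzip as much as for identity responses.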

@@ -0,0 +1,152 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: gzip-aware banner injection tests (#662)
//
// Covers the LIVE bug: the banner only injected into UNCOMPRESSED HTML, so
// gzip pages (the common case — browsers send Accept-Encoding: gzip,br) lost
// the banner. These tests pin the decompress→inject→recompress transform and
// its fail-open behaviour.
package main
import (
"bytes"
"strings"
"testing"
)
func TestGzipRoundTrip(t *testing.T) {
cases := [][]byte{
[]byte(""),
[]byte("hello world"),
[]byte(`<html><head><title>x</title></head><body>hi</body></html>`),
bytes.Repeat([]byte("AB"), 100000), // larger, compressible payload
}
for _, x := range cases {
got, err := gunzipBytes(gzipBytes(x))
if err != nil {
t.Fatalf("gunzipBytes(gzipBytes(%d bytes)) errored: %v", len(x), err)
}
if !bytes.Equal(got, x) {
t.Fatalf("round-trip mismatch: got %d bytes, want %d bytes", len(got), len(x))
}
}
}
func TestGunzipNonGzipFails(t *testing.T) {
// Plain bytes that are not a gzip stream → error, no panic.
if _, err := gunzipBytes([]byte("this is definitely not gzip")); err == nil {
t.Fatal("gunzipBytes on non-gzip input must error")
}
}
func TestInjectIntoBodyGzip(t *testing.T) {
// End-to-end-ish: HTML with <head>, gzipped, run through the exact transform
// the inject path uses. Result must gunzip back to an injected, intact doc.
html := `<html><head><title>page</title></head><body>content</body></html>`
out, ok := injectIntoBody(gzipBytes([]byte(html)), "gzip", "abc123", true)
if !ok {
t.Fatal("gzip inject must report ok=true")
}
plain, err := gunzipBytes(out)
if err != nil {
t.Fatalf("re-gzipped output must gunzip cleanly: %v", err)
}
s := string(plain)
if !strings.Contains(s, bannerGuard) {
t.Fatalf("banner guard %q absent after gzip inject:\n%s", bannerGuard, s)
}
// Document otherwise intact: original head/body content preserved.
if !strings.Contains(s, "<title>page</title>") || !strings.Contains(s, "<body>content</body>") {
t.Fatalf("original document content displaced:\n%s", s)
}
// The loader tag landed inside <head>.
if !strings.Contains(s, `<head><!-- `+bannerGuard) {
t.Fatalf("loader tag not inserted right after <head>:\n%s", s)
}
}
func TestInjectIntoBodyGzipCaseInsensitiveEncoding(t *testing.T) {
html := `<head></head>`
out, ok := injectIntoBody(gzipBytes([]byte(html)), "GZIP", "z", false)
if !ok {
t.Fatal("Content-Encoding GZIP (upper) must be recognised → ok=true")
}
plain, err := gunzipBytes(out)
if err != nil {
t.Fatalf("gunzip failed: %v", err)
}
if !strings.Contains(string(plain), bannerGuard) {
t.Fatalf("banner absent for upper-case GZIP encoding: %s", plain)
}
}
func TestInjectIntoBodyGzipFailOpen(t *testing.T) {
// Bytes labelled gzip but NOT gzip → fail open: original bytes, ok=false,
// no panic.
bad := []byte("not gzip at all <head></head>")
out, ok := injectIntoBody(bad, "gzip", "x", false)
if ok {
t.Fatal("corrupt gzip body must fail open (ok=false)")
}
if !bytes.Equal(out, bad) {
t.Fatalf("fail-open must return the ORIGINAL bytes untouched")
}
}
func TestInjectIntoBodyIdentity(t *testing.T) {
// Identity (empty Content-Encoding): inject directly, grown body returned.
html := []byte(`<html><head></head><body>hi</body></html>`)
out, ok := injectIntoBody(html, "", "deadbeef", false)
if !ok {
t.Fatal("identity inject must report ok=true")
}
if !bytes.Contains(out, []byte(bannerGuard)) {
t.Fatalf("banner absent on identity inject: %s", out)
}
if len(out) <= len(html) {
t.Fatalf("identity inject must GROW the body: got %d, was %d", len(out), len(html))
}
}
func TestInjectIntoBodyUnknownEncodingPassthrough(t *testing.T) {
// br/zstd/deflate (shouldn't occur after the Accept-Encoding pin) → untouched.
body := []byte("\x1f\x8b some br-ish bytes")
out, ok := injectIntoBody(body, "br", "x", false)
if ok {
t.Fatal("unknown encoding must pass through (ok=false)")
}
if !bytes.Equal(out, body) {
t.Fatalf("unknown-encoding passthrough must be byte-for-byte")
}
}
func TestGunzipBombGuard(t *testing.T) {
// A body that inflates beyond gunzipCap must be rejected (not OOM the worker).
// gzip of >32MiB of zeros compresses to a small blob but inflates past the
// cap → gunzipBytes returns an error → inject path fails open.
big := gzipBytes(make([]byte, gunzipCap+1024))
if _, err := gunzipBytes(big); err == nil {
t.Fatal("gunzipBytes must reject output exceeding gunzipCap")
}
// And via the inject path: fail open, original bytes preserved.
out, ok := injectIntoBody(big, "gzip", "x", false)
if ok {
t.Fatal("over-cap gzip body must fail open through injectIntoBody")
}
if !bytes.Equal(out, big) {
t.Fatal("over-cap fail-open must return the original compressed bytes")
}
}
func TestGunzipExactlyAtCap(t *testing.T) {
// A body that inflates to EXACTLY gunzipCap is allowed (boundary).
payload := make([]byte, gunzipCap)
got, err := gunzipBytes(gzipBytes(payload))
if err != nil {
t.Fatalf("exactly-at-cap payload must be allowed: %v", err)
}
if len(got) != gunzipCap {
t.Fatalf("at-cap length mismatch: got %d, want %d", len(got), gunzipCap)
}
}
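
The two boundary tests above exercise the "read cap+1, then compare" trick with a 32 MiB cap. The same technique can be seen in isolation with a small cap. This is a sketch: inflateCapped, gz and errTooLarge are illustrative names, and the 1024-byte limit is chosen only to keep the demo cheap.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

var errTooLarge = errors.New("inflated output exceeds cap")

// gz compresses in with the default gzip level into memory.
func gz(in []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	_, _ = zw.Write(in)
	_ = zw.Close()
	return buf.Bytes()
}

// inflateCapped gunzips in but refuses to return more than limit bytes.
// Reading limit+1 bytes is what distinguishes "exactly at the cap" from
// "larger than the cap" without inflating the whole stream.
func inflateCapped(in []byte, limit int64) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(in))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	out, err := io.ReadAll(io.LimitReader(zr, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(out)) > limit {
		return nil, errTooLarge
	}
	return out, nil
}

func main() {
	const limit = 1024
	for _, n := range []int{1023, 1024, 1025} {
		out, err := inflateCapped(gz(make([]byte, n)), limit)
		fmt.Printf("payload=%d bytes: got=%d err=%v\n", n, len(out), err)
	}
}
```

Only the 1025-byte payload is rejected; the 1024-byte payload sits exactly on the cap and is accepted, which is the boundary TestGunzipExactlyAtCap checks at full scale.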

@@ -0,0 +1,153 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: anti-track fake-identity jar (#662 Phase 4)
//
// Byte-exact port of the Python anti-track HMAC fake-identity jar
// (packages/secubox-toolbox/secubox_toolbox/privacy.py: _jar_key / _shape /
// fake_id). Python is the source of truth; this mirrors it exactly, proven by
// the cross-engine parity harness (testdata/jar-fixtures.json + jar_test.go ↔
// tests/test_jar_parity.py).
//
// The jar mints a STABLE fabricated cookie value per (client, tracker,
// cookie_name): a deterministic HMAC-SHA256 of stable inputs, never derived
// from real client data, identical across workers and restarts (persistent;
// 'rémanent' in the original French).
//
// Pure standard library — no external modules, no go.sum.
package main
import (
"crypto/hmac"
"crypto/sha256"
"encoding/binary"
"encoding/hex"
"fmt"
"os"
"strings"
)
// _privacyMultiTLD mirrors privacy._MULTI_TLD EXACTLY (NOT ad_ghost._2L — they
// differ: privacy has ac.uk/com.cn/com.tr/gov.uk/org.uk, lacks gouv.fr; and
// privacy returns IP literals as-is where ad_ghost returns None). The jar MUST
// use the privacy-flavored registrable so fakeID is byte-identical to
// privacy.fake_id across engines (else the fake persona mismatches at cutover).
var _privacyMultiTLD = map[string]bool{
"ac.uk": true, "co.jp": true, "co.nz": true, "co.uk": true, "co.za": true,
"com.au": true, "com.br": true, "com.cn": true, "com.tr": true,
"gov.uk": true, "org.uk": true,
}
// registrableJar mirrors privacy.registrable (NOT policy.go's ad_ghost-flavored
// registrable). eTLD+1 with the privacy multi-TLD table; IP literals returned
// as-is.
func registrableJar(host string) string {
host = strings.TrimRight(strings.ToLower(strings.TrimSpace(host)), ".")
if host == "" {
return host
}
allDigit := true
for _, c := range strings.ReplaceAll(host, ".", "") {
if c < '0' || c > '9' {
allDigit = false
break
}
}
if allDigit {
return host // IP literal → as-is (matches privacy.registrable)
}
parts := strings.Split(host, ".")
if len(parts) <= 2 {
return host
}
last2 := strings.Join(parts[len(parts)-2:], ".")
if _privacyMultiTLD[last2] {
return strings.Join(parts[len(parts)-3:], ".")
}
return last2
}
// loadJarKey reads the seed key file, trimming surrounding whitespace exactly
// like Python's `Path(JAR_KEY_PATH).read_bytes().strip()`.
//
// Returns nil when the file is missing/unreadable OR strips to empty; both
// mirror Python's `_jar_key()` returning None (which makes fake_id return
// None / fakeID return ("", false)).
//
// Python's bytes.strip() removes only the six ASCII whitespace bytes (space,
// \t, \n, \r, \v=0x0b, \f=0x0c). strings.TrimSpace would additionally strip
// Unicode space runes such as U+0085 and U+00A0, so a key whose edge bytes
// happen to encode one of those would load differently in Go. Trimming with an
// explicit ASCII cutset keeps the two engines byte-identical for any key.
func loadJarKey(path string) []byte {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil
	}
	key := []byte(strings.Trim(string(raw), " \t\n\r\v\f"))
	if len(key) == 0 {
		return nil
	}
	return key
}
// shape renders the HMAC digest into the cookie's observed format so the
// target accepts it. Mirrors privacy._shape EXACTLY:
//
// n = (name or "").lower()
// i = int.from_bytes(digest[:8], "big"); j = int.from_bytes(digest[8:16], "big")
// if n.startswith("_ga"): return "GA1.2.%d.%d" % (i % 1e10, j % 1e10)
// if n in ("_fbp",): return "fb.1.%d.%d" % (i % 1e13, j % 1e10)
// if n in ("uuid","uid","_pk_id") or len(name) >= 32:
// h = digest.hex(); return "%s-%s-%s-%s-%s" % (h[:8],h[8:12],h[12:16],h[16:20],h[20:32])
// return digest.hex()[:32]
//
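// Note: 1e10 / 1e13 above are shorthand for the integers 10**10 / 10**13; the
// parity argument relies on integer modulus (a Python float literal would
// round a 64-bit operand). Python `len(name)` is the RUNE (character) length,
// not byte length; we use len([]rune(name)) to match. The GA1/fb int math is
// on a uint64 read big-endian from the first/second 8 bytes; every modulus is
// < 2^64 so the Go uint64 computation matches Python's non-negative int, and
// fmt "%d" of a uint64 matches Python's "%d".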
func shape(name string, digest []byte) string {
n := strings.ToLower(name)
i := binary.BigEndian.Uint64(digest[:8])
j := binary.BigEndian.Uint64(digest[8:16])
switch {
case strings.HasPrefix(n, "_ga"):
return fmt.Sprintf("GA1.2.%d.%d", i%10_000_000_000, j%10_000_000_000)
case n == "_fbp":
return fmt.Sprintf("fb.1.%d.%d", i%10_000_000_000_000, j%10_000_000_000)
case n == "uuid" || n == "uid" || n == "_pk_id" || len([]rune(name)) >= 32:
h := hex.EncodeToString(digest)
return fmt.Sprintf("%s-%s-%s-%s-%s", h[:8], h[8:12], h[12:16], h[16:20], h[20:32])
default:
return hex.EncodeToString(digest)[:32]
}
}
// fakeID returns a stable fabricated cookie value for (clientHash, tracker,
// cookieName). Mirrors privacy.fake_id EXACTLY:
//
// if not key or not client_hash or not tracker: return None
// msg = ("%s|%s|%s" % (client_hash, registrable(tracker), cookie_name)).encode()
// digest = hmac.new(key, msg, sha256).digest()
// return _shape(cookie_name, digest)
//
// Returns ("", false) for every case where Python returns None: empty key,
// empty clientHash, or empty tracker.
//
// IMPORTANT: this uses registrableJar (privacy.registrable flavor), NOT the
// ad_ghost-flavored registrable() in policy.go. They DIVERGE (gov.uk vs gouv.fr,
// IP literals) — `privacy.fake_id` folds the tracker via privacy.registrable, so
// the jar MUST too or the fake persona mismatches across engines at cutover.
// Do NOT "consolidate" to policy.registrable; the divergence-guard fixtures
// (ad.example.gov.uk, 9.9.9.9) will fail if you do.
func fakeID(clientHash, tracker, cookieName string, key []byte) (string, bool) {
if len(key) == 0 || clientHash == "" || tracker == "" {
return "", false
}
msg := fmt.Sprintf("%s|%s|%s", clientHash, registrableJar(tracker), cookieName)
mac := hmac.New(sha256.New, key)
mac.Write([]byte(msg))
digest := mac.Sum(nil)
return shape(cookieName, digest), true
}
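
The core idea of the jar, a keyed hash over stable inputs rendered in a tracker-shaped format, can be shown in a few lines. This is a simplified sketch, not the parity-tested implementation: mint is a hypothetical name, only the GA-style shape is rendered, and the key and inputs are made-up demo values.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// mint derives a GA-shaped value from HMAC-SHA256(key, "client|site|cookie").
// The same inputs always give the same value; nothing comes from real
// client data.
func mint(key []byte, client, site, cookie string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(client + "|" + site + "|" + cookie))
	d := mac.Sum(nil)
	i := binary.BigEndian.Uint64(d[:8])   // first 8 digest bytes
	j := binary.BigEndian.Uint64(d[8:16]) // next 8 digest bytes
	return fmt.Sprintf("GA1.2.%d.%d", i%10_000_000_000, j%10_000_000_000)
}

func main() {
	key := []byte("demo-key-not-a-real-secret")
	fmt.Println(mint(key, "client-a", "tracker.example", "_ga"))
	fmt.Println(mint(key, "client-a", "tracker.example", "_ga")) // same value again
	fmt.Println(mint(key, "client-b", "tracker.example", "_ga")) // different client
}
```

Because the value depends only on the key and the three inputs, any worker holding the same key file reproduces it after a restart, with no shared state to synchronize.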

@@ -0,0 +1,141 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// Cross-engine JAR parity harness — Go side (#662 Phase 4).
//
// Loads testdata/jar-fixtures.json + the fixed test key (testdata/jar-test.key,
// NOT the real /etc key), computes fakeID per fixture, and asserts == the
// fixture's expect. The Python side (../secubox-toolbox/tests/test_jar_parity.py)
// loads the SAME files and drives privacy.fake_id; both must agree → the HMAC
// fake-identity jar is byte-exact across engines. Python is the source of truth.
package main
import (
"encoding/hex"
"encoding/json"
"os"
"path/filepath"
"testing"
)
type jarFixture struct {
Client string `json:"client"`
Tracker string `json:"tracker"`
CookieName string `json:"cookie_name"`
Expect string `json:"expect"`
Why string `json:"why"`
}
type jarFile struct {
KeyFile string `json:"key_file"`
KeyHex string `json:"key_hex"`
Fixtures []jarFixture `json:"fixtures"`
}
func loadJarFile(t *testing.T) (jarFile, string) {
t.Helper()
dir := testdataDir(t) // shared with policy_test.go (cmd/sbxmitm → ../../testdata)
raw, err := os.ReadFile(filepath.Join(dir, "jar-fixtures.json"))
if err != nil {
t.Fatalf("read jar fixtures: %v", err)
}
var jf jarFile
if err := json.Unmarshal(raw, &jf); err != nil {
t.Fatalf("parse jar fixtures: %v", err)
}
if len(jf.Fixtures) == 0 {
t.Fatal("no jar fixtures")
}
return jf, dir
}
// TestJarKeyLoad: loadJarKey strips the file's surrounding whitespace back to
// the canonical key declared in key_hex (proves .strip()/TrimSpace parity).
func TestJarKeyLoad(t *testing.T) {
jf, dir := loadJarFile(t)
key := loadJarKey(filepath.Join(dir, jf.KeyFile))
if key == nil {
t.Fatal("loadJarKey returned nil")
}
want, err := hex.DecodeString(jf.KeyHex)
if err != nil {
t.Fatalf("bad key_hex: %v", err)
}
if hex.EncodeToString(key) != hex.EncodeToString(want) {
t.Fatalf("loaded key %x != canonical %x", key, want)
}
}
// TestJarParity: fakeID == Python-generated expect for every fixture.
func TestJarParity(t *testing.T) {
jf, dir := loadJarFile(t)
key := loadJarKey(filepath.Join(dir, jf.KeyFile))
if key == nil {
t.Fatal("loadJarKey returned nil — cannot run parity")
}
for _, fx := range jf.Fixtures {
got, ok := fakeID(fx.Client, fx.Tracker, fx.CookieName, key)
if !ok {
t.Errorf("fakeID(%q,%q,%q) returned !ok (%s)", fx.Client, fx.Tracker, fx.CookieName, fx.Why)
continue
}
if got != fx.Expect {
t.Errorf("fakeID(%q,%q,%q)=%q want %q (%s)",
fx.Client, fx.Tracker, fx.CookieName, got, fx.Expect, fx.Why)
}
}
}
// TestJarShapeCoverage: the fixtures must exercise every _shape branch, else
// "parity" is vacuous for an untested branch.
func TestJarShapeCoverage(t *testing.T) {
jf, _ := loadJarFile(t)
var sawGA, sawFB, sawUUID, sawHex bool
for _, fx := range jf.Fixtures {
switch {
case len(fx.Expect) >= 4 && fx.Expect[:4] == "GA1.":
sawGA = true
case len(fx.Expect) >= 3 && fx.Expect[:3] == "fb.":
sawFB = true
case len(fx.Expect) == 36 && fx.Expect[8] == '-':
sawUUID = true
case len(fx.Expect) == 32:
sawHex = true
}
}
if !sawGA || !sawFB || !sawUUID || !sawHex {
t.Fatalf("shape coverage incomplete: GA=%v FB=%v UUID=%v HEX=%v", sawGA, sawFB, sawUUID, sawHex)
}
}
// TestJarFolding: two subdomains of the same registrable tracker, same client &
// cookie name, mint the IDENTICAL fake id (registrable() folding).
func TestJarFolding(t *testing.T) {
jf, dir := loadJarFile(t)
key := loadJarKey(filepath.Join(dir, jf.KeyFile))
a, _ := fakeID("foldclient", "px.doubleclick.net", "uid", key)
b, _ := fakeID("foldclient", "ads.doubleclick.net", "uid", key)
if a == "" || a != b {
t.Fatalf("folding broken: px=%q ads=%q", a, b)
}
}
// TestJarNilCases: fakeID returns ("",false) exactly where Python returns None.
func TestJarNilCases(t *testing.T) {
jf, dir := loadJarFile(t)
key := loadJarKey(filepath.Join(dir, jf.KeyFile))
cases := []struct {
name string
client, tracker, cookie string
k []byte
}{
{"empty key", "c", "t.example", "uid", nil},
{"empty client", "", "t.example", "uid", key},
{"empty tracker", "c", "", "uid", key},
}
for _, tc := range cases {
if v, ok := fakeID(tc.client, tc.tracker, tc.cookie, tc.k); ok || v != "" {
t.Errorf("%s: fakeID=%q,%v want \"\",false", tc.name, v, ok)
}
}
}
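
TestJarFolding depends on subdomains collapsing to one registrable domain before the HMAC is computed. A reduced sketch of that folding step follows. The two-entry multiTLD table and the names fold / allDigits are assumptions for illustration; the real table is the _privacyMultiTLD map in the jar source.

```go
package main

import (
	"fmt"
	"strings"
)

// multiTLD is a deliberately tiny illustrative table, not the full list.
var multiTLD = map[string]bool{"co.uk": true, "gov.uk": true}

// allDigits reports whether host consists of digits and dots only, which is
// the cheap "looks like an IPv4 literal" check.
func allDigits(host string) bool {
	for _, c := range strings.ReplaceAll(host, ".", "") {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}

// fold reduces a host to its registrable domain (eTLD+1). IP literals and
// hosts with at most two labels are returned as they are.
func fold(host string) string {
	host = strings.TrimRight(strings.ToLower(strings.TrimSpace(host)), ".")
	if host == "" || allDigits(host) {
		return host
	}
	parts := strings.Split(host, ".")
	if len(parts) <= 2 {
		return host
	}
	last2 := strings.Join(parts[len(parts)-2:], ".")
	if multiTLD[last2] {
		return strings.Join(parts[len(parts)-3:], ".") // keep one more label
	}
	return last2
}

func main() {
	for _, h := range []string{
		"px.doubleclick.net", "ads.doubleclick.net",
		"ad.example.gov.uk", "9.9.9.9",
	} {
		fmt.Printf("%-22s -> %s\n", h, fold(h))
	}
}
```

Both doubleclick hosts fold to the same string, so they feed identical HMAC input and therefore mint the same fake id for a given client and cookie name.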

@@ -0,0 +1,119 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: WG persona identity (mac_hash) (#662 Phase 6 prep)
//
// Byte-exact port of the Python WG-peer identity resolver
// (packages/secubox-toolbox/mitmproxy_addons/_common.py: _wg_hash_of /
// mac_hash_of). Python is the source of truth; this mirrors it exactly, proven
// by the cross-engine parity harness (testdata/wg-peers-fixture.json +
// testdata/machash-fixtures.json + machash_test.go ↔ tests/test_machash_parity.py).
//
// R3 clients reach this transparent engine over WireGuard on 10.99.1.0/24 and
// have NO ARP entry on the captive subnet, so they are identified by their WG
// public key (one peer → one IP, deterministic): ip → sha256(pubkey)[:16].
//
// Pure standard library — no external modules, no go.sum.
package main
import (
"crypto/sha256"
"encoding/hex"
"encoding/json"
"os"
"strings"
"sync"
)
// wgPeersPath is the on-disk WG peer DB, mirroring _common._WG_PEERS_DB. It is a
// package-level var (not a const) so tests can repoint it at a fixture.
var wgPeersPath = "/var/lib/secubox/toolbox/wg-peers.json"
// wgPeer mirrors the per-pubkey metadata object in wg-peers.json. Only "ip" is
// consumed here (other fields are ignored, like the Python meta.get("ip")).
type wgPeer struct {
IP string `json:"ip"`
}
// wgPeersDB mirrors the file shape: {"peers": {"<pubkey>": {"ip": "..."}}}.
type wgPeersDB struct {
Peers map[string]wgPeer `json:"peers"`
}
// WG peer cache, mtime-keyed and reloaded only on mtime change — exactly like
// the Python _WG_PEERS_CACHE / _WG_PEERS_MTIME globals. Guarded by a mutex: the
// Go proxy is genuinely concurrent (Python relied on the GIL), so the cache map
// and mtime MUST NOT be read/written without holding wgMu.
var (
wgMu sync.Mutex
wgCache map[string]string // ip → sha256(pubkey)[:16]
wgMtime int64 // last loaded file mtime (UnixNano), 0 = unloaded
)
// resetWGCache clears the in-process WG cache so the next wgHashOf reload reads
// wgPeersPath afresh. Used by tests after repointing wgPeersPath; mirrors the
// Python parity test resetting _WG_PEERS_CACHE/_WG_PEERS_MTIME.
func resetWGCache() {
wgMu.Lock()
wgCache = nil
wgMtime = 0
wgMu.Unlock()
}
// wgHashOf maps a WG peer IP (10.99.1.X) to sha256(peer_pubkey)[:16]. Mirrors
// _common._wg_hash_of EXACTLY: mtime-cached, reloaded only when the file mtime
// changes (or the cache is empty); ANY error (missing file, bad JSON, stat
// failure) → "" (best-effort, fail-open to empty, never panics). Returns "" for
// an IP not present in the DB. The cache is mutex-guarded for concurrency.
func wgHashOf(ip string) string {
wgMu.Lock()
defer wgMu.Unlock()
fi, err := os.Stat(wgPeersPath)
if err != nil {
return "" // missing file / unreadable → fail-open (Python: not exists → None)
}
mtime := fi.ModTime().UnixNano()
if mtime != wgMtime || wgCache == nil {
raw, err := os.ReadFile(wgPeersPath)
if err != nil {
return ""
}
var db wgPeersDB
if err := json.Unmarshal(raw, &db); err != nil {
return "" // bad JSON → fail-open (Python: except → None)
}
fresh := make(map[string]string, len(db.Peers))
for pubkey, meta := range db.Peers {
if meta.IP != "" {
sum := sha256.Sum256([]byte(pubkey))
fresh[meta.IP] = hex.EncodeToString(sum[:])[:16]
}
}
wgCache = fresh
wgMtime = mtime
}
return wgCache[ip] // missing key → "" (Python: cache.get(ip) → None)
}
// macHashOf resolves an IP to a stable per-client persona identity hash.
// Mirrors _common.mac_hash_of, but scoped to the R3 transparent engine:
//
// - empty ip → ""
// - 10.99.1.0/24 (WG peer) → wgHashOf(ip) = sha256(peer_pubkey)[:16]
// - else → ""
//
// The Python mac_hash_of has a third branch for the captive subnet
// (R0/R1/R2): hash_mac(mac_of(ip)) = HMAC(salt, ARP MAC). That ARP/HMAC path is
// INTENTIONALLY out of scope here — R3 clients arrive over WireGuard and have no
// ARP entry on the captive subnet, so this engine is WG-only. Off-subnet IPs
// therefore resolve to "" (the caller falls back to the raw peer IP).
func macHashOf(ip string) string {
if ip == "" {
return ""
}
if strings.HasPrefix(ip, "10.99.1.") {
return wgHashOf(ip)
}
return "" // R0-R2 ARP/HMAC path out of scope for the R3 transparent engine
}
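
The ip → sha256(pubkey)[:16] mapping described in this file can be tried against an in-memory document with no file or cache involved. This is a sketch under assumptions: the keys in sampleDB are made-up placeholders (not real WireGuard public keys), the "name" field is an invented extra field to show that unknown fields are ignored, and buildIndex is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// sampleDB uses made-up placeholder keys, not real WireGuard public keys.
const sampleDB = `{"peers": {
  "PLACEHOLDER-PUBKEY-A": {"ip": "10.99.1.10", "name": "laptop"},
  "PLACEHOLDER-PUBKEY-B": {"ip": ""}
}}`

// buildIndex parses the peers document and returns ip → first 16 hex chars of
// sha256(pubkey). Peers without an IP are skipped.
func buildIndex(raw []byte) (map[string]string, error) {
	var db struct {
		Peers map[string]struct {
			IP string `json:"ip"`
		} `json:"peers"`
	}
	if err := json.Unmarshal(raw, &db); err != nil {
		return nil, err
	}
	idx := make(map[string]string, len(db.Peers))
	for pubkey, meta := range db.Peers {
		if meta.IP == "" {
			continue
		}
		sum := sha256.Sum256([]byte(pubkey))
		idx[meta.IP] = hex.EncodeToString(sum[:])[:16]
	}
	return idx, nil
}

func main() {
	idx, err := buildIndex([]byte(sampleDB))
	if err != nil {
		panic(err)
	}
	fmt.Printf("10.99.1.10 -> %q\n", idx["10.99.1.10"]) // known peer: 16 hex chars
	fmt.Printf("10.99.1.11 -> %q\n", idx["10.99.1.11"]) // unknown peer: empty string
}
```

A lookup miss yields the map's zero value, the empty string, which matches the "IP not present in the DB" behaviour documented for wgHashOf.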

@@ -0,0 +1,118 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// Cross-engine mac_hash (WG persona identity) parity harness — Go side
// (#662 Phase 6 prep).
//
// Loads testdata/machash-fixtures.json + the SAME testdata/wg-peers-fixture.json
// the Python side reads, points wgPeersPath at the fixture, and asserts
// macHashOf(ip) == each fixture's expected. The Python side
// (../secubox-toolbox/tests/test_machash_parity.py) monkeypatches
// _common._WG_PEERS_DB to the SAME fixture and drives _common.mac_hash_of; both
// must agree → the WG persona hash is byte-exact across engines. Python is the
// source of truth: the expected values were GENERATED by sha256(pubkey)[:16] in
// Python, never hand-computed in Go (non-circular parity).
package main
import (
"encoding/json"
"os"
"path/filepath"
"testing"
)
type machashFixture struct {
IP string `json:"ip"`
Expected string `json:"expected"`
Why string `json:"why"`
}
type machashFile struct {
WGPeersFile string `json:"wg_peers_file"`
Fixtures []machashFixture `json:"fixtures"`
}
func loadMachashFile(t *testing.T) (machashFile, string) {
t.Helper()
dir := testdataDir(t) // shared with policy_test.go (cmd/sbxmitm → ../../testdata)
raw, err := os.ReadFile(filepath.Join(dir, "machash-fixtures.json"))
if err != nil {
t.Fatalf("read machash fixtures: %v", err)
}
var mf machashFile
if err := json.Unmarshal(raw, &mf); err != nil {
t.Fatalf("parse machash fixtures: %v", err)
}
if len(mf.Fixtures) == 0 {
t.Fatal("no machash fixtures")
}
return mf, dir
}
// withWGFixture points wgPeersPath at the fixture and resets the cache so the
// override is (re)read, restoring the original path afterwards. Mirrors exactly
// the (path, cache) surface the Python _wg_hash_of reads.
func withWGFixture(t *testing.T, mf machashFile, dir string) {
t.Helper()
orig := wgPeersPath
wgPeersPath = filepath.Join(dir, mf.WGPeersFile)
resetWGCache()
t.Cleanup(func() {
wgPeersPath = orig
resetWGCache()
})
}
// TestMacHashParity: macHashOf == Python-generated expected for every fixture.
func TestMacHashParity(t *testing.T) {
mf, dir := loadMachashFile(t)
withWGFixture(t, mf, dir)
for _, fx := range mf.Fixtures {
got := macHashOf(fx.IP)
if got != fx.Expected {
t.Errorf("macHashOf(%q)=%q want %q (%s)", fx.IP, got, fx.Expected, fx.Why)
}
}
}
// TestMacHashCoverage: the fixtures must exercise the discriminating cases, else
// "parity" is vacuous. We need at least one resolved WG peer (non-empty), one
// in-subnet miss (empty), one off-subnet IP (empty), and the empty ip (empty).
func TestMacHashCoverage(t *testing.T) {
mf, dir := loadMachashFile(t)
withWGFixture(t, mf, dir)
var sawResolved, sawSubnetMiss, sawOffSubnet, sawEmpty bool
for _, fx := range mf.Fixtures {
switch {
case fx.IP == "":
sawEmpty = true
case fx.Expected != "":
sawResolved = true
case len(fx.IP) >= 8 && fx.IP[:8] == "10.99.1.":
sawSubnetMiss = true
default:
sawOffSubnet = true
}
}
if !sawResolved || !sawSubnetMiss || !sawOffSubnet || !sawEmpty {
t.Fatalf("machash coverage incomplete: resolved=%v subnetMiss=%v offSubnet=%v empty=%v",
sawResolved, sawSubnetMiss, sawOffSubnet, sawEmpty)
}
}
// TestWGCacheReload: wgHashOf reflects the file's content; after pointing at a
// missing path it fails open to "" (best-effort, never panics).
func TestWGCacheReload(t *testing.T) {
mf, dir := loadMachashFile(t)
withWGFixture(t, mf, dir)
// A resolved peer from the fixture returns non-empty.
if got := wgHashOf("10.99.1.10"); got == "" {
t.Fatal("wgHashOf(10.99.1.10) empty — fixture not loaded")
}
// Repoint at a missing file → reload → fail-open to "".
wgPeersPath = filepath.Join(dir, "does-not-exist.json")
resetWGCache()
if got := wgHashOf("10.99.1.10"); got != "" {
t.Fatalf("wgHashOf with missing file = %q want \"\"", got)
}
}
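
The header comment states that the expected values were generated in Python as `sha256(pubkey)[:16]`. The following is a minimal Go sketch of that derivation, not the repository's `macHashOf`. It assumes the slice is taken over the lowercase hex digest (16 hex characters) and that the hash input is the pubkey string exactly as stored in the peers file. The name `personaHash` is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// personaHash is a hypothetical helper: the first 16 hex characters of
// SHA-256 over the pubkey string. Both the hex slicing and the choice of
// input bytes are assumptions, not confirmed by the source.
func personaHash(pubkey string) string {
	sum := sha256.Sum256([]byte(pubkey))
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	// "abc" is the classic SHA-256 test vector, used here only as a stand-in.
	fmt.Println(personaHash("abc"))
}
```

If the Python side slices the raw digest or base64-decodes the key first, the helper must change accordingly. The fixture file remains the source of truth.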

@ -0,0 +1,479 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: forging MITM PoC (#662 Phase 1)
//
// De-risking spike for migrating the R3 MITM engine off Python mitmproxy onto a
// multi-core Go core. Pure standard library (no external modules) so it builds
// offline and cross-compiles to arm64 with `GOOS=linux GOARCH=arm64 go build`.
//
// It is NOT wired into the live R3 path. It proves the discriminating
// capabilities the engine analysis flagged as risky:
// - forge per-host leaf certs from the EXISTING ca-wg CA (client trust intact),
// - request short-circuit 204 (ad_ghost block),
// - response body inject (banner / ad-CSS),
// - SNI splice passthrough (tls_splice),
// - TLS ClientHello capture for JA4 (ja4 addon) via crypto/tls.GetCertificate.
//
// Runs as an HTTP CONNECT proxy for easy smoke-testing (`curl -x`). The live
// engine will run transparent (SO_ORIGINAL_DST) — same handlers, different
// accept path (Phase 2+).
package main
import (
"bytes"
"context"
"crypto"
"crypto/rand"
"crypto/tls"
"crypto/x509"
"crypto/x509/pkix"
"encoding/pem"
"flag"
"fmt"
"io"
"log"
"math/big"
"net"
"net/http"
"os"
"strconv"
"strings"
"sync"
"time"
)
// ── CA + per-host leaf forging ──────────────────────────────────────────────
// CA holds the loaded forging CA (reused from ca-wg) + a per-host leaf cache.
type CA struct {
cert *x509.Certificate
key crypto.Signer
mu sync.Mutex
cache map[string]*tls.Certificate
}
func loadCA(certPath, keyPath string) (*CA, error) {
cpem, err := os.ReadFile(certPath)
if err != nil {
return nil, fmt.Errorf("read ca cert: %w", err)
}
kpem, err := os.ReadFile(keyPath)
if err != nil {
return nil, fmt.Errorf("read ca key: %w", err)
}
// Scan for the right block TYPE rather than assuming position: the live R3
// CA the toolbox forges with (mitmproxy confdir `mitmproxy-ca.pem`) is a
// COMBINED cert+key bundle, and --ca-key may point at it. Tolerate cert and
// key co-residing in either file, in any order.
cblk := firstPEMBlock(cpem, func(b *pem.Block) bool { return b.Type == "CERTIFICATE" })
if cblk == nil {
return nil, fmt.Errorf("ca cert: no CERTIFICATE PEM block")
}
cert, err := x509.ParseCertificate(cblk.Bytes)
if err != nil {
return nil, fmt.Errorf("parse ca cert: %w", err)
}
kblk := firstPEMBlock(kpem, func(b *pem.Block) bool { return strings.Contains(b.Type, "PRIVATE KEY") })
if kblk == nil {
return nil, fmt.Errorf("ca key: no PRIVATE KEY PEM block")
}
key, err := parseKey(kblk.Bytes)
if err != nil {
return nil, fmt.Errorf("parse ca key: %w", err)
}
return &CA{cert: cert, key: key, cache: map[string]*tls.Certificate{}}, nil
}
// firstPEMBlock returns the first PEM block in data satisfying want, or nil.
// Used to pull a specific block (CERTIFICATE / PRIVATE KEY) out of a file that
// may hold several (e.g. mitmproxy's combined CA bundle).
func firstPEMBlock(data []byte, want func(*pem.Block) bool) *pem.Block {
for {
blk, rest := pem.Decode(data)
if blk == nil {
return nil
}
if want(blk) {
return blk
}
data = rest
}
}
func parseKey(der []byte) (crypto.Signer, error) {
if k, err := x509.ParsePKCS8PrivateKey(der); err == nil {
if s, ok := k.(crypto.Signer); ok {
return s, nil
}
}
if k, err := x509.ParsePKCS1PrivateKey(der); err == nil {
return k, nil
}
if k, err := x509.ParseECPrivateKey(der); err == nil {
return k, nil
}
return nil, fmt.Errorf("unsupported CA key format")
}
// forge returns a leaf cert for host signed by the CA, cached.
func (c *CA) forge(host string) (*tls.Certificate, error) {
host = strings.ToLower(strings.TrimSpace(host))
c.mu.Lock()
if tc, ok := c.cache[host]; ok {
c.mu.Unlock()
return tc, nil
}
c.mu.Unlock()
serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 128))
if err != nil {
return nil, fmt.Errorf("forge serial: %w", err)
}
tmpl := &x509.Certificate{
SerialNumber: serial,
Subject: pkix.Name{CommonName: host},
NotBefore: time.Now().Add(-1 * time.Hour),
NotAfter: time.Now().Add(24 * time.Hour),
KeyUsage: x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
DNSNames: []string{host},
}
der, err := x509.CreateCertificate(rand.Reader, tmpl, c.cert, c.key.Public(), c.key)
if err != nil {
return nil, err
}
leaf, err := x509.ParseCertificate(der) // parsed cert has Raw populated (Verify needs it)
if err != nil {
return nil, err
}
tc := &tls.Certificate{Certificate: [][]byte{der, c.cert.Raw}, PrivateKey: c.key, Leaf: leaf}
c.mu.Lock()
c.cache[host] = tc
c.mu.Unlock()
return tc, nil
}
// ── Pure handler logic ───────────────────────────────────────────────────────
//
// The decision surface (Decide / action / registrable / splice helpers) lives
// in policy.go, ported from the Python addons and proven at parity by the
// cross-engine harness. The body-inject helper is kept here next to the wiring.
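// injectMarker inserts p.Inject before </head> (else </body>, else prepends).
// The tag search is ASCII-case-insensitive and runs on the original bytes:
// lowercasing the whole body first can change its length (non-ASCII or invalid
// UTF-8 input), which would shift the match offset.
func (p *Policy) injectMarker(body []byte) []byte {
	if len(p.Inject) == 0 || bytes.Contains(body, p.Inject) {
		return body
	}
	for _, tag := range [][]byte{[]byte("</head>"), []byte("</body>")} {
		if i := indexFoldASCII(body, tag); i >= 0 {
			out := make([]byte, 0, len(body)+len(p.Inject))
			out = append(out, body[:i]...)
			out = append(out, p.Inject...)
			out = append(out, body[i:]...)
			return out
		}
	}
	return append(append([]byte{}, p.Inject...), body...)
}

// indexFoldASCII returns the offset in body of the first case-insensitive match
// of the ASCII tag, or -1. Offsets always refer to the unmodified body.
func indexFoldASCII(body, tag []byte) int {
	for i := 0; i+len(tag) <= len(body); i++ {
		if bytes.EqualFold(body[i:i+len(tag)], tag) {
			return i
		}
	}
	return -1
}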
// ── JA4 ClientHello capture (the Go-feasibility proof for the ja4 addon) ─────
// ja4ish builds a compact handshake fingerprint from the fields crypto/tls
// exposes in ClientHelloInfo (SNI, TLS versions, cipher count, ALPN). A FULL
// JA4 also needs the extension list, which requires a raw-ClientHello-bytes
// peek before stdlib parsing — feasible (Phase 4); this proves the material is
// reachable in Go without Python.
func ja4ish(h *tls.ClientHelloInfo) string {
maxVer := uint16(0)
for _, v := range h.SupportedVersions {
if v > maxVer {
maxVer = v
}
}
alpn := "none"
if len(h.SupportedProtos) > 0 {
alpn = h.SupportedProtos[0]
}
return fmt.Sprintf("t%04x_c%02d_a%s_sni=%s", maxVer, len(h.CipherSuites), alpn, h.ServerName)
}
// ── CONNECT-proxy MITM wiring ────────────────────────────────────────────────
type Proxy struct {
ca *CA
pol *Policy
jaSink func(string) // JA4 observations (logged; a sidecar in prod)
jarKey []byte // anti-track HMAC fake-identity seed (nil → poison off)
poison bool // master gate: poison tracker Set-Cookies (default on when jarKey present)
portal string // portal base URL for /__toolbox/* reverse-proxy (banner assets)
}
func (px *Proxy) serverTLSConfig() *tls.Config {
return &tls.Config{
GetCertificate: func(h *tls.ClientHelloInfo) (*tls.Certificate, error) {
if px.jaSink != nil {
px.jaSink(ja4ish(h)) // capture handshake fingerprint
}
name := h.ServerName
if name == "" {
name = "unknown.local"
}
return px.ca.forge(name)
},
}
}
func (px *Proxy) handleConnect(w http.ResponseWriter, r *http.Request) {
host := r.URL.Hostname()
hj, ok := w.(http.Hijacker)
if !ok {
http.Error(w, "no hijack", 500)
return
}
client, _, err := hj.Hijack()
if err != nil {
return
}
defer client.Close()
io.WriteString(client, "HTTP/1.1 200 Connection Established\r\n\r\n")
// Decide once on (host, sni). For the CONNECT PoC the SNI is the CONNECT
// host; the transparent engine will splice on the real ClientHello SNI.
verdict := px.pol.Decide(host, host)
if verdict == "splice" {
// passthrough: raw TCP to upstream, no TLS interception (tls_splice).
up, err := net.DialTimeout("tcp", r.URL.Host, 10*time.Second)
if err != nil {
return
}
defer up.Close()
go io.Copy(up, client)
io.Copy(client, up)
return
}
// MITM: TLS-terminate the client with a forged cert (+ ClientHello capture).
tconn := tls.Server(client, px.serverTLSConfig())
if err := tconn.Handshake(); err != nil {
return
}
defer tconn.Close()
// Shared post-TLS pipeline. CONNECT dials upstream by the request URL host
// (req.URL.Host set inside), so dialHost is "" → mitmPipeline derives it.
// CONNECT PoC is never an R3 WG client → wg=false.
px.mitmPipeline(tconn, client, host, verdict, "", false)
}
// mitmPipeline runs the shared post-TLS-handshake MITM logic used by BOTH the
// CONNECT path (handleConnect) and the transparent path (handleTransparent):
// read the decrypted request, apply the verdict, anonymize, proxy upstream,
// poison tracker Set-Cookies, inject into HTML, and write the response back over
// tconn. Factored out so the two accept paths never drift.
//
// - tconn : the TLS-terminated client connection (forged leaf).
// - rawClient : the underlying client net.Conn (for the per-client identity).
// - host : the decision host (CONNECT host / transparent SNI). Also the
// Host/SNI used for the upstream request and TLS verification.
// - verdict : the already-Decided action ∈ {allow, mitm, block}.
// - dialHost : upstream "ip:port" to FORCE-dial at the TCP layer. "" →
// CONNECT semantics: dial by req.URL.Host (the request URL / host). Non-""
// → transparent: TCP-connect the captured original-dst while doing TLS with
// ServerName=host and verifying the cert against host (not the bare IP).
// - wg : the client is an R3 WireGuard peer (10.99.1.0/24); threaded
// into the injected loader's data-wg attribute. CONNECT path passes false.
func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict, dialHost string, wg bool) {
br := newReader(tconn)
req, err := http.ReadRequest(br)
if err != nil {
return
}
req.URL.Scheme = "https"
if req.URL.Host == "" {
req.URL.Host = host
}
// #636/#662 — serve the banner loader + bundle for ANY origin so the injected
// <script src="/__toolbox/loader.js"> resolves (R3 clients hit arbitrary
// hosts whose origin can't serve /__toolbox/*). Short-circuit BEFORE dialing
// the real upstream by reverse-proxying to the portal. Mirrors the Python
// InjectBanner.request() startswith checks (path includes the query string).
if isToolboxAssetPath(req.URL.RequestURI()) {
servePortalAsset(tconn, px.portal, req.URL.RequestURI())
return
}
// Transparent: the upstream request must carry the SNI host (for Host header,
// SNI, and cert verification); the actual TCP dial is pinned to the captured
// original-dst by transparentTransport. We do NOT put the bare ip:port in
// req.URL.Host (that would make http.Client verify the cert against the IP).
if dialHost != "" && host != "" {
req.URL.Host = host
}
if verdict == "block" {
writeRaw(tconn, 204, "No Content", map[string]string{"X-SecuBox-Ng": "blocked"}, nil)
return
}
// ── verdict ∈ {"allow","mitm"} → intercept normally ──────────────────────
//
// allow → own-infra / allowlist: clean MITM, apply NO block/poison.
// mitm → intercept + apply the response handlers (poison if a tracker).
//
// Always-on hygiene: anonymize the request on EVERY MITM'd flow (incl.
// allow — stripping operator headers + asserting opt-out is universally
// safe and never touches own-infra correctness).
clientHash := clientHashFromConn(rawClient) // mac_hash-aware (WG persona)
anonymizeRequest(req.Header)
// #662 — pin the upstream Accept-Encoding to gzip (overwrite, dropping
// br/zstd/deflate we cannot decode with the stdlib). This guarantees every
// response is either gzip or identity, so the inject path can reliably
// gunzip→inject→re-gzip the HTML. We Set (not Del): Del would make Go's
// Transport auto-decompress and re-serve identity, losing wire compression
// to the client for ALL resources (incl. non-injected ones). Set keeps the
// Transport in pass-through mode so non-HTML bodies stay compressed
// end-to-end. Browsers always accept gzip, so relaying gzip back is safe.
req.Header.Set("Accept-Encoding", "gzip")
// proxy upstream, inject into HTML bodies.
//
// CheckRedirect: a MITM proxy must NOT follow 3xx itself — it relays the
// redirect to the client so the BROWSER follows it (correct URL bar, origin,
// cookie scope, method semantics). Go's http.Client follows by default, which
// would collapse a 301/302 into the final 200 under the original URL (wrong).
// Mirror mitmproxy's pass-through behaviour.
up := &http.Client{
Timeout: 30 * time.Second,
CheckRedirect: func(*http.Request, []*http.Request) error { return http.ErrUseLastResponse },
}
if dialHost != "" {
// Transparent: pin the TCP dial to the captured original-dst, do TLS with
// ServerName=host, verify the cert against host (verification stays ON).
up.Transport = transparentTransport(dialHost, host)
}
req.RequestURI = ""
resp, err := up.Do(req)
if err != nil {
writeRaw(tconn, 502, "Bad Gateway", nil, nil)
return
}
defer resp.Body.Close()
// Poison: only on MITM'd tracker flows (never on allow/own-infra), and only
// when the jar key is loaded. Replaces tracking-id Set-Cookie values with a
// stable fabricated persona; benign cookies pass through untouched.
if verdict == "mitm" && px.poison && len(px.jarKey) > 0 && px.pol.shouldPoison(host) {
if sc := resp.Header.Values("Set-Cookie"); len(sc) > 0 {
poisoned := poisonSetCookies(sc, clientHash, host, px.jarKey)
resp.Header.Del("Set-Cookie")
for _, c := range poisoned {
resp.Header.Add("Set-Cookie", c)
}
}
}
body, _ := io.ReadAll(io.LimitReader(resp.Body, 8<<20))
// Inject the transparency-banner loader only on 2xx text/html responses
// (mirrors the Python addon, which skips non-200). The loader's same-origin
// <script src="/__toolbox/loader.js"> is served by the short-circuit above.
//
// #662 — the body may be gzip-compressed (we pinned Accept-Encoding: gzip
// upstream). injectIntoBody gunzips→injects→re-gzips when Content-Encoding
// is gzip, injects directly when identity, and fails open (untouched) on a
// corrupt/unknown encoding. Only on a successful rewrite do we update the
// framing: writeResponse emits Content-Length from len(body), but a stale
// resp.ContentLength / Content-Encoding could mislead downstream — so we
// keep them consistent with the bytes we actually serve.
if resp.StatusCode >= 200 && resp.StatusCode < 300 &&
strings.Contains(resp.Header.Get("Content-Type"), "text/html") {
if out, ok := injectIntoBody(body, resp.Header.Get("Content-Encoding"), clientHash, wg); ok {
body = out
// Keep the response framing consistent with the served bytes. The
// encoding is unchanged (gzip stays gzip, identity stays identity);
// only the length changed because injection grew the body. A stale
// Content-Length would truncate/corrupt the response.
resp.Header.Set("Content-Length", strconv.Itoa(len(body)))
resp.ContentLength = int64(len(body))
}
}
writeResponse(tconn, resp, body)
}
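
The comments in `mitmPipeline` describe a gunzip, inject, re-gzip path that fails open on unknown encodings. `injectIntoBody` itself is not shown here, so the following is a minimal stdlib-only sketch of that idea under stated assumptions. `injectEncoded`, `insertBeforeHead`, `gunzip` and `gzipBytes` are hypothetical names, and the marker placement is simplified to a case-sensitive `</head>` search.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gunzip fully decompresses a gzip stream.
func gunzip(b []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// gzipBytes compresses b into a fresh gzip stream.
func gzipBytes(b []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(b); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// insertBeforeHead places marker before the first "</head>", or prepends it.
func insertBeforeHead(body, marker []byte) []byte {
	i := bytes.Index(body, []byte("</head>"))
	if i < 0 {
		return append(append([]byte{}, marker...), body...)
	}
	out := make([]byte, 0, len(body)+len(marker))
	out = append(out, body[:i]...)
	out = append(out, marker...)
	return append(out, body[i:]...)
}

// injectEncoded (hypothetical) rewrites body according to its Content-Encoding.
// ok=false means "serve the original bytes untouched" (fail open).
func injectEncoded(body []byte, encoding string, marker []byte) ([]byte, bool) {
	switch encoding {
	case "", "identity":
		return insertBeforeHead(body, marker), true
	case "gzip":
		plain, err := gunzip(body)
		if err != nil {
			return body, false // corrupt stream: leave as-is
		}
		out, err := gzipBytes(insertBeforeHead(plain, marker))
		if err != nil {
			return body, false
		}
		return out, true
	default:
		return body, false // br, zstd, deflate: not decodable with the stdlib
	}
}

func main() {
	zipped, _ := gzipBytes([]byte("<html><head></head><body>hi</body></html>"))
	out, ok := injectEncoded(zipped, "gzip", []byte("<!--m-->"))
	plain, _ := gunzip(out)
	fmt.Println(ok, string(plain))
}
```

On a successful rewrite the caller still has to update `Content-Length` to the new byte count, as the pipeline above does. The encoding header stays unchanged.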
// transparentTransport builds a per-request http.Transport for the transparent
// path: it TCP-dials the captured original-dst (ip:port) for EVERY connection
// regardless of req.URL.Host, while performing TLS with ServerName=sni and
// verifying the cert against that name — so a transparently-redirected upstream
// is reached at the real captured IP yet validated by hostname, NOT the bare IP
// (which would always mismatch the cert). Cert verification stays ON
// (no InsecureSkipVerify). Pure stdlib so it builds on all GOOS.
func transparentTransport(dialAddr, sni string) *http.Transport {
d := &net.Dialer{Timeout: 10 * time.Second}
return &http.Transport{
DialContext: func(ctx context.Context, network, _ string) (net.Conn, error) {
return d.DialContext(ctx, network, dialAddr)
},
TLSClientConfig: &tls.Config{ServerName: sni},
TLSHandshakeTimeout: 10 * time.Second,
ResponseHeaderTimeout: 30 * time.Second,
ForceAttemptHTTP2: false,
}
}
func main() {
caCert := flag.String("ca-cert", "/etc/secubox/toolbox/ca-wg/ca.pem", "CA cert PEM")
caKey := flag.String("ca-key", "/etc/secubox/toolbox/ca-wg/key.pem", "CA key PEM")
addr := flag.String("listen", ":8090", "CONNECT proxy listen addr")
jarKeyPath := flag.String("jar-key", "/etc/secubox/secrets/privacy-jar.key",
"anti-track HMAC fake-identity seed (poison disabled if absent)")
poison := flag.Bool("poison", true,
"poison tracking Set-Cookies on MITM'd tracker flows (needs --jar-key; never touches allow/own-infra)")
transparent := flag.Bool("transparent", false,
"transparent mode: accept nft-DNAT'd conns + recover SO_ORIGINAL_DST (live R3); default is the CONNECT proxy PoC")
portal := flag.String("portal", "http://127.0.0.1:8088",
"portal base URL; /__toolbox/loader.js + /__toolbox/bundle are reverse-proxied here (banner assets, served for any MITM'd origin)")
flag.Parse()
ca, err := loadCA(*caCert, *caKey)
if err != nil {
log.Fatalf("CA load: %v", err)
}
// Load the BLOCK/SPLICE policy from the SAME on-disk config the Python
// addons read (defaults + env overrides). Missing files are tolerated
// (best-effort, like the addons): the engine then simply MITMs everything.
pol, err := LoadPolicy(PolicyOpts{})
if err != nil {
log.Fatalf("policy load: %v", err)
}
pol.Inject = []byte("<!-- sbx-ng banner -->")
// Anti-track jar seed: best-effort (like the Python _jar_key). Absent/empty
// → loadJarKey returns nil → poison stays off even if --poison is set.
jarKey := loadJarKey(*jarKeyPath)
if *poison && len(jarKey) == 0 {
log.Printf("poison requested but jar key %s absent/empty → poison OFF", *jarKeyPath)
}
px := &Proxy{
ca: ca,
pol: pol,
jaSink: func(s string) { log.Printf("ja4 %s", s) },
jarKey: jarKey,
poison: *poison,
portal: *portal,
}
if *transparent {
// Transparent R3 mode: raw accept loop, each conn carries its pre-DNAT
// destination via SO_ORIGINAL_DST (recovered in handleTransparent). The
// accept loop lives in runTransparent, which is linux-tagged and has a non-linux
// stub so the package still builds off-target (e.g. `GOOS=darwin go build`).
runTransparent(px, *addr)
return
}
srv := &http.Server{Addr: *addr, Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Method == http.MethodConnect {
px.handleConnect(w, r)
return
}
http.Error(w, "CONNECT only (PoC)", 405)
})}
log.Printf("sbxmitm CONNECT PoC listening on %s (CA %s)", *addr, *caCert)
log.Fatal(srv.ListenAndServe())
}
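
The `CheckRedirect` comment in `mitmPipeline` explains why the proxy relays 3xx responses instead of following them. The sketch below illustrates the difference with a network-free stub transport. `stubTransport`, `newClient` and `statusVia` are hypothetical names for this illustration. `http.ErrUseLastResponse` is the standard library sentinel used in the engine code above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// stubTransport stands in for an upstream without touching the network:
// /old answers 302 with Location: /new, every other path answers 200.
type stubTransport struct{}

func (stubTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp := &http.Response{
		StatusCode: http.StatusOK,
		Proto:      "HTTP/1.1",
		ProtoMajor: 1,
		ProtoMinor: 1,
		Header:     http.Header{},
		Body:       io.NopCloser(strings.NewReader("")),
		Request:    req,
	}
	if req.URL.Path == "/old" {
		resp.StatusCode = http.StatusFound
		resp.Header.Set("Location", "/new")
	}
	return resp, nil
}

// newClient returns a client that either follows redirects (Go's default)
// or hands the 3xx back to the caller, as a relaying proxy needs.
func newClient(follow bool) *http.Client {
	c := &http.Client{Transport: stubTransport{}}
	if !follow {
		c.CheckRedirect = func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		}
	}
	return c
}

// statusVia fetches url and reports the final status and Location header.
func statusVia(c *http.Client, url string) (int, string, error) {
	resp, err := c.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	return resp.StatusCode, resp.Header.Get("Location"), nil
}

func main() {
	fmt.Println(statusVia(newClient(false), "http://example.test/old"))
	fmt.Println(statusVia(newClient(true), "http://example.test/old"))
}
```

With the relay client the caller sees the 302 and its `Location`, so the browser performs the redirect itself. With the default client the 302 is consumed and only the final 200 is visible.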

@ -0,0 +1,204 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
package main
import (
"crypto/ecdsa"
"crypto/elliptic"
"crypto/rand"
"crypto/tls"
"crypto/x509"
"crypto/x509/pkix"
"encoding/pem"
"math/big"
"net"
"os"
"path/filepath"
"sync"
"testing"
"time"
)
// genTestCA writes a self-signed CA (cert+key PEM) to dir, mirroring ca-wg.
func genTestCA(t *testing.T, dir string) (certPath, keyPath string) {
t.Helper()
key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
if err != nil {
t.Fatal(err)
}
tmpl := &x509.Certificate{
SerialNumber: big.NewInt(1),
Subject: pkix.Name{CommonName: "SecuBox Test CA"},
NotBefore: time.Now().Add(-time.Hour),
NotAfter: time.Now().Add(24 * time.Hour),
IsCA: true,
KeyUsage: x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
BasicConstraintsValid: true,
}
der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, key.Public(), key)
if err != nil {
t.Fatal(err)
}
certPath = filepath.Join(dir, "ca.pem")
keyPath = filepath.Join(dir, "key.pem")
certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
if err := os.WriteFile(certPath, certPEM, 0o644); err != nil {
t.Fatal(err)
}
kder, err := x509.MarshalPKCS8PrivateKey(key)
if err != nil {
t.Fatal(err)
}
keyPEM := pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: kder})
if err := os.WriteFile(keyPath, keyPEM, 0o600); err != nil {
t.Fatal(err)
}
return certPath, keyPath
}
func TestForgeChainsToCA(t *testing.T) {
cp, kp := genTestCA(t, t.TempDir())
ca, err := loadCA(cp, kp)
if err != nil {
t.Fatalf("loadCA: %v", err)
}
leaf, err := ca.forge("ads.example.com")
if err != nil {
t.Fatalf("forge: %v", err)
}
pool := x509.NewCertPool()
pool.AddCert(ca.cert)
if _, err := leaf.Leaf.Verify(x509.VerifyOptions{Roots: pool, DNSName: "ads.example.com"}); err != nil {
t.Fatalf("forged leaf does not chain to CA / wrong SAN: %v", err)
}
leaf2, _ := ca.forge("ads.example.com")
if leaf2 != leaf {
t.Fatal("forge not cached")
}
}
// TestLoadCACombinedPEM proves loadCA pulls the right blocks out of a COMBINED
// cert+key bundle — the real shape of mitmproxy's confdir `mitmproxy-ca.pem`,
// which the live R3 CA uses and the worker unit points --ca-key at. mitmproxy
// writes the PRIVATE KEY block first, then the CERTIFICATE; loadCA must scan by
// type, not position.
func TestLoadCACombinedPEM(t *testing.T) {
dir := t.TempDir()
key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
if err != nil {
t.Fatal(err)
}
tmpl := &x509.Certificate{
SerialNumber: big.NewInt(7),
Subject: pkix.Name{CommonName: "Gondwana ToolBoX R3 CA (test)"},
NotBefore: time.Now().Add(-time.Hour),
NotAfter: time.Now().Add(24 * time.Hour),
IsCA: true,
KeyUsage: x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
BasicConstraintsValid: true,
}
der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, key.Public(), key)
if err != nil {
t.Fatal(err)
}
kder, _ := x509.MarshalPKCS8PrivateKey(key)
keyPEM := pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: kder})
certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
// mitmproxy-ca.pem layout: key THEN cert in one file.
combined := filepath.Join(dir, "mitmproxy-ca.pem")
if err := os.WriteFile(combined, append(append([]byte{}, keyPEM...), certPEM...), 0o600); err != nil {
t.Fatal(err)
}
// mitmproxy-ca-cert.pem: cert only.
certOnly := filepath.Join(dir, "mitmproxy-ca-cert.pem")
if err := os.WriteFile(certOnly, certPEM, 0o644); err != nil {
t.Fatal(err)
}
// The unit's exact arg shape: --ca-cert <cert-only> --ca-key <combined>.
ca, err := loadCA(certOnly, combined)
if err != nil {
t.Fatalf("loadCA(cert-only, combined): %v", err)
}
leaf, err := ca.forge("ads.example.com")
if err != nil {
t.Fatalf("forge: %v", err)
}
pool := x509.NewCertPool()
pool.AddCert(ca.cert)
if _, err := leaf.Leaf.Verify(x509.VerifyOptions{Roots: pool, DNSName: "ads.example.com"}); err != nil {
t.Fatalf("forged leaf does not chain to combined-PEM CA: %v", err)
}
// Belt-and-braces: the combined file works as BOTH cert and key source.
if _, err := loadCA(combined, combined); err != nil {
t.Fatalf("loadCA(combined, combined): %v", err)
}
}
// NOTE (#662 Phase 3): the old TestActionDecision drove the removed hardcoded
// Policy{AdHosts, SpliceHosts} fields. The decision surface now loads from
// disk (LoadPolicy) and mirrors the Python addons; coverage moved to
// TestParityDecide / TestPolicyActionVerbs in policy_test.go.
func TestInjectMarker(t *testing.T) {
p := &Policy{Inject: []byte("<!--SBX-->")}
out := string(p.injectMarker([]byte("<html><head></head><body>hi</body></html>")))
if !contains(out, "<!--SBX--></head>") {
t.Fatalf("marker not injected before </head>: %s", out)
}
if string(p.injectMarker([]byte(out))) != out {
t.Fatal("inject not idempotent")
}
}
func contains(s, sub string) bool {
for i := 0; i+len(sub) <= len(s); i++ {
if s[i:i+len(sub)] == sub {
return true
}
}
return false
}
// TestClientHelloCaptureAndForge: a real localhost TLS handshake proves the Go
// core forges a per-SNI cert from the CA that the client trusts AND that the
// ClientHello (JA4 material) is captured.
func TestClientHelloCaptureAndForge(t *testing.T) {
cp, kp := genTestCA(t, t.TempDir())
ca, err := loadCA(cp, kp)
if err != nil {
t.Fatal(err)
}
var mu sync.Mutex
var captured string
px := &Proxy{ca: ca, jaSink: func(s string) { mu.Lock(); captured = s; mu.Unlock() }}
ln, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatal(err)
}
defer ln.Close()
go func() {
c, err := ln.Accept()
if err != nil {
return
}
s := tls.Server(c, px.serverTLSConfig())
s.Handshake()
s.Close()
}()
pool := x509.NewCertPool()
pool.AddCert(ca.cert)
conn, err := tls.Dial("tcp", ln.Addr().String(), &tls.Config{ServerName: "example.com", RootCAs: pool})
if err != nil {
t.Fatalf("client handshake against forged cert failed (CA not trusted / forge broken): %v", err)
}
conn.Close()
mu.Lock()
defer mu.Unlock()
if captured == "" {
t.Fatal("ClientHello not captured")
}
if !contains(captured, "sni=example.com") {
t.Fatalf("JA4 capture missing SNI: %q", captured)
}
t.Logf("captured JA4-ish: %s", captured)
}

@ -0,0 +1,47 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// Gate tests for the poison emission (#662 Phase 5-prep, Part A): poison only
// fires on MITM'd TRACKER flows, never on allow/own-infra flows. This is the
// same safety envelope as anti-track — own-infra/allowlist flows stay clean.
package main
import (
"path/filepath"
"testing"
)
// TestShouldPoisonGate: a tracker host MITM'd → poison; an own-infra/allowlisted
// host → never poison (even though both are intercepted = "mitm" verb).
func TestShouldPoisonGate(t *testing.T) {
pf, dir := loadParityFile(t)
cfgPath := func(rel string) string { return filepath.Join(dir, filepath.FromSlash(rel)) }
pol, err := LoadPolicy(PolicyOpts{
AllowPath: cfgPath(pf.Config.AdAllowlist),
LearnedPath: cfgPath(pf.Config.LearnedTrackers),
SpliceSeedPath: cfgPath(pf.Config.SpliceSeed),
SpliceLearnPath: cfgPath(pf.Config.SpliceLearned),
PureTrackersPath: cfgPath(pf.Config.PureTrackers),
FortknoxSites: pf.Config.FortknoxSites,
SelfDomains: pf.Config.SelfDomains,
})
if err != nil {
t.Fatal(err)
}
cases := map[string]bool{
// tracker hosts → poison eligible (a tracker we'd otherwise block, but
// once MITM'd we poison rather than blunt-block).
"ads.doubleclick.net": true,
"adnxs.com": true,
// own-infra + allowlisted + benign → NEVER poison.
"hub.secubox.in": false,
"analytics.example-allowed.com": false,
"news.example.com": false,
}
for host, want := range cases {
if got := pol.shouldPoison(host); got != want {
t.Errorf("shouldPoison(%q)=%v want %v", host, got, want)
}
}
}

@ -0,0 +1,369 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: policy layer (#662 Phase 3)
//
// Ports the toolbox BLOCK (ad_ghost) and SPLICE (tls_splice) decision logic
// into the Go core, reading the SAME on-disk config files the Python addons
// use. Python is the source of truth; this mirrors it byte-for-byte on the
// decision surface, proven by the cross-engine parity harness
// (testdata/parity-fixtures.json + policy_test.go ↔ tests/test_engine_parity.py).
//
// Pure standard library — no external modules, no go.sum.
package main
import (
"bufio"
"os"
"regexp"
"strings"
)
// ── ad_ghost: static ad/tracker host pattern (port of _AD_HOST) ──────────────
//
// Python (mitmproxy_addons/ad_ghost.py):
//
// _AD_HOST = re.compile(
// r"(?:^|\.)(?:doubleclick|googlesyndication|googleadservices|"
// r"googletagservices|adservice\.google|amazon-adsystem|adnxs|adsrvr|"
// r"adform|criteo|rubiconproject|taboola|outbrain|smartadserver|moatads|"
// r"scorecardresearch|2mdn|adroll|pubmatic|openx|casalemedia|"
// r"yieldlove|sharethrough|teads|3lift|adsystem|adserver)",
// re.IGNORECASE)
//
// Every construct here — non-capturing groups, `^`, `\.`, alternation, the
// case-insensitive flag — is RE2-safe, so it translates 1:1 to Go regexp via
// the `(?i)` inline flag. No fallback substring split was needed.
const adHostPattern = `(?i)(?:^|\.)(?:doubleclick|googlesyndication|googleadservices|` +
`googletagservices|adservice\.google|amazon-adsystem|adnxs|adsrvr|` +
`adform|criteo|rubiconproject|taboola|outbrain|smartadserver|moatads|` +
`scorecardresearch|2mdn|adroll|pubmatic|openx|casalemedia|` +
`yieldlove|sharethrough|teads|3lift|adsystem|adserver)`
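
As a quick check of the translation claim above, the sketch below compiles the same pattern with Go's `regexp` and probes a few hosts. It assumes the pattern is applied as an unanchored search (Python `re.search`, Go `MatchString`). The helper `isAdHost` is a hypothetical name, not the engine's `Decide`.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as adHostPattern above, repeated so the sketch is self-contained.
const pattern = `(?i)(?:^|\.)(?:doubleclick|googlesyndication|googleadservices|` +
	`googletagservices|adservice\.google|amazon-adsystem|adnxs|adsrvr|` +
	`adform|criteo|rubiconproject|taboola|outbrain|smartadserver|moatads|` +
	`scorecardresearch|2mdn|adroll|pubmatic|openx|casalemedia|` +
	`yieldlove|sharethrough|teads|3lift|adsystem|adserver)`

var adHostRE = regexp.MustCompile(pattern)

// isAdHost (hypothetical) reports whether any label of host starts with one
// of the listed vendor names.
func isAdHost(host string) bool {
	return adHostRE.MatchString(host)
}

func main() {
	for _, h := range []string{
		"ads.doubleclick.net", // a label starts with "doubleclick"
		"ADNXS.com",           // case-insensitive, matched at start of string
		"news.example.com",    // no vendor name at a label start
		"notcriteo.example",   // "criteo" is not at a label start
	} {
		fmt.Printf("%-22s %v\n", h, isAdHost(h))
	}
}
```

The pattern anchors only the left edge of a label, so a host such as `doubleclicker.net` also matches. That behaviour is inherited from the Python original and is preserved for parity.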
// _2L_TLD: two-level public suffixes (port of ad_ghost._2L_TLD).
var twoLevelTLD = map[string]bool{
"co.uk": true, "com.au": true, "co.jp": true, "co.nz": true,
"com.br": true, "co.za": true, "gouv.fr": true,
}
// ── PolicyOpts: configurable file paths (env-overridable, like Python) ───────
// PolicyOpts holds the on-disk paths the loaders read. Empty fields fall back
// to the real production defaults (or the env override) in LoadPolicy.
type PolicyOpts struct {
AllowPath string // ad-allowlist.txt (_ALLOW_PATH)
LearnedPath string // learned-trackers.txt (_LEARNED_PATH)
SpliceSeedPath string // conf/tls-splice-seed.conf (SEED_PATH)
SpliceLearnPath string // splice-learned.txt (LEARNED_PATH)
PureTrackersPath string // pure-trackers.txt (PURE_PATH)
FortknoxSites []string // filters.json fortknox_sites
SelfDomains []string // _SELF_REGS (default {secubox.in}, env SECUBOX_SELF_DOMAINS)
}
// defaultPolicyOpts returns the production defaults, honoring the same env vars
// the Python addons read.
func defaultPolicyOpts() PolicyOpts {
o := PolicyOpts{
AllowPath: "/var/lib/secubox/toolbox/ad-allowlist.txt",
LearnedPath: "/var/lib/secubox/toolbox/learned-trackers.txt",
SpliceSeedPath: envOr("SECUBOX_SPLICE_SEED", "/usr/lib/secubox/toolbox/conf/tls-splice-seed.conf"),
SpliceLearnPath: envOr("SECUBOX_SPLICE_LEARNED", "/var/lib/secubox/toolbox/splice-learned.txt"),
PureTrackersPath: envOr("SECUBOX_PURE_TRACKERS", "/var/lib/secubox/toolbox/pure-trackers.txt"),
}
// _SELF_REGS: env SECUBOX_SELF_DOMAINS (comma-split), default {secubox.in}.
self := os.Getenv("SECUBOX_SELF_DOMAINS")
if strings.TrimSpace(self) == "" {
self = "secubox.in"
}
for _, d := range strings.Split(self, ",") {
if d = strings.TrimSpace(strings.ToLower(d)); d != "" {
o.SelfDomains = append(o.SelfDomains, d)
}
}
return o
}
func envOr(key, def string) string {
if v := os.Getenv(key); v != "" {
return v
}
return def
}
// ── Policy: the loaded decision state ────────────────────────────────────────
// Policy carries the loaded sets/regex and decides per-host actions. It also
// keeps the legacy PoC fields (Inject) so the existing wiring/tests still work.
type Policy struct {
adHost *regexp.Regexp
learned map[string]bool // learned-trackers (host or registrable, lowercased)
allow map[string]bool // ad-allowlist (host or registrable, lowercased)
spliceSeed map[string]bool // splice seed patterns
spliceLearn map[string]bool // splice learned patterns
never map[string]bool // pure-trackers plus fortknox sites (splice never-set)
selfRegs map[string]bool // own-infra registrable domains
selfDomains []string // own-infra (for the host==d || host endswith .d guard)
// Legacy PoC fields kept so non-policy behaviour is unchanged.
Inject []byte // banner / ad-CSS marker injected before </head> or </body>
}
// loadLines mirrors the comment-stripping Python loaders (splice._load_lines,
// ad_ghost._allowed's allowlist read): split on first '#', trim, lowercase,
// skip blanks. Missing/unreadable file → empty set (best-effort).
func loadLines(path string) map[string]bool {
return scanLines(path, true)
}
// loadLinesRaw mirrors ad_ghost._learned_set, which does NOT comment-strip —
// learned-trackers.txt is a machine-generated one-host-per-line file. It does
// `{ln.strip().lower() for ln in f if ln.strip()}`. Matching this exactly is
// load-bearing for parity (a '#' in this file would be kept verbatim, not a
// comment), so the Go core must mirror the divergent behaviour, not normalise it.
func loadLinesRaw(path string) map[string]bool {
return scanLines(path, false)
}
func scanLines(path string, stripComments bool) map[string]bool {
out := map[string]bool{}
f, err := os.Open(path)
if err != nil {
return out
}
defer f.Close()
sc := bufio.NewScanner(f)
sc.Buffer(make([]byte, 0, 64*1024), 1<<20)
for sc.Scan() {
ln := sc.Text()
if stripComments {
if i := strings.IndexByte(ln, '#'); i >= 0 {
ln = ln[:i]
}
}
ln = strings.ToLower(strings.TrimSpace(ln))
if ln != "" {
out[ln] = true
}
}
return out
}
// LoadPolicy loads all backing files from opts (defaults applied for empty
// fields) and compiles the ad-host regex. It never returns an error for missing
// files (best-effort, like the Python addons), only for a regex-compile bug.
func LoadPolicy(opts PolicyOpts) (*Policy, error) {
def := defaultPolicyOpts()
if opts.AllowPath == "" {
opts.AllowPath = def.AllowPath
}
if opts.LearnedPath == "" {
opts.LearnedPath = def.LearnedPath
}
if opts.SpliceSeedPath == "" {
opts.SpliceSeedPath = def.SpliceSeedPath
}
if opts.SpliceLearnPath == "" {
opts.SpliceLearnPath = def.SpliceLearnPath
}
if opts.PureTrackersPath == "" {
opts.PureTrackersPath = def.PureTrackersPath
}
if len(opts.SelfDomains) == 0 {
opts.SelfDomains = def.SelfDomains
}
re, err := regexp.Compile(adHostPattern)
if err != nil {
return nil, err
}
// never-set = pure-trackers ∪ fortknox_sites (mirrors TlsSplice._refresh_sets).
never := loadLines(opts.PureTrackersPath)
for _, s := range opts.FortknoxSites {
if s = strings.Trim(strings.ToLower(strings.TrimSpace(s)), "."); s != "" {
never[s] = true
}
}
selfRegs := map[string]bool{}
selfDomains := make([]string, 0, len(opts.SelfDomains))
for _, d := range opts.SelfDomains {
d = strings.ToLower(strings.TrimSpace(d))
if d == "" {
continue
}
selfRegs[d] = true
selfDomains = append(selfDomains, d)
}
return &Policy{
adHost: re,
learned: loadLinesRaw(opts.LearnedPath), // mirrors _learned_set (no comment-strip)
allow: loadLines(opts.AllowPath),
spliceSeed: loadLines(opts.SpliceSeedPath),
spliceLearn: loadLines(opts.SpliceLearnPath),
never: never,
selfRegs: selfRegs,
selfDomains: selfDomains,
}, nil
}
// ── registrable: port of ad_ghost._registrable ───────────────────────────────
//
// host = host.split(":")[0].lower().strip(".")
// if not host or host.replace(".","").isdigit() or ":" in host: return None
// p = host.split(".")
// if len(p) <= 2: return host
// last2 = ".".join(p[-2:])
// return ".".join(p[-3:]) if (last2 in _2L_TLD and len(p) >= 3) else last2
func registrable(host string) string {
host = strings.ToLower(host)
if i := strings.IndexByte(host, ':'); i >= 0 {
host = host[:i]
}
host = strings.Trim(host, ".")
if host == "" {
return ""
}
// host.replace(".","").isdigit() → all-digit IPv4-ish → no registrable.
if isAllDigits(strings.ReplaceAll(host, ".", "")) {
return ""
}
// The Python checks ":" in host AFTER host.split(":")[0], so no colon can
// remain at that point and the check never fires; an IPv6 literal such as
// "[::1]:443" is simply truncated at its first colon. The re-check below is
// equally unreachable here and is kept only to mirror the Python line for line.
if strings.IndexByte(host, ':') >= 0 {
return ""
}
p := strings.Split(host, ".")
if len(p) <= 2 {
return host
}
last2 := strings.Join(p[len(p)-2:], ".")
if twoLevelTLD[last2] && len(p) >= 3 {
return strings.Join(p[len(p)-3:], ".")
}
return last2
}
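// isAllDigits approximates Python str.isdigit for ASCII input. str.isdigit also
// accepts non-ASCII Unicode digits; this check does not, which only matters for
// hosts that contain such characters.
func isAllDigits(s string) bool {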
if s == "" {
return false // Python "".isdigit() is False
}
for _, r := range s {
if r < '0' || r > '9' {
return false
}
}
return true
}
// ── splice helpers: port of splice.host_matches / should_splice ──────────────
// hostMatches: True if host == pattern OR host is a dotted-suffix subdomain.
func hostMatches(host string, patterns map[string]bool) bool {
h := strings.Trim(strings.ToLower(host), ".")
if h == "" || len(patterns) == 0 {
return false
}
if patterns[h] {
return true
}
for p := range patterns {
if strings.HasSuffix(h, "."+p) {
return true
}
}
return false
}
// allowed: port of ad_ghost._allowed. Own-infra ALWAYS wins (reflash-safe),
// then the operator allowlist (host or registrable).
func (p *Policy) allowed(host string) bool {
h := strings.ToLower(host)
reg := registrable(h)
if reg == "" {
reg = h
}
// own infra: registrable in selfRegs, OR host == d || host endswith "."+d.
if p.selfRegs[reg] {
return true
}
for _, d := range p.selfDomains {
if h == d || strings.HasSuffix(h, "."+d) {
return true
}
}
return p.allow[h] || p.allow[reg]
}
// shouldSplice: port of splice.should_splice (never wins; then seed ∪ learned).
func (p *Policy) shouldSplice(sni string) bool {
s := strings.Trim(strings.ToLower(sni), ".")
if s == "" {
return false
}
if hostMatches(s, p.never) {
return false
}
return hostMatches(s, p.spliceSeed) || hostMatches(s, p.spliceLearn)
}
// blockedByAd: port of the ad_ghost requestheaders block decision (sans the
// allowlist guard, which Decide applies first): _AD_HOST match OR
// registrable/host in learned-trackers.
func (p *Policy) blockedByAd(host string) bool {
if p.adHost.MatchString(host) {
return true
}
reg := registrable(host)
if reg != "" && p.learned[reg] {
return true
}
return p.learned[strings.ToLower(host)]
}
// ── Decide: the unified cross-engine decision ────────────────────────────────
//
// action ∈ {"allow","block","splice","mitm"}. Precedence (mirrors the combined
// behaviour of the two Python addons, as documented in the harness):
//
// 1. own-infra / allowlist → "allow" (ad_ghost._allowed; never block/splice)
// 2. splice never-set check, then seed/learned → "splice"
// (tls_splice runs FIRST at the TLS layer; should_splice already excludes
// the never-set = pure-trackers ∪ fortknox, so a tracker that is also a
// splice candidate fails should_splice here and falls through to block)
// 3. _AD_HOST / learned → "block" (ad_ghost requestheaders, request layer)
// 4. otherwise → "mitm"
//
// sni defaults to host when empty (the live engine splices on SNI == the TLS
// host; for the parity harness host and sni are the same value).
func (p *Policy) Decide(host, sni string) string {
if sni == "" {
sni = host
}
if p.allowed(host) {
return "allow"
}
if p.shouldSplice(sni) {
return "splice"
}
if p.blockedByAd(host) {
return "block"
}
return "mitm"
}
// action keeps the legacy 3-verb surface (block/splice/mitm) for the PoC
// CONNECT wiring, derived from Decide: "allow" collapses to "mitm" (an
// allowlisted host is intercepted normally, just never short-circuited).
func (p *Policy) action(host string) string {
switch p.Decide(host, host) {
case "splice":
return "splice"
case "block":
return "block"
default: // "allow" and "mitm" both → normal interception
return "mitm"
}
}
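The precedence that Decide documents (allow, then splice unless the never-set vetoes it, then block, else mitm) can be sketched in isolation. This is only an illustration under stated assumptions: `sketchPolicy`, `matches` and the example hosts are hypothetical, every set uses the dotted-suffix rule, and the ad-host regex and registrable-domain lookups of the real Policy are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// sketchPolicy is a hypothetical, stripped-down stand-in for Policy: plain
// sets only, no ad-host regex and no registrable-domain logic.
type sketchPolicy struct {
	allow, never, splice, trackers map[string]bool
}

// matches reports whether host equals a pattern or is a dotted-suffix
// subdomain of one (the same rule hostMatches applies).
func matches(host string, set map[string]bool) bool {
	h := strings.Trim(strings.ToLower(host), ".")
	if h == "" {
		return false
	}
	if set[h] {
		return true
	}
	for p := range set {
		if strings.HasSuffix(h, "."+p) {
			return true
		}
	}
	return false
}

// decide applies the documented precedence: allow first, then splice (unless
// the never-set vetoes it), then block, otherwise mitm.
func (p sketchPolicy) decide(host string) string {
	switch {
	case matches(host, p.allow):
		return "allow"
	case !matches(host, p.never) && matches(host, p.splice):
		return "splice"
	case matches(host, p.trackers):
		return "block"
	default:
		return "mitm"
	}
}

func main() {
	p := sketchPolicy{
		allow:    map[string]bool{"secubox.in": true},
		never:    map[string]bool{"tracker.example": true},
		splice:   map[string]bool{"tracker.example": true, "video.example": true},
		trackers: map[string]bool{"tracker.example": true},
	}
	for _, h := range []string{"hub.secubox.in", "cdn.video.example", "a.tracker.example", "news.example"} {
		fmt.Printf("%s -> %s\n", h, p.decide(h))
	}
}
```

The third host shows why the never-set matters: it is listed as a splice candidate, but the veto makes it fall through to the block rule.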

@ -0,0 +1,142 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// Cross-engine parity harness — Go side (#662 Phase 3).
//
// Loads testdata/parity-fixtures.json + the testdata/config snapshot, runs
// Policy.Decide on each host, and asserts == the fixture's expect. The Python
// side (../secubox-toolbox/tests/test_engine_parity.py) loads the SAME files
// and drives the SAME decision; both must agree → parity proven.
package main
import (
"encoding/json"
"os"
"path/filepath"
"testing"
)
type parityConfig struct {
AdAllowlist string `json:"ad_allowlist"`
LearnedTrackers string `json:"learned_trackers"`
SpliceSeed string `json:"splice_seed"`
SpliceLearned string `json:"splice_learned"`
PureTrackers string `json:"pure_trackers"`
SelfDomains []string `json:"self_domains"`
FortknoxSites []string `json:"fortknox_sites"`
}
type parityFixture struct {
Host string `json:"host"`
Expect string `json:"expect"`
Why string `json:"why"`
}
type parityFile struct {
Config parityConfig `json:"config"`
Fixtures []parityFixture `json:"fixtures"`
}
// testdataDir resolves the testdata/ dir relative to this package
// (cmd/sbxmitm → ../../testdata).
func testdataDir(t *testing.T) string {
t.Helper()
d, err := filepath.Abs(filepath.Join("..", "..", "testdata"))
if err != nil {
t.Fatal(err)
}
return d
}
func loadParityFile(t *testing.T) (parityFile, string) {
t.Helper()
dir := testdataDir(t)
raw, err := os.ReadFile(filepath.Join(dir, "parity-fixtures.json"))
if err != nil {
t.Fatalf("read fixtures: %v", err)
}
var pf parityFile
if err := json.Unmarshal(raw, &pf); err != nil {
t.Fatalf("parse fixtures: %v", err)
}
if len(pf.Fixtures) == 0 {
t.Fatal("no fixtures")
}
return pf, dir
}
func TestParityDecide(t *testing.T) {
pf, dir := loadParityFile(t)
cfgPath := func(rel string) string { return filepath.Join(dir, filepath.FromSlash(rel)) }
pol, err := LoadPolicy(PolicyOpts{
AllowPath: cfgPath(pf.Config.AdAllowlist),
LearnedPath: cfgPath(pf.Config.LearnedTrackers),
SpliceSeedPath: cfgPath(pf.Config.SpliceSeed),
SpliceLearnPath: cfgPath(pf.Config.SpliceLearned),
PureTrackersPath: cfgPath(pf.Config.PureTrackers),
FortknoxSites: pf.Config.FortknoxSites,
SelfDomains: pf.Config.SelfDomains,
})
if err != nil {
t.Fatalf("LoadPolicy: %v", err)
}
for _, fx := range pf.Fixtures {
got := pol.Decide(fx.Host, fx.Host)
if got != fx.Expect {
t.Errorf("Decide(%q)=%q want %q (%s)", fx.Host, got, fx.Expect, fx.Why)
}
}
}
// TestPolicyActionVerbs checks the legacy 3-verb action() surface still wired
// into the PoC CONNECT path: allow collapses to mitm; block/splice preserved.
func TestPolicyActionVerbs(t *testing.T) {
pf, dir := loadParityFile(t)
cfgPath := func(rel string) string { return filepath.Join(dir, filepath.FromSlash(rel)) }
pol, err := LoadPolicy(PolicyOpts{
AllowPath: cfgPath(pf.Config.AdAllowlist),
LearnedPath: cfgPath(pf.Config.LearnedTrackers),
SpliceSeedPath: cfgPath(pf.Config.SpliceSeed),
SpliceLearnPath: cfgPath(pf.Config.SpliceLearned),
PureTrackersPath: cfgPath(pf.Config.PureTrackers),
FortknoxSites: pf.Config.FortknoxSites,
SelfDomains: pf.Config.SelfDomains,
})
if err != nil {
t.Fatal(err)
}
cases := map[string]string{
"ads.doubleclick.net": "block",
"r1.googlevideo.com": "splice",
"news.example.com": "mitm",
"notdoubleclick.net": "mitm",
"analytics.example-allowed.com": "mitm", // allow → normal interception (mitm verb)
"hub.secubox.in": "mitm", // own-infra → normal interception
}
for host, want := range cases {
if got := pol.action(host); got != want {
t.Errorf("action(%q)=%q want %q", host, got, want)
}
}
}
// TestRegistrable exercises the _registrable port incl. the 2-level TLD list.
func TestRegistrable(t *testing.T) {
cases := map[string]string{
"a.b.example.com": "example.com",
"example.com": "example.com",
"com": "com",
"a.b.example.co.uk": "example.co.uk",
"example.co.uk": "example.co.uk", // 3 labels; last two form a 2-level TLD → all 3 kept
"x.y.z.example.com": "example.com",
"1.2.3.4": "",
"": "",
}
for in, want := range cases {
if got := registrable(in); got != want {
t.Errorf("registrable(%q)=%q want %q", in, got, want)
}
}
}
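The 2-level TLD branch exercised by TestRegistrable can be reproduced with a small standalone sketch. It assumes a hypothetical one-entry `twoLevel` set in place of the real list, and it deliberately omits the port stripping and all-digit checks of the full port.

```go
package main

import (
	"fmt"
	"strings"
)

// twoLevel is a hypothetical, one-entry stand-in for the real 2-level TLD list.
var twoLevel = map[string]bool{"co.uk": true}

// registrableSketch keeps the last two labels of host, or the last three when
// the last two form a known 2-level TLD. Hosts with two labels or fewer are
// returned as-is. Port and numeric-host handling are intentionally left out.
func registrableSketch(host string) string {
	host = strings.Trim(strings.ToLower(host), ".")
	p := strings.Split(host, ".")
	if host == "" || len(p) <= 2 {
		return host
	}
	last2 := strings.Join(p[len(p)-2:], ".")
	if twoLevel[last2] {
		return strings.Join(p[len(p)-3:], ".")
	}
	return last2
}

func main() {
	for _, h := range []string{"a.b.example.com", "example.co.uk", "a.b.example.co.uk", "com"} {
		fmt.Printf("%q -> %q\n", h, registrableSketch(h))
	}
}
```

Note that "example.co.uk" has three labels, so it goes through the 2-level TLD branch and is not returned by the short-host shortcut.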

@ -0,0 +1,194 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: always-on anonymize + Set-Cookie poison wiring
// (#662 Phase 5-prep, Part A)
//
// These helpers wire the ported policy (policy.go) + HMAC fake-identity jar
// (jar.go) into the MITM response path. They mirror the INTENT of the Python
// privacy_guard._anonymize and privacy.fake_id poison (mitmproxy_addons/
// privacy_guard.py, secubox_toolbox/privacy.py) — best-effort privacy hygiene,
// NOT byte-identical to the Python request-Cookie path. The jar values
// themselves ARE byte-exact (proven in jar_test.go).
//
// Safety envelope (DARK, like anti-track): poison only acts on MITM'd TRACKER
// flows. allow/own-infra flows are left CLEAN — never poisoned, never blocked.
//
// Pure standard library — no external modules.
package main
import (
"net"
"net/http"
"strings"
)
// ── anonymize: always-on hygiene ─────────────────────────────────────────────
// anonymizeStrip mirrors privacy_guard._STRIP / protective_mode._STRIP: the
// operator/carrier + re-identification REQUEST headers we drop on every MITM'd
// flow. Lower-cased for case-insensitive matching against canonicalised keys.
var anonymizeStrip = []string{
"msisdn", "x-msisdn", "x-up-calling-line-id", "x-up-subno",
"x-nokia-msisdn", "x-acr", "x-vf-acr", "x-amobee-1", "x-amobee-2",
"tm-user-id", "x-wap-profile", "x-wap-msisdn", "x-network-info",
"x-forwarded-for", "forwarded", "x-real-ip", "via",
}
// anonymizeRequest applies always-on privacy hygiene to a MITM'd request:
// drop the operator/tracking headers above, then pin DNT:1 + Sec-GPC:1 (the
// opt-out signals). Mirrors privacy_guard._anonymize. Minimal + best-effort:
// it never errors and is safe to call on every intercepted request.
//
// NOTE: unlike the Python spoof path we do NOT drop Cookie/Referer here —
// anonymize is the universally-safe hygiene layer; cookie neutralisation is the
// poison layer (poisonSetCookies), gated behind the tracker classification.
func anonymizeRequest(h http.Header) {
for _, name := range anonymizeStrip {
// http.Header.Del canonicalises the key; our list is lower-case but Del
// matches case-insensitively via CanonicalMIMEHeaderKey.
h.Del(name)
}
h.Set("DNT", "1")
h.Set("Sec-GPC", "1")
}
// ── poison: response Set-Cookie value replacement ────────────────────────────
// trackingCookieNames is the set of exact cookie names we treat as tracking
// identifiers worth poisoning (lower-cased). These map onto the shapes the jar
// (_shape in jar.go) knows how to forge plausibly.
var trackingCookieNames = map[string]bool{
"_fbp": true, "_fbc": true, "_gid": true, "_gcl_au": true,
"uid": true, "uuid": true, "_pk_id": true, "_pk_ses": true,
"__qca": true, "muid": true, "ide": true, "fr": true,
"_uetvid": true, "_uetsid": true, "anid": true, "nid": true,
}
// isTrackingCookieName reports whether a Set-Cookie name looks like a tracking
// identifier we should poison. Prefix rule: any "_ga*" cookie (GA + GA4
// per-property _ga_<id>) is a tracking id; otherwise an exact-match against
// trackingCookieNames. Benign session/CSRF cookies (sessionid, csrftoken, …)
// are NOT matched, so they pass through untouched.
func isTrackingCookieName(name string) bool {
n := strings.ToLower(strings.TrimSpace(name))
if n == "" {
return false
}
if strings.HasPrefix(n, "_ga") {
return true
}
return trackingCookieNames[n]
}
// poisonSetCookies rewrites the response Set-Cookie header lines for a MITM'd
// tracker flow: for each cookie whose NAME is a tracking id, the value is
// replaced with the jar fakeID(clientHash, host, name, key) while ALL cookie
// attributes (Path, Domain, Max-Age, Secure, HttpOnly, SameSite, …) are
// preserved verbatim. Non-tracking cookies are returned byte-identical.
//
// Gating (caller's responsibility too, but defensive here): if the jar key is
// absent OR fakeID returns !ok (empty clientHash / tracker), the cookie is left
// UNCHANGED — we never emit a malformed cookie, and we never invent a fake
// where we lack the seed. This keeps the poison fail-closed-to-clean.
//
// This is the emission half of the jar; the classification half (is this a
// tracker flow at all) is Policy.shouldPoison, applied by the wiring before
// this is ever called — poison NEVER touches allow/own-infra flows.
func poisonSetCookies(setCookies []string, clientHash, host string, key []byte) []string {
if len(setCookies) == 0 {
return setCookies
}
out := make([]string, len(setCookies))
for i, sc := range setCookies {
out[i] = poisonOneSetCookie(sc, clientHash, host, key)
}
return out
}
// poisonOneSetCookie rewrites a single Set-Cookie line. The line shape is
// `name=value; Attr1; Attr2=...`; we split off the first `;` to isolate the
// name=value pair, replace value if name is a tracking id and a fake mints,
// then re-attach the (unchanged) attribute tail.
func poisonOneSetCookie(sc, clientHash, host string, key []byte) string {
semi := strings.IndexByte(sc, ';')
pair := sc
tail := ""
if semi >= 0 {
pair = sc[:semi]
tail = sc[semi:] // includes the leading ';'
}
eq := strings.IndexByte(pair, '=')
if eq < 0 {
return sc // attribute-only / malformed → leave untouched
}
name := strings.TrimSpace(pair[:eq])
if !isTrackingCookieName(name) {
return sc
}
fake, ok := fakeID(clientHash, host, name, key)
if !ok {
return sc // no jar key / no clientHash → leave clean (fail-closed)
}
return name + "=" + fake + tail
}
// ── tracker classification + poison gate ─────────────────────────────────────
// isTracker mirrors the tracker classification used by the block decision
// (privacy.is_tracker / ad_ghost): _AD_HOST regex OR host/registrable in the
// learned-trackers set. Reused here so poison fires on exactly the hosts the
// engine already considers trackers.
func (p *Policy) isTracker(host string) bool {
return p.blockedByAd(host)
}
// shouldPoison reports whether a MITM'd flow to host should have its tracking
// Set-Cookies poisoned. TRUE only for tracker hosts that are NOT own-infra /
// allowlisted — own-infra flows are left clean (same dark safety as the block
// path). The caller additionally requires a loaded jar key.
func (p *Policy) shouldPoison(host string) bool {
if p.allowed(host) {
return false // own-infra / allowlist → never poison
}
return p.isTracker(host)
}
// ── client identity ──────────────────────────────────────────────────────────
// clientHashFromConn returns the per-client identity used to mint the stable
// fake persona (jar fakeID first arg).
//
// It mirrors the Python privacy_guard._client_hash → _common.mac_hash_of(peer_ip)
// for the WireGuard R3 path: the peer IP is resolved to the WG persona hash
// (sha256(peer_pubkey)[:16]) by macHashOf. For 10.99.1.0/24 WG peers that hash
// is byte-identical to the Python engine (proven in machash_test.go ↔
// test_machash_parity.py), so a flow's fake persona is stable across the Go and
// Python engines and across restarts.
//
// macHashOf returns "" for any IP it cannot resolve (non-WG peers, the captive
// R0-R2 ARP path which is out of scope for this R3 engine, missing WG DB). In
// that case we fall back to the raw peer IP so non-WG / test conns still get a
// deterministic seed and poison remains functional — the fallback value is just
// not cross-engine-stable, which is acceptable for non-R3 traffic.
//
// DONE(#662): mac_hash wiring for the WG path. Remaining gaps, intentionally NOT
// addressed here:
// - the transparent original-dst plumbing that feeds the *real* peer IP into
// this function lives in transparent.go (handleTransparent); the CONNECT PoC
// still sees the proxy-hop peer IP.
// - the R0-R2 captive-subnet ARP/HMAC branch of _common.mac_hash_of is out of
// scope (this engine is WG-only — see machash.go macHashOf).
func clientHashFromConn(conn net.Conn) string {
if conn == nil {
return ""
}
host, _, err := net.SplitHostPort(conn.RemoteAddr().String())
if err != nil {
host = conn.RemoteAddr().String()
}
if mh := macHashOf(host); mh != "" {
return mh
}
return host
}
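The line-splitting step that poisonOneSetCookie describes (isolate name=value at the first ';', swap the value, re-attach the attribute tail untouched) can be sketched on its own. `rewriteValue` and the literal "FAKE" are hypothetical; the real code only rewrites tracking cookie names and takes its value from the jar's fakeID.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteValue replaces the value of a single Set-Cookie line with repl while
// keeping the attribute tail (everything from the first ';') byte for byte.
// Lines whose first segment has no '=' are returned unchanged.
func rewriteValue(line, repl string) string {
	pair, tail := line, ""
	if i := strings.IndexByte(line, ';'); i >= 0 {
		pair, tail = line[:i], line[i:] // tail keeps its leading ';'
	}
	eq := strings.IndexByte(pair, '=')
	if eq < 0 {
		return line // attribute-only or malformed line
	}
	return strings.TrimSpace(pair[:eq]) + "=" + repl + tail
}

func main() {
	fmt.Println(rewriteValue("_ga=GA1.2.111.222; Path=/; Secure", "FAKE"))
	fmt.Println(rewriteValue("HttpOnly", "FAKE"))
}
```

Splitting on the first ';' before looking for '=' matters: an '=' inside an attribute such as Max-Age=3600 can then never be mistaken for the name/value separator.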

@ -0,0 +1,152 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// Unit tests for the always-on anonymize hygiene + the Set-Cookie poison
// emission wired into the MITM response path (#662 Phase 5-prep, Part A).
//
// These exercise the PURE helpers (anonymizeRequest / poisonSetCookies /
// isTrackingCookieName) so the wiring is testable without standing up a full
// proxy. The behaviour mirrors the Python privacy_guard._anonymize and the
// privacy.fake_id poison intent (see comments in privacy.go) — best-effort
// hygiene, not byte-identical to the request-Cookie path.
package main
import (
"net/http"
"testing"
)
// TestAnonymizeRequestStripsOperatorHeaders: the operator/carrier + re-id
// headers are dropped, and DNT:1 + Sec-GPC:1 are pinned (mirrors
// privacy_guard._anonymize / protective_mode spoof header hygiene).
func TestAnonymizeRequestStripsOperatorHeaders(t *testing.T) {
h := http.Header{}
h.Set("X-MSISDN", "33612345678")
h.Set("X-ACR", "carrier-acr-token")
h.Set("X-Up-Calling-Line-Id", "33612345678")
h.Set("X-Wap-Profile", "http://wap.example/ua.xml")
h.Set("X-Forwarded-For", "10.0.0.7")
h.Set("Via", "1.1 carrier-proxy")
h.Set("User-Agent", "Mozilla/5.0") // must survive
anonymizeRequest(h)
for _, k := range []string{
"X-Msisdn", "X-Acr", "X-Up-Calling-Line-Id", "X-Wap-Profile",
"X-Forwarded-For", "Via",
} {
if v := h.Get(k); v != "" {
t.Errorf("anonymizeRequest left %s=%q (should be stripped)", k, v)
}
}
if h.Get("User-Agent") != "Mozilla/5.0" {
t.Errorf("anonymizeRequest clobbered a benign header: User-Agent=%q", h.Get("User-Agent"))
}
if h.Get("DNT") != "1" {
t.Errorf("DNT not pinned: %q", h.Get("DNT"))
}
if h.Get("Sec-GPC") != "1" {
t.Errorf("Sec-GPC not pinned: %q", h.Get("Sec-GPC"))
}
}
// TestAnonymizeRequestPinsSignalsWhenAbsent: DNT/Sec-GPC are asserted even when
// no operator headers were present (always-on hygiene).
func TestAnonymizeRequestPinsSignalsWhenAbsent(t *testing.T) {
h := http.Header{}
anonymizeRequest(h)
if h.Get("DNT") != "1" || h.Get("Sec-GPC") != "1" {
t.Fatalf("opt-out signals not pinned on a clean request: DNT=%q GPC=%q",
h.Get("DNT"), h.Get("Sec-GPC"))
}
}
// TestIsTrackingCookieName: known tracking-id cookie names are recognised;
// benign session/CSRF cookies are not.
func TestIsTrackingCookieName(t *testing.T) {
track := []string{"_ga", "_GA_ABC123", "_fbp", "_gid", "uid", "uuid", "_pk_id", "__qca", "_gcl_au"}
for _, n := range track {
if !isTrackingCookieName(n) {
t.Errorf("isTrackingCookieName(%q)=false, want true", n)
}
}
benign := []string{"sessionid", "csrftoken", "XSRF-TOKEN", "PHPSESSID", "cart", "lang"}
for _, n := range benign {
if isTrackingCookieName(n) {
t.Errorf("isTrackingCookieName(%q)=true, want false", n)
}
}
}
// TestPoisonSetCookiesReplacesTrackingValue: a tracking Set-Cookie has its value
// replaced by the jar fakeID (attributes preserved), while a non-tracking cookie
// is left byte-identical.
func TestPoisonSetCookiesReplacesTrackingValue(t *testing.T) {
key := []byte("test-jar-seed-key-0123456789abcdef")
const ch = "203.0.113.9"
const host = "ads.doubleclick.net"
in := []string{
"_ga=GA1.2.111.222; Path=/; Domain=.doubleclick.net; Max-Age=63072000",
"sessionid=abc123; Path=/; HttpOnly",
}
out := poisonSetCookies(in, ch, host, key)
if len(out) != 2 {
t.Fatalf("poisonSetCookies returned %d cookies, want 2", len(out))
}
// The _ga value must be the jar fakeID and the attributes preserved.
want, ok := fakeID(ch, host, "_ga", key)
if !ok {
t.Fatal("fakeID returned !ok for _ga")
}
wantCookie := "_ga=" + want + "; Path=/; Domain=.doubleclick.net; Max-Age=63072000"
if out[0] != wantCookie {
t.Errorf("poisoned _ga = %q\n want %q", out[0], wantCookie)
}
if out[0] == in[0] {
t.Error("tracking cookie value was NOT changed")
}
// The benign cookie must be untouched.
if out[1] != in[1] {
t.Errorf("non-tracking cookie altered: %q != %q", out[1], in[1])
}
}
// TestPoisonSetCookiesNoKeyLeavesUnchanged: with no jar key (the key-present
// gate fails), nothing is poisoned (fail-closed-to-clean: we never emit a
// broken cookie).
func TestPoisonSetCookiesNoKeyLeavesUnchanged(t *testing.T) {
in := []string{"_ga=GA1.2.1.2; Path=/"}
out := poisonSetCookies(in, "1.2.3.4", "ads.doubleclick.net", nil)
if len(out) != 1 || out[0] != in[0] {
t.Fatalf("poisonSetCookies with nil key altered output: %v", out)
}
}
// TestPoisonSetCookiesNoClientHashLeavesUnchanged: empty clientHash → fakeID !ok
// → cookie left as-is.
func TestPoisonSetCookiesNoClientHashLeavesUnchanged(t *testing.T) {
key := []byte("test-jar-seed-key-0123456789abcdef")
in := []string{"_ga=GA1.2.1.2; Path=/"}
out := poisonSetCookies(in, "", "ads.doubleclick.net", key)
if len(out) != 1 || out[0] != in[0] {
t.Fatalf("poisonSetCookies with empty clientHash altered output: %v", out)
}
}
// TestPoisonSetCookiesDeterministic: same (client,host,name) → same fake value
// across calls (the persistent, 'rémanent' jar, proven byte-exact in
// jar_test.go; here we just assert the wiring keeps it stable).
func TestPoisonSetCookiesDeterministic(t *testing.T) {
key := []byte("test-jar-seed-key-0123456789abcdef")
in := []string{"uid=real-user-7; Path=/"}
a := poisonSetCookies(in, "9.9.9.9", "adnxs.com", key)
b := poisonSetCookies(in, "9.9.9.9", "adnxs.com", key)
if a[0] != b[0] {
t.Fatalf("poison not deterministic: %q != %q", a[0], b[0])
}
if a[0] == in[0] {
t.Fatal("uid (tracking) cookie not poisoned")
}
}

@ -0,0 +1,93 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: sidecar emit helper (#662 Phase 4)
//
// Fire-and-forget POST to a unix-socket'd SecuBox module, mirroring the Python
// addons' _common.fire_forget_post: it NEVER blocks the proxy flow and NEVER
// raises into the caller. The live engine will relay extracted signals to the
// existing module sockets; this is the transport only — NOT yet wired into the
// live request/response path (Phase 5+ wiring).
//
// Addon → socket mapping the live engine will use (verbatim from the Python
// addons' TARGET constants, packages/secubox-toolbox/mitmproxy_addons/*.py):
//
//   addon         socket path                         route
//   cookies     → /run/secubox/cookies.sock           POST /inject
//   dpi         → /run/secubox/dpi.sock               POST /classify
//   avatar      → /run/secubox/avatar.sock            POST /fingerprint
//   ja4         → /run/secubox/threat-analyst.sock    POST /ja4
//   soc_relay   → /run/secubox/soc.sock               POST /event
//
//   social_graph: in-process (no socket); correlated inside the engine, not emitted.
//
// emit takes the full socket PATH (not an http+unix:// URL) and the HTTP route
// as separate arguments; callers pick both from the table above.
//
// Pure standard library — no external modules, no go.sum.
package main
import (
"context"
"fmt"
"net"
"time"
)
// emitTimeout caps the whole connect+write+read so a slow/dead module socket
// can never wedge the engine. Mirrors the Python httpx timeout=2.
const emitTimeout = 2 * time.Second
// emit fires a fire-and-forget POST of payload to the given unix socket at
// route, in a detached goroutine. It returns immediately and never blocks the
// caller; all errors (missing socket, dead peer, timeout) are swallowed —
// dropping a relayed signal must never break a client flow. Mirrors
// _common.fire_forget_post + queue_async (create_task, never raise).
//
// route is the HTTP path on the module (e.g. "/inject", "/classify"); use the
// addon→socket table above to pick socketPath + route together.
func emit(socketPath, route string, payload []byte) {
go emitSync(socketPath, route, payload)
}
// emitSync performs the actual POST synchronously (under emitTimeout). It is a
// separate unexported function so same-package tests can observe delivery
// deterministically without racing the goroutine. The returned error exists
// only for the tests' benefit; emit() discards it.
func emitSync(socketPath, route string, payload []byte) error {
if route == "" {
route = "/"
}
ctx, cancel := context.WithTimeout(context.Background(), emitTimeout)
defer cancel()
var d net.Dialer
conn, err := d.DialContext(ctx, "unix", socketPath)
if err != nil {
return err // dead/missing socket — swallowed by emit()
}
defer conn.Close()
if dl, ok := ctx.Deadline(); ok {
_ = conn.SetDeadline(dl)
}
// Minimal HTTP/1.1 POST. Host is a placeholder (unix transport); the module
// FastAPI apps ignore it. Connection: close so the peer EOFs after replying.
req := fmt.Sprintf(
"POST %s HTTP/1.1\r\nHost: secubox.local\r\nContent-Type: application/json\r\n"+
"Content-Length: %d\r\nConnection: close\r\n\r\n",
route, len(payload))
if _, err := conn.Write([]byte(req)); err != nil {
return err
}
if len(payload) > 0 {
if _, err := conn.Write(payload); err != nil {
return err
}
}
// Best-effort drain so the peer sees a clean close; we don't parse the
// response (fire-and-forget). Errors here are irrelevant.
buf := make([]byte, 512)
_, _ = conn.Read(buf)
return nil
}

@ -0,0 +1,125 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// Unit tests for the sidecar emit helper (#662 Phase 4).
package main
import (
"bufio"
"net"
"path/filepath"
"strings"
"testing"
"time"
)
// TestEmitDelivers: emitSync to a live unix socket delivers the POST request
// line, route and JSON body.
func TestEmitDelivers(t *testing.T) {
sock := filepath.Join(t.TempDir(), "emit.sock")
ln, err := net.Listen("unix", sock)
if err != nil {
t.Fatalf("listen: %v", err)
}
defer ln.Close()
got := make(chan string, 1)
go func() {
c, err := ln.Accept()
if err != nil {
return
}
defer c.Close()
c.SetReadDeadline(time.Now().Add(2 * time.Second))
var sb strings.Builder
r := bufio.NewReader(c)
buf := make([]byte, 4096)
for {
n, err := r.Read(buf)
sb.Write(buf[:n])
if err != nil || strings.Contains(sb.String(), `"k":"v"`) {
break
}
}
// Reply so emitSync's drain completes cleanly.
c.Write([]byte("HTTP/1.1 204 No Content\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"))
got <- sb.String()
}()
if err := emitSync(sock, "/classify", []byte(`{"k":"v"}`)); err != nil {
t.Fatalf("emitSync: %v", err)
}
select {
case raw := <-got:
if !strings.HasPrefix(raw, "POST /classify HTTP/1.1") {
t.Errorf("missing/wrong request line in:\n%s", raw)
}
if !strings.Contains(raw, `{"k":"v"}`) {
t.Errorf("body not delivered in:\n%s", raw)
}
case <-time.After(3 * time.Second):
t.Fatal("server never received the emit")
}
}
// TestEmitDeadSocketNoPanicNoBlock: emit() (the goroutine form) to a
// nonexistent socket must return immediately and never panic, and emitSync
// must just return an error without blocking past the timeout.
func TestEmitDeadSocketNoPanicNoBlock(t *testing.T) {
dead := filepath.Join(t.TempDir(), "nope.sock")
// emit (async) returns instantly even though the socket is dead.
done := make(chan struct{})
go func() {
defer close(done)
emit(dead, "/inject", []byte(`{"x":1}`)) // must not panic/block
}()
select {
case <-done:
case <-time.After(time.Second):
t.Fatal("emit() blocked on a dead socket")
}
// emitSync surfaces the dial error (which emit swallows) without blocking.
start := time.Now()
if err := emitSync(dead, "/inject", []byte(`{}`)); err == nil {
t.Error("emitSync to dead socket: expected error, got nil")
}
if elapsed := time.Since(start); elapsed > emitTimeout+time.Second {
t.Errorf("emitSync blocked %v on dead socket", elapsed)
}
}
// TestEmitEmptyRouteDefaults: an empty route becomes "/".
func TestEmitEmptyRouteDefaults(t *testing.T) {
sock := filepath.Join(t.TempDir(), "root.sock")
ln, err := net.Listen("unix", sock)
if err != nil {
t.Fatal(err)
}
defer ln.Close()
got := make(chan string, 1)
go func() {
c, err := ln.Accept()
if err != nil {
return
}
defer c.Close()
buf := make([]byte, 256)
n, _ := c.Read(buf)
c.Write([]byte("HTTP/1.1 204 No Content\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"))
got <- string(buf[:n])
}()
if err := emitSync(sock, "", nil); err != nil {
t.Fatalf("emitSync: %v", err)
}
select {
case raw := <-got:
if !strings.HasPrefix(raw, "POST / HTTP/1.1") {
t.Errorf("empty route not defaulted to /, got:\n%s", raw)
}
case <-time.After(2 * time.Second):
t.Fatal("no request received")
}
}

@ -0,0 +1,398 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
//go:build linux
// SecuBox-Deb :: toolbox-ng :: transparent SO_ORIGINAL_DST accept path
// (#662 Phase 6 prep)
//
// The live R3 engine runs transparent: nft DNAT redirects the client's TCP SYN
// to this worker, which recovers the ORIGINAL destination via
// getsockopt(SOL_IP, SO_ORIGINAL_DST) (IPv4) or
// getsockopt(SOL_IPV6, IP6T_SO_ORIGINAL_DST=80) (IPv6). This is a SECOND listen
// mode behind --transparent; the CONNECT PoC (main.go handleConnect) is left
// EXACTLY as-is.
//
// This is DARK — never wired to live traffic yet. The pure parser (parseOrigDst)
// is unit-tested; the syscall glue (origDst) and end-to-end transparent capture
// can only be exercised behind a real nft DNAT redirect, validated at Phase 5
// shadow on the board, NOT in unit tests.
//
// Pure standard library — syscall + net + crypto/tls; no external modules.
package main
import (
"bytes"
"crypto/tls"
"encoding/binary"
"fmt"
"io"
"log"
"net"
"strings"
"syscall"
"unsafe"
)
// SO_ORIGINAL_DST is the Netfilter getsockopt that returns the pre-DNAT
// destination sockaddr. Same value (80) for IPv4 (SOL_IP) and IPv6
// (SOL_IPV6, where it is named IP6T_SO_ORIGINAL_DST).
const soOriginalDst = 80
// parseOrigDst decodes a raw sockaddr blob (as returned by getsockopt
// SO_ORIGINAL_DST) into host + port. It is PURE — no syscall — so it is fully
// unit-testable offline.
//
// IPv4 sockaddr_in (16 bytes): [0:2]=family (AF_INET=2, host byte order),
// [2:4]=port (BIG-endian / network order), [4:8]=4-byte address.
// IPv6 sockaddr_in6 (≥24 bytes): [0:2]=family (AF_INET6=10), [2:4]=port (BE),
// [4:8]=flowinfo, [8:24]=16-byte address.
//
// The family field is host byte order in the kernel; on x86/arm64 (little-endian)
// AF_INET=2 lands in the low byte. We accept the family if EITHER the LE or BE
// 16-bit read matches the expected constant, so the parser is endianness-robust
// across architectures.
func parseOrigDst(raw []byte) (host string, port int, err error) {
if len(raw) < 4 {
return "", 0, fmt.Errorf("sockaddr too short: %d bytes", len(raw))
}
famLE := binary.LittleEndian.Uint16(raw[0:2])
famBE := binary.BigEndian.Uint16(raw[0:2])
p := int(binary.BigEndian.Uint16(raw[2:4])) // port is network order
switch {
case famLE == syscall.AF_INET || famBE == syscall.AF_INET:
if len(raw) < 8 {
return "", 0, fmt.Errorf("sockaddr_in too short: %d bytes", len(raw))
}
ip := net.IPv4(raw[4], raw[5], raw[6], raw[7])
return ip.String(), p, nil
case famLE == syscall.AF_INET6 || famBE == syscall.AF_INET6:
if len(raw) < 24 {
return "", 0, fmt.Errorf("sockaddr_in6 too short: %d bytes", len(raw))
}
ip := make(net.IP, 16)
copy(ip, raw[8:24])
return ip.String(), p, nil
default:
return "", 0, fmt.Errorf("unknown sockaddr family: LE=%d BE=%d", famLE, famBE)
}
}
// origDst recovers the pre-DNAT original destination of a transparently
// redirected TCP connection via getsockopt(SO_ORIGINAL_DST). v4 vs v6 is chosen
// by the local address family. stdlib-only (syscall.Syscall6 on the raw fd via
// SyscallConn). Linux-only by build tag.
func origDst(conn *net.TCPConn) (host string, port int, err error) {
level := syscall.SOL_IP
if la, ok := conn.LocalAddr().(*net.TCPAddr); ok && la.IP.To4() == nil && la.IP != nil {
level = syscall.SOL_IPV6
}
rc, err := conn.SyscallConn()
if err != nil {
return "", 0, err
}
// A sockaddr_in6 is 28 bytes; size the buffer for the larger of the two.
buf := make([]byte, 28)
size := uint32(len(buf))
var goErr error
ctrlErr := rc.Control(func(fd uintptr) {
_, _, errno := syscall.Syscall6(
syscall.SYS_GETSOCKOPT,
fd,
uintptr(level),
uintptr(soOriginalDst),
uintptr(unsafe.Pointer(&buf[0])),
uintptr(unsafe.Pointer(&size)),
0,
)
if errno != 0 {
goErr = errno
}
})
if ctrlErr != nil {
return "", 0, ctrlErr
}
if goErr != nil {
return "", 0, goErr
}
return parseOrigDst(buf[:size])
}
// ── ClientHello SNI peek (no decryption) ─────────────────────────────────────
// recordingReader tees every byte it reads off the underlying reader into an
// in-memory buffer, so the exact bytes consumed during the ClientHello peek can
// be re-fed to either the upstream (splice) or a tls.Server (mitm/allow/block).
type recordingReader struct {
r io.Reader
buf bytes.Buffer
}
func (rr *recordingReader) Read(p []byte) (int, error) {
n, err := rr.r.Read(p)
if n > 0 {
rr.buf.Write(p[:n])
}
return n, err
}
// prefixConn is a net.Conn whose Read drains an internal prefix buffer (the
// bytes already peeked off the wire) before delegating to the underlying conn;
// every other net.Conn method delegates straight through. This re-presents the
// recorded ClientHello bytes to a tls.Server / upstream that must see the
// original handshake.
type prefixConn struct {
prefix []byte
off int
net.Conn
}
func (pc *prefixConn) Read(p []byte) (int, error) {
if pc.off < len(pc.prefix) {
n := copy(p, pc.prefix[pc.off:])
pc.off += n
return n, nil
}
return pc.Conn.Read(p)
}
// peekClientHello reads exactly the first TLS record (expected to be the
// ClientHello) off conn and returns every byte it consumed, so the caller can
// replay them via prefixConn (tls.Server) or write them to the upstream
// (splice). On error it returns whatever was read so far.
func peekClientHello(conn net.Conn) (record []byte, err error) {
rr := &recordingReader{r: conn}
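// TLS record header: type(1) + version(2) + length(2).
hdr := make([]byte, 5)
if _, err := io.ReadFull(rr, hdr); err != nil {
return rr.buf.Bytes(), err
}
// Fail fast on anything that is not a TLS handshake record (0x16), e.g.
// plaintext HTTP, instead of waiting on a bogus record length.
if hdr[0] != 0x16 {
return rr.buf.Bytes(), fmt.Errorf("not a TLS handshake record: type 0x%02x", hdr[0])
}
recLen := int(binary.BigEndian.Uint16(hdr[3:5]))
// Sanity cap: the ClientHello is assumed to fit in one record (max 16 KiB payload).
if recLen > (1 << 14) {
return rr.buf.Bytes(), fmt.Errorf("clienthello record length out of range: %d", recLen)
}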
if _, err := io.ReadFull(rr, make([]byte, recLen)); err != nil {
return rr.buf.Bytes(), err
}
return rr.buf.Bytes(), nil
}
// sniFromClientHello extracts the SNI host_name from a raw TLS ClientHello
// record. It is PURE (no I/O) and defensive: every slice is bounds-checked and
// any malformed/short input or absent SNI returns ("", false) — it never panics.
//
// Record framing parsed here:
//
// record header : type=0x16 (handshake) | version(2) | length(2)
// handshake hdr : type=0x01 (ClientHello) | length(3)
// body : client_version(2) | random(32) |
// session_id_len(1) + session_id |
// cipher_suites_len(2) + cipher_suites |
// compression_len(1) + compression_methods |
// extensions_len(2) + extensions
// extension : ext_type(2) | ext_len(2) + ext_data
// server_name : list_len(2) | name_type(1)=0 | name_len(2) + host
func sniFromClientHello(record []byte) (string, bool) {
// record header (5) — type 0x16 handshake.
if len(record) < 5 || record[0] != 0x16 {
return "", false
}
recLen := int(binary.BigEndian.Uint16(record[3:5]))
body := record[5:]
if len(body) < recLen {
return "", false
}
body = body[:recLen]
// handshake header (4) — type 0x01 ClientHello + 3-byte length.
if len(body) < 4 || body[0] != 0x01 {
return "", false
}
hsLen := int(body[1])<<16 | int(body[2])<<8 | int(body[3])
hs := body[4:]
if len(hs) < hsLen {
return "", false
}
hs = hs[:hsLen]
// client_version(2) + random(32).
if len(hs) < 34 {
return "", false
}
p := hs[34:]
// session_id: len(1) + data.
if len(p) < 1 {
return "", false
}
sidLen := int(p[0])
p = p[1:]
if len(p) < sidLen {
return "", false
}
p = p[sidLen:]
// cipher_suites: len(2) + data.
if len(p) < 2 {
return "", false
}
csLen := int(binary.BigEndian.Uint16(p[0:2]))
p = p[2:]
if len(p) < csLen {
return "", false
}
p = p[csLen:]
// compression_methods: len(1) + data.
if len(p) < 1 {
return "", false
}
cmLen := int(p[0])
p = p[1:]
if len(p) < cmLen {
return "", false
}
p = p[cmLen:]
// extensions: len(2) + entries.
if len(p) < 2 {
return "", false
}
extLen := int(binary.BigEndian.Uint16(p[0:2]))
p = p[2:]
if len(p) < extLen {
return "", false
}
ext := p[:extLen]
for len(ext) >= 4 {
etype := binary.BigEndian.Uint16(ext[0:2])
elen := int(binary.BigEndian.Uint16(ext[2:4]))
ext = ext[4:]
if len(ext) < elen {
return "", false
}
data := ext[:elen]
ext = ext[elen:]
if etype != 0x0000 { // server_name
continue
}
// server_name_list: list_len(2) + entries.
if len(data) < 2 {
return "", false
}
listLen := int(binary.BigEndian.Uint16(data[0:2]))
list := data[2:]
if len(list) < listLen {
return "", false
}
list = list[:listLen]
// First entry: name_type(1) + name_len(2) + host.
if len(list) < 3 {
return "", false
}
nameType := list[0]
nameLen := int(binary.BigEndian.Uint16(list[1:3]))
list = list[3:]
if nameType != 0x00 || len(list) < nameLen { // 0 = host_name
return "", false
}
return string(list[:nameLen]), true
}
return "", false
}
// ── transparent accept path ──────────────────────────────────────────────────
// runTransparent runs the transparent (SO_ORIGINAL_DST) accept loop: listen on
// addr, and for each nft-DNAT'd connection recover its pre-DNAT destination and
// dispatch to handleTransparent. Linux-only (build-tagged).
func runTransparent(px *Proxy, addr string) {
ln, err := net.Listen("tcp", addr)
if err != nil {
log.Fatalf("transparent listen: %v", err)
}
log.Printf("sbxmitm TRANSPARENT listening on %s", addr)
for {
conn, err := ln.Accept()
if err != nil {
log.Printf("accept: %v", err)
continue
}
go px.handleTransparent(conn)
}
}
// handleTransparent serves one transparently-redirected client connection:
// 1. recover the pre-DNAT original destination via SO_ORIGINAL_DST,
// 2. PEEK the ClientHello off the raw conn without consuming it,
// 3. parse the SNI and Decide WITHOUT decrypting,
// 4. splice → raw TCP passthrough to the ORIGINAL dst, replaying the peeked
// ClientHello first; NEVER terminate TLS (cert-pinned/own-infra safe),
// 5. allow/mitm/block → NOW tls.Server over the replayable conn (so the TLS
// server still sees the original ClientHello) and run the shared pipeline.
func (px *Proxy) handleTransparent(client net.Conn) {
defer client.Close()
tcp, ok := client.(*net.TCPConn)
if !ok {
return // transparent mode only accepts raw TCP conns
}
// R3 WG client? The data-wg attribute of the injected loader mirrors the
// Python _loader_script (ip.startswith("10.99.1.")) — derived from the same
// client conn peer IP that feeds clientHashFromConn.
wg := false
if peer, _, perr := net.SplitHostPort(client.RemoteAddr().String()); perr == nil {
wg = strings.HasPrefix(peer, "10.99.1.")
}
dstHost, dstPort, err := origDst(tcp)
if err != nil {
return // no original-dst (not DNAT'd) → drop; nothing safe to do
}
dialAddr := net.JoinHostPort(dstHost, fmt.Sprintf("%d", dstPort))
// Peek the ClientHello WITHOUT decrypting. The recorded bytes are replayed
// to whatever we hand the conn to next (upstream for splice, tls.Server
// otherwise) so the original handshake is preserved byte-for-byte.
hello, perr := peekClientHello(client)
if perr != nil {
return // could not read a ClientHello → nothing safe to do
}
sni, _ := sniFromClientHello(hello)
decisionHost := sni
if decisionHost == "" {
decisionHost = dstHost // no SNI → fall back to the captured dst IP
}
verdict := px.pol.Decide(decisionHost, sni)
if verdict == "splice" {
// Passthrough: raw TCP to the REAL captured destination, never the SNI,
// NEVER terminating TLS. Replay the peeked ClientHello to the upstream
// first, then pipe raw bytes both directions over the raw client conn.
up, derr := net.Dial("tcp", dialAddr)
if derr != nil {
return
}
defer up.Close()
if _, werr := up.Write(hello); werr != nil {
return
}
go func() { _, _ = io.Copy(up, client) }()
_, _ = io.Copy(client, up)
return
}
// allow / mitm / block → re-present the peeked ClientHello to a tls.Server
// over a replayable conn, then run the shared pipeline dialling the captured
// original-dst (NOT the SNI).
replay := &prefixConn{prefix: hello, Conn: client}
tconn := tls.Server(replay, px.serverTLSConfig())
if err := tconn.Handshake(); err != nil {
return
}
defer tconn.Close()
px.mitmPipeline(tconn, client, decisionHost, verdict, dialAddr, wg)
}


@ -0,0 +1,33 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
//go:build !linux
// SecuBox-Deb :: toolbox-ng :: transparent mode non-linux stub (#662).
//
// SO_ORIGINAL_DST recovery is Netfilter-specific (Linux-only). The real
// transparent accept path lives in transparent.go behind //go:build linux. This
// stub keeps the package compiling on non-linux targets (e.g.
// `GOOS=darwin go build ./...`); invoking transparent mode there is a hard
// error, never a silent degradation. handleTransparent is stubbed too in case
// it is referenced.
package main
import (
"log"
"net"
)
// runTransparent is the non-linux counterpart of the linux accept loop: it
// refuses to start, because transparent SO_ORIGINAL_DST capture requires Linux.
func runTransparent(px *Proxy, addr string) {
_ = px
_ = addr
log.Fatal("transparent mode requires linux (SO_ORIGINAL_DST)")
}
// handleTransparent is a non-linux stub; it can never be reached because
// runTransparent log.Fatals first. Present so any reference still links.
func (px *Proxy) handleTransparent(client net.Conn) {
_ = client
log.Fatal("transparent mode requires linux (SO_ORIGINAL_DST)")
}


@ -0,0 +1,304 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
//go:build linux
// Tests for the pure helpers of the transparent accept path (#662 Phase 6 prep):
// the SO_ORIGINAL_DST sockaddr parser, the ClientHello SNI parser and prefixConn.
//
// Only PURE code (parseOrigDst, sniFromClientHello, prefixConn) is unit-tested
// here: none of it makes a syscall, so it is fully covered offline. The real
// getsockopt(SO_ORIGINAL_DST) glue (origDst) cannot be exercised without an nft
// DNAT redirect in the kernel; end-to-end transparent capture is validated at
// Phase 5 shadow on the board, NOT in unit tests (documented in transparent.go).
package main
import (
"bytes"
"encoding/binary"
"io"
"net"
"testing"
"time"
)
// mkSockaddrIn4 builds a 16-byte sockaddr_in: family(2 host-order) + port(BE) +
// 4-byte addr + 8 pad. familyLE controls whether the 2 family bytes are written
// little-endian (low byte first, the x86/arm64 host order) or big-endian, so we
// can prove parseOrigDst tolerates both.
func mkSockaddrIn4(family uint16, port uint16, a, b, c, d byte, familyLE bool) []byte {
buf := make([]byte, 16)
if familyLE {
binary.LittleEndian.PutUint16(buf[0:2], family)
} else {
binary.BigEndian.PutUint16(buf[0:2], family)
}
binary.BigEndian.PutUint16(buf[2:4], port) // port is always network order
buf[4], buf[5], buf[6], buf[7] = a, b, c, d
return buf
}
// mkSockaddrIn6 builds a 28-byte sockaddr_in6: family(2) + port(BE) +
// flowinfo(4) + 16-byte addr + scope_id(4).
func mkSockaddrIn6(family uint16, port uint16, addr [16]byte, familyLE bool) []byte {
buf := make([]byte, 28)
if familyLE {
binary.LittleEndian.PutUint16(buf[0:2], family)
} else {
binary.BigEndian.PutUint16(buf[0:2], family)
}
binary.BigEndian.PutUint16(buf[2:4], port)
copy(buf[8:24], addr[:])
return buf
}
func TestParseOrigDstIPv4(t *testing.T) {
cases := []struct {
name string
raw []byte
wantHost string
wantPort int
}{
{"le-family", mkSockaddrIn4(2, 443, 93, 184, 216, 34, true), "93.184.216.34", 443},
{"be-family", mkSockaddrIn4(2, 8080, 10, 99, 1, 10, false), "10.99.1.10", 8080},
{"high-port", mkSockaddrIn4(2, 65535, 1, 2, 3, 4, true), "1.2.3.4", 65535},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
host, port, err := parseOrigDst(tc.raw)
if err != nil {
t.Fatalf("parseOrigDst: %v", err)
}
if host != tc.wantHost || port != tc.wantPort {
t.Fatalf("parseOrigDst = %q:%d want %q:%d", host, port, tc.wantHost, tc.wantPort)
}
})
}
}
func TestParseOrigDstIPv6(t *testing.T) {
// 2606:2800:220:1:248:1893:25c8:1946 (example.com-ish), port 443.
addr := [16]byte{0x26, 0x06, 0x28, 0x00, 0x02, 0x20, 0x00, 0x01,
0x02, 0x48, 0x18, 0x93, 0x25, 0xc8, 0x19, 0x46}
for _, le := range []bool{true, false} {
raw := mkSockaddrIn6(10, 443, addr, le)
host, port, err := parseOrigDst(raw)
if err != nil {
t.Fatalf("parseOrigDst(le=%v): %v", le, err)
}
want := "2606:2800:220:1:248:1893:25c8:1946"
if host != want || port != 443 {
t.Fatalf("parseOrigDst(le=%v) = %q:%d want %q:443", le, host, port, want)
}
}
}
func TestParseOrigDstPortBigEndian(t *testing.T) {
// Port 0x01BB = 443; assert it is read big-endian (network order), not the
// host-order 0xBB01 = 47873.
raw := mkSockaddrIn4(2, 0x01BB, 8, 8, 8, 8, true)
_, port, err := parseOrigDst(raw)
if err != nil {
t.Fatal(err)
}
if port != 443 {
t.Fatalf("port = %d want 443 (big-endian decode)", port)
}
}
func TestParseOrigDstErrors(t *testing.T) {
cases := []struct {
name string
raw []byte
}{
{"empty", nil},
{"unknown-family-4", make([]byte, 4)}, // all-zero family=0 → unknown-family branch
{"too-short-v4", mkV4Short()}, // valid AF_INET family but 4≤len<8 → sockaddr_in <8 guard
{"too-short-v6", mkV6Short()}, // AF_INET6 but < 24 bytes
{"unknown-family", mkSockaddrIn4(7, 443, 1, 2, 3, 4, true)},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
if _, _, err := parseOrigDst(tc.raw); err == nil {
t.Fatalf("parseOrigDst(%s) = nil err, want error", tc.name)
}
})
}
}
// mkV6Short returns an AF_INET6 blob truncated before the 16-byte address.
func mkV6Short() []byte {
buf := make([]byte, 10) // family + port + flowinfo + 2 bytes of addr
binary.LittleEndian.PutUint16(buf[0:2], 10)
binary.BigEndian.PutUint16(buf[2:4], 443)
return buf
}
// mkV4Short returns a blob with a valid AF_INET family byte but a total length
// in [4,8): it passes the >=4 length check and matches the AF_INET case, so it
// exercises parseOrigDst's sockaddr_in `<8` guard (not the unknown-family path).
func mkV4Short() []byte {
buf := make([]byte, 6) // family(2) + port(2) but no full 4-byte address
binary.LittleEndian.PutUint16(buf[0:2], 2) // AF_INET
binary.BigEndian.PutUint16(buf[2:4], 443)
return buf
}
// ── sniFromClientHello ───────────────────────────────────────────────────────
// mkClientHello hand-assembles a minimal but structurally-valid TLS
// ClientHello record. If withSNI is true a server_name extension carrying
// `sni` (a single host_name entry) is appended; otherwise NO extensions are
// emitted (extensions length 0).
//
// Record layout assembled here (see sniFromClientHello for the parser):
//
// record header : type=0x16 (handshake) | version 0x0303 | record_len(2)
// handshake : type=0x01 (ClientHello) | hs_len(3)
// body : client_version 0x0303 | random(32) |
// session_id_len=0 |
// cipher_suites_len(2)=2 | cipher 0x002f |
// compression_len=1 | method 0x00 |
// extensions_len(2) | [ server_name ext ]
// server_name : ext_type 0x0000 | ext_len(2) |
// list_len(2) | name_type 0x00 | name_len(2) | host bytes
func mkClientHello(sni string, withSNI bool) []byte {
body := []byte{0x03, 0x03} // client_version TLS1.2
body = append(body, make([]byte, 32)...) // random (zeros)
body = append(body, 0x00) // session_id_len = 0
// cipher_suites: length 2, one suite TLS_RSA_WITH_AES_128_CBC_SHA (0x002f)
body = append(body, 0x00, 0x02, 0x00, 0x2f)
// compression_methods: length 1, method null (0x00)
body = append(body, 0x01, 0x00)
var exts []byte
if withSNI {
host := []byte(sni)
var sn []byte
sn = append(sn, 0x00) // name_type = host_name
sn = append(sn, byte(len(host)>>8), byte(len(host))) // name_len(2)
sn = append(sn, host...)
var list []byte
list = append(list, byte(len(sn)>>8), byte(len(sn))) // server_name_list len(2)
list = append(list, sn...)
exts = append(exts, 0x00, 0x00) // ext_type = server_name
exts = append(exts, byte(len(list)>>8), byte(len(list))) // ext_len(2)
exts = append(exts, list...)
}
body = append(body, byte(len(exts)>>8), byte(len(exts))) // extensions_len(2)
body = append(body, exts...)
// handshake header: type 0x01 + 3-byte length
hs := []byte{0x01, byte(len(body) >> 16), byte(len(body) >> 8), byte(len(body))}
hs = append(hs, body...)
// record header: type 0x16 + version 0x0303 + 2-byte length
rec := []byte{0x16, 0x03, 0x03, byte(len(hs) >> 8), byte(len(hs))}
rec = append(rec, hs...)
return rec
}
func TestSNIFromClientHello(t *testing.T) {
// Sanity: the hand-assembled blob parses with our own parser.
good := mkClientHello("example.com", true)
if sni, ok := sniFromClientHello(good); !ok || sni != "example.com" {
t.Fatalf("sniFromClientHello(valid) = %q,%v want example.com,true", sni, ok)
}
cases := []struct {
name string
rec []byte
wantSNI string
wantOK bool
}{
{"with-sni", mkClientHello("secubox.in", true), "secubox.in", true},
{"no-sni-ext", mkClientHello("", false), "", false},
{"nil", nil, "", false},
{"empty", []byte{}, "", false},
{"non-handshake-record", []byte{0x17, 0x03, 0x03, 0x00, 0x05, 1, 2, 3, 4, 5}, "", false},
{"truncated-header", []byte{0x16, 0x03}, "", false},
// valid record header claiming length 100 but body truncated.
{"truncated-body", []byte{0x16, 0x03, 0x03, 0x00, 0x64, 0x01, 0x00, 0x00}, "", false},
// truncate a known-good blob mid-extensions.
{"truncated-good", good[:len(good)-3], "", false},
{"not-clienthello-hs", func() []byte {
b := mkClientHello("x.example", true)
b[5] = 0x02 // handshake type ServerHello, not ClientHello
return b
}(), "", false},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
sni, ok := sniFromClientHello(tc.rec)
if ok != tc.wantOK || sni != tc.wantSNI {
t.Fatalf("sniFromClientHello = %q,%v want %q,%v", sni, ok, tc.wantSNI, tc.wantOK)
}
})
}
}
func TestSNIFromClientHelloNoPanic(t *testing.T) {
// Fuzz-ish: every truncation of a valid blob must return cleanly, never panic.
good := mkClientHello("example.com", true)
for i := 0; i <= len(good); i++ {
func() {
defer func() {
if r := recover(); r != nil {
t.Fatalf("panic on good[:%d]: %v", i, r)
}
}()
_, _ = sniFromClientHello(good[:i])
}()
}
}
// ── prefixConn (replayable client conn) ──────────────────────────────────────
// fakeConn adapts an io.ReadWriteCloser to net.Conn for prefixConn tests.
type fakeConn struct{ io.ReadWriteCloser }
func (fakeConn) LocalAddr() net.Addr { return &net.TCPAddr{} }
func (fakeConn) RemoteAddr() net.Addr { return &net.TCPAddr{} }
func (fakeConn) SetDeadline(time.Time) error { return nil }
func (fakeConn) SetReadDeadline(time.Time) error { return nil }
func (fakeConn) SetWriteDeadline(time.Time) error { return nil }
type rwc struct {
*bytes.Reader
w *bytes.Buffer
}
func (r rwc) Write(p []byte) (int, error) { return r.w.Write(p) }
func (rwc) Close() error { return nil }
func TestPrefixConnReplaysBufferedThenLive(t *testing.T) {
live := bytes.NewReader([]byte("LIVE-DATA"))
wbuf := &bytes.Buffer{}
underlying := fakeConn{rwc{Reader: live, w: wbuf}}
pc := &prefixConn{prefix: []byte("PEEKED"), Conn: underlying}
got, err := io.ReadAll(pc)
if err != nil {
t.Fatalf("read: %v", err)
}
if string(got) != "PEEKEDLIVE-DATA" {
t.Fatalf("prefixConn read = %q want PEEKEDLIVE-DATA", got)
}
// Writes delegate straight through to the underlying conn.
if _, err := pc.Write([]byte("OUT")); err != nil {
t.Fatalf("write: %v", err)
}
if wbuf.String() != "OUT" {
t.Fatalf("underlying write = %q want OUT", wbuf.String())
}
}
func TestPrefixConnEmptyPrefix(t *testing.T) {
live := bytes.NewReader([]byte("ONLY-LIVE"))
underlying := fakeConn{rwc{Reader: live, w: &bytes.Buffer{}}}
pc := &prefixConn{Conn: underlying}
got, _ := io.ReadAll(pc)
if string(got) != "ONLY-LIVE" {
t.Fatalf("prefixConn read = %q want ONLY-LIVE", got)
}
}


@ -0,0 +1,60 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
package main
import (
"bufio"
"fmt"
"io"
"net"
"net/http"
)
func newReader(c net.Conn) *bufio.Reader { return bufio.NewReader(c) }
// writeResponse serializes an http.Response (status + headers + body) onto a
// (TLS) conn, preserving MULTI-VALUED headers (notably Set-Cookie, which the
// poison path rewrites per-cookie). The upstream framing headers
// (Content-Length, Transfer-Encoding, Connection) are dropped and replaced with
// an explicit Content-Length + Connection: close, because we send the
// fully-buffered body.
func writeResponse(c io.Writer, resp *http.Response, body []byte) {
status := resp.Status
if status == "" {
status = fmt.Sprintf("%d", resp.StatusCode)
}
fmt.Fprintf(c, "HTTP/1.1 %s\r\n", status)
for k, vals := range resp.Header {
switch http.CanonicalHeaderKey(k) {
case "Content-Length", "Transfer-Encoding", "Connection":
continue // we set framing ourselves
}
for _, v := range vals {
fmt.Fprintf(c, "%s: %s\r\n", k, v)
}
}
fmt.Fprintf(c, "Content-Length: %d\r\n", len(body))
fmt.Fprintf(c, "Connection: close\r\n")
io.WriteString(c, "\r\n")
if len(body) > 0 {
c.Write(body)
}
}
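// writeRaw writes a minimal HTTP/1.1 response onto a (TLS) conn. An empty
// status defaults to "OK" (whatever the code) and headers with an empty value
// are skipped.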
func writeRaw(c io.Writer, code int, status string, headers map[string]string, body []byte) {
if status == "" {
status = "OK"
}
fmt.Fprintf(c, "HTTP/1.1 %d %s\r\n", code, status)
fmt.Fprintf(c, "Content-Length: %d\r\n", len(body))
fmt.Fprintf(c, "Connection: close\r\n")
for k, v := range headers {
if v != "" {
fmt.Fprintf(c, "%s: %s\r\n", k, v)
}
}
io.WriteString(c, "\r\n")
if len(body) > 0 {
c.Write(body)
}
}


@ -0,0 +1,48 @@
secubox-toolbox-ng (0.1.4-1~bookworm1) bookworm; urgency=medium
* proxy: do NOT follow upstream redirects — relay 3xx to the client so the
browser follows it (correct URL/origin/cookies). Go's default http.Client
followed them, collapsing 301/302 into a final 200 under the original URL.
(ref #662)
-- Gerald KERMA <devel@cybermind.fr> Thu, 18 Jun 2026 20:10:00 +0000
secubox-toolbox-ng (0.1.3-1~bookworm1) bookworm; urgency=medium
* banner: inject into COMPRESSED HTML too. Pin upstream Accept-Encoding to gzip
(stdlib can't brotli), and in the inject path gunzip → injectLoader → re-gzip
(32MiB inflate cap, fail-open on corrupt). Fixes missing banner on the common
gzip/br case; non-HTML passes through untouched. (ref #662)
-- Gerald KERMA <devel@cybermind.fr> Thu, 18 Jun 2026 19:45:00 +0000
secubox-toolbox-ng (0.1.2-1~bookworm1) bookworm; urgency=medium
* banner: port the real transparency-banner inject — inject the loader
<script src="/__toolbox/loader.js" data-mh=.. data-wg=..> (guard-idempotent,
R3 wg flag, mac_hash identity) and reverse-proxy /__toolbox/loader.js +
/__toolbox/bundle to the portal (127.0.0.1:8088), replacing the invisible
marker comment. Fail-open to 204. (ref #662)
-- Gerald KERMA <devel@cybermind.fr> Thu, 18 Jun 2026 19:20:00 +0000
secubox-toolbox-ng (0.1.1-1~bookworm1) bookworm; urgency=medium
* worker@ unit: forge with the LIVE R3 CA clients trust (mitmproxy confdir
bundle, group-readable) instead of the root-only ca-wg WG-CA key; bind
transparent on 10.99.1.1:809%i (the nft R3 DNAT target) instead of CONNECT
on 127.0.0.1; add wg-quick@wg-toolbox dependency. (ref #662)
* loadCA: scan PEM blocks by type so a combined cert+key bundle
(mitmproxy-ca.pem) is accepted for --ca-key. (ref #662)
-- Gerald KERMA <devel@cybermind.fr> Thu, 18 Jun 2026 19:00:00 +0000
secubox-toolbox-ng (0.1.0-1~bookworm1) bookworm; urgency=medium
* Initial packaging of the Go MITM engine migration target (#662 Phase 5-prep).
Ships /usr/sbin/sbxmitm + a DISABLED systemd template unit
(secubox-toolbox-ng-worker@.service). DARK by design: the unit is not
enabled or started, no nft DNAT, no live-R3 wiring — enabled only at the
Phase 6 cutover.
-- Gerald KERMA <devel@cybermind.fr> Thu, 18 Jun 2026 22:00:00 +0200


@ -0,0 +1,22 @@
Source: secubox-toolbox-ng
Section: net
Priority: optional
Maintainer: Gerald KERMA <devel@cybermind.fr>
Build-Depends: debhelper-compat (= 13), golang-go (>= 2:1.22~)
Standards-Version: 4.6.2
Homepage: https://cybermind.fr/secubox
Rules-Requires-Root: no
Package: secubox-toolbox-ng
Architecture: arm64
Depends: ${misc:Depends}
Description: SecuBox-Deb — Go MITM engine (migration target, DARK)
Multi-core Go re-implementation of the R3 toolbox MITM engine (#662),
ported off the GIL-bound Python mitmproxy worker fleet. Ships the
standalone sbxmitm binary plus a DISABLED systemd template unit.
.
This package is the Phase-6-cutover migration target. The unit is NOT
enabled or started by the maintainer scripts — the live R3 tunnel keeps
running on the Python workers until the cutover is performed manually.
Installing this package changes NO runtime behaviour (no service start,
no nft DNAT).


@ -0,0 +1,27 @@
#!/bin/sh
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
# SecuBox-Deb :: toolbox-ng — postinst
#
# DARK by design (#662 Phase 5-prep):
# - DO reload the systemd unit catalogue so the template is known.
# - DO NOT enable or start secubox-toolbox-ng-worker@.service — this is the
# Phase-6 cutover target; the live R3 tunnel keeps running on the Python
# workers until the operator performs the cutover manually.
# - DO NOT touch nftables (no DNAT, no live-R3 rewiring).
set -e
case "$1" in
configure)
if [ -d /run/systemd/system ]; then
systemctl daemon-reload >/dev/null 2>&1 || true
fi
# Intentionally NO `systemctl enable --now`. See the unit header and
# debian/changelog: enabled only at the Phase 6 cutover.
;;
abort-upgrade|abort-remove|abort-deconfigure)
;;
esac
#DEBHELPER#
exit 0


@ -0,0 +1,44 @@
#!/usr/bin/make -f
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
# SecuBox-Deb :: toolbox-ng — Go MITM engine (migration target, DARK)
#
# The binary is pure-stdlib (no go.sum, no external modules), so it
# cross-compiles offline with GOPROXY=off. CI cross-builds for arm64;
# this rules file does the same with `GOOS=linux GOARCH=arm64 go build`.
export DH_VERBOSE = 1
# Build the static arm64 binary offline (stdlib only — no network, no go.sum).
export GOOS = linux
export GOARCH = arm64
export CGO_ENABLED = 0
export GOFLAGS = -mod=mod
export GOPROXY = off
# Keep the Go build/module cache inside the build tree (sandbox-friendly).
export GOCACHE = $(CURDIR)/_gocache
export GOPATH = $(CURDIR)/_gopath
%:
dh $@
override_dh_auto_build:
go build -trimpath -ldflags=-s -o sbxmitm ./cmd/sbxmitm
# No Go unit tests at package-build time (run in CI on the host arch; the
# arm64 cross-binary cannot execute its tests here).
override_dh_auto_test:
override_dh_auto_install:
install -d debian/secubox-toolbox-ng/usr/sbin
install -m 0755 sbxmitm debian/secubox-toolbox-ng/usr/sbin/sbxmitm
override_dh_auto_clean:
rm -f sbxmitm
rm -rf _gocache _gopath
# DARK: install the unit file into the catalogue but DO NOT enable or start it.
# This is the Phase-6 cutover target; the live R3 tunnel stays on the Python
# workers until the operator enables it manually. The postinst still reloads the
# unit catalogue so `systemctl` knows the template exists.
override_dh_installsystemd:
dh_installsystemd --no-enable --no-start --name=secubox-toolbox-ng-worker@


@ -0,0 +1,66 @@
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
# SecuBox-Deb :: toolbox-ng — Go MITM engine worker template (#662)
#
# ── DISABLED BY DESIGN (DARK) ────────────────────────────────────────────────
# This is the Phase-6 CUTOVER MIGRATION TARGET. It is NOT enabled or started by
# the package (postinst does not `systemctl enable --now`). The live R3 tunnel
# keeps running on the Python mitmproxy workers
# (secubox-toolbox-mitm-wg-worker@{1..4}, ports 8081-8084) until the cutover is
# performed manually.
#
# Mirrors the Python worker@ fanout: each %i ∈ {1..4} listens TRANSPARENT on
# 10.99.1.1:809%i — the SAME wg-toolbox interface IP the nft R3 DNAT targets
# (`iif wg-toolbox tcp dport 443/80 → 10.99.1.1:numgen inc mod 4 → 808{1..4}`),
# on 809%i ports so the Go and Python fleets coexist during a side-by-side
# canary. The engine recovers the original destination via SO_ORIGINAL_DST
# (works for this non-root user under NoNewPrivileges, same as mitmdump).
#
# Forges with the LIVE R3 CA clients already trust — mitmproxy's confdir bundle
# (CN "Gondwana ToolBoX R3 CA"), group-readable by secubox-toolbox — NOT the
# root-only ca-wg key.pem (CN "WG CA"), which clients do NOT trust.
#
# Enable ONLY at Phase 6 canary:
#
# systemctl enable --now secubox-toolbox-ng-worker@1.service # one slot first
# # canary: nft ... map { ... 3 : 8091 } (was 3:8084), watch, then widen
#
# Rollback: re-point the nft DNAT map slot back at the Python 808%i worker,
# then disable this unit.
[Unit]
Description=SecuBox ToolBoX-NG Go MITM worker %i (migration target, transparent 10.99.1.1:809%i)
Documentation=https://github.com/CyberMind-FR/secubox-deb/issues/662
After=network.target wg-quick@wg-toolbox.service
Wants=wg-quick@wg-toolbox.service
[Service]
Type=simple
User=secubox-toolbox
Group=secubox-toolbox
# Forge with the LIVE R3 CA the clients trust: cert = mitmproxy-ca-cert.pem,
# key = mitmproxy-ca.pem (a combined cert+key bundle — loadCA scans for the
# PRIVATE KEY block). Both are group-readable by secubox-toolbox. The anti-track
# jar key is best-effort: absent → poison stays off.
ExecStart=/usr/sbin/sbxmitm \
--transparent \
--listen 10.99.1.1:809%i \
--ca-cert /etc/secubox/toolbox/ca-wg/mitmproxy-ca-cert.pem \
--ca-key /etc/secubox/toolbox/ca-wg/mitmproxy-ca.pem \
--jar-key /etc/secubox/secrets/privacy-jar.key
Restart=on-failure
RestartSec=5
# Hardening (mirrors the Python worker envelope).
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
ReadOnlyPaths=/etc/secubox
MemoryHigh=100M
MemoryMax=128M
TasksMax=128
[Install]
WantedBy=multi-user.target


@ -0,0 +1 @@
3.0 (native)


@ -0,0 +1,3 @@
module github.com/CyberMind-FR/secubox-deb/secubox-toolbox-ng
go 1.22


@ -0,0 +1,4 @@
# SecuBox toolbox-ng parity fixture: operator ad-allowlist.
# Allowlist ALWAYS wins (never block, never splice, never record).
analytics.example-allowed.com # an allowlisted host
criteo-but-allowed.example # would-be-ad registrable, but allowlisted


@ -0,0 +1,3 @@
learned-tracker.example
pure-tracker.example
commented-learned.example # inline comment — _learned_set keeps the FULL line, not comment-stripped
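
The inline-comment entry above is the one that tells a raw loader apart from a comment-stripping one. A minimal sketch of that difference, assuming two hypothetical helpers (`load_lines` and `load_lines_raw` are illustrative names, not the engine's real functions) and a shortened copy of the fixture text:

```python
def load_lines(text: str) -> set[str]:
    """Comment-stripping loader: drop everything from '#' onward."""
    out = set()
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip().lower()
        if host:
            out.add(host)
    return out


def load_lines_raw(text: str) -> set[str]:
    """Raw loader: keep each non-empty line whole, inline comment included."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


# Shortened stand-in for learned-trackers.txt (assumption: same three hosts).
SAMPLE = (
    "learned-tracker.example\n"
    "pure-tracker.example\n"
    "commented-learned.example # inline comment\n"
)

stripped = load_lines(SAMPLE)
raw = load_lines_raw(SAMPLE)

# The bare host only appears in the comment-stripped set.
print("commented-learned.example" in stripped, "commented-learned.example" in raw)
```

With the raw loader the bare host `commented-learned.example` is not a member of the set, which is why the parity fixture expects `mitm` rather than `block` for it.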


@ -0,0 +1,3 @@
# SecuBox toolbox-ng parity fixture: pure trackers — the splice never-set.
# A host here is NEVER spliced even if it's a splice-seed/learned candidate.
pure-tracker.example # pure tracker AND in splice-learned → never wins → block


@ -0,0 +1,3 @@
# SecuBox toolbox-ng parity fixture: auto-learned splice (never-HTML) hosts.
assets.example-cdn.com # a splice-learned host
pure-tracker.example # ALSO in pure-trackers (never) → never wins → not spliced


@ -0,0 +1,3 @@
# SecuBox toolbox-ng parity fixture: shipped splice seed (pure-asset CDNs).
googlevideo.com # YouTube video streams
fbcdn.net # Facebook / Instagram media


@ -0,0 +1,84 @@
{
"_doc": "Cross-engine JAR (anti-track HMAC fake-identity) parity fixtures (#662 Phase 4). Go core (jar_test.go) and Python (privacy.fake_id via tests/test_jar_parity.py) load THIS file + the fixed test key file (jar-test.key, NOT the real /etc/secubox/secrets/privacy-jar.key), compute fakeID/fake_id per fixture, and MUST agree. Python is the source of truth; expect values are GENERATED by privacy.fake_id (never hand-computed). The key file carries leading/trailing whitespace to exercise .strip()/TrimSpace; key_hex below is the canonical post-strip key.",
"key_file": "jar-test.key",
"key_hex": "53656375426f780a546573744a61724b65795631aabbccddeeff0011deadbe7f",
"fixtures": [
{
"client": "clientAAA",
"tracker": "google-analytics.com",
"cookie_name": "_ga",
"expect": "GA1.2.3904711466.3108239649",
"why": "_ga cookie -> GA1 shape"
},
{
"client": "clientAAA",
"tracker": "google-analytics.com",
"cookie_name": "_ga_ABC123",
"expect": "GA1.2.5796600959.265364931",
"why": "GA4 per-property -> still GA1 shape (startswith _ga)"
},
{
"client": "clientAAA",
"tracker": "connect.facebook.net",
"cookie_name": "_fbp",
"expect": "fb.1.6011068296128.8272063998",
"why": "_fbp -> fb shape"
},
{
"client": "clientAAA",
"tracker": "tracker.example.com",
"cookie_name": "uuid",
"expect": "a357739e-e6e8-020e-c9ee-cb92950d1a71",
"why": "uuid -> uuid shape"
},
{
"client": "clientAAA",
"tracker": "matomo.example.com",
"cookie_name": "_pk_id",
"expect": "7be228ae-3261-d609-1cec-dc0dc05a8abf",
"why": "_pk_id -> uuid shape"
},
{
"client": "clientAAA",
"tracker": "tracker.example.com",
"cookie_name": "abcdefghijklmnopqrstuvwxyz012345",
"expect": "416e7233-dfb8-ec7f-a2fe-45ed5dbdcaf4",
"why": "name >=32 chars -> uuid shape via len branch"
},
{
"client": "clientAAA",
"tracker": "tracker.example.com",
"cookie_name": "sid",
"expect": "5cb0940c4562a4f76cf638e40ff552af",
"why": "generic -> hex[:32]"
},
{
"client": "clientFold",
"tracker": "px.doubleclick.net",
"cookie_name": "uid",
"expect": "c1b6daf8-7ac1-edf6-c67b-3e23ec8eb61d",
"why": "registrable folding A (px.doubleclick.net)"
},
{
"client": "clientFold",
"tracker": "ads.doubleclick.net",
"cookie_name": "uid",
"expect": "c1b6daf8-7ac1-edf6-c67b-3e23ec8eb61d",
"why": "registrable folding B (ads.doubleclick.net) -> SAME fake_id as A"
},
{
"client": "clientGovuk",
"tracker": "ad.example.gov.uk",
"cookie_name": "uid",
"expect": "75cc2df5-1ee2-da62-9023-aa11c57419af",
"why": "DIVERGENCE GUARD: privacy.registrable=example.gov.uk (gov.uk in privacy._MULTI_TLD); ad_ghost._2L lacks gov.uk so policy.registrable would give gov.uk -> forces the jar to use registrableJar"
},
{
"client": "clientIP",
"tracker": "9.9.9.9",
"cookie_name": "sid",
"expect": "53bf4dd57df7a26d6eff83092c869835",
"why": "DIVERGENCE GUARD: IP-literal tracker -> privacy.registrable returns as-is (ad_ghost._registrable returns None) -> forces registrableJar"
}
]
}
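
The `_doc` note says the key file carries surrounding whitespace and that `key_hex` is the canonical post-strip key. A small sketch of what that implies, assuming an illustrative padding (the real `jar-test.key` is a binary file whose exact padding is not shown here):

```python
KEY_HEX = "53656375426f780a546573744a61724b65795631aabbccddeeff0011deadbe7f"

# Canonical key as declared in the fixture file.
key = bytes.fromhex(KEY_HEX)

# ASSUMPTION: hypothetical on-disk form, the same key wrapped in whitespace.
on_disk = b"  \n" + key + b"\n\t "

# bytes.strip() only trims leading/trailing ASCII whitespace, so the newline
# byte (0x0a) INSIDE the key survives.
loaded = on_disk.strip()

print(len(key), loaded == key, b"\n" in loaded)
```

The key deliberately contains an interior `0x0a`, so a loader that split on lines or stripped all whitespace, instead of trimming only the ends, would not recover the canonical key.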

Binary file not shown.


@ -0,0 +1,36 @@
{
"_doc": "Cross-engine mac_hash (WG persona identity) parity fixtures (#662 Phase 6 prep). Go core (machash_test.go, macHashOf with wgPeersPath pointed at wg-peers-fixture.json) and Python (_common.mac_hash_of with _WG_PEERS_DB monkeypatched to the SAME wg-peers-fixture.json) load THIS file and MUST agree. Python is the source of truth: expected = sha256(pubkey.encode()).hexdigest()[:16], generated by Python, never Go-authored. The R0-R2 ARP/HMAC path is intentionally out of scope for the R3 transparent engine (WG-only); off-subnet IPs expect empty.",
"wg_peers_file": "wg-peers-fixture.json",
"fixtures": [
{
"ip": "10.99.1.10",
"expected": "7d790156855ebeef",
"why": "WG peer phone-gk2 -> sha256(pubkey)[:16]"
},
{
"ip": "10.99.1.11",
"expected": "6f3663aa06e871c4",
"why": "WG peer laptop-admin -> sha256(pubkey)[:16]"
},
{
"ip": "10.99.1.12",
"expected": "1db566f7c72180f0",
"why": "WG peer tablet-lab -> sha256(pubkey)[:16]"
},
{
"ip": "10.99.1.250",
"expected": "",
"why": "WG subnet but no peer entry -> empty"
},
{
"ip": "192.168.1.5",
"expected": "",
"why": "off-subnet (R0-R2 ARP path out of scope in R3) -> empty"
},
{
"ip": "",
"expected": "",
"why": "empty ip -> empty"
}
]
}
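
The `_doc` string gives the derivation as `sha256(pubkey.encode()).hexdigest()[:16]`, keyed by the peer's tunnel IP. A minimal sketch of that lookup, assuming a hypothetical helper name `wg_hash_for_ip` and an in-memory copy of one fixture peer (this is not the `_common.mac_hash_of` implementation, and it returns `""` on a miss the way the Go side is described to):

```python
import hashlib

# One peer copied from wg-peers-fixture.json (same shape: pubkey -> {ip, name}).
PEERS = {
    "aL3kF2pQ9rZxT7vN1wB4cD6eH8jM0sU2yX5zA7bC1E=": {
        "ip": "10.99.1.10",
        "name": "phone-gk2",
    },
}


def wg_hash_for_ip(ip: str, peers: dict) -> str:
    """Return the 16-hex-char persona id for a WG peer IP, or "" if unknown."""
    if not ip:
        return ""
    for pubkey, meta in peers.items():
        if meta.get("ip") == ip:
            return hashlib.sha256(pubkey.encode()).hexdigest()[:16]
    return ""


hit = wg_hash_for_ip("10.99.1.10", PEERS)
miss = wg_hash_for_ip("10.99.1.250", PEERS)
off_subnet = wg_hash_for_ip("192.168.1.5", PEERS)
print(repr(hit), repr(miss), repr(off_subnet))
```

Because the id depends only on the public key, it stays stable if a peer is re-addressed inside the tunnel subnet, and any IP without a peer entry maps to the empty value.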


@ -0,0 +1,31 @@
{
"_doc": "Cross-engine parity fixtures (#662 Phase 3). Both the Go core (policy_test.go) and the Python addons (tests/test_engine_parity.py) load THIS file plus the testdata/config snapshot, run their Decide logic on each host, and must agree. Python is the source of truth; Go matches it. action ∈ {allow, block, splice, mitm}.",
"config": {
"ad_allowlist": "config/ad-allowlist.txt",
"learned_trackers": "config/learned-trackers.txt",
"splice_seed": "config/tls-splice-seed.conf",
"splice_learned": "config/splice-learned.txt",
"pure_trackers": "config/pure-trackers.txt",
"self_domains": ["secubox.in"],
"fortknox_sites": ["mybank.example"]
},
"fixtures": [
{"host": "ads.doubleclick.net", "expect": "block", "why": "static ad host (_AD_HOST dotted-prefix doubleclick)"},
{"host": "doubleclick.net", "expect": "block", "why": "static ad host (_AD_HOST bare)"},
{"host": "criteo.com", "expect": "block", "why": "static ad host (_AD_HOST criteo)"},
{"host": "learned-tracker.example", "expect": "block", "why": "auto-learned tracker (learned-trackers.txt)"},
{"host": "pure-tracker.example", "expect": "block", "why": "pure-tracker + splice-learned: never wins (no splice) → falls to block (also learned)"},
{"host": "hub.secubox.in", "expect": "allow", "why": "own-infra subdomain (self_domains) — never block/splice"},
{"host": "secubox.in", "expect": "allow", "why": "own-infra apex"},
{"host": "analytics.example-allowed.com", "expect": "allow", "why": "operator allowlisted host"},
{"host": "criteo-but-allowed.example", "expect": "allow", "why": "would-be-ad registrable but allowlisted → allowlist wins"},
{"host": "r1.googlevideo.com", "expect": "splice", "why": "splice seed subdomain (CDN shard)"},
{"host": "googlevideo.com", "expect": "splice", "why": "splice seed exact"},
{"host": "assets.example-cdn.com", "expect": "splice", "why": "splice-learned host"},
{"host": "mybank.example", "expect": "mitm", "why": "fortknox site in never-set; not in seed/learned → no splice; not ad/learned → mitm"},
{"host": "notdoubleclick.net", "expect": "mitm", "why": "no-false-suffix negative — _AD_HOST requires (^|.) boundary"},
{"host": "news.example.com", "expect": "mitm", "why": "plain site"},
{"host": "notsecubox.in", "expect": "mitm", "why": "own-infra FALSE-prefix negative — must NOT match self_domains"},
{"host": "commented-learned.example", "expect": "mitm", "why": "learned-trackers NOT comment-stripped (_learned_set keeps full line incl ' # ...'); bare host not in set → not blocked. Discriminates loadLinesRaw vs loadLines"}
]
}
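
The precedence these fixtures encode (allow, then splice unless in the never-set, then block, else mitm) can be sketched with plain sets. This is a simplified illustration, not the production logic: `suffix_match`, `decide`, and the `CFG` layout are hypothetical names, and the static ad-host list is modelled as a domain set rather than the real `_AD_HOST` regex.

```python
def suffix_match(host: str, domains: set[str]) -> bool:
    """True if host equals a domain or is a dotted subdomain of one."""
    host = host.lower().strip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


def decide(host: str, cfg: dict) -> str:
    # 1. own infrastructure and the operator allowlist always win.
    if suffix_match(host, cfg["self_domains"]) or suffix_match(host, cfg["allow"]):
        return "allow"
    # 2. splice only if the host is NOT in the never-set.
    if not suffix_match(host, cfg["never"]) and (
            suffix_match(host, cfg["seed"])
            or suffix_match(host, cfg["learned_splice"])):
        return "splice"
    # 3. static ad hosts or learned trackers are blocked.
    if suffix_match(host, cfg["ad_hosts"]) or host.lower() in cfg["learned"]:
        return "block"
    # 4. everything else is intercepted.
    return "mitm"


# ASSUMPTION: hand-built snapshot mirroring the testdata/config files.
CFG = {
    "self_domains": {"secubox.in"},
    "allow": {"analytics.example-allowed.com", "criteo-but-allowed.example"},
    "seed": {"googlevideo.com", "fbcdn.net"},
    "learned_splice": {"assets.example-cdn.com", "pure-tracker.example"},
    "never": {"pure-tracker.example", "mybank.example"},
    "ad_hosts": {"doubleclick.net", "criteo.com"},
    "learned": {"learned-tracker.example", "pure-tracker.example"},
}

for h in ("hub.secubox.in", "r1.googlevideo.com", "pure-tracker.example",
          "notdoubleclick.net", "mybank.example"):
    print(h, decide(h, CFG))
```

The dotted-boundary check in `suffix_match` is what keeps `notdoubleclick.net` and `notsecubox.in` from matching their look-alike domains.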


@ -0,0 +1,16 @@
{
"peers": {
"aL3kF2pQ9rZxT7vN1wB4cD6eH8jM0sU2yX5zA7bC1E=": {
"ip": "10.99.1.10",
"name": "phone-gk2"
},
"bM4lG3qR0sAyU8wO2xC5dE7fI9kN1tV3zY6aB8cD2F=": {
"ip": "10.99.1.11",
"name": "laptop-admin"
},
"cN5mH4rS1tBzV9xP3yD6eF8gJ0lO2uW4aZ7bC9dE3G=": {
"ip": "10.99.1.12",
"name": "tablet-lab"
}
}
}


@ -3,7 +3,12 @@
#
# REPLACES the prerouting rules from secubox-toolbox-wg.nft :
# iif wg-toolbox tcp dport 443 dnat ip to 10.99.1.1:8081 (single port)
-# with a round-robin numgen mapping to ports 8081..8084.
+# with a round-robin numgen mapping to ports 8091..8094.
+#
+# #662 CUTOVER (2026-06-18): the fanout now targets the Go MITM engine
+# (secubox-toolbox-ng-worker@{1..4}, transparent on 10.99.1.1:809%i) instead
+# of the Python mitmproxy workers (808%i). Rollback = change 809x → 808x below
+# and `nft -f` this file (the Python workers are kept warm for that).
#
# Why numgen inc and not jhash : nftables 1.0.6 (Debian bookworm) doesn't
# support `jhash` in numgen yet (lands in 1.0.7+). `inc` is round-robin
@ -25,19 +30,20 @@ table inet wg-toolbox {
# Phase 9 (#501) — 4-worker round-robin DNAT. numgen returns
# 0..3 ; the map sends each to one of the 4 worker ports on
# 10.99.1.1. Conntrack pins the choice for the whole flow.
+# #662: ports are 809x (Go engine), was 808x (Python).
iif "wg-toolbox" tcp dport 443 dnat ip to 10.99.1.1 \
: numgen inc mod 4 map {
-0 : 8081,
-1 : 8082,
-2 : 8083,
-3 : 8084
+0 : 8091,
+1 : 8092,
+2 : 8093,
+3 : 8094
}
iif "wg-toolbox" tcp dport 80 dnat ip to 10.99.1.1 \
: numgen inc mod 4 map {
-0 : 8081,
-1 : 8082,
-2 : 8083,
-3 : 8084
+0 : 8091,
+1 : 8092,
+2 : 8093,
+3 : 8094
}
# Phase 7 (#498) — DNS DNAT for legacy peer configs that hand out
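
The round-robin fanout in the rules above can be pictured as a counter taken modulo 4 and looked up in the port map. A rough simulation, assuming one counter per rule that advances once per new connection (conntrack then pins the flow to that port); `fanout` and `GO_PORTS` are illustrative names only:

```python
from itertools import count, islice

# Slot -> worker port after the #662 cutover (Go engine).
GO_PORTS = {0: 8091, 1: 8092, 2: 8093, 3: 8094}

# Rollback map: the same slots pointed back at the Python workers (809x -> 808x).
PY_PORTS = {slot: port - 10 for slot, port in GO_PORTS.items()}


def fanout(port_map: dict):
    """Yield the DNAT target port for each successive new connection."""
    for n in count():
        yield port_map[n % len(port_map)]


first_eight = list(islice(fanout(GO_PORTS), 8))
print(first_eight)
print(PY_PORTS)
```

A canary that moves a single slot, as the unit file's comments describe, corresponds to changing one value in the map while leaving the other three untouched.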


@ -0,0 +1,125 @@
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
# Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
"""Cross-engine parity harness — Python side (#662 Phase 3).
Loads the SAME ``parity-fixtures.json`` and ``testdata/config`` snapshot the Go
core uses (``../secubox-toolbox-ng/testdata``), drives the production Python
decision logic ``ad_ghost._allowed`` + ``_AD_HOST`` + the learned-trackers
check, composed with ``splice.should_splice`` under the SAME precedence as
Go's ``Policy.Decide``, and asserts the action == the fixture's ``expect``.
Python is the source of truth: if Go and Python ever diverge on a fixture, Go
is fixed to match this. Both test files reading the identical inputs is what
makes the parity meaningful.
"""
from __future__ import annotations

import json
import os

import pytest

from mitmproxy_addons import ad_ghost
from secubox_toolbox import splice

_HERE = os.path.dirname(os.path.abspath(__file__))
# tests/ → packages/secubox-toolbox → packages → packages/secubox-toolbox-ng
_NG_TESTDATA = os.path.normpath(
    os.path.join(_HERE, "..", "..", "secubox-toolbox-ng", "testdata"))
_FIXTURES = os.path.join(_NG_TESTDATA, "parity-fixtures.json")


def _load_fixtures():
    with open(_FIXTURES, encoding="utf-8") as f:
        return json.load(f)


def _cfg_path(rel: str) -> str:
    return os.path.join(_NG_TESTDATA, rel.replace("/", os.sep))


def _decide(host: str, sni: str, *, seed, learned_splice, never,
            self_regs) -> str:
    """Mirror Go's Policy.Decide precedence EXACTLY.

    1. own-infra / allowlist (ad_ghost._allowed)                         "allow"
    2. splice never-set check, then seed/learned (splice.should_splice)  "splice"
    3. _AD_HOST match OR registrable/host in learned-trackers            "block"
    4. otherwise                                                         "mitm"
    """
    # 1. allowlist + own-infra ALWAYS win first.
    if ad_ghost._allowed(host):
        return "allow"
    # 2. splice (TLS layer runs first; never-set already excludes trackers).
    if splice.should_splice(sni or host, seed, learned_splice, never):
        return "splice"
    # 3. ad_ghost block decision (request layer).
    blocked = bool(ad_ghost._AD_HOST.search(host))
    if not blocked:
        reg = ad_ghost._registrable(host)
        ls = ad_ghost._learned_set()
        if (reg and reg in ls) or host.lower() in ls:
            blocked = True
    if blocked:
        return "block"
    # 4. everything else is intercepted.
    return "mitm"


@pytest.fixture
def parity_env(monkeypatch):
    """Point the Python addon decision logic at the SAME testdata snapshot the
    Go core loads, and load the splice sets the same way the addon does."""
    data = _load_fixtures()
    cfg = data["config"]
    # ad_ghost: allowlist + learned-trackers paths, self-domains, fresh caches.
    monkeypatch.setattr(ad_ghost, "_ALLOW_PATH", _cfg_path(cfg["ad_allowlist"]))
    monkeypatch.setattr(ad_ghost, "_LEARNED_PATH", _cfg_path(cfg["learned_trackers"]))
    monkeypatch.setattr(ad_ghost, "_SELF_REGS",
                        {d.strip().lower() for d in cfg["self_domains"] if d.strip()})
    # reset module-level caches so the monkeypatched paths are (re)read.
    monkeypatch.setattr(ad_ghost, "_allow", set())
    monkeypatch.setattr(ad_ghost, "_allow_mtime", 0.0)
    monkeypatch.setattr(ad_ghost, "_learned", set())
    monkeypatch.setattr(ad_ghost, "_learned_mtime", 0.0)
    monkeypatch.setattr(ad_ghost, "_learned_check", 0.0)  # bypass the 60s cache
    # splice: load seed/learned the addon way; never = pure-trackers plus
    # the fortknox sites.
    seed = splice.load_splice_seed(_cfg_path(cfg["splice_seed"]))
    learned_splice = splice.load_learned_splice(_cfg_path(cfg["splice_learned"]))
    never = splice.load_learned_splice(_cfg_path(cfg["pure_trackers"]))
    for s in cfg.get("fortknox_sites", []) or []:
        never.add(str(s).lower().strip("."))
    return {
        "fixtures": data["fixtures"],
        "seed": seed,
        "learned_splice": learned_splice,
        "never": never,
        "self_regs": ad_ghost._SELF_REGS,
    }


def test_parity_decide(parity_env):
    seed = parity_env["seed"]
    learned_splice = parity_env["learned_splice"]
    never = parity_env["never"]
    self_regs = parity_env["self_regs"]
    failures = []
    for fx in parity_env["fixtures"]:
        host = fx["host"]
        got = _decide(host, host, seed=seed, learned_splice=learned_splice,
                      never=never, self_regs=self_regs)
        if got != fx["expect"]:
            failures.append(
                f"Decide({host!r})={got!r} want {fx['expect']!r} ({fx.get('why')})")
    assert not failures, "Python↔fixture parity mismatches:\n" + "\n".join(failures)


def test_fixtures_present(parity_env):
    # Guard: the fixture set must cover every action class, else "parity" is
    # vacuously true for a missing branch.
    actions = {fx["expect"] for fx in parity_env["fixtures"]}
    assert actions == {"allow", "block", "splice", "mitm"}, actions


@ -0,0 +1,97 @@
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
# Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
"""Cross-engine JAR parity harness — Python side (#662 Phase 4).
Loads the SAME ``jar-fixtures.json`` + fixed test key the Go core uses
(``../secubox-toolbox-ng/testdata``), points ``privacy.JAR_KEY_PATH`` at the
test key (NOT the real ``/etc/secubox/secrets/privacy-jar.key``), resets the
jar-key cache, and asserts ``privacy.fake_id`` == each fixture's ``expect``.
Python is the source of truth: the ``expect`` values were GENERATED by this
very ``privacy.fake_id`` with the test key. The Go side (jar_test.go) must
reproduce them byte-for-byte. Both files reading identical inputs is what makes
the parity meaningful.
"""
from __future__ import annotations

import json
import os

import pytest

from secubox_toolbox import privacy

_HERE = os.path.dirname(os.path.abspath(__file__))
# tests/ → packages/secubox-toolbox → packages → packages/secubox-toolbox-ng
_NG_TESTDATA = os.path.normpath(
    os.path.join(_HERE, "..", "..", "secubox-toolbox-ng", "testdata"))
_FIXTURES = os.path.join(_NG_TESTDATA, "jar-fixtures.json")


def _load():
    with open(_FIXTURES, encoding="utf-8") as f:
        return json.load(f)


@pytest.fixture
def jar_env(monkeypatch):
    """Point privacy at the test key file and reset the cache so the override
    is (re)read. Mirrors exactly the (path, cache) surface the Go loadJarKey
    reads."""
    data = _load()
    key_path = os.path.join(_NG_TESTDATA, data["key_file"].replace("/", os.sep))
    monkeypatch.setattr(privacy, "JAR_KEY_PATH", key_path)
    monkeypatch.setattr(privacy, "_jar_key_cache", {"v": None})
    return data


def test_jar_key_loads_canonical(jar_env):
    # _jar_key() must strip the file's surrounding whitespace back to the
    # canonical key declared in key_hex (proves .strip() parity with TrimSpace).
    key = privacy._jar_key()
    assert key is not None
    assert key.hex() == jar_env["key_hex"]


def test_jar_parity(jar_env):
    failures = []
    for fx in jar_env["fixtures"]:
        got = privacy.fake_id(fx["client"], fx["tracker"], fx["cookie_name"])
        if got != fx["expect"]:
            failures.append(
                f"fake_id({fx['client']!r},{fx['tracker']!r},{fx['cookie_name']!r})"
                f"={got!r} want {fx['expect']!r} ({fx.get('why')})")
    assert not failures, "Python↔fixture jar parity mismatches:\n" + "\n".join(failures)


def test_jar_shapes_covered(jar_env):
    # Every _shape branch must appear, else parity is vacuous for that branch.
    shapes = set()
    for fx in jar_env["fixtures"]:
        e = fx["expect"]
        if e.startswith("GA1."):
            shapes.add("ga")
        elif e.startswith("fb."):
            shapes.add("fb")
        elif len(e) == 36 and e[8] == "-":
            shapes.add("uuid")
        elif len(e) == 32:
            shapes.add("hex")
    assert shapes == {"ga", "fb", "uuid", "hex"}, shapes


def test_jar_folding(jar_env):
    # Two subdomains of the same registrable tracker fold to the SAME fake id.
    a = privacy.fake_id("foldclient", "px.doubleclick.net", "uid")
    b = privacy.fake_id("foldclient", "ads.doubleclick.net", "uid")
    assert a is not None and a == b


def test_jar_none_cases(jar_env, monkeypatch):
    # fake_id returns None exactly where Go fakeID returns ("", False).
    assert privacy.fake_id("", "t.example", "uid") is None  # empty client
    assert privacy.fake_id("c", "", "uid") is None  # empty tracker
    # empty key → None. Use monkeypatch (not a raw setattr) so the override is
    # explicitly undone at teardown.
    monkeypatch.setattr(privacy, "_jar_key_cache", {"v": b""})
    assert privacy.fake_id("c", "t.example", "uid") is None


@ -0,0 +1,87 @@
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
# Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
"""Cross-engine mac_hash (WG persona identity) parity harness — Python side
(#662 Phase 6 prep).
Loads the SAME ``machash-fixtures.json`` + ``wg-peers-fixture.json`` the Go core
uses (``../secubox-toolbox-ng/testdata``), points ``_common._WG_PEERS_DB`` at the
fixture WG DB (NOT the real ``/var/lib/secubox/toolbox/wg-peers.json``), resets
the WG cache, and asserts ``_common.mac_hash_of`` == each fixture's ``expected``.
Python is the source of truth: the ``expected`` values were GENERATED by
``sha256(pubkey.encode()).hexdigest()[:16]`` (the very algorithm
``_common._wg_hash_of`` runs). The Go side (machash_test.go) must reproduce them
byte-for-byte. Both files reading identical inputs is what makes the parity
meaningful (and non-circular).
"""
from __future__ import annotations

import json
import os
from pathlib import Path

import pytest

from mitmproxy_addons import _common

_HERE = os.path.dirname(os.path.abspath(__file__))
# tests/ → packages/secubox-toolbox → packages → packages/secubox-toolbox-ng
_NG_TESTDATA = os.path.normpath(
    os.path.join(_HERE, "..", "..", "secubox-toolbox-ng", "testdata"))
_FIXTURES = os.path.join(_NG_TESTDATA, "machash-fixtures.json")


def _load():
    with open(_FIXTURES, encoding="utf-8") as f:
        return json.load(f)


@pytest.fixture
def wg_env(monkeypatch):
    """Point _common at the fixture WG DB and reset the mtime cache so the
    override is (re)read. Mirrors exactly the (path, cache, mtime) surface the
    Go wgHashOf reads (wgPeersPath + resetWGCache)."""
    data = _load()
    wg_path = os.path.join(_NG_TESTDATA, data["wg_peers_file"].replace("/", os.sep))
    monkeypatch.setattr(_common, "_WG_PEERS_DB", Path(wg_path))
    monkeypatch.setattr(_common, "_WG_PEERS_MTIME", 0.0)
    _common._WG_PEERS_CACHE.clear()
    return data


def test_machash_parity(wg_env):
    failures = []
    for fx in wg_env["fixtures"]:
        # _common returns None where Go returns ""; normalise None → "".
        got = _common.mac_hash_of(fx["ip"]) or ""
        if got != fx["expected"]:
            failures.append(
                f"mac_hash_of({fx['ip']!r})={got!r} want {fx['expected']!r}"
                f" ({fx.get('why')})")
    assert not failures, "Python↔fixture mac_hash parity mismatches:\n" + "\n".join(failures)


def test_machash_coverage(wg_env):
    # The fixtures must exercise the discriminating cases, else parity is vacuous.
    resolved = subnet_miss = off_subnet = empty = False
    for fx in wg_env["fixtures"]:
        ip, exp = fx["ip"], fx["expected"]
        if ip == "":
            empty = True
        elif exp != "":
            resolved = True
        elif ip.startswith("10.99.1."):
            subnet_miss = True
        else:
            off_subnet = True
    assert resolved and subnet_miss and off_subnet and empty, (
        f"coverage incomplete: resolved={resolved} subnet_miss={subnet_miss} "
        f"off_subnet={off_subnet} empty={empty}")


def test_machash_missing_db_fail_open(wg_env):
    # A missing WG DB fails open to None (best-effort), never raises.
    _common._WG_PEERS_DB = Path("/nonexistent/secubox/wg-peers.json")
    _common._WG_PEERS_MTIME = 0.0
    _common._WG_PEERS_CACHE.clear()
    assert _common.mac_hash_of("10.99.1.10") is None