Compare commits

...

23 Commits

Author SHA1 Message Date
4c6777dc68 chore(toolbox): 2.7.0 middle release — kbin milestone + Tor chapter (ref #683)
kbin (public ToolBoX portal) framed as the first tool of the CyberMind
Swiss-army cyber kit: transparent perf, full-encrypted MITM inspection,
ad poison/smog injection, adware-ban banner, safe browsing.

- secubox-toolbox 2.6.59 -> 2.7.0 (caps 2.6.x, opens kbin chapter)
- docs: wiki Kbin-Toolbox.md, FAQ-KBIN-TOR.md, README blurb
- plan #683: kbin Tor endpoint (outbound egress quick-switch) — design spec
- WIP/TODO/HISTORY updated

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 11:48:32 +02:00
CyberMind
1a315317e7
Merge pull request #682 from CyberMind-FR/feat/662-client-geoflag
feat(#662): per-client country flag from real external WG-endpoint IP
2026-06-19 11:24:01 +02:00
CyberMind
04598482fb
Merge pull request #681 from CyberMind-FR/fix/662-altsvc-strip
fix(#662): strip Alt-Svc — stop HTTP/3 so traffic stays on MITM-able TCP
2026-06-19 11:22:23 +02:00
be0497e6de fix(toolbox-ng): strip Alt-Svc to stop HTTP/3 advertisement → keep traffic on MITM-able TCP (ref #662) 2026-06-19 11:21:21 +02:00
7db7a73d65 fix(toolbox): QUIC udp443 reject (not drop) — drop made browsers retry QUIC 199x instead of TCP fallback; reject forces immediate fallback → MITM sees the traffic (ref #662) 2026-06-19 11:18:06 +02:00
3ade5619d0 feat(toolbox): per-client country flag from REAL external WG endpoint IP (ref #662)
/admin/clients/rich geo-resolved the stored client IP, which for WG clients is
the internal 10.99.1.x (GeoIPs to nothing) → empty flags. The true origin is the
peer's pre-tunnel WG endpoint (from wg show wg-toolbox dump).

- wg.wg_endpoints(): parse `wg show wg-toolbox dump`, map sha256(pubkey)[:16]
  → external endpoint IP. Skips (none)/RFC1918/loopback/link-local. Best-effort
  (empty on missing wg/error), cached ~30s — no shell-out per row.
- admin_clients_rich: geo-enrich from the external endpoint when present, else
  fall back to the stored ip (non-WG/captive clients still work). Within ENRICH_LIMIT.
- PRIVACY: external IP used transiently for the GeoIP lookup only — never stored
  or returned. Country-granularity only (flag/ISO + existing asn_org).
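
A minimal Go sketch of the endpoint mapping described above, for illustration only (the commit's implementation is the Python `wg.wg_endpoints()`). It assumes the tab-separated peer layout of `wg show <iface> dump` (public key, preshared key, endpoint, ...) and assumes the map key is the first 16 hex characters of SHA-256 over the base64 public-key string. `parseWGDump` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/netip"
	"strings"
)

// parseWGDump maps a short hash of each peer's public key to that peer's
// external endpoint IP. Hypothetical helper; the hash input is an assumption.
func parseWGDump(dump string) map[string]string {
	out := make(map[string]string)
	lines := strings.Split(strings.TrimSpace(dump), "\n")
	for i, line := range lines {
		if i == 0 {
			continue // first line describes the interface, not a peer
		}
		f := strings.Split(line, "\t")
		if len(f) < 3 {
			continue
		}
		ap, err := netip.ParseAddrPort(f[2])
		if err != nil {
			continue // "(none)" or a malformed endpoint
		}
		ip := ap.Addr().Unmap()
		if ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast() {
			continue // internal addresses carry no GeoIP signal
		}
		sum := sha256.Sum256([]byte(f[0]))
		out[hex.EncodeToString(sum[:])[:16]] = ip.String()
	}
	return out
}

func main() {
	// Synthetic dump: one public endpoint, one RFC1918 endpoint, one "(none)".
	dump := "privkey\tpubkey0\t51820\toff\n" +
		"peerA\t(none)\t203.0.113.7:51820\t10.99.1.2/32\t0\t0\t0\toff\n" +
		"peerB\t(none)\t192.168.1.5:51820\t10.99.1.3/32\t0\t0\t0\toff\n" +
		"peerC\t(none)\t(none)\t10.99.1.4/32\t0\t0\t0\toff\n"
	for k, ip := range parseWGDump(dump) {
		fmt.Println(k, ip)
	}
}
```

Only `peerA` survives the filter, which matches the "skip (none)/RFC1918/loopback/link-local" rule in the commit message. Caching and the shell-out to `wg` are left out of the sketch.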
2026-06-19 11:16:03 +02:00
a48f43607b fix(toolbox): drop QUIC (UDP443) BEFORE the outbound accept — was after → never fired → HTTP/3 bypassed the whole MITM (no inject/adblock/metrics/social) (ref #662) 2026-06-19 11:10:29 +02:00
CyberMind
27ba48c1a1
Merge pull request #680 from CyberMind-FR/feat/662-inline-banner
feat(#662): inline the banner (SW-immune) — defeat site service workers
2026-06-19 11:02:52 +02:00
c04a9d0c1c chore: changelog 0.1.13 — inline SW-immune banner (ref #662) 2026-06-19 11:01:47 +02:00
3009ef93d9 fix(toolbox): inline transparency banner — survive sites with a service worker (ref #662)
Sites with a SERVICE WORKER (leparisien, cnn…) intercept every same-origin
request, so the legacy <script src="/__toolbox/loader.js"> + its
fetch("/__toolbox/bundle") were hijacked by the page SW (404 / app-shell)
before reaching the MITM engine → banner never appeared. Fix: INLINE the
banner — the engine fetches the complete script body server-side at inject time
and bakes a self-contained <script>…</script> with mh/wg/csp + the bundle as JS
literals. No same-origin fetch for the SW to touch.

Avoids the #653 failure: the inline script reads NO document.currentScript
(null in async) and does NO fetch() — everything is baked as literals.

Python portal:
- new GET /__toolbox/inline?mh=&wg=&csp= → complete inline banner script body.
- refactor bundle.py: extract shared render/SPA/dismiss/countTrackers/🔓 logic
  into _BANNER_CORE; inline_script(mh,wg,csp) bakes the bundle (get_bundle) as a
  JSON literal + mh/wg/csp string literals. Legacy LOADER_JS (src-loader) kept
  working off the same core. </script> breakout hardened (</ → <\/).

Go engine:
- fetchInlineBanner(): GET portal /__toolbox/inline via the short-timeout portal
  client; fail-open (ok=false → skip inject, page intact).
- injectInlineBanner(): idempotent (same bannerGuard), same placement as
  injectLoader, emits an inline <script> (not <script src>).
- live inject path uses the inline banner; injectLoader + /__toolbox/loader.js
  route kept. Cosmetic <style> (already inline, SW-immune) unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 11:00:27 +02:00
CyberMind
78ad554ece
Merge pull request #679 from CyberMind-FR/feat/662-adlearn
feat(#662): restore + strengthen ad/tracker auto-learn (live-reload, candidate emit, cross-site promotion)
2026-06-19 10:40:07 +02:00
895356dc00 chore: changelog 0.1.12 — ad auto-learn loop + live-reload (ref #662) 2026-06-19 10:37:49 +02:00
4063ae1a95 feat(toolbox): restore + strengthen ad/tracker auto-learn loop (ref #662)
The #662 Go cutover blocked from STATIC lists but never (1) emitted learning
candidates nor (2) live-reloaded the lists, so new adwares slipped through
forever and even autolearn promotions needed a worker restart. Restore the
full loop: feeders/outsiders -> lock to blocklist -> silence (204) -> smog
(poison) -> statistify, fed by BOTH the ad-path heuristic AND cross-site
cookie reuse (the social graph).

Go (packages/secubox-toolbox-ng):
- policy.go: mtime-based live-reload (Part 1, linchpin). Policy now holds the
  backing file paths + per-file last-mtime; maybeReload() (throttled ~15s)
  re-stats each file and atomically swaps the changed map under an RWMutex.
  Decide/shouldPoison take the read lock; allowedSafe() is the lock-taking
  entry for the candidate feed. Covers learned-trackers + ad-allowlist +
  splice seed/learned + pure-trackers. Promotions/edits now take effect with
  NO worker restart.
- adstats.go: ad-candidate learning feed (Part 2). Ports ad_ghost._AD_PATH
  (RE2) + a (host,site)->hits aggregator (cap 20k), drained into the existing
  ad-event payload's new "candidates" list by the same 10s flusher.
- main.go: maybeRecordAdCandidate() on the allow/mitm branch — 3rd-party
  (registrable(host) != registrable(site)) AND _AD_PATH match, gated behind
  the analysis relay flag, O(1) fire-and-forget.
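
The mtime-based live-reload described for `policy.go` can be sketched as follows. This is a simplified, single-file illustration under stated assumptions, not the engine's code: `hostSet`, `loadHosts` and `reloadDemo` are hypothetical names, and the real `Policy` tracks several files.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"sync"
	"time"
)

// hostSet is a hypothetical single-file stand-in for the multi-file Policy.
type hostSet struct {
	mu        sync.RWMutex
	path      string
	throttle  time.Duration // minimum delay between two stat() calls
	lastCheck time.Time
	lastMod   time.Time
	hosts     map[string]struct{}
}

func loadHosts(path string) (map[string]struct{}, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	m := make(map[string]struct{})
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		h := strings.ToLower(strings.TrimSpace(sc.Text()))
		if h != "" && !strings.HasPrefix(h, "#") {
			m[h] = struct{}{}
		}
	}
	return m, sc.Err()
}

// maybeReload re-stats the file at most once per throttle window and swaps
// in a freshly parsed map when the mtime changed.
func (s *hostSet) maybeReload(now time.Time) {
	s.mu.Lock()
	if now.Sub(s.lastCheck) < s.throttle {
		s.mu.Unlock()
		return
	}
	s.lastCheck = now
	prev := s.lastMod
	s.mu.Unlock()

	fi, err := os.Stat(s.path)
	if err != nil || fi.ModTime().Equal(prev) {
		return // on error or no change, keep serving the current set
	}
	fresh, err := loadHosts(s.path)
	if err != nil {
		return
	}
	s.mu.Lock()
	s.hosts, s.lastMod = fresh, fi.ModTime()
	s.mu.Unlock()
}

// contains is the read-locked lookup used on the hot path.
func (s *hostSet) contains(host string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.hosts[strings.ToLower(host)]
	return ok
}

// reloadDemo edits a temporary list and reports membership before and after.
func reloadDemo() (before, after bool, err error) {
	dir, err := os.MkdirTemp("", "reload")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	p := filepath.Join(dir, "learned-trackers.txt")
	if err = os.WriteFile(p, []byte("old.example\n"), 0o600); err != nil {
		return
	}
	s := &hostSet{path: p} // zero throttle: check on every call
	s.maybeReload(time.Now())
	before = s.contains("new.example")
	if err = os.WriteFile(p, []byte("old.example\nnew.example\n"), 0o600); err != nil {
		return
	}
	// Force a distinct mtime so the demo does not depend on clock granularity.
	future := time.Now().Add(time.Hour)
	if err = os.Chtimes(p, future, future); err != nil {
		return
	}
	s.maybeReload(time.Now())
	after = s.contains("new.example")
	return
}

func main() {
	fmt.Println(reloadDemo())
}
```

The file is parsed outside the lock and only the map pointer is swapped under it, so readers are never blocked for the duration of a reload.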

Python (packages/secubox-toolbox):
- api.py: /__toolbox/ad-event now ingests "candidates" ->
  store.record_ad_candidates(); capped, try/except, never 500s.
- secubox-toolbox-autolearn: new _social_feed() promotes any cross-site
  cookie-reuse tracker (>= SECUBOX_SOCIAL_MIN_SITES distinct src_site in a
  recent window) from social_edges into learned-trackers.txt, reusing the
  _ad_feed allowlist/self guard and merge/de-dup.

Smog: confirmed isTracker() already consults the live-reloaded learned set
(blockedByAd), so a promoted cross-site tracker is poisoned automatically once
the policy reloads it — no new poison code.

TDD: reload_test.go (incl. -race concurrency), adcand_test.go,
test_ad_event_candidates.py, test_autolearn_socialfeed.py. Go build (offline
arm64 + darwin), vet, go test -race all green.
2026-06-19 10:36:02 +02:00
CyberMind
77da033371
Merge pull request #678 from CyberMind-FR/feat/662-social-relay
feat(#662): restore /social cross-site tracker graph (faithful social_graph port + block-path correlation)
2026-06-19 10:13:36 +02:00
3850da5479 fix(toolbox-ng): correlate social edges on the block path (blocked trackers carry the cross-site cookie) (ref #662) 2026-06-19 09:58:07 +02:00
040e460876 chore: changelog 0.1.10 — social relay (ref #662) 2026-06-19 09:53:10 +02:00
55f9e4c803 feat(toolbox): restore /social cross-site tracker graph via Go engine + portal ingest (ref #662)
The #662 Phase-7 cutover decommissioned the in-process Python social_graph
addon that fed social.record_edge(), freezing the kbin /social d3 graph
(social_edges -> social_nodes/social_links in toolbox.db).

Go engine (packages/secubox-toolbox-ng/cmd/sbxmitm/social.go):
- cookieIDHash: byte-exact port of social.cookie_id_hash (lower-case
  domain+name, raw value, NUL separators), proven by a shared Python-generated
  fixture (social-cookie-id-fixtures.json) asserted by both social_test.go and
  tests/test_social_parity.py (anti-rig, same discipline as the jar harness).
- isDenyListed + _DEFAULT_DENY_COOKIES set; registrableSocial (the addon's
  _registrable_domain eTLD+1 flavour, distinct from policy.registrable);
  Set-Cookie + request-Cookie 3rd-party edge extraction; CMP consent_state
  (none_seen/pre_consent/post_consent) via a per-(peer,site) in-memory log.
- Edges (hash-only, NEVER raw values) buffered + flushed every 10s to the
  portal /__toolbox/social-event; WG-peer flows only; gated by --social-relay
  (default true); fire-and-forget, never blocks the flow.

Python portal (secubox_toolbox/api.py):
- POST /__toolbox/social-event ingest (sibling of /__toolbox/ad-event, same
  unauthenticated R3-perimeter trust + 2MB body guard): per-row record_edge
  with try/except, cap 5000, always 204; debounced safety fold_recent
  (<= once/60s) so new edges surface promptly between the existing app.py
  social_fold_loop ticks.

Go: build offline arm64+darwin, go vet, go test -race all green.
2026-06-19 09:51:26 +02:00
257fc95182 fix(toolbox): loader.js no-store so SPA/loader updates propagate (was max-age=3600 → stale loaders pinned 1h) (ref #662) 2026-06-19 09:38:24 +02:00
CyberMind
591106ec65
Merge pull request #677 from CyberMind-FR/feat/662-cumulative-live
feat(#662): cumulative-stats reads live module mitm_events (un-freeze kbin page)
2026-06-19 09:32:00 +02:00
CyberMind
15a668829b
Merge pull request #676 from CyberMind-FR/feat/662-analysis-relay
feat(#662): relay per-flow telemetry to dpi/cookies/ja4 analysis sidecars
2026-06-19 09:31:52 +02:00
73b8ad36b1 fix(toolbox): cumulative-stats reads LIVE module sockets, not frozen toolbox.db (ref #662)
The kbin 'Qui te piste?' page (/cumulative-stats.json) read event counts +
top-hosts from toolbox.db's events table, which froze at the #662 Phase-7
cutover. Pull live counts/hosts from the analysis modules over their unix
sockets (dpi/cookies/threat-analyst), with graceful fallback to the legacy
toolbox.db query if every module call fails. sessions/risk/level
distributions read the clients table and are unchanged.
2026-06-19 09:29:54 +02:00
d0db3e87fd chore: changelog 0.1.9 — analysis relay (ref #662) 2026-06-19 09:23:59 +02:00
05c659b4ca feat(toolbox-ng): relay per-flow dpi/cookies/ja4 telemetry to analysis sidecars (ref #662)
Restores the dpi/cookies/ja4 events feeding the kbin "Qui te piste?"
cumulative-stats page, frozen since the Phase-7 cutover decommissioned
the Python mitmproxy relay addons. The Go engine now re-emits EXACTLY
what those addons did, via the existing fire-and-forget emit() helper.

- relay.go: pure payload builders (dpiEvent/cookiesEvent/ja4Event) +
  gated emit wrappers. NAMES ONLY for cookies (never values, CSPN);
  caps ≤30 set / ≤50 sent names, name[:32], url[:300]; user_agent null
  when absent; ja4 extensions always null (stdlib doesn't expose them);
  alpn/ciphers always JSON arrays.
- main.go: --analysis-relay flag (default true) → Proxy.analysisRelay;
  dpi emit in mitmPipeline allow/mitm branch (before anonymize, original
  UA); cookies emit after resp; serverTLSConfigCapture hook relaying ja4
  with the client conn peer IP; peerIP helper.
- transparent.go: ja4 capture wired with the real transparent peer IP.

Fire-and-forget: a dead/slow sidecar socket never blocks or delays the
proxy flow (emit detaches with its own 2s timeout). Block/splice paths
never relay dpi/cookies; ja4 fires per handshake (blocked/allowed alike,
matching the Python tls_clienthello addon).

TDD: relay_test.go covers payload shapes, names-only parsing, caps,
url truncation, the gate at call sites, and live unix-socket delivery.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 09:23:14 +02:00
41 changed files with 4182 additions and 140 deletions


@ -3,6 +3,27 @@
---
## 2026-06-19 — kbin milestone: ToolBoX 2.7.0 (middle release) + Tor chapter staged (#683)
- **End-of-session checkpoint** — docs + positioning + version, no runtime behaviour change.
- **`secubox-toolbox` 2.6.59 → 2.7.0** (middle release) — caps the 2.6.x line
(ad-intelligence / Anti-Track v2 / anti-bot uTLS #662) and opens the **kbin** chapter:
kbin (`kbin.gk2.secubox.in`, the public ToolBoX portal) framed as the *first tool of the
CyberMind Swiss-army cyber kit* — transparent performance, full-encrypted MITM inspection,
ad poison/smog injection, adware-ban transparency banner, safe browsing.
- **Docs** — new wiki use-case `docs/wiki/Kbin-Toolbox.md`, `docs/FAQ-KBIN-TOR.md`,
README positioning blurb.
- **Plan #683 (issue + spec)**: kbin **Tor endpoint**, a quick-switch that re-routes a consenting
client's surfing through Tor (outbound egress, pseudo-network) so the kbin exit is anonymized.
Spec `docs/superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md`. Invariants:
inspection preserved (Tor after the forging core), fail-closed, opt-in/default-OFF, no DNS
leak, CSPN audit-logged. Opposite direction of `secubox-exposure` (inbound hidden services);
reuses its Tor control. Depends on the #662 Go core for the preferred SOCKS5-dialer transport.
- **Caveat recorded** — Tor mode must force `tls_splice` (#649) OFF per-client or asset flows
leak the real IP.
---
## 2026-06-19 — #662 anti-bot: Chrome TLS fingerprint (uTLS) — defeat DataDome without splice (PR #674)
- lemonde.fr (DataDome) blocked R3 navigation at the 2nd level: the engine re-origined


@ -1,10 +1,26 @@
# TODO — SecuBox-DEB Backlog
*Updated: 2026-06-13*
*Updated: 2026-06-19*
---
## 🔥 P0 — Immediate (in flight)
### kbin Tor endpoint — anonymized quick-switch surfing (#683)
> Capstone of the Swiss-army cyber kit: anonymity of the exit. Spec:
> `docs/superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md`.
> Invariants: inspection preserved, fail-closed, opt-in (default OFF), no DNS leak, CSPN audit.
- [ ] **Transport**: Option A SOCKS5 upstream dialer (Go core #662, *preferred*) vs
Option B nft mark → Tor TransPort (pre-#662 fallback).
- [ ] **Tor egress profile**: reuse `secubox-exposure` (bootstrap/NEWNYM), egress-only.
- [ ] **toolbox API**: `POST /admin/tor/{on,off}` (WG-hash scoped) + `GET /tor/state` +
`POST /tor/newnym` + per-client SQLite state (TTL 24h).
- [ ] **kbin UI**: 🧅 toggle + state badge + exit-country flag + "new identity" button.
- [ ] **nft leak-guard** + DNS-over-Tor (test exit IP + resolver ≠ Unbound).
- [ ] **`tls_splice` OFF in Tor mode** (#649), otherwise asset flows leak the real IP.
- [ ] **CSPN**: audit-log every toggle; soak DARK (flag present, UI hidden) before flip.
### ToolBox clients (`clients/`)
- [x] **#531 Android scaffold + CI** — Gradle/Compose one-tap onboarding,


@ -1,5 +1,35 @@
# WIP — Work In Progress
*Updated: 2026-06-18*
*Updated: 2026-06-19*
---
## 🔄 2026-06-19: kbin milestone, ToolBoX 2.7.0 + Tor chapter (plan)
End-of-session checkpoint. No runtime behaviour change: docs +
positioning + version + plan for the next blade.
- ✅ **ToolBoX 2.7.0** (middle release): closes the 2.6.x line (ad-intelligence /
Anti-Track v2 / anti-bot uTLS #662) and opens the kbin chapter, "first tool of the
Swiss-army cyber kit". kbin = transparent perf + full encrypted + poison/smog +
anti-adware banner + safe browsing.
- ✅ **kbin docs**: wiki [`Kbin-Toolbox.md`](../docs/wiki/Kbin-Toolbox.md),
[`FAQ-KBIN-TOR.md`](../docs/FAQ-KBIN-TOR.md), README blurb.
- ✅ **Plan #683**: spec
[`2026-06-19-kbin-tor-anonymized-surfing-design.md`](../docs/superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md):
quick-switch Tor endpoint (outbound egress, fail-closed, opt-in, no DNS leak,
inspection preserved). Depends on the #662 Go core.
### ⬜ Next Up: Tor chapter (#683)
- **Decide the transport**: Option A (SOCKS5 upstream dialer via the #662 Go core,
*preferred*) vs Option B (nft mark → Tor TransPort, pre-#662 fallback).
- **Tor egress profile** in `secubox-exposure` (or a dedicated `tor-egress` unit):
egress-only, no relay/hidden service in this profile.
- **toolbox API**: `POST /admin/tor/{on,off}` (per client, WG-hash), `GET /tor/state`,
`POST /tor/newnym` + SQLite state + 🧅 UI banner.
- **nft leak-guard** + DNS-over-Tor (test: exit IP + resolver ≠ local Unbound).
- **Caveat**: in Tor mode, force `tls_splice` OFF for that client (otherwise asset
flows leak the real IP). Soak DARK (flag present, UI hidden) before flip.
---


@ -57,6 +57,30 @@
---
## 🗡️ kbin: the first tool of the Swiss-army cyber kit
**kbin** (`kbin.gk2.secubox.in`) is the public portal of the SecuBox **ToolBoX**: the
*digital booth* and **first blade of the modular Swiss-army cyber kit** from
[cybermind.fr](https://cybermind.fr). You plug in, browse normally, and the blade
inspects and protects the traffic transparently:
| 🗡️ | Blade |
|----|------|
| ⚡ | **Transparent performance**: only what gets modified is decrypted (selective SNI-splice) |
| 🔒 | **Full encrypted**: complete MITM inspection, per-host cert forging, Chrome uTLS fingerprint |
| ☠️ | **Poison & smog injection**: ad-tech traffic comes out poisoned, not merely blocked |
| 🚫 | **Anti-adware banner**: injected transparency, CSP-immune, SPA-aware |
| 🛡️ | **Safe browsing**: Vortex DNS + nft blacklist + anti-bot detection |
> **Next blade: 🧅 Tor quick-switch mode ([#683](https://github.com/CyberMind-FR/secubox-deb/issues/683)).**
> One tap → browsing exits through the Tor network (outbound egress, pseudo-network): inspection
> stays intact, only the **exit IP** becomes anonymous. Fail-closed, opt-in, no DNS leak.
- Use case: [docs/wiki/Kbin-Toolbox.md](docs/wiki/Kbin-Toolbox.md)
- FAQ: [docs/FAQ-KBIN-TOR.md](docs/FAQ-KBIN-TOR.md)
---
## License — CyberMind Source-Disclosed (CMSD-1.0)
> **Source disclosed, rights reserved.**

docs/FAQ-KBIN-TOR.md (new file, 93 lines)

@ -0,0 +1,93 @@
# FAQ: kbin & anonymized Tor mode
> kbin (`kbin.gk2.secubox.in`) = the public portal of the SecuBox **ToolBoX**, first
> tool of the CyberMind Swiss-army cyber kit. This FAQ covers protected browsing and the upcoming
> **Tor quick-switch mode** ([#683](https://github.com/CyberMind-FR/secubox-deb/issues/683)).
---
### What exactly is kbin?
The public portal of `secubox-toolbox`. You join the booth's open AP, give consent,
and all traffic goes through the SecuBox MITM forging pipeline: encrypted inspection,
ad/tracker cleanup, transparency banner, safe browsing. See
[Kbin-Toolbox](wiki/Kbin-Toolbox.md).
### Does kbin see all my traffic? Isn't that dangerous?
It is **consented and ephemeral**. The MAC is hashed with a salt rotated every 24 h, no raw
cookie value is persisted, and no session ↔ real-identity mapping outlives the TTL.
Three opt-in levels: R0 (full bypass), R1 (passive analysis, recommended), R2/R3
(TLS-break + banner). Without consent, **no** decryption.
### What does "transparent performance" mean?
Only what gets modified is decrypted. Pure-asset flows (video, CDN images) are
*spliced* right at the TLS ClientHello (`tls_splice`, #649): the workers neither forge nor
decrypt anything that has no L7 value. Line-rate throughput, near-zero latency.
### What is "poison and smog injection"?
Ad-tech and tracker traffic is not merely blocked: it is **poisoned**. Anti-Track
v2 (#633) returns pseudo-responses, neutralizes preloaded CDN scripts, and at the network
level applies IP-drop + DNS-refuse. The advertising profile comes out polluted, not empty,
and looks indistinguishable from a genuine block on the tracker's side.
### What does the anti-adware banner block?
A transparency banner injected into the page: number of trackers seen/blocked,
actors recognized cross-site. It is CSP-immune and SPA-aware (#636/#639, webext #655).
The banner is only the display; the actual blocking comes from the Vortex DNS blocklists + the nft blacklist.
---
## Tor mode (plan #683)
### What does Tor mode do?
A 🧅 switch on kbin: one tap → your browsing exits **through the Tor network** instead of the
box's WAN. Anonymous exit IP, masked network identity: "pseudo-network
surfing".
### Does kbin stop inspecting/protecting me in Tor mode?
No. Tor sits **after** the MITM forging core, on the upstream transport (SOCKS5
dialer). You keep poison/smog, the banner and safe browsing; **only the exit IP
and the network identity change**.
### If Tor goes down, does traffic fall back to clearnet?
**Never.** The design is **fail-closed**: if Tor is unavailable, traffic is
cut, not sent over clearnet. Anonymity is an invariant, not best-effort.
### Are there DNS leaks?
No. When Tor mode is active, resolution goes **through Tor**, not through the local Unbound.
### Is it the same thing as `secubox-exposure`?
No, opposite direction. `secubox-exposure` publishes Tor **hidden services** (inbound:
exposing an internal service). The kbin Tor endpoint sends your **browsing** out through Tor (outbound).
The Tor control plumbing (bootstrap, NEWNYM/new identity) is shared between the two.
### How do I change my exit IP?
"New identity" button (NEWNYM) → new Tor circuit → new exit IP, on the
fly, without reconnecting.
### Is it enabled by default?
No. **Per-client opt-in** (WG-hash scoped), **default OFF**, honors your R consent
level. Every on/off toggle is logged (immutable CSPN audit-log).
---
## See also
- [Kbin-Toolbox](wiki/Kbin-Toolbox.md): the full use-case page
- [Tor mode spec](superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md)
- [Anti-Track](wiki/Anti-Track.md): blocks/poisons/anonymizes (DNS/IP layer)
---
*CyberMind — Gérald Kerma · LicenseRef-CMSD-1.0*


@ -0,0 +1,99 @@
# Design — kbin Tor endpoint: quick-switch anonymized web surfing
*Spec · 2026-06-19 · issue [#683](https://github.com/CyberMind-FR/secubox-deb/issues/683) · status: PLAN (no code yet)*
## Problem
kbin (the public ToolBoX portal, first tool of the Swiss-army cyber kit) already gives
transparent perf + full-MITM inspection + ad poison/smog + adware-ban banner + safe
browsing. The **egress is still clearnet**: a kbin session exits via the board WAN with the
real IP. The capstone is **anonymity of the exit** — a quick-switch that re-routes a
consenting client's surfing through **Tor** (outbound), turning kbin into a pseudo-network
surfing booth.
This is the **opposite direction** of `secubox-exposure` (which publishes inbound Tor
hidden services). We reuse its Tor control plumbing (bootstrap, NEWNYM) but for egress.
## Invariants (non-negotiable)
1. **Inspection preserved** — Tor sits *after* the MITM forging core, on the upstream
transport (SOCKS5 dialer). Poison/smog + banner + safe-browsing stay; only the **exit
IP + network identity** change.
2. **Fail-closed** — if Tor is down/not bootstrapped, traffic is dropped, never falls back
to clearnet. Anonymity is an invariant, not best-effort.
3. **No DNS leak** — when Tor mode is on, resolution goes through Tor, not local Unbound.
4. **Opt-in, default OFF** — per-client (WG-hash scoped), honors the existing R consent
level. No silent global toggle.
5. **CSPN** — every Tor on/off decision written to the immutable audit-log; no plaintext
exit; TLS 1.3 floor unchanged.
## Two transport options (decide first)
| Option | Mechanism | Pros | Cons |
|--------|-----------|------|------|
| **A — SOCKS5 upstream dialer** (preferred) | The Go forging core (#662) dials upstream via Tor's SOCKS5 (`127.0.0.1:9050`) when the client is Tor-flagged. | Clean integration with #662; per-flow choice; cert verify + uTLS preserved; DNS-over-Tor native (SOCKS5 remote resolve). | Requires the Go core to land first (#662 dependency). |
| **B — nft mark → Tor TransPort** | Per-client nft mark routes 80/443 to Tor `TransPort`/`DNSPort`; transparent at L3. | Engine-agnostic; works without #662. | Bypasses the forging core unless chained carefully → risk of losing inspection (violates invariant 1). |
**Recommendation:** Option A, gated on #662 Go core. Option B only as a pre-#662 fallback,
and only if the mark routes *through* the MITM TPROXY first, then Tor.
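
To make Option A concrete, here is a minimal stdlib-only Go sketch of a fail-closed SOCKS5 dial with remote name resolution (RFC 1928, address type `0x03`). It is an illustration under stated assumptions, not the #662 core's dialer: `socks5ConnectRequest` and `dialViaSOCKS5` are hypothetical names, only the "no authentication" method is offered, and `127.0.0.1:9050` is the SOCKS port named in the table above.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"time"
)

// socks5ConnectRequest builds a CONNECT request that carries the hostname
// itself (ATYP 0x03), so the name is resolved by the proxy, not locally.
func socks5ConnectRequest(host string, port uint16) ([]byte, error) {
	if len(host) == 0 || len(host) > 255 {
		return nil, errors.New("socks5: host must be 1..255 bytes")
	}
	req := []byte{0x05, 0x01, 0x00, 0x03, byte(len(host))}
	req = append(req, host...)
	return append(req, byte(port>>8), byte(port)), nil
}

// dialViaSOCKS5 connects to host:port through the proxy at proxyAddr.
// Every failure returns an error; there is deliberately no direct-dial fallback.
func dialViaSOCKS5(proxyAddr, host string, port uint16, timeout time.Duration) (net.Conn, error) {
	c, err := net.DialTimeout("tcp", proxyAddr, timeout)
	if err != nil {
		return nil, fmt.Errorf("fail-closed: proxy unreachable: %w", err)
	}
	fail := func(e error) (net.Conn, error) {
		c.Close()
		return nil, fmt.Errorf("fail-closed: %w", e)
	}
	c.SetDeadline(time.Now().Add(timeout))
	// Greeting: version 5, one method offered, "no authentication".
	if _, err := c.Write([]byte{0x05, 0x01, 0x00}); err != nil {
		return fail(err)
	}
	var sel [2]byte
	if _, err := io.ReadFull(c, sel[:]); err != nil {
		return fail(err)
	}
	if sel[0] != 0x05 || sel[1] != 0x00 {
		return fail(errors.New("socks5: no acceptable auth method"))
	}
	req, err := socks5ConnectRequest(host, port)
	if err != nil {
		return fail(err)
	}
	if _, err := c.Write(req); err != nil {
		return fail(err)
	}
	var head [4]byte // VER, REP, RSV, ATYP
	if _, err := io.ReadFull(c, head[:]); err != nil {
		return fail(err)
	}
	if head[0] != 0x05 || head[1] != 0x00 {
		return fail(fmt.Errorf("socks5: connect failed, reply code %d", head[1]))
	}
	n := 0 // length of BND.ADDR, which is read and discarded
	switch head[3] {
	case 0x01:
		n = 4
	case 0x04:
		n = 16
	case 0x03:
		var l [1]byte
		if _, err := io.ReadFull(c, l[:]); err != nil {
			return fail(err)
		}
		n = int(l[0])
	default:
		return fail(errors.New("socks5: unknown address type in reply"))
	}
	if _, err := io.CopyN(io.Discard, c, int64(n+2)); err != nil {
		return fail(err)
	}
	c.SetDeadline(time.Time{}) // clear the handshake deadline
	return c, nil
}

func main() {
	req, _ := socks5ConnectRequest("example.com", 443)
	fmt.Printf("% x\n", req)
	if _, err := dialViaSOCKS5("127.0.0.1:9050", "example.com", 443, 2*time.Second); err != nil {
		fmt.Println(err) // with Tor down, the flow ends here instead of going direct
	}
}
```

Because the hostname travels inside the CONNECT request, the engine performs no local DNS lookup on this path, which is the "DNS-over-Tor native" property listed for Option A.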
## Components
- **Tor daemon**: `tor.service`, SOCKS5 `9050` + control port (cookie auth). Reuse
`secubox-exposure` bootstrap; ensure egress-only config (no relay, no hidden service in
this profile).
- **toolbox API**: `POST /admin/tor/{on,off}` (per-client, kbin-gated for bulk),
`GET /tor/state` (bootstrapped? exit country? client flag?), `POST /tor/newnym`.
- **Go forging core (#662)** — upstream dialer switch: Tor-flagged client → SOCKS5 dialer
(remote DNS) instead of direct. uTLS Chrome FP + manual cert verify unchanged.
- **State store** — per-client `tor_enabled` (WG-hash scoped, TTL-bound) in the toolbox
SQLite (`clients` table extension or a small `tor_flags` table).
- **nft leak-guard** — when a client is Tor-flagged, a guard rule ensures no 80/443 path
reaches the WAN except via the Tor dialer (defense-in-depth for invariant 2/3).
- **kbin UI** — 🧅 toggle + state badge (bootstrapping / on / exit-country flag) + "new
identity" button; respects R-level (greyed if R0).
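
The per-client, TTL-bound flag from the state-store bullet can be sketched as below. This is an in-memory stand-in chosen for brevity; the spec calls for the toolbox SQLite store, and `torFlags` with its methods are hypothetical names. Times are Unix seconds passed in by the caller so the expiry logic is easy to test.

```go
package main

import (
	"fmt"
	"sync"
)

// torFlags is a hypothetical in-memory stand-in for the planned SQLite state.
type torFlags struct {
	mu     sync.Mutex
	expiry map[string]int64 // WG-hash -> expiry time, Unix seconds
}

func newTorFlags() *torFlags {
	return &torFlags{expiry: make(map[string]int64)}
}

// enable flags a client for Tor egress until now+ttl.
func (f *torFlags) enable(wgHash string, now, ttl int64) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.expiry[wgHash] = now + ttl
}

// disable clears the flag immediately.
func (f *torFlags) disable(wgHash string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.expiry, wgHash)
}

// enabled reports whether the flag is set and not expired; expired entries
// are removed on read. Unknown clients default to OFF.
func (f *torFlags) enabled(wgHash string, now int64) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	exp, ok := f.expiry[wgHash]
	if !ok {
		return false
	}
	if now >= exp {
		delete(f.expiry, wgHash)
		return false
	}
	return true
}

func main() {
	const day = 24 * 60 * 60
	flags := newTorFlags()
	flags.enable("a1b2c3d4e5f60718", 0, day)
	fmt.Println(flags.enabled("a1b2c3d4e5f60718", day-1)) // prints true
	fmt.Println(flags.enabled("a1b2c3d4e5f60718", day))   // prints false
}
```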
## UX
```
[kbin page] ── tap 🧅 ──▶ POST /admin/tor/on (this client)
Tor bootstrapped? ──no──▶ "Tor démarre…" (spinner, fail-closed until ready)
│yes
flag client tor_enabled (WG-hash, TTL 24h) + audit-log
forging core dials upstream via SOCKS5 → exit IP changes
badge: 🧅 ON · 🌍 <exit-country flag> [Nouvelle identité]
```
## Open questions (resolve next session)
- Per-flow vs per-session Tor? (start per-session/per-client; per-flow later)
- Exit-country selection (`ExitNodes {cc}`) exposed to user, or auto?
- Latency expectation messaging — Tor is slower; the perf banner must set expectations.
- Interaction with `tls_splice` (#649): splice = direct fast-path; in Tor mode, splice
must be disabled or also routed through Tor (else asset flows leak the real IP).
**Likely: Tor mode forces splice OFF for that client.**
- Interaction with Anti-Track v2 IP-drop/DNS-refuse: ordering vs Tor resolution.
## Dependencies & sequencing
1. **#662 Go core** lands the upstream dialer abstraction → enables Option A.
2. Tor egress profile in `secubox-exposure` (or a dedicated `tor-egress` unit).
3. toolbox API + state + UI.
4. nft leak-guard + DNS-over-Tor verification (leak test: compare exit IP + DNS resolver).
5. CSPN audit-log wiring + soak DARK (flag exists, UI hidden) → flip.
## Test plan (sketch)
- Leak test: with Tor mode on, `check.torproject.org` confirms Tor; DNS resolver is not the
local Unbound; real WAN IP never observed upstream.
- Fail-closed test: stop `tor.service` mid-session → traffic drops, no clearnet egress.
- Inspection test: ad-block + banner + poison still fire while on Tor.
- NEWNYM test: exit IP changes after "new identity".
---
*CyberMind — Gérald Kerma · LicenseRef-CMSD-1.0*

docs/wiki/Kbin-Toolbox.md (new file, 94 lines)

@ -0,0 +1,94 @@
# kbin: ToolBoX, the first tool of the Swiss-army cyber kit
**CyberMind · Gondwana · Notre-Dame-du-Cruet · Savoie** | [Home](Home) | [Anti-Track](Anti-Track) | [Modules](Modules)
> **kbin** (`kbin.gk2.secubox.in`) is the public portal of the SecuBox **ToolBoX**:
> the *digital phone booth*. It is the **first tool of the modular Swiss-army cyber
> kit** from [cybermind.fr](https://cybermind.fr): you connect, you browse, and the
> blade inspects, cleans and protects the traffic transparently.
---
## The concept in one sentence
> **Plug in, browse normally: kbin makes your session fast, encrypted, ad-free
> and soon anonymous.**
kbin is the public face of the [`secubox-toolbox`](../../packages/secubox-toolbox/) module.
The client joins the open AP, consents (R1 passive / R2 TLS-break), and all of its traffic
goes through the SecuBox MITM forging pipeline, with no configuration and no mandatory app.
---
## The 5 blades already sharpened
| 🗡️ Blade | What it does | Implementation |
|---------|-----------------|----------------|
| **⚡ Transparent performance** | Line-rate throughput, near-zero latency; only what gets modified is decrypted (selective SNI-splice of pure-asset flows). | `tls_splice` addon (#649), R3 workers |
| **🔒 Full encrypted** | Complete MITM inspection of outbound HTTPS: per-host cert forging, verified cert chain, Chrome fingerprint (uTLS) on the upstream side. | Go forging core (#662), uTLS HelloChrome |
| **☠️ Poison & smog injection** | Ad-tech / tracker traffic enters the inspection chamber and comes out poisoned/fogged: pseudo-responses, neutralized scripts, IP-drop + DNS-refuse. | Anti-Track v2 (#633), `privacy_guard`, ad-ghoster |
| **🚫 Anti-adware banner** | Transparency banner injected into the page ("you were tracked / X trackers blocked"), CSP-immune, SPA-aware. | banner saga (#636/#639), webext (#655) |
| **🛡️ Safe browsing** | Vortex DNS blocklists, nft blacklist (CrowdSec + threat-intel), passive anti-bot/challenge detection. | Phase 13 enforcement plane, Vortex Unbound |
---
## The next blade: 🧅 Tor quick-switch (plan #683)
This is the **missing tip**: anonymity of the exit.
Today kbin sees, cleans and protects, but traffic still exits through the box's WAN,
with the real IP. The **Tor endpoint** adds a switch:
> **One tap on kbin → 🧅 "Tor mode"** → the client's browsing exits **through the Tor network**
> instead of the WAN. Pseudo-network, anonymous exit IP, masked network identity.
Design invariants (see the
[spec](../superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md)):
- **Inspection stays intact**: Tor sits *after* the MITM forging core, on the
upstream transport (SOCKS5 dialer). Poison/smog + banner + safe browsing are kept;
only **the exit IP and the network identity** change.
- **Per-client opt-in** (WG-hash scoped), **default OFF**, honors the R consent level.
- **Fail-closed**: if Tor goes down, there is **no** clearnet fallback (anonymity is an
invariant, not best-effort).
- **No DNS leak**: resolution goes through Tor when the mode is active, not through the local Unbound.
- **CSPN**: every Tor on/off toggle is logged (immutable audit-log); no plaintext
exit.
### Use cases
1. **VILLAGE3B booth**: a visitor wants to consult a sensitive site (health, legal,
press) from the public kiosk without leaving the box's IP behind. Tap 🧅 → anonymous browsing.
2. **Pseudo-network surfing**: browse as if from another country / another network
identity, for the duration of an ephemeral 24h session.
3. **Circuit renewal**: "new identity" button (NEWNYM) to change
the exit IP on the fly.
> **Opposite** direction to `secubox-exposure`: that module publishes Tor *hidden services*
> (inbound); the kbin Tor endpoint sends client browsing out *through* Tor (outbound).
---
## Where it lives
| Item | Location |
|---------|-------------|
| Public portal | `kbin.gk2.secubox.in` → HAProxy → `toolbox_landing` → `10.99.0.1:8088` |
| Operator dashboard | `admin.gk2.secubox.in/toolbox/` |
| Personal map view | `kbin.gk2.secubox.in/social/me` |
| Module | [`packages/secubox-toolbox/`](../../packages/secubox-toolbox/) |
| Tor channel (reused) | [`packages/secubox-exposure/`](../../packages/secubox-exposure/) |
---
## See also
- [Anti-Track](Anti-Track): block/poison/anonymize engine (DNS/IP layer)
- [FAQ kbin & Tor](../FAQ-KBIN-TOR.md)
- Punk Exposure Engine: Tor channel, doctrine in `CLAUDE.md`
- Epic [#662](https://github.com/CyberMind-FR/secubox-deb/issues/662): MITM core migration (Go)
- Plan [#683](https://github.com/CyberMind-FR/secubox-deb/issues/683): kbin Tor endpoint
---
*CyberMind — Gérald Kerma · LicenseRef-CMSD-1.0*


@ -0,0 +1,147 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: ad-candidate learning-feed tests (#662)
//
// The Go cutover blocked from STATIC lists but never emitted LEARNING
// candidates, so a brand-new adware (acotedemoi.com) was never observed → never
// promoted → slipped through forever. These tests prove the engine now ports
// ad_ghost's _AD_PATH heuristic and records a candidate (host,site) for every
// 3rd-party ad-path request on the allow/mitm path — the feed autolearn promotes.
package main
import (
"path/filepath"
"testing"
)
func TestAdPathRegex(t *testing.T) {
hit := []string{
"/ad/1.gif", "/ads/x", "/adserver/req", "/pagead/conversion",
"/gampad/ads", "/doubleclick/x", "/beacon", "/pixel.gif",
"/collect", "/track", "/tracking/p", "/telemetry/v2", "/metric",
"/PAGEAD/Upper", // case-insensitive
}
for _, p := range hit {
if !adPathRE.MatchString(p) {
t.Errorf("adPathRE should MATCH %q", p)
}
}
miss := []string{"/", "/index.html", "/api/users", "/static/app.js", "/cart", "/headline"}
for _, p := range miss {
if adPathRE.MatchString(p) {
t.Errorf("adPathRE should NOT match %q", p)
}
}
}
// newAdCandTestPolicy builds a Policy with doubleclick.net allowlisted (so the
// allowlist-skip branch is exercised) and nothing learned.
func newAdCandTestPolicy(t *testing.T) *Policy {
t.Helper()
pol, err := LoadPolicy(PolicyOpts{
AllowPath: writeTemp(t, "doubleclick.net\n"),
LearnedPath: writeTemp(t, ""),
SpliceSeedPath: writeTemp(t, ""),
SpliceLearnPath: writeTemp(t, ""),
PureTrackersPath: writeTemp(t, ""),
SelfDomains: []string{"secubox.in"},
})
if err != nil {
t.Fatalf("LoadPolicy: %v", err)
}
return pol
}
func TestMaybeRecordAdCandidate(t *testing.T) {
pol := newAdCandTestPolicy(t)
cases := []struct {
name string
host string // request host
site string // referer site (registrable)
path string
want bool // candidate recorded?
wantHK string
}{
{"3rd-party ad-path → candidate", "metrics.acotedemoi.com", "lemonde.fr", "/collect", true, "metrics.acotedemoi.com"},
{"3rd-party ad-path /pagead", "ads.foo.io", "news.example", "/pagead/x", true, "ads.foo.io"},
{"1st-party (same registrable) → no candidate", "static.lemonde.fr", "lemonde.fr", "/ads/x", false, ""},
{"3rd-party non-ad-path → no candidate", "cdn.acotedemoi.com", "lemonde.fr", "/app.js", false, ""},
{"no site (no Referer) → no candidate", "metrics.acotedemoi.com", "", "/collect", false, ""},
{"allowlisted host → no candidate", "ads.doubleclick.net", "lemonde.fr", "/pagead/x", false, ""},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
cand := newAdCandidates()
px := &Proxy{pol: pol, cand: cand, analysisRelay: true}
px.maybeRecordAdCandidate(tc.host, tc.site, tc.path)
snap := cand.snapshot()
if tc.want {
if len(snap) != 1 {
t.Fatalf("want 1 candidate, got %d (%+v)", len(snap), snap)
}
if snap[0].Host != tc.wantHK {
t.Fatalf("candidate host = %q, want %q", snap[0].Host, tc.wantHK)
}
if snap[0].Site != tc.site {
t.Fatalf("candidate site = %q, want %q", snap[0].Site, tc.site)
}
if snap[0].Hits != 1 {
t.Fatalf("candidate hits = %d, want 1", snap[0].Hits)
}
} else if len(snap) != 0 {
t.Fatalf("want 0 candidates, got %d (%+v)", len(snap), snap)
}
})
}
}
// TestAdCandidateGatedByRelay proves the feed is gated behind the analysis/ad
// relay flag: with the gate off, nothing is recorded even on a textbook hit.
func TestAdCandidateGatedByRelay(t *testing.T) {
pol := newAdCandTestPolicy(t)
cand := newAdCandidates()
px := &Proxy{pol: pol, cand: cand, analysisRelay: false}
px.maybeRecordAdCandidate("metrics.acotedemoi.com", "lemonde.fr", "/collect")
if n := len(cand.snapshot()); n != 0 {
t.Fatalf("relay off: want 0 candidates, got %d", n)
}
}
// TestAdCandidateHitsAccumulate proves repeated (host,site) hits coalesce.
func TestAdCandidateHitsAccumulate(t *testing.T) {
cand := newAdCandidates()
for i := 0; i < 5; i++ {
cand.record("x.tracker.io", "site.example")
}
snap := cand.snapshot()
if len(snap) != 1 || snap[0].Hits != 5 {
t.Fatalf("want 1 row hits=5, got %+v", snap)
}
// snapshot clears.
if n := len(cand.snapshot()); n != 0 {
t.Fatalf("snapshot should clear: got %d", n)
}
}
// TestAdCandidatePayloadShape proves the candidates list serialises into the
// extended ad-event payload (host/site/hits keys).
func TestAdCandidatePayloadShape(t *testing.T) {
cand := newAdCandidates()
cand.record("x.tracker.io", "site.example")
rows := cand.snapshot()
p := adEventPayload{Candidates: rows}
if p.empty() {
t.Fatal("payload with candidates must not be empty()")
}
}
// writeTemp writes content to a fresh temp file and returns its path.
func writeTemp(t *testing.T, content string) string {
t.Helper()
f := filepath.Join(t.TempDir(), "list.txt")
writeFile(t, f, content)
return f
}
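
The cases in `TestMaybeRecordAdCandidate` boil down to two conditions: the request is 3rd-party relative to the Referer site, and its path matches the ad-path regex. The sketch below reproduces only that core decision. It leaves out the allowlist and relay gate, and `naiveRegistrable` is a hypothetical stand-in (last two labels) for the engine's real registrable-domain logic, which this diff does not show.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same pattern as the adPathRE in the diff (RE2 syntax, case-insensitive).
var adPathRE = regexp.MustCompile(`(?i)/ads?/|/adserver|/pagead|/gampad|/doubleclick|/beacon|/pixel|/collect|/track(ing)?|/telemetry|/metric`)

// naiveRegistrable keeps the last two DNS labels. ASSUMPTION: a real
// implementation would consult the public suffix list (e.g. for "co.uk").
func naiveRegistrable(host string) string {
	parts := strings.Split(strings.ToLower(strings.TrimSuffix(host, ".")), ".")
	if len(parts) <= 2 {
		return strings.Join(parts, ".")
	}
	return strings.Join(parts[len(parts)-2:], ".")
}

// isAdCandidate reports whether (host, site, path) should feed the learner:
// a known site, a different registrable domain, and an ad-looking path.
func isAdCandidate(host, site, path string) bool {
	if host == "" || site == "" {
		return false // no Referer site: 1st/3rd-party cannot be decided
	}
	if naiveRegistrable(host) == naiveRegistrable(site) {
		return false // 1st-party request
	}
	return adPathRE.MatchString(path)
}

func main() {
	fmt.Println(isAdCandidate("metrics.acotedemoi.com", "lemonde.fr", "/collect"))
	fmt.Println(isAdCandidate("static.lemonde.fr", "lemonde.fr", "/ads/x"))
	fmt.Println(isAdCandidate("cdn.acotedemoi.com", "lemonde.fr", "/app.js"))
}
```

Because the result only feeds a learning queue and never blocks a request, a loose regex is tolerable here: false positives still have to be seen on several distinct sites before promotion.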

@ -26,10 +26,74 @@ import (
"log"
"net/http"
"net/url"
"regexp"
"sync"
"time"
)
// ── ad-candidate learning feed (#662 auto-learn loop) ─────────────────────────
//
// The STATIC block list never grows on its own; ad_ghost fed autolearn by
// capturing CANDIDATES — 3rd-party requests whose PATH smells like an ad/track
// endpoint — into ad_candidates, which secubox-toolbox-autolearn later promotes
// into learned-trackers.txt at AD_MIN_SITES distinct sites. The Go cutover
// dropped this feed, so new adware hosts (acotedemoi.com) were never observed. This
// restores it in the engine: the allow/mitm hot path records (host,site) when
// the request is 3rd-party AND adPathRE matches, buffered + flushed with the
// existing ad-event machinery.
// adPathRE ports ad_ghost._AD_PATH (RE2-safe, case-insensitive). Matches a path
// that looks like an ad/track endpoint. Learning only — never a block decision.
//
// Python: re.compile(r"/ads?/|/adserver|/pagead|/gampad|/doubleclick|/beacon|"
// r"/pixel|/collect|/track(ing)?|/telemetry|/metric", re.I)
var adPathRE = regexp.MustCompile(`(?i)/ads?/|/adserver|/pagead|/gampad|/doubleclick|/beacon|/pixel|/collect|/track(ing)?|/telemetry|/metric`)
// adCandMapCap bounds the candidate buffer (mirrors ad_ghost's `len(_cand) <
// 20000` guard): NEW keys past the cap are dropped until the next flush clears
// it, so a dead portal can never grow memory unbounded.
const adCandMapCap = 20000
// adCandidates is the lock-guarded (host,site)→hits candidate aggregator,
// drained by the ad-stats flusher into the ad-event payload's "candidates" list.
type adCandidates struct {
mu sync.Mutex
hit map[adKey]int64
}
func newAdCandidates() *adCandidates { return &adCandidates{hit: map[adKey]int64{}} }
// record tallies one ad-candidate (host,site). O(1); the cap drops only NEW keys
// (existing keys keep accumulating). Empty host is ignored.
func (a *adCandidates) record(host, site string) {
if host == "" {
return
}
a.mu.Lock()
defer a.mu.Unlock()
k := adKey{adHost: host, site: site}
if _, ok := a.hit[k]; ok {
a.hit[k]++
} else if len(a.hit) < adCandMapCap {
a.hit[k] = 1
}
}
// snapshot atomically reads-and-clears the buffer, returning the candidate rows.
func (a *adCandidates) snapshot() []adCandidateRow {
a.mu.Lock()
defer a.mu.Unlock()
if len(a.hit) == 0 {
return nil
}
rows := make([]adCandidateRow, 0, len(a.hit))
for k, n := range a.hit {
rows = append(rows, adCandidateRow{Host: k.adHost, Site: k.site, Hits: n})
}
a.hit = map[adKey]int64{}
return rows
}
// refererSite ports the ad_ghost _site_of logic: parse the Referer header as a
// URL, take its hostname, and return registrable(hostname). Empty Referer or a
// parse failure → "" (the page that issued the blocked request is unknown).
@ -133,9 +197,19 @@ type adClientRow struct {
Bytes int64 `json:"bytes"`
}
// adCandidateRow is one learning candidate (host seen issuing ad-path requests
// from a 1st-party site). Mirrors the portal /__toolbox/ad-event "candidates"
// contract → store.record_ad_candidates([(host, site, hits), ...]).
type adCandidateRow struct {
Host string `json:"host"`
Site string `json:"site"`
Hits int64 `json:"hits"`
}
type adEventPayload struct {
Blocks []adBlockRow `json:"blocks"`
Clients []adClientRow `json:"clients"`
Blocks []adBlockRow `json:"blocks"`
Clients []adClientRow `json:"clients"`
Candidates []adCandidateRow `json:"candidates,omitempty"`
}
// snapshot atomically reads-and-clears both maps, returning the accumulated rows.
@ -159,7 +233,9 @@ func (a *adStats) snapshot() adEventPayload {
}
// empty reports whether a payload carries no rows (nothing to POST).
func (p adEventPayload) empty() bool { return len(p.Blocks) == 0 && len(p.Clients) == 0 }
func (p adEventPayload) empty() bool {
return len(p.Blocks) == 0 && len(p.Clients) == 0 && len(p.Candidates) == 0
}
// adEventClient is a short-timeout fire-and-forget client for the ad-event POST.
// Sibling of portalClient (banner.go): the portal is a fixed loopback base, so
@ -175,8 +251,15 @@ var adEventClient = &http.Client{
// non-2xx) is swallowed with at most a debug log — the metrics are stats, not
// security, and the engine must never block on the portal. Exposed (returns the
// flushed payload) so the test can assert the snapshot/clear + payload shape.
func (a *adStats) flushOnce(portal string) adEventPayload {
//
// cand may be nil (the CONNECT PoC / tests with no learning feed); when set its
// candidate rows are drained into the SAME payload so the learning feed rides
// the existing ad-event channel (one POST per 10s, not two).
func (a *adStats) flushOnce(portal string, cand *adCandidates) adEventPayload {
p := a.snapshot()
if cand != nil {
p.Candidates = cand.snapshot()
}
if p.empty() {
return p
}
@ -198,10 +281,10 @@ func (a *adStats) flushOnce(portal string) adEventPayload {
// runAdStatsFlusher is the background flusher goroutine: every adFlushInterval it
// drains the aggregator to the portal. Start it once from main() (like the
// engine's other startup goroutines). It runs forever (the process lifetime).
func (a *adStats) runAdStatsFlusher(portal string) {
func (a *adStats) runAdStatsFlusher(portal string, cand *adCandidates) {
t := time.NewTicker(adFlushInterval)
defer t.Stop()
for range t.C {
a.flushOnce(portal)
a.flushOnce(portal, cand)
}
}
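
The candidate buffer combines two ideas described in the comments above: a cap that rejects only new keys, and a snapshot that swaps the map out under the lock. Here is a standalone sketch of that pattern with hypothetical names (`counter`, `pair`, `drain`) and a configurable limit, so the cap can be exercised with small numbers.

```go
package main

import (
	"fmt"
	"sync"
)

// pair is the aggregation key (hypothetical name).
type pair struct{ host, site string }

// counter tallies hits per (host, site), bounded to `limit` distinct keys.
type counter struct {
	mu    sync.Mutex
	hits  map[pair]int64
	limit int
}

func newCounter(limit int) *counter {
	return &counter{hits: map[pair]int64{}, limit: limit}
}

// record increments an existing key unconditionally; a NEW key is admitted
// only while the map is under the limit, so memory stays bounded.
func (c *counter) record(host, site string) {
	if host == "" {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	k := pair{host, site}
	if _, ok := c.hits[k]; ok {
		c.hits[k]++
	} else if len(c.hits) < c.limit {
		c.hits[k] = 1
	}
}

// drain returns the current tallies and installs a fresh map, all under the
// lock, so no hit is lost or counted twice across flushes.
func (c *counter) drain() map[pair]int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := c.hits
	c.hits = map[pair]int64{}
	return out
}

func main() {
	c := newCounter(2)
	for _, h := range []string{"a.io", "a.io", "b.io", "c.io", "a.io"} {
		c.record(h, "site.example")
	}
	fmt.Println(len(c.drain()), len(c.drain()))
}
```

Swapping the map instead of deleting keys one by one keeps the critical section short, which matters when `record` sits on a request hot path.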

@ -41,7 +41,7 @@ func TestRecordAdBlockEmptyHostIgnored(t *testing.T) {
func TestRecordAdBlockPerClientOnlyWhenMacSet(t *testing.T) {
a := newAdStats()
a.recordAdBlock("ads.example.com", "site", "") // no mac → no client row
a.recordAdBlock("ads.example.com", "site", "") // no mac → no client row
a.recordAdBlock("ads.example.com", "site", "mac1") // mac → client row
a.recordAdBlock("ads.example.com", "site", "mac1")
@ -111,7 +111,7 @@ func TestFlushOncePayloadShapeMatchesContract(t *testing.T) {
}))
defer srv.Close()
a.flushOnce(srv.URL)
a.flushOnce(srv.URL, nil)
if ct != "application/json" {
t.Fatalf("Content-Type = %q, want application/json", ct)
@ -145,7 +145,7 @@ func TestFlushOnceEmptySkipsPost(t *testing.T) {
w.WriteHeader(http.StatusNoContent)
}))
defer srv.Close()
a.flushOnce(srv.URL)
a.flushOnce(srv.URL, nil)
if posted {
t.Fatalf("flushOnce on empty aggregator must not POST")
}
@ -156,7 +156,7 @@ func TestFlushOnceSwallowsPortalError(t *testing.T) {
a.recordAdBlock("ads.example.com", "site", "")
// Unreachable portal → must not panic, must still clear the maps (snapshot
// happens before the POST).
a.flushOnce("http://127.0.0.1:1")
a.flushOnce("http://127.0.0.1:1", nil)
if len(a.blocks) != 0 {
t.Fatalf("flushOnce must clear maps even on POST failure")
}

@ -24,6 +24,7 @@ import (
"io"
"log"
"net/http"
"net/url"
"strings"
"time"
)
@ -111,6 +112,94 @@ func injectLoader(body []byte, clientHash string, wg, cspBypassed bool) []byte {
return body
}
// ── inline banner (#662, supersedes injectLoader in the live path) ──────────
//
// Sites with a SERVICE WORKER (leparisien, cnn…) intercept EVERY same-origin
// request, so the legacy <script src="/__toolbox/loader.js"> tag and the
// fetch("/__toolbox/bundle") it makes are hijacked by the page's SW (404 /
// app-shell) BEFORE they reach this engine → the banner never appears. The fix
// is to INLINE the whole banner: the engine fetches the COMPLETE script body
// from the portal server-side (once per injected HTML response) and bakes it
// into a self-contained <script>…</script> with mh/wg/csp + the bundle as JS
// literals — so there is NOTHING same-origin for the SW to hijack.
//
// injectLoader + the /__toolbox/loader.js short-circuit are KEPT (not removed)
// for compatibility, but the live inject path now uses the inline banner.
// fetchInlineBanner fetches the COMPLETE inline banner script BODY from the
// portal's /__toolbox/inline endpoint (which bakes mh/wg/csp + the bundle as JS
// literals). Returns (body, true) on a 2xx; FAIL-OPEN (returns "", false) on any
// error — portal down, timeout, non-2xx, read failure — so the caller simply
// skips the inject and serves the page intact (no banner, like today's fail-open
// when the portal asset 204s). It NEVER breaks a navigation over a banner.
//
// wg → "1" else "0"; cspBypassed → csp=1 (the 🔓 proof) else 0; clientHash is
// ascii-sanitised exactly like the data-mh attribute was.
func fetchInlineBanner(portal, clientHash string, wg, cspBypassed bool) (string, bool) {
wgVal := "0"
if wg {
wgVal = "1"
}
cspVal := "0"
if cspBypassed {
cspVal = "1"
}
q := url.Values{}
q.Set("mh", asciiOnly(clientHash))
q.Set("wg", wgVal)
q.Set("csp", cspVal)
target := strings.TrimRight(portal, "/") + "/__toolbox/inline?" + q.Encode()
resp, err := portalClient.Get(target)
if err != nil {
log.Printf("inline banner fetch failed for %s: %v", target, err)
return "", false
}
defer resp.Body.Close()
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
log.Printf("inline banner fetch non-2xx (%d) for %s", resp.StatusCode, target)
return "", false
}
body, rerr := io.ReadAll(io.LimitReader(resp.Body, 8<<20))
if rerr != nil {
log.Printf("inline banner read failed for %s: %v", target, rerr)
return "", false
}
return string(body), true
}
// injectInlineBanner inserts a SELF-CONTAINED <script>scriptBody</script> into an
// HTML body once. It is idempotent via the SAME bannerGuard marker injectLoader
// uses (so a body already carrying either form is never double-injected), and it
// uses the SAME placement injectLoader did:
// - guard idempotency: body already contains bannerGuard → unchanged.
// - after the first (case-insensitive) "<head"'s closing '>'.
// - else right BEFORE the first "<body".
// - else return the body unchanged (no inject).
//
// scriptBody is the COMPLETE inline IIFE from fetchInlineBanner (NOT a src tag);
// an empty scriptBody is a no-op (returns the body unchanged) so a failed/skipped
// fetch is handled gracefully by the caller passing "".
func injectInlineBanner(body []byte, scriptBody string) []byte {
if scriptBody == "" {
return body
}
if bytes.Contains(body, []byte(bannerGuard)) {
return body
}
script := []byte("<!-- " + bannerGuard + " --><script>" + scriptBody + "</script>")
low := bytes.ToLower(body)
if h := bytes.Index(low, []byte("<head")); h >= 0 {
if j := bytes.IndexByte(body[h:], '>'); j >= 0 {
return spliceAt(body, script, h+j+1)
}
}
if b := bytes.Index(low, []byte("<body")); b >= 0 {
return spliceAt(body, script, b)
}
return body
}
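
As a worked illustration of the placement rules listed for `injectInlineBanner`, the following standalone sketch restates them in simplified form and can be run on sample inputs. `placeInline`, `asciiLower`, and the `demoGuard` marker are hypothetical; the real `bannerGuard` value and `spliceAt` helper are not shown in this diff. The sketch lowercases ASCII only, so that offsets found in the lowered copy line up with the original bytes.

```go
package main

import (
	"bytes"
	"fmt"
)

const demoGuard = "sbx-demo-guard" // hypothetical marker, not the real bannerGuard

// asciiLower lowercases A-Z byte-for-byte, preserving length and offsets.
func asciiLower(b []byte) []byte {
	out := make([]byte, len(b))
	for i, c := range b {
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A'
		}
		out[i] = c
	}
	return out
}

// placeInline inserts <script>script</script> once: after the first <head ...>
// tag, else before the first <body, else not at all.
func placeInline(body []byte, script string) []byte {
	if script == "" || bytes.Contains(body, []byte(demoGuard)) {
		return body // fail-open no-op, or already injected
	}
	tag := []byte("<!-- " + demoGuard + " --><script>" + script + "</script>")
	low := asciiLower(body)
	at := -1
	if h := bytes.Index(low, []byte("<head")); h >= 0 {
		if j := bytes.IndexByte(body[h:], '>'); j >= 0 {
			at = h + j + 1
		}
	}
	if at < 0 {
		at = bytes.Index(low, []byte("<body"))
	}
	if at < 0 {
		return body
	}
	out := make([]byte, 0, len(body)+len(tag))
	out = append(out, body[:at]...)
	out = append(out, tag...)
	return append(out, body[at:]...)
}

func main() {
	fmt.Println(string(placeInline([]byte(`<html><HEAD></HEAD><body>hi</body></html>`), "S")))
	fmt.Println(string(placeInline([]byte(`<p>fragment</p>`), "S")))
}
```

Passing an empty script when the portal fetch fails is what makes the whole chain fail-open: the page is served untouched and no error reaches the client.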
// ── /__toolbox/* reverse-proxy to the portal ─────────────────────────────────
// isToolboxAssetPath reports whether a request path is one of the banner assets

@ -10,10 +10,19 @@
package main
import (
"net/http"
"net/http/httptest"
"strings"
"testing"
)
// inlineTestScript is a stand-in for the COMPLETE inline banner body that
// fetchInlineBanner pulls from the portal. The Go engine treats it as an opaque
// string (the JS literal-baking is the portal's job, covered by the Python
// tests); these tests only assert placement / idempotency / fail-open. Shared
// across banner_test, gzip_test, compress_test, cosmetic_test.
const inlineTestScript = `(function(){window.__SBX_LOADER__=1;})();`
func TestInjectLoaderGuardIdempotent(t *testing.T) {
// Body already carrying the guard → returned byte-for-byte unchanged.
body := []byte("<html><head><!-- " + bannerGuard + " --><script></script></head><body>hi</body></html>")
@ -130,3 +139,141 @@ func TestPortalTargetURL(t *testing.T) {
}
}
}
// ── #662 inline banner (SW-immune; supersedes injectLoader in the live path) ──
func TestInjectInlineBannerEmptyScriptNoop(t *testing.T) {
// scriptBody == "" (fetch failed/skipped) → no inject, body unchanged.
body := []byte(`<html><head></head><body>hi</body></html>`)
out := injectInlineBanner(body, "")
if string(out) != string(body) {
t.Fatalf("empty scriptBody must be a no-op.\n got: %s", out)
}
}
func TestInjectInlineBannerGuardIdempotent(t *testing.T) {
// Body already carrying the guard → returned byte-for-byte unchanged.
body := []byte("<html><head><!-- " + bannerGuard + " --><script></script></head><body>hi</body></html>")
out := injectInlineBanner(body, inlineTestScript)
if string(out) != string(body) {
t.Fatalf("guarded body must be unchanged.\n got: %s", out)
}
}
func TestInjectInlineBannerHeadInsertion(t *testing.T) {
body := []byte(`<html><head lang="en"><title>x</title></head><body>hi</body></html>`)
out := string(injectInlineBanner(body, inlineTestScript))
headOpen := `<head lang="en">`
idx := strings.Index(out, headOpen)
if idx < 0 {
t.Fatalf("head open lost: %s", out)
}
after := out[idx+len(headOpen):]
// An INLINE <script> (not <script src), carrying the body verbatim, right
// after the <head>'s '>'.
wantTag := `<!-- ` + bannerGuard + ` --><script>` + inlineTestScript + `</script>`
if !strings.HasPrefix(after, wantTag) {
t.Fatalf("inline tag not inserted right after <head>'s '>'.\n got: %s", after)
}
if strings.Contains(out, "<script src=") {
t.Fatalf("inline banner must NOT be a <script src> tag: %s", out)
}
if !strings.Contains(out, wantTag+`<title>x</title>`) {
t.Fatalf("original head content displaced: %s", out)
}
}
func TestInjectInlineBannerBodyFallback(t *testing.T) {
body := []byte(`<html><body class="x">hi</body></html>`)
out := string(injectInlineBanner(body, inlineTestScript))
wantTag := `<!-- ` + bannerGuard + ` --><script>` + inlineTestScript + `</script>`
if !strings.Contains(out, wantTag+`<body class="x">`) {
t.Fatalf("inline tag not inserted right before <body>.\n got: %s", out)
}
}
func TestInjectInlineBannerNeitherHeadNorBody(t *testing.T) {
body := []byte(`<p>just a fragment</p>`)
out := injectInlineBanner(body, inlineTestScript)
if string(out) != string(body) {
t.Fatalf("no head/body → must be unchanged.\n got: %s", out)
}
}
func TestInjectInlineBannerCaseInsensitiveHead(t *testing.T) {
body := []byte(`<HTML><HEAD></HEAD><BODY>hi</BODY></HTML>`)
out := string(injectInlineBanner(body, inlineTestScript))
if !strings.Contains(out, `<HEAD><!-- `+bannerGuard) {
t.Fatalf("case-insensitive <HEAD> match failed: %s", out)
}
}
func TestFetchInlineBannerOK(t *testing.T) {
// Portal returns a body + 200 → fetchInlineBanner returns (body, true) and
// echoes mh/wg/csp into the query.
var gotQuery string
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
gotQuery = r.URL.RawQuery
w.Header().Set("Content-Type", "application/javascript")
_, _ = w.Write([]byte(inlineTestScript))
}))
defer srv.Close()
body, ok := fetchInlineBanner(srv.URL, "deadbeef", true, true)
if !ok {
t.Fatal("fetchInlineBanner must report ok=true on a 200")
}
if body != inlineTestScript {
t.Fatalf("fetchInlineBanner body mismatch: %q", body)
}
for _, want := range []string{"mh=deadbeef", "wg=1", "csp=1"} {
if !strings.Contains(gotQuery, want) {
t.Fatalf("query %q missing %q", gotQuery, want)
}
}
}
func TestFetchInlineBannerWGCSPZero(t *testing.T) {
var gotQuery string
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
gotQuery = r.URL.RawQuery
_, _ = w.Write([]byte(inlineTestScript))
}))
defer srv.Close()
if _, ok := fetchInlineBanner(srv.URL, "x", false, false); !ok {
t.Fatal("ok=true expected")
}
for _, want := range []string{"wg=0", "csp=0"} {
if !strings.Contains(gotQuery, want) {
t.Fatalf("query %q missing %q", gotQuery, want)
}
}
}
func TestFetchInlineBannerFailOpenDeadPortal(t *testing.T) {
// A dead portal (closed listener) → fail-open: ("", false) → caller skips the
// inject and serves the page intact. No panic, no error surfaced.
srv := httptest.NewServer(http.HandlerFunc(func(http.ResponseWriter, *http.Request) {}))
url := srv.URL
srv.Close() // close BEFORE the fetch → dial error
body, ok := fetchInlineBanner(url, "x", false, false)
if ok {
t.Fatal("dead portal must fail open (ok=false)")
}
if body != "" {
t.Fatalf("fail-open body must be empty, got %q", body)
}
}
func TestFetchInlineBannerNon2xxFailOpen(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusInternalServerError)
_, _ = w.Write([]byte("boom"))
}))
defer srv.Close()
body, ok := fetchInlineBanner(srv.URL, "x", false, false)
if ok || body != "" {
t.Fatalf("non-2xx must fail open: ok=%v body=%q", ok, body)
}
}

@ -90,7 +90,7 @@ func TestInjectIntoBodyBrotli(t *testing.T) {
if err != nil {
t.Fatal(err)
}
out, ok := injectIntoBody(enc, "br", "abc123", true, false)
out, ok := injectIntoBody(enc, "br", inlineTestScript, true)
if !ok {
t.Fatal("br inject must report ok=true")
}
@ -113,7 +113,7 @@ func TestInjectIntoBodyZstd(t *testing.T) {
if err != nil {
t.Fatal(err)
}
out, ok := injectIntoBody(enc, "zstd", "abc123", true, false)
out, ok := injectIntoBody(enc, "zstd", inlineTestScript, true)
if !ok {
t.Fatal("zstd inject must report ok=true")
}
@ -132,7 +132,7 @@ func TestInjectIntoBodyZstd(t *testing.T) {
func TestInjectIntoBodyBrotliCaseInsensitive(t *testing.T) {
enc, _ := brotliBytes([]byte(`<head></head>`))
out, ok := injectIntoBody(enc, "BR", "z", false, false)
out, ok := injectIntoBody(enc, "BR", inlineTestScript, false)
if !ok {
t.Fatal("Content-Encoding BR (upper) must be recognised → ok=true")
}
@ -147,7 +147,7 @@ func TestInjectIntoBodyBrotliCaseInsensitive(t *testing.T) {
func TestInjectIntoBodyBrotliFailOpen(t *testing.T) {
bad := []byte("not brotli at all <head></head>")
out, ok := injectIntoBody(bad, "br", "x", false, false)
out, ok := injectIntoBody(bad, "br", inlineTestScript, false)
if ok {
t.Fatal("corrupt br body must fail open (ok=false)")
}
@ -158,7 +158,7 @@ func TestInjectIntoBodyBrotliFailOpen(t *testing.T) {
func TestInjectIntoBodyZstdFailOpen(t *testing.T) {
bad := []byte("not zstd at all <head></head>")
out, ok := injectIntoBody(bad, "zstd", "x", false, false)
out, ok := injectIntoBody(bad, "zstd", inlineTestScript, false)
if ok {
t.Fatal("corrupt zstd body must fail open (ok=false)")
}
@ -177,7 +177,7 @@ func TestBrotliZstdBombGuard(t *testing.T) {
t.Fatal("unbrotliBytes must reject output exceeding gunzipCap")
}
// fail-open through the inject path.
if out, ok := injectIntoBody(brBomb, "br", "x", false, false); ok || !bytes.Equal(out, brBomb) {
if out, ok := injectIntoBody(brBomb, "br", inlineTestScript, false); ok || !bytes.Equal(out, brBomb) {
t.Fatal("over-cap br body must fail open with original bytes")
}
@ -188,7 +188,7 @@ func TestBrotliZstdBombGuard(t *testing.T) {
if _, err := unzstdBytes(zsBomb); err == nil {
t.Fatal("unzstdBytes must reject output exceeding gunzipCap")
}
if out, ok := injectIntoBody(zsBomb, "zstd", "x", false, false); ok || !bytes.Equal(out, zsBomb) {
if out, ok := injectIntoBody(zsBomb, "zstd", inlineTestScript, false); ok || !bytes.Equal(out, zsBomb) {
t.Fatal("over-cap zstd body must fail open with original bytes")
}
}

@ -132,27 +132,32 @@ func TestInjectCosmeticCaseInsensitive(t *testing.T) {
}
}
func TestInjectLoaderAndCosmeticCompose(t *testing.T) {
func TestInjectInlineBannerAndCosmeticCompose(t *testing.T) {
// Both markers must be present after composing the two injects (wg client).
// #662 — the banner is now the INLINE script (not a <script src> tag).
body := []byte(`<html><head></head><body>hi</body></html>`)
out := string(injectHTML(body, "deadbeef", true, false))
out := string(injectHTML(body, inlineTestScript, true))
if !strings.Contains(out, bannerGuard) {
t.Fatalf("loader marker missing after compose: %s", out)
t.Fatalf("banner marker missing after compose: %s", out)
}
if !strings.Contains(out, cosmeticGuard) {
t.Fatalf("cosmetic marker missing after compose: %s", out)
}
if !strings.Contains(out, `data-mh="deadbeef"`) {
t.Fatalf("loader data-mh missing after compose: %s", out)
// The inline banner is an inline <script> carrying the baked body, NOT a src.
if !strings.Contains(out, "<script>"+inlineTestScript+"</script>") {
t.Fatalf("inline banner body missing after compose: %s", out)
}
if strings.Contains(out, "<script src=") {
t.Fatalf("inline path must NOT emit a <script src> tag: %s", out)
}
}
func TestInjectHTMLNonWGSkipsCosmetic(t *testing.T) {
// Non-WG (non-R3) clients get the loader but NOT the cosmetic style.
// Non-WG (non-R3) clients get the banner but NOT the cosmetic style.
body := []byte(`<html><head></head><body>hi</body></html>`)
out := string(injectHTML(body, "x", false, false))
out := string(injectHTML(body, inlineTestScript, false))
if !strings.Contains(out, bannerGuard) {
t.Fatalf("loader marker missing for non-wg: %s", out)
t.Fatalf("banner marker missing for non-wg: %s", out)
}
if strings.Contains(out, cosmeticGuard) {
t.Fatalf("cosmetic style must NOT be injected for non-wg client: %s", out)
@ -163,7 +168,7 @@ func TestInjectIntoBodyGzipCarriesCosmetic(t *testing.T) {
// The gzip decompress→inject→recompress path must carry BOTH injects for wg.
body := []byte(`<html><head></head><body>hi</body></html>`)
gz := gzipBytes(body)
out, ok := injectIntoBody(gz, "gzip", "mh1", true, false)
out, ok := injectIntoBody(gz, "gzip", inlineTestScript, true)
if !ok {
t.Fatalf("injectIntoBody(gzip) returned ok=false")
}
@ -174,4 +179,8 @@ func TestInjectIntoBodyGzipCarriesCosmetic(t *testing.T) {
if !strings.Contains(string(plain), bannerGuard) || !strings.Contains(string(plain), cosmeticGuard) {
t.Fatalf("gzip path lost a marker: %s", plain)
}
// The inline banner script body survives the gzip round-trip.
if !strings.Contains(string(plain), "<script>"+inlineTestScript+"</script>") {
t.Fatalf("inline banner body lost on gzip path: %s", plain)
}
}

@ -146,31 +146,39 @@ func zstdBytes(in []byte) ([]byte, error) {
}
// injectHTML applies BOTH HTML transforms in one pass over the DECOMPRESSED
// body: the transparency-banner loader (always) AND, for R3 (wg) clients, the
// ad/popup-hiding cosmetic <style> (#662 — the cutover left this unported). Both
// are idempotent (own guard markers) and order-independent; running them in the
// same decompressed step means the cosmetic style benefits from the gzip
// handling exactly like the loader. The cosmetic style is gated to wg because it
// is an R3-tunnel opt-in behaviour (mirrors the Python addon's _is_r3plus gate).
func injectHTML(plain []byte, clientHash string, wg, cspBypassed bool) []byte {
out := injectLoader(plain, clientHash, wg, cspBypassed)
// body: the transparency-banner (always, via the INLINE script) AND, for R3 (wg)
// clients, the ad/popup-hiding cosmetic <style> (#662 — the cutover left this
// unported). Both are idempotent (own guard markers) and order-independent;
// running them in the same decompressed step means the cosmetic style benefits
// from the gzip handling exactly like the banner. The cosmetic style is gated to
// wg because it is an R3-tunnel opt-in behaviour (mirrors the Python addon's
// _is_r3plus gate).
//
// #662 — scriptBody is the COMPLETE inline banner IIFE pre-fetched server-side
// from the portal (fetchInlineBanner). We INLINE it (injectInlineBanner) instead
// of a <script src="/__toolbox/loader.js"> tag so a site's SERVICE WORKER has no
// same-origin request to hijack. An empty scriptBody (fetch failed/skipped) makes
// the banner inject a no-op — fail-open, page intact. The cosmetic <style> is
// already inline and SW-immune, so it is UNCHANGED.
func injectHTML(plain []byte, scriptBody string, wg bool) []byte {
out := injectInlineBanner(plain, scriptBody)
if wg {
out = injectCosmetic(out)
}
return out
}
// injectIntoBody runs the HTML injection (loader + R3 cosmetic style) over a
// (possibly gzip-compressed) HTML body, returning the new body bytes to serve
// and whether the body was rewritten. cspBypassed (#662) is threaded into the
// loader tag as data-csp="1" when a real CSP was relaxed on this page.
// injectIntoBody runs the HTML injection (inline banner + R3 cosmetic style) over
// a (possibly compressed) HTML body, returning the new body bytes to serve and
// whether the body was rewritten. scriptBody (#662) is the COMPLETE inline banner
// IIFE pre-fetched from the portal; "" → the banner inject is skipped (fail-open).
//
// - encoding == "" (identity): injectHTML runs directly on body; the result
// is returned (ok=true). The caller MUST update Content-Length to len(out).
// - encoding ∈ {gzip, br, zstd} (case-insensitive): the body is decoded,
// injected, then RE-ENCODED in the SAME codec so the client transfer stays
// compressed (the tunnel is perf-sensitive) and Content-Encoding is
// UNCHANGED. The caller sets Content-Length to len(out). BOTH the loader and
// UNCHANGED. The caller sets Content-Length to len(out). BOTH the banner and
// the cosmetic style are injected on the decompressed body, so the cosmetic
// CSS lands on compressed pages too (the common case).
// - any other encoding (deflate, multi-value, …): pass through untouched,
@ -181,23 +189,23 @@ func injectHTML(plain []byte, clientHash string, wg, cspBypassed bool) []byte {
// never broken or corrupted.
//
// The 32MiB decompression-bomb cap (gunzipCap) is enforced uniformly across
// gzip/br/zstd. idempotency / placement live inside injectLoader/injectCosmetic.
func injectIntoBody(body []byte, encoding, clientHash string, wg, cspBypassed bool) (out []byte, ok bool) {
// gzip/br/zstd. idempotency / placement live inside injectInlineBanner/injectCosmetic.
func injectIntoBody(body []byte, encoding, scriptBody string, wg bool) (out []byte, ok bool) {
switch strings.ToLower(strings.TrimSpace(encoding)) {
case "":
return injectHTML(body, clientHash, wg, cspBypassed), true
return injectHTML(body, scriptBody, wg), true
case "gzip":
plain, err := gunzipBytes(body)
if err != nil {
return body, false // fail open: serve the original compressed bytes
}
return gzipBytes(injectHTML(plain, clientHash, wg, cspBypassed)), true
return gzipBytes(injectHTML(plain, scriptBody, wg)), true
case "br":
plain, err := unbrotliBytes(body)
if err != nil {
return body, false // fail open
}
reenc, err := brotliBytes(injectHTML(plain, clientHash, wg, cspBypassed))
reenc, err := brotliBytes(injectHTML(plain, scriptBody, wg))
if err != nil {
return body, false // fail open: never serve a truncated br frame
}
@ -207,7 +215,7 @@ func injectIntoBody(body []byte, encoding, clientHash string, wg, cspBypassed bo
if err != nil {
return body, false // fail open
}
reenc, err := zstdBytes(injectHTML(plain, clientHash, wg, cspBypassed))
reenc, err := zstdBytes(injectHTML(plain, scriptBody, wg))
if err != nil {
return body, false // fail open: never serve a truncated zstd frame
}
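
The decode, inject, re-encode contract described for `injectIntoBody` can be shown end to end with the standard library's gzip alone. This is a sketch under stated assumptions: `rewriteGzip`, `gunzipCapped`, and the 1 MiB `maxPlain` demo cap are hypothetical (the source describes a 32 MiB `gunzipCap`), and the brotli/zstd branches are omitted because they need third-party packages.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

const maxPlain = 1 << 20 // demo decompression cap (assumption)

// gunzipCapped decodes gzip but refuses output larger than maxPlain.
func gunzipCapped(in []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(in))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	// Read one byte past the cap so "exactly at cap" and "over cap" differ.
	out, err := io.ReadAll(io.LimitReader(zr, maxPlain+1))
	if err != nil {
		return nil, err
	}
	if len(out) > maxPlain {
		return nil, errors.New("decompressed body exceeds cap")
	}
	return out, nil
}

func gzipBytes(in []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	_, _ = zw.Write(in) // writes to a bytes.Buffer do not fail
	_ = zw.Close()
	return buf.Bytes()
}

// rewriteGzip applies transform to the decoded body and re-encodes it.
// On any decode failure it fails open: original bytes, ok=false.
func rewriteGzip(body []byte, transform func([]byte) []byte) ([]byte, bool) {
	plain, err := gunzipCapped(body)
	if err != nil {
		return body, false
	}
	return gzipBytes(transform(plain)), true
}

func main() {
	mark := func(b []byte) []byte { return append([]byte("<!-- x -->"), b...) }
	out, ok := rewriteGzip(gzipBytes([]byte("<head></head>")), mark)
	plain, _ := gunzipCapped(out)
	fmt.Println(ok, string(plain))
	_, ok = rewriteGzip([]byte("not gzip"), mark)
	fmt.Println(ok)
}
```

Returning the untouched input with `ok=false` lets the caller leave `Content-Length` and `Content-Encoding` alone, so a body that cannot be decoded is still delivered intact.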

@ -44,7 +44,7 @@ func TestInjectIntoBodyGzip(t *testing.T) {
// End-to-end-ish: HTML with <head>, gzipped, run through the exact transform
// the inject path uses. Result must gunzip back to an injected, intact doc.
html := `<html><head><title>page</title></head><body>content</body></html>`
out, ok := injectIntoBody(gzipBytes([]byte(html)), "gzip", "abc123", true, false)
out, ok := injectIntoBody(gzipBytes([]byte(html)), "gzip", inlineTestScript, true)
if !ok {
t.Fatal("gzip inject must report ok=true")
}
@ -68,7 +68,7 @@ func TestInjectIntoBodyGzip(t *testing.T) {
func TestInjectIntoBodyGzipCaseInsensitiveEncoding(t *testing.T) {
html := `<head></head>`
out, ok := injectIntoBody(gzipBytes([]byte(html)), "GZIP", "z", false, false)
out, ok := injectIntoBody(gzipBytes([]byte(html)), "GZIP", inlineTestScript, false)
if !ok {
t.Fatal("Content-Encoding GZIP (upper) must be recognised → ok=true")
}
@ -85,7 +85,7 @@ func TestInjectIntoBodyGzipFailOpen(t *testing.T) {
// Bytes labelled gzip but NOT gzip → fail open: original bytes, ok=false,
// no panic.
bad := []byte("not gzip at all <head></head>")
out, ok := injectIntoBody(bad, "gzip", "x", false, false)
out, ok := injectIntoBody(bad, "gzip", inlineTestScript, false)
if ok {
t.Fatal("corrupt gzip body must fail open (ok=false)")
}
@ -97,7 +97,7 @@ func TestInjectIntoBodyGzipFailOpen(t *testing.T) {
func TestInjectIntoBodyIdentity(t *testing.T) {
// Identity (empty Content-Encoding): inject directly, grown body returned.
html := []byte(`<html><head></head><body>hi</body></html>`)
out, ok := injectIntoBody(html, "", "deadbeef", false, false)
out, ok := injectIntoBody(html, "", inlineTestScript, false)
if !ok {
t.Fatal("identity inject must report ok=true")
}
@ -113,7 +113,7 @@ func TestInjectIntoBodyUnknownEncodingPassthrough(t *testing.T) {
// #662 — gzip/br/zstd are now ALL decoded+re-encoded; deflate (and any other
// codec / multi-value AE) remains an unknown encoding we pass through.
body := []byte("\x78\x9c some deflate-ish bytes")
out, ok := injectIntoBody(body, "deflate", "x", false, false)
out, ok := injectIntoBody(body, "deflate", inlineTestScript, false)
if ok {
t.Fatal("unknown encoding must pass through (ok=false)")
}
@ -131,7 +131,7 @@ func TestGunzipBombGuard(t *testing.T) {
t.Fatal("gunzipBytes must reject output exceeding gunzipCap")
}
// And via the inject path: fail open, original bytes preserved.
out, ok := injectIntoBody(big, "gzip", "x", false, false)
out, ok := injectIntoBody(big, "gzip", inlineTestScript, false)
if ok {
t.Fatal("over-cap gzip body must fail open through injectIntoBody")
}


@ -199,12 +199,26 @@ func ja4ish(h *tls.ClientHelloInfo) string {
type Proxy struct {
ca *CA
pol *Policy
jaSink func(string) // JA4 observations (logged; a sidecar in prod)
jarKey []byte // anti-track HMAC fake-identity seed (nil → poison off)
poison bool // master gate: poison tracker Set-Cookies (default on when jarKey present)
portal string // portal base URL for /__toolbox/* reverse-proxy (banner assets)
ads *adStats // #662 — ad-block metrics aggregator (flushed to the portal)
cspDemo bool // #662 CONSENTED-DEMONSTRATION: relax a page's CSP so the injected loader runs, and flag the bypass (data-csp=1 → 🔓). Default on.
jaSink func(string) // JA4 observations (logged; a sidecar in prod)
jarKey []byte // anti-track HMAC fake-identity seed (nil → poison off)
poison bool // master gate: poison tracker Set-Cookies (default on when jarKey present)
portal string // portal base URL for /__toolbox/* reverse-proxy (banner assets)
ads *adStats // #662 — ad-block metrics aggregator (flushed to the portal)
cand *adCandidates // #662 — ad-candidate learning feed (flushed with ads to the portal)
cspDemo bool // #662 CONSENTED-DEMONSTRATION: relax a page's CSP so the injected loader runs, and flag the bypass (data-csp=1 → 🔓). Default on.
// analysisRelay gates the per-flow telemetry relay to the dpi/cookies/ja4
// analysis sidecar sockets (#662 — restoring the "Qui te piste?" events the
// decommissioned Python addons fed). Default on; relay.go is the transport.
analysisRelay bool
// socialRelayOn gates the cross-site cookie-tracker correlation (#662: restores
// the kbin /social graph the decommissioned Python social_graph addon fed).
// Default on. social.go is the engine; edges are batched + POSTed to the
// portal's /__toolbox/social-event ingest. A nil social → off (CONNECT PoC / tests).
socialRelayOn bool
social *socialRelay
consent *consentLog
}
// recordAdBlock forwards a 204'd ad/tracker block to the engine's metrics
@ -216,12 +230,52 @@ func (px *Proxy) recordAdBlock(adHost, site, macHash string) {
}
}
// maybeRecordAdCandidate feeds the auto-learn loop (#662): on the allow/mitm
// path (NOT block — already caught; NOT allowlisted/own-infra), it records an
// ad-candidate (host, site) when the request is 3rd-party
// (registrable(host) != registrable(site)) AND the path smells like an ad/track
// endpoint (adPathRE). It is the engine port of ad_ghost's candidate capture —
// the feed secubox-toolbox-autolearn promotes into learned-trackers.txt at
// AD_MIN_SITES distinct sites. Gated behind the analysis/ad relay flag, O(1) hot
// path, fire-and-forget, nil-safe (CONNECT PoC / tests with no feed).
func (px *Proxy) maybeRecordAdCandidate(host, site, path string) {
if px == nil || px.cand == nil || !px.relayEnabled() || px.pol == nil {
return
}
if site == "" || host == "" {
return // no 1st-party context (no Referer) → nothing to attribute.
}
if px.pol.allowedSafe(host) {
return // own-infra / allowlist: never learn our own / trusted hosts.
}
if registrable(host) == registrable(site) {
return // 1st-party request: not a cross-site ad/track signal.
}
if !adPathRE.MatchString(path) {
return // path doesn't look like an ad/track endpoint.
}
px.cand.record(host, site)
}
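The candidate filter above boils down to "third-party AND ad-like path". The sketch below shows that predicate on its own, with loud assumptions: `naiveRegistrable` is a simplified stand-in that keeps the last two labels (the engine's `registrable` is not shown and presumably handles multi-label suffixes), and `demoAdPath` is an illustrative pattern, not the source's `adPathRE`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// naiveRegistrable keeps the last two labels of a host. Hypothetical
// simplification: it is wrong for suffixes such as co.uk.
func naiveRegistrable(host string) string {
	h := strings.Trim(strings.ToLower(host), ".")
	labels := strings.Split(h, ".")
	if len(labels) <= 2 {
		return h
	}
	return strings.Join(labels[len(labels)-2:], ".")
}

// demoAdPath is an illustrative ad/track path pattern (assumption, not adPathRE).
var demoAdPath = regexp.MustCompile(`(?i)/(ads?|track|pixel|beacon)(/|$)`)

// isAdCandidate reports whether a request to host, made from a page on site,
// is cross-site and targets an ad-like path.
func isAdCandidate(host, site, path string) bool {
	if host == "" || site == "" {
		return false // no first-party context to attribute to
	}
	if naiveRegistrable(host) == naiveRegistrable(site) {
		return false // first-party request
	}
	return demoAdPath.MatchString(path)
}

func main() {
	fmt.Println(isAdCandidate("cdn.tracker.example", "news.example.org", "/pixel/1.gif"))
	fmt.Println(isAdCandidate("static.example.org", "www.example.org", "/ads/x"))
	fmt.Println(isAdCandidate("cdn.tracker.example", "news.example.org", "/article/42"))
}
```

Ordering the cheap string checks before the regular expression keeps the common case (first-party or missing Referer) off the regex path.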
func (px *Proxy) serverTLSConfig() *tls.Config {
return px.serverTLSConfigCapture(nil)
}
// serverTLSConfigCapture is serverTLSConfig with an extra per-handshake hook:
// capture, if non-nil, is invoked inside GetCertificate with the live
// *tls.ClientHelloInfo (SNI, SupportedProtos, CipherSuites). The accept-path
// handlers use it to relay the ja4 ClientHello payload (relay.go) WITH the
// client conn's peer IP — which is known at the handler, not inside the TLS
// config. Passing nil yields the plain forging config (CONNECT PoC, tests).
func (px *Proxy) serverTLSConfigCapture(capture func(*tls.ClientHelloInfo)) *tls.Config {
return &tls.Config{
GetCertificate: func(h *tls.ClientHelloInfo) (*tls.Certificate, error) {
if px.jaSink != nil {
px.jaSink(ja4ish(h)) // capture handshake fingerprint
}
if capture != nil {
capture(h) // ja4 relay material (peer IP threaded in by the handler)
}
name := h.ServerName
if name == "" {
name = "unknown.local"
@ -231,6 +285,38 @@ func (px *Proxy) serverTLSConfig() *tls.Config {
}
}
// peerIP returns the remote IP (no port) of a client conn, the same basis as
// clientHashFromConn. Used as the client_ip field of every relay payload.
func peerIP(conn net.Conn) string {
if conn == nil {
return ""
}
host, _, err := net.SplitHostPort(conn.RemoteAddr().String())
if err != nil {
return conn.RemoteAddr().String()
}
return host
}
// captureAndEmitJA4 returns a GetCertificate capture hook that relays the ja4
// ClientHello payload for THIS handshake (once), tagged with the given client
// conn's peer IP + mac-hash-aware clientHash. Gated by analysisRelay (emitJA4
// checks). The hook copies the ClientHelloInfo fields it needs immediately
// (the struct is only valid during the callback). Returns nil when the relay is
// off so the plain config is used (no per-handshake allocation).
func (px *Proxy) captureAndEmitJA4(rawClient net.Conn) func(*tls.ClientHelloInfo) {
if !px.relayEnabled() {
return nil
}
ip := peerIP(rawClient)
hash := clientHashFromConn(rawClient)
return func(h *tls.ClientHelloInfo) {
alpn := append([]string(nil), h.SupportedProtos...)
ciphers := append([]uint16(nil), h.CipherSuites...)
px.emitJA4(ip, hash, h.ServerName, alpn, ciphers)
}
}
func (px *Proxy) handleConnect(w http.ResponseWriter, r *http.Request) {
host := r.URL.Hostname()
hj, ok := w.(http.Hijacker)
@ -262,7 +348,9 @@ func (px *Proxy) handleConnect(w http.ResponseWriter, r *http.Request) {
}
// MITM: TLS-terminate the client with a forged cert (+ ClientHello capture).
tconn := tls.Server(client, px.serverTLSConfig())
// The capture hook relays the ja4 ClientHello payload for this handshake,
// tagged with the client's peer IP (#662). nil when the relay gate is off.
tconn := tls.Server(client, px.serverTLSConfigCapture(px.captureAndEmitJA4(client)))
if err := tconn.Handshake(); err != nil {
return
}
@ -326,6 +414,12 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
// per-client breakdown keys on the WG persona hash. recordAdBlock is
// O(1) and never blocks the block path.
px.recordAdBlock(host, refererSite(req.Header.Get("Referer")), clientHashFromConn(rawClient))
// #662 — the cross-site tracking evidence lives PRECISELY on the blocked
// trackers: the browser still SENT its 3rd-party Cookie to doubleclick/
// adnxs/… before we 204 it. Correlate that request-Cookie here (resp=nil,
// request-only) or the /social graph misses the very trackers it exists to
// expose. Hash-only, WG-peer only, fire-and-forget — same as the allow path.
px.emitSocial(peerIP(rawClient), host, req, nil)
writeRaw(tconn, 204, "No Content", map[string]string{"X-SecuBox-Ng": "blocked"}, nil)
return
}
@ -339,6 +433,24 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
// allow — stripping operator headers + asserting opt-out is universally
// safe and never touches own-infra correctness).
clientHash := clientHashFromConn(rawClient) // mac_hash-aware (WG persona)
// #662 — relay the DPI classification hint for this MITM'd request (allow|mitm
// only; never the block 204 / splice paths). Fire-and-forget BEFORE anonymize
// mutates headers, so we relay the client's original User-Agent (the Python
// DPIRelay ran on the unmodified request). Gated by --analysis-relay; a
// dead/slow dpi.sock can never block or delay the proxy flow.
relayIP := peerIP(rawClient)
px.emitDPI(relayIP, clientHash, host, req)
// #662 — feed the auto-learn loop: on this allow/mitm flow, record an
// ad-candidate when the request is 3rd-party AND its path smells like an
// ad/track endpoint (ad_ghost's _AD_PATH heuristic). site = registrable of
// the Referer (the ad_ghost _site_of flavour). Done BEFORE anonymize mutates
// headers (so the Referer is the client's original). O(1), gated,
// fire-and-forget — a new adware host gets observed here, promoted by
// autolearn, then blocked+smogged after the policy live-reloads it.
px.maybeRecordAdCandidate(host, refererSite(req.Header.Get("Referer")), req.URL.Path)
anonymizeRequest(req.Header)
// #662 — do NOT touch Accept-Encoding. We FORWARD the client's original
@ -379,6 +491,24 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
}
defer resp.Body.Close()
// #662 — relay the cookie metadata for this MITM'd response (allow|mitm only).
// NAMES ONLY (never values — privacy/CSPN); no-op unless ≥1 Set-Cookie OR ≥1
// request Cookie is present. Emitted before poison rewrites Set-Cookie VALUES,
// which is irrelevant here (names are unchanged by poison) but keeps the
// relayed names byte-for-byte the origin's. Fire-and-forget, gated.
px.emitCookies(relayIP, clientHash, req, resp)
// #662 — cross-site cookie-tracker correlation (restores the kbin /social
// graph). FAITHFUL to the decommissioned Python social_graph addon: extract
// 3rd-party cookie edges (Set-Cookie + request Cookie), hash the identifier
// (cookieIDHash — NEVER the raw value), classify consent_state, and buffer
// them for the batched POST to the portal /__toolbox/social-event ingest.
// Like the addon, this ONLY fires for known R3 WG peers (macHashOf, not the
// raw-IP fallback): non-WG flows yield no edges. allow|mitm only (the block
// 204 / splice paths return before here). Gated by --social-relay; pure +
// non-blocking (the flush is a background goroutine).
px.emitSocial(relayIP, host, req, resp)
// Poison: only on MITM'd tracker flows (never on allow/own-infra), and only
// when the jar key is loaded. Replaces tracking-id Set-Cookie values with a
// stable fabricated persona; benign cookies pass through untouched.
@ -409,16 +539,26 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
strings.Contains(resp.Header.Get("Content-Type"), "text/html") {
// #662 CONSENTED-DEMONSTRATION — ONLY here, on the responses we actually
// inject into (2xx text/html, R3/wg gate), and ONLY when the operator
// left the demo on, do we relax the page's CSP so the same-origin
// /__toolbox/loader.js can execute even on strict-CSP sites. cspBypassed
// is true iff there was a real CSP to bypass — it becomes data-csp="1" on
// the loader tag and the portal banner renders a 🔓 as the visible proof.
// We never strip CSP on non-injected responses.
// left the demo on, do we relax the page's CSP so the inline banner can
// run even on strict-CSP sites. cspBypassed is true iff there was a real
// CSP to bypass — it becomes csp=1 on the inline script and the banner
// renders a 🔓 as the visible proof. We never strip CSP on non-injected
// responses.
cspBypassed := false
if px.cspDemo {
cspBypassed = relaxCSPForLoader(resp.Header)
}
if out, ok := injectIntoBody(body, resp.Header.Get("Content-Encoding"), clientHash, wg, cspBypassed); ok {
// #662 — INLINE the banner (supersedes the <script src="/__toolbox/
// loader.js"> tag): sites with a SERVICE WORKER (leparisien, cnn…) hijack
// the same-origin src + its fetch("/__toolbox/bundle") before they reach
// this engine, so the banner never appeared. We fetch the COMPLETE script
// body from the portal server-side (mh/wg/csp + bundle baked as JS
// literals — no same-origin request for the SW to touch) and bake it into
// a self-contained <script>…</script>. Fail-open: a dead/slow portal →
// scriptBody=="" → the banner inject is skipped and the page is served
// intact (the cosmetic <style>, already inline, is unaffected).
scriptBody, _ := fetchInlineBanner(px.portal, clientHash, wg, cspBypassed)
if out, ok := injectIntoBody(body, resp.Header.Get("Content-Encoding"), scriptBody, wg); ok {
body = out
// Keep the response framing consistent with the served bytes. The
// encoding is unchanged (gzip stays gzip, identity stays identity);
@ -428,6 +568,11 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
resp.ContentLength = int64(len(body))
}
}
// #662 — strip Alt-Svc so the browser is never told this origin offers HTTP/3
// (h3). With h3 unadvertised it keeps using HTTP/2 over TCP, which we MITM;
// otherwise it caches "h3 available" and keeps trying QUIC (UDP 443) — which
// bypasses this TCP proxy and is only best-effort blocked by the nft reject.
resp.Header.Del("Alt-Svc")
writeResponse(tconn, resp, body)
}
@ -445,6 +590,10 @@ func main() {
"portal base URL; /__toolbox/loader.js + /__toolbox/bundle are reverse-proxied here (banner assets, served for any MITM'd origin)")
cspDemo := flag.Bool("csp-bypass-demo", true,
"CONSENTED DEMONSTRATION: relax a page's CSP so the injected transparency-banner loader runs even on strict-CSP sites, and flag the bypass (banner shows 🔓). Only on injected 2xx text/html R3 responses; never on non-injected responses. Set false to never touch CSP.")
analysisRelay := flag.Bool("analysis-relay", true,
"relay per-flow telemetry (dpi/cookies/ja4) to the analysis sidecar sockets so the kbin \"Qui te piste?\" events refill (#662; replaces the decommissioned Python relay addons). Fire-and-forget; a dead/slow sidecar never affects the proxy. Set false to emit nothing.")
socialRelay := flag.Bool("social-relay", true,
"compute cross-site cookie-tracker edges and POST them to the portal /__toolbox/social-event ingest so the kbin /social graph refills (#662; replaces the decommissioned Python social_graph addon). Hash-only (never raw cookie values); WG-peer flows only; batched + fire-and-forget — a dead/slow portal never affects the proxy. Set false to emit nothing.")
flag.Parse()
ca, err := loadCA(*caCert, *caKey)
if err != nil {
@ -472,12 +621,26 @@ func main() {
poison: *poison,
portal: *portal,
ads: newAdStats(),
cand: newAdCandidates(),
cspDemo: *cspDemo,
analysisRelay: *analysisRelay,
socialRelayOn: *socialRelay,
social: newSocialRelay(),
consent: newConsentLog(),
}
// #662 — start the social-edge flusher: the MITM path buffers cross-site
// tracker edges into px.social, drained every 10s to the portal's
// /__toolbox/social-event (best-effort, fire-and-forget) so the kbin /social
// graph (frozen since the cutover) refills.
go px.social.runFlusher(*portal)
// #662 — start the ad-block metrics flusher: the block path tallies every
// 204 into px.ads, drained every 10s to the portal's /__toolbox/ad-event
// (best-effort, fire-and-forget) so the #ads dashboard sees blocks again.
go px.ads.runAdStatsFlusher(*portal)
// #662 — the candidate feed (px.cand) is drained in the SAME flush so the
// learning candidates ride the existing ad-event channel (one POST / 10s).
go px.ads.runAdStatsFlusher(*portal, px.cand)
if *transparent {
// Transparent R3 mode: raw accept loop, each conn carries its pre-DNAT
// destination via SO_ORIGINAL_DST (recovered in handleTransparent). The


@ -17,6 +17,8 @@ import (
"os"
"regexp"
"strings"
"sync"
"time"
)
// ── ad_ghost: static ad/tracker host pattern (port of _AD_HOST) ──────────────
@ -95,19 +97,55 @@ func envOr(key, def string) string {
// Policy carries the loaded sets/regex and decides per-host actions. It also
// keeps the legacy PoC fields (Inject) so the existing wiring/tests still work.
type Policy struct {
adHost *regexp.Regexp
learned map[string]bool // learned-trackers (host or registrable, lowercased)
allow map[string]bool // ad-allowlist (host or registrable, lowercased)
spliceSeed map[string]bool // splice seed patterns
spliceLearn map[string]bool // splice learned patterns
never map[string]bool // pure-trackers + fortknox (splice never-set)
selfRegs map[string]bool // own-infra registrable domains
selfDomains []string // own-infra (for the host==d || host endswith .d guard)
// mu guards the live-reloadable map fields below. Decide/allowed/blockedByAd/
// shouldSplice take RLock; maybeReload takes Lock only when a backing file
// actually changed (the throttle + stat happen under a separate lighter lock).
mu sync.RWMutex
adHost *regexp.Regexp
learned map[string]bool // learned-trackers (host or registrable, lowercased)
allow map[string]bool // ad-allowlist (host or registrable, lowercased)
spliceSeed map[string]bool // splice seed patterns
spliceLearn map[string]bool // splice learned patterns
never map[string]bool // pure-trackers + fortknox (splice never-set)
selfRegs map[string]bool // own-infra registrable domains
selfDomains []string // own-infra (for the host==d || host endswith .d guard)
// ── live-reload state (#662 auto-learn loop) ─────────────────────────────
//
// The lists are loaded once at startup, then re-read on-disk when their
// mtime changes so autolearn promotions / manual edits take effect WITHOUT a
// worker restart (mirrors ad_ghost._maybe_reload). The hot path (Decide)
// calls maybeReload(): a throttle check, then — at most every reloadThrottle —
// a cheap stat() of each backing file. Only a changed file is re-read and its
// map atomically swapped under mu.
reloadFiles []reloadTarget // backing files + their swap target
fortknoxSites []string // kept for rebuilding the never-set on pure-trackers reload
reloadMu sync.Mutex // guards lastReloadID + the per-file mtimes
lastReloadID int64 // unix-nano of the last throttle pass (0 = never)
reloadThrottle time.Duration // min interval between stat passes (0 in tests = eager)
// Legacy PoC fields kept so non-policy behaviour is unchanged.
Inject []byte // banner / ad-CSS marker injected before </head> or </body>
}
// reloadTarget describes one backing file the engine live-reloads: its path, the
// last mtime we read, whether comment-stripping applies (loadLines vs
// loadLinesRaw), and an applier that swaps the freshly-read set into the right
// Policy field (under p.mu, held by the caller). pure-trackers re-derives the
// never-set (pure-trackers + fortknox) so it stays consistent.
type reloadTarget struct {
path string
stripComm bool
lastMtime int64
apply func(p *Policy, set map[string]bool)
}
// defaultReloadThrottle is the production stat cadence: a backing-file change
// (autolearn runs hourly; a promotion is rare) is observed within ~15s, and the
// hot path stats at most ~4×/minute regardless of request rate.
const defaultReloadThrottle = 15 * time.Second
// loadLines mirrors the comment-stripping Python loaders (splice._load_lines,
// ad_ghost._allowed's allowlist read): split on first '#', trim, lowercase,
// skip blanks. Missing/unreadable file → empty set (best-effort).
@ -196,16 +234,107 @@ func LoadPolicy(opts PolicyOpts) (*Policy, error) {
selfDomains = append(selfDomains, d)
}
return &Policy{
adHost: re,
learned: loadLinesRaw(opts.LearnedPath), // mirrors _learned_set (no comment-strip)
allow: loadLines(opts.AllowPath),
spliceSeed: loadLines(opts.SpliceSeedPath),
spliceLearn: loadLines(opts.SpliceLearnPath),
never: never,
selfRegs: selfRegs,
selfDomains: selfDomains,
}, nil
p := &Policy{
adHost: re,
learned: loadLinesRaw(opts.LearnedPath), // mirrors _learned_set (no comment-strip)
allow: loadLines(opts.AllowPath),
spliceSeed: loadLines(opts.SpliceSeedPath),
spliceLearn: loadLines(opts.SpliceLearnPath),
never: never,
selfRegs: selfRegs,
selfDomains: selfDomains,
fortknoxSites: append([]string(nil), opts.FortknoxSites...),
reloadThrottle: defaultReloadThrottle,
}
// ── register the live-reloadable backing files (#662 auto-learn loop) ─────
//
// Each entry re-reads its file when its mtime changes and atomically swaps
// the map under p.mu (held by maybeReload). learned-trackers + ad-allowlist
// are the load-bearing pair (autolearn promotes into learned; the operator
// edits the allowlist); the splice seed/learned + pure-trackers files are
// reloaded too for consistency (pure-trackers re-derives the never-set).
p.reloadFiles = []reloadTarget{
{path: opts.LearnedPath, stripComm: false, lastMtime: statMtime(opts.LearnedPath),
apply: func(p *Policy, s map[string]bool) { p.learned = s }},
{path: opts.AllowPath, stripComm: true, lastMtime: statMtime(opts.AllowPath),
apply: func(p *Policy, s map[string]bool) { p.allow = s }},
{path: opts.SpliceSeedPath, stripComm: true, lastMtime: statMtime(opts.SpliceSeedPath),
apply: func(p *Policy, s map[string]bool) { p.spliceSeed = s }},
{path: opts.SpliceLearnPath, stripComm: true, lastMtime: statMtime(opts.SpliceLearnPath),
apply: func(p *Policy, s map[string]bool) { p.spliceLearn = s }},
{path: opts.PureTrackersPath, stripComm: true, lastMtime: statMtime(opts.PureTrackersPath),
apply: func(p *Policy, s map[string]bool) {
// pure-trackers + fortknox → never-set (mirrors LoadPolicy above).
for _, fk := range p.fortknoxSites {
if fk = strings.Trim(strings.ToLower(strings.TrimSpace(fk)), "."); fk != "" {
s[fk] = true
}
}
p.never = s
}},
}
return p, nil
}
// statMtime returns the file's mtime in unix-nano, or 0 when the file is missing
// or unreadable (best-effort, like the Python loaders: a missing file → empty
// set, mtime 0). A file appearing/disappearing therefore registers as a change.
func statMtime(path string) int64 {
if path == "" {
return 0
}
fi, err := os.Stat(path)
if err != nil {
return 0
}
return fi.ModTime().UnixNano()
}
// maybeReload re-reads any backing list whose on-disk mtime changed since the
// last pass, swapping the affected map(s) under p.mu. Throttled to at most one
// stat pass per p.reloadThrottle (cheap: a time compare + a few stats), so the
// Decide hot path pays almost nothing. Concurrency-safe: the throttle/mtime
// bookkeeping is under reloadMu and the map swap under mu — Decide's readers
// hold mu.RLock, so a swap is atomic w.r.t. any in-flight decision.
func (p *Policy) maybeReload() {
now := time.Now()
p.reloadMu.Lock()
if p.reloadThrottle > 0 && p.lastReloadID != 0 &&
now.Sub(time.Unix(0, p.lastReloadID)) < p.reloadThrottle {
p.reloadMu.Unlock()
return
}
p.lastReloadID = now.UnixNano()
// Collect the files that changed (stat + re-read under reloadMu, never under mu).
type pending struct {
idx int
set map[string]bool
}
var changed []pending
for i := range p.reloadFiles {
rt := &p.reloadFiles[i]
if rt.path == "" {
continue
}
m := statMtime(rt.path)
if m != rt.lastMtime {
rt.lastMtime = m
changed = append(changed, pending{idx: i, set: scanLines(rt.path, rt.stripComm)})
}
}
p.reloadMu.Unlock()
if len(changed) == 0 {
return
}
// Swap the affected maps atomically under the write lock.
p.mu.Lock()
for _, c := range changed {
p.reloadFiles[c.idx].apply(p, c.set)
}
p.mu.Unlock()
}
// ── registrable: port of ad_ghost._registrable ───────────────────────────────
@ -279,6 +408,11 @@ func hostMatches(host string, patterns map[string]bool) bool {
// allowed: port of ad_ghost._allowed. Own-infra ALWAYS wins (reflash-safe),
// then the operator allowlist (host or registrable).
//
// LOCK CONTRACT: reads the reloadable allow map — the caller MUST hold at least
// p.mu.RLock (Decide / shouldPoison do). Lock-free internally so Decide can call
// it alongside shouldSplice/blockedByAd under a single RLock (sync.RWMutex is
// not reentrant).
func (p *Policy) allowed(host string) bool {
h := strings.ToLower(host)
reg := registrable(h)
@ -297,7 +431,19 @@ func (p *Policy) allowed(host string) bool {
return p.allow[h] || p.allow[reg]
}
// allowedSafe is the lock-taking entry point to allowed() for callers OUTSIDE a
// Decide RLock (e.g. the ad-candidate feed). It also picks up a live-reloaded
// allowlist via maybeReload, so a freshly-allowlisted host stops being learned.
func (p *Policy) allowedSafe(host string) bool {
p.maybeReload()
p.mu.RLock()
defer p.mu.RUnlock()
return p.allowed(host)
}
// shouldSplice: port of splice.should_splice (never wins; then seed + learned).
// LOCK CONTRACT: reads the reloadable never/spliceSeed/spliceLearn maps — the
// caller MUST hold at least p.mu.RLock (Decide does).
func (p *Policy) shouldSplice(sni string) bool {
s := strings.Trim(strings.ToLower(sni), ".")
if s == "" {
@ -312,6 +458,10 @@ func (p *Policy) shouldSplice(sni string) bool {
// blockedByAd: port of the ad_ghost requestheaders block decision (sans the
// allowlist guard, which Decide applies first): _AD_HOST match OR
// registrable/host in learned-trackers.
//
// LOCK CONTRACT: reads the reloadable learned map — the caller MUST hold at
// least p.mu.RLock. Decide and shouldPoison (via isTracker) do; the candidate-
// emit path calls it only through those.
func (p *Policy) blockedByAd(host string) bool {
if p.adHost.MatchString(host) {
return true
@ -339,9 +489,16 @@ func (p *Policy) blockedByAd(host string) bool {
// sni defaults to host when empty (the live engine splices on SNI == the TLS
// host; for the parity harness host and sni are the same value).
func (p *Policy) Decide(host, sni string) string {
// #662 — pick up autolearn promotions / manual edits without a worker
// restart. Throttled to ~every reloadThrottle and best-effort, so the hot
// path normally pays only a time compare. Done BEFORE taking the read lock
// (maybeReload may take the write lock to swap a changed map).
p.maybeReload()
if sni == "" {
sni = host
}
p.mu.RLock()
defer p.mu.RUnlock()
if p.allowed(host) {
return "allow"
}


@ -148,6 +148,12 @@ func (p *Policy) isTracker(host string) bool {
// allowlisted — own-infra flows are left clean (same dark safety as the block
// path). The caller additionally requires a loaded jar key.
func (p *Policy) shouldPoison(host string) bool {
// #662 — consult the same live-reloaded learned set Decide uses, so a host
// promoted into learned-trackers (by autolearn) is poisoned (smogged), not
// only 204'd, without a worker restart. RLock-guard the reloadable maps
// (allowed + isTracker→blockedByAd read them); maybeReload may swap them.
p.mu.RLock()
defer p.mu.RUnlock()
if p.allowed(host) {
return false // own-infra / allowlist → never poison
}


@ -0,0 +1,291 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: per-flow analysis relay (#662)
//
// Restores the dpi / cookies / ja4 EVENTS that feed the kbin "Qui te piste?"
// cumulative-stats page, frozen since the #662 Phase-7 cutover decommissioned
// the Python mitmproxy relay addons (packages/secubox-toolbox/mitmproxy_addons/
// {dpi,cookies,ja4}.py). The Go engine is now the live R3 MITM core; this file
// re-implements EXACTLY what those addons did — extract privacy-safe flow
// metadata and fire-and-forget it to the analysis sidecar sockets, which
// enrich + write toolbox.db.events keyed by client_mac_hash.
//
// Transport is the existing emit() helper (sidecar.go): a detached goroutine
// with its own 2s timeout — a dead/slow analysis socket can NEVER block, delay,
// or break a client flow. The payload builders here are pure (no I/O), O(1)-ish
// per flow, and emit NAMES ONLY for cookies (never values — privacy / CSPN).
//
// Pure standard library — no external modules.
package main
import (
"encoding/json"
"net/http"
"strings"
"time"
)
// Stable socket paths — verbatim from the Python addons' TARGET constants
// (the http+unix:///run/secubox/<x>.sock/<route> URLs), split into path+route.
const (
dpiSocket = "/run/secubox/dpi.sock"
cookiesSocket = "/run/secubox/cookies.sock"
ja4Socket = "/run/secubox/threat-analyst.sock"
dpiRoute = "/classify"
cookiesRoute = "/inject"
ja4Route = "/ja4"
)
// Caps + truncation limits, matching the Python addons exactly.
const (
maxSetCookieNames = 30 // cookies.py _names_only(set_cookies, cap=30)
maxCookieNames = 50 // cookies.py sent_names[:50]
maxCookieNameLen = 32 // cookies.py name[:32]
maxCookieURL = 300 // cookies.py pretty_url[:300]
)
// nowMS returns the current time as unix milliseconds (ts_ms in every payload).
func nowMS() int64 { return time.Now().UnixMilli() }
// ── gate ─────────────────────────────────────────────────────────────────────
// relayEnabled reports whether per-flow analysis relaying is on (the
// --analysis-relay flag → Proxy.analysisRelay). When false, nothing is emitted.
// Nil-safe so tests / the CONNECT PoC that build a bare Proxy can call it.
func (px *Proxy) relayEnabled() bool {
return px != nil && px.analysisRelay
}
// relayEmit is the gated, fire-and-forget emit used by every relay call site.
// It NEVER blocks (delegates to emit() which detaches a goroutine with its own
// timeout) and emits nothing when the relay gate is off.
func (px *Proxy) relayEmit(socketPath, route string, payload []byte) {
if !px.relayEnabled() || len(payload) == 0 {
return
}
emit(socketPath, route, payload)
}
// ── dpi payload ──────────────────────────────────────────────────────────────
// dpiEvent mirrors the JSON the Python DPIRelay.request() emitted. user_agent is
// a *string so an absent UA serialises to JSON null (not ""), matching
// headers.get("user-agent") → None. scheme + sni are constant "https" / host on
// the MITM'd path (we only relay terminated TLS flows).
type dpiEvent struct {
TSMs int64 `json:"ts_ms"`
ClientIP string `json:"client_ip"`
MacHash string `json:"client_mac_hash"`
Host string `json:"host"`
Scheme string `json:"scheme"`
Method string `json:"method"`
UserAgent *string `json:"user_agent"`
SNI string `json:"sni"`
}
// buildDPIPayload builds the /classify payload for one MITM'd request.
func buildDPIPayload(clientIP, macHash, host string, req *http.Request) []byte {
var ua *string
if v := req.Header.Get("User-Agent"); v != "" {
ua = &v
}
ev := dpiEvent{
TSMs: nowMS(),
ClientIP: clientIP,
MacHash: macHash,
Host: host,
Scheme: "https",
Method: req.Method,
UserAgent: ua,
SNI: host,
}
b, _ := json.Marshal(ev)
return b
}
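The `*string` choice for `user_agent` is easy to check in isolation: a nil pointer marshals to JSON `null`, while a plain empty string would marshal to `""`. The sketch below uses a hypothetical two-field `uaEvent` and `encodeUA` helper, not the full `dpiEvent`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// uaEvent is a hypothetical cut-down event with only the fields needed here.
type uaEvent struct {
	Host      string  `json:"host"`
	UserAgent *string `json:"user_agent"`
}

// encodeUA leaves UserAgent nil when ua is empty, so it serialises as null.
func encodeUA(host, ua string) string {
	ev := uaEvent{Host: host}
	if ua != "" {
		ev.UserAgent = &ua
	}
	b, _ := json.Marshal(ev)
	return string(b)
}

func main() {
	fmt.Println(encodeUA("example.org", ""))
	fmt.Println(encodeUA("example.org", "curl/8"))
}
```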
// emitDPI relays the DPI classification hint for a MITM'd request (gated).
func (px *Proxy) emitDPI(clientIP, macHash, host string, req *http.Request) {
if !px.relayEnabled() {
return
}
px.relayEmit(dpiSocket, dpiRoute, buildDPIPayload(clientIP, macHash, host, req))
}
// ── cookies payload ──────────────────────────────────────────────────────────
// cookiesEvent mirrors the JSON the Python CookiesRelay.response() emitted.
// NAMES ONLY — never cookie values (privacy / CSPN).
type cookiesEvent struct {
TSMs int64 `json:"ts_ms"`
ClientIP string `json:"client_ip"`
MacHash string `json:"client_mac_hash"`
URL string `json:"url"`
Method string `json:"method"`
SetCookieNames []string `json:"set_cookie_names"`
CookieNames []string `json:"cookie_names"`
SetCookieCount int `json:"set_cookie_count"`
CookieCount int `json:"cookie_count"`
Status int `json:"status"`
}
// cookiesRelevant reports whether a flow carries any cookie signal worth
// relaying: ≥1 Set-Cookie in the response OR ≥1 Cookie in the request. Mirrors
// the Python `if not (set_cookies or req_cookies): return`.
func cookiesRelevant(req *http.Request, resp *http.Response) bool {
if resp != nil && len(resp.Header.Values("Set-Cookie")) > 0 {
return true
}
return req != nil && len(req.Header.Values("Cookie")) > 0
}
// setCookieName extracts the cookie NAME from a Set-Cookie header line: the text
// before the first '=' of the first ';'-delimited field, trimmed and capped.
// Returns "" for attribute-only / malformed / empty-name lines (skipped).
func setCookieName(sc string) string {
head := sc
if i := strings.IndexByte(sc, ';'); i >= 0 {
head = sc[:i]
}
eq := strings.IndexByte(head, '=')
if eq < 0 {
return ""
}
n := strings.TrimSpace(head[:eq])
if len(n) > maxCookieNameLen {
n = n[:maxCookieNameLen]
}
return n
}
// parseCookieHeaderNames splits a single "Cookie:" header value into its
// individual cookie NAMES (text before each '=' across ';'-separated pairs),
// trimmed + capped. Mirrors cookies.py _parse_cookie_header.
func parseCookieHeaderNames(value string) []string {
var names []string
for _, part := range strings.Split(value, ";") {
eq := strings.IndexByte(part, '=')
if eq < 0 {
continue
}
n := strings.TrimSpace(part[:eq])
if len(n) > maxCookieNameLen {
n = n[:maxCookieNameLen]
}
if n != "" {
names = append(names, n)
}
}
return names
}
// setCookieNames returns the NAMES of the response Set-Cookie lines, scanning at
// most the first `limit` header lines (Python _names_only(headers[:cap])). The
// parameter is named limit so it does not shadow the builtin cap.
func setCookieNames(setCookies []string, limit int) []string {
out := make([]string, 0, len(setCookies))
for i, sc := range setCookies {
if i >= limit {
break
}
if n := setCookieName(sc); n != "" {
out = append(out, n)
}
}
return out
}
// buildCookiesPayload builds the /inject payload for one MITM'd response that
// carries a cookie signal. The caller is expected to have checked
// cookiesRelevant; building on an empty flow yields empty name lists.
func buildCookiesPayload(clientIP, macHash string, req *http.Request, resp *http.Response) []byte {
setCookies := resp.Header.Values("Set-Cookie")
reqCookies := req.Header.Values("Cookie")
// Sent cookie names: flatten every Cookie header line, then cap to 50 total.
var sent []string
for _, ch := range reqCookies {
sent = append(sent, parseCookieHeaderNames(ch)...)
}
if len(sent) > maxCookieNames {
sent = sent[:maxCookieNames]
}
u := req.URL.String()
if len(u) > maxCookieURL {
u = u[:maxCookieURL]
}
ev := cookiesEvent{
TSMs: nowMS(),
ClientIP: clientIP,
MacHash: macHash,
URL: u,
Method: req.Method,
SetCookieNames: setCookieNames(setCookies, maxSetCookieNames),
CookieNames: sent,
SetCookieCount: len(setCookies),
CookieCount: len(reqCookies),
Status: resp.StatusCode,
}
b, _ := json.Marshal(ev)
return b
}
// emitCookies relays the cookie metadata for a MITM'd response (gated). No-op
// when neither a Set-Cookie nor a request Cookie is present.
func (px *Proxy) emitCookies(clientIP, macHash string, req *http.Request, resp *http.Response) {
if !px.relayEnabled() || !cookiesRelevant(req, resp) {
return
}
px.relayEmit(cookiesSocket, cookiesRoute, buildCookiesPayload(clientIP, macHash, req, resp))
}
// ── ja4 payload ──────────────────────────────────────────────────────────────
// ja4Event mirrors the JSON the Python JA4Relay.tls_clienthello() emitted.
// alpn_protocols / cipher_suites are always JSON arrays (never null) — matching
// list(ch.alpn_protocols or []). extensions is always null: crypto/tls'
// ClientHelloInfo does not expose the raw extension list, exactly the Python
// `if hasattr(ch, "extensions") else None` fallback (the service tolerates it).
type ja4Event struct {
TSMs int64 `json:"ts_ms"`
ClientIP string `json:"client_ip"`
MacHash string `json:"client_mac_hash"`
SNI string `json:"sni"`
ALPN []string `json:"alpn_protocols"`
Ciphers []uint16 `json:"cipher_suites"`
Extensions *[]int `json:"extensions"` // always nil → JSON null
}
// buildJA4Payload builds the /ja4 payload for one MITM'd TLS ClientHello.
func buildJA4Payload(clientIP, macHash, sni string, alpn []string, ciphers []uint16) []byte {
if alpn == nil {
alpn = []string{}
}
if ciphers == nil {
ciphers = []uint16{}
}
ev := ja4Event{
TSMs: nowMS(),
ClientIP: clientIP,
MacHash: macHash,
SNI: sni,
ALPN: alpn,
Ciphers: ciphers,
Extensions: nil,
}
b, _ := json.Marshal(ev)
return b
}
// emitJA4 relays the captured ClientHello fingerprint material for a MITM'd
// handshake (gated). Called once per handshake, before Decide — so blocked and
// allowed flows alike are relayed, matching the Python addon which ran on every
// tls_clienthello.
func (px *Proxy) emitJA4(clientIP, macHash, sni string, alpn []string, ciphers []uint16) {
if !px.relayEnabled() {
return
}
px.relayEmit(ja4Socket, ja4Route, buildJA4Payload(clientIP, macHash, sni, alpn, ciphers))
}

@ -0,0 +1,355 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// Unit tests for the per-flow analysis relay payload builders + emit wiring
// (#662 — restoring the dpi/cookies/ja4 events that feed "Qui te piste?").
package main
import (
"encoding/json"
"net"
"net/http"
"net/url"
"path/filepath"
"strings"
"testing"
"time"
)
// ── dpi payload ──────────────────────────────────────────────────────────────
func TestBuildDPIPayload(t *testing.T) {
req, _ := http.NewRequest("GET", "https://tracker.example.com/pixel?x=1", nil)
req.Header.Set("User-Agent", "Mozilla/5.0 (X11)")
p := buildDPIPayload("203.0.113.7", "abcd1234", "tracker.example.com", req)
var m map[string]any
if err := json.Unmarshal(p, &m); err != nil {
t.Fatalf("unmarshal: %v\n%s", err, p)
}
if m["client_ip"] != "203.0.113.7" {
t.Errorf("client_ip = %v", m["client_ip"])
}
if m["client_mac_hash"] != "abcd1234" {
t.Errorf("client_mac_hash = %v", m["client_mac_hash"])
}
if m["host"] != "tracker.example.com" {
t.Errorf("host = %v", m["host"])
}
if m["scheme"] != "https" {
t.Errorf("scheme = %v", m["scheme"])
}
if m["method"] != "GET" {
t.Errorf("method = %v", m["method"])
}
if m["user_agent"] != "Mozilla/5.0 (X11)" {
t.Errorf("user_agent = %v", m["user_agent"])
}
if m["sni"] != "tracker.example.com" {
t.Errorf("sni = %v", m["sni"])
}
// ts_ms present and plausible (a recent unix-millis value).
ts, ok := m["ts_ms"].(float64)
if !ok || ts < 1_600_000_000_000 {
t.Errorf("ts_ms = %v (want recent unix millis)", m["ts_ms"])
}
}
// Absent User-Agent → JSON null (not "" and not omitted), mirroring the Python
// addon's headers.get("user-agent") → None.
func TestBuildDPIPayloadNullUserAgent(t *testing.T) {
req, _ := http.NewRequest("GET", "https://h.example/", nil)
p := buildDPIPayload("1.2.3.4", "h", "h.example", req)
if !strings.Contains(string(p), `"user_agent":null`) {
t.Errorf("expected user_agent null, got: %s", p)
}
}
// ── cookies payload ──────────────────────────────────────────────────────────
func TestBuildCookiesPayloadNamesOnly(t *testing.T) {
req, _ := http.NewRequest("POST", "https://shop.example.com/cart", nil)
req.Header.Add("Cookie", "sessionid=SECRET_VALUE; csrftoken=ANOTHER_SECRET")
req.Header.Add("Cookie", "_ga=GA1.2.deadbeef")
resp := &http.Response{StatusCode: 200, Header: http.Header{}}
resp.Header.Add("Set-Cookie", "_fbp=fb.1.SECRET; Path=/; HttpOnly; SameSite=Lax")
resp.Header.Add("Set-Cookie", "uid=PRIVATE; Domain=.example.com")
p := buildCookiesPayload("10.99.1.5", "wgpersona", req, resp)
var m map[string]any
if err := json.Unmarshal(p, &m); err != nil {
t.Fatalf("unmarshal: %v\n%s", err, p)
}
if m["url"] != "https://shop.example.com/cart" {
t.Errorf("url = %v", m["url"])
}
if m["method"] != "POST" {
t.Errorf("method = %v", m["method"])
}
if int(m["status"].(float64)) != 200 {
t.Errorf("status = %v", m["status"])
}
if int(m["set_cookie_count"].(float64)) != 2 {
t.Errorf("set_cookie_count = %v", m["set_cookie_count"])
}
if int(m["cookie_count"].(float64)) != 2 {
t.Errorf("cookie_count (header lines) = %v", m["cookie_count"])
}
setNames := toStrings(m["set_cookie_names"])
if !equalStrSet(setNames, []string{"_fbp", "uid"}) {
t.Errorf("set_cookie_names = %v", setNames)
}
cookieNames := toStrings(m["cookie_names"])
if !equalStrSet(cookieNames, []string{"sessionid", "csrftoken", "_ga"}) {
t.Errorf("cookie_names = %v", cookieNames)
}
// Hard privacy guarantee: NO value leaked anywhere in the payload.
raw := string(p)
for _, secret := range []string{"SECRET_VALUE", "ANOTHER_SECRET", "deadbeef", "fb.1.SECRET", "PRIVATE", "GA1.2"} {
if strings.Contains(raw, secret) {
t.Errorf("payload leaked cookie value %q: %s", secret, raw)
}
}
}
// Set-Cookie name parse: text before the first '='. Cookie header split on ';'.
func TestCookieNameParsing(t *testing.T) {
if got := setCookieName("name=val; Path=/; Secure"); got != "name" {
t.Errorf("setCookieName = %q", got)
}
if got := setCookieName(" spaced = v"); got != "spaced" {
t.Errorf("setCookieName trim = %q", got)
}
if got := setCookieName("=novalue"); got != "" {
t.Errorf("setCookieName empty name = %q", got)
}
if got := setCookieName("attributeonly"); got != "" {
t.Errorf("setCookieName no eq = %q", got)
}
names := parseCookieHeaderNames("a=1; b=2;c=3")
if !equalStrSet(names, []string{"a", "b", "c"}) {
t.Errorf("parseCookieHeaderNames = %v", names)
}
}
// Caps: ≤30 Set-Cookie names, ≤50 sent cookie names.
func TestCookiesPayloadCaps(t *testing.T) {
req, _ := http.NewRequest("GET", "https://e.example/", nil)
var bigCookie strings.Builder
for i := 0; i < 80; i++ {
if i > 0 {
bigCookie.WriteString("; ")
}
bigCookie.WriteString("c")
bigCookie.WriteByte(byte('0' + i%10))
bigCookie.WriteString("_")
bigCookie.WriteByte(byte('a' + i%26))
bigCookie.WriteString("=v")
}
req.Header.Add("Cookie", bigCookie.String())
resp := &http.Response{StatusCode: 200, Header: http.Header{}}
for i := 0; i < 45; i++ {
resp.Header.Add("Set-Cookie", "sc"+string(rune('A'+i%26))+string(rune('0'+i%10))+"=v")
}
p := buildCookiesPayload("1.1.1.1", "h", req, resp)
var m map[string]any
if err := json.Unmarshal(p, &m); err != nil {
t.Fatalf("unmarshal: %v\n%s", err, p)
}
if n := len(toStrings(m["set_cookie_names"])); n > 30 {
t.Errorf("set_cookie_names not capped at 30: %d", n)
}
if n := len(toStrings(m["cookie_names"])); n > 50 {
t.Errorf("cookie_names not capped at 50: %d", n)
}
// raw counts still reflect the real totals.
if int(m["set_cookie_count"].(float64)) != 45 {
t.Errorf("set_cookie_count = %v", m["set_cookie_count"])
}
}
// URL truncated to ≤300 chars.
func TestCookiesPayloadURLTruncation(t *testing.T) {
long := "https://e.example/" + strings.Repeat("a", 500)
u, _ := url.Parse(long)
req := &http.Request{Method: "GET", URL: u, Header: http.Header{}}
req.Header.Add("Cookie", "x=1")
resp := &http.Response{StatusCode: 200, Header: http.Header{}}
p := buildCookiesPayload("1.1.1.1", "h", req, resp)
var m map[string]any
if err := json.Unmarshal(p, &m); err != nil {
t.Fatalf("unmarshal: %v\n%s", err, p)
}
if len(m["url"].(string)) > 300 {
t.Errorf("url not truncated: %d chars", len(m["url"].(string)))
}
}
// cookiesRelevant gates emission: only when ≥1 Set-Cookie OR ≥1 Cookie.
func TestCookiesRelevant(t *testing.T) {
mk := func(setC, reqC bool) (*http.Request, *http.Response) {
req, _ := http.NewRequest("GET", "https://e/", nil)
if reqC {
req.Header.Add("Cookie", "a=1")
}
resp := &http.Response{StatusCode: 200, Header: http.Header{}}
if setC {
resp.Header.Add("Set-Cookie", "x=1")
}
return req, resp
}
if r, p := mk(false, false); cookiesRelevant(r, p) {
t.Error("no cookies → should not be relevant")
}
if r, p := mk(true, false); !cookiesRelevant(r, p) {
t.Error("set-cookie present → relevant")
}
if r, p := mk(false, true); !cookiesRelevant(r, p) {
t.Error("request cookie present → relevant")
}
}
// ── ja4 payload ──────────────────────────────────────────────────────────────
func TestBuildJA4Payload(t *testing.T) {
p := buildJA4Payload("198.51.100.9", "tlspersona", "secure.example.com",
[]string{"h2", "http/1.1"}, []uint16{4865, 4866, 49195})
var m map[string]any
if err := json.Unmarshal(p, &m); err != nil {
t.Fatalf("unmarshal: %v\n%s", err, p)
}
if m["sni"] != "secure.example.com" {
t.Errorf("sni = %v", m["sni"])
}
if m["client_ip"] != "198.51.100.9" {
t.Errorf("client_ip = %v", m["client_ip"])
}
if m["client_mac_hash"] != "tlspersona" {
t.Errorf("client_mac_hash = %v", m["client_mac_hash"])
}
alpn := toStrings(m["alpn_protocols"])
if !equalStrSet(alpn, []string{"h2", "http/1.1"}) {
t.Errorf("alpn = %v", alpn)
}
cs := m["cipher_suites"].([]any)
if len(cs) != 3 || int(cs[0].(float64)) != 4865 {
t.Errorf("cipher_suites = %v", cs)
}
// extensions: always null (stdlib doesn't expose them).
if !strings.Contains(string(p), `"extensions":null`) {
t.Errorf("expected extensions null, got: %s", p)
}
}
// Empty ALPN / ciphers → JSON empty arrays (mirrors list(... or [])), not null.
func TestBuildJA4PayloadEmptySlices(t *testing.T) {
p := buildJA4Payload("1.1.1.1", "h", "", nil, nil)
raw := string(p)
if !strings.Contains(raw, `"alpn_protocols":[]`) {
t.Errorf("alpn should be [] not null: %s", raw)
}
if !strings.Contains(raw, `"cipher_suites":[]`) {
t.Errorf("cipher_suites should be [] not null: %s", raw)
}
}
// ── gate wiring ──────────────────────────────────────────────────────────────
// The flag wires into Proxy.analysisRelay and gates emission.
func TestAnalysisRelayGate(t *testing.T) {
on := &Proxy{analysisRelay: true}
off := &Proxy{analysisRelay: false}
if !on.relayEnabled() {
t.Error("analysisRelay=true → relayEnabled() should be true")
}
if off.relayEnabled() {
t.Error("analysisRelay=false → relayEnabled() should be false")
}
}
// emitDPI/emitCookies/emitJA4 respect the gate: with analysisRelay=false they
// deliver nothing to a live socket; with it true they deliver.
func TestEmitGateRespected(t *testing.T) {
sock := filepath.Join(t.TempDir(), "dpi.sock")
ln, err := net.Listen("unix", sock)
if err != nil {
t.Fatal(err)
}
defer ln.Close()
hits := make(chan struct{}, 4)
go func() {
for {
c, err := ln.Accept()
if err != nil {
return
}
buf := make([]byte, 1024)
c.Read(buf)
c.Write([]byte("HTTP/1.1 204 No Content\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"))
c.Close()
hits <- struct{}{}
}
}()
// Gate off → nothing delivered.
off := &Proxy{analysisRelay: false}
off.relayEmit(sock, "/classify", []byte(`{"k":"v"}`))
select {
case <-hits:
t.Fatal("gate off but a payload was delivered")
case <-time.After(300 * time.Millisecond):
}
// Gate on → delivered.
on := &Proxy{analysisRelay: true}
on.relayEmit(sock, "/classify", []byte(`{"k":"v"}`))
select {
case <-hits:
case <-time.After(2 * time.Second):
t.Fatal("gate on but nothing delivered")
}
}
// ── socket-path consts ─────────────────────────────────────────────────────
func TestRelaySocketPaths(t *testing.T) {
if dpiSocket != "/run/secubox/dpi.sock" {
t.Errorf("dpiSocket = %q", dpiSocket)
}
if cookiesSocket != "/run/secubox/cookies.sock" {
t.Errorf("cookiesSocket = %q", cookiesSocket)
}
if ja4Socket != "/run/secubox/threat-analyst.sock" {
t.Errorf("ja4Socket = %q", ja4Socket)
}
}
// ── test helpers ───────────────────────────────────────────────────────────
func toStrings(v any) []string {
arr, ok := v.([]any)
if !ok {
return nil
}
out := make([]string, 0, len(arr))
for _, e := range arr {
out = append(out, e.(string))
}
return out
}
func equalStrSet(got, want []string) bool {
if len(got) != len(want) {
return false
}
seen := map[string]int{}
for _, g := range got {
seen[g]++
}
for _, w := range want {
seen[w]--
}
for _, n := range seen {
if n != 0 {
return false
}
}
return true
}
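
The null-versus-empty-array expectations asserted above (TestBuildDPIPayloadNullUserAgent, TestBuildJA4PayloadEmptySlices) follow from how encoding/json treats nil pointers and nil slices. A minimal standalone sketch of that behaviour; the `hello` type and `encode` helper are hypothetical and not part of the relay code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hello is a hypothetical stand-in with one pointer field and one slice field.
type hello struct {
	UserAgent *string  `json:"user_agent"`
	ALPN      []string `json:"alpn_protocols"`
}

// encode marshals a hello. An empty ua leaves the pointer nil (JSON null).
// With normalize set, a nil slice is replaced by an empty one so it encodes
// as [] instead of null.
func encode(ua string, alpn []string, normalize bool) string {
	h := hello{ALPN: alpn}
	if ua != "" {
		h.UserAgent = &ua
	}
	if normalize && h.ALPN == nil {
		h.ALPN = []string{}
	}
	b, _ := json.Marshal(h)
	return string(b)
}

func main() {
	fmt.Println(encode("", nil, false))                 // nil slice left as-is
	fmt.Println(encode("", nil, true))                  // nil slice normalised
	fmt.Println(encode("curl/8", []string{"h2"}, true)) // both fields populated
}
```

This is why buildJA4Payload swaps nil for an empty slice before marshalling, while the DPI builder deliberately keeps a nil *string for an absent User-Agent.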

@ -0,0 +1,189 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: policy live-reload tests (#662 auto-learn loop)
//
// The #662 Go cutover loaded the BLOCK/SPLICE lists ONCE at startup, so an
// autolearn promotion (or a manual edit) of learned-trackers.txt never took
// effect until a worker restart — the very thing that made new adwares slip
// through forever. These tests prove the mtime-based live-reload: after the
// throttle window, a host appended to learned-trackers.txt flips Decide from
// "mitm" to "block" with NO restart. Concurrency is exercised under -race.
package main
import (
"os"
"path/filepath"
"sync"
"sync/atomic"
"testing"
"time"
)
// writeFile is a tiny helper that (re)writes a backing list file with content.
func writeFile(t *testing.T, path, content string) {
t.Helper()
if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
t.Fatalf("write %s: %v", path, err)
}
}
// bumpMtime forces the file's mtime forward so the reload's stat sees a change
// even on coarse-granularity filesystems or sub-second test runs.
func bumpMtime(t *testing.T, path string, d time.Duration) {
t.Helper()
ft := time.Now().Add(d)
if err := os.Chtimes(path, ft, ft); err != nil {
t.Fatalf("chtimes %s: %v", path, err)
}
}
// TestMaybeReloadPicksUpAppendedLearnedTracker is the linchpin test: a host that
// initially Decides "mitm" must flip to "block" once it is appended to
// learned-trackers.txt and the throttle window elapses — without reloading the
// Policy from scratch.
func TestMaybeReloadPicksUpAppendedLearnedTracker(t *testing.T) {
dir := t.TempDir()
learned := filepath.Join(dir, "learned-trackers.txt")
allow := filepath.Join(dir, "ad-allowlist.txt")
writeFile(t, learned, "")
writeFile(t, allow, "")
pol, err := LoadPolicy(PolicyOpts{
LearnedPath: learned,
AllowPath: allow,
// keep the splice/never paths in the temp dir so missing-file behaviour
// (empty set) is deterministic.
SpliceSeedPath: filepath.Join(dir, "seed"),
SpliceLearnPath: filepath.Join(dir, "slearn"),
PureTrackersPath: filepath.Join(dir, "pure"),
SelfDomains: []string{"secubox.in"},
})
if err != nil {
t.Fatalf("LoadPolicy: %v", err)
}
// Make the reload eager for the test (no 15s wait): zero throttle.
pol.reloadThrottle = 0
const host = "acotedemoi.com"
if got := pol.Decide(host, host); got != "mitm" {
t.Fatalf("before promotion: Decide(%q) = %q, want mitm", host, got)
}
// Promote: append the host and bump mtime forward.
writeFile(t, learned, host+"\n")
bumpMtime(t, learned, 2*time.Second)
if got := pol.Decide(host, host); got != "block" {
t.Fatalf("after promotion: Decide(%q) = %q, want block", host, got)
}
}
// TestMaybeReloadThrottled proves the throttle: with a non-zero throttle window,
// a change made just after a reload is NOT observed until the window elapses,
// keeping the hot path cheap (one stat per ~window, not per request).
func TestMaybeReloadThrottled(t *testing.T) {
dir := t.TempDir()
learned := filepath.Join(dir, "learned-trackers.txt")
writeFile(t, learned, "")
pol, err := LoadPolicy(PolicyOpts{LearnedPath: learned, AllowPath: filepath.Join(dir, "allow")})
if err != nil {
t.Fatalf("LoadPolicy: %v", err)
}
pol.reloadThrottle = time.Hour // effectively "never re-stat during the test"
// Prime the throttle clock with one Decide (does the initial stat).
_ = pol.Decide("x.example", "x.example")
const host = "tracker.example"
writeFile(t, learned, host+"\n")
bumpMtime(t, learned, 2*time.Second)
if got := pol.Decide(host, host); got != "mitm" {
t.Fatalf("throttled: Decide(%q) = %q, want mitm (change not yet observed)", host, got)
}
}
// TestMaybeReloadAllowlist proves the allowlist file is live-reloaded too: a
// host the ad-host regex would block ("doubleclick.net") flips block→allow once
// appended to the allowlist and the window elapses.
func TestMaybeReloadAllowlist(t *testing.T) {
dir := t.TempDir()
learned := filepath.Join(dir, "learned-trackers.txt")
allow := filepath.Join(dir, "ad-allowlist.txt")
writeFile(t, learned, "")
writeFile(t, allow, "")
pol, err := LoadPolicy(PolicyOpts{LearnedPath: learned, AllowPath: allow})
if err != nil {
t.Fatalf("LoadPolicy: %v", err)
}
pol.reloadThrottle = 0
const host = "doubleclick.net"
if got := pol.Decide(host, host); got != "block" {
t.Fatalf("before allow: Decide(%q) = %q, want block", host, got)
}
writeFile(t, allow, host+"\n")
bumpMtime(t, allow, 2*time.Second)
if got := pol.Decide(host, host); got != "allow" {
t.Fatalf("after allow: Decide(%q) = %q, want allow", host, got)
}
}
// TestMaybeReloadConcurrent runs Decide from many goroutines while the backing
// learned file is rewritten concurrently. Under `go test -race` this proves the
// RWMutex-guarded swap is data-race-free.
func TestMaybeReloadConcurrent(t *testing.T) {
dir := t.TempDir()
learned := filepath.Join(dir, "learned-trackers.txt")
writeFile(t, learned, "seed.example\n")
pol, err := LoadPolicy(PolicyOpts{LearnedPath: learned, AllowPath: filepath.Join(dir, "allow")})
if err != nil {
t.Fatalf("LoadPolicy: %v", err)
}
pol.reloadThrottle = 0 // force a stat on every Decide → maximal contention
var wg sync.WaitGroup
var blocks int64
stop := make(chan struct{})
// Writer: keep appending hosts + bumping mtime.
wg.Add(1)
go func() {
defer wg.Done()
i := 0
for {
select {
case <-stop:
return
default:
}
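// writeFile/bumpMtime call t.Fatalf, which must only run on the test
// goroutine; from this writer goroutine report with t.Errorf and stop.
if err := os.WriteFile(learned, []byte("seed.example\nh"+itoa(i)+".example\n"), 0o644); err != nil {
t.Errorf("write %s: %v", learned, err)
return
}
ft := time.Now().Add(time.Duration(i+1) * time.Second)
if err := os.Chtimes(learned, ft, ft); err != nil {
t.Errorf("chtimes %s: %v", learned, err)
return
}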
i++
}
}()
// Readers: hammer Decide on the seed (stable → always block) + a live host.
for r := 0; r < 8; r++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := 0; j < 2000; j++ {
if pol.Decide("seed.example", "seed.example") == "block" {
atomic.AddInt64(&blocks, 1)
}
pol.Decide("h0.example", "h0.example")
}
}()
}
time.Sleep(50 * time.Millisecond)
close(stop)
wg.Wait()
if blocks == 0 {
t.Fatal("expected the stable seed host to block at least once")
}
}
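
The mtime-based, throttled live-reload these tests exercise can be sketched in a self-contained form. This is an illustration under assumptions, not the Policy implementation: `reloadingSet`, `maybeReload`, `contains` and `demo` are hypothetical names, and the real code may structure its locking differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"sync"
	"time"
)

// reloadingSet is a hypothetical stand-in for a file-backed host list.
type reloadingSet struct {
	mu        sync.RWMutex
	path      string
	throttle  time.Duration
	lastCheck time.Time
	lastMod   time.Time
	hosts     map[string]bool
}

// maybeReload re-stats the file at most once per throttle window and swaps in
// a freshly parsed set only when the mtime has moved forward.
func (s *reloadingSet) maybeReload() {
	s.mu.Lock()
	defer s.mu.Unlock()
	now := time.Now()
	if !s.lastCheck.IsZero() && now.Sub(s.lastCheck) < s.throttle {
		return // still inside the throttle window: no stat
	}
	s.lastCheck = now
	fi, err := os.Stat(s.path)
	if err != nil || !fi.ModTime().After(s.lastMod) {
		return // missing file or unchanged mtime: keep the current set
	}
	data, err := os.ReadFile(s.path)
	if err != nil {
		return
	}
	next := map[string]bool{}
	for _, line := range strings.Split(string(data), "\n") {
		if h := strings.TrimSpace(line); h != "" {
			next[h] = true
		}
	}
	s.hosts, s.lastMod = next, fi.ModTime()
}

// contains checks for a reload, then reads the set under the read lock.
func (s *reloadingSet) contains(host string) bool {
	s.maybeReload()
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.hosts[host]
}

// demo looks a host up before and after it is appended to the backing file.
func demo() (before, after bool, err error) {
	dir, err := os.MkdirTemp("", "reload")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "learned.txt")
	if err = os.WriteFile(path, nil, 0o644); err != nil {
		return false, false, err
	}
	s := &reloadingSet{path: path} // zero throttle: stat on every lookup
	before = s.contains("tracker.example")
	if err = os.WriteFile(path, []byte("tracker.example\n"), 0o644); err != nil {
		return before, false, err
	}
	ft := time.Now().Add(2 * time.Second) // force the mtime forward
	if err = os.Chtimes(path, ft, ft); err != nil {
		return before, false, err
	}
	after = s.contains("tracker.example")
	return before, after, nil
}

func main() {
	fmt.Println(demo())
}
```

The sketch takes the write lock on every lookup for simplicity; a production hot path would likely check the throttle clock more cheaply and only take the write lock when a swap is actually needed.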

@ -18,7 +18,9 @@
// avatar → /run/secubox/avatar.sock POST /fingerprint
// ja4 → /run/secubox/threat-analyst.sock POST /ja4
// soc_relay → /run/secubox/soc.sock POST /event
-// social_graph: in-process (no socket) — correlated inside the engine, not emitted.
+// social_graph: correlated in-process (social.go) — edges (hash-only, never raw
+// cookie values) are NOT emitted to a module socket but POSTed to the portal
+// /__toolbox/social-event ingest (the social store lives in the toolbox/portal).
//
// emit takes the full socket PATH (not an http+unix:// URL) plus the route in
// the payload's destination; callers build the path from the table above.

@ -0,0 +1,605 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: cross-site cookie-tracker correlation (#662)
//
// Restores the kbin "/social" cross-site tracker graph, frozen since the #662
// Phase-7 cutover decommissioned the in-process Python `social_graph` addon
// (packages/secubox-toolbox/mitmproxy_addons/social_graph.py). The graph reads
// social_nodes/social_links in toolbox.db, folded from raw social_edges — and
// the edges stopped flowing when the Python addon was retired.
//
// This is a FAITHFUL Go port of the addon's correlation logic:
// - cookieIDHash : byte-exact port of social.cookie_id_hash (Python = source
// of truth, proven by social_test.go ↔ tests/test_social_parity.py over a
// shared fixture — the same anti-rig discipline as jar.go).
// - isDenyListed + the _DEFAULT_DENY_COOKIES set (social.py).
// - registrableSocial : the addon's _registrable_domain eTLD+1 helper
// (DIFFERENT from policy.go's registrable() — IP literals pass through,
// no port strip, a larger multi-label-TLD table; the graph correctness
// depends on this exact flavour, so it is replicated verbatim and NOT
// consolidated with policy.registrable).
// - the 3rd-party decision (tracker_domain != src_site on eTLD+1) on BOTH the
// response Set-Cookie path and the request Cookie path, mirroring the
// addon's response()+request hooks.
// - the CMP consent-platform detection → consent_state ∈ {none_seen,
// pre_consent, post_consent} via a per-(peer,site) in-memory log.
//
// Privacy/CSPN invariant (the reason the original ran in-process): raw cookie
// VALUES NEVER leave the engine — only the truncated SHA-256 cookieIDHash is
// emitted. The edges are POSTed fire-and-forget to the portal's
// /__toolbox/social-event ingest (sibling of /__toolbox/ad-event), which calls
// social.record_edge(). Best-effort throughout; a dead/slow portal can never
// block or delay a client flow.
//
// Pure standard library — no external modules, no go.sum.
package main
import (
"bytes"
"crypto/sha256"
"encoding/hex"
"encoding/json"
"log"
"net/http"
"strings"
"sync"
"time"
)
// ── registrableSocial: port of social_graph._registrable_domain ─────────────
//
// Python (mitmproxy_addons/social_graph.py):
//
// h = (host or "").lower().strip(".")
// if not h or h.replace(".", "").isdigit(): return h # raw IP → as-is
// parts = h.split(".")
// if len(parts) < 2: return h
// last_two = ".".join(parts[-2:])
// if last_two in _MULTI_LABEL_TLDS and len(parts) >= 3: return ".".join(parts[-3:])
// return last_two
//
// This DIFFERS from policy.registrable (ad_ghost flavour): no port strip, IP
// literals pass through unchanged (the store later drops IP trackers via
// _is_ip), and the multi-label-TLD table below is the addon's larger set. The
// graph's 3rd-party comparison is done with THIS function, so it must match the
// addon exactly.
var socialMultiLabelTLDs = map[string]bool{
"co.uk": true, "ac.uk": true, "gov.uk": true, "org.uk": true, "net.uk": true,
"co.jp": true, "ne.jp": true, "ac.jp": true,
"com.au": true, "net.au": true, "org.au": true,
"com.br": true, "com.cn": true, "com.hk": true, "com.tw": true, "com.mx": true,
}
func registrableSocial(host string) string {
h := strings.Trim(strings.ToLower(host), ".")
if h == "" {
return h
}
// h.replace(".","").isdigit() → all-digit (IPv4-ish) → return as-is.
if isAllDigits(strings.ReplaceAll(h, ".", "")) {
return h
}
parts := strings.Split(h, ".")
if len(parts) < 2 {
return h
}
last2 := strings.Join(parts[len(parts)-2:], ".")
if socialMultiLabelTLDs[last2] && len(parts) >= 3 {
return strings.Join(parts[len(parts)-3:], ".")
}
return last2
}
// ── cookieIDHash: BYTE-EXACT port of social.cookie_id_hash ───────────────────
//
// Python (secubox_toolbox/social.py):
//
// h = sha256()
// h.update(tracker_domain.lower().encode("utf-8","replace")); h.update(b"\x00")
// h.update(cookie_name.lower().encode("utf-8","replace")); h.update(b"\x00")
// h.update(cookie_value.encode("utf-8","replace"))
// return h.hexdigest()[:16]
//
// CRITICAL: tracker_domain + cookie_name are LOWER-cased; the cookie_value is
// NOT. NUL (0x00) separators between the three fields. Go strings are already
// UTF-8, and strings.ToLower is byte-identical to Python str.lower for the
// ASCII + Latin domain/name inputs the fixtures exercise (incl. the Ünîcödé
// case, verified at parity). hex of the first 8 digest bytes == hexdigest()[:16].
func cookieIDHash(trackerDomain, cookieName, cookieValue string) string {
h := sha256.New()
h.Write([]byte(strings.ToLower(trackerDomain)))
h.Write([]byte{0x00})
h.Write([]byte(strings.ToLower(cookieName)))
h.Write([]byte{0x00})
h.Write([]byte(cookieValue)) // value NOT lower-cased
sum := h.Sum(nil)
return hex.EncodeToString(sum)[:16]
}
// ── deny-list: port of social._DEFAULT_DENY_COOKIES + is_deny_listed ─────────
//
// Names whose presence on a flow is NEVER recorded as a tracker identifier
// (session / csrf / auth / cloudflare / consent / locale). Replicated verbatim
// from social.py; matched case-insensitively after trimming.
var socialDenyCookies = map[string]bool{
// session
"phpsessid": true, "jsessionid": true, "asp.net_sessionid": true, "ci_session": true,
"express.sid": true, "connect.sid": true, "sails.sid": true, "django_session": true,
"laravel_session": true, "flask_session": true, "session": true, "sessionid": true,
// csrf
"_csrf": true, "_csrf_token": true, "xsrf-token": true, "csrftoken": true, "csrf": true,
"x-csrf-token": true, "anti-csrf-token": true,
// auth (1st-party)
"auth": true, "auth_token": true, "access_token": true, "refresh_token": true, "bearer": true,
"remember_token": true, "remember_me": true, "_oauth2_proxy": true,
// cloudflare / consent / locale (low signal)
"__cf_bm": true, "cf_clearance": true, "consent": true, "cookieconsent_status": true,
"locale": true, "lang": true, "language": true, "_locale": true,
}
// isDenyListed mirrors social.is_deny_listed (default-deny set only; the engine
// does not load the TOML extra_deny override). An empty name is deny-listed
// (Python returns True for a blank name).
func isDenyListed(cookieName string) bool {
name := strings.ToLower(strings.TrimSpace(cookieName))
if name == "" {
return true
}
return socialDenyCookies[name]
}
// ── cookie parsers: port of _parse_set_cookie / _parse_cookie_header ─────────
// parseSetCookieNameValue mirrors social_graph._parse_set_cookie: name=value is
// the text up to the first ';'; the name is everything before the first '=',
// trimmed; the value is the rest of that first field, trimmed. Returns ok=false
// for an attribute-only / nameless / empty line.
func parseSetCookieNameValue(header string) (name, value string, ok bool) {
field := header
if i := strings.IndexByte(field, ';'); i >= 0 {
field = field[:i]
}
eq := strings.IndexByte(field, '=')
if eq < 0 {
return "", "", false
}
name = strings.TrimSpace(field[:eq])
value = strings.TrimSpace(field[eq+1:])
if name == "" {
return "", "", false
}
return name, value, true
}
// cookiePair is one (name,value) parsed from a request Cookie header.
type cookiePair struct{ name, value string }
// parseCookieHeader mirrors social_graph._parse_cookie_header: split on ';',
// each "name=value" yields a trimmed (name,value); nameless pairs are dropped.
func parseCookieHeader(header string) []cookiePair {
var out []cookiePair
for _, part := range strings.Split(header, ";") {
eq := strings.IndexByte(part, '=')
if eq < 0 {
continue
}
name := strings.TrimSpace(part[:eq])
value := strings.TrimSpace(part[eq+1:])
if name != "" {
out = append(out, cookiePair{name: name, value: value})
}
}
return out
}
// extractSetCookieDomainAttr mirrors social_graph._extract_domain_attr: pull the
// "; Domain=…" attribute from a Set-Cookie line, trimmed, leading dot stripped,
// lower-cased. Returns "" when absent.
func extractSetCookieDomainAttr(setCookie string) string {
low := strings.ToLower(setCookie)
idx := strings.Index(low, "domain")
for idx >= 0 {
// require it to be an attribute (preceded by ';' after optional spaces),
// mirroring the Python regex `;\s*domain\s*=`.
j := idx + len("domain")
// skip spaces, then '='
k := j
for k < len(low) && (low[k] == ' ' || low[k] == '\t') {
k++
}
if k < len(low) && low[k] == '=' {
// confirm a ';' (or start) precedes `domain` (after spaces).
p := idx - 1
for p >= 0 && (low[p] == ' ' || low[p] == '\t') {
p--
}
if p < 0 || low[p] == ';' {
// Index and slice low, not setCookie: ToLower can change byte lengths
// for some non-ASCII input, so offsets found in low are only valid in low.
rest := low[k+1:]
if e := strings.IndexByte(rest, ';'); e >= 0 {
rest = rest[:e]
}
val := strings.TrimLeft(strings.TrimSpace(rest), ".")
if val == "" {
return ""
}
return val
}
}
next := strings.Index(low[idx+1:], "domain")
if next < 0 {
return ""
}
idx = idx + 1 + next
}
return ""
}
// srcSiteFromReferer mirrors social_graph._src_site_from_referer: take Referer
// (else Origin), strip scheme/path/query, return registrableSocial of the host.
func srcSiteFromReferer(req *http.Request) string {
ref := req.Header.Get("Referer")
if ref == "" {
ref = req.Header.Get("Origin")
}
if ref == "" {
return ""
}
s := ref
if i := strings.Index(s, "://"); i >= 0 {
s = s[i+3:]
}
if i := strings.IndexByte(s, '/'); i >= 0 {
s = s[:i]
}
if i := strings.IndexByte(s, '?'); i >= 0 {
s = s[:i]
}
return registrableSocial(s)
}
// ── consent-state detection: port of the _consent_log machinery ──────────────
//
// CMP (Consent Management Platform) cookie name prefixes + loader URL fragments,
// verbatim from social_graph._CMP_COOKIE_PREFIXES / _CMP_LOADER_FRAGMENTS. Seen
// on a flow → the site runs a CMP (has_cmp) and, for a cookie, consent recorded
// (consented). consent_state classifies a tracker edge as pre/post/none-consent.
var cmpCookiePrefixes = []string{
"optanonconsent", "onetrustconsent", "optanonalertboxclosed", // OneTrust
"didomi_token", "euconsent-v2", // Didomi / IAB TCF
"__qca", "quantcast", // Quantcast
"sp_choice", "consentuid", "_sp_", // Sourcepoint
}
var cmpLoaderFragments = []string{
"cdn.cookielaw.org", "onetrust.com", // OneTrust
"sdk.privacy-center.org", "didomi.io", // Didomi
"quantcast.mgr.consensu.org", "quantcast.com/choice", // Quantcast
"sourcepoint.mgr.consensu.org", "sp-prod.net", // Sourcepoint
}
// consentObservation is the per-(peer,site) state, mirroring the Python dict
// {"has_cmp": bool, "consented": bool}.
type consentObservation struct {
hasCMP bool
consented bool
}
// consentKey mirrors social_graph._consent_key = (mac_hash, site).
type consentKey struct{ macHash, site string }
// consentLog is the bounded in-memory per-(peer,site) observation log, mirroring
// the module-level _consent_log + its 20k soft-cap wholesale clear. The Go proxy
// is genuinely concurrent (Python relied on the GIL), so all access is
// mutex-guarded.
type consentLog struct {
mu sync.Mutex
log map[consentKey]consentObservation
}
const consentLogCap = 20000 // mirrors social_graph._consent_log soft cap
func newConsentLog() *consentLog {
return &consentLog{log: map[consentKey]consentObservation{}}
}
// update mirrors social_graph._update_consent_log: observe whether this flow
// reveals a CMP loader (URL fragment, both request and response side) or a CMP
// cookie (either direction) for the (peer,site) pair, and fold it into the log.
// - url is flow.request.pretty_url (lower-cased here).
// - cookieBlobs are the raw request Cookie + response Set-Cookie header lines.
func (cl *consentLog) update(macHash, site, url string, cookieBlobs []string) {
cl.mu.Lock()
defer cl.mu.Unlock()
if len(cl.log) > consentLogCap {
cl.log = map[consentKey]consentObservation{}
}
key := consentKey{macHash: macHash, site: site}
st := cl.log[key]
lurl := strings.ToLower(url)
for _, frag := range cmpLoaderFragments {
if strings.Contains(lurl, frag) {
st.hasCMP = true
break
}
}
for _, blob := range cookieBlobs {
low := strings.ToLower(blob)
for _, pref := range cmpCookiePrefixes {
if strings.Contains(low, pref) {
st.hasCMP = true
st.consented = true
break
}
}
}
cl.log[key] = st
}
// stateFor mirrors social_graph._consent_state_for: post_consent if a consent
// cookie was seen here, pre_consent if a CMP is present but no consent cookie
// yet, none_seen otherwise.
func (cl *consentLog) stateFor(macHash, site string) string {
cl.mu.Lock()
defer cl.mu.Unlock()
st, ok := cl.log[consentKey{macHash: macHash, site: site}]
if !ok {
return "none_seen"
}
if st.consented {
return "post_consent"
}
if st.hasCMP {
return "pre_consent"
}
return "none_seen"
}
// ── edge extraction: port of SocialGraph.response()+request() hook logic ──────
// socialEdge is one cross-site tracker edge, mirroring the kwargs the Python
// social.record_edge accepts; serialised straight into the ingest batch.
type socialEdge struct {
ClientMacHash string `json:"client_mac_hash"`
SrcSite string `json:"src_site"`
TrackerDomain string `json:"tracker_domain"`
CookieIDHashVal string `json:"cookie_id_hash_val"`
JA4Hash string `json:"ja4_hash,omitempty"`
ConsentState string `json:"consent_state"`
}
// socialEdgesFor extracts the cross-site tracker edges for ONE MITM'd flow,
// mirroring SocialGraph.response() + the request-Cookie tail. It performs no
// I/O (its only side effect is updating the in-memory consent log cl); the
// caller emits the returned edges. macHash MUST be the WG persona hash (the
// addon only fires for known R3 peers; an empty macHash yields no edges).
// reqHost is flow.request.host; reqURL is flow.request.pretty_url (for CMP
// loader detection); ja4 is the captured fingerprint (may be "").
//
// Decision logic, faithful to the addon:
// - src_site = registrableSocial(reqHost); skip if empty.
// - update the consent log for (macHash, src_site), derive consent_state.
// - Set-Cookie path (first 50): for each non-deny-listed cookie, tracker_domain
// = registrableSocial(Domain= attr OR reqHost); emit IFF tracker_domain != ""
// and != src_site (3rd-party).
// - Cookie path: only when a Referer/Origin context site exists and differs
// from the tracker (= registrableSocial(reqHost)); cap 5 Cookie headers ×
// 50 pairs; emit per non-deny-listed cookie with the context site's
// consent_state.
func socialEdgesFor(macHash string, req *http.Request, resp *http.Response, reqHost, reqURL, ja4 string, cl *consentLog) []socialEdge {
if macHash == "" || cl == nil {
return nil
}
srcSite := registrableSocial(reqHost)
if srcSite == "" {
return nil
}
// Gather the cookie blobs (both directions) for the CMP cookie check, then
// fold the consent observation BEFORE deriving consent_state (matches the
// addon's ordering: _update_consent_log then _consent_state_for).
var setCookies []string
if resp != nil {
setCookies = resp.Header.Values("Set-Cookie")
}
var reqCookies []string
if req != nil {
reqCookies = req.Header.Values("Cookie")
}
blobs := make([]string, 0, len(reqCookies)+len(setCookies))
blobs = append(blobs, reqCookies...)
blobs = append(blobs, setCookies...)
cl.update(macHash, srcSite, reqURL, blobs)
consentState := cl.stateFor(macHash, srcSite)
var edges []socialEdge
// Set-Cookie path — first 50 lines (matches the addon's [:50]).
for i, sc := range setCookies {
if i >= 50 {
break
}
name, value, ok := parseSetCookieNameValue(sc)
if !ok || isDenyListed(name) {
continue
}
domainAttr := extractSetCookieDomainAttr(sc)
issuer := domainAttr
if issuer == "" {
issuer = reqHost
}
trackerDomain := registrableSocial(issuer)
if trackerDomain == "" || trackerDomain == srcSite {
continue // 1st-party Set-Cookie: not a cross-site tracker signal.
}
edges = append(edges, socialEdge{
ClientMacHash: macHash,
SrcSite: srcSite,
TrackerDomain: trackerDomain,
CookieIDHashVal: cookieIDHash(trackerDomain, name, value),
JA4Hash: ja4,
ConsentState: consentState,
})
}
// Request-Cookie path — only when this request is itself for a 3rd-party
// tracker and we have a differing 1st-party context from the Referer/Origin.
if len(reqCookies) == 0 {
return edges
}
trackerDomain := registrableSocial(reqHost)
if trackerDomain == "" {
return edges
}
ctxSite := srcSiteFromReferer(req)
if ctxSite == "" || ctxSite == trackerDomain {
return edges
}
ctxConsent := cl.stateFor(macHash, ctxSite)
for i, hdr := range reqCookies {
if i >= 5 { // addon caps Cookie headers at [:5]
break
}
pairs := parseCookieHeader(hdr)
for j, p := range pairs {
if j >= 50 { // and pairs at [:50]
break
}
if isDenyListed(p.name) {
continue
}
edges = append(edges, socialEdge{
ClientMacHash: macHash,
SrcSite: ctxSite,
TrackerDomain: trackerDomain,
CookieIDHashVal: cookieIDHash(trackerDomain, p.name, p.value),
JA4Hash: ja4,
ConsentState: ctxConsent,
})
}
}
return edges
}
// ── relay: batch + POST to the portal /__toolbox/social-event ingest ─────────
const (
socialFlushInterval = 10 * time.Second // drain cadence (sibling of adFlushInterval)
socialBatchCap = 5000 // max edges held between flushes (drop excess)
)
// socialEventPayload mirrors the portal /__toolbox/social-event JSON contract.
type socialEventPayload struct {
Edges []socialEdge `json:"edges"`
}
func (p socialEventPayload) empty() bool { return len(p.Edges) == 0 }
// socialRelay buffers extracted edges and flushes them to the portal. Bounded:
// once the buffer holds socialBatchCap edges, NEW edges are dropped until the
// next flush clears it (a dead portal can never grow memory unbounded). Edges
// carry ONLY the cookieIDHash — never raw values (privacy/CSPN).
type socialRelay struct {
mu sync.Mutex
buf []socialEdge
}
func newSocialRelay() *socialRelay { return &socialRelay{} }
// add appends edges to the buffer under the cap. Never blocks the flow.
func (s *socialRelay) add(edges ...socialEdge) {
if len(edges) == 0 {
return
}
s.mu.Lock()
defer s.mu.Unlock()
for _, e := range edges {
if len(s.buf) >= socialBatchCap {
return
}
s.buf = append(s.buf, e)
}
}
// snapshot atomically reads-and-clears the buffer.
func (s *socialRelay) snapshot() socialEventPayload {
s.mu.Lock()
defer s.mu.Unlock()
if len(s.buf) == 0 {
return socialEventPayload{}
}
p := socialEventPayload{Edges: s.buf}
s.buf = nil
return p
}
// socialEventClient is the short-timeout fire-and-forget client for the
// social-event POST (sibling of adEventClient). Never follows redirects (SSRF
// hygiene); tight timeout so a slow portal can't stall the flusher.
var socialEventClient = &http.Client{
Timeout: 5 * time.Second,
CheckRedirect: func(*http.Request, []*http.Request) error { return http.ErrUseLastResponse },
}
// flushOnce snapshots the buffer and, if non-empty, POSTs it to the portal's
// /__toolbox/social-event ingest. Best-effort: any error is swallowed with at
// most a log line — the engine must never block on the portal. Returns the
// flushed payload so the test can assert the snapshot/clear + shape.
func (s *socialRelay) flushOnce(portal string) socialEventPayload {
p := s.snapshot()
if p.empty() {
return p
}
buf, err := json.Marshal(p)
if err != nil {
log.Printf("social-event marshal failed: %v", err)
return p
}
url := portalTargetURL(portal, "/__toolbox/social-event")
resp, err := socialEventClient.Post(url, "application/json", bytes.NewReader(buf))
if err != nil {
log.Printf("social-event post failed for %s: %v", url, err)
return p
}
resp.Body.Close()
return p
}
// ── proxy wiring ──────────────────────────────────────────────────────────
// socialEnabled reports whether cross-site correlation is on (--social-relay →
// Proxy.socialRelayOn, with the buffer + consent log allocated). Nil-safe so the
// CONNECT PoC / tests that build a bare Proxy can call it.
func (px *Proxy) socialEnabled() bool {
return px != nil && px.socialRelayOn && px.social != nil && px.consent != nil
}
// emitSocial extracts the cross-site tracker edges for a MITM'd flow and buffers
// them for the batched portal POST. clientIP is the client's peer IP; the
// per-client identity is the WG persona hash (macHashOf), NOT the raw-IP
// fallback, so non-WG flows produce no edges, exactly like the Python addon's
// _client_mac_hash gate. Gated and non-blocking: the only side effects are the
// in-memory consent-log update and a bounded buffer append, each under a short
// mutex. reqURL feeds the CMP loader-fragment check; no ja4 is passed here.
func (px *Proxy) emitSocial(clientIP, host string, req *http.Request, resp *http.Response) {
if !px.socialEnabled() || req == nil {
return
}
macHash := macHashOf(clientIP)
if macHash == "" {
return // known R3 WG peers only (addon: `if not mac_hash: return`)
}
reqURL := req.URL.String()
edges := socialEdgesFor(macHash, req, resp, host, reqURL, "", px.consent)
px.social.add(edges...)
}
// runFlusher is the background flusher goroutine: every socialFlushInterval it
// drains the buffer to the portal. Start once from main(); runs for the process
// lifetime.
func (s *socialRelay) runFlusher(portal string) {
t := time.NewTicker(socialFlushInterval)
defer t.Stop()
for range t.C {
s.flushOnce(portal)
}
}
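
As a cross-check on the hand-rolled scan in extractSetCookieDomainAttr, the rule it mirrors (`;\s*domain\s*=`, plus the start-of-string case the Go code also accepts) can be sketched with the standard regexp package. This is an illustrative sketch only: domainAttrRE and domainAttrSketch are hypothetical names that do not exist in the engine, and the engine's own scanner remains the reference.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// domainAttrRE (hypothetical) matches a Domain attribute that starts the
// header or follows a ';', case-insensitively, and captures its raw value.
var domainAttrRE = regexp.MustCompile(`(?i)(?:^|;)[ \t]*domain[ \t]*=([^;]*)`)

// domainAttrSketch returns the lower-cased Domain value without surrounding
// whitespace or leading dots, or "" when no Domain attribute is present.
func domainAttrSketch(setCookie string) string {
	m := domainAttrRE.FindStringSubmatch(setCookie)
	if m == nil {
		return ""
	}
	return strings.ToLower(strings.TrimLeft(strings.TrimSpace(m[1]), "."))
}

func main() {
	for _, sc := range []string{
		"IDE=x; Domain=.doubleclick.net; Path=/",
		"a=b; domain=Example.COM",
		"a=domainlike=1; Path=/", // "domain" inside a value is not the attribute
	} {
		fmt.Printf("%q -> %q\n", sc, domainAttrSketch(sc))
	}
}
```

The regex form is shorter but allocates per call; a byte scan like the one in the engine avoids that on a hot path, which may be why the port is hand-rolled.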

@ -0,0 +1,297 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// Cross-engine SOCIAL parity + decision harness — Go side (#662).
//
// Anti-rig: loads testdata/social-cookie-id-fixtures.json (GENERATED by the real
// secubox_toolbox.social.cookie_id_hash) and asserts cookieIDHash reproduces
// every `expect` byte-for-byte — Python is the source of truth, exactly like the
// jar parity harness. The Python side is tests/test_social_parity.py.
//
// The rest exercises the ported decision surface: deny-list, registrableSocial
// (the addon flavour, NOT policy.registrable), the 3rd-party Set-Cookie + Cookie
// edge extraction, consent_state classification, and the relay buffer/flush.
package main
import (
"encoding/json"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"testing"
)
type socialCookieFixture struct {
TrackerDomain string `json:"tracker_domain"`
CookieName string `json:"cookie_name"`
CookieValue string `json:"cookie_value"`
Expect string `json:"expect"`
Why string `json:"why"`
}
type socialCookieFile struct {
Fixtures []socialCookieFixture `json:"fixtures"`
}
// TestCookieIDHashParity: cookieIDHash == the Python-generated expect for every
// fixture. This is the anti-rig that proves the Go hash is byte-identical to
// social.cookie_id_hash (lower-case domain+name, raw value, NUL separators).
func TestCookieIDHashParity(t *testing.T) {
dir := testdataDir(t)
raw, err := os.ReadFile(filepath.Join(dir, "social-cookie-id-fixtures.json"))
if err != nil {
t.Fatalf("read social fixtures: %v", err)
}
var f socialCookieFile
if err := json.Unmarshal(raw, &f); err != nil {
t.Fatalf("parse social fixtures: %v", err)
}
if len(f.Fixtures) == 0 {
t.Fatal("no social cookie-id fixtures")
}
for _, fx := range f.Fixtures {
got := cookieIDHash(fx.TrackerDomain, fx.CookieName, fx.CookieValue)
if got != fx.Expect {
t.Errorf("cookieIDHash(%q,%q,%q)=%q want %q (%s)",
fx.TrackerDomain, fx.CookieName, fx.CookieValue, got, fx.Expect, fx.Why)
}
}
}
// TestCookieIDHashFolding: domain+name are lower-cased but the value is NOT —
// the explicit invariant the store contract pins.
func TestCookieIDHashFolding(t *testing.T) {
if cookieIDHash("DoubleClick.NET", "IDE", "AbC") != cookieIDHash("doubleclick.net", "ide", "AbC") {
t.Error("domain+name must be case-folded")
}
if cookieIDHash("d.net", "n", "AbC") == cookieIDHash("d.net", "n", "abc") {
t.Error("value must NOT be case-folded")
}
}
func TestIsDenyListed(t *testing.T) {
deny := []string{"PHPSESSID", "session", " csrftoken ", "__cf_bm", "consent", "locale", "", " "}
for _, n := range deny {
if !isDenyListed(n) {
t.Errorf("isDenyListed(%q) = false, want true", n)
}
}
allow := []string{"IDE", "_ga", "_fbp", "uid", "datr"}
for _, n := range allow {
if isDenyListed(n) {
t.Errorf("isDenyListed(%q) = true, want false", n)
}
}
}
// TestRegistrableSocial: the addon flavour — IP literals pass through (NOT ""),
// no port strip semantics needed, the larger multi-label table.
func TestRegistrableSocial(t *testing.T) {
cases := map[string]string{
"www.lemonde.fr": "lemonde.fr",
"cdn.api.example.co.uk": "example.co.uk",
"tracker.com": "tracker.com",
"a.b.c.doubleclick.net": "doubleclick.net",
"WWW.Example.COM": "example.com",
"sub.example.com.au": "example.com.au",
"192.168.1.1": "192.168.1.1", // IP literal as-is (addon), store drops later
".trailing.dot.net.": "dot.net",
"single": "single",
"": "",
}
for in, want := range cases {
if got := registrableSocial(in); got != want {
t.Errorf("registrableSocial(%q)=%q want %q", in, got, want)
}
}
}
func TestParseSetCookieNameValue(t *testing.T) {
cases := []struct {
in string
name, value string
ok bool
}{
{"IDE=AHWqTUm; Domain=.doubleclick.net; Path=/", "IDE", "AHWqTUm", true},
{" _ga = GA1.2.3 ; Max-Age=63", "_ga", "GA1.2.3", true},
{"Secure; HttpOnly", "", "", false},
{"=novalue", "", "", false},
{"empty=", "empty", "", true},
}
for _, c := range cases {
n, v, ok := parseSetCookieNameValue(c.in)
if n != c.name || v != c.value || ok != c.ok {
t.Errorf("parseSetCookieNameValue(%q)=(%q,%q,%v) want (%q,%q,%v)", c.in, n, v, ok, c.name, c.value, c.ok)
}
}
}
func TestExtractSetCookieDomainAttr(t *testing.T) {
cases := map[string]string{
"IDE=x; Domain=.doubleclick.net; Path=/": "doubleclick.net",
"a=b; domain=Example.COM": "example.com",
"a=b; Path=/": "",
"a=b": "",
"a=domainlike=1; Path=/": "", // value containing "domain" is not the attr
}
for in, want := range cases {
if got := extractSetCookieDomainAttr(in); got != want {
t.Errorf("extractSetCookieDomainAttr(%q)=%q want %q", in, got, want)
}
}
}
func TestSrcSiteFromReferer(t *testing.T) {
req := httptest.NewRequest("GET", "https://tracker.io/p.gif", nil)
if got := srcSiteFromReferer(req); got != "" {
t.Errorf("no referer → %q want \"\"", got)
}
req.Header.Set("Referer", "https://www.lemonde.fr/article?x=1")
if got := srcSiteFromReferer(req); got != "lemonde.fr" {
t.Errorf("referer → %q want lemonde.fr", got)
}
req.Header.Del("Referer")
req.Header.Set("Origin", "https://news.example.co.uk")
if got := srcSiteFromReferer(req); got != "example.co.uk" {
t.Errorf("origin fallback → %q want example.co.uk", got)
}
}
// helper: build a response with the given Set-Cookie lines.
func respWithSetCookies(lines ...string) *http.Response {
h := http.Header{}
for _, l := range lines {
h.Add("Set-Cookie", l)
}
return &http.Response{Header: h}
}
// TestSocialEdgesThirdParty: a 3rd-party Set-Cookie (Domain= a different eTLD+1)
// on a 1st-party page yields one edge with the right src_site/tracker_domain.
func TestSocialEdgesThirdParty(t *testing.T) {
req := httptest.NewRequest("GET", "https://ads.doubleclick.net/pixel", nil)
resp := respWithSetCookies("IDE=AHWqTUm; Domain=.doubleclick.net; Path=/")
// reqHost feeds both src_site and the default issuer. Here the Domain attr and
// the request host share the registrable doubleclick.net, so the cookie is
// 1st-party: expect NO edge.
edges := socialEdgesFor("machash1", req, resp, "ads.doubleclick.net", "https://ads.doubleclick.net/pixel", "", newConsentLog())
if len(edges) != 0 {
t.Fatalf("1st-party Set-Cookie should yield 0 edges, got %d", len(edges))
}
// Now a genuine 3rd-party on the Set-Cookie path: src_site comes from reqHost
// (www.lemonde.fr → lemonde.fr) while the Domain attr names doubleclick.net.
// Referer-derived attribution belongs to the request-Cookie path, which is
// tested separately. Expect exactly one edge.
resp2 := respWithSetCookies("IDE=AHWqTUm; Domain=.doubleclick.net; Path=/")
edges = socialEdgesFor("machash1", req, resp2, "www.lemonde.fr", "https://www.lemonde.fr/", "", newConsentLog())
if len(edges) != 1 {
t.Fatalf("3rd-party Set-Cookie should yield 1 edge, got %d", len(edges))
}
e := edges[0]
if e.SrcSite != "lemonde.fr" || e.TrackerDomain != "doubleclick.net" {
t.Errorf("edge src/tracker = %q/%q want lemonde.fr/doubleclick.net", e.SrcSite, e.TrackerDomain)
}
if e.CookieIDHashVal != cookieIDHash("doubleclick.net", "IDE", "AHWqTUm") {
t.Errorf("edge cookie id hash mismatch: %q", e.CookieIDHashVal)
}
if e.ConsentState != "none_seen" {
t.Errorf("consent_state = %q want none_seen", e.ConsentState)
}
}
// TestSocialEdgesDenyAndIP: deny-listed names produce no edge, and an empty
// macHash produces no edge (R3-only gate). The IP-literal case is not exercised
// here: registrableSocial returns the IP as-is and the store drops it later.
func TestSocialEdgesDenyAndIP(t *testing.T) {
req := httptest.NewRequest("GET", "https://x/", nil)
resp := respWithSetCookies("PHPSESSID=abc; Domain=.doubleclick.net")
edges := socialEdgesFor("m", req, resp, "www.lemonde.fr", "https://www.lemonde.fr/", "", newConsentLog())
if len(edges) != 0 {
t.Fatalf("deny-listed cookie should yield 0 edges, got %d", len(edges))
}
// empty mac hash → no edges (R3-only gate)
if e := socialEdgesFor("", req, respWithSetCookies("IDE=x; Domain=.doubleclick.net"), "www.lemonde.fr", "u", "", newConsentLog()); len(e) != 0 {
t.Fatalf("empty macHash should yield 0 edges, got %d", len(e))
}
}
// TestSocialEdgesRequestCookiePath: a request TO a tracker carrying a Cookie,
// with a Referer to a different 1st-party, yields an edge attributed to the
// referer's site.
func TestSocialEdgesRequestCookiePath(t *testing.T) {
req := httptest.NewRequest("GET", "https://ads.doubleclick.net/px", nil)
req.Header.Set("Cookie", "IDE=AHWqTUm; session=secret")
req.Header.Set("Referer", "https://www.lemonde.fr/article")
// No Set-Cookie in the response; src_site = registrableSocial(reqHost) =
// doubleclick.net; the Set-Cookie loop emits nothing; the request-Cookie tail
// uses ctxSite=lemonde.fr (referer) != tracker doubleclick.net → edge. The
// deny-listed `session` cookie is skipped, so exactly 1 edge (IDE).
edges := socialEdgesFor("m", req, &http.Response{Header: http.Header{}}, "ads.doubleclick.net", "https://ads.doubleclick.net/px", "", newConsentLog())
if len(edges) != 1 {
t.Fatalf("request-cookie path should yield 1 edge, got %d", len(edges))
}
if edges[0].SrcSite != "lemonde.fr" || edges[0].TrackerDomain != "doubleclick.net" {
t.Errorf("edge = %q/%q want lemonde.fr/doubleclick.net", edges[0].SrcSite, edges[0].TrackerDomain)
}
}
// TestConsentLog: loader fragment → pre_consent; CMP cookie → post_consent.
func TestConsentLog(t *testing.T) {
cl := newConsentLog()
if got := cl.stateFor("m", "lemonde.fr"); got != "none_seen" {
t.Errorf("fresh → %q want none_seen", got)
}
// CMP loader request observed (no consent cookie yet) → pre_consent.
cl.update("m", "lemonde.fr", "https://cdn.cookielaw.org/consent/scripttemplates/otSDKStub.js", nil)
if got := cl.stateFor("m", "lemonde.fr"); got != "pre_consent" {
t.Errorf("after CMP loader → %q want pre_consent", got)
}
// CMP consent cookie observed → post_consent.
cl.update("m", "lemonde.fr", "https://www.lemonde.fr/", []string{"OptanonConsent=isGpcEnabled=0; Path=/"})
if got := cl.stateFor("m", "lemonde.fr"); got != "post_consent" {
t.Errorf("after CMP cookie → %q want post_consent", got)
}
}
// TestSocialRelayFlush: the buffer batches edges and flushOnce POSTs them to the
// portal /__toolbox/social-event, then clears.
func TestSocialRelayFlush(t *testing.T) {
var got socialEventPayload
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/__toolbox/social-event" {
t.Errorf("unexpected path %q", r.URL.Path)
}
_ = json.NewDecoder(r.Body).Decode(&got)
w.WriteHeader(204)
}))
defer srv.Close()
s := newSocialRelay()
s.add(socialEdge{ClientMacHash: "m", SrcSite: "a.fr", TrackerDomain: "t.com", CookieIDHashVal: "deadbeef", ConsentState: "none_seen"})
p := s.flushOnce(srv.URL)
if len(p.Edges) != 1 || len(got.Edges) != 1 {
t.Fatalf("flush sent %d / server got %d, want 1/1", len(p.Edges), len(got.Edges))
}
if got.Edges[0].TrackerDomain != "t.com" {
t.Errorf("server edge tracker = %q want t.com", got.Edges[0].TrackerDomain)
}
// Buffer cleared: a second flush sends nothing.
if p2 := s.flushOnce(srv.URL); !p2.empty() {
t.Errorf("second flush should be empty, got %d edges", len(p2.Edges))
}
}
// TestSocialRelayCap: the buffer never exceeds socialBatchCap.
func TestSocialRelayCap(t *testing.T) {
s := newSocialRelay()
for i := 0; i < socialBatchCap+100; i++ {
s.add(socialEdge{ClientMacHash: "m", SrcSite: "a", TrackerDomain: "t", CookieIDHashVal: "h", ConsentState: "none_seen"})
}
if got := s.snapshot(); len(got.Edges) != socialBatchCap {
t.Errorf("buffer held %d edges, want cap %d", len(got.Edges), socialBatchCap)
}
}
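
The two relay tests above pin the buffer contract (drop on full, read-and-clear on snapshot) from a single goroutine. Since the proxy is concurrent, the same contract can be sketched under parallel writers. This is a hypothetical, generic stand-in, not the engine's socialRelay: boundedBuf and fill are names made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// boundedBuf is a hypothetical stand-in for the relay buffer: it drops new
// items once limit is reached and hands the whole batch over on snapshot.
type boundedBuf[T any] struct {
	mu    sync.Mutex
	limit int
	items []T
}

// add appends items until the limit is reached; the excess is dropped.
func (b *boundedBuf[T]) add(items ...T) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, it := range items {
		if len(b.items) >= b.limit {
			return
		}
		b.items = append(b.items, it)
	}
}

// snapshot returns the buffered batch and resets the buffer inside one
// critical section, so no item is delivered twice or lost in between.
func (b *boundedBuf[T]) snapshot() []T {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := b.items
	b.items = nil
	return out
}

// fill starts writers goroutines that each add perWriter items, then waits.
func fill(b *boundedBuf[int], writers, perWriter int) {
	var wg sync.WaitGroup
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWriter; i++ {
				b.add(i)
			}
		}()
	}
	wg.Wait()
}

func main() {
	b := &boundedBuf[int]{limit: 500}
	fill(b, 8, 100) // 800 attempted adds against a cap of 500
	fmt.Println(len(b.snapshot()), len(b.snapshot())) // prints: 500 0
}
```

Running a variant of this under `go test -race` is one way to confirm that every access to the slice goes through the mutex.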

@ -389,7 +389,11 @@ func (px *Proxy) handleTransparent(client net.Conn) {
// over a replayable conn, then run the shared pipeline dialling the captured
// original-dst (NOT the SNI).
replay := &prefixConn{prefix: hello, Conn: client}
tconn := tls.Server(replay, px.serverTLSConfig())
// The capture hook relays the ja4 ClientHello payload for this handshake,
// tagged with the REAL transparent peer IP from the raw client conn (#662).
// nil when the relay gate is off. Emitted around Decide → blocked/allowed
// alike, matching the Python addon's per-tls_clienthello behaviour.
tconn := tls.Server(replay, px.serverTLSConfigCapture(px.captureAndEmitJA4(client)))
if err := tconn.Handshake(); err != nil {
return
}

@ -1,3 +1,58 @@
secubox-toolbox-ng (0.1.14-1~bookworm1) bookworm; urgency=medium
* quic/banner: strip Alt-Svc response header so browsers stop learning/preferring
HTTP/3 (h3) and stay on HTTP/2-over-TCP (MITM-able). Complements the nft
udp443 reject; addresses sites where browsers ignore the reject and keep
retrying QUIC, bypassing inject/adblock/metrics. (ref #662)
-- Gerald KERMA <devel@cybermind.fr> Fri, 19 Jun 2026 14:30:00 +0000
secubox-toolbox-ng (0.1.13-1~bookworm1) bookworm; urgency=medium
* banner: INLINE the banner (server-side bundle fetch, baked literals) instead
of <script src>/fetch — defeats site service workers that intercept the
same-origin /__toolbox/* requests (leparisien, cnn). Fail-open. (ref #662)
-- Gerald KERMA <devel@cybermind.fr> Fri, 19 Jun 2026 13:15:00 +0000
secubox-toolbox-ng (0.1.12-1~bookworm1) bookworm; urgency=medium
* adlearn: live-reload the blocklist (mtime) so promotions/edits block without
a worker restart; emit ad-candidates (3rd-party ad-path) to the portal;
autolearn also promotes cross-site trackers from social_edges. Learned
trackers are auto-204 + poison-smogged. (ref #662)
-- Gerald KERMA <devel@cybermind.fr> Fri, 19 Jun 2026 12:30:00 +0000
secubox-toolbox-ng (0.1.11-1~bookworm1) bookworm; urgency=medium
* social: ALSO correlate on the block path — blocked 3rd-party trackers still
carry the browser's request Cookie (the cross-site evidence); without this
the /social graph misses the very trackers it exists to expose (they're 204'd
before the allow/mitm correlation). resp=nil request-only, hash-only. (ref #662)
-- Gerald KERMA <devel@cybermind.fr> Fri, 19 Jun 2026 11:55:00 +0000
secubox-toolbox-ng (0.1.10-1~bookworm1) bookworm; urgency=medium
* social: faithfully port the in-process social_graph correlation — the engine
computes cross-site tracker edges (byte-exact cookie_id_hash, deny-list,
eTLD+1 3rd-party check, CMP consent_state) and relays HASH-ONLY edges
(never raw values, WG-only) to the new portal /__toolbox/social-event →
social.record_edge → /social graph un-frozen. --social-relay (default on). (ref #662)
-- Gerald KERMA <devel@cybermind.fr> Fri, 19 Jun 2026 11:30:00 +0000
secubox-toolbox-ng (0.1.9-1~bookworm1) bookworm; urgency=medium
* telemetry: relay per-flow metadata to the analysis sidecars (dpi /classify,
cookies /inject, threat-analyst /ja4) — restoring the kbin "Qui te piste?"
events frozen since the Phase-7 cutover. Fire-and-forget, names-only cookies,
gated --analysis-relay (default on). The sidecars enrich + write toolbox
events → cumulative-stats live again with real host classification. (ref #662)
-- Gerald KERMA <devel@cybermind.fr> Fri, 19 Jun 2026 10:40:00 +0000
secubox-toolbox-ng (0.1.8-1~bookworm1) bookworm; urgency=medium
* demo/csp: only relax + flag 🔓 when the page's effective script directive

@ -0,0 +1,61 @@
{
"_comment": "Cross-engine parity fixtures for social.cookie_id_hash (#662). GENERATED by the real secubox_toolbox.social.cookie_id_hash (Python = source of truth); the Go cookieIDHash MUST reproduce every `expect` byte-for-byte. Note: tracker_domain + cookie_name are LOWER-cased before hashing, the cookie_value is NOT; NUL (0x00) separators; UTF-8 with 'replace' errors. See tests/test_social_parity.py (Python) ↔ social_test.go (Go).",
"fixtures": [
{
"tracker_domain": "doubleclick.net",
"cookie_name": "IDE",
"cookie_value": "AHWqTUm123",
"expect": "8e7fadaeb2584768",
"why": "plain ascii"
},
{
"tracker_domain": "DoubleClick.NET",
"cookie_name": "ide",
"cookie_value": "AHWqTUm123",
"expect": "8e7fadaeb2584768",
"why": "domain+name UPPER folded, value verbatim -> identical hash to #1 (proves domain+name are lower-cased)"
},
{
"tracker_domain": "doubleclick.net",
"cookie_name": "IDE",
"cookie_value": "ahwqtum123",
"expect": "550317c9729652c2",
"why": "value lower-cased DIFFERS from #1 (proves the VALUE is NOT folded)"
},
{
"tracker_domain": "ads.example.com",
"cookie_name": "_ga",
"cookie_value": "GA1.2.999.111",
"expect": "89a398ebd72ee863",
"why": "GA cookie"
},
{
"tracker_domain": "tracker.io",
"cookie_name": "uid",
"cookie_value": "Ünîcødé✓",
"expect": "3b4923e9d9bb77a2",
"why": "unicode value (utf-8 encoded)"
},
{
"tracker_domain": "tracker.io",
"cookie_name": "Ünîcödé",
"cookie_value": "val",
"expect": "d4db5a0d71216313",
"why": "unicode cookie NAME (lower-cased + utf-8)"
},
{
"tracker_domain": "",
"cookie_name": "x",
"cookie_value": "y",
"expect": "2081f4f26135019e",
"why": "empty domain still hashes (NUL separators)"
},
{
"tracker_domain": "d.net",
"cookie_name": "n",
"cookie_value": "",
"expect": "b0da6b889cb198a1",
"why": "empty value"
}
]
}
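
The fixture comment above pins the input layout (lower-cased tracker_domain and cookie_name, verbatim cookie_value, NUL separators, UTF-8) and the 16-hex-digit output, but not the digest itself. The sketch below assumes SHA-256 truncated to its first 16 hex digits and the field order domain, name, value. Both are assumptions, and the result has not been checked against the `expect` values, so secubox_toolbox.social.cookie_id_hash remains the source of truth. cookieIDSketch is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// cookieIDSketch hashes lower(domain) NUL lower(name) NUL value.
// ASSUMPTION: SHA-256 and a 16-hex-digit prefix; the real digest may differ.
// Note that Go's strings.ToLower and Python's str.lower() can disagree on a
// few non-ASCII code points, which is why the parity fixtures exist.
func cookieIDSketch(domain, name, value string) string {
	h := sha256.New()
	h.Write([]byte(strings.ToLower(domain)))
	h.Write([]byte{0})
	h.Write([]byte(strings.ToLower(name)))
	h.Write([]byte{0})
	h.Write([]byte(value)) // the value is deliberately NOT case-folded
	return hex.EncodeToString(h.Sum(nil))[:16]
}

func main() {
	a := cookieIDSketch("DoubleClick.NET", "ide", "AHWqTUm123")
	b := cookieIDSketch("doubleclick.net", "IDE", "AHWqTUm123")
	c := cookieIDSketch("doubleclick.net", "IDE", "ahwqtum123")
	// Same invariants as fixtures #1 to #3: folding applies to domain+name only.
	fmt.Println(len(a), a == b, a == c) // prints: 16 true false
}
```

The NUL separators matter: without them, ("ab", "c") and ("a", "bc") would feed identical bytes to the digest.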

@ -1,3 +1,19 @@
secubox-toolbox (2.7.0-1~bookworm1) bookworm; urgency=medium
* MIDDLE RELEASE — caps the 2.6.x line (ad-intelligence / Anti-Track v2 /
anti-bot uTLS) and opens the kbin "first tool of the Swiss-army cyber kit"
chapter. kbin now delivers: transparent performance, full-MITM encrypted
inspection, ad poison/smog injection, the adware-ban transparency banner,
and safe browsing.
* docs: kbin use-case consolidated — wiki `Kbin-Toolbox.md`, `FAQ-KBIN-TOR.md`,
README positioning blurb.
* plan(#683): next chapter staged — kbin Tor endpoint, a quick-switch that
re-routes consenting client surfing through Tor (outbound egress, pseudo-
network) so the kbin exit is anonymized. Design spec landed; no behaviour
change yet (default OFF, fail-closed by design).
-- Gerald KERMA <devel@cybermind.fr> Fri, 19 Jun 2026 11:00:00 +0200
secubox-toolbox (2.6.59-1~bookworm1) bookworm; urgency=medium
* ui: cap all admin dashboard lists to top-5 shown — #filtres bypass hosts,

@ -51,13 +51,19 @@ table inet wg-toolbox {
chain forward {
type filter hook forward priority filter; policy accept;
# Phase 6.K / #662: reject UDP 443 (QUIC/HTTP3) FIRST, before the blanket
# outbound accept below. If it sits AFTER the accept it is never reached
# (the accept terminates evaluation) → QUIC slips through and the whole
# MITM is bypassed (no inject, no ad-block, no metrics, no social). The
# REJECT (not drop) forces Chrome/Firefox to fall back to HTTP/2 over TCP
# IMMEDIATELY: a silent drop just makes the browser RETRY QUIC for tens of
# seconds (observed 199 retry packets, never falling back) — an ICMP
# port-unreachable tells it "no QUIC here" at once. First in the chain so
# it also breaks existing QUIC sessions (outbound). ORDER IS LOAD-BEARING.
iif "wg-toolbox" udp dport 443 counter reject
# Outbound from tunnel → internet
iif "wg-toolbox" oif "lan0" accept
# Return traffic
iif "lan0" oif "wg-toolbox" ct state established,related accept
# Phase 6.K — drop UDP 443 (QUIC/HTTP3) so browsers fall back to
# HTTP/2 over TCP, which our DNAT can intercept. Without this,
# Chrome/Firefox prefer QUIC and bypass mitm entirely.
iif "wg-toolbox" udp dport 443 counter drop
}
}

@ -221,6 +221,92 @@ def _ad_feed() -> int:
return len(promoted)
# #662 — cross-site-reuse promotion. A tracker_domain seen issuing cookies on
# >= SOCIAL_MIN_SITES DISTINCT src_site (across peers, recent window) is a
# BEHAVIOURALLY-confirmed cross-site tracker (the social graph), independent of
# the ad-path heuristic. Promote it into learned-trackers.txt so the engine
# blocks (204) + smogs it. Conservative + reuses the SAME allowlist/self guard as
# _ad_feed (NEVER promote allowlisted or self domains). De-dups against OUT.
SOCIAL_MIN_SITES = int(os.environ.get("SECUBOX_SOCIAL_MIN_SITES", "3"))
SOCIAL_WINDOW_HOURS = int(os.environ.get("SECUBOX_SOCIAL_WINDOW_HOURS", "168"))
def _social_feed() -> int:
    """Promote cross-site cookie-reuse trackers (social_edges) into the learned
    blocklist. A tracker_domain linking >= SOCIAL_MIN_SITES distinct src_site in
    the last SOCIAL_WINDOW_HOURS is promoted. Allowlist + self domains excluded
    (reused guard). MERGES into OUT (never overwrites). Returns count promoted, or
    -1 if unavailable (e.g. no social_edges table). Best-effort: never raises."""
    cutoff = int(time.time()) - SOCIAL_WINDOW_HOURS * 3600
    try:
        con = sqlite3.connect(DB, timeout=5)
        rows = con.execute(
            "SELECT tracker_domain, COUNT(DISTINCT src_site) AS sites "
            "FROM social_edges WHERE ts >= ? "
            "GROUP BY tracker_domain", (cutoff,)).fetchall()
        con.close()
    except Exception as e:
        sys.stderr.write(f"autolearn: social query failed: {e}\n")
        return -1
    # Fold to registrable and aggregate the distinct-site count per eTLD+1 (two
    # tracker subdomains of the same registrable jointly meet the threshold).
    by_reg: dict[str, set] = {}
    try:
        scon = sqlite3.connect(DB, timeout=5)
        for td, _sites in rows:
            reg = registrable(td)
            if not reg:
                continue
            ss = by_reg.setdefault(reg, set())
            for (s,) in scon.execute(
                    "SELECT DISTINCT src_site FROM social_edges "
                    "WHERE ts >= ? AND tracker_domain = ?", (cutoff, td)):
                if s:
                    ss.add(s)
        scon.close()
    except Exception as e:
        sys.stderr.write(f"autolearn: social fold failed: {e}\n")
        return -1
    allow = _load_ad_allowlist()
    self_doms = {d.strip().lower() for d in
                 os.environ.get("SECUBOX_SELF_DOMAINS", "secubox.in").split(",")
                 if d.strip()}
    promoted: set = set()
    for reg, sites in by_reg.items():
        if len(sites) < SOCIAL_MIN_SITES:
            continue
        if reg in allow:
            continue
        if reg in self_doms or any(reg == d or reg.endswith("." + d) for d in self_doms):
            continue
        promoted.add(reg)
    if not promoted:
        return 0
    existing: set = set()
    try:
        if os.path.exists(OUT):
            with open(OUT, encoding="utf-8") as fh:
                for ln in fh:
                    ln = ln.strip()
                    if ln:
                        existing.add(ln)
    except Exception as e:
        sys.stderr.write(f"autolearn: social merge read failed: {e}\n")
    new = promoted - existing
    merged = sorted(existing | promoted)[:MAX_ENTRIES]
    try:
        os.makedirs(os.path.dirname(OUT), exist_ok=True)
        tmp = OUT + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            fh.write("\n".join(merged) + ("\n" if merged else ""))
        os.replace(tmp, OUT)
    except Exception as e:
        sys.stderr.write(f"autolearn: social write failed: {e}\n")
        return -1
    return len(new)
def main() -> int:
learned: set[str] = set()
try:
@ -317,6 +403,11 @@ def main() -> int:
sys.stderr.write(f"autolearn: {_n_ad} ad-candidate hosts promoted\n")
except Exception as e:
sys.stderr.write(f"autolearn: ad feed error: {e}\n")
try:
_n_social = _social_feed()
sys.stderr.write(f"autolearn: {_n_social} cross-site cookie trackers promoted\n")
except Exception as e:
sys.stderr.write(f"autolearn: social feed error: {e}\n")
sys.stderr.write(
f"autolearn: {len(out)} hosts learned ({ti} threat-intel + "
f"{len(out) - ti} classified cross-site) @ {int(time.time())}"
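The promotion rule in _social_feed (fold each tracker_domain to its registrable domain, union the distinct src_site values, then apply the threshold, allowlist and self-domain guards) can be sketched in isolation. This is a minimal sketch, not the project's code: naive_registrable is a hypothetical stand-in for the real registrable() helper (it keeps only the last two labels, which is wrong for suffixes such as co.uk), and promote and its parameters are names invented for illustration.

```python
from collections import defaultdict


def naive_registrable(host: str) -> str:
    """Hypothetical stand-in for registrable(): keep the last two labels.
    A real eTLD+1 fold needs the Public Suffix List."""
    labels = [p for p in host.strip().lower().split(".") if p]
    return ".".join(labels[-2:]) if len(labels) >= 2 else ""


def promote(edges, min_sites=3, allow=frozenset(), self_doms=frozenset()):
    """edges: iterable of (src_site, tracker_domain) pairs already inside the
    time window. Returns the set of registrable domains to promote."""
    sites_by_reg = defaultdict(set)
    for src_site, tracker_domain in edges:
        reg = naive_registrable(tracker_domain)
        if reg and src_site:
            # Subdomains of one registrable pool their distinct sites.
            sites_by_reg[reg].add(src_site)

    promoted = set()
    for reg, sites in sites_by_reg.items():
        if len(sites) < min_sites:
            continue  # not enough distinct sites
        if reg in allow:
            continue  # allowlisted
        if any(reg == d or reg.endswith("." + d) for d in self_doms):
            continue  # never promote our own domains
        promoted.add(reg)
    return promoted
```

In this sketch two subdomains of the same registrable domain jointly reach the threshold, matching the comment in the diff; the window filter and the merge into the output file are left out.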


@ -57,10 +57,14 @@ router = APIRouter(tags=["toolbox"])
@router.get("/__toolbox/loader.js")
async def toolbox_loader_js() -> Response:
"""Static cosmetic loader (applies the banner client-side from the bundle)."""
# no-store: the loader is the banner entry point and evolves (SPA re-assert,
# CSP proof, …). A long cache (was max-age=3600) pins stale loaders in clients
# for up to an hour — so loader changes never reach already-visited sites. It's
# 4 KB; serve it fresh every load so updates propagate immediately.
return Response(
content=bundlemod.LOADER_JS,
media_type="application/javascript",
headers={"Cache-Control": "public, max-age=3600"},
headers={"Cache-Control": "no-store, no-cache, must-revalidate, max-age=0"},
)
@ -74,6 +78,31 @@ async def toolbox_bundle(mh: str = Query(default=""), wg: int = Query(default=0)
)
@router.get("/__toolbox/inline")
async def toolbox_inline(
mh: str = Query(default=""),
wg: int = Query(default=0),
csp: int = Query(default=0),
) -> Response:
"""#662 — COMPLETE self-contained inline banner script BODY.
Sites with a SERVICE WORKER (leparisien, cnn) intercept every same-origin
request, so the legacy ``<script src="/__toolbox/loader.js">`` + its
``fetch("/__toolbox/bundle")`` are hijacked by the SW (404 / app-shell)
before reaching our MITM engine, so no banner appears. The Go engine fetches THIS
body server-side at inject time and bakes it into a self-contained
``<script></script>``, leaving no same-origin fetch for the SW to touch.
``mh`` / ``wg`` / ``csp`` come from the query params (baked as JS literals,
not data-attrs / currentScript); the bundle is ``get_bundle(mh, wg)`` baked
as a JSON literal (not fetched). no-store like the loader (it evolves)."""
return Response(
content=bundlemod.inline_script(mh, bool(wg), bool(csp)),
media_type="application/javascript",
headers={"Cache-Control": "no-store, no-cache, must-revalidate, max-age=0"},
)
# #662 — ad-block metrics ingest from the Go MITM engine (sbxmitm). The #662
# cutover moved the BLOCK decision (204 on ad/tracker hosts) into the Go engine
# but left the METRICS unported, so the #ads dashboard froze. The engine now
@ -109,12 +138,20 @@ async def toolbox_ad_event(request: Request) -> Response:
return Response(status_code=204)
blocks = body.get("blocks") or []
clients = body.get("clients") or []
# #662 — the Go engine now also feeds the AUTO-LEARN loop: 3rd-party
# ad-path requests it saw on the allow/mitm path (ad_ghost's _AD_PATH
# heuristic), recorded as candidates here for secubox-toolbox-autolearn
# to promote into learned-trackers.txt at AD_MIN_SITES distinct sites.
candidates = body.get("candidates") or []
if not isinstance(blocks, list):
blocks = []
if not isinstance(clients, list):
clients = []
if not isinstance(candidates, list):
candidates = []
blocks = blocks[:_AD_EVENT_ROW_CAP]
clients = clients[:_AD_EVENT_ROW_CAP]
candidates = candidates[:_AD_EVENT_ROW_CAP]
block_rows = [
(b["ad_host"], b.get("site", ""), "block", int(b.get("hits", 0)), int(b.get("bytes", 0)))
@ -126,14 +163,102 @@ async def toolbox_ad_event(request: Request) -> Response:
for c in clients
if isinstance(c, dict) and c.get("mac_hash") and c.get("ad_host")
]
cand_rows = [
(c["host"], c.get("site", ""), int(c.get("hits", 0)))
for c in candidates
if isinstance(c, dict) and c.get("host")
]
if block_rows:
store.record_ad_blocks(block_rows)
if client_rows:
store.record_ad_client_blocks(client_rows)
if cand_rows:
store.record_ad_candidates(cand_rows)
except Exception as e: # never raise into the engine's fire-and-forget POST
log.debug("ad-event ingest failed: %s", e)
return Response(status_code=204)
# #662 — cross-site cookie-tracker edge ingest from the Go MITM engine (sbxmitm).
# The #662 Phase-7 cutover decommissioned the in-process Python social_graph addon
# that fed social.record_edge(), so the kbin /social graph (social_edges →
# social_nodes/social_links) froze. The engine now computes the SAME 3rd-party
# cookie-tracker edges (FAITHFUL port of social_graph.py: deny-list, eTLD+1
# 3rd-party check, cookie_id_hash, CMP consent_state) and POSTs a batch here. We
# call social.record_edge() per row, which writes raw social_edges; the existing
# app.py social_fold_loop folds them into nodes/links.
#
# Raw cookie VALUES never reach this endpoint — only the truncated cookie_id_hash
# (privacy/CSPN; this is exactly why the original ran in-process).
#
# UNAUTHENTICATED, same trust note as /__toolbox/ad-event: the engine reaches the
# portal only over the R3 nft perimeter (loopback / WG ingress).
_SOCIAL_EVENT_ROW_CAP = 5000 # bound the edge list so a misbehaving engine can't flood us
_SOCIAL_FOLD_DEBOUNCE = 60 # seconds: floor between in-handler safety folds
_social_last_fold = 0.0 # module-level throttle timestamp
@router.post("/__toolbox/social-event")
async def toolbox_social_event(request: Request) -> Response:
"""Ingest a batch of cross-site tracker edges from the Go engine. Best-effort:
never 500s the engine (it is fire-and-forget); always returns 204. See the
trust note above for why this is unauthenticated."""
global _social_last_fold
try:
# Body-size guard BEFORE parsing (mirrors /__toolbox/ad-event): the legit
# payload (≤5000 edges) is well under 2 MB; reject larger outright so a
# misbehaving/compromised WG peer can't pressure portal memory.
try:
clen = int(request.headers.get("content-length") or 0)
except (TypeError, ValueError):
clen = 0
if clen > 2 * 1024 * 1024:
return Response(status_code=204)
body = await request.json()
if not isinstance(body, dict):
return Response(status_code=204)
edges = body.get("edges") or []
if not isinstance(edges, list):
edges = []
edges = edges[:_SOCIAL_EVENT_ROW_CAP]
from . import social as _social
recorded = 0
for e in edges:
if not isinstance(e, dict):
continue
try:
_social.record_edge(
client_mac_hash=e.get("client_mac_hash") or "",
src_site=e.get("src_site") or "",
tracker_domain=e.get("tracker_domain") or "",
cookie_id_hash_val=e.get("cookie_id_hash_val") or "",
ja4_hash=e.get("ja4_hash") or None,
consent_state=e.get("consent_state") or "none_seen",
)
recorded += 1
except Exception as row_err: # one bad row never fails the batch
log.debug("social-event row failed: %s", row_err)
# Safety fold: the app.py social_fold_loop already folds every 5 min, but
# fold here too (debounced to ≤ once / 60 s via a module-level timestamp)
# so a freshly-ingested edge surfaces in the d3 graph promptly even between
# loop ticks. Cheap (indexed window scan) and self-throttling; a fold
# failure is swallowed (the loop will catch up).
if recorded:
now = time.time()
if now - _social_last_fold >= _SOCIAL_FOLD_DEBOUNCE:
_social_last_fold = now
try:
_social.fold_recent(window_seconds=600)
except Exception as fold_err:
log.debug("social-event fold failed: %s", fold_err)
except Exception as e: # never raise into the engine's fire-and-forget POST
log.debug("social-event ingest failed: %s", e)
return Response(status_code=204)
# Cap geo/UA enrichment on /admin/clients/rich to the rows the UI actually shows
# (top-5 + headroom). Beyond this, clients get bare fields — avoids ~51 cached
# geo lookups per poll (ref #644).
@ -2994,6 +3119,14 @@ async def admin_clients_rich() -> dict:
# Use module-level imports so monkeypatching in tests works correctly.
_av = avatar_analysis
_geo = geo
# Phase 6 (#662) : map each WG client to its REAL external (pre-tunnel)
# endpoint IP so the flag reflects the client's true origin country, not
# the internal 10.99.1.x (which GeoIPs to nothing). Best-effort, cached.
try:
from . import wg as _wg
_wg_eps = _wg.wg_endpoints()
except Exception:
_wg_eps = {}
rows = store.list_clients()
rows = sorted(rows, key=lambda r: (r.get("last_seen") or 0), reverse=True)
now = _t.time()
@ -3030,7 +3163,13 @@ async def admin_clients_rich() -> dict:
except Exception:
pass
try:
gi = _geo.lookup(r.get("ip") or "")
# PRIVACY : the external endpoint IP is used transiently for the
# GeoIP lookup ONLY — it is NEVER stored or returned in the API
# response. The appliance is privacy-focused: country-granularity
# only (flag / ISO), never the raw client origin IP. Fall back to
# the stored (internal) IP for non-WG / captive clients.
geo_key = _wg_eps.get(r.get("mac_hash") or "") or (r.get("ip") or "")
gi = _geo.lookup(geo_key)
flag = gi.get("flag", "") or ""
country_iso = gi.get("country_iso", "") or ""
asn_org = gi.get("asn_org", "") or ""
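The debounced safety fold in toolbox_social_event (at most one fold per _SOCIAL_FOLD_DEBOUNCE seconds, with the timestamp stamped before the fold runs) follows a common throttle pattern. Below is a minimal sketch of that pattern with an injectable clock so it can be tested without sleeping. The Debouncer class and its clock parameter are hypothetical names; the diff itself uses a module-level float and time.time().

```python
import time


class Debouncer:
    """Allow an action at most once per min_interval seconds."""

    def __init__(self, min_interval: float, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock
        self._last = None  # None means "never fired"

    def ready(self) -> bool:
        now = self._clock()
        if self._last is not None and now - self._last < self.min_interval:
            return False
        # Stamp BEFORE the caller runs the action, as the handler does: a
        # failing action still consumes the slot, so errors cannot cause a
        # retry storm.
        self._last = now
        return True


def maybe_fold(debouncer: Debouncer, fold) -> bool:
    """Run fold() if the debouncer allows it; swallow fold errors."""
    if not debouncer.ready():
        return False
    try:
        fold()
    except Exception:
        pass  # best-effort: a periodic loop is assumed to catch up later
    return True
```

A monotonic clock is used here because it is immune to wall-clock jumps; that is a choice of this sketch, not something the diff does.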


@ -103,26 +103,31 @@ def get_bundle(client_id: str, is_wg: bool = False) -> dict:
"tracker_patterns": TRACKER_PATTERNS, "ts": int(time.time())}
# Cosmetic client-side loader. Served static + cached; applies the transparency
# banner from the bundle off the page's critical render path. Per-page stats
# (trackers, cookies) are derived in-browser (Resource Timing / document.cookie),
# so the proxy never scans the body. Self-guarded, dismissible, fail-silent.
LOADER_JS = r"""(function(){
"use strict";
if (window.__SBX_LOADER__) return; window.__SBX_LOADER__ = 1;
var s = document.currentScript || {};
var ds = s.dataset || {};
var mh = ds.mh || "", wg = ds.wg || "0";
// #662 CONSENTED-DEMONSTRATION: the engine relaxed this page's CSP so this
// loader could run even under a strict policy, and stamped data-csp="1" on our
// <script>. When set, the banner shows a 🔓 as VISIBLE proof the page's CSP was
// bypassed to inject. Absent: no proof emoji (page had no CSP to bypass).
var csp = ds.csp || "";
// SPA support (#662): cache the bundle + remember an explicit dismiss, so the
// banner can be re-asserted after client-side navigation / DOM re-renders
// (cnn, youtube swap content without reloading; the one-shot loader would
// otherwise vanish). Re-assert never fights a user who dismissed the banner.
var bundle = null, dismissed = false;
# ── shared banner JS body (#662) ─────────────────────────────────────────────
#
# The render + SPA-re-assert + dismiss + countTrackers + 🔓 cspProof logic is
# IDENTICAL between the legacy src-loader (LOADER_JS, fetched as
# /__toolbox/loader.js → fetch()es /__toolbox/bundle) and the new INLINE banner
# (inline_script(), baked into the page by the Go engine at inject time). To
# avoid drift, that logic lives ONCE in _BANNER_CORE; each caller differs only in
# its PRELUDE — how `bundle`, `mh`, `wg`, `csp`, `dismissed` are obtained:
#
# * LOADER_JS → reads data-mh/data-wg/data-csp off document.currentScript and
# fetch()es the bundle (legacy; kept working for the
# /__toolbox/loader.js route).
# * inline → mh/wg/csp/bundle are baked as JS LITERALS (no currentScript,
# no fetch) so a site's SERVICE WORKER has nothing same-origin to
# hijack (leparisien, cnn… run a SW that 404s our assets).
#
# _BANNER_CORE assumes `mh`, `wg`, `csp`, `bundle`, `dismissed` are already
# declared by the prelude and runs render/SPA off them.
# render + SPA-re-assert + dismiss + countTrackers + 🔓 cspProof. Shared verbatim
# by both preludes. References `mh`, `wg`, `csp`, `bundle`, `dismissed` from the
# enclosing prelude scope. Defines ensure() + installs the history/popstate hooks
# + 2s poll; the prelude calls ensure() (inline) or sets `bundle` then ensure()s
# (src-loader).
_BANNER_CORE = r"""
function ready(fn){ if (document.body) { fn(); } else { setTimeout(function(){ready(fn);}, 30); } }
function esc(t){ return String(t).replace(/[&<>"]/g, function(c){
return {"&":"&amp;","<":"&lt;",">":"&gt;","\"":"&quot;"}[c]; }); }
@ -168,10 +173,6 @@ LOADER_JS = r"""(function(){
// ensure(): (re)render the banner if it's absent and the bundle is loaded and
// the user hasn't dismissed it. Cheap (a getElementById guard inside render).
function ensure(){ if (bundle && !dismissed) ready(function(){ render(bundle); }); }
fetch("/__toolbox/bundle?mh=" + encodeURIComponent(mh) + "&wg=" + encodeURIComponent(wg), {credentials:"omit"})
.then(function(r){ return r.json(); })
.then(function(b){ bundle = b; ensure(); })
.catch(function(){});
// SPA re-assert: wrap history nav + popstate (defer so the framework settles),
// plus a light 2s poll as a catch-all for DOM re-renders that drop the banner.
["pushState","replaceState"].forEach(function(m){
@ -182,5 +183,93 @@ LOADER_JS = r"""(function(){
});
window.addEventListener("popstate", function(){ setTimeout(ensure, 150); });
setInterval(ensure, 2000);
"""
def _js_str(value: str) -> str:
"""JS string LITERAL for an arbitrary string. json.dumps yields a valid JS
string; we additionally escape ``</`` as ``<\\/`` so a value can never close
the surrounding inline <script> (e.g. a value of "</script>")."""
return json.dumps(value).replace("</", "<\\/")
def _js_json(obj) -> str:
"""JS object LITERAL for a JSON-serialisable object, hardened against a
``</script>`` breakout: json.dumps is valid JS, and escaping ``</`` as ``<\\/``
means no nested string (pin, report_url) can terminate the inline script."""
return json.dumps(obj, ensure_ascii=False).replace("</", "<\\/")
def inline_script(mh: str, wg: bool, csp: bool) -> str:
"""Build the COMPLETE self-contained inline banner script BODY (#662).
Service-worker survival: sites like leparisien / cnn register a SW that
intercepts every same-origin request, so the legacy
``<script src="/__toolbox/loader.js">`` + its ``fetch("/__toolbox/bundle")``
are hijacked by the SW (404 / app-shell) before reaching our MITM engine, and
the banner never appears. The fix is to bake EVERYTHING as JS literals so the
inline script makes NO same-origin request the SW can touch:
* ``bundle`` is ``get_bundle(mh, wg)`` baked as a JSON literal (not fetched),
* ``mh`` / ``wg`` / ``csp`` are baked as string literals (NOT data-attrs /
currentScript: the null-currentScript-in-async bug killed #653),
* NO ``document.currentScript``, NO ``fetch()``.
Returns an IIFE string suitable for ``<script></script>``. The single-run
guard (``window.__SBX_LOADER__``), the ``#sbx-banner`` element-id guard, the
dismissed flag, the history pushState/replaceState/popstate hooks + 2s poll,
and the 🔓 proof when ``csp`` is set are all preserved (from _BANNER_CORE).
"""
bundle_obj = get_bundle(mh, bool(wg))
prelude = (
"(function(){\n"
' "use strict";\n'
" if (window.__SBX_LOADER__) return; window.__SBX_LOADER__ = 1;\n"
# Baked literals — no currentScript / dataset, no fetch (SW-immune).
" var mh = " + _js_str(mh or "") + ";\n"
" var wg = " + _js_str("1" if wg else "0") + ";\n"
# csp=="1" → the engine relaxed a real CSP to inject; render the 🔓 proof.
" var csp = " + _js_str("1" if csp else "0") + ";\n"
" var bundle = " + _js_json(bundle_obj) + ";\n"
" var dismissed = false;\n"
)
# Inline path renders on the first tick — the bundle is already present (no
# async fetch to wait on), so ensure() can run immediately.
return prelude + _BANNER_CORE + " ensure();\n})();"
# Cosmetic client-side loader. Served as a static string (the /__toolbox/loader.js
# route now sends Cache-Control: no-store); applies the transparency
# banner from the bundle off the page's critical render path. Per-page stats
# (trackers, cookies) are derived in-browser (Resource Timing / document.cookie),
# so the proxy never scans the body. Self-guarded, dismissible, fail-silent.
#
# Legacy src-loader (#620): kept working for the /__toolbox/loader.js route. The
# INLINE path (inline_script) supersedes it in the live engine inject path because
# a site service-worker hijacks the same-origin src + fetch (#662).
_LOADER_PRELUDE = r"""(function(){
"use strict";
if (window.__SBX_LOADER__) return; window.__SBX_LOADER__ = 1;
var s = document.currentScript || {};
var ds = s.dataset || {};
var mh = ds.mh || "", wg = ds.wg || "0";
// #662 CONSENTED-DEMONSTRATION: the engine relaxed this page's CSP so this
// loader could run even under a strict policy, and stamped data-csp="1" on our
// <script>. When set, the banner shows a 🔓 as VISIBLE proof the page's CSP was
// bypassed to inject. Absent: no proof emoji (page had no CSP to bypass).
var csp = ds.csp || "";
// SPA support (#662): cache the bundle + remember an explicit dismiss, so the
// banner can be re-asserted after client-side navigation / DOM re-renders
// (cnn, youtube swap content without reloading; the one-shot loader would
// otherwise vanish). Re-assert never fights a user who dismissed the banner.
var bundle = null, dismissed = false;
"""
# The legacy src-loader fetches the bundle (same-origin), then ensure()s. The
# render + SPA logic is the SAME _BANNER_CORE the inline path uses (no drift).
LOADER_JS = _LOADER_PRELUDE + _BANNER_CORE + r"""
fetch("/__toolbox/bundle?mh=" + encodeURIComponent(mh) + "&wg=" + encodeURIComponent(wg), {credentials:"omit"})
.then(function(r){ return r.json(); })
.then(function(b){ bundle = b; ensure(); })
.catch(function(){});
})();
"""
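The `</` escaping done by _js_str and _js_json can be checked in isolation. The sketch below shows why the replacement is safe: `\/` is a valid escape in both JSON and JavaScript strings, so the value round-trips unchanged while the literal text can no longer contain `</script`. The function name js_literal and the sample payload are made up for illustration.

```python
import json


def js_literal(value) -> str:
    """Serialise value as a JS literal that cannot close an inline <script>."""
    # json.dumps output is valid JS; "</" only ever occurs inside string
    # values, where "<\/" decodes to the same two characters.
    return json.dumps(value).replace("</", "<\\/")


# Hypothetical hostile value nested inside a bundle-like object.
payload = {"pin": "</script><script>alert(1)</script>"}
literal = js_literal(payload)
```

Note that _js_json in the diff passes ensure_ascii=False, so non-ASCII characters are emitted raw; this sketch keeps the json.dumps default, which escapes them.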


@ -32,11 +32,95 @@ DB = Path("/var/lib/secubox/toolbox/toolbox.db")
CACHE_FILE = Path("/var/lib/secubox/toolbox/cumulative-cache.json")
CACHE_TTL_SECONDS = 60 # refresh every minute
# Live analysis-module event stores (post-#662 Phase-7 cutover). The legacy
# toolbox.db `events` table froze at the cutover; the live counts + hosts now
# live in each analysis module, exposed over its own unix socket.
# GET /mitm-events/stats?since_seconds=N -> {"kind":..,"count":n,...}
# GET /mitm-events?limit=N -> {"events":[{...payload...}],"count":n}
_MITM_MODULES = [
("dpi", "/run/secubox/dpi.sock"),
("cookies", "/run/secubox/cookies.sock"),
("ja4", "/run/secubox/threat-analyst.sock"),
]
# dpi socket is the one carrying host/sni payloads for top-hosts aggregation.
_DPI_SOCK = "/run/secubox/dpi.sock"
def _now() -> int:
return int(time.time())
def _uds_get_json(sock_path: str, path: str, timeout: int = 2) -> dict | None:
"""GET a JSON document over a unix socket. Returns the parsed dict, or
None on any error (never raises). Mirrors api._pull_mitm_module_events's
UDSConnection pattern."""
import socket as _sock
import http.client as _hc
try:
class UDSConnection(_hc.HTTPConnection):
def connect(self):
self.sock = _sock.socket(_sock.AF_UNIX, _sock.SOCK_STREAM)
self.sock.settimeout(self.timeout)
self.sock.connect(sock_path)
conn = UDSConnection("localhost", timeout=timeout)
try:
conn.request("GET", path)
resp = conn.getresponse()
if resp.status != 200:
return None
raw = resp.read().decode("utf-8", errors="ignore")[:1000000]
return json.loads(raw)
finally:
conn.close()
except Exception as e:
log.debug("uds get %s%s failed: %s", sock_path, path, e)
return None
def _live_event_counts(window_seconds: int) -> dict | None:
"""Query each analysis module's GET /mitm-events/stats for its event count
in the window. Returns {"dpi":n,"cookies":n,"ja4":n} (missing/error module
omitted). Returns None only if EVERY module call failed (caller falls back
to the legacy toolbox.db query)."""
counts: dict[str, int] = {}
any_ok = False
for kind, sock_path in _MITM_MODULES:
data = _uds_get_json(
sock_path, f"/mitm-events/stats?since_seconds={int(window_seconds)}"
)
if data is None:
continue
any_ok = True
# Prefer the module's self-reported kind; fall back to our tag.
k = data.get("kind") or kind
try:
counts[k] = int(data.get("count", 0))
except (TypeError, ValueError):
counts[k] = 0
return counts if any_ok else None
def _live_top_hosts(limit: int = 5000, top: int = 25) -> list | None:
"""Aggregate top hosts from the dpi module's recent events. Returns a list
of {"host":..,"count":..} (same shape as the legacy top_hosts_7d), or None
if the dpi module call failed."""
data = _uds_get_json(_DPI_SOCK, f"/mitm-events?limit={int(limit)}")
if data is None:
return None
host_counter: Counter = Counter()
for ev in data.get("events", []) or []:
try:
p = ev.get("payload") or {}
h = p.get("host") or p.get("sni")
if h:
host_counter[h] += 1
except Exception:
pass
return [{"host": h, "count": n} for h, n in host_counter.most_common(top)]
def _safe_query(db, sql: str, params: tuple = ()) -> list:
try:
cur = db.execute(sql, params)
@ -74,30 +158,48 @@ def compute() -> dict:
out["sessions"]["all_time"] = (_safe_query(c,
"SELECT COUNT(DISTINCT mac_hash) FROM clients") or [(0,)])[0][0]
# Event counts by source (last 7 days for relevance)
for row in _safe_query(c,
"SELECT source, COUNT(*) as n FROM events WHERE ts > ? GROUP BY source",
(d7d,)):
out["events"][row["source"]] = row["n"]
out["events"]["total_7d"] = sum(out["events"].values())
# Event counts by source (last 7 days for relevance).
# Post-#662 Phase-7: the live counts live in the analysis modules'
# own stores (queried over unix sockets). The legacy toolbox.db
# `events` table froze at the cutover, so prefer the live path and
# only fall back to the frozen table if EVERY module call fails.
live_counts = _live_event_counts(86400 * 7)
if live_counts is not None:
out["events"].update(live_counts)
else:
for row in _safe_query(c,
"SELECT source, COUNT(*) as n FROM events WHERE ts > ? GROUP BY source",
(d7d,)):
out["events"][row["source"]] = row["n"]
out["events"]["total_7d"] = sum(
v for v in out["events"].values() if isinstance(v, int)
)
# Top hosts (anonymized — just hostnames, no mac_hash)
host_counter = Counter()
for row in _safe_query(c,
"SELECT payload FROM events WHERE source='dpi' AND ts > ? LIMIT 5000",
(d7d,)):
try:
p = json.loads(row["payload"])
h = p.get("host") or p.get("sni")
if h:
host_counter[h] += 1
except Exception:
pass
out["top_hosts_7d"] = [
{"host": h, "count": n}
for h, n in host_counter.most_common(15)
]
# Top hosts (anonymized — just hostnames, no mac_hash).
# Live path: aggregate the dpi module's recent events; fall back to
# the frozen toolbox.db `events` table only if the dpi call fails.
live_hosts = _live_top_hosts()
if live_hosts is not None:
out["top_hosts_7d"] = live_hosts
else:
host_counter = Counter()
for row in _safe_query(c,
"SELECT payload FROM events WHERE source='dpi' AND ts > ? LIMIT 5000",
(d7d,)):
try:
p = json.loads(row["payload"])
h = p.get("host") or p.get("sni")
if h:
host_counter[h] += 1
except Exception:
pass
out["top_hosts_7d"] = [
{"host": h, "count": n}
for h, n in host_counter.most_common(15)
]
# Risk score / level distributions read the `clients` table (not
# the frozen `events` table), so they stay on toolbox.db for now.
# Risk score distribution (last 7d)
score_buckets = {"low": 0, "medium": 0, "high": 0}
for row in _safe_query(c,
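The top-hosts aggregation used by both the live path and the frozen-table fallback reduces to a Counter over each event's payload. A minimal sketch follows, assuming the event shape quoted in the comments above ({"events": [{...payload...}]}); the function name top_hosts and the sample events are hypothetical.

```python
from collections import Counter


def top_hosts(events, top=15):
    """Count events per host (falling back to sni) and return the top N as
    [{"host": ..., "count": ...}], most frequent first."""
    counter = Counter()
    for ev in events or []:
        payload = ev.get("payload") if isinstance(ev, dict) else None
        if not isinstance(payload, dict):
            continue  # tolerate malformed rows instead of failing the batch
        host = payload.get("host") or payload.get("sni")
        if host:
            counter[host] += 1
    return [{"host": h, "count": n} for h, n in counter.most_common(top)]


sample = [
    {"payload": {"host": "a.example"}},
    {"payload": {"sni": "b.example"}},
    {"payload": {"host": "a.example", "sni": "ignored.example"}},
    {"payload": None},
    "not-a-dict",
]
result = top_hosts(sample, top=2)
```

In the diff the live path defaults to the top 25 hosts while the fallback keeps 15, so a consumer of top_hosts_7d should not assume a fixed list length.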


@ -233,3 +233,84 @@ def revoke_client(client_pubkey: str) -> bool:
def _now_ts() -> float:
import time
return time.time()
# Phase 6 (#662) : map each WG peer to its REAL external (pre-tunnel) endpoint IP
# so the admin client table can show the client's true origin country flag —
# the stored client IP is the internal 10.99.1.x which GeoIPs to nothing.
import hashlib as _hashlib
import ipaddress as _ipaddress
_ENDPOINTS_CACHE: dict[str, str] = {}
_ENDPOINTS_TS: float = 0.0
_ENDPOINTS_TTL = 30.0 # endpoints change rarely; don't shell out per request/row
def _is_private_or_loopback(ip: str) -> bool:
"""True for RFC1918 / loopback / link-local / ULA — non-routable, no
meaningful country (a client on the local LAN has no public geo)."""
try:
a = _ipaddress.ip_address(ip)
except ValueError:
return True
return (
a.is_private # 10/8, 172.16/12, 192.168/16, fc00::/7
or a.is_loopback # 127/8, ::1
or a.is_link_local # 169.254/16, fe80::/10
or a.is_unspecified
)
def _strip_endpoint_port(endpoint: str) -> str | None:
"""`IP:port` or `[IPv6]:port` → bare IP. None for `(none)` / malformed."""
ep = (endpoint or "").strip()
if not ep or ep == "(none)":
return None
if ep.startswith("["): # IPv6 literal: [2001:db8::1]:51820
host = ep[1:].split("]", 1)[0]
return host or None
# IPv4 (or bare host): split off the last :port
return ep.rsplit(":", 1)[0] or None
def wg_endpoints() -> dict[str, str]:
"""Return {mac_hash: external_ip} for every WG peer with a real, routable
endpoint, derived from `wg show wg-toolbox dump`.
mac_hash = sha256(pubkey)[:16], the SAME derivation used when the peer is
registered (api.wg_profile_new). The external IP is the peer's pre-tunnel
endpoint, i.e. its true public origin. RFC1918 / loopback / link-local
endpoints and `(none)` are skipped (no meaningful country).
Best-effort : empty dict on any error or if `wg` is missing. Cached ~30s.
"""
global _ENDPOINTS_CACHE, _ENDPOINTS_TS
now = _now_ts()
if _ENDPOINTS_TS and (now - _ENDPOINTS_TS) < _ENDPOINTS_TTL:  # also honour a cached EMPTY result
return _ENDPOINTS_CACHE
out: dict[str, str] = {}
try:
proc = subprocess.run(
["wg", "show", WG_INTERFACE, "dump"],
capture_output=True, text=True, timeout=2, check=False,
)
lines = proc.stdout.splitlines()
# First line is the interface (privkey, pubkey, port, fwmark) — skip it.
# Peer lines: pubkey presharedkey endpoint allowed-ips ...
for line in lines[1:]:
fields = line.split("\t")
if len(fields) < 3:
continue
pubkey = fields[0].strip()
ip = _strip_endpoint_port(fields[2])
if not pubkey or not ip or _is_private_or_loopback(ip):
continue
mac_hash = _hashlib.sha256(pubkey.encode()).hexdigest()[:16]
out[mac_hash] = ip
except Exception as e: # missing wg, timeout, permission, parse error
log.debug("wg_endpoints unavailable: %s", e)
return _ENDPOINTS_CACHE or {}
_ENDPOINTS_CACHE = out
_ENDPOINTS_TS = now
return out
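The endpoint handling above (strip the port, then drop anything non-routable) can be exercised with the standard ipaddress module alone. This is a sketch under the same rules as _strip_endpoint_port and _is_private_or_loopback, merged into one hypothetical helper named endpoint_ip; it is not the module's API.

```python
import ipaddress


def endpoint_ip(endpoint: str):
    """'IP:port' or '[IPv6]:port' -> bare routable IP, else None."""
    ep = (endpoint or "").strip()
    if not ep or ep == "(none)":
        return None
    if ep.startswith("["):               # IPv6 literal: [addr]:port
        host = ep[1:].split("]", 1)[0]
    else:                                # IPv4: split off the last :port
        host = ep.rsplit(":", 1)[0]
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return None                      # malformed or a hostname
    if (addr.is_private or addr.is_loopback
            or addr.is_link_local or addr.is_unspecified):
        return None                      # no meaningful public geo
    return str(addr)
```

One detail worth knowing when writing tests: Python's is_private is also true for the documentation ranges (for example 2001:db8::/32 and 203.0.113.0/24), so fixtures that should survive the filter need a genuinely global address.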


@ -0,0 +1,68 @@
# tests/test_ad_event_candidates.py
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
"""#662 — /__toolbox/ad-event accepts a "candidates" list (the Go engine's
auto-learn feed) and forwards it to store.record_ad_candidates(). Never 500s the engine."""
import asyncio
import json
from secubox_toolbox import api, store
class _FakeRequest:
"""Minimal Request stand-in: headers + an async json() body."""
def __init__(self, body: dict, content_length=None):
self._body = body
cl = content_length
if cl is None:
cl = len(json.dumps(body).encode())
self.headers = {"content-length": str(cl)}
async def json(self):
return self._body
def test_candidates_ingested(monkeypatch):
captured = {}
monkeypatch.setattr(store, "record_ad_candidates", lambda rows: captured.setdefault("cand", list(rows)))
monkeypatch.setattr(store, "record_ad_blocks", lambda rows: None)
monkeypatch.setattr(store, "record_ad_client_blocks", lambda rows: None)
body = {
"blocks": [],
"clients": [],
"candidates": [
{"host": "metrics.acotedemoi.com", "site": "lemonde.fr", "hits": 3},
{"host": "ads.foo.io", "site": "news.example", "hits": 1},
{"site": "no-host.example", "hits": 9}, # missing host → skipped
{"host": "", "site": "x", "hits": 2}, # empty host → skipped
],
}
resp = asyncio.run(api.toolbox_ad_event(_FakeRequest(body)))
assert resp.status_code == 204
rows = captured.get("cand")
assert rows == [
("metrics.acotedemoi.com", "lemonde.fr", 3),
("ads.foo.io", "news.example", 1),
]
def test_candidates_absent_is_noop(monkeypatch):
called = {"cand": False}
monkeypatch.setattr(store, "record_ad_candidates", lambda rows: called.__setitem__("cand", True))
monkeypatch.setattr(store, "record_ad_blocks", lambda rows: None)
monkeypatch.setattr(store, "record_ad_client_blocks", lambda rows: None)
resp = asyncio.run(api.toolbox_ad_event(_FakeRequest({"blocks": [], "clients": []})))
assert resp.status_code == 204
assert called["cand"] is False # no candidates key → record_ad_candidates not called
def test_candidates_bad_payload_never_500s(monkeypatch):
monkeypatch.setattr(store, "record_ad_candidates", lambda rows: (_ for _ in ()).throw(RuntimeError("boom")))
monkeypatch.setattr(store, "record_ad_blocks", lambda rows: None)
monkeypatch.setattr(store, "record_ad_client_blocks", lambda rows: None)
body = {"candidates": [{"host": "x.io", "site": "s", "hits": 1}]}
resp = asyncio.run(api.toolbox_ad_event(_FakeRequest(body)))
assert resp.status_code == 204 # store raised, but the endpoint swallows it


@ -0,0 +1,98 @@
# tests/test_autolearn_socialfeed.py
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
"""#662 — cross-site-reuse promotion: a tracker_domain seen on >= N distinct
src_site across recent social_edges is a behaviourally-confirmed cross-site
tracker and gets promoted into learned-trackers.txt. Allowlist + self guard
reused from _ad_feed; merges (never overwrites)."""
import sqlite3
import importlib.util
import pathlib
import time
def _load_autolearn():
p = pathlib.Path(__file__).resolve().parents[1] / "sbin" / "secubox-toolbox-autolearn"
spec = importlib.util.spec_from_loader("autolearn", loader=None)
mod = importlib.util.module_from_spec(spec)
exec(compile(p.read_text(), str(p), "exec"), mod.__dict__)
return mod
def _mk_db(db):
con = sqlite3.connect(db)
con.executescript(
"CREATE TABLE social_edges("
" id INTEGER PRIMARY KEY AUTOINCREMENT, ts INTEGER NOT NULL,"
" client_mac_hash TEXT, src_site TEXT NOT NULL,"
" tracker_domain TEXT NOT NULL, cookie_id_hash TEXT,"
" ja4_hash TEXT, consent_state TEXT DEFAULT 'none_seen');")
return con


def test_social_feed_promotes_cross_site_tracker(tmp_path, monkeypatch):
    db = tmp_path / "t.db"
    con = _mk_db(db)
    now = int(time.time())
    rows = [
        # tracker.io: 3 distinct src_sites (>= SOCIAL_MIN_SITES=3) → promote
        (now, "m1", "cnn.com", "tracker.io"),
        (now, "m1", "bbc.com", "tracker.io"),
        (now, "m2", "lemonde.fr", "tracker.io"),
        # twosite.net: only 2 distinct sites → NOT promoted
        (now, "m1", "cnn.com", "twosite.net"),
        (now, "m1", "bbc.com", "twosite.net"),
        # safe.cdn.net: 3 sites but ALLOWLISTED → excluded
        (now, "m1", "a.com", "safe.cdn.net"),
        (now, "m1", "b.com", "safe.cdn.net"),
        (now, "m1", "c.com", "safe.cdn.net"),
        # secubox.in: 3 sites but SELF domain → excluded
        (now, "m1", "a.com", "secubox.in"),
        (now, "m1", "b.com", "secubox.in"),
        (now, "m1", "c.com", "secubox.in"),
        # stale.io: 3 sites but OUTSIDE the recent window → excluded
        (now - 999999, "m1", "a.com", "stale.io"),
        (now - 999999, "m1", "b.com", "stale.io"),
        (now - 999999, "m1", "c.com", "stale.io"),
    ]
    con.executemany(
        "INSERT INTO social_edges(ts,client_mac_hash,src_site,tracker_domain) "
        "VALUES(?,?,?,?)", rows)
    con.commit()
    con.close()
    allow = tmp_path / "ad-allowlist.txt"
    allow.write_text("safe.cdn.net\n")
    out = tmp_path / "learned-trackers.txt"
    out.write_text("preexisting.tracker.com\n")
    monkeypatch.setenv("SECUBOX_AUTOLEARN_DB", str(db))
    monkeypatch.setenv("SECUBOX_AUTOLEARN_OUT", str(out))
    monkeypatch.setenv("SECUBOX_AD_ALLOWLIST", str(allow))
    monkeypatch.setenv("SECUBOX_SOCIAL_MIN_SITES", "3")
    monkeypatch.setenv("SECUBOX_SOCIAL_WINDOW_HOURS", "168")
    al = _load_autolearn()
    n = al._social_feed()
    lines = out.read_text().split()
    assert "tracker.io" in lines  # 3 distinct sites, recent → promoted
    assert "twosite.net" not in lines  # below threshold
    assert "safe.cdn.net" not in lines  # allowlisted
    assert "secubox.in" not in lines  # self domain
    assert "stale.io" not in lines  # outside window
    assert "preexisting.tracker.com" in lines  # merge, not overwrite
    assert len(lines) == len(set(lines))  # no dups
    assert n == 1


def test_social_feed_no_table_is_safe(tmp_path, monkeypatch):
    db = tmp_path / "empty.db"
    sqlite3.connect(db).close()  # no social_edges table
    out = tmp_path / "learned-trackers.txt"
    out.write_text("x.tracker.com\n")
    monkeypatch.setenv("SECUBOX_AUTOLEARN_DB", str(db))
    monkeypatch.setenv("SECUBOX_AUTOLEARN_OUT", str(out))
    al = _load_autolearn()
    n = al._social_feed()
    assert n == -1  # gated/unavailable, not a crash
    assert "x.tracker.com" in out.read_text()  # file untouched
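
The promotion rule these two tests pin can be sketched in a few lines of SQL plus a merge step. This is a minimal sketch, not the actual `_social_feed` implementation: the names `promote_candidates` and `merge_into`, their parameters, and the way self domains are passed in are assumptions made for illustration. Only the `social_edges` columns and the threshold/window semantics come from the tests above.

```python
import sqlite3
import time


def promote_candidates(con, min_sites=3, window_hours=168,
                       allowlist=frozenset(), self_domains=frozenset()):
    """Return tracker domains seen on >= min_sites distinct sites recently.

    Hypothetical helper: it mirrors the rule the tests describe, nothing more.
    """
    cutoff = int(time.time()) - window_hours * 3600
    rows = con.execute(
        "SELECT tracker_domain FROM social_edges "
        "WHERE ts >= ? "
        "GROUP BY tracker_domain "
        "HAVING COUNT(DISTINCT src_site) >= ? "
        "ORDER BY tracker_domain",
        (cutoff, min_sites),
    ).fetchall()
    excluded = set(allowlist) | set(self_domains)
    return [d for (d,) in rows if d not in excluded]


def merge_into(path, domains):
    """Merge domains into a one-per-line file; return how many were new."""
    existing = path.read_text().split() if path.exists() else []
    known = list(dict.fromkeys(existing))                # de-duplicated, order kept
    merged = list(dict.fromkeys(known + list(domains)))  # append only unseen domains
    path.write_text("\n".join(merged) + "\n")
    return len(merged) - len(known)
```

Counting `DISTINCT src_site` rather than rows is what keeps a tracker seen many times on a single site from being promoted, and merging through `dict.fromkeys` preserves pre-existing entries while ruling out duplicates.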

@ -48,6 +48,9 @@ def test_get_bundle_caches(monkeypatch):
def test_loader_js_is_served_string():
    assert "addEventListener" not in bundle.LOADER_JS  # uses currentScript pattern
    # The legacy src-loader uses the currentScript pattern and fetch()es the
    # bundle same-origin (the inline path #662 supersedes it in the live engine
    # but /__toolbox/loader.js still serves this).
    assert "currentScript" in bundle.LOADER_JS
    assert "__toolbox/bundle" in bundle.LOADER_JS
    assert bundle.LOADER_JS.strip().startswith("(function()")

@ -0,0 +1,138 @@
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
# Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
# Source-Disclosed License — All rights reserved except as expressly granted.
# See LICENCE-CMSD-1.0.md for terms.
"""SecuBox-Deb :: toolbox :: inline (SW-immune) banner script tests (#662).

The inline banner survives sites with a SERVICE WORKER (leparisien, cnn): the
engine bakes the bundle + mh/wg/csp as JS literals so there is NO same-origin
fetch the SW can hijack. These tests pin that contract:

  * a valid baked `var bundle = {...}` (JSON), mh/wg/csp literals,
  * the 🔓 proof gated by csp,
  * NO currentScript (the #653 null-in-async bug) and NO fetch(,
  * `</script>` is escaped (no inline-script breakout),
  * get_bundle is called with (mh, bool(wg)).
"""
import json
import os
import re
import sys

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

from secubox_toolbox import api, bundle  # noqa: E402


def _baked_bundle(script: str) -> dict:
    """Extract + parse the baked `var bundle = {...};` JSON from an inline script.

    Undoes the `</` → `<\\/` breakout escaping before parsing as JSON."""
    m = re.search(r"var bundle = (\{.*?\});\n", script, re.S)
    assert m, "no baked `var bundle = {...};` in inline script"
    return json.loads(m.group(1).replace("<\\/", "</"))


def test_inline_bakes_valid_bundle_json():
    s = bundle.inline_script("x", wg=True, csp=True)
    b = _baked_bundle(s)
    assert b["v"] == 1
    assert b["client_id"] == "x"
    # wg=True → public report URL (proves get_bundle was called with wg=True)
    assert b["report_url"] == bundle.REPORT_URL_PUBLIC + "?mh=x"
    assert isinstance(b["tracker_patterns"], list) and b["tracker_patterns"]


def test_inline_bakes_mh_wg_csp_literals():
    s = bundle.inline_script("deadbeef", wg=True, csp=True)
    assert 'var mh = "deadbeef";' in s
    assert 'var wg = "1";' in s
    assert 'var csp = "1";' in s
    s0 = bundle.inline_script("deadbeef", wg=False, csp=False)
    assert 'var wg = "0";' in s0
    assert 'var csp = "0";' in s0


def test_inline_csp_literal_and_proof_logic():
    # The 🔓 literal lives in the shared render core, gated at runtime by
    # csp === "1". csp=1 → var csp = "1" so render shows the proof.
    s1 = bundle.inline_script("x", wg=False, csp=True)
    assert "\U0001f513" in s1  # 🔓 present in the render logic
    assert 'var csp = "1";' in s1  # runtime gate ON
    # csp=0 → gate OFF (no proof rendered), even though the literal is in core.
    s0 = bundle.inline_script("x", wg=False, csp=False)
    assert 'var csp = "0";' in s0


def test_inline_has_no_currentscript_no_fetch():
    # #653 root cause: document.currentScript is null in an async context. The
    # inline script MUST NOT read it, and MUST NOT fetch() (SW would hijack it).
    s = bundle.inline_script("x", wg=True, csp=True)
    assert "currentScript" not in s
    assert "fetch(" not in s


def test_inline_keeps_guards_and_spa_hooks():
    s = bundle.inline_script("x", wg=True, csp=True)
    assert "window.__SBX_LOADER__" in s  # single-run guard
    assert 'getElementById("sbx-banner")' in s  # element-id guard
    assert "dismissed" in s
    assert "pushState" in s and "replaceState" in s and "popstate" in s
    assert "setInterval(ensure, 2000)" in s
    assert "countTrackers" in s


def test_inline_escapes_script_breakout():
    # A bundle value that literally contains </script> must NOT close the inline
    # <script> — it must be escaped to <\/script>.
    orig = bundle._read_pin
    bundle._read_pin = lambda: "</script><img src=x onerror=alert(1)>"
    bundle._cache.clear()
    try:
        s = bundle.inline_script("z", wg=False, csp=False)
    finally:
        bundle._read_pin = orig
        bundle._cache.clear()
    # The IIFE close is the only legitimate "})();"; nothing before the final
    # close should contain a raw "</script>".
    head = s[: s.rfind("})();")]
    assert "</script>" not in head
    assert "<\\/script>" in head  # escaped form present


def test_inline_get_bundle_called_with_bool_wg(monkeypatch):
    seen = {}

    def fake_get_bundle(mh, is_wg=False):
        seen["args"] = (mh, is_wg)
        return {"v": 1, "client_id": mh, "level": "r1", "pin": "",
                "report_url": "http://x", "tracker_patterns": ["doubleclick"],
                "ts": 0}

    monkeypatch.setattr(bundle, "get_bundle", fake_get_bundle)
    bundle.inline_script("abc", wg=1, csp=0)  # wg passed as truthy int
    assert seen["args"] == ("abc", True)  # coerced to bool


def test_legacy_loader_still_intact():
    # The src-loader must keep working: it reads currentScript + data-attrs and
    # fetch()es the bundle (the inline path supersedes it in the live engine, but
    # the /__toolbox/loader.js route still serves it).
    assert "currentScript" in bundle.LOADER_JS
    assert "fetch(" in bundle.LOADER_JS
    assert "function render" in bundle.LOADER_JS
    assert "window.__SBX_LOADER__" in bundle.LOADER_JS


def test_inline_route_returns_javascript_body():
    import asyncio
    resp = asyncio.run(api.toolbox_inline(mh="abc", wg=1, csp=1))
    assert resp.status_code == 200
    assert "javascript" in resp.media_type
    assert "no-store" in resp.headers.get("Cache-Control", "")
    body = resp.body.decode("utf-8")
    assert "window.__SBX_LOADER__" in body
    assert "currentScript" not in body
    assert "fetch(" not in body
    assert 'var mh = "abc";' in body
    assert 'var csp = "1";' in body
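
The breakout escaping that `test_inline_escapes_script_breakout` pins can be illustrated in isolation. The sketch below is an assumption about how such escaping is commonly done, not a copy of `bundle.inline_script`: the helper name `bake_json_literal` and the surrounding script skeleton are hypothetical.

```python
import json


def bake_json_literal(obj):
    """Serialize obj for embedding inside an inline <script> element.

    Hypothetical helper: every "</" becomes "<\\/" so a value such as
    "</script>" cannot terminate the surrounding script element. "\\/" is a
    valid escape in both JSON and JavaScript string literals, so the decoded
    value is unchanged.
    """
    return json.dumps(obj).replace("</", "<\\/")


# A hostile value that would otherwise close the inline <script> early.
payload = {"pin": "</script><img src=x onerror=alert(1)>"}
literal = bake_json_literal(payload)
script = "(function(){\nvar bundle = " + literal + ";\n})();"
```

Because `\/` is a legal JSON escape, `json.loads(literal)` returns the original payload without first undoing the replacement, while the HTML parser never sees a raw `</script>` inside the baked literal.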

@ -0,0 +1,48 @@
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
# Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
"""Cross-engine SOCIAL parity harness — Python side (#662).

Loads the SAME ``social-cookie-id-fixtures.json`` the Go core uses
(``../secubox-toolbox-ng/testdata``) and asserts ``social.cookie_id_hash``
reproduces each fixture's ``expect``.

Python is the source of truth: the ``expect`` values were GENERATED by this very
``social.cookie_id_hash``. The Go side (cmd/sbxmitm/social_test.go) must
reproduce them byte-for-byte. Both files reading identical inputs is what makes
the parity meaningful: the same anti-rig discipline as the jar parity harness.
"""
from __future__ import annotations

import json
import os

from secubox_toolbox import social

_HERE = os.path.dirname(os.path.abspath(__file__))
# tests/ → packages/secubox-toolbox → packages → packages/secubox-toolbox-ng
_NG_TESTDATA = os.path.normpath(
    os.path.join(_HERE, "..", "..", "secubox-toolbox-ng", "testdata"))
_FIXTURES = os.path.join(_NG_TESTDATA, "social-cookie-id-fixtures.json")


def _load():
    with open(_FIXTURES, encoding="utf-8") as f:
        return json.load(f)


def test_cookie_id_hash_parity():
    data = _load()
    assert data["fixtures"], "no fixtures"
    failures = []
    for fx in data["fixtures"]:
        got = social.cookie_id_hash(
            fx["tracker_domain"], fx["cookie_name"], fx["cookie_value"])
        if got != fx["expect"]:
            failures.append((fx, got))
    assert not failures, f"cookie_id_hash drift: {failures}"


def test_cookie_id_hash_invariants():
    # domain + name are lower-cased; the value is NOT.
    assert social.cookie_id_hash("A.NET", "N", "v") == social.cookie_id_hash("a.net", "n", "v")
    assert social.cookie_id_hash("a.net", "n", "V") != social.cookie_id_hash("a.net", "n", "v")
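
The two invariants and the fixture loop above can be shown with a stand-in hash. The real `social.cookie_id_hash` derivation is not visible in this diff, so everything about `cookie_id_hash_sketch` below (separator, digest, truncation length) is an assumption chosen only to satisfy the invariants. `check_fixtures` is likewise a hypothetical name for the comparison loop.

```python
import hashlib


def cookie_id_hash_sketch(tracker_domain, cookie_name, cookie_value):
    """Stand-in hash honouring the two invariants pinned by the tests.

    ASSUMPTION: the NUL separator, SHA-256 and 16-hex truncation are invented
    for this sketch; the real social.cookie_id_hash may differ in all three.
    """
    material = "\x00".join(
        [tracker_domain.lower(), cookie_name.lower(), cookie_value])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


def check_fixtures(fixtures, hash_fn):
    """Return every (fixture, got) pair whose hash differs from its `expect`."""
    failures = []
    for fx in fixtures:
        got = hash_fn(fx["tracker_domain"], fx["cookie_name"], fx["cookie_value"])
        if got != fx["expect"]:
            failures.append((fx, got))
    return failures
```

Collecting all mismatches before asserting, as the harness does, reports every drifting fixture in one run instead of stopping at the first.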

@ -0,0 +1,118 @@
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
# Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
"""Phase 6 (#662) — per-client country flag from the REAL external WG endpoint IP."""
import asyncio
import hashlib
from types import SimpleNamespace

from secubox_toolbox import api
from secubox_toolbox import wg

# A `wg show wg-toolbox dump` blob. First line = interface (skipped).
# Peer fields are TAB-separated: pubkey psk endpoint allowed-ips ...
_PUB_PUBLIC = "cVZ7s8d2pubkeyAAAAAAAAAAAAAAAAAAAAAAAAAAAA="  # real public endpoint
_PUB_NONE = "noneZZZpubkeyBBBBBBBBBBBBBBBBBBBBBBBBBBBBB="  # endpoint (none)
_PUB_LAN = "lanZZZZZpubkeyCCCCCCCCCCCCCCCCCCCCCCCCCCCC="  # RFC1918 endpoint
_PUB_V6 = "v6ZZZZZZpubkeyDDDDDDDDDDDDDDDDDDDDDDDDDDDDD="  # IPv6 endpoint
_DUMP = "\t".join([
    "srvPrivKey", "srvPubKey", "51820", "off",
]) + "\n" + "\n".join([
    "\t".join([_PUB_PUBLIC, "(none)", "88.163.66.208:51820", "10.99.1.2/32", "0", "0", "0"]),
    "\t".join([_PUB_NONE, "(none)", "(none)", "10.99.1.3/32", "0", "0", "0"]),
    "\t".join([_PUB_LAN, "(none)", "192.168.1.50:41234", "10.99.1.4/32", "0", "0", "0"]),
    "\t".join([_PUB_V6, "(none)", "[2606:4700:4700::1111]:51820", "10.99.1.5/32", "0", "0", "0"]),
])


def _hash(pub: str) -> str:
    return hashlib.sha256(pub.encode()).hexdigest()[:16]


def _fake_run(blob):
    def _run(cmd, **kw):
        return SimpleNamespace(stdout=blob, stderr="", returncode=0)
    return _run


def test_wg_endpoints_parsing(monkeypatch):
    # Bust the 30s cache between tests.
    wg._ENDPOINTS_CACHE, wg._ENDPOINTS_TS = {}, 0.0
    monkeypatch.setattr(wg.subprocess, "run", _fake_run(_DUMP))
    eps = wg.wg_endpoints()
    # Public IPv4 endpoint → mapped, port stripped.
    assert eps[_hash(_PUB_PUBLIC)] == "88.163.66.208"
    # mac_hash derivation matches the known pubkey→hash.
    assert _hash(_PUB_PUBLIC) == "ad32e736309b1348"
    # IPv6 endpoint → bracket + port stripped (global IPv6 kept).
    assert eps[_hash(_PUB_V6)] == "2606:4700:4700::1111"
    # `(none)` endpoint skipped.
    assert _hash(_PUB_NONE) not in eps
    # RFC1918 LAN endpoint skipped (no meaningful country).
    assert _hash(_PUB_LAN) not in eps


def test_wg_endpoints_besteffort_empty(monkeypatch):
    wg._ENDPOINTS_CACHE, wg._ENDPOINTS_TS = {}, 0.0

    def _boom(*a, **k):
        raise FileNotFoundError("wg not installed")

    monkeypatch.setattr(wg.subprocess, "run", _boom)
    assert wg.wg_endpoints() == {}


def test_clients_rich_uses_external_endpoint_flag(monkeypatch):
    wg_pub = "clientPubKeyEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE="
    wg_mac = _hash(wg_pub)
    rows = [
        # WG client: stored ip is internal 10.99.1.x, has a public endpoint.
        {"mac_hash": wg_mac, "ip": "10.99.1.7", "state": "active",
         "level": "r3", "score": 0, "last_seen": 100.0, "first_seen": 0.0},
        # Non-WG / captive client: no endpoint → falls back to stored ip.
        {"mac_hash": "captive01", "ip": "203.0.113.9", "state": "active",
         "level": "r1", "score": 0, "last_seen": 50.0, "first_seen": 0.0},
    ]
    monkeypatch.setattr(api.store, "list_clients", lambda: rows)
    monkeypatch.setattr(api.store, "latest_user_agent", lambda mh: "")
    # External endpoint for the WG client only. admin_clients_rich does a lazy
    # `from . import wg`, so patching the wg module attribute is what takes effect.
    import secubox_toolbox.wg as _wgmod
    monkeypatch.setattr(_wgmod, "wg_endpoints", lambda: {wg_mac: "88.163.66.208"})
    seen_keys = []

    def fake_lookup(key):
        seen_keys.append(key)
        if key == "88.163.66.208":
            return {"flag": "🇫🇷", "country_iso": "FR", "asn_org": "Orange"}
        if key == "203.0.113.9":
            return {"flag": "🇺🇸", "country_iso": "US", "asn_org": "Example"}
        return {"flag": "", "country_iso": "", "asn_org": ""}

    monkeypatch.setattr(api.geo, "lookup", fake_lookup)
    out = asyncio.run(api.admin_clients_rich())
    clients = {c["mac_hash"]: c for c in out["clients"]}
    # WG client: flag derived from the EXTERNAL IP, not the internal 10.99.1.7.
    assert clients[wg_mac]["flag"] == "🇫🇷"
    assert clients[wg_mac]["country_iso"] == "FR"
    assert "88.163.66.208" in seen_keys
    assert "10.99.1.7" not in seen_keys  # internal IP never geo-looked-up
    # Non-WG client: falls back to the stored ip.
    assert clients["captive01"]["flag"] == "🇺🇸"
    assert "203.0.113.9" in seen_keys
    # PRIVACY: the raw external IP must NOT appear anywhere in the response.
    import json
    dumped = json.dumps(out, default=str)
    assert "88.163.66.208" not in dumped
    # The stored (internal) ip is still the only ip field exposed.
    assert clients[wg_mac]["ip"] == "10.99.1.7"
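
The endpoint parsing that `test_wg_endpoints_parsing` exercises can be sketched with the standard library alone. This is a hedged illustration, not the real `wg.wg_endpoints()`: the function name `parse_wg_dump` is hypothetical, it takes the dump text directly instead of shelling out, and it omits the ~30s cache. The field layout and the `sha256(pubkey)[:16]` key follow the fixture and commit message above.

```python
import hashlib
import ipaddress


def parse_wg_dump(blob):
    """Map sha256(pubkey)[:16] -> external endpoint IP from a wg dump blob.

    Hypothetical sketch: peer lines are TAB-separated as
    pubkey, psk, endpoint, allowed-ips, ...
    """
    endpoints = {}
    for line in blob.splitlines()[1:]:  # first line describes the interface
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        pubkey, endpoint = fields[0], fields[2]
        if endpoint == "(none)":
            continue
        # Drop the trailing :port, then the brackets around an IPv6 literal.
        host = endpoint.rsplit(":", 1)[0].strip("[]")
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            continue  # unparsable endpoint: best-effort, skip it
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            continue  # no meaningful country for these
        key = hashlib.sha256(pubkey.encode()).hexdigest()[:16]
        endpoints[key] = str(ip)
    return endpoints
```

Splitting on the last colon handles both `1.2.3.4:51820` and `[2606:...]:51820`. Note that `ipaddress`'s `is_private` covers more than the RFC1918 ranges, so the real filter may be narrower than this sketch.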