Mirror of https://github.com/CyberMind-FR/secubox-deb.git, synced 2026-06-29 19:43:10 +00:00

Compare commits: 23 commits, ded89934d0...4c6777dc68

| SHA1 |
|---|
| 4c6777dc68 |
| 1a315317e7 |
| 04598482fb |
| be0497e6de |
| 7db7a73d65 |
| 3ade5619d0 |
| a48f43607b |
| 27ba48c1a1 |
| c04a9d0c1c |
| 3009ef93d9 |
| 78ad554ece |
| 895356dc00 |
| 4063ae1a95 |
| 77da033371 |
| 3850da5479 |
| 040e460876 |
| 55f9e4c803 |
| 257fc95182 |
| 591106ec65 |
| 15a668829b |
| 73b8ad36b1 |
| d0db3e87fd |
| 05c659b4ca |

@@ -3,6 +3,27 @@

---

## 2026-06-19 — kbin milestone: ToolBoX 2.7.0 (middle release) + Tor chapter staged (#683)

- **End-of-session checkpoint** — docs + positioning + version bump; no runtime behaviour change.
- **`secubox-toolbox` 2.6.59 → 2.7.0** (middle release) — caps the 2.6.x line
  (ad-intelligence / Anti-Track v2 / anti-bot uTLS #662) and opens the **kbin** chapter:
  kbin (`kbin.gk2.secubox.in`, the public ToolBoX portal) is framed as the *first tool of the
  CyberMind Swiss-army cyber kit*: transparent performance, fully encrypted MITM inspection,
  ad poison/smog injection, adware-ban transparency banner, safe browsing.
- **Docs** — new wiki use-case page `docs/wiki/Kbin-Toolbox.md`, `docs/FAQ-KBIN-TOR.md`,
  README positioning blurb.
- **Plan #683 (issue + spec)** — kbin **Tor endpoint**: a quick switch that re-routes a consenting
  client's surfing through Tor (outbound egress, pseudo-network) so the kbin exit is anonymized.
  Spec `docs/superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md`. Invariants:
  inspection preserved (Tor sits after the forging core), fail-closed, opt-in/default-OFF, no DNS
  leak, CSPN audit-logged. Opposite direction of `secubox-exposure` (inbound hidden services),
  but reuses its Tor control. Depends on the #662 Go core for the preferred SOCKS5-dialer transport.
- **Caveat recorded** — Tor mode must force `tls_splice` (#649) OFF per client, otherwise asset
  flows leak the real IP.

---

## 2026-06-19 — #662 anti-bot: Chrome TLS fingerprint (uTLS) — defeat DataDome without splice (PR #674)

- lemonde.fr (DataDome) blocked R3 navigation at the 2nd level: the engine re-origined

@@ -1,10 +1,26 @@

# TODO — SecuBox-DEB Backlog

*Updated: 2026-06-19*

---

## 🔥 P0 — Immediate (in flight)

### kbin Tor endpoint — anonymized quick-switch surfing (#683)

> Capstone of the cyber Swiss-army knife: anonymity of the exit. Spec:
> `docs/superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md`.
> Invariants: inspection preserved, fail-closed, opt-in (default OFF), no DNS leak, CSPN audit.

- [ ] **Transport**: Option A, upstream SOCKS5 dialer (Go core #662, *preferred*), vs
  Option B, nft mark → Tor TransPort (pre-#662 fallback).
- [ ] **Tor egress profile**: reuse `secubox-exposure` (bootstrap/NEWNYM), egress-only.
- [ ] **Toolbox API**: `POST /admin/tor/{on,off}` (WG-hash scoped) + `GET /tor/state` +
  `POST /tor/newnym` + per-client SQLite state (TTL 24h).
- [ ] **kbin UI**: 🧅 toggle + state badge + exit-country flag + "new identity" button.
- [ ] **nft leak-guard** + DNS-over-Tor (test exit IP + resolver ≠ Unbound).
- [ ] **`tls_splice` OFF in Tor mode** (#649): otherwise asset flows leak the real IP.
- [ ] **CSPN**: audit-log every toggle; DARK soak (flag present, UI hidden) before the flip.

### ToolBox clients (`clients/`)

- [x] **#531 Android scaffold + CI** — Gradle/Compose one-tap onboarding,

@@ -1,5 +1,35 @@

# WIP — Work In Progress

*Updated: 2026-06-19*

---

## 🔄 2026-06-19: kbin milestone, ToolBoX 2.7.0 + Tor chapter (plan)

End-of-session checkpoint. No runtime behaviour change: docs, positioning, version, and the
plan for the next blade.

- ✅ **ToolBoX 2.7.0** (middle release): closes the 2.6.x line (ad-intelligence /
  Anti-Track v2 / anti-bot uTLS #662) and opens the kbin chapter, "first tool of the
  cyber Swiss-army knife". kbin = transparent performance + full encryption + poison/smog +
  anti-adware banner + safe browsing.
- ✅ **kbin docs**: wiki [`Kbin-Toolbox.md`](../docs/wiki/Kbin-Toolbox.md),
  [`FAQ-KBIN-TOR.md`](../docs/FAQ-KBIN-TOR.md), README blurb.
- ✅ **Plan #683**: spec
  [`2026-06-19-kbin-tor-anonymized-surfing-design.md`](../docs/superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md):
  quick-switch Tor endpoint (outbound egress, fail-closed, opt-in, no DNS leak,
  inspection preserved). Depends on the #662 Go core.

### ⬜ Next Up: Tor chapter (#683)

- **Decide the transport**: Option A (upstream SOCKS5 dialer via the #662 Go core,
  *preferred*) vs Option B (nft mark → Tor TransPort, pre-#662 fallback).
- **Tor egress profile** in `secubox-exposure` (or a dedicated `tor-egress` unit):
  egress-only, no relay or hidden service in this profile.
- **Toolbox API**: `POST /admin/tor/{on,off}` (per client, WG-hash), `GET /tor/state`,
  `POST /tor/newnym` + SQLite state + 🧅 UI banner.
- **nft leak-guard** + DNS-over-Tor (test: exit IP + resolver ≠ local Unbound).
- **Caveat**: in Tor mode, force `tls_splice` OFF for that client (otherwise asset
  flows leak the real IP). DARK soak (flag present, UI hidden) before the flip.

---

README.md (24 lines changed)

@@ -57,6 +57,30 @@

---

## 🗡️ kbin: the first tool of the cyber Swiss-army knife

**kbin** (`kbin.gk2.secubox.in`) is the public portal of the SecuBox **ToolBoX**: the
*digital booth* and **first blade of the modular cyber Swiss-army knife** from
[cybermind.fr](https://cybermind.fr). You plug in, browse normally, and the blade
inspects and protects the traffic transparently:

| 🗡️ | Blade |
|----|------|
| ⚡ | **Transparent performance**: only what gets modified is decrypted (selective SNI splice) |
| 🔒 | **Fully encrypted**: complete MITM inspection, per-host cert forging, Chrome uTLS fingerprint |
| ☠️ | **Poison & smog injection**: ad-tech traffic comes back out poisoned, not merely blocked |
| 🚫 | **Anti-adware banner**: injected transparency, CSP-immune, SPA-aware |
| 🛡️ | **Safe browsing**: Vortex DNS + nft blacklist + anti-bot detection |

> **Next blade: 🧅 Tor quick-switch mode ([#683](https://github.com/CyberMind-FR/secubox-deb/issues/683)).**
> One tap and browsing exits through the Tor network (outbound egress, pseudo-network): inspection
> stays intact, only the **exit IP** becomes anonymous. Fail-closed, opt-in, no DNS leak.

- Use case: [docs/wiki/Kbin-Toolbox.md](docs/wiki/Kbin-Toolbox.md)
- FAQ: [docs/FAQ-KBIN-TOR.md](docs/FAQ-KBIN-TOR.md)

---

## License — CyberMind Source-Disclosed (CMSD-1.0)

> **Source disclosed, rights reserved.**

docs/FAQ-KBIN-TOR.md (Normal file, 93 lines changed)

@@ -0,0 +1,93 @@

# FAQ: kbin and anonymized Tor mode

> kbin (`kbin.gk2.secubox.in`) is the public portal of the SecuBox **ToolBoX**, the first
> tool of the CyberMind cyber Swiss-army knife. This FAQ covers protected browsing and the
> upcoming **Tor quick-switch mode** ([#683](https://github.com/CyberMind-FR/secubox-deb/issues/683)).

---

### What exactly is kbin?

The public portal of `secubox-toolbox`. You join the booth's open AP, give consent,
and all traffic goes through the SecuBox MITM forging pipeline: encrypted-traffic inspection,
ad/tracker cleanup, transparency banner, safe browsing. See
[Kbin-Toolbox](wiki/Kbin-Toolbox.md).

### Does kbin see all my traffic? Isn't that dangerous?

It is **consented and ephemeral**. The MAC address is hashed with a salt that rotates every
24 h, no raw cookie value is persisted, and no mapping between a session and a real identity
survives the TTL. There are three opt-in tiers: R0 (full bypass), R1 (passive analysis,
recommended), R2/R3 (TLS break + banner). Without consent, there is **no** decryption.

### What does "transparent performance" mean?

Only what gets modified is decrypted. Pure-asset flows (video, CDN images) are
*spliced* as soon as the TLS ClientHello is seen (`tls_splice`, #649): the workers do not
forge or decrypt anything that has no L7 value. Line rate, near-zero latency.

### What is "poison and smog injection"?

Ad-tech and tracker traffic is not just blocked: it is **poisoned**. Anti-Track
v2 (#633) returns pseudo-responses, neutralizes preloaded CDN scripts, and at the network
level applies IP-drop + DNS-refuse. The advertising profile comes out polluted, not empty,
and from the tracker's side it is indistinguishable from a genuine block.

### The anti-adware banner: what does it block?

It is a transparency banner injected into the page: number of trackers seen/blocked,
cross-site actors recognized. It is CSP-immune and SPA-aware (#636/#639, webext #655).
The banner is only the display; the actual blocking comes from the Vortex DNS blocklists
and the nft blacklist.

---

## Tor mode (plan #683)

### What does Tor mode do?

A 🧅 switch on kbin: one tap and your browsing exits **through the Tor network** instead of
the box's WAN. Anonymous exit IP, masked network identity: "pseudo-network surfing".

### Does kbin stop inspecting/protecting me in Tor mode?

No. Tor sits **after** the MITM forging core, on the upstream transport (SOCKS5
dialer). You keep poison/smog, the banner and safe browsing; **only the exit IP
and the network identity change**.

### And if Tor goes down, does traffic fall back to the clear?

**Never.** The design is **fail-closed**: if Tor is unavailable, traffic is
cut, not sent back over clearnet. Anonymity is an invariant, not a best effort.

### Are there DNS leaks?

No. When Tor mode is active, name resolution goes **through Tor**, not through the local Unbound.

### Is this the same thing as `secubox-exposure`?

No, it is the opposite direction. `secubox-exposure` publishes Tor **hidden services**
(inbound: exposing an internal service). The kbin Tor endpoint sends your **browsing** out
through Tor (outbound). The Tor control plumbing (bootstrap, NEWNYM/new identity) is shared
between the two.

### How do I change my exit IP?

The "new identity" button (NEWNYM) → new Tor circuit → new exit IP, on the
fly, without reconnecting.

### Is it enabled by default?

No. It is **opt-in per client** (WG-hash scoped), **OFF by default**, and it respects your
R consent level. Every on/off toggle is logged (immutable CSPN audit log).

---

## See also

- [Kbin-Toolbox](wiki/Kbin-Toolbox.md): the full use-case page
- [Tor mode spec](superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md)
- [Anti-Track](wiki/Anti-Track.md): blocks/poisons/anonymizes (DNS/IP layer)

---

*CyberMind — Gérald Kerma · LicenseRef-CMSD-1.0*

@@ -0,0 +1,99 @@

# Design — kbin Tor endpoint: quick-switch anonymized web surfing

*Spec · 2026-06-19 · issue [#683](https://github.com/CyberMind-FR/secubox-deb/issues/683) · status: PLAN (no code yet)*

## Problem

kbin (the public ToolBoX portal, first tool of the Swiss-army cyber kit) already provides
transparent performance, full-MITM inspection, ad poison/smog, the adware-ban banner and safe
browsing. The **egress is still clearnet**: a kbin session exits via the board's WAN with the
real IP. The capstone is **anonymity of the exit**: a quick switch that re-routes a
consenting client's surfing through **Tor** (outbound), turning kbin into a pseudo-network
surfing booth.

This is the **opposite direction** of `secubox-exposure` (which publishes inbound Tor
hidden services). We reuse its Tor control plumbing (bootstrap, NEWNYM), but for egress.

## Invariants (non-negotiable)

1. **Inspection preserved** — Tor sits *after* the MITM forging core, on the upstream
   transport (SOCKS5 dialer). Poison/smog, banner and safe browsing stay; only the **exit
   IP and network identity** change.
2. **Fail-closed** — if Tor is down or not bootstrapped, traffic is dropped and never falls
   back to clearnet. Anonymity is an invariant, not best-effort.
3. **No DNS leak** — when Tor mode is on, resolution goes through Tor, not local Unbound.
4. **Opt-in, default OFF** — per client (WG-hash scoped), honoring the existing R consent
   level. No silent global toggle.
5. **CSPN** — every Tor on/off decision is written to the immutable audit log; no plaintext
   exit; TLS 1.3 floor unchanged.

## Two transport options (decide first)

| Option | Mechanism | Pros | Cons |
|--------|-----------|------|------|
| **A — SOCKS5 upstream dialer** (preferred) | The Go forging core (#662) dials upstream via Tor's SOCKS5 port (`127.0.0.1:9050`) when the client is Tor-flagged. | Clean integration with #662; per-flow choice; cert verification + uTLS preserved; DNS-over-Tor is native (SOCKS5 remote resolve). | Requires the Go core to land first (#662 dependency). |
| **B — nft mark → Tor TransPort** | A per-client nft mark routes 80/443 to Tor's `TransPort`/`DNSPort`; transparent at L3. | Engine-agnostic; works without #662. | Bypasses the forging core unless chained carefully → risk of losing inspection (violates invariant 1). |

**Recommendation:** Option A, gated on the #662 Go core. Option B only as a pre-#662 fallback,
and only if the mark routes traffic *through* the MITM TPROXY first, then to Tor.

## Components

- **Tor daemon** — `tor.service`, SOCKS5 on `9050` + control port (cookie auth). Reuse the
  `secubox-exposure` bootstrap; ensure an egress-only config (no relay, no hidden service in
  this profile).
- **toolbox API** — `POST /admin/tor/{on,off}` (per client, kbin-gated for bulk),
  `GET /tor/state` (bootstrapped? exit country? client flag?), `POST /tor/newnym`.
- **Go forging core (#662)** — upstream dialer switch: Tor-flagged client → SOCKS5 dialer
  (remote DNS) instead of a direct dial. uTLS Chrome FP + manual cert verification unchanged.
- **State store** — per-client `tor_enabled` (WG-hash scoped, TTL-bound) in the toolbox
  SQLite database (`clients` table extension or a small `tor_flags` table).
- **nft leak-guard** — when a client is Tor-flagged, a guard rule ensures no 80/443 path
  reaches the WAN except via the Tor dialer (defense-in-depth for invariants 2 and 3).
- **kbin UI** — 🧅 toggle + state badge (bootstrapping / on / exit-country flag) + "new
  identity" button; respects the R level (greyed out at R0).

## UX

```
[kbin page] ── tap 🧅 ──▶ POST /admin/tor/on (this client)
      ▼
Tor bootstrapped? ──no──▶ "Tor démarre…" ("Tor starting…"; spinner, fail-closed until ready)
      │yes
      ▼
flag client tor_enabled (WG-hash, TTL 24h) + audit-log
      ▼
forging core dials upstream via SOCKS5 → exit IP changes
      ▼
badge: 🧅 ON · 🌍 <exit-country flag> [Nouvelle identité] ("New identity")
```

## Open questions (resolve next session)

- Per-flow vs per-session Tor? (start per-session/per-client; per-flow later)
- Exit-country selection (`ExitNodes {cc}`): exposed to the user, or automatic?
- Latency expectation messaging — Tor is slower; the perf banner must set expectations.
- Interaction with `tls_splice` (#649): splice = direct fast path; in Tor mode, splice
  must be disabled or also routed through Tor (otherwise asset flows leak the real IP).
  **Likely: Tor mode forces splice OFF for that client.**
- Interaction with Anti-Track v2 IP-drop/DNS-refuse: ordering vs Tor resolution.
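
If the likely resolution holds, the splice decision needs the client's Tor flag as an input. Here is a sketch, assuming the splice set is a list of domains matched against the SNI on a label boundary. The real `tls_splice` matching rules are not shown in this document, and `spliceAllowed` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// spliceAllowed decides whether a flow may take the tls_splice fast path
// (bytes relayed untouched on a direct dial) instead of the forging core.
// In Tor mode the answer is always no: the spliced path would reach the WAN
// directly and expose the real IP.
func spliceAllowed(sni string, torEnabled bool, spliceDomains []string) bool {
	if torEnabled {
		return false
	}
	sni = strings.ToLower(strings.TrimSuffix(sni, "."))
	for _, d := range spliceDomains {
		// Match the domain itself or any subdomain, on a label boundary.
		if sni == d || strings.HasSuffix(sni, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	assets := []string{"cdn.example"}
	fmt.Println(spliceAllowed("img.cdn.example", false, assets)) // true
	fmt.Println(spliceAllowed("img.cdn.example", true, assets))  // false: Tor mode
	fmt.Println(spliceAllowed("evilcdn.example", false, assets)) // false: no label match
}
```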

## Dependencies & sequencing

1. **#662 Go core** lands the upstream dialer abstraction → enables Option A.
2. Tor egress profile in `secubox-exposure` (or a dedicated `tor-egress` unit).
3. toolbox API + state + UI.
4. nft leak-guard + DNS-over-Tor verification (leak test: compare exit IP + DNS resolver).
5. CSPN audit-log wiring + DARK soak (flag exists, UI hidden) → flip.

## Test plan (sketch)

- Leak test: with Tor mode on, `check.torproject.org` confirms Tor; the DNS resolver is not the
  local Unbound; the real WAN IP is never observed upstream.
- Fail-closed test: stop `tor.service` mid-session → traffic drops, no clearnet egress.
- Inspection test: ad-block + banner + poison still fire while on Tor.
- NEWNYM test: exit IP changes after "new identity".

---

*CyberMind — Gérald Kerma · LicenseRef-CMSD-1.0*

docs/wiki/Kbin-Toolbox.md (Normal file, 94 lines changed)

@@ -0,0 +1,94 @@

# kbin: ToolBoX, the first tool of the cyber Swiss-army knife

**CyberMind · Gondwana · Notre-Dame-du-Cruet · Savoie** | [Home](Home) | [Anti-Track](Anti-Track) | [Modules](Modules)

> **kbin** (`kbin.gk2.secubox.in`) is the public portal of the SecuBox **ToolBoX**:
> the *digital phone booth*. It is the **first tool of the modular cyber Swiss-army
> knife** from [cybermind.fr](https://cybermind.fr): you connect, you browse, and the
> blade inspects, cleans and protects the traffic transparently.

---

## The concept in one sentence

> **Plug in and browse normally: kbin makes your session fast, encrypted, ad-free
> and, soon, anonymous.**

kbin is the public face of the [`secubox-toolbox`](../../packages/secubox-toolbox/) module.
The client joins the open AP and gives consent (R1 passive / R2 TLS break), and all of its
traffic then goes through the SecuBox MITM forging pipeline, with no configuration and no
mandatory app.

---

## The 5 blades already sharpened

| 🗡️ Blade | What it does | Implementation |
|---------|-----------------|----------------|
| **⚡ Transparent performance** | Line rate, near-zero latency; only what gets modified is decrypted (selective SNI splice of pure-asset flows). | `tls_splice` addon (#649), R3 workers |
| **🔒 Fully encrypted** | Complete MITM inspection of outbound HTTPS: per-host cert forging, verified certificate chain, Chrome fingerprint (uTLS) on the upstream side. | Go forging core (#662), uTLS HelloChrome |
| **☠️ Poison & smog injection** | Ad-tech / tracker traffic enters the inspection chamber and comes out poisoned and fogged: pseudo-responses, neutralized scripts, IP-drop + DNS-refuse. | Anti-Track v2 (#633), `privacy_guard`, ad-ghoster |
| **🚫 Anti-adware banner** | Transparency banner injected into the page ("you were tracked / X trackers blocked"), CSP-immune, SPA-aware. | banner saga (#636/#639), webext (#655) |
| **🛡️ Safe browsing** | Vortex DNS blocklists, nft blacklist (CrowdSec + threat intel), passive anti-bot/challenge detection. | Phase 13 enforcement plane, Vortex Unbound |

---

## The next blade: 🧅 Tor quick-switch (plan #683)

This is the **missing tip**: anonymity of the exit.

Today kbin sees, cleans and protects, but traffic still exits through the box's WAN,
with the real IP. The **Tor endpoint** adds a switch:

> **One tap on kbin → 🧅 "Tor mode"** → the client's browsing exits **through the Tor network**
> instead of the WAN. Pseudo-network, anonymous exit IP, masked network identity.

Design invariants (see the
[spec](../superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md)):

- **Inspection stays intact**: Tor sits *after* the MITM forging core, on the
  upstream transport (SOCKS5 dialer). Poison/smog, banner and safe browsing are kept;
  only **the exit IP and the network identity** change.
- **Opt-in per client** (WG-hash scoped), **default OFF**, respects the R consent level.
- **Fail-closed**: if Tor goes down, there is **no** clearnet fallback (anonymity is an
  invariant, not a best effort).
- **No DNS leak**: resolution goes through Tor while the mode is active, not through the
  local Unbound.
- **CSPN**: every Tor on/off toggle is logged (immutable audit log); no cleartext
  exit.

### Use cases

1. **VILLAGE3B booth**: a visitor wants to look up a sensitive site (health, legal,
   press) from the public kiosk without revealing the box's IP. Tap 🧅 → anonymous browsing.
2. **Pseudo-network surfing**: browse as if from another country or another network
   identity, for the length of an ephemeral 24h session.
3. **Circuit renewal**: a "new identity" button (NEWNYM) changes the exit IP
   on the fly.

> **Opposite** direction from `secubox-exposure`: that module publishes Tor *hidden services*
> (inbound); the kbin Tor endpoint sends client browsing out *through* Tor (outbound).

---

## Where it lives

| Element | Location |
|---------|-------------|
| Public portal | `kbin.gk2.secubox.in` → HAProxy → `toolbox_landing` → `10.99.0.1:8088` |
| Operator dashboard | `admin.gk2.secubox.in/toolbox/` |
| Personal map view | `kbin.gk2.secubox.in/social/me` |
| Module | [`packages/secubox-toolbox/`](../../packages/secubox-toolbox/) |
| Tor channel (reused) | [`packages/secubox-exposure/`](../../packages/secubox-exposure/) |

---

## See also

- [Anti-Track](Anti-Track): block/poison/anonymize engine (DNS/IP layer)
- [kbin & Tor FAQ](../FAQ-KBIN-TOR.md)
- Punk Exposure Engine: Tor channel, doctrine in `CLAUDE.md`
- Epic [#662](https://github.com/CyberMind-FR/secubox-deb/issues/662): MITM core migration (Go)
- Plan [#683](https://github.com/CyberMind-FR/secubox-deb/issues/683): kbin Tor endpoint

---

*CyberMind — Gérald Kerma · LicenseRef-CMSD-1.0*

packages/secubox-toolbox-ng/cmd/sbxmitm/adcand_test.go (Normal file, 147 lines changed)

@@ -0,0 +1,147 @@

// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: ad-candidate learning-feed tests (#662)
//
// The Go cutover blocked from STATIC lists but never emitted LEARNING
// candidates, so a brand-new adware (acotedemoi.com) was never observed → never
// promoted → slipped through forever. These tests prove the engine now ports
// ad_ghost's _AD_PATH heuristic and records a candidate (host,site) for every
// 3rd-party ad-path request on the allow/mitm path — the feed autolearn promotes.
package main

import (
	"path/filepath"
	"testing"
)

func TestAdPathRegex(t *testing.T) {
	hit := []string{
		"/ad/1.gif", "/ads/x", "/adserver/req", "/pagead/conversion",
		"/gampad/ads", "/doubleclick/x", "/beacon", "/pixel.gif",
		"/collect", "/track", "/tracking/p", "/telemetry/v2", "/metric",
		"/PAGEAD/Upper", // case-insensitive
	}
	for _, p := range hit {
		if !adPathRE.MatchString(p) {
			t.Errorf("adPathRE should MATCH %q", p)
		}
	}
	miss := []string{"/", "/index.html", "/api/users", "/static/app.js", "/cart", "/headline"}
	for _, p := range miss {
		if adPathRE.MatchString(p) {
			t.Errorf("adPathRE should NOT match %q", p)
		}
	}
}

// newAdCandTestPolicy builds a Policy with doubleclick.net allowlisted (so the
// allowlist-skip branch is exercised) and nothing learned.
func newAdCandTestPolicy(t *testing.T) *Policy {
	t.Helper()
	pol, err := LoadPolicy(PolicyOpts{
		AllowPath:        writeTemp(t, "doubleclick.net\n"),
		LearnedPath:      writeTemp(t, ""),
		SpliceSeedPath:   writeTemp(t, ""),
		SpliceLearnPath:  writeTemp(t, ""),
		PureTrackersPath: writeTemp(t, ""),
		SelfDomains:      []string{"secubox.in"},
	})
	if err != nil {
		t.Fatalf("LoadPolicy: %v", err)
	}
	return pol
}

func TestMaybeRecordAdCandidate(t *testing.T) {
	pol := newAdCandTestPolicy(t)

	cases := []struct {
		name   string
		host   string // request host
		site   string // referer site (registrable)
		path   string
		want   bool // candidate recorded?
		wantHK string
	}{
		{"3rd-party ad-path → candidate", "metrics.acotedemoi.com", "lemonde.fr", "/collect", true, "metrics.acotedemoi.com"},
		{"3rd-party ad-path /pagead", "ads.foo.io", "news.example", "/pagead/x", true, "ads.foo.io"},
		{"1st-party (same registrable) → no candidate", "static.lemonde.fr", "lemonde.fr", "/ads/x", false, ""},
		{"3rd-party non-ad-path → no candidate", "cdn.acotedemoi.com", "lemonde.fr", "/app.js", false, ""},
		{"no site (no Referer) → no candidate", "metrics.acotedemoi.com", "", "/collect", false, ""},
		{"allowlisted host → no candidate", "ads.doubleclick.net", "lemonde.fr", "/pagead/x", false, ""},
	}

	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			cand := newAdCandidates()
			px := &Proxy{pol: pol, cand: cand, analysisRelay: true}
			px.maybeRecordAdCandidate(tc.host, tc.site, tc.path)
			snap := cand.snapshot()
			if tc.want {
				if len(snap) != 1 {
					t.Fatalf("want 1 candidate, got %d (%+v)", len(snap), snap)
				}
				if snap[0].Host != tc.wantHK {
					t.Fatalf("candidate host = %q, want %q", snap[0].Host, tc.wantHK)
				}
				if snap[0].Site != tc.site {
					t.Fatalf("candidate site = %q, want %q", snap[0].Site, tc.site)
				}
				if snap[0].Hits != 1 {
					t.Fatalf("candidate hits = %d, want 1", snap[0].Hits)
				}
			} else if len(snap) != 0 {
				t.Fatalf("want 0 candidates, got %d (%+v)", len(snap), snap)
			}
		})
	}
}

// TestAdCandidateGatedByRelay proves the feed is gated behind the analysis/ad
// relay flag: with the gate off, nothing is recorded even on a textbook hit.
func TestAdCandidateGatedByRelay(t *testing.T) {
	pol := newAdCandTestPolicy(t)
	cand := newAdCandidates()
	px := &Proxy{pol: pol, cand: cand, analysisRelay: false}
	px.maybeRecordAdCandidate("metrics.acotedemoi.com", "lemonde.fr", "/collect")
	if n := len(cand.snapshot()); n != 0 {
		t.Fatalf("relay off: want 0 candidates, got %d", n)
	}
}

// TestAdCandidateHitsAccumulate proves repeated (host,site) hits coalesce.
func TestAdCandidateHitsAccumulate(t *testing.T) {
	cand := newAdCandidates()
	for i := 0; i < 5; i++ {
		cand.record("x.tracker.io", "site.example")
	}
	snap := cand.snapshot()
	if len(snap) != 1 || snap[0].Hits != 5 {
		t.Fatalf("want 1 row hits=5, got %+v", snap)
	}
	// snapshot clears.
	if n := len(cand.snapshot()); n != 0 {
		t.Fatalf("snapshot should clear: got %d", n)
	}
}

// TestAdCandidatePayloadShape proves the candidates list is carried by the
// extended ad-event payload: a payload holding only candidate rows must not
// report empty(). The serialised host/site/hits keys are not checked here.
func TestAdCandidatePayloadShape(t *testing.T) {
	cand := newAdCandidates()
	cand.record("x.tracker.io", "site.example")
	rows := cand.snapshot()
	p := adEventPayload{Candidates: rows}
	if p.empty() {
		t.Fatal("payload with candidates must not be empty()")
	}
}

// writeTemp writes content to a fresh temp file and returns its path.
func writeTemp(t *testing.T, content string) string {
	t.Helper()
	f := filepath.Join(t.TempDir(), "list.txt")
	writeFile(t, f, content)
	return f
}
|
||||||
|
|
@@ -26,10 +26,74 @@ import (
|
||||||
"log"
|
"log"
|
||||||
"net/http"
|
"net/http"
|
||||||
"net/url"
|
"net/url"
|
||||||
|
"regexp"
|
||||||
"sync"
|
"sync"
|
||||||
"time"
|
"time"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
// ── ad-candidate learning feed (#662 auto-learn loop) ─────────────────────────
|
||||||
|
//
|
||||||
|
// The STATIC block list never grows on its own; ad_ghost fed autolearn by
|
||||||
|
// capturing CANDIDATES — 3rd-party requests whose PATH smells like an ad/track
|
||||||
|
// endpoint — into ad_candidates, which secubox-toolbox-autolearn later promotes
|
||||||
|
// into learned-trackers.txt at AD_MIN_SITES distinct sites. The Go cutover
|
||||||
|
// dropped this feed, so new adware hosts (acotedemoi.com) were never observed. This
|
||||||
|
// restores it in the engine: the allow/mitm hot path records (host,site) when
|
||||||
|
// the request is 3rd-party AND adPathRE matches, buffered + flushed with the
|
||||||
|
// existing ad-event machinery.
|
||||||
|
|
||||||
|
// adPathRE ports ad_ghost._AD_PATH (RE2-safe, case-insensitive). Matches a path
|
||||||
|
// that looks like an ad/track endpoint. Learning only — never a block decision.
|
||||||
|
//
|
||||||
|
// Python: re.compile(r"/ads?/|/adserver|/pagead|/gampad|/doubleclick|/beacon|"
|
||||||
|
// r"/pixel|/collect|/track(ing)?|/telemetry|/metric", re.I)
|
||||||
|
var adPathRE = regexp.MustCompile(`(?i)/ads?/|/adserver|/pagead|/gampad|/doubleclick|/beacon|/pixel|/collect|/track(ing)?|/telemetry|/metric`)
|
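The regex has no `^`/`$` anchors and no segment boundaries, so each alternative matches anywhere in the path, including as a prefix of a longer segment. The stand-alone check below copies the pattern verbatim from the diff (`isAdPath` is a hypothetical wrapper) to show how far the learning-only heuristic reaches: `/trackback` and `/metrics-dashboard` qualify, while `/ads` with no trailing slash and `/adsense` do not.

```go
package main

import "regexp"

// adPathRE is copied verbatim from the diff (learning-only heuristic).
var adPathRE = regexp.MustCompile(`(?i)/ads?/|/adserver|/pagead|/gampad|/doubleclick|/beacon|/pixel|/collect|/track(ing)?|/telemetry|/metric`)

// isAdPath is a hypothetical wrapper: true when any alternative matches
// anywhere in the path (nothing forces a match to end at a '/' boundary).
func isAdPath(path string) bool { return adPathRE.MatchString(path) }
```

Since the result only feeds candidates to autolearn (which promotes at `AD_MIN_SITES` distinct sites), over-matching costs some noise, never a block.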
||||||
|
|
||||||
|
// adCandMapCap bounds the candidate buffer (mirrors ad_ghost's `len(_cand) <
|
||||||
|
// 20000` guard): NEW keys past the cap are dropped until the next flush clears
|
||||||
|
// it, so a dead portal can never grow memory unbounded.
|
||||||
|
const adCandMapCap = 20000
|
||||||
|
|
||||||
|
// adCandidates is the lock-guarded (host,site)→hits candidate aggregator,
|
||||||
|
// drained by the ad-stats flusher into the ad-event payload's "candidates" list.
|
||||||
|
type adCandidates struct {
|
||||||
|
mu sync.Mutex
|
||||||
|
hit map[adKey]int64
|
||||||
|
}
|
||||||
|
|
||||||
|
func newAdCandidates() *adCandidates { return &adCandidates{hit: map[adKey]int64{}} }
|
||||||
|
|
||||||
|
// record tallies one ad-candidate (host,site). O(1); the cap drops only NEW keys
|
||||||
|
// (existing keys keep accumulating). Empty host is ignored.
|
||||||
|
func (a *adCandidates) record(host, site string) {
|
||||||
|
if host == "" {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
a.mu.Lock()
|
||||||
|
defer a.mu.Unlock()
|
||||||
|
k := adKey{adHost: host, site: site}
|
||||||
|
if _, ok := a.hit[k]; ok {
|
||||||
|
a.hit[k]++
|
||||||
|
} else if len(a.hit) < adCandMapCap {
|
||||||
|
a.hit[k] = 1
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// snapshot atomically reads-and-clears the buffer, returning the candidate rows.
|
||||||
|
func (a *adCandidates) snapshot() []adCandidateRow {
|
||||||
|
a.mu.Lock()
|
||||||
|
defer a.mu.Unlock()
|
||||||
|
if len(a.hit) == 0 {
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
rows := make([]adCandidateRow, 0, len(a.hit))
|
||||||
|
for k, n := range a.hit {
|
||||||
|
rows = append(rows, adCandidateRow{Host: k.adHost, Site: k.site, Hits: n})
|
||||||
|
}
|
||||||
|
a.hit = map[adKey]int64{}
|
||||||
|
return rows
|
||||||
|
}
|
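`record` and `snapshot` together form a bounded, read-and-clear counter. The stand-alone sketch below re-creates that behaviour with a capacity parameter (instead of the fixed `adCandMapCap = 20000`) and local `key`/`row` types standing in for `adKey`/`adCandidateRow`. A tiny cap makes the rule visible: once the map is full, existing keys keep accumulating but unseen keys are dropped until the next snapshot empties the map.

```go
package main

import "sync"

// key and row are local stand-ins for the diff's adKey / adCandidateRow.
type key struct{ host, site string }

type row struct {
	Host, Site string
	Hits       int64
}

// candidates is a bounded (host,site)->hits counter guarded by a mutex.
type candidates struct {
	mu       sync.Mutex
	capacity int
	hit      map[key]int64
}

func newCandidates(capacity int) *candidates {
	return &candidates{capacity: capacity, hit: map[key]int64{}}
}

// record bumps an existing key, admits a new key only while under capacity,
// and ignores an empty host.
func (c *candidates) record(host, site string) {
	if host == "" {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	k := key{host, site}
	if _, ok := c.hit[k]; ok {
		c.hit[k]++
	} else if len(c.hit) < c.capacity {
		c.hit[k] = 1
	}
}

// snapshot returns the rows (in map order, i.e. unsorted) and resets the map.
func (c *candidates) snapshot() []row {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.hit) == 0 {
		return nil
	}
	rows := make([]row, 0, len(c.hit))
	for k, n := range c.hit {
		rows = append(rows, row{Host: k.host, Site: k.site, Hits: n})
	}
	c.hit = map[key]int64{}
	return rows
}
```

Resetting with a fresh map (rather than deleting keys) also releases the old map's buckets, so a burst that filled the cap does not pin memory after the flush.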
||||||
|
|
||||||
// refererSite ports the ad_ghost _site_of logic: parse the Referer header as a
|
// refererSite ports the ad_ghost _site_of logic: parse the Referer header as a
|
||||||
// URL, take its hostname, and return registrable(hostname). Empty Referer or a
|
// URL, take its hostname, and return registrable(hostname). Empty Referer or a
|
||||||
// parse failure → "" (the page that issued the blocked request is unknown).
|
// parse failure → "" (the page that issued the blocked request is unknown).
|
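The header comment and `refererSite` describe a two-part decision: derive the 1st-party site from the Referer, then record the request host only when the relay gate is on, the host is 3rd-party, and the path looks like an ad endpoint. The sketch below approximates that flow under explicit assumptions: `registrableNaive` keeps the last two DNS labels, which is wrong for multi-label suffixes such as `bbc.co.uk` (the engine's `registrable` is not shown and presumably uses a public-suffix list); treating an unknown site as "do not record" is an assumption; and `siteOf`/`shouldRecord` are hypothetical names.

```go
package main

import (
	"net/url"
	"regexp"
	"strings"
)

var adPath = regexp.MustCompile(`(?i)/ads?/|/adserver|/pagead|/gampad|/doubleclick|/beacon|/pixel|/collect|/track(ing)?|/telemetry|/metric`)

// registrableNaive keeps the last two labels of host. NAIVE: a real
// implementation should consult the Public Suffix List.
func registrableNaive(host string) string {
	host = strings.TrimSuffix(strings.ToLower(host), ".")
	parts := strings.Split(host, ".")
	if len(parts) <= 2 {
		return host
	}
	return strings.Join(parts[len(parts)-2:], ".")
}

// siteOf follows the described refererSite flow: parse the Referer, take its
// hostname, reduce it to a registrable domain; "" when unknown or unparsable.
func siteOf(referer string) string {
	if referer == "" {
		return ""
	}
	u, err := url.Parse(referer)
	if err != nil || u.Hostname() == "" {
		return ""
	}
	return registrableNaive(u.Hostname())
}

// shouldRecord is a hypothetical candidate gate: relay on, known site,
// 3rd-party host, ad-looking path.
func shouldRecord(relay bool, host, site, path string) bool {
	if !relay || host == "" || site == "" {
		return false
	}
	if registrableNaive(host) == site {
		return false // 1st-party request
	}
	return adPath.MatchString(path)
}
```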
||||||
|
|
@@ -133,9 +197,19 @@ type adClientRow struct {
|
||||||
Bytes int64 `json:"bytes"`
|
Bytes int64 `json:"bytes"`
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// adCandidateRow is one learning candidate (host seen issuing ad-path requests
|
||||||
|
// from a 1st-party site). Mirrors the portal /__toolbox/ad-event "candidates"
|
||||||
|
// contract → store.record_ad_candidates([(host, site, hits), ...]).
|
||||||
|
type adCandidateRow struct {
|
||||||
|
Host string `json:"host"`
|
||||||
|
Site string `json:"site"`
|
||||||
|
Hits int64 `json:"hits"`
|
||||||
|
}
|
||||||
|
|
||||||
type adEventPayload struct {
|
type adEventPayload struct {
|
||||||
Blocks []adBlockRow `json:"blocks"`
|
Blocks []adBlockRow `json:"blocks"`
|
||||||
Clients []adClientRow `json:"clients"`
|
Clients []adClientRow `json:"clients"`
|
||||||
|
Candidates []adCandidateRow `json:"candidates,omitempty"`
|
||||||
}
|
}
|
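`Candidates` is tagged `omitempty` while `Blocks` and `Clients` are not. As a result, a flush without candidates keeps the pre-#662 JSON shape, and a nil `Blocks`/`Clients` slice would serialise as `null` rather than `[]`. A trimmed stand-in shows both effects with `encoding/json`; it assumes `Blocks` can be reduced to `[]string`, since the real `adBlockRow`/`adClientRow` fields are not shown here.

```go
package main

import "encoding/json"

type candidateRow struct {
	Host string `json:"host"`
	Site string `json:"site"`
	Hits int64  `json:"hits"`
}

// payload is a trimmed stand-in for adEventPayload.
type payload struct {
	Blocks     []string       `json:"blocks"`
	Candidates []candidateRow `json:"candidates,omitempty"`
}

// encode marshals p, returning "" on error (not expected for these types).
func encode(p payload) string {
	b, err := json.Marshal(p)
	if err != nil {
		return ""
	}
	return string(b)
}
```

`omitempty` drops the key for any zero-length slice (nil or empty), so the portal only sees `"candidates"` when at least one row was drained.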
||||||
|
|
||||||
// snapshot atomically reads-and-clears both maps, returning the accumulated rows.
|
// snapshot atomically reads-and-clears both maps, returning the accumulated rows.
|
||||||
|
|
@@ -159,7 +233,9 @@ func (a *adStats) snapshot() adEventPayload {
|
||||||
}
|
}
|
||||||
|
|
||||||
// empty reports whether a payload carries no rows (nothing to POST).
|
// empty reports whether a payload carries no rows (nothing to POST).
|
||||||
func (p adEventPayload) empty() bool { return len(p.Blocks) == 0 && len(p.Clients) == 0 }
|
func (p adEventPayload) empty() bool {
|
||||||
|
return len(p.Blocks) == 0 && len(p.Clients) == 0 && len(p.Candidates) == 0
|
||||||
|
}
|
||||||
|
|
||||||
// adEventClient is a short-timeout fire-and-forget client for the ad-event POST.
|
// adEventClient is a short-timeout fire-and-forget client for the ad-event POST.
|
||||||
// Sibling of portalClient (banner.go): the portal is a fixed loopback base, so
|
// Sibling of portalClient (banner.go): the portal is a fixed loopback base, so
|
||||||
|
|
@@ -175,8 +251,15 @@ var adEventClient = &http.Client{
|
||||||
// non-2xx) is swallowed with at most a debug log — the metrics are stats, not
|
// non-2xx) is swallowed with at most a debug log — the metrics are stats, not
|
||||||
// security, and the engine must never block on the portal. Exposed (returns the
|
// security, and the engine must never block on the portal. Exposed (returns the
|
||||||
// flushed payload) so the test can assert the snapshot/clear + payload shape.
|
// flushed payload) so the test can assert the snapshot/clear + payload shape.
|
||||||
func (a *adStats) flushOnce(portal string) adEventPayload {
|
//
|
||||||
|
// cand may be nil (the CONNECT PoC / tests with no learning feed); when set its
|
||||||
|
// candidate rows are drained into the SAME payload so the learning feed rides
|
||||||
|
// the existing ad-event channel (one POST per 10s, not two).
|
||||||
|
func (a *adStats) flushOnce(portal string, cand *adCandidates) adEventPayload {
|
||||||
p := a.snapshot()
|
p := a.snapshot()
|
||||||
|
if cand != nil {
|
||||||
|
p.Candidates = cand.snapshot()
|
||||||
|
}
|
||||||
if p.empty() {
|
if p.empty() {
|
||||||
return p
|
return p
|
||||||
}
|
}
|
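The `empty()` change and the new `cand` argument work as a pair. If `empty()` had kept the old two-field check, a tick that saw only candidates would drain them and then return before the POST, silently dropping the rows. The transport-free sketch below makes that flow testable. The `post` callback is a hypothetical seam standing in for the `adEventClient` POST, and `drain` stands in for `cand.snapshot` (nil meaning no learning feed). As in the diff, a post error is ignored.

```go
package main

import "encoding/json"

type row struct {
	Host string `json:"host"`
	Site string `json:"site"`
	Hits int64  `json:"hits"`
}

type payload struct {
	Blocks     []string `json:"blocks"`
	Candidates []row    `json:"candidates,omitempty"`
}

// empty must count every row kind, or a candidates-only tick is never sent.
func (p payload) empty() bool { return len(p.Blocks) == 0 && len(p.Candidates) == 0 }

// flushOnce merges optional candidate rows into the same payload and makes at
// most one post call; a post error is ignored (stats, not security).
func flushOnce(blocks []string, drain func() []row, post func([]byte) error) payload {
	p := payload{Blocks: blocks}
	if drain != nil {
		p.Candidates = drain()
	}
	if p.empty() {
		return p
	}
	if body, err := json.Marshal(p); err == nil {
		_ = post(body)
	}
	return p
}
```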
||||||
|
|
@@ -198,10 +281,10 @@ func (a *adStats) flushOnce(portal string) adEventPayload {
|
||||||
// runAdStatsFlusher is the background flusher goroutine: every adFlushInterval it
|
// runAdStatsFlusher is the background flusher goroutine: every adFlushInterval it
|
||||||
// drains the aggregator to the portal. Start it once from main() (like the
|
// drains the aggregator to the portal. Start it once from main() (like the
|
||||||
// engine's other startup goroutines). It runs forever (the process lifetime).
|
// engine's other startup goroutines). It runs forever (the process lifetime).
|
||||||
func (a *adStats) runAdStatsFlusher(portal string) {
|
func (a *adStats) runAdStatsFlusher(portal string, cand *adCandidates) {
|
||||||
t := time.NewTicker(adFlushInterval)
|
t := time.NewTicker(adFlushInterval)
|
||||||
defer t.Stop()
|
defer t.Stop()
|
||||||
for range t.C {
|
for range t.C {
|
||||||
a.flushOnce(portal)
|
a.flushOnce(portal, cand)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@@ -41,7 +41,7 @@ func TestRecordAdBlockEmptyHostIgnored(t *testing.T) {
|
||||||
|
|
||||||
func TestRecordAdBlockPerClientOnlyWhenMacSet(t *testing.T) {
|
func TestRecordAdBlockPerClientOnlyWhenMacSet(t *testing.T) {
|
||||||
a := newAdStats()
|
a := newAdStats()
|
||||||
a.recordAdBlock("ads.example.com", "site", "") // no mac → no client row
|
a.recordAdBlock("ads.example.com", "site", "") // no mac → no client row
|
||||||
a.recordAdBlock("ads.example.com", "site", "mac1") // mac → client row
|
a.recordAdBlock("ads.example.com", "site", "mac1") // mac → client row
|
||||||
a.recordAdBlock("ads.example.com", "site", "mac1")
|
a.recordAdBlock("ads.example.com", "site", "mac1")
|
||||||
|
|
||||||
|
|
@@ -111,7 +111,7 @@ func TestFlushOncePayloadShapeMatchesContract(t *testing.T) {
|
||||||
}))
|
}))
|
||||||
defer srv.Close()
|
defer srv.Close()
|
||||||
|
|
||||||
a.flushOnce(srv.URL)
|
a.flushOnce(srv.URL, nil)
|
||||||
|
|
||||||
if ct != "application/json" {
|
if ct != "application/json" {
|
||||||
t.Fatalf("Content-Type = %q, want application/json", ct)
|
t.Fatalf("Content-Type = %q, want application/json", ct)
|
||||||
|
|
@@ -145,7 +145,7 @@ func TestFlushOnceEmptySkipsPost(t *testing.T) {
|
||||||
w.WriteHeader(http.StatusNoContent)
|
w.WriteHeader(http.StatusNoContent)
|
||||||
}))
|
}))
|
||||||
defer srv.Close()
|
defer srv.Close()
|
||||||
a.flushOnce(srv.URL)
|
a.flushOnce(srv.URL, nil)
|
||||||
if posted {
|
if posted {
|
||||||
t.Fatalf("flushOnce on empty aggregator must not POST")
|
t.Fatalf("flushOnce on empty aggregator must not POST")
|
||||||
}
|
}
|
||||||
|
|
@@ -156,7 +156,7 @@ func TestFlushOnceSwallowsPortalError(t *testing.T) {
|
||||||
a.recordAdBlock("ads.example.com", "site", "")
|
a.recordAdBlock("ads.example.com", "site", "")
|
||||||
// Unreachable portal → must not panic, must still clear the maps (snapshot
|
// Unreachable portal → must not panic, must still clear the maps (snapshot
|
||||||
// happens before the POST).
|
// happens before the POST).
|
||||||
a.flushOnce("http://127.0.0.1:1")
|
a.flushOnce("http://127.0.0.1:1", nil)
|
||||||
if len(a.blocks) != 0 {
|
if len(a.blocks) != 0 {
|
||||||
t.Fatalf("flushOnce must clear maps even on POST failure")
|
t.Fatalf("flushOnce must clear maps even on POST failure")
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@@ -24,6 +24,7 @@ import (
|
||||||
"io"
|
"io"
|
||||||
"log"
|
"log"
|
||||||
"net/http"
|
"net/http"
|
||||||
|
"net/url"
|
||||||
"strings"
|
"strings"
|
||||||
"time"
|
"time"
|
||||||
)
|
)
|
||||||
|
|
@@ -111,6 +112,94 @@ func injectLoader(body []byte, clientHash string, wg, cspBypassed bool) []byte {
|
||||||
return body
|
return body
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ── inline banner (#662, supersedes injectLoader in the live path) ──────────
|
||||||
|
//
|
||||||
|
// Sites with a SERVICE WORKER (leparisien, cnn…) intercept EVERY same-origin
|
||||||
|
// request, so the legacy <script src="/__toolbox/loader.js"> tag and the
|
||||||
|
// fetch("/__toolbox/bundle") it makes are hijacked by the page's SW (404 /
|
||||||
|
// app-shell) BEFORE they reach this engine → the banner never appears. The fix
|
||||||
|
// is to INLINE the whole banner: the engine fetches the COMPLETE script body
|
||||||
|
// from the portal server-side (once per injected HTML response) and bakes it
|
||||||
|
// into a self-contained <script>…</script> with mh/wg/csp + the bundle as JS
|
||||||
|
// literals — so there is NOTHING same-origin for the SW to hijack.
|
||||||
|
//
|
||||||
|
// injectLoader + the /__toolbox/loader.js short-circuit are KEPT (not removed)
|
||||||
|
// for compatibility, but the live inject path now uses the inline banner.
|
||||||
|
|
||||||
|
// fetchInlineBanner fetches the COMPLETE inline banner script BODY from the
|
||||||
|
// portal's /__toolbox/inline endpoint (which bakes mh/wg/csp + the bundle as JS
|
||||||
|
// literals). Returns (body, true) on a 2xx; FAIL-OPEN (returns "", false) on any
|
||||||
|
// error — portal down, timeout, non-2xx, read failure — so the caller simply
|
||||||
|
// skips the inject and serves the page intact (no banner, like today's fail-open
|
||||||
|
// when the portal asset 204s). It NEVER breaks a navigation over a banner.
|
||||||
|
//
|
||||||
|
// wg → "1" else "0"; cspBypassed → csp=1 (the 🔓 proof) else 0; clientHash is
|
||||||
|
// ascii-sanitised exactly like the data-mh attribute was.
|
||||||
|
func fetchInlineBanner(portal, clientHash string, wg, cspBypassed bool) (string, bool) {
|
||||||
|
wgVal := "0"
|
||||||
|
if wg {
|
||||||
|
wgVal = "1"
|
||||||
|
}
|
||||||
|
cspVal := "0"
|
||||||
|
if cspBypassed {
|
||||||
|
cspVal = "1"
|
||||||
|
}
|
||||||
|
q := url.Values{}
|
||||||
|
q.Set("mh", asciiOnly(clientHash))
|
||||||
|
q.Set("wg", wgVal)
|
||||||
|
q.Set("csp", cspVal)
|
||||||
|
target := strings.TrimRight(portal, "/") + "/__toolbox/inline?" + q.Encode()
|
||||||
|
resp, err := portalClient.Get(target)
|
||||||
|
if err != nil {
|
||||||
|
log.Printf("inline banner fetch failed for %s: %v", target, err)
|
||||||
|
return "", false
|
||||||
|
}
|
||||||
|
defer resp.Body.Close()
|
||||||
|
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
|
||||||
|
log.Printf("inline banner fetch non-2xx (%d) for %s", resp.StatusCode, target)
|
||||||
|
return "", false
|
||||||
|
}
|
||||||
|
body, rerr := io.ReadAll(io.LimitReader(resp.Body, 8<<20))
|
||||||
|
if rerr != nil {
|
||||||
|
log.Printf("inline banner read failed for %s: %v", target, rerr)
|
||||||
|
return "", false
|
||||||
|
}
|
||||||
|
return string(body), true
|
||||||
|
}
|
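`url.Values.Encode` sorts by key, so the portal always receives `csp`, `mh`, `wg` in that order regardless of the `Set` order, and reserved characters in a value are query-escaped. The minimal sketch below covers only the target construction. `inlineTarget` is a hypothetical name, and the engine's `asciiOnly` sanitising step is omitted because its definition is not shown.

```go
package main

import (
	"net/url"
	"strings"
)

// inlineTarget builds the /__toolbox/inline URL the way fetchInlineBanner does.
func inlineTarget(portal, mh string, wg, cspBypassed bool) string {
	flag := func(b bool) string {
		if b {
			return "1"
		}
		return "0"
	}
	q := url.Values{}
	q.Set("mh", mh)
	q.Set("wg", flag(wg))
	q.Set("csp", flag(cspBypassed))
	return strings.TrimRight(portal, "/") + "/__toolbox/inline?" + q.Encode()
}
```

The body read in `fetchInlineBanner` goes through `io.LimitReader(resp.Body, 8<<20)`, so a body larger than 8 MiB would be truncated, not rejected, and still returned with `ok=true`.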
||||||
|
|
||||||
|
// injectInlineBanner inserts a SELF-CONTAINED <script>scriptBody</script> into an
|
||||||
|
// HTML body once. It is idempotent via the SAME bannerGuard marker injectLoader
|
||||||
|
// uses (so a body already carrying either form is never double-injected), and it
|
||||||
|
// uses the SAME placement injectLoader did:
|
||||||
|
// - guard idempotency: body already contains bannerGuard → unchanged.
|
||||||
|
// - after the first (case-insensitive) "<head"'s closing '>'.
|
||||||
|
// - else right BEFORE the first "<body".
|
||||||
|
// - else return the body unchanged (no inject).
|
||||||
|
//
|
||||||
|
// scriptBody is the COMPLETE inline IIFE from fetchInlineBanner (NOT a src tag);
|
||||||
|
// an empty scriptBody is a no-op (returns the body unchanged) so a failed/skipped
|
||||||
|
// fetch is handled gracefully by the caller passing "".
|
||||||
|
func injectInlineBanner(body []byte, scriptBody string) []byte {
|
||||||
|
if scriptBody == "" {
|
||||||
|
return body
|
||||||
|
}
|
||||||
|
if bytes.Contains(body, []byte(bannerGuard)) {
|
||||||
|
return body
|
||||||
|
}
|
||||||
|
script := []byte("<!-- " + bannerGuard + " --><script>" + scriptBody + "</script>")
|
||||||
|
low := bytes.ToLower(body)
|
||||||
|
|
||||||
|
if h := bytes.Index(low, []byte("<head")); h >= 0 {
|
||||||
|
if j := bytes.IndexByte(body[h:], '>'); j >= 0 {
|
||||||
|
return spliceAt(body, script, h+j+1)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if b := bytes.Index(low, []byte("<body")); b >= 0 {
|
||||||
|
return spliceAt(body, script, b)
|
||||||
|
}
|
||||||
|
return body
|
||||||
|
}
|
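Two details of the placement logic deserve a second look. First, `bytes.Index(low, "<head")` also matches `<header`, so a document that omits `<head>` but contains a `<header>` gets the script inside `<header>` instead of before `<body>`. Second, `bytes.ToLower` does not always preserve length: each invalid UTF-8 byte (for example a Latin-1 `é`, byte 0xE9) becomes the 3-byte encoding of U+FFFD. Offsets found in `low` can therefore drift from `body`. One possible hardening is sketched below. It assumes a placeholder value for `guard` (standing in for `bannerGuard`, whose value is not shown), and `asciiLower`, `indexTag`, and `spliceAt` are hypothetical helpers.

```go
package main

import "bytes"

// guard stands in for the engine's bannerGuard marker (value assumed).
const guard = "sbx-banner"

// asciiLower lowercases A-Z only, so the result has exactly len(b) bytes and
// every index into it is also a valid index into b.
func asciiLower(b []byte) []byte {
	out := make([]byte, len(b))
	for i, c := range b {
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A'
		}
		out[i] = c
	}
	return out
}

// indexTag returns the offset of the first "<name" in low that is followed by
// '>', '/', or ASCII whitespace, so "head" does not match "<header".
func indexTag(low []byte, name string) int {
	pat := []byte("<" + name)
	off := 0
	for {
		i := bytes.Index(low[off:], pat)
		if i < 0 {
			return -1
		}
		end := off + i + len(pat)
		if end < len(low) && bytes.IndexByte([]byte(">/ \t\n\r\f"), low[end]) >= 0 {
			return off + i
		}
		off = end
	}
}

// spliceAt returns a new slice with ins inserted at offset i.
func spliceAt(body, ins []byte, i int) []byte {
	out := make([]byte, 0, len(body)+len(ins))
	out = append(out, body[:i]...)
	out = append(out, ins...)
	return append(out, body[i:]...)
}

// injectInline keeps the diff's order: guard check, after <head ...>, else
// before <body ...>, else unchanged.
func injectInline(body []byte, script string) []byte {
	if script == "" || bytes.Contains(body, []byte(guard)) {
		return body
	}
	tag := []byte("<!-- " + guard + " --><script>" + script + "</script>")
	low := asciiLower(body)
	if h := indexTag(low, "head"); h >= 0 {
		if j := bytes.IndexByte(body[h:], '>'); j >= 0 {
			return spliceAt(body, tag, h+j+1)
		}
	}
	if b := indexTag(low, "body"); b >= 0 {
		return spliceAt(body, tag, b)
	}
	return body
}
```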
||||||
|
|
||||||
// ── /__toolbox/* reverse-proxy to the portal ─────────────────────────────────
|
// ── /__toolbox/* reverse-proxy to the portal ─────────────────────────────────
|
||||||
|
|
||||||
// isToolboxAssetPath reports whether a request path is one of the banner assets
|
// isToolboxAssetPath reports whether a request path is one of the banner assets
|
||||||
|
|
|
||||||
|
|
@@ -10,10 +10,19 @@
|
||||||
package main
|
package main
|
||||||
|
|
||||||
import (
|
import (
|
||||||
|
"net/http"
|
||||||
|
"net/http/httptest"
|
||||||
"strings"
|
"strings"
|
||||||
"testing"
|
"testing"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
// inlineTestScript is a stand-in for the COMPLETE inline banner body that
|
||||||
|
// fetchInlineBanner pulls from the portal. The Go engine treats it as an opaque
|
||||||
|
// string (the JS literal-baking is the portal's job, covered by the Python
|
||||||
|
// tests); these tests only assert placement / idempotency / fail-open. Shared
|
||||||
|
// across banner_test, gzip_test, compress_test, cosmetic_test.
|
||||||
|
const inlineTestScript = `(function(){window.__SBX_LOADER__=1;})();`
|
||||||
|
|
||||||
func TestInjectLoaderGuardIdempotent(t *testing.T) {
|
func TestInjectLoaderGuardIdempotent(t *testing.T) {
|
||||||
// Body already carrying the guard → returned byte-for-byte unchanged.
|
// Body already carrying the guard → returned byte-for-byte unchanged.
|
||||||
body := []byte("<html><head><!-- " + bannerGuard + " --><script></script></head><body>hi</body></html>")
|
body := []byte("<html><head><!-- " + bannerGuard + " --><script></script></head><body>hi</body></html>")
|
||||||
|
|
@@ -130,3 +139,141 @@ func TestPortalTargetURL(t *testing.T) {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ── #662 inline banner (SW-immune; supersedes injectLoader in the live path) ──
|
||||||
|
|
||||||
|
func TestInjectInlineBannerEmptyScriptNoop(t *testing.T) {
|
||||||
|
// scriptBody == "" (fetch failed/skipped) → no inject, body unchanged.
|
||||||
|
body := []byte(`<html><head></head><body>hi</body></html>`)
|
||||||
|
out := injectInlineBanner(body, "")
|
||||||
|
if string(out) != string(body) {
|
||||||
|
t.Fatalf("empty scriptBody must be a no-op.\n got: %s", out)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestInjectInlineBannerGuardIdempotent(t *testing.T) {
|
||||||
|
// Body already carrying the guard → returned byte-for-byte unchanged.
|
||||||
|
body := []byte("<html><head><!-- " + bannerGuard + " --><script></script></head><body>hi</body></html>")
|
||||||
|
out := injectInlineBanner(body, inlineTestScript)
|
||||||
|
if string(out) != string(body) {
|
||||||
|
t.Fatalf("guarded body must be unchanged.\n got: %s", out)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestInjectInlineBannerHeadInsertion(t *testing.T) {
|
||||||
|
body := []byte(`<html><head lang="en"><title>x</title></head><body>hi</body></html>`)
|
||||||
|
out := string(injectInlineBanner(body, inlineTestScript))
|
||||||
|
headOpen := `<head lang="en">`
|
||||||
|
idx := strings.Index(out, headOpen)
|
||||||
|
if idx < 0 {
|
||||||
|
t.Fatalf("head open lost: %s", out)
|
||||||
|
}
|
||||||
|
after := out[idx+len(headOpen):]
|
||||||
|
// An INLINE <script> (not <script src), carrying the body verbatim, right
|
||||||
|
// after the <head>'s '>'.
|
||||||
|
wantTag := `<!-- ` + bannerGuard + ` --><script>` + inlineTestScript + `</script>`
|
||||||
|
if !strings.HasPrefix(after, wantTag) {
|
||||||
|
t.Fatalf("inline tag not inserted right after <head>'s '>'.\n got: %s", after)
|
||||||
|
}
|
||||||
|
if strings.Contains(out, "<script src=") {
|
||||||
|
t.Fatalf("inline banner must NOT be a <script src> tag: %s", out)
|
||||||
|
}
|
||||||
|
if !strings.Contains(out, wantTag+`<title>x</title>`) {
|
||||||
|
t.Fatalf("original head content displaced: %s", out)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestInjectInlineBannerBodyFallback(t *testing.T) {
|
||||||
|
body := []byte(`<html><body class="x">hi</body></html>`)
|
||||||
|
out := string(injectInlineBanner(body, inlineTestScript))
|
||||||
|
wantTag := `<!-- ` + bannerGuard + ` --><script>` + inlineTestScript + `</script>`
|
||||||
|
if !strings.Contains(out, wantTag+`<body class="x">`) {
|
||||||
|
t.Fatalf("inline tag not inserted right before <body>.\n got: %s", out)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestInjectInlineBannerNeitherHeadNorBody(t *testing.T) {
|
||||||
|
body := []byte(`<p>just a fragment</p>`)
|
||||||
|
out := injectInlineBanner(body, inlineTestScript)
|
||||||
|
if string(out) != string(body) {
|
||||||
|
t.Fatalf("no head/body → must be unchanged.\n got: %s", out)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestInjectInlineBannerCaseInsensitiveHead(t *testing.T) {
|
||||||
|
body := []byte(`<HTML><HEAD></HEAD><BODY>hi</BODY></HTML>`)
|
||||||
|
out := string(injectInlineBanner(body, inlineTestScript))
|
||||||
|
if !strings.Contains(out, `<HEAD><!-- `+bannerGuard) {
|
||||||
|
t.Fatalf("case-insensitive <HEAD> match failed: %s", out)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestFetchInlineBannerOK(t *testing.T) {
|
||||||
|
// Portal returns a body + 200 → fetchInlineBanner returns (body, true) and
|
||||||
|
// echoes mh/wg/csp into the query.
|
||||||
|
var gotQuery string
|
||||||
|
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||||
|
gotQuery = r.URL.RawQuery
|
||||||
|
w.Header().Set("Content-Type", "application/javascript")
|
||||||
|
_, _ = w.Write([]byte(inlineTestScript))
|
||||||
|
}))
|
||||||
|
defer srv.Close()
|
||||||
|
|
||||||
|
body, ok := fetchInlineBanner(srv.URL, "deadbeef", true, true)
|
||||||
|
if !ok {
|
||||||
|
t.Fatal("fetchInlineBanner must report ok=true on a 200")
|
||||||
|
}
|
||||||
|
if body != inlineTestScript {
|
||||||
|
t.Fatalf("fetchInlineBanner body mismatch: %q", body)
|
||||||
|
}
|
||||||
|
for _, want := range []string{"mh=deadbeef", "wg=1", "csp=1"} {
|
||||||
|
if !strings.Contains(gotQuery, want) {
|
||||||
|
t.Fatalf("query %q missing %q", gotQuery, want)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestFetchInlineBannerWGCSPZero(t *testing.T) {
|
||||||
|
var gotQuery string
|
||||||
|
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||||
|
gotQuery = r.URL.RawQuery
|
||||||
|
_, _ = w.Write([]byte(inlineTestScript))
|
||||||
|
}))
|
||||||
|
defer srv.Close()
|
||||||
|
if _, ok := fetchInlineBanner(srv.URL, "x", false, false); !ok {
|
||||||
|
t.Fatal("ok=true expected")
|
||||||
|
}
|
||||||
|
for _, want := range []string{"wg=0", "csp=0"} {
|
||||||
|
if !strings.Contains(gotQuery, want) {
|
||||||
|
t.Fatalf("query %q missing %q", gotQuery, want)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestFetchInlineBannerFailOpenDeadPortal(t *testing.T) {
|
||||||
|
// A dead portal (closed listener) → fail-open: ("", false) → caller skips the
|
||||||
|
// inject and serves the page intact. No panic, no error surfaced.
|
||||||
|
srv := httptest.NewServer(http.HandlerFunc(func(http.ResponseWriter, *http.Request) {}))
|
||||||
|
url := srv.URL
|
||||||
|
srv.Close() // close BEFORE the fetch → dial error
|
||||||
|
|
||||||
|
body, ok := fetchInlineBanner(url, "x", false, false)
|
||||||
|
if ok {
|
||||||
|
t.Fatal("dead portal must fail open (ok=false)")
|
||||||
|
}
|
||||||
|
if body != "" {
|
||||||
|
t.Fatalf("fail-open body must be empty, got %q", body)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestFetchInlineBannerNon2xxFailOpen(t *testing.T) {
|
||||||
|
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||||
|
w.WriteHeader(http.StatusInternalServerError)
|
||||||
|
_, _ = w.Write([]byte("boom"))
|
||||||
|
}))
|
||||||
|
defer srv.Close()
|
||||||
|
body, ok := fetchInlineBanner(srv.URL, "x", false, false)
|
||||||
|
if ok || body != "" {
|
||||||
|
t.Fatalf("non-2xx must fail open: ok=%v body=%q", ok, body)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
|
||||||
|
|
@@ -90,7 +90,7 @@ func TestInjectIntoBodyBrotli(t *testing.T) {
|
||||||
if err != nil {
|
if err != nil {
|
||||||
t.Fatal(err)
|
t.Fatal(err)
|
||||||
}
|
}
|
||||||
out, ok := injectIntoBody(enc, "br", "abc123", true, false)
|
out, ok := injectIntoBody(enc, "br", inlineTestScript, true)
|
||||||
if !ok {
|
if !ok {
|
||||||
t.Fatal("br inject must report ok=true")
|
t.Fatal("br inject must report ok=true")
|
||||||
}
|
}
|
||||||
|
|
@@ -113,7 +113,7 @@ func TestInjectIntoBodyZstd(t *testing.T) {
|
||||||
if err != nil {
|
if err != nil {
|
||||||
t.Fatal(err)
|
t.Fatal(err)
|
||||||
}
|
}
|
||||||
out, ok := injectIntoBody(enc, "zstd", "abc123", true, false)
|
out, ok := injectIntoBody(enc, "zstd", inlineTestScript, true)
|
||||||
if !ok {
|
if !ok {
|
||||||
t.Fatal("zstd inject must report ok=true")
|
t.Fatal("zstd inject must report ok=true")
|
||||||
}
|
}
|
||||||
|
|
@@ -132,7 +132,7 @@ func TestInjectIntoBodyZstd(t *testing.T) {
|
||||||
|
|
||||||
func TestInjectIntoBodyBrotliCaseInsensitive(t *testing.T) {
|
func TestInjectIntoBodyBrotliCaseInsensitive(t *testing.T) {
|
||||||
enc, _ := brotliBytes([]byte(`<head></head>`))
|
enc, _ := brotliBytes([]byte(`<head></head>`))
|
||||||
out, ok := injectIntoBody(enc, "BR", "z", false, false)
|
out, ok := injectIntoBody(enc, "BR", inlineTestScript, false)
|
||||||
if !ok {
|
if !ok {
|
||||||
t.Fatal("Content-Encoding BR (upper) must be recognised → ok=true")
|
t.Fatal("Content-Encoding BR (upper) must be recognised → ok=true")
|
||||||
}
|
}
|
||||||
|
|
@@ -147,7 +147,7 @@ func TestInjectIntoBodyBrotliCaseInsensitive(t *testing.T) {
|
||||||
|
|
||||||
func TestInjectIntoBodyBrotliFailOpen(t *testing.T) {
|
func TestInjectIntoBodyBrotliFailOpen(t *testing.T) {
|
||||||
bad := []byte("not brotli at all <head></head>")
|
bad := []byte("not brotli at all <head></head>")
|
||||||
out, ok := injectIntoBody(bad, "br", "x", false, false)
|
out, ok := injectIntoBody(bad, "br", inlineTestScript, false)
|
||||||
if ok {
|
if ok {
|
||||||
t.Fatal("corrupt br body must fail open (ok=false)")
|
t.Fatal("corrupt br body must fail open (ok=false)")
|
||||||
}
|
}
|
||||||
|
|
@@ -158,7 +158,7 @@ func TestInjectIntoBodyBrotliFailOpen(t *testing.T) {
|
||||||
|
|
||||||
func TestInjectIntoBodyZstdFailOpen(t *testing.T) {
|
func TestInjectIntoBodyZstdFailOpen(t *testing.T) {
|
||||||
bad := []byte("not zstd at all <head></head>")
|
bad := []byte("not zstd at all <head></head>")
|
||||||
out, ok := injectIntoBody(bad, "zstd", "x", false, false)
|
out, ok := injectIntoBody(bad, "zstd", inlineTestScript, false)
|
||||||
if ok {
|
if ok {
|
||||||
t.Fatal("corrupt zstd body must fail open (ok=false)")
|
t.Fatal("corrupt zstd body must fail open (ok=false)")
|
||||||
}
|
}
|
||||||
|
|
@@ -177,7 +177,7 @@ func TestBrotliZstdBombGuard(t *testing.T) {
|
||||||
t.Fatal("unbrotliBytes must reject output exceeding gunzipCap")
|
t.Fatal("unbrotliBytes must reject output exceeding gunzipCap")
|
||||||
}
|
}
|
||||||
// fail-open through the inject path.
|
// fail-open through the inject path.
|
||||||
if out, ok := injectIntoBody(brBomb, "br", "x", false, false); ok || !bytes.Equal(out, brBomb) {
|
if out, ok := injectIntoBody(brBomb, "br", inlineTestScript, false); ok || !bytes.Equal(out, brBomb) {
|
||||||
t.Fatal("over-cap br body must fail open with original bytes")
|
t.Fatal("over-cap br body must fail open with original bytes")
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@@ -188,7 +188,7 @@ func TestBrotliZstdBombGuard(t *testing.T) {
|
||||||
if _, err := unzstdBytes(zsBomb); err == nil {
|
if _, err := unzstdBytes(zsBomb); err == nil {
|
||||||
t.Fatal("unzstdBytes must reject output exceeding gunzipCap")
|
t.Fatal("unzstdBytes must reject output exceeding gunzipCap")
|
||||||
}
|
}
|
||||||
if out, ok := injectIntoBody(zsBomb, "zstd", "x", false, false); ok || !bytes.Equal(out, zsBomb) {
|
if out, ok := injectIntoBody(zsBomb, "zstd", inlineTestScript, false); ok || !bytes.Equal(out, zsBomb) {
|
||||||
t.Fatal("over-cap zstd body must fail open with original bytes")
|
t.Fatal("over-cap zstd body must fail open with original bytes")
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@@ -132,27 +132,32 @@ func TestInjectCosmeticCaseInsensitive(t *testing.T) {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestInjectLoaderAndCosmeticCompose(t *testing.T) {
|
func TestInjectInlineBannerAndCosmeticCompose(t *testing.T) {
|
||||||
// Both markers must be present after composing the two injects (wg client).
|
// Both markers must be present after composing the two injects (wg client).
|
||||||
|
// #662 — the banner is now the INLINE script (not a <script src> tag).
|
||||||
body := []byte(`<html><head></head><body>hi</body></html>`)
|
body := []byte(`<html><head></head><body>hi</body></html>`)
|
||||||
out := string(injectHTML(body, "deadbeef", true, false))
|
out := string(injectHTML(body, inlineTestScript, true))
|
||||||
if !strings.Contains(out, bannerGuard) {
|
if !strings.Contains(out, bannerGuard) {
|
||||||
t.Fatalf("loader marker missing after compose: %s", out)
|
t.Fatalf("banner marker missing after compose: %s", out)
|
||||||
}
|
}
|
||||||
if !strings.Contains(out, cosmeticGuard) {
|
if !strings.Contains(out, cosmeticGuard) {
|
||||||
t.Fatalf("cosmetic marker missing after compose: %s", out)
|
t.Fatalf("cosmetic marker missing after compose: %s", out)
|
||||||
}
|
}
|
||||||
if !strings.Contains(out, `data-mh="deadbeef"`) {
|
// The inline banner is an inline <script> carrying the baked body, NOT a src.
|
||||||
t.Fatalf("loader data-mh missing after compose: %s", out)
|
if !strings.Contains(out, "<script>"+inlineTestScript+"</script>") {
|
||||||
|
t.Fatalf("inline banner body missing after compose: %s", out)
|
||||||
|
}
|
||||||
|
if strings.Contains(out, "<script src=") {
|
||||||
|
t.Fatalf("inline path must NOT emit a <script src> tag: %s", out)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestInjectHTMLNonWGSkipsCosmetic(t *testing.T) {
|
func TestInjectHTMLNonWGSkipsCosmetic(t *testing.T) {
|
||||||
// Non-WG (non-R3) clients get the loader but NOT the cosmetic style.
|
// Non-WG (non-R3) clients get the banner but NOT the cosmetic style.
|
||||||
body := []byte(`<html><head></head><body>hi</body></html>`)
|
body := []byte(`<html><head></head><body>hi</body></html>`)
|
||||||
out := string(injectHTML(body, "x", false, false))
|
out := string(injectHTML(body, inlineTestScript, false))
|
||||||
if !strings.Contains(out, bannerGuard) {
|
if !strings.Contains(out, bannerGuard) {
|
||||||
t.Fatalf("loader marker missing for non-wg: %s", out)
|
t.Fatalf("banner marker missing for non-wg: %s", out)
|
||||||
}
|
}
|
||||||
if strings.Contains(out, cosmeticGuard) {
|
if strings.Contains(out, cosmeticGuard) {
|
||||||
t.Fatalf("cosmetic style must NOT be injected for non-wg client: %s", out)
|
t.Fatalf("cosmetic style must NOT be injected for non-wg client: %s", out)
|
||||||
|
|
@@ -163,7 +168,7 @@ func TestInjectIntoBodyGzipCarriesCosmetic(t *testing.T) {
|
||||||
// The gzip decompress→inject→recompress path must carry BOTH injects for wg.
|
// The gzip decompress→inject→recompress path must carry BOTH injects for wg.
|
||||||
body := []byte(`<html><head></head><body>hi</body></html>`)
|
body := []byte(`<html><head></head><body>hi</body></html>`)
|
||||||
gz := gzipBytes(body)
|
gz := gzipBytes(body)
|
||||||
out, ok := injectIntoBody(gz, "gzip", "mh1", true, false)
|
out, ok := injectIntoBody(gz, "gzip", inlineTestScript, true)
|
||||||
if !ok {
|
if !ok {
|
||||||
t.Fatalf("injectIntoBody(gzip) returned ok=false")
|
t.Fatalf("injectIntoBody(gzip) returned ok=false")
|
||||||
}
|
}
|
||||||
|
|
@@ -174,4 +179,8 @@ func TestInjectIntoBodyGzipCarriesCosmetic(t *testing.T) {
|
||||||
if !strings.Contains(string(plain), bannerGuard) || !strings.Contains(string(plain), cosmeticGuard) {
|
if !strings.Contains(string(plain), bannerGuard) || !strings.Contains(string(plain), cosmeticGuard) {
|
||||||
t.Fatalf("gzip path lost a marker: %s", plain)
|
t.Fatalf("gzip path lost a marker: %s", plain)
|
||||||
}
|
}
|
||||||
|
// The inline banner script body survives the gzip round-trip.
|
||||||
|
if !strings.Contains(string(plain), "<script>"+inlineTestScript+"</script>") {
|
||||||
|
t.Fatalf("inline banner body lost on gzip path: %s", plain)
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@@ -146,31 +146,39 @@ func zstdBytes(in []byte) ([]byte, error) {
|
||||||
}
|
}
|
||||||
|
|
||||||
// injectHTML applies BOTH HTML transforms in one pass over the DECOMPRESSED
|
// injectHTML applies BOTH HTML transforms in one pass over the DECOMPRESSED
|
||||||
// body: the transparency-banner loader (always) AND, for R3 (wg) clients, the
|
// body: the transparency-banner (always, via the INLINE script) AND, for R3 (wg)
|
||||||
// ad/popup-hiding cosmetic <style> (#662 — the cutover left this unported). Both
|
// clients, the ad/popup-hiding cosmetic <style> (#662 — the cutover left this
|
||||||
// are idempotent (own guard markers) and order-independent; running them in the
|
// unported). Both are idempotent (own guard markers) and order-independent;
|
||||||
// same decompressed step means the cosmetic style benefits from the gzip
|
// running them in the same decompressed step means the cosmetic style benefits
|
||||||
// handling exactly like the loader. The cosmetic style is gated to wg because it
|
// from the gzip handling exactly like the banner. The cosmetic style is gated to
|
||||||
// is an R3-tunnel opt-in behaviour (mirrors the Python addon's _is_r3plus gate).
|
// wg because it is an R3-tunnel opt-in behaviour (mirrors the Python addon's
|
||||||
func injectHTML(plain []byte, clientHash string, wg, cspBypassed bool) []byte {
|
// _is_r3plus gate).
|
||||||
out := injectLoader(plain, clientHash, wg, cspBypassed)
|
//
|
||||||
|
// #662 — scriptBody is the COMPLETE inline banner IIFE pre-fetched server-side
|
||||||
|
// from the portal (fetchInlineBanner). We INLINE it (injectInlineBanner) instead
|
||||||
|
// of a <script src="/__toolbox/loader.js"> tag so a site's SERVICE WORKER has no
|
||||||
|
// same-origin request to hijack. An empty scriptBody (fetch failed/skipped) makes
|
||||||
|
// the banner inject a no-op — fail-open, page intact. The cosmetic <style> is
|
||||||
|
// already inline and SW-immune, so it is UNCHANGED.
|
||||||
|
func injectHTML(plain []byte, scriptBody string, wg bool) []byte {
|
||||||
|
out := injectInlineBanner(plain, scriptBody)
|
||||||
if wg {
|
if wg {
|
||||||
out = injectCosmetic(out)
|
out = injectCosmetic(out)
|
||||||
}
|
}
|
||||||
return out
|
return out
|
||||||
}
|
}
|
||||||
|
|
||||||
// injectIntoBody runs the HTML injection (loader + R3 cosmetic style) over a
|
// injectIntoBody runs the HTML injection (inline banner + R3 cosmetic style) over
|
||||||
// (possibly gzip-compressed) HTML body, returning the new body bytes to serve
|
// a (possibly compressed) HTML body, returning the new body bytes to serve and
|
||||||
// and whether the body was rewritten. cspBypassed (#662) is threaded into the
|
// whether the body was rewritten. scriptBody (#662) is the COMPLETE inline banner
|
||||||
// loader tag as data-csp="1" when a real CSP was relaxed on this page.
|
// IIFE pre-fetched from the portal; "" → the banner inject is skipped (fail-open).
|
||||||
//
|
//
|
||||||
// - encoding == "" (identity): injectHTML runs directly on body; the result
|
// - encoding == "" (identity): injectHTML runs directly on body; the result
|
||||||
// is returned (ok=true). The caller MUST update Content-Length to len(out).
|
// is returned (ok=true). The caller MUST update Content-Length to len(out).
|
||||||
// - encoding ∈ {gzip, br, zstd} (case-insensitive): the body is decoded,
|
// - encoding ∈ {gzip, br, zstd} (case-insensitive): the body is decoded,
|
||||||
// injected, then RE-ENCODED in the SAME codec so the client transfer stays
|
// injected, then RE-ENCODED in the SAME codec so the client transfer stays
|
||||||
// compressed (the tunnel is perf-sensitive) and Content-Encoding is
|
// compressed (the tunnel is perf-sensitive) and Content-Encoding is
|
||||||
// UNCHANGED. The caller sets Content-Length to len(out). BOTH the loader and
|
// UNCHANGED. The caller sets Content-Length to len(out). BOTH the banner and
|
||||||
// the cosmetic style are injected on the decompressed body, so the cosmetic
|
// the cosmetic style are injected on the decompressed body, so the cosmetic
|
||||||
// CSS lands on compressed pages too (the common case).
|
// CSS lands on compressed pages too (the common case).
|
||||||
// - any other encoding (deflate, multi-value, …): pass through untouched,
|
// - any other encoding (deflate, multi-value, …): pass through untouched,
|
||||||
|
|
@@ -181,23 +189,23 @@ func injectHTML(plain []byte, clientHash string, wg, cspBypassed bool) []byte {
 // never broken or corrupted.
 //
 // The 32MiB decompression-bomb cap (gunzipCap) is enforced uniformly across
-// gzip/br/zstd. idempotency / placement live inside injectLoader/injectCosmetic.
-func injectIntoBody(body []byte, encoding, clientHash string, wg, cspBypassed bool) (out []byte, ok bool) {
+// gzip/br/zstd. idempotency / placement live inside injectInlineBanner/injectCosmetic.
+func injectIntoBody(body []byte, encoding, scriptBody string, wg bool) (out []byte, ok bool) {
 	switch strings.ToLower(strings.TrimSpace(encoding)) {
 	case "":
-		return injectHTML(body, clientHash, wg, cspBypassed), true
+		return injectHTML(body, scriptBody, wg), true
 	case "gzip":
 		plain, err := gunzipBytes(body)
 		if err != nil {
 			return body, false // fail open: serve the original compressed bytes
 		}
-		return gzipBytes(injectHTML(plain, clientHash, wg, cspBypassed)), true
+		return gzipBytes(injectHTML(plain, scriptBody, wg)), true
 	case "br":
 		plain, err := unbrotliBytes(body)
 		if err != nil {
 			return body, false // fail open
 		}
-		reenc, err := brotliBytes(injectHTML(plain, clientHash, wg, cspBypassed))
+		reenc, err := brotliBytes(injectHTML(plain, scriptBody, wg))
 		if err != nil {
 			return body, false // fail open: never serve a truncated br frame
 		}
@@ -207,7 +215,7 @@ func injectIntoBody(body []byte, encoding, clientHash string, wg, cspBypassed bo
 		if err != nil {
 			return body, false // fail open
 		}
-		reenc, err := zstdBytes(injectHTML(plain, clientHash, wg, cspBypassed))
+		reenc, err := zstdBytes(injectHTML(plain, scriptBody, wg))
 		if err != nil {
 			return body, false // fail open: never serve a truncated zstd frame
 		}

@@ -44,7 +44,7 @@ func TestInjectIntoBodyGzip(t *testing.T) {
 	// End-to-end-ish: HTML with <head>, gzipped, run through the exact transform
 	// the inject path uses. Result must gunzip back to an injected, intact doc.
 	html := `<html><head><title>page</title></head><body>content</body></html>`
-	out, ok := injectIntoBody(gzipBytes([]byte(html)), "gzip", "abc123", true, false)
+	out, ok := injectIntoBody(gzipBytes([]byte(html)), "gzip", inlineTestScript, true)
 	if !ok {
 		t.Fatal("gzip inject must report ok=true")
 	}
@@ -68,7 +68,7 @@ func TestInjectIntoBodyGzip(t *testing.T) {
 
 func TestInjectIntoBodyGzipCaseInsensitiveEncoding(t *testing.T) {
 	html := `<head></head>`
-	out, ok := injectIntoBody(gzipBytes([]byte(html)), "GZIP", "z", false, false)
+	out, ok := injectIntoBody(gzipBytes([]byte(html)), "GZIP", inlineTestScript, false)
 	if !ok {
 		t.Fatal("Content-Encoding GZIP (upper) must be recognised → ok=true")
 	}
@@ -85,7 +85,7 @@ func TestInjectIntoBodyGzipFailOpen(t *testing.T) {
 	// Bytes labelled gzip but NOT gzip → fail open: original bytes, ok=false,
 	// no panic.
 	bad := []byte("not gzip at all <head></head>")
-	out, ok := injectIntoBody(bad, "gzip", "x", false, false)
+	out, ok := injectIntoBody(bad, "gzip", inlineTestScript, false)
 	if ok {
 		t.Fatal("corrupt gzip body must fail open (ok=false)")
 	}
@@ -97,7 +97,7 @@ func TestInjectIntoBodyGzipFailOpen(t *testing.T) {
 func TestInjectIntoBodyIdentity(t *testing.T) {
 	// Identity (empty Content-Encoding): inject directly, grown body returned.
 	html := []byte(`<html><head></head><body>hi</body></html>`)
-	out, ok := injectIntoBody(html, "", "deadbeef", false, false)
+	out, ok := injectIntoBody(html, "", inlineTestScript, false)
 	if !ok {
 		t.Fatal("identity inject must report ok=true")
 	}
@@ -113,7 +113,7 @@ func TestInjectIntoBodyUnknownEncodingPassthrough(t *testing.T) {
 	// #662 — gzip/br/zstd are now ALL decoded+re-encoded; deflate (and any other
 	// codec / multi-value AE) remains an unknown encoding we pass through.
 	body := []byte("\x78\x9c some deflate-ish bytes")
-	out, ok := injectIntoBody(body, "deflate", "x", false, false)
+	out, ok := injectIntoBody(body, "deflate", inlineTestScript, false)
 	if ok {
 		t.Fatal("unknown encoding must pass through (ok=false)")
 	}
@@ -131,7 +131,7 @@ func TestGunzipBombGuard(t *testing.T) {
 		t.Fatal("gunzipBytes must reject output exceeding gunzipCap")
 	}
 	// And via the inject path: fail open, original bytes preserved.
-	out, ok := injectIntoBody(big, "gzip", "x", false, false)
+	out, ok := injectIntoBody(big, "gzip", inlineTestScript, false)
 	if ok {
 		t.Fatal("over-cap gzip body must fail open through injectIntoBody")
 	}

@@ -199,12 +199,26 @@ func ja4ish(h *tls.ClientHelloInfo) string {
 type Proxy struct {
 	ca *CA
 	pol *Policy
 	jaSink func(string) // JA4 observations (logged; a sidecar in prod)
 	jarKey []byte // anti-track HMAC fake-identity seed (nil → poison off)
 	poison bool // master gate: poison tracker Set-Cookies (default on when jarKey present)
 	portal string // portal base URL for /__toolbox/* reverse-proxy (banner assets)
 	ads *adStats // #662 — ad-block metrics aggregator (flushed to the portal)
-	cspDemo bool // #662 CONSENTED-DEMONSTRATION: relax a page's CSP so the injected loader runs, and flag the bypass (data-csp=1 → 🔓). Default on.
+	cand *adCandidates // #662 — ad-candidate learning feed (flushed with ads to the portal)
+	cspDemo bool // #662 CONSENTED-DEMONSTRATION: relax a page's CSP so the injected loader runs, and flag the bypass (data-csp=1 → 🔓). Default on.
+
+	// analysisRelay gates the per-flow telemetry relay to the dpi/cookies/ja4
+	// analysis sidecar sockets (#662 — restoring the "Qui te piste?" events the
+	// decommissioned Python addons fed). Default on; relay.go is the transport.
+	analysisRelay bool
+
+	// socialRelay gates the cross-site cookie-tracker correlation (#662 — restoring
+	// the kbin /social graph the decommissioned Python social_graph addon fed).
+	// Default on. social.go is the engine; edges are batched + POSTed to the
+	// portal's /__toolbox/social-event ingest. nil → off (CONNECT PoC / tests).
+	socialRelayOn bool
+	social *socialRelay
+	consent *consentLog
 }
 
 // recordAdBlock forwards a 204'd ad/tracker block to the engine's metrics
@@ -216,12 +230,52 @@ func (px *Proxy) recordAdBlock(adHost, site, macHash string) {
 	}
 }
 
+// maybeRecordAdCandidate feeds the auto-learn loop (#662): on the allow/mitm
+// path (NOT block — already caught; NOT allowlisted/own-infra), it records an
+// ad-candidate (host, site) when the request is 3rd-party
+// (registrable(host) != registrable(site)) AND the path smells like an ad/track
+// endpoint (adPathRE). It is the engine port of ad_ghost's candidate capture —
+// the feed secubox-toolbox-autolearn promotes into learned-trackers.txt at
+// AD_MIN_SITES distinct sites. Gated behind the analysis/ad relay flag, O(1) hot
+// path, fire-and-forget, nil-safe (CONNECT PoC / tests with no feed).
+func (px *Proxy) maybeRecordAdCandidate(host, site, path string) {
+	if px == nil || px.cand == nil || !px.relayEnabled() || px.pol == nil {
+		return
+	}
+	if site == "" || host == "" {
+		return // no 1st-party context (no Referer) → nothing to attribute.
+	}
+	if px.pol.allowedSafe(host) {
+		return // own-infra / allowlist: never learn our own / trusted hosts.
+	}
+	if registrable(host) == registrable(site) {
+		return // 1st-party request: not a cross-site ad/track signal.
+	}
+	if !adPathRE.MatchString(path) {
+		return // path doesn't look like an ad/track endpoint.
+	}
+	px.cand.record(host, site)
+}
+
 func (px *Proxy) serverTLSConfig() *tls.Config {
+	return px.serverTLSConfigCapture(nil)
+}
+
+// serverTLSConfigCapture is serverTLSConfig with an extra per-handshake hook:
+// capture, if non-nil, is invoked inside GetCertificate with the live
+// *tls.ClientHelloInfo (SNI, SupportedProtos, CipherSuites). The accept-path
+// handlers use it to relay the ja4 ClientHello payload (relay.go) WITH the
+// client conn's peer IP — which is known at the handler, not inside the TLS
+// config. Passing nil yields the plain forging config (CONNECT PoC, tests).
+func (px *Proxy) serverTLSConfigCapture(capture func(*tls.ClientHelloInfo)) *tls.Config {
 	return &tls.Config{
 		GetCertificate: func(h *tls.ClientHelloInfo) (*tls.Certificate, error) {
 			if px.jaSink != nil {
 				px.jaSink(ja4ish(h)) // capture handshake fingerprint
 			}
+			if capture != nil {
+				capture(h) // ja4 relay material (peer IP threaded in by the handler)
+			}
 			name := h.ServerName
 			if name == "" {
 				name = "unknown.local"
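The gates in maybeRecordAdCandidate run in a fixed order: context present, not allowlisted, 3rd-party, then ad-looking path. The sketch below lets that order be exercised in isolation. Both helpers are stand-ins. `registrableNaive` keeps the last two DNS labels, which is wrong for multi-label public suffixes such as `co.uk`; a real version would consult the Public Suffix List. `adPathSketch` is an invented pattern, because the diff does not show adPathRE. `isAdCandidate` and its `allowed` callback are likewise hypothetical.

```go
package main

import (
	"regexp"
	"strings"
)

// adPathSketch is a HYPOTHETICAL ad/track path heuristic, not the engine's
// adPathRE (ported from ad_ghost's _AD_PATH, not shown here).
var adPathSketch = regexp.MustCompile(`(?i)/(ads?|adserver|pixel|track(ing)?|beacon|collect)(/|\.|\?|$)`)

// registrableNaive keeps the last two DNS labels. Deliberate simplification:
// it mis-handles suffixes like "co.uk" (use the Public Suffix List instead).
func registrableNaive(host string) string {
	host = strings.TrimSuffix(strings.ToLower(host), ".")
	parts := strings.Split(host, ".")
	if len(parts) <= 2 {
		return host
	}
	return strings.Join(parts[len(parts)-2:], ".")
}

// isAdCandidate applies the gates in the same order as the diff: cheap
// emptiness checks first, then allowlist, then 1st-party, then the regex.
func isAdCandidate(host, site, path string, allowed func(string) bool) bool {
	if host == "" || site == "" {
		return false // no 1st-party context to attribute the request to
	}
	if allowed != nil && allowed(host) {
		return false // own-infra / allowlisted host: never learned
	}
	if registrableNaive(host) == registrableNaive(site) {
		return false // 1st-party request: not a cross-site signal
	}
	return adPathSketch.MatchString(path)
}
```

Putting the regex last keeps the common 1st-party and allowlisted requests off the most expensive check.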
@@ -231,6 +285,38 @@ func (px *Proxy) serverTLSConfig() *tls.Config {
 	}
 }
 
+// peerIP returns the remote IP (no port) of a client conn, the same basis as
+// clientHashFromConn. Used as the client_ip field of every relay payload.
+func peerIP(conn net.Conn) string {
+	if conn == nil {
+		return ""
+	}
+	host, _, err := net.SplitHostPort(conn.RemoteAddr().String())
+	if err != nil {
+		return conn.RemoteAddr().String()
+	}
+	return host
+}
+
+// captureAndEmitJA4 returns a GetCertificate capture hook that relays the ja4
+// ClientHello payload for THIS handshake (once), tagged with the given client
+// conn's peer IP + mac-hash-aware clientHash. Gated by analysisRelay (emitJA4
+// checks). The hook copies the ClientHelloInfo fields it needs immediately
+// (the struct is only valid during the callback). Returns nil when the relay is
+// off so the plain config is used (no per-handshake allocation).
+func (px *Proxy) captureAndEmitJA4(rawClient net.Conn) func(*tls.ClientHelloInfo) {
+	if !px.relayEnabled() {
+		return nil
+	}
+	ip := peerIP(rawClient)
+	hash := clientHashFromConn(rawClient)
+	return func(h *tls.ClientHelloInfo) {
+		alpn := append([]string(nil), h.SupportedProtos...)
+		ciphers := append([]uint16(nil), h.CipherSuites...)
+		px.emitJA4(ip, hash, h.ServerName, alpn, ciphers)
+	}
+}
+
 func (px *Proxy) handleConnect(w http.ResponseWriter, r *http.Request) {
 	host := r.URL.Hostname()
 	hj, ok := w.(http.Hijacker)
@@ -262,7 +348,9 @@ func (px *Proxy) handleConnect(w http.ResponseWriter, r *http.Request) {
 	}
 
 	// MITM: TLS-terminate the client with a forged cert (+ ClientHello capture).
-	tconn := tls.Server(client, px.serverTLSConfig())
+	// The capture hook relays the ja4 ClientHello payload for this handshake,
+	// tagged with the client's peer IP (#662). nil when the relay gate is off.
+	tconn := tls.Server(client, px.serverTLSConfigCapture(px.captureAndEmitJA4(client)))
 	if err := tconn.Handshake(); err != nil {
 		return
 	}
@@ -326,6 +414,12 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
 		// per-client breakdown keys on the WG persona hash. recordAdBlock is
 		// O(1) and never blocks the block path.
 		px.recordAdBlock(host, refererSite(req.Header.Get("Referer")), clientHashFromConn(rawClient))
+		// #662 — the cross-site tracking evidence lives PRECISELY on the blocked
+		// trackers: the browser still SENT its 3rd-party Cookie to doubleclick/
+		// adnxs/… before we 204 it. Correlate that request-Cookie here (resp=nil,
+		// request-only) or the /social graph misses the very trackers it exists to
+		// expose. Hash-only, WG-peer only, fire-and-forget — same as the allow path.
+		px.emitSocial(peerIP(rawClient), host, req, nil)
 		writeRaw(tconn, 204, "No Content", map[string]string{"X-SecuBox-Ng": "blocked"}, nil)
 		return
 	}
@@ -339,6 +433,24 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
 	// allow — stripping operator headers + asserting opt-out is universally
 	// safe and never touches own-infra correctness).
 	clientHash := clientHashFromConn(rawClient) // mac_hash-aware (WG persona)
+
+	// #662 — relay the DPI classification hint for this MITM'd request (allow|mitm
+	// only; never the block 204 / splice paths). Fire-and-forget BEFORE anonymize
+	// mutates headers, so we relay the client's original User-Agent (the Python
+	// DPIRelay ran on the unmodified request). Gated by --analysis-relay; a
+	// dead/slow dpi.sock can never block or delay the proxy flow.
+	relayIP := peerIP(rawClient)
+	px.emitDPI(relayIP, clientHash, host, req)
+
+	// #662 — feed the auto-learn loop: on this allow/mitm flow, record an
+	// ad-candidate when the request is 3rd-party AND its path smells like an
+	// ad/track endpoint (ad_ghost's _AD_PATH heuristic). site = registrable of
+	// the Referer (the ad_ghost _site_of flavour). Done BEFORE anonymize mutates
+	// headers (so the Referer is the client's original). O(1), gated,
+	// fire-and-forget — a new adware host gets observed here, promoted by
+	// autolearn, then blocked+smogged after the policy live-reloads it.
+	px.maybeRecordAdCandidate(host, refererSite(req.Header.Get("Referer")), req.URL.Path)
+
 	anonymizeRequest(req.Header)
 
 	// #662 — do NOT touch Accept-Encoding. We FORWARD the client's original
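The emitDPI comment states that a dead or slow dpi.sock can never block or delay the proxy. One common way to get that guarantee is a bounded queue with a non-blocking send, sketched below. The diff does not show whether relay.go uses this exact mechanism. `relaySketch`, its queue depth, the drop counter and the unshown background writer are all assumptions.

```go
package main

import "sync/atomic"

// relaySketch is a HYPOTHETICAL fire-and-forget emitter. The hot path only
// does a non-blocking channel send; a background writer (not shown) would
// drain the queue to the sidecar socket. A full queue DROPS the event, so a
// stalled sidecar can never stall the proxy.
type relaySketch struct {
	enabled bool
	queue   chan []byte
	dropped atomic.Int64
}

func newRelaySketch(enabled bool, depth int) *relaySketch {
	return &relaySketch{enabled: enabled, queue: make(chan []byte, depth)}
}

// emit never blocks: nil or gated off is a no-op; a full queue counts a drop.
func (r *relaySketch) emit(payload []byte) {
	if r == nil || !r.enabled {
		return
	}
	select {
	case r.queue <- payload:
	default:
		r.dropped.Add(1)
	}
}
```

The trade-off is explicit: under back-pressure, telemetry is lost rather than request latency being added.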
@@ -379,6 +491,24 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
 	}
 	defer resp.Body.Close()
 
+	// #662 — relay the cookie metadata for this MITM'd response (allow|mitm only).
+	// NAMES ONLY (never values — privacy/CSPN); no-op unless ≥1 Set-Cookie OR ≥1
+	// request Cookie is present. Emitted before poison rewrites Set-Cookie VALUES,
+	// which is irrelevant here (names are unchanged by poison) but keeps the
+	// relayed names byte-for-byte the origin's. Fire-and-forget, gated.
+	px.emitCookies(relayIP, clientHash, req, resp)
+
+	// #662 — cross-site cookie-tracker correlation (restores the kbin /social
+	// graph). FAITHFUL to the decommissioned Python social_graph addon: extract
+	// 3rd-party cookie edges (Set-Cookie + request Cookie), hash the identifier
+	// (cookieIDHash — NEVER the raw value), classify consent_state, and buffer
+	// them for the batched POST to the portal /__toolbox/social-event ingest.
+	// Like the addon, this ONLY fires for known R3 WG peers (macHashOf, not the
+	// raw-IP fallback): non-WG flows yield no edges. allow|mitm only (the block
+	// 204 / splice paths return before here). Gated by --social-relay; pure +
+	// non-blocking (the flush is a background goroutine).
+	px.emitSocial(relayIP, host, req, resp)
+
 	// Poison: only on MITM'd tracker flows (never on allow/own-infra), and only
 	// when the jar key is loaded. Replaces tracking-id Set-Cookie values with a
 	// stable fabricated persona; benign cookies pass through untouched.
@@ -409,16 +539,26 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
 		strings.Contains(resp.Header.Get("Content-Type"), "text/html") {
 		// #662 CONSENTED-DEMONSTRATION — ONLY here, on the responses we actually
 		// inject into (2xx text/html, R3/wg gate), and ONLY when the operator
-		// left the demo on, do we relax the page's CSP so the same-origin
-		// /__toolbox/loader.js can execute even on strict-CSP sites. cspBypassed
-		// is true iff there was a real CSP to bypass — it becomes data-csp="1" on
-		// the loader tag and the portal banner renders a 🔓 as the visible proof.
-		// We never strip CSP on non-injected responses.
+		// left the demo on, do we relax the page's CSP so the inline banner can
+		// run even on strict-CSP sites. cspBypassed is true iff there was a real
+		// CSP to bypass — it becomes csp=1 on the inline script and the banner
+		// renders a 🔓 as the visible proof. We never strip CSP on non-injected
+		// responses.
 		cspBypassed := false
 		if px.cspDemo {
 			cspBypassed = relaxCSPForLoader(resp.Header)
 		}
-		if out, ok := injectIntoBody(body, resp.Header.Get("Content-Encoding"), clientHash, wg, cspBypassed); ok {
+		// #662 — INLINE the banner (supersedes the <script src="/__toolbox/
+		// loader.js"> tag): sites with a SERVICE WORKER (leparisien, cnn…) hijack
+		// the same-origin src + its fetch("/__toolbox/bundle") before they reach
+		// this engine, so the banner never appeared. We fetch the COMPLETE script
+		// body from the portal server-side (mh/wg/csp + bundle baked as JS
+		// literals — no same-origin request for the SW to touch) and bake it into
+		// a self-contained <script>…</script>. Fail-open: a dead/slow portal →
+		// scriptBody=="" → the banner inject is skipped and the page is served
+		// intact (the cosmetic <style>, already inline, is unaffected).
+		scriptBody, _ := fetchInlineBanner(px.portal, clientHash, wg, cspBypassed)
+		if out, ok := injectIntoBody(body, resp.Header.Get("Content-Encoding"), scriptBody, wg); ok {
 			body = out
 			// Keep the response framing consistent with the served bytes. The
 			// encoding is unchanged (gzip stays gzip, identity stays identity);
@@ -428,6 +568,11 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
 			resp.ContentLength = int64(len(body))
 		}
 	}
+	// #662 — strip Alt-Svc so the browser is never told this origin offers HTTP/3
+	// (h3). With h3 unadvertised it keeps using HTTP/2 over TCP, which we MITM;
+	// otherwise it caches "h3 available" and keeps trying QUIC (UDP 443) — which
+	// bypasses this TCP proxy and is only best-effort blocked by the nft reject.
+	resp.Header.Del("Alt-Svc")
 	writeResponse(tconn, resp, body)
 }
 
@@ -445,6 +590,10 @@ func main() {
 		"portal base URL; /__toolbox/loader.js + /__toolbox/bundle are reverse-proxied here (banner assets, served for any MITM'd origin)")
 	cspDemo := flag.Bool("csp-bypass-demo", true,
 		"CONSENTED DEMONSTRATION: relax a page's CSP so the injected transparency-banner loader runs even on strict-CSP sites, and flag the bypass (banner shows 🔓). Only on injected 2xx text/html R3 responses; never on non-injected responses. Set false to never touch CSP.")
+	analysisRelay := flag.Bool("analysis-relay", true,
+		"relay per-flow telemetry (dpi/cookies/ja4) to the analysis sidecar sockets so the kbin \"Qui te piste?\" events refill (#662; replaces the decommissioned Python relay addons). Fire-and-forget; a dead/slow sidecar never affects the proxy. Set false to emit nothing.")
+	socialRelay := flag.Bool("social-relay", true,
+		"compute cross-site cookie-tracker edges and POST them to the portal /__toolbox/social-event ingest so the kbin /social graph refills (#662; replaces the decommissioned Python social_graph addon). Hash-only (never raw cookie values); WG-peer flows only; batched + fire-and-forget — a dead/slow portal never affects the proxy. Set false to emit nothing.")
 	flag.Parse()
 	ca, err := loadCA(*caCert, *caKey)
 	if err != nil {
@ -472,12 +621,26 @@ func main() {
|
||||||
poison: *poison,
|
poison: *poison,
|
||||||
portal: *portal,
|
portal: *portal,
|
||||||
ads: newAdStats(),
|
ads: newAdStats(),
|
||||||
|
cand: newAdCandidates(),
|
||||||
cspDemo: *cspDemo,
|
cspDemo: *cspDemo,
|
||||||
|
|
||||||
|
analysisRelay: *analysisRelay,
|
||||||
|
|
||||||
|
socialRelayOn: *socialRelay,
|
||||||
|
social: newSocialRelay(),
|
||||||
|
consent: newConsentLog(),
|
||||||
}
|
}
|
||||||
|
// #662 — start the social-edge flusher: the MITM path buffers cross-site
|
||||||
|
// tracker edges into px.social, drained every 10s to the portal's
|
||||||
|
// /__toolbox/social-event (best-effort, fire-and-forget) so the kbin /social
|
||||||
|
// graph (frozen since the cutover) refills.
|
||||||
|
go px.social.runFlusher(*portal)
|
||||||
// #662 — start the ad-block metrics flusher: the block path tallies every
|
// #662 — start the ad-block metrics flusher: the block path tallies every
|
||||||
// 204 into px.ads, drained every 10s to the portal's /__toolbox/ad-event
|
// 204 into px.ads, drained every 10s to the portal's /__toolbox/ad-event
|
||||||
// (best-effort, fire-and-forget) so the #ads dashboard sees blocks again.
|
// (best-effort, fire-and-forget) so the #ads dashboard sees blocks again.
|
||||||
go px.ads.runAdStatsFlusher(*portal)
|
// #662 — the candidate feed (px.cand) is drained in the SAME flush so the
|
||||||
|
// learning candidates ride the existing ad-event channel (one POST / 10s).
|
||||||
|
go px.ads.runAdStatsFlusher(*portal, px.cand)
|
||||||
if *transparent {
|
if *transparent {
|
||||||
// Transparent R3 mode: raw accept loop, each conn carries its pre-DNAT
|
// Transparent R3 mode: raw accept loop, each conn carries its pre-DNAT
|
||||||
// destination via SO_ORIGINAL_DST (recovered in handleTransparent). The
|
// destination via SO_ORIGINAL_DST (recovered in handleTransparent). The
|
||||||
|
|
|
||||||
|
|
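Both flushers follow the same drain-and-send shape. The hot path only bumps an in-memory tally, and a background loop swaps that tally out on a timer and ships the batch without waiting for the result. The sketch below illustrates the shape under stated assumptions. `tally`, `Drain` and `runFlusher` are hypothetical names, because the real `runFlusher` / `runAdStatsFlusher` bodies are not part of this diff. The portal POST is also replaced by a `send` callback, so the example needs no network.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tally is a hypothetical stand-in for px.ads / px.social: the hot path
// increments it, a background flusher drains it periodically.
type tally struct {
	mu     sync.Mutex
	counts map[string]int
}

func newTally() *tally { return &tally{counts: make(map[string]int)} }

// Add records one event for host under a short critical section.
func (t *tally) Add(host string) {
	t.mu.Lock()
	t.counts[host]++
	t.mu.Unlock()
}

// Drain swaps out the current counts and returns them, leaving an empty map.
func (t *tally) Drain() map[string]int {
	t.mu.Lock()
	defer t.mu.Unlock()
	out := t.counts
	t.counts = make(map[string]int)
	return out
}

// runFlusher drains t every interval and hands each non-empty batch to send
// in a detached goroutine, so a slow or dead sink never stalls the loop.
func runFlusher(t *tally, interval time.Duration, stop <-chan struct{}, send func(map[string]int)) {
	tick := time.NewTicker(interval)
	defer tick.Stop()
	for {
		select {
		case <-stop:
			return
		case <-tick.C:
			if batch := t.Drain(); len(batch) > 0 {
				go send(batch) // fire-and-forget
			}
		}
	}
}

func main() {
	t := newTally()
	t.Add("ads.example")
	t.Add("ads.example")
	stop := make(chan struct{})
	got := make(chan map[string]int, 1)
	go runFlusher(t, 10*time.Millisecond, stop, func(b map[string]int) { got <- b })
	fmt.Println((<-got)["ads.example"]) // prints 2
	close(stop)
}
```

Swapping the whole map in `Drain`, rather than copying and then clearing it, keeps the lock hold time independent of how many hosts were seen in the interval.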
@@ -17,6 +17,8 @@ import (
 	"os"
 	"regexp"
 	"strings"
+	"sync"
+	"time"
 )
 
 // ── ad_ghost: static ad/tracker host pattern (port of _AD_HOST) ──────────────
@@ -95,19 +97,55 @@ func envOr(key, def string) string {
 // Policy carries the loaded sets/regex and decides per-host actions. It also
 // keeps the legacy PoC fields (Inject) so the existing wiring/tests still work.
 type Policy struct {
+	// mu guards the live-reloadable map fields below. Decide/allowed/blockedByAd/
+	// shouldSplice take RLock; maybeReload takes Lock only when a backing file
+	// actually changed (the throttle + stat happen under a separate lighter lock).
+	mu sync.RWMutex
+
 	adHost *regexp.Regexp
 	learned map[string]bool // learned-trackers (host or registrable, lowercased)
 	allow map[string]bool // ad-allowlist (host or registrable, lowercased)
 	spliceSeed map[string]bool // splice seed patterns
 	spliceLearn map[string]bool // splice learned patterns
 	never map[string]bool // pure-trackers ∪ fortknox (splice never-set)
 	selfRegs map[string]bool // own-infra registrable domains
 	selfDomains []string // own-infra (for the host==d || host endswith .d guard)
+
+	// ── live-reload state (#662 auto-learn loop) ─────────────────────────────
+	//
+	// The lists are loaded once at startup, then re-read on-disk when their
+	// mtime changes so autolearn promotions / manual edits take effect WITHOUT a
+	// worker restart (mirrors ad_ghost._maybe_reload). The hot path (Decide)
+	// calls maybeReload(): a throttle check, then — at most every reloadThrottle —
+	// a cheap stat() of each backing file. Only a changed file is re-read and its
+	// map atomically swapped under mu.
+	reloadFiles []reloadTarget // backing files + their swap target
+	fortknoxSites []string // kept for rebuilding the never-set on pure-trackers reload
+	reloadMu sync.Mutex // guards lastReloadCheck + the per-file mtimes
+	lastReloadID int64 // unix-nano of the last throttle pass (0 = never)
+	reloadThrottle time.Duration // min interval between stat passes (0 in tests = eager)
 
 	// Legacy PoC fields kept so non-policy behaviour is unchanged.
 	Inject []byte // banner / ad-CSS marker injected before </head> or </body>
 }
 
+// reloadTarget describes one backing file the engine live-reloads: its path, the
+// last mtime we read, whether comment-stripping applies (loadLines vs
+// loadLinesRaw), and an applier that swaps the freshly-read set into the right
+// Policy field (under p.mu, held by the caller). pure-trackers re-derives the
+// never-set (∪ fortknox) so it stays consistent.
+type reloadTarget struct {
+	path string
+	stripComm bool
+	lastMtime int64
+	apply func(p *Policy, set map[string]bool)
+}
+
+// defaultReloadThrottle is the production stat cadence: a backing-file change
+// (autolearn runs hourly; a promotion is rare) is observed within ~15s, and the
+// hot path stats at most ~4×/minute regardless of request rate.
+const defaultReloadThrottle = 15 * time.Second
+
 // loadLines mirrors the comment-stripping Python loaders (splice._load_lines,
 // ad_ghost._allowed's allowlist read): split on first '#', trim, lowercase,
 // skip blanks. Missing/unreadable file → empty set (best-effort).
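The `reloadTarget` table pairs each backing file with a closure that installs the freshly read set. This lets one reload loop serve fields that need different post-processing. Below is a reduced sketch of that idea. The `lists` and `target` types are hypothetical stand-ins for `Policy` and `reloadTarget`, and file reading is left out so only the apply step is shown.

```go
package main

import (
	"fmt"
	"strings"
)

// lists is a hypothetical stand-in for the Policy maps a reload can replace.
type lists struct {
	learned  map[string]bool
	never    map[string]bool
	fortknox []string // static extras folded into never on every reload
}

// target pairs a backing file with the closure that installs its fresh set.
type target struct {
	path  string
	apply func(l *lists, set map[string]bool)
}

func targets() []target {
	return []target{
		{path: "learned-trackers", apply: func(l *lists, s map[string]bool) { l.learned = s }},
		{path: "pure-trackers", apply: func(l *lists, s map[string]bool) {
			// Re-derive the union so a reload never drops the static entries.
			for _, fk := range l.fortknox {
				if fk = strings.Trim(strings.ToLower(strings.TrimSpace(fk)), "."); fk != "" {
					s[fk] = true
				}
			}
			l.never = s
		}},
	}
}

func main() {
	l := &lists{fortknox: []string{" Bank.Example. "}}
	for _, t := range targets() {
		// Pretend each file was re-read and now holds "<path>.example".
		t.apply(l, map[string]bool{t.path + ".example": true})
	}
	fmt.Println(l.learned["learned-trackers.example"], l.never["pure-trackers.example"], l.never["bank.example"])
	// prints: true true true
}
```

Because the pure-trackers applier folds the fortknox entries back in on every swap, reloading that one file cannot silently drop the static never-set entries.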
@@ -196,16 +234,107 @@ func LoadPolicy(opts PolicyOpts) (*Policy, error) {
 		selfDomains = append(selfDomains, d)
 	}
 
-	return &Policy{
+	p := &Policy{
 		adHost: re,
 		learned: loadLinesRaw(opts.LearnedPath), // mirrors _learned_set (no comment-strip)
 		allow: loadLines(opts.AllowPath),
 		spliceSeed: loadLines(opts.SpliceSeedPath),
 		spliceLearn: loadLines(opts.SpliceLearnPath),
 		never: never,
 		selfRegs: selfRegs,
 		selfDomains: selfDomains,
-	}, nil
+		fortknoxSites: append([]string(nil), opts.FortknoxSites...),
+		reloadThrottle: defaultReloadThrottle,
+	}
+
+	// ── register the live-reloadable backing files (#662 auto-learn loop) ─────
+	//
+	// Each entry re-reads its file when its mtime changes and atomically swaps
+	// the map under p.mu (held by maybeReload). learned-trackers + ad-allowlist
+	// are the load-bearing pair (autolearn promotes into learned; the operator
+	// edits the allowlist); the splice seed/learned + pure-trackers files are
+	// reloaded too for consistency (pure-trackers re-derives the never-set).
+	p.reloadFiles = []reloadTarget{
+		{path: opts.LearnedPath, stripComm: false, lastMtime: statMtime(opts.LearnedPath),
+			apply: func(p *Policy, s map[string]bool) { p.learned = s }},
+		{path: opts.AllowPath, stripComm: true, lastMtime: statMtime(opts.AllowPath),
+			apply: func(p *Policy, s map[string]bool) { p.allow = s }},
+		{path: opts.SpliceSeedPath, stripComm: true, lastMtime: statMtime(opts.SpliceSeedPath),
+			apply: func(p *Policy, s map[string]bool) { p.spliceSeed = s }},
+		{path: opts.SpliceLearnPath, stripComm: true, lastMtime: statMtime(opts.SpliceLearnPath),
+			apply: func(p *Policy, s map[string]bool) { p.spliceLearn = s }},
+		{path: opts.PureTrackersPath, stripComm: true, lastMtime: statMtime(opts.PureTrackersPath),
+			apply: func(p *Policy, s map[string]bool) {
+				// pure-trackers ∪ fortknox → never-set (mirrors LoadPolicy above).
+				for _, fk := range p.fortknoxSites {
+					if fk = strings.Trim(strings.ToLower(strings.TrimSpace(fk)), "."); fk != "" {
+						s[fk] = true
+					}
+				}
+				p.never = s
+			}},
+	}
+	return p, nil
+}
+
+// statMtime returns the file's mtime in unix-nano, or 0 when the file is missing
+// or unreadable (best-effort, like the Python loaders: a missing file → empty
+// set, mtime 0). A file appearing/disappearing therefore registers as a change.
+func statMtime(path string) int64 {
+	if path == "" {
+		return 0
+	}
+	fi, err := os.Stat(path)
+	if err != nil {
+		return 0
+	}
+	return fi.ModTime().UnixNano()
+}
+
+// maybeReload re-reads any backing list whose on-disk mtime changed since the
+// last pass, swapping the affected map(s) under p.mu. Throttled to at most one
+// stat pass per p.reloadThrottle (cheap: a time compare + a few stats), so the
+// Decide hot path pays almost nothing. Concurrency-safe: the throttle/mtime
+// bookkeeping is under reloadMu and the map swap under mu — Decide's readers
+// hold mu.RLock, so a swap is atomic w.r.t. any in-flight decision.
+func (p *Policy) maybeReload() {
+	now := time.Now()
+	p.reloadMu.Lock()
+	if p.reloadThrottle > 0 && p.lastReloadID != 0 &&
+		now.Sub(time.Unix(0, p.lastReloadID)) < p.reloadThrottle {
+		p.reloadMu.Unlock()
+		return
+	}
+	p.lastReloadID = now.UnixNano()
+
+	// Collect the files that changed (stat under reloadMu; re-read outside mu).
+	type pending struct {
+		idx int
+		set map[string]bool
+	}
+	var changed []pending
+	for i := range p.reloadFiles {
+		rt := &p.reloadFiles[i]
+		if rt.path == "" {
+			continue
+		}
+		m := statMtime(rt.path)
+		if m != rt.lastMtime {
+			rt.lastMtime = m
+			changed = append(changed, pending{idx: i, set: scanLines(rt.path, rt.stripComm)})
+		}
+	}
+	p.reloadMu.Unlock()
+
+	if len(changed) == 0 {
+		return
+	}
+	// Swap the affected maps atomically under the write lock.
+	p.mu.Lock()
+	for _, c := range changed {
+		p.reloadFiles[c.idx].apply(p, c.set)
+	}
+	p.mu.Unlock()
 }
 
 // ── registrable: port of ad_ghost._registrable ───────────────────────────────
@@ -279,6 +408,11 @@ func hostMatches(host string, patterns map[string]bool) bool {
 
 // allowed: port of ad_ghost._allowed. Own-infra ALWAYS wins (reflash-safe),
 // then the operator allowlist (host or registrable).
+//
+// LOCK CONTRACT: reads the reloadable allow map — the caller MUST hold at least
+// p.mu.RLock (Decide / shouldPoison do). Lock-free internally so Decide can call
+// it alongside shouldSplice/blockedByAd under a single RLock (sync.RWMutex is
+// not reentrant).
 func (p *Policy) allowed(host string) bool {
 	h := strings.ToLower(host)
 	reg := registrable(h)
@@ -297,7 +431,19 @@ func (p *Policy) allowed(host string) bool {
 	return p.allow[h] || p.allow[reg]
 }
 
+// allowedSafe is the lock-taking entry point to allowed() for callers OUTSIDE a
+// Decide RLock (e.g. the ad-candidate feed). It also picks up a live-reloaded
+// allowlist via maybeReload, so a freshly-allowlisted host stops being learned.
+func (p *Policy) allowedSafe(host string) bool {
+	p.maybeReload()
+	p.mu.RLock()
+	defer p.mu.RUnlock()
+	return p.allowed(host)
+}
+
 // shouldSplice: port of splice.should_splice (never wins; then seed ∪ learned).
+// LOCK CONTRACT: reads the reloadable never/spliceSeed/spliceLearn maps — the
+// caller MUST hold at least p.mu.RLock (Decide does).
 func (p *Policy) shouldSplice(sni string) bool {
 	s := strings.Trim(strings.ToLower(sni), ".")
 	if s == "" {
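The LOCK CONTRACT comments exist because `sync.RWMutex` read locks are not reentrant. The standard library documents that once a goroutine is blocked in `Lock`, new `RLock` calls wait, so a goroutine that re-takes `RLock` while already holding it can deadlock against a waiting writer. The pattern in this hunk uses lock-free helpers, takes one `RLock` per decision, and adds an `allowedSafe`-style wrapper for outside callers. It can be sketched with a hypothetical miniature `policy` type; the `decide`, `blocked` and `replaceAllow` names are illustrative only.

```go
package main

import (
	"fmt"
	"sync"
)

// policy is a hypothetical miniature of the lock contract.
type policy struct {
	mu    sync.RWMutex
	allow map[string]bool
	block map[string]bool
}

// allowed: lock-free; the caller MUST hold at least p.mu.RLock.
func (p *policy) allowed(h string) bool { return p.allow[h] }

// blocked: lock-free; the caller MUST hold at least p.mu.RLock.
func (p *policy) blocked(h string) bool { return p.block[h] }

// decide takes ONE RLock and calls both helpers under it, instead of letting
// each helper take its own (nested) read lock.
func (p *policy) decide(h string) string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	switch {
	case p.allowed(h): // allowlist wins, as in Decide
		return "allow"
	case p.blocked(h):
		return "block"
	}
	return "pass"
}

// allowedSafe is the lock-taking entry point for callers outside decide.
func (p *policy) allowedSafe(h string) bool {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.allowed(h)
}

// replaceAllow swaps the allow map under the write lock (like a live reload).
func (p *policy) replaceAllow(m map[string]bool) {
	p.mu.Lock()
	p.allow = m
	p.mu.Unlock()
}

func main() {
	p := &policy{
		allow: map[string]bool{"own.example": true},
		block: map[string]bool{"ads.example": true},
	}
	fmt.Println(p.decide("own.example"), p.decide("ads.example"), p.decide("news.example")) // allow block pass
	p.replaceAllow(map[string]bool{"ads.example": true})
	fmt.Println(p.decide("ads.example"), p.allowedSafe("own.example")) // allow false
}
```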
@@ -312,6 +458,10 @@ func (p *Policy) shouldSplice(sni string) bool {
 // blockedByAd: port of the ad_ghost requestheaders block decision (sans the
 // allowlist guard, which Decide applies first): _AD_HOST match OR
 // registrable/host in learned-trackers.
+//
+// LOCK CONTRACT: reads the reloadable learned map — the caller MUST hold at
+// least p.mu.RLock. Decide and shouldPoison (via isTracker) do; the candidate-
+// emit path calls it only through those.
 func (p *Policy) blockedByAd(host string) bool {
 	if p.adHost.MatchString(host) {
 		return true
@@ -339,9 +489,16 @@ func (p *Policy) blockedByAd(host string) bool {
 // sni defaults to host when empty (the live engine splices on SNI == the TLS
 // host; for the parity harness host and sni are the same value).
 func (p *Policy) Decide(host, sni string) string {
+	// #662 — pick up autolearn promotions / manual edits without a worker
+	// restart. Throttled to ~every reloadThrottle and best-effort, so the hot
+	// path normally pays only a time compare. Done BEFORE taking the read lock
+	// (maybeReload may take the write lock to swap a changed map).
+	p.maybeReload()
 	if sni == "" {
 		sni = host
 	}
+	p.mu.RLock()
+	defer p.mu.RUnlock()
 	if p.allowed(host) {
 		return "allow"
 	}
@@ -148,6 +148,12 @@ func (p *Policy) isTracker(host string) bool {
 // allowlisted — own-infra flows are left clean (same dark safety as the block
 // path). The caller additionally requires a loaded jar key.
 func (p *Policy) shouldPoison(host string) bool {
+	// #662 — consult the same live-reloaded learned set Decide uses, so a host
+	// promoted into learned-trackers (by autolearn) is poisoned (smogged), not
+	// only 204'd, without a worker restart. RLock-guard the reloadable maps
+	// (allowed + isTracker→blockedByAd read them); maybeReload may swap them.
+	p.mu.RLock()
+	defer p.mu.RUnlock()
 	if p.allowed(host) {
 		return false // own-infra / allowlist → never poison
 	}
packages/secubox-toolbox-ng/cmd/sbxmitm/relay.go (new file, 291 lines)
@@ -0,0 +1,291 @@
+// SPDX-License-Identifier: LicenseRef-CMSD-1.0
+// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
+//
+// SecuBox-Deb :: toolbox-ng :: per-flow analysis relay (#662)
+//
+// Restores the dpi / cookies / ja4 EVENTS that feed the kbin "Qui te piste?"
+// cumulative-stats page, frozen since the #662 Phase-7 cutover decommissioned
+// the Python mitmproxy relay addons (packages/secubox-toolbox/mitmproxy_addons/
+// {dpi,cookies,ja4}.py). The Go engine is now the live R3 MITM core; this file
+// re-implements EXACTLY what those addons did — extract privacy-safe flow
+// metadata and fire-and-forget it to the analysis sidecar sockets, which
+// enrich + write toolbox.db.events keyed by client_mac_hash.
+//
+// Transport is the existing emit() helper (sidecar.go): a detached goroutine
+// with its own 2s timeout — a dead/slow analysis socket can NEVER block, delay,
+// or break a client flow. The payload builders here are pure (no I/O), O(1)-ish
+// per flow, and emit NAMES ONLY for cookies (never values — privacy / CSPN).
+//
+// Pure standard library — no external modules.
+package main
+
+import (
+	"encoding/json"
+	"net/http"
+	"strings"
+	"time"
+)
+
+// Stable socket paths — verbatim from the Python addons' TARGET constants
+// (the http+unix:///run/secubox/<x>.sock/<route> URLs), split into path+route.
+const (
+	dpiSocket = "/run/secubox/dpi.sock"
+	cookiesSocket = "/run/secubox/cookies.sock"
+	ja4Socket = "/run/secubox/threat-analyst.sock"
+
+	dpiRoute = "/classify"
+	cookiesRoute = "/inject"
+	ja4Route = "/ja4"
+)
+
+// Caps + truncation limits, matching the Python addons exactly.
+const (
+	maxSetCookieNames = 30 // cookies.py _names_only(set_cookies, cap=30)
+	maxCookieNames = 50 // cookies.py sent_names[:50]
+	maxCookieNameLen = 32 // cookies.py name[:32]
+	maxCookieURL = 300 // cookies.py pretty_url[:300]
+)
+
+// nowMS returns the current time as unix milliseconds (ts_ms in every payload).
+func nowMS() int64 { return time.Now().UnixMilli() }
+
+// ── gate ─────────────────────────────────────────────────────────────────────
+
+// relayEnabled reports whether per-flow analysis relaying is on (the
+// --analysis-relay flag → Proxy.analysisRelay). When false, nothing is emitted.
+// Nil-safe so tests / the CONNECT PoC that build a bare Proxy can call it.
+func (px *Proxy) relayEnabled() bool {
+	return px != nil && px.analysisRelay
+}
+
+// relayEmit is the gated, fire-and-forget emit used by every relay call site.
+// It NEVER blocks (delegates to emit() which detaches a goroutine with its own
+// timeout) and emits nothing when the relay gate is off.
+func (px *Proxy) relayEmit(socketPath, route string, payload []byte) {
+	if !px.relayEnabled() || len(payload) == 0 {
+		return
+	}
+	emit(socketPath, route, payload)
+}
+
+// ── dpi payload ──────────────────────────────────────────────────────────────
+
+// dpiEvent mirrors the JSON the Python DPIRelay.request() emitted. user_agent is
+// a *string so an absent UA serialises to JSON null (not ""), matching
+// headers.get("user-agent") → None. scheme + sni are constant "https" / host on
+// the MITM'd path (we only relay terminated TLS flows).
+type dpiEvent struct {
+	TSMs int64 `json:"ts_ms"`
+	ClientIP string `json:"client_ip"`
+	MacHash string `json:"client_mac_hash"`
+	Host string `json:"host"`
+	Scheme string `json:"scheme"`
+	Method string `json:"method"`
+	UserAgent *string `json:"user_agent"`
+	SNI string `json:"sni"`
+}
+
+// buildDPIPayload builds the /classify payload for one MITM'd request.
+func buildDPIPayload(clientIP, macHash, host string, req *http.Request) []byte {
+	var ua *string
+	if v := req.Header.Get("User-Agent"); v != "" {
+		ua = &v
+	}
+	ev := dpiEvent{
+		TSMs: nowMS(),
+		ClientIP: clientIP,
+		MacHash: macHash,
+		Host: host,
+		Scheme: "https",
+		Method: req.Method,
+		UserAgent: ua,
+		SNI: host,
+	}
+	b, _ := json.Marshal(ev)
+	return b
+}
+
+// emitDPI relays the DPI classification hint for a MITM'd request (gated).
+func (px *Proxy) emitDPI(clientIP, macHash, host string, req *http.Request) {
+	if !px.relayEnabled() {
+		return
+	}
+	px.relayEmit(dpiSocket, dpiRoute, buildDPIPayload(clientIP, macHash, host, req))
+}
+
+// ── cookies payload ──────────────────────────────────────────────────────────
+
+// cookiesEvent mirrors the JSON the Python CookiesRelay.response() emitted.
+// NAMES ONLY — never cookie values (privacy / CSPN).
+type cookiesEvent struct {
+	TSMs int64 `json:"ts_ms"`
+	ClientIP string `json:"client_ip"`
+	MacHash string `json:"client_mac_hash"`
+	URL string `json:"url"`
+	Method string `json:"method"`
+	SetCookieNames []string `json:"set_cookie_names"`
+	CookieNames []string `json:"cookie_names"`
+	SetCookieCount int `json:"set_cookie_count"`
+	CookieCount int `json:"cookie_count"`
+	Status int `json:"status"`
+}
+
+// cookiesRelevant reports whether a flow carries any cookie signal worth
+// relaying: ≥1 Set-Cookie in the response OR ≥1 Cookie in the request. Mirrors
+// the Python `if not (set_cookies or req_cookies): return`.
+func cookiesRelevant(req *http.Request, resp *http.Response) bool {
+	if resp != nil && len(resp.Header.Values("Set-Cookie")) > 0 {
+		return true
+	}
+	return req != nil && len(req.Header.Values("Cookie")) > 0
+}
+
+// setCookieName extracts the cookie NAME from a Set-Cookie header line: the text
+// before the first '=' of the first ';'-delimited field, trimmed and capped.
+// Returns "" for attribute-only / malformed / empty-name lines (skipped).
+func setCookieName(sc string) string {
+	head := sc
+	if i := strings.IndexByte(sc, ';'); i >= 0 {
+		head = sc[:i]
+	}
+	eq := strings.IndexByte(head, '=')
+	if eq < 0 {
+		return ""
+	}
+	n := strings.TrimSpace(head[:eq])
+	if len(n) > maxCookieNameLen {
+		n = n[:maxCookieNameLen]
+	}
+	return n
+}
+
+// parseCookieHeaderNames splits a single "Cookie:" header value into its
+// individual cookie NAMES (text before each '=' across ';'-separated pairs),
+// trimmed + capped. Mirrors cookies.py _parse_cookie_header.
+func parseCookieHeaderNames(value string) []string {
+	var names []string
+	for _, part := range strings.Split(value, ";") {
+		eq := strings.IndexByte(part, '=')
+		if eq < 0 {
+			continue
+		}
+		n := strings.TrimSpace(part[:eq])
+		if len(n) > maxCookieNameLen {
+			n = n[:maxCookieNameLen]
+		}
+		if n != "" {
+			names = append(names, n)
+		}
+	}
+	return names
+}
+
+// setCookieNames returns the NAMES of the response Set-Cookie lines, scanning at
+// most the first `cap` header lines (Python _names_only(headers[:cap])).
+func setCookieNames(setCookies []string, cap int) []string {
+	out := make([]string, 0, len(setCookies))
+	for i, sc := range setCookies {
+		if i >= cap {
+			break
+		}
+		if n := setCookieName(sc); n != "" {
+			out = append(out, n)
+		}
+	}
+	return out
+}
+
+// buildCookiesPayload builds the /inject payload for one MITM'd response that
+// carries a cookie signal. The caller is expected to have checked
+// cookiesRelevant; building on an empty flow yields empty name lists.
+func buildCookiesPayload(clientIP, macHash string, req *http.Request, resp *http.Response) []byte {
+	setCookies := resp.Header.Values("Set-Cookie")
+	reqCookies := req.Header.Values("Cookie")
+
+	// Sent cookie names: flatten every Cookie header line, then cap to 50 total.
+	var sent []string
+	for _, ch := range reqCookies {
+		sent = append(sent, parseCookieHeaderNames(ch)...)
+	}
+	if len(sent) > maxCookieNames {
+		sent = sent[:maxCookieNames]
+	}
+
+	u := req.URL.String()
+	if len(u) > maxCookieURL {
+		u = u[:maxCookieURL]
+	}
+
+	ev := cookiesEvent{
+		TSMs: nowMS(),
+		ClientIP: clientIP,
+		MacHash: macHash,
+		URL: u,
+		Method: req.Method,
+		SetCookieNames: setCookieNames(setCookies, maxSetCookieNames),
+		CookieNames: sent,
+		SetCookieCount: len(setCookies),
+		CookieCount: len(reqCookies),
+		Status: resp.StatusCode,
+	}
+	b, _ := json.Marshal(ev)
+	return b
+}
+
+// emitCookies relays the cookie metadata for a MITM'd response (gated). No-op
+// when neither a Set-Cookie nor a request Cookie is present.
+func (px *Proxy) emitCookies(clientIP, macHash string, req *http.Request, resp *http.Response) {
+	if !px.relayEnabled() || !cookiesRelevant(req, resp) {
+		return
+	}
+	px.relayEmit(cookiesSocket, cookiesRoute, buildCookiesPayload(clientIP, macHash, req, resp))
+}
+
+// ── ja4 payload ──────────────────────────────────────────────────────────────
+
+// ja4Event mirrors the JSON the Python JA4Relay.tls_clienthello() emitted.
+// alpn_protocols / cipher_suites are always JSON arrays (never null) — matching
+// list(ch.alpn_protocols or []). extensions is always null: crypto/tls'
+// ClientHelloInfo does not expose the raw extension list, exactly the Python
+// `if hasattr(ch, "extensions") else None` fallback (the service tolerates it).
+type ja4Event struct {
+	TSMs int64 `json:"ts_ms"`
+	ClientIP string `json:"client_ip"`
+	MacHash string `json:"client_mac_hash"`
+	SNI string `json:"sni"`
+	ALPN []string `json:"alpn_protocols"`
+	Ciphers []uint16 `json:"cipher_suites"`
+	Extensions *[]int `json:"extensions"` // always nil → JSON null
+}
+
+// buildJA4Payload builds the /ja4 payload for one MITM'd TLS ClientHello.
+func buildJA4Payload(clientIP, macHash, sni string, alpn []string, ciphers []uint16) []byte {
+	if alpn == nil {
+		alpn = []string{}
+	}
+	if ciphers == nil {
+		ciphers = []uint16{}
+	}
+	ev := ja4Event{
+		TSMs: nowMS(),
+		ClientIP: clientIP,
+		MacHash: macHash,
+		SNI: sni,
+		ALPN: alpn,
+		Ciphers: ciphers,
+		Extensions: nil,
+	}
+	b, _ := json.Marshal(ev)
+	return b
+}
+
+// emitJA4 relays the captured ClientHello fingerprint material for a MITM'd
+// handshake (gated). Called once per handshake, before Decide — so blocked and
+// allowed flows alike are relayed, matching the Python addon which ran on every
+// tls_clienthello.
+func (px *Proxy) emitJA4(clientIP, macHash, sni string, alpn []string, ciphers []uint16) {
+	if !px.relayEnabled() {
+		return
+	}
+	px.relayEmit(ja4Socket, ja4Route, buildJA4Payload(clientIP, macHash, sni, alpn, ciphers))
+}
355
packages/secubox-toolbox-ng/cmd/sbxmitm/relay_test.go
Normal file
355
packages/secubox-toolbox-ng/cmd/sbxmitm/relay_test.go
Normal file
|
|
@ -0,0 +1,355 @@
|
||||||
|
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
|
||||||
|
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
|
||||||
|
//
|
||||||
|
// Unit tests for the per-flow analysis relay payload builders + emit wiring
|
||||||
|
// (#662 — restoring the dpi/cookies/ja4 events that feed "Qui te piste?").
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"encoding/json"
|
||||||
|
"net"
|
||||||
|
"net/http"
|
||||||
|
"net/url"
|
||||||
|
"path/filepath"
|
||||||
|
"strings"
|
||||||
|
"testing"
|
||||||
|
"time"
|
||||||
|
)
|
||||||
|
|
||||||
|
// ── dpi payload ──────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestBuildDPIPayload(t *testing.T) {
|
||||||
|
req, _ := http.NewRequest("GET", "https://tracker.example.com/pixel?x=1", nil)
|
||||||
|
req.Header.Set("User-Agent", "Mozilla/5.0 (X11)")
|
||||||
|
p := buildDPIPayload("203.0.113.7", "abcd1234", "tracker.example.com", req)
|
||||||
|
|
||||||
|
var m map[string]any
|
||||||
|
if err := json.Unmarshal(p, &m); err != nil {
|
||||||
|
t.Fatalf("unmarshal: %v\n%s", err, p)
|
||||||
|
}
|
||||||
|
if m["client_ip"] != "203.0.113.7" {
|
||||||
|
t.Errorf("client_ip = %v", m["client_ip"])
|
||||||
|
}
|
||||||
|
if m["client_mac_hash"] != "abcd1234" {
|
||||||
|
t.Errorf("client_mac_hash = %v", m["client_mac_hash"])
|
||||||
|
}
|
||||||
|
if m["host"] != "tracker.example.com" {
|
||||||
|
t.Errorf("host = %v", m["host"])
|
||||||
|
}
|
||||||
|
if m["scheme"] != "https" {
|
||||||
|
t.Errorf("scheme = %v", m["scheme"])
|
||||||
|
}
|
||||||
|
if m["method"] != "GET" {
|
||||||
|
t.Errorf("method = %v", m["method"])
|
||||||
|
}
|
||||||
|
if m["user_agent"] != "Mozilla/5.0 (X11)" {
|
||||||
|
t.Errorf("user_agent = %v", m["user_agent"])
|
||||||
|
}
|
||||||
|
if m["sni"] != "tracker.example.com" {
|
||||||
|
t.Errorf("sni = %v", m["sni"])
|
||||||
|
}
|
||||||
|
// ts_ms present and plausible (a recent unix-millis value).
|
||||||
|
ts, ok := m["ts_ms"].(float64)
|
||||||
|
if !ok || ts < 1_600_000_000_000 {
|
||||||
|
t.Errorf("ts_ms = %v (want recent unix millis)", m["ts_ms"])
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Absent User-Agent → JSON null (not "" and not omitted), mirroring the Python
|
||||||
|
// addon's headers.get("user-agent") → None.
|
||||||
|
func TestBuildDPIPayloadNullUserAgent(t *testing.T) {
|
||||||
|
req, _ := http.NewRequest("GET", "https://h.example/", nil)
|
||||||
|
p := buildDPIPayload("1.2.3.4", "h", "h.example", req)
|
||||||
|
if !strings.Contains(string(p), `"user_agent":null`) {
|
||||||
|
t.Errorf("expected user_agent null, got: %s", p)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── cookies payload ──────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
func TestBuildCookiesPayloadNamesOnly(t *testing.T) {
	req, _ := http.NewRequest("POST", "https://shop.example.com/cart", nil)
	req.Header.Add("Cookie", "sessionid=SECRET_VALUE; csrftoken=ANOTHER_SECRET")
	req.Header.Add("Cookie", "_ga=GA1.2.deadbeef")
	resp := &http.Response{StatusCode: 200, Header: http.Header{}}
	resp.Header.Add("Set-Cookie", "_fbp=fb.1.SECRET; Path=/; HttpOnly; SameSite=Lax")
	resp.Header.Add("Set-Cookie", "uid=PRIVATE; Domain=.example.com")

	p := buildCookiesPayload("10.99.1.5", "wgpersona", req, resp)
	var m map[string]any
	if err := json.Unmarshal(p, &m); err != nil {
		t.Fatalf("unmarshal: %v\n%s", err, p)
	}
	if m["url"] != "https://shop.example.com/cart" {
		t.Errorf("url = %v", m["url"])
	}
	if m["method"] != "POST" {
		t.Errorf("method = %v", m["method"])
	}
	if int(m["status"].(float64)) != 200 {
		t.Errorf("status = %v", m["status"])
	}
	if int(m["set_cookie_count"].(float64)) != 2 {
		t.Errorf("set_cookie_count = %v", m["set_cookie_count"])
	}
	if int(m["cookie_count"].(float64)) != 2 {
		t.Errorf("cookie_count (header lines) = %v", m["cookie_count"])
	}
	setNames := toStrings(m["set_cookie_names"])
	if !equalStrSet(setNames, []string{"_fbp", "uid"}) {
		t.Errorf("set_cookie_names = %v", setNames)
	}
	cookieNames := toStrings(m["cookie_names"])
	if !equalStrSet(cookieNames, []string{"sessionid", "csrftoken", "_ga"}) {
		t.Errorf("cookie_names = %v", cookieNames)
	}
	// Hard privacy guarantee: NO value leaked anywhere in the payload.
	raw := string(p)
	for _, secret := range []string{"SECRET_VALUE", "ANOTHER_SECRET", "deadbeef", "fb.1.SECRET", "PRIVATE", "GA1.2"} {
		if strings.Contains(raw, secret) {
			t.Errorf("payload leaked cookie value %q: %s", secret, raw)
		}
	}
}

// Set-Cookie name parse: text before the first '='. Cookie header split on ';'.
func TestCookieNameParsing(t *testing.T) {
	if got := setCookieName("name=val; Path=/; Secure"); got != "name" {
		t.Errorf("setCookieName = %q", got)
	}
	if got := setCookieName(" spaced = v"); got != "spaced" {
		t.Errorf("setCookieName trim = %q", got)
	}
	if got := setCookieName("=novalue"); got != "" {
		t.Errorf("setCookieName empty name = %q", got)
	}
	if got := setCookieName("attributeonly"); got != "" {
		t.Errorf("setCookieName no eq = %q", got)
	}

	names := parseCookieHeaderNames("a=1; b=2;c=3")
	if !equalStrSet(names, []string{"a", "b", "c"}) {
		t.Errorf("parseCookieHeaderNames = %v", names)
	}
}

// Caps: ≤30 Set-Cookie names, ≤50 sent cookie names.
func TestCookiesPayloadCaps(t *testing.T) {
	req, _ := http.NewRequest("GET", "https://e.example/", nil)
	var bigCookie strings.Builder
	for i := 0; i < 80; i++ {
		if i > 0 {
			bigCookie.WriteString("; ")
		}
		bigCookie.WriteString("c")
		bigCookie.WriteByte(byte('0' + i%10))
		bigCookie.WriteString("_")
		bigCookie.WriteByte(byte('a' + i%26))
		bigCookie.WriteString("=v")
	}
	req.Header.Add("Cookie", bigCookie.String())
	resp := &http.Response{StatusCode: 200, Header: http.Header{}}
	for i := 0; i < 45; i++ {
		resp.Header.Add("Set-Cookie", "sc"+string(rune('A'+i%26))+string(rune('0'+i%10))+"=v")
	}
	p := buildCookiesPayload("1.1.1.1", "h", req, resp)
	var m map[string]any
	if err := json.Unmarshal(p, &m); err != nil {
		t.Fatalf("unmarshal: %v\n%s", err, p)
	}
	if n := len(toStrings(m["set_cookie_names"])); n > 30 {
		t.Errorf("set_cookie_names not capped at 30: %d", n)
	}
	if n := len(toStrings(m["cookie_names"])); n > 50 {
		t.Errorf("cookie_names not capped at 50: %d", n)
	}
	// raw counts still reflect the real totals.
	if int(m["set_cookie_count"].(float64)) != 45 {
		t.Errorf("set_cookie_count = %v", m["set_cookie_count"])
	}
}

// URL truncated to ≤300 chars.
func TestCookiesPayloadURLTruncation(t *testing.T) {
	long := "https://e.example/" + strings.Repeat("a", 500)
	u, _ := url.Parse(long)
	req := &http.Request{Method: "GET", URL: u, Header: http.Header{}}
	req.Header.Add("Cookie", "x=1")
	resp := &http.Response{StatusCode: 200, Header: http.Header{}}
	p := buildCookiesPayload("1.1.1.1", "h", req, resp)
	var m map[string]any
	if err := json.Unmarshal(p, &m); err != nil {
		t.Fatalf("unmarshal: %v\n%s", err, p)
	}
	if len(m["url"].(string)) > 300 {
		t.Errorf("url not truncated: %d chars", len(m["url"].(string)))
	}
}

// cookiesRelevant gates emission: only when ≥1 Set-Cookie OR ≥1 Cookie.
func TestCookiesRelevant(t *testing.T) {
	mk := func(setC, reqC bool) (*http.Request, *http.Response) {
		req, _ := http.NewRequest("GET", "https://e/", nil)
		if reqC {
			req.Header.Add("Cookie", "a=1")
		}
		resp := &http.Response{StatusCode: 200, Header: http.Header{}}
		if setC {
			resp.Header.Add("Set-Cookie", "x=1")
		}
		return req, resp
	}
	if r, p := mk(false, false); cookiesRelevant(r, p) {
		t.Error("no cookies → should not be relevant")
	}
	if r, p := mk(true, false); !cookiesRelevant(r, p) {
		t.Error("set-cookie present → relevant")
	}
	if r, p := mk(false, true); !cookiesRelevant(r, p) {
		t.Error("request cookie present → relevant")
	}
}

// ── ja4 payload ──────────────────────────────────────────────────────────────

func TestBuildJA4Payload(t *testing.T) {
	p := buildJA4Payload("198.51.100.9", "tlspersona", "secure.example.com",
		[]string{"h2", "http/1.1"}, []uint16{4865, 4866, 49195})
	var m map[string]any
	if err := json.Unmarshal(p, &m); err != nil {
		t.Fatalf("unmarshal: %v\n%s", err, p)
	}
	if m["sni"] != "secure.example.com" {
		t.Errorf("sni = %v", m["sni"])
	}
	if m["client_ip"] != "198.51.100.9" {
		t.Errorf("client_ip = %v", m["client_ip"])
	}
	if m["client_mac_hash"] != "tlspersona" {
		t.Errorf("client_mac_hash = %v", m["client_mac_hash"])
	}
	alpn := toStrings(m["alpn_protocols"])
	if !equalStrSet(alpn, []string{"h2", "http/1.1"}) {
		t.Errorf("alpn = %v", alpn)
	}
	cs := m["cipher_suites"].([]any)
	if len(cs) != 3 || int(cs[0].(float64)) != 4865 {
		t.Errorf("cipher_suites = %v", cs)
	}
	// extensions: always null (stdlib doesn't expose them).
	if !strings.Contains(string(p), `"extensions":null`) {
		t.Errorf("expected extensions null, got: %s", p)
	}
}

// Empty ALPN / ciphers → JSON empty arrays (mirrors list(... or [])), not null.
func TestBuildJA4PayloadEmptySlices(t *testing.T) {
	p := buildJA4Payload("1.1.1.1", "h", "", nil, nil)
	raw := string(p)
	if !strings.Contains(raw, `"alpn_protocols":[]`) {
		t.Errorf("alpn should be [] not null: %s", raw)
	}
	if !strings.Contains(raw, `"cipher_suites":[]`) {
		t.Errorf("cipher_suites should be [] not null: %s", raw)
	}
}

// ── gate wiring ──────────────────────────────────────────────────────────────

// The flag wires into Proxy.analysisRelay and gates emission.
func TestAnalysisRelayGate(t *testing.T) {
	on := &Proxy{analysisRelay: true}
	off := &Proxy{analysisRelay: false}
	if !on.relayEnabled() {
		t.Error("analysisRelay=true → relayEnabled() should be true")
	}
	if off.relayEnabled() {
		t.Error("analysisRelay=false → relayEnabled() should be false")
	}
}

// emitDPI/emitCookies/emitJA4 respect the gate: with analysisRelay=false they
// deliver nothing to a live socket; with it true they deliver.
func TestEmitGateRespected(t *testing.T) {
	sock := filepath.Join(t.TempDir(), "dpi.sock")
	ln, err := net.Listen("unix", sock)
	if err != nil {
		t.Fatal(err)
	}
	defer ln.Close()
	hits := make(chan struct{}, 4)
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			buf := make([]byte, 1024)
			c.Read(buf)
			c.Write([]byte("HTTP/1.1 204 No Content\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"))
			c.Close()
			hits <- struct{}{}
		}
	}()

	// Gate off → nothing delivered.
	off := &Proxy{analysisRelay: false}
	off.relayEmit(sock, "/classify", []byte(`{"k":"v"}`))
	select {
	case <-hits:
		t.Fatal("gate off but a payload was delivered")
	case <-time.After(300 * time.Millisecond):
	}

	// Gate on → delivered.
	on := &Proxy{analysisRelay: true}
	on.relayEmit(sock, "/classify", []byte(`{"k":"v"}`))
	select {
	case <-hits:
	case <-time.After(2 * time.Second):
		t.Fatal("gate on but nothing delivered")
	}
}

// ── socket-path consts ─────────────────────────────────────────────────────

func TestRelaySocketPaths(t *testing.T) {
	if dpiSocket != "/run/secubox/dpi.sock" {
		t.Errorf("dpiSocket = %q", dpiSocket)
	}
	if cookiesSocket != "/run/secubox/cookies.sock" {
		t.Errorf("cookiesSocket = %q", cookiesSocket)
	}
	if ja4Socket != "/run/secubox/threat-analyst.sock" {
		t.Errorf("ja4Socket = %q", ja4Socket)
	}
}

// ── test helpers ───────────────────────────────────────────────────────────

func toStrings(v any) []string {
	arr, ok := v.([]any)
	if !ok {
		return nil
	}
	out := make([]string, 0, len(arr))
	for _, e := range arr {
		out = append(out, e.(string))
	}
	return out
}

func equalStrSet(got, want []string) bool {
	if len(got) != len(want) {
		return false
	}
	seen := map[string]int{}
	for _, g := range got {
		seen[g]++
	}
	for _, w := range want {
		seen[w]--
	}
	for _, n := range seen {
		if n != 0 {
			return false
		}
	}
	return true
}
189
packages/secubox-toolbox-ng/cmd/sbxmitm/reload_test.go
Normal file
@ -0,0 +1,189 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: policy live-reload tests (#662 auto-learn loop)
//
// The #662 Go cutover loaded the BLOCK/SPLICE lists ONCE at startup, so an
// autolearn promotion (or a manual edit) of learned-trackers.txt never took
// effect until a worker restart, which is exactly what let newly learned adware
// hosts slip through indefinitely. These tests prove the mtime-based live-reload:
// after the throttle window, a host appended to learned-trackers.txt flips Decide
// from "mitm" to "block" with NO restart. Concurrency is exercised under -race.
package main

import (
	"os"
	"path/filepath"
	"sync"
	"sync/atomic"
	"testing"
	"time"
)

// writeFile is a tiny helper that (re)writes a backing list file with content.
func writeFile(t *testing.T, path, content string) {
	t.Helper()
	if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
		t.Fatalf("write %s: %v", path, err)
	}
}

// bumpMtime forces the file's mtime forward so the reload's stat sees a change
// even on coarse-granularity filesystems or sub-second test runs.
func bumpMtime(t *testing.T, path string, d time.Duration) {
	t.Helper()
	ft := time.Now().Add(d)
	if err := os.Chtimes(path, ft, ft); err != nil {
		t.Fatalf("chtimes %s: %v", path, err)
	}
}

// TestMaybeReloadPicksUpAppendedLearnedTracker is the linchpin test: a host that
// initially Decides "mitm" must flip to "block" once it is appended to
// learned-trackers.txt and the throttle window elapses — without reloading the
// Policy from scratch.
func TestMaybeReloadPicksUpAppendedLearnedTracker(t *testing.T) {
	dir := t.TempDir()
	learned := filepath.Join(dir, "learned-trackers.txt")
	allow := filepath.Join(dir, "ad-allowlist.txt")
	writeFile(t, learned, "")
	writeFile(t, allow, "")

	pol, err := LoadPolicy(PolicyOpts{
		LearnedPath: learned,
		AllowPath: allow,
		// keep the splice/never paths in the temp dir so missing-file behaviour
		// (empty set) is deterministic.
		SpliceSeedPath: filepath.Join(dir, "seed"),
		SpliceLearnPath: filepath.Join(dir, "slearn"),
		PureTrackersPath: filepath.Join(dir, "pure"),
		SelfDomains: []string{"secubox.in"},
	})
	if err != nil {
		t.Fatalf("LoadPolicy: %v", err)
	}
	// Make the reload eager for the test (no 15s wait): zero throttle.
	pol.reloadThrottle = 0

	const host = "acotedemoi.com"
	if got := pol.Decide(host, host); got != "mitm" {
		t.Fatalf("before promotion: Decide(%q) = %q, want mitm", host, got)
	}

	// Promote: write the host into the (previously empty) learned list and bump
	// mtime forward.
	writeFile(t, learned, host+"\n")
	bumpMtime(t, learned, 2*time.Second)

	if got := pol.Decide(host, host); got != "block" {
		t.Fatalf("after promotion: Decide(%q) = %q, want block", host, got)
	}
}

// TestMaybeReloadThrottled proves the throttle: with a non-zero throttle window,
// a change made just after a reload is NOT observed until the window elapses,
// keeping the hot path cheap (one stat per ~window, not per request).
func TestMaybeReloadThrottled(t *testing.T) {
	dir := t.TempDir()
	learned := filepath.Join(dir, "learned-trackers.txt")
	writeFile(t, learned, "")

	pol, err := LoadPolicy(PolicyOpts{LearnedPath: learned, AllowPath: filepath.Join(dir, "allow")})
	if err != nil {
		t.Fatalf("LoadPolicy: %v", err)
	}
	pol.reloadThrottle = time.Hour // effectively "never re-stat during the test"

	// Prime the throttle clock with one Decide (does the initial stat).
	_ = pol.Decide("x.example", "x.example")

	const host = "tracker.example"
	writeFile(t, learned, host+"\n")
	bumpMtime(t, learned, 2*time.Second)

	if got := pol.Decide(host, host); got != "mitm" {
		t.Fatalf("throttled: Decide(%q) = %q, want mitm (change not yet observed)", host, got)
	}
}

// TestMaybeReloadAllowlist proves the allowlist file is live-reloaded too: a
// host the ad-host regex would block ("doubleclick.net") flips block→allow once
// appended to the allowlist and the window elapses.
func TestMaybeReloadAllowlist(t *testing.T) {
	dir := t.TempDir()
	learned := filepath.Join(dir, "learned-trackers.txt")
	allow := filepath.Join(dir, "ad-allowlist.txt")
	writeFile(t, learned, "")
	writeFile(t, allow, "")

	pol, err := LoadPolicy(PolicyOpts{LearnedPath: learned, AllowPath: allow})
	if err != nil {
		t.Fatalf("LoadPolicy: %v", err)
	}
	pol.reloadThrottle = 0

	const host = "doubleclick.net"
	if got := pol.Decide(host, host); got != "block" {
		t.Fatalf("before allow: Decide(%q) = %q, want block", host, got)
	}
	writeFile(t, allow, host+"\n")
	bumpMtime(t, allow, 2*time.Second)
	if got := pol.Decide(host, host); got != "allow" {
		t.Fatalf("after allow: Decide(%q) = %q, want allow", host, got)
	}
}

// TestMaybeReloadConcurrent runs Decide from many goroutines while the backing
// learned file is rewritten concurrently. Under `go test -race` the race
// detector checks that the RWMutex-guarded swap is data-race-free.
func TestMaybeReloadConcurrent(t *testing.T) {
	dir := t.TempDir()
	learned := filepath.Join(dir, "learned-trackers.txt")
	writeFile(t, learned, "seed.example\n")

	pol, err := LoadPolicy(PolicyOpts{LearnedPath: learned, AllowPath: filepath.Join(dir, "allow")})
	if err != nil {
		t.Fatalf("LoadPolicy: %v", err)
	}
	pol.reloadThrottle = 0 // force a stat on every Decide → maximal contention

	var wg sync.WaitGroup
	var blocks int64
	stop := make(chan struct{})

	// Writer: keep rewriting the list (seed + one rotating host) and bumping mtime.
	wg.Add(1)
	go func() {
		defer wg.Done()
		i := 0
		for {
			select {
			case <-stop:
				return
			default:
			}
			writeFile(t, learned, "seed.example\nh"+itoa(i)+".example\n")
			bumpMtime(t, learned, time.Duration(i+1)*time.Second)
			i++
		}
	}()

	// Readers: hammer Decide on the seed (present in every rewrite, so expected
	// to block) + a live host.
	for r := 0; r < 8; r++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 2000; j++ {
				if pol.Decide("seed.example", "seed.example") == "block" {
					atomic.AddInt64(&blocks, 1)
				}
				pol.Decide("h0.example", "h0.example")
			}
		}()
	}
	time.Sleep(50 * time.Millisecond)
	close(stop)
	wg.Wait()
	if blocks == 0 {
		t.Fatal("expected the stable seed host to block at least once")
	}
}
|
@ -18,7 +18,9 @@
 // avatar → /run/secubox/avatar.sock POST /fingerprint
 // ja4 → /run/secubox/threat-analyst.sock POST /ja4
 // soc_relay → /run/secubox/soc.sock POST /event
-// social_graph: in-process (no socket) — correlated inside the engine, not emitted.
+// social_graph: correlated in-process (social.go) — edges (hash-only, never raw
+// cookie values) are NOT emitted to a module socket but POSTed to the portal
+// /__toolbox/social-event ingest (the social store lives in the toolbox/portal).
 //
 // emit takes the full socket PATH (not an http+unix:// URL) plus the route in
 // the payload's destination; callers build the path from the table above.
605
packages/secubox-toolbox-ng/cmd/sbxmitm/social.go
Normal file
@ -0,0 +1,605 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: cross-site cookie-tracker correlation (#662)
//
// Restores the kbin "/social" cross-site tracker graph, frozen since the #662
// Phase-7 cutover decommissioned the in-process Python `social_graph` addon
// (packages/secubox-toolbox/mitmproxy_addons/social_graph.py). The graph reads
// social_nodes/social_links in toolbox.db, folded from raw social_edges — and
// the edges stopped flowing when the Python addon was retired.
//
// This is a FAITHFUL Go port of the addon's correlation logic:
//   - cookieIDHash : byte-exact port of social.cookie_id_hash (Python = source
//     of truth, proven by social_test.go ↔ tests/test_social_parity.py over a
//     shared fixture — the same anti-rig discipline as jar.go).
//   - isDenyListed + the _DEFAULT_DENY_COOKIES set (social.py).
//   - registrableSocial : the addon's _registrable_domain eTLD+1 helper
//     (DIFFERENT from policy.go's registrable() — IP literals pass through,
//     no port strip, a larger multi-label-TLD table; the graph correctness
//     depends on this exact flavour, so it is replicated verbatim and NOT
//     consolidated with policy.registrable).
//   - the 3rd-party decision (tracker_domain != src_site on eTLD+1) on BOTH the
//     response Set-Cookie path and the request Cookie path, mirroring the
//     addon's response()+request hooks.
//   - the CMP consent-platform detection → consent_state ∈ {none_seen,
//     pre_consent, post_consent} via a per-(peer,site) in-memory log.
//
// Privacy/CSPN invariant (the reason the original ran in-process): raw cookie
// VALUES NEVER leave the engine — only the truncated SHA-256 cookieIDHash is
// emitted. The edges are POSTed fire-and-forget to the portal's
// /__toolbox/social-event ingest (sibling of /__toolbox/ad-event), which calls
// social.record_edge(). Best-effort throughout; a dead/slow portal can never
// block or delay a client flow.
//
// Pure standard library — no external modules, no go.sum.
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"log"
	"net/http"
	"strings"
	"sync"
	"time"
)

// ── registrableSocial: port of social_graph._registrable_domain ─────────────
//
// Python (mitmproxy_addons/social_graph.py):
//
// h = (host or "").lower().strip(".")
// if not h or h.replace(".", "").isdigit(): return h # raw IP → as-is
// parts = h.split(".")
// if len(parts) < 2: return h
// last_two = ".".join(parts[-2:])
// if last_two in _MULTI_LABEL_TLDS and len(parts) >= 3: return ".".join(parts[-3:])
// return last_two
//
// This DIFFERS from policy.registrable (ad_ghost flavour): no port strip, IP
// literals pass through unchanged (the store later drops IP trackers via
// _is_ip), and the multi-label-TLD table below is the addon's larger set. The
// graph's 3rd-party comparison is done with THIS function, so it must match the
// addon exactly.
var socialMultiLabelTLDs = map[string]bool{
	"co.uk": true, "ac.uk": true, "gov.uk": true, "org.uk": true, "net.uk": true,
	"co.jp": true, "ne.jp": true, "ac.jp": true,
	"com.au": true, "net.au": true, "org.au": true,
	"com.br": true, "com.cn": true, "com.hk": true, "com.tw": true, "com.mx": true,
}

func registrableSocial(host string) string {
	h := strings.Trim(strings.ToLower(host), ".")
	if h == "" {
		return h
	}
	// h.replace(".","").isdigit() → all-digit (IPv4-ish) → return as-is.
	if isAllDigits(strings.ReplaceAll(h, ".", "")) {
		return h
	}
	parts := strings.Split(h, ".")
	if len(parts) < 2 {
		return h
	}
	last2 := strings.Join(parts[len(parts)-2:], ".")
	if socialMultiLabelTLDs[last2] && len(parts) >= 3 {
		return strings.Join(parts[len(parts)-3:], ".")
	}
	return last2
}

// ── cookieIDHash: BYTE-EXACT port of social.cookie_id_hash ───────────────────
//
// Python (secubox_toolbox/social.py):
//
// h = sha256()
// h.update(tracker_domain.lower().encode("utf-8","replace")); h.update(b"\x00")
// h.update(cookie_name.lower().encode("utf-8","replace")); h.update(b"\x00")
// h.update(cookie_value.encode("utf-8","replace"))
// return h.hexdigest()[:16]
//
// CRITICAL: tracker_domain + cookie_name are LOWER-cased; the cookie_value is
// NOT. NUL (0x00) separators between the three fields. Go strings are already
// UTF-8, and strings.ToLower is byte-identical to Python str.lower for the
// ASCII + Latin domain/name inputs the fixtures exercise (incl. the Ünîcödé
// case, verified at parity). hex of the first 8 digest bytes == hexdigest()[:16].
func cookieIDHash(trackerDomain, cookieName, cookieValue string) string {
	h := sha256.New()
	h.Write([]byte(strings.ToLower(trackerDomain)))
	h.Write([]byte{0x00})
	h.Write([]byte(strings.ToLower(cookieName)))
	h.Write([]byte{0x00})
	h.Write([]byte(cookieValue)) // value NOT lower-cased
	sum := h.Sum(nil)
	return hex.EncodeToString(sum)[:16]
}

// ── deny-list: port of social._DEFAULT_DENY_COOKIES + is_deny_listed ─────────
//
// Names whose presence on a flow is NEVER recorded as a tracker identifier
// (session / csrf / auth / cloudflare / consent / locale). Replicated verbatim
// from social.py; matched case-insensitively after trimming.
var socialDenyCookies = map[string]bool{
	// session
	"phpsessid": true, "jsessionid": true, "asp.net_sessionid": true, "ci_session": true,
	"express.sid": true, "connect.sid": true, "sails.sid": true, "django_session": true,
	"laravel_session": true, "flask_session": true, "session": true, "sessionid": true,
	// csrf
	"_csrf": true, "_csrf_token": true, "xsrf-token": true, "csrftoken": true, "csrf": true,
	"x-csrf-token": true, "anti-csrf-token": true,
	// auth (1st-party)
	"auth": true, "auth_token": true, "access_token": true, "refresh_token": true, "bearer": true,
	"remember_token": true, "remember_me": true, "_oauth2_proxy": true,
	// cloudflare / consent / locale (low signal)
	"__cf_bm": true, "cf_clearance": true, "consent": true, "cookieconsent_status": true,
	"locale": true, "lang": true, "language": true, "_locale": true,
}

// isDenyListed mirrors social.is_deny_listed (default-deny set only; the engine
// does not load the TOML extra_deny override). An empty name is deny-listed
// (Python returns True for a blank name).
func isDenyListed(cookieName string) bool {
	name := strings.ToLower(strings.TrimSpace(cookieName))
	if name == "" {
		return true
	}
	return socialDenyCookies[name]
}

// ── cookie parsers: port of _parse_set_cookie / _parse_cookie_header ─────────

// parseSetCookieNameValue mirrors social_graph._parse_set_cookie: name=value is
// the text up to the first ';'; the name is everything before the first '=',
// trimmed; the value is the rest of that first field, trimmed. Returns ok=false
// for an attribute-only / nameless / empty line.
func parseSetCookieNameValue(header string) (name, value string, ok bool) {
	field := header
	if i := strings.IndexByte(field, ';'); i >= 0 {
		field = field[:i]
	}
	eq := strings.IndexByte(field, '=')
	if eq < 0 {
		return "", "", false
	}
	name = strings.TrimSpace(field[:eq])
	value = strings.TrimSpace(field[eq+1:])
	if name == "" {
		return "", "", false
	}
	return name, value, true
}

// cookiePair is one (name,value) parsed from a request Cookie header.
type cookiePair struct{ name, value string }

// parseCookieHeader mirrors social_graph._parse_cookie_header: split on ';',
// each "name=value" yields a trimmed (name,value); nameless pairs are dropped.
func parseCookieHeader(header string) []cookiePair {
	var out []cookiePair
	for _, part := range strings.Split(header, ";") {
		eq := strings.IndexByte(part, '=')
		if eq < 0 {
			continue
		}
		name := strings.TrimSpace(part[:eq])
		value := strings.TrimSpace(part[eq+1:])
		if name != "" {
			out = append(out, cookiePair{name: name, value: value})
		}
	}
	return out
}

// extractSetCookieDomainAttr mirrors social_graph._extract_domain_attr: pull the
|
||||||
|
// "; Domain=…" attribute from a Set-Cookie line, trimmed, leading dot stripped,
|
||||||
|
// lower-cased. Returns "" when absent.
|
||||||
|
func extractSetCookieDomainAttr(setCookie string) string {
|
||||||
|
low := strings.ToLower(setCookie)
|
||||||
|
idx := strings.Index(low, "domain")
|
||||||
|
for idx >= 0 {
|
||||||
|
// require it to be an attribute (preceded by ';' after optional spaces),
|
||||||
|
// mirroring the Python regex `;\s*domain\s*=`.
|
||||||
|
j := idx + len("domain")
|
||||||
|
// skip spaces, then '='
|
||||||
|
k := j
|
||||||
|
for k < len(setCookie) && (setCookie[k] == ' ' || setCookie[k] == '\t') {
|
||||||
|
k++
|
||||||
|
}
|
||||||
|
if k < len(setCookie) && setCookie[k] == '=' {
|
||||||
|
// confirm a ';' (or start) precedes `domain` (after spaces).
|
||||||
|
p := idx - 1
|
||||||
|
for p >= 0 && (setCookie[p] == ' ' || setCookie[p] == '\t') {
|
||||||
|
p--
|
||||||
|
}
|
||||||
|
if p < 0 || setCookie[p] == ';' {
|
||||||
|
rest := setCookie[k+1:]
|
||||||
|
if e := strings.IndexByte(rest, ';'); e >= 0 {
|
||||||
|
rest = rest[:e]
|
||||||
|
}
|
||||||
|
val := strings.ToLower(strings.TrimLeft(strings.TrimSpace(rest), "."))
|
||||||
|
if val == "" {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
return val
|
||||||
|
}
|
||||||
|
}
|
||||||
|
next := strings.Index(low[idx+1:], "domain")
|
||||||
|
if next < 0 {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
idx = idx + 1 + next
|
||||||
|
}
|
||||||
|
return ""
|
||||||
|
}
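A regex reference makes a cheap parity oracle for the hand-rolled scanner: table-test or fuzz both and compare. A minimal sketch, assuming the Python helper does a first-match, case-insensitive search for `;\s*domain\s*=`; the names `domainAttrRE` and `domainAttrByRegex` are hypothetical. Note that RE2's `\s` is `[\t\n\f\r ]`, which omits the `\v` that Python's `\s` also matches.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// domainAttrRE is an assumed Go transcription of the Python pattern
// `;\s*domain\s*=`: (?i) makes it case-insensitive and the capture group
// takes the attribute value up to the next ';'.
var domainAttrRE = regexp.MustCompile(`(?i);\s*domain\s*=([^;]*)`)

// domainAttrByRegex returns the first Domain attribute value, trimmed, with
// leading dots stripped and lower-cased, or "" when no attribute is present.
func domainAttrByRegex(setCookie string) string {
	m := domainAttrRE.FindStringSubmatch(setCookie)
	if m == nil {
		return ""
	}
	return strings.ToLower(strings.TrimLeft(strings.TrimSpace(m[1]), "."))
}

func main() {
	for _, sc := range []string{
		"IDE=x; Domain=.doubleclick.net; Path=/",
		"a=b; domain=Example.COM",
		"a=domainlike=1; Path=/",
		"domain=evil.com; Path=/", // no ';' before "domain": not an attribute
	} {
		fmt.Printf("%q -> %q\n", sc, domainAttrByRegex(sc))
	}
}
```

Because the pattern demands a leading `;`, a line that starts with `domain=` yields no attribute, which is the case a scanner that also accepts start-of-line would disagree on.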

// srcSiteFromReferer mirrors social_graph._src_site_from_referer: take Referer
// (else Origin), strip scheme/path/query, return registrableSocial of the host.
func srcSiteFromReferer(req *http.Request) string {
	ref := req.Header.Get("Referer")
	if ref == "" {
		ref = req.Header.Get("Origin")
	}
	if ref == "" {
		return ""
	}
	s := ref
	if i := strings.Index(s, "://"); i >= 0 {
		s = s[i+3:]
	}
	if i := strings.IndexByte(s, '/'); i >= 0 {
		s = s[:i]
	}
	if i := strings.IndexByte(s, '?'); i >= 0 {
		s = s[:i]
	}
	return registrableSocial(s)
}
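Slicing at `://`, `/` and `?` keeps any port or userinfo in the host, and a fragment survives when there is no path (`https://a.com#x` yields `a.com#x`). If strict parity with the Python addon is not required, `net/url` handles these cases. A hedged sketch; `refererHost` is a hypothetical helper that stops before the registrable-domain step:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// refererHost parses a Referer/Origin value and returns its lower-cased host.
// url.URL.Hostname() drops any port, and url.Parse separates userinfo and the
// fragment, which plain string slicing does not.
func refererHost(ref string) string {
	if ref == "" {
		return ""
	}
	u, err := url.Parse(ref)
	if err != nil || u.Host == "" {
		return ""
	}
	return strings.ToLower(u.Hostname())
}

func main() {
	fmt.Println(refererHost("https://www.lemonde.fr/article?x=1")) // www.lemonde.fr
	fmt.Println(refererHost("https://news.example.co.uk:8443/"))   // news.example.co.uk
}
```

Adopting this would change behaviour for ported or credentialed Referers, so it belongs behind the same parity fixtures as the rest of the port.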

// ── consent-state detection: port of the _consent_log machinery ──────────────
//
// CMP (Consent Management Platform) cookie name prefixes + loader URL fragments,
// verbatim from social_graph._CMP_COOKIE_PREFIXES / _CMP_LOADER_FRAGMENTS. Seen
// on a flow → the site runs a CMP (has_cmp) and, for a cookie, consent recorded
// (consented). consent_state classifies a tracker edge as pre/post/none-consent.
var cmpCookiePrefixes = []string{
	"optanonconsent", "onetrustconsent", "optanonalertboxclosed", // OneTrust
	"didomi_token", "euconsent-v2",                               // Didomi / IAB TCF
	"__qca", "quantcast",                                         // Quantcast
	"sp_choice", "consentuid", "_sp_",                            // Sourcepoint
}

var cmpLoaderFragments = []string{
	"cdn.cookielaw.org", "onetrust.com",                   // OneTrust
	"sdk.privacy-center.org", "didomi.io",                 // Didomi
	"quantcast.mgr.consensu.org", "quantcast.com/choice", // Quantcast
	"sourcepoint.mgr.consensu.org", "sp-prod.net",         // Sourcepoint
}

// consentObservation is the per-(peer,site) state, mirroring the Python dict
// {"has_cmp": bool, "consented": bool}.
type consentObservation struct {
	hasCMP    bool
	consented bool
}

// consentKey mirrors social_graph._consent_key = (mac_hash, site).
type consentKey struct{ macHash, site string }

// consentLog is the bounded in-memory per-(peer,site) observation log, mirroring
// the module-level _consent_log + its 20k soft-cap wholesale clear. The Go proxy
// is genuinely concurrent (Python relied on the GIL), so all access is
// mutex-guarded.
type consentLog struct {
	mu  sync.Mutex
	log map[consentKey]consentObservation
}

const consentLogCap = 20000 // mirrors social_graph._consent_log soft cap

func newConsentLog() *consentLog {
	return &consentLog{log: map[consentKey]consentObservation{}}
}

// update mirrors social_graph._update_consent_log: observe whether this flow
// reveals a CMP loader (URL fragment, both request and response side) or a CMP
// cookie (either direction) for the (peer,site) pair, and fold it into the log.
//   - url is flow.request.pretty_url (lower-cased here).
//   - cookieBlobs are the raw request Cookie + response Set-Cookie header lines.
func (cl *consentLog) update(macHash, site, url string, cookieBlobs []string) {
	cl.mu.Lock()
	defer cl.mu.Unlock()
	if len(cl.log) > consentLogCap {
		cl.log = map[consentKey]consentObservation{}
	}
	key := consentKey{macHash: macHash, site: site}
	st := cl.log[key]

	lurl := strings.ToLower(url)
	for _, frag := range cmpLoaderFragments {
		if strings.Contains(lurl, frag) {
			st.hasCMP = true
			break
		}
	}
	for _, blob := range cookieBlobs {
		low := strings.ToLower(blob)
		for _, pref := range cmpCookiePrefixes {
			if strings.Contains(low, pref) {
				st.hasCMP = true
				st.consented = true
				break
			}
		}
	}
	cl.log[key] = st
}

// stateFor mirrors social_graph._consent_state_for: post_consent if a consent
// cookie was seen here, pre_consent if a CMP is present but no consent cookie
// yet, none_seen otherwise.
func (cl *consentLog) stateFor(macHash, site string) string {
	cl.mu.Lock()
	defer cl.mu.Unlock()
	st, ok := cl.log[consentKey{macHash: macHash, site: site}]
	if !ok {
		return "none_seen"
	}
	if st.consented {
		return "post_consent"
	}
	if st.hasCMP {
		return "pre_consent"
	}
	return "none_seen"
}
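Because `update` only ever sets the two flags to true (until the soft-cap wipe resets the whole map), the classification for a given (peer, site) key can only advance: none_seen, then pre_consent, then post_consent, possibly skipping the middle step. A standalone sketch of that precedence; `obs` and `classify` are illustrative stand-ins for `consentObservation` and `stateFor`, not code from the source:

```go
package main

import "fmt"

// obs mirrors the two booleans kept per (peer, site) key.
type obs struct{ hasCMP, consented bool }

// classify reproduces the precedence used by stateFor: a consent cookie wins
// over a bare CMP sighting, and an unseen key is "none_seen".
func classify(o obs, seen bool) string {
	switch {
	case !seen:
		return "none_seen"
	case o.consented:
		return "post_consent"
	case o.hasCMP:
		return "pre_consent"
	default:
		return "none_seen"
	}
}

func main() {
	var o obs
	fmt.Println(classify(o, false)) // none_seen: key never observed
	o.hasCMP = true                 // a CMP loader URL was seen
	fmt.Println(classify(o, true))  // pre_consent
	o.consented = true              // a CMP consent cookie was seen
	fmt.Println(classify(o, true))  // post_consent
}
```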

// ── edge extraction: port of SocialGraph.response()+request() hook logic ──────

// socialEdge is one cross-site tracker edge, mirroring the kwargs the Python
// social.record_edge accepts; serialised straight into the ingest batch.
type socialEdge struct {
	ClientMacHash   string `json:"client_mac_hash"`
	SrcSite         string `json:"src_site"`
	TrackerDomain   string `json:"tracker_domain"`
	CookieIDHashVal string `json:"cookie_id_hash_val"`
	JA4Hash         string `json:"ja4_hash,omitempty"`
	ConsentState    string `json:"consent_state"`
}
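`JA4Hash` carries `omitempty`, so an edge built with an empty ja4 (as the proxy wiring further down currently does) reaches the portal with no `ja4_hash` key at all; the ingest side presumably has to treat a missing key as empty. A sketch that checks the wire shape using a local copy of the struct tags (`edge` and `wire` are stand-ins, not the source type):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// edge copies socialEdge's field tags so the JSON shape can be checked alone.
type edge struct {
	ClientMacHash   string `json:"client_mac_hash"`
	SrcSite         string `json:"src_site"`
	TrackerDomain   string `json:"tracker_domain"`
	CookieIDHashVal string `json:"cookie_id_hash_val"`
	JA4Hash         string `json:"ja4_hash,omitempty"`
	ConsentState    string `json:"consent_state"`
}

// wire marshals an edge; a struct of plain strings cannot fail to encode.
func wire(e edge) string {
	b, err := json.Marshal(e)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	e := edge{ClientMacHash: "m", SrcSite: "lemonde.fr", TrackerDomain: "doubleclick.net",
		CookieIDHashVal: "deadbeef", ConsentState: "none_seen"}
	fmt.Println(wire(e))
	// {"client_mac_hash":"m","src_site":"lemonde.fr","tracker_domain":"doubleclick.net","cookie_id_hash_val":"deadbeef","consent_state":"none_seen"}
}
```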

// socialEdgesFor extracts the cross-site tracker edges for ONE MITM'd flow,
// mirroring SocialGraph.response() + the request-Cookie tail. No network I/O
// (it does fold this flow into the shared consent log); the caller emits the
// returned edges. macHash MUST be the WG persona hash (the addon only fires for
// known R3 peers — empty macHash yields no edges). reqHost is
// flow.request.host; reqURL is flow.request.pretty_url (for CMP loader
// detection); ja4 is the captured fingerprint (may be "").
//
// Decision logic, faithful to the addon:
//   - src_site = registrableSocial(reqHost); skip if empty.
//   - update the consent log for (macHash, src_site), derive consent_state.
//   - Set-Cookie path (first 50): for each non-deny-listed cookie, tracker_domain
//     = registrableSocial(Domain= attr OR reqHost); emit IFF tracker_domain != ""
//     and != src_site (3rd-party).
//   - Cookie path: only when a Referer/Origin context site exists and differs
//     from the tracker (= registrableSocial(reqHost)); cap 5 Cookie headers ×
//     50 pairs; emit per non-deny-listed cookie with the context site's
//     consent_state.
func socialEdgesFor(macHash string, req *http.Request, resp *http.Response, reqHost, reqURL, ja4 string, cl *consentLog) []socialEdge {
	if macHash == "" || cl == nil {
		return nil
	}
	srcSite := registrableSocial(reqHost)
	if srcSite == "" {
		return nil
	}

	// Gather the cookie blobs (both directions) for the CMP cookie check, then
	// fold the consent observation BEFORE deriving consent_state (matches the
	// addon's ordering: _update_consent_log then _consent_state_for).
	var setCookies []string
	if resp != nil {
		setCookies = resp.Header.Values("Set-Cookie")
	}
	var reqCookies []string
	if req != nil {
		reqCookies = req.Header.Values("Cookie")
	}
	blobs := make([]string, 0, len(reqCookies)+len(setCookies))
	blobs = append(blobs, reqCookies...)
	blobs = append(blobs, setCookies...)
	cl.update(macHash, srcSite, reqURL, blobs)
	consentState := cl.stateFor(macHash, srcSite)

	var edges []socialEdge

	// Set-Cookie path — first 50 lines (matches the addon's [:50]).
	for i, sc := range setCookies {
		if i >= 50 {
			break
		}
		name, value, ok := parseSetCookieNameValue(sc)
		if !ok || isDenyListed(name) {
			continue
		}
		domainAttr := extractSetCookieDomainAttr(sc)
		issuer := domainAttr
		if issuer == "" {
			issuer = reqHost
		}
		trackerDomain := registrableSocial(issuer)
		if trackerDomain == "" || trackerDomain == srcSite {
			continue // 1st-party Set-Cookie: not a cross-site tracker signal.
		}
		edges = append(edges, socialEdge{
			ClientMacHash:   macHash,
			SrcSite:         srcSite,
			TrackerDomain:   trackerDomain,
			CookieIDHashVal: cookieIDHash(trackerDomain, name, value),
			JA4Hash:         ja4,
			ConsentState:    consentState,
		})
	}

	// Request-Cookie path — only when this request is itself for a 3rd-party
	// tracker and we have a differing 1st-party context from the Referer/Origin.
	if len(reqCookies) == 0 {
		return edges
	}
	trackerDomain := registrableSocial(reqHost)
	if trackerDomain == "" {
		return edges
	}
	ctxSite := srcSiteFromReferer(req)
	if ctxSite == "" || ctxSite == trackerDomain {
		return edges
	}
	ctxConsent := cl.stateFor(macHash, ctxSite)
	for i, hdr := range reqCookies {
		if i >= 5 { // addon caps Cookie headers at [:5]
			break
		}
		pairs := parseCookieHeader(hdr)
		for j, p := range pairs {
			if j >= 50 { // and pairs at [:50]
				break
			}
			if isDenyListed(p.name) {
				continue
			}
			edges = append(edges, socialEdge{
				ClientMacHash:   macHash,
				SrcSite:         ctxSite,
				TrackerDomain:   trackerDomain,
				CookieIDHashVal: cookieIDHash(trackerDomain, p.name, p.value),
				JA4Hash:         ja4,
				ConsentState:    ctxConsent,
			})
		}
	}
	return edges
}
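The Set-Cookie branch reduces to one rule: pick the issuer (the Domain attribute, else the request host), reduce both sides to a registrable domain, and emit only when they differ. A self-contained sketch of that rule; `naiveRegistrable` is a hypothetical last-two-labels stand-in for `registrableSocial` and is deliberately wrong for multi-label suffixes such as `co.uk`, which is what the real multi-label table exists to handle:

```go
package main

import (
	"fmt"
	"strings"
)

// naiveRegistrable is a HYPOTHETICAL stand-in for registrableSocial: it keeps
// only the last two labels, so it is wrong for suffixes like co.uk.
func naiveRegistrable(host string) string {
	labels := strings.Split(strings.Trim(strings.ToLower(host), "."), ".")
	if len(labels) <= 2 {
		return strings.Join(labels, ".")
	}
	return strings.Join(labels[len(labels)-2:], ".")
}

// setCookieTracker applies the Set-Cookie path rule: the issuer is the Domain
// attribute when present, else the request host; the cookie is a cross-site
// signal only if its registrable domain differs from src_site.
func setCookieTracker(reqHost, domainAttr string) (tracker string, thirdParty bool) {
	src := naiveRegistrable(reqHost)
	issuer := domainAttr
	if issuer == "" {
		issuer = reqHost
	}
	tracker = naiveRegistrable(issuer)
	return tracker, tracker != "" && tracker != src
}

func main() {
	fmt.Println(setCookieTracker("www.lemonde.fr", "doubleclick.net"))      // doubleclick.net true
	fmt.Println(setCookieTracker("ads.doubleclick.net", "doubleclick.net")) // doubleclick.net false
	fmt.Println(setCookieTracker("www.lemonde.fr", ""))                     // lemonde.fr false
}
```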

// ── relay: batch + POST to the portal /__toolbox/social-event ingest ─────────

const (
	socialFlushInterval = 10 * time.Second // drain cadence (sibling of adFlushInterval)
	socialBatchCap      = 5000             // max edges held between flushes (drop excess)
)

// socialEventPayload mirrors the portal /__toolbox/social-event JSON contract.
type socialEventPayload struct {
	Edges []socialEdge `json:"edges"`
}

func (p socialEventPayload) empty() bool { return len(p.Edges) == 0 }

// socialRelay buffers extracted edges and flushes them to the portal. Bounded:
// once the buffer holds socialBatchCap edges, NEW edges are dropped until the
// next flush clears it (a dead portal can never grow memory unbounded). Edges
// carry ONLY the cookieIDHash — never raw values (privacy/CSPN).
type socialRelay struct {
	mu  sync.Mutex
	buf []socialEdge
}

func newSocialRelay() *socialRelay { return &socialRelay{} }

// add appends edges to the buffer under the cap. Never blocks the flow.
func (s *socialRelay) add(edges ...socialEdge) {
	if len(edges) == 0 {
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, e := range edges {
		if len(s.buf) >= socialBatchCap {
			return
		}
		s.buf = append(s.buf, e)
	}
}

// snapshot atomically reads-and-clears the buffer.
func (s *socialRelay) snapshot() socialEventPayload {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.buf) == 0 {
		return socialEventPayload{}
	}
	p := socialEventPayload{Edges: s.buf}
	s.buf = nil
	return p
}
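The relay's overflow policy is drop-newest: once full, `add` discards incoming edges and the ones already buffered survive until the next flush, while `snapshot` hands the whole slice to the flusher in one critical section. A generic sketch of the same policy; `boundedBuf` is hypothetical, and its dropped-count return value is an addition the source does not have, useful if lost edges should be counted:

```go
package main

import (
	"fmt"
	"sync"
)

// boundedBuf holds at most limit items; extra items are dropped, not queued.
type boundedBuf[T any] struct {
	mu    sync.Mutex
	limit int
	buf   []T
}

// add appends items until the limit is reached and reports how many it dropped.
func (b *boundedBuf[T]) add(items ...T) (dropped int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for i, it := range items {
		if len(b.buf) >= b.limit {
			return len(items) - i
		}
		b.buf = append(b.buf, it)
	}
	return 0
}

// snapshot returns the buffered items and clears the buffer atomically; the
// caller owns the returned slice, and the next add allocates a fresh one.
func (b *boundedBuf[T]) snapshot() []T {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := b.buf
	b.buf = nil
	return out
}

func main() {
	b := &boundedBuf[int]{limit: 3}
	fmt.Println(b.add(1, 2, 3, 4, 5)) // 2
	fmt.Println(b.snapshot())         // [1 2 3]
	fmt.Println(len(b.snapshot()))    // 0
}
```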

// socialEventClient is the short-timeout fire-and-forget client for the
// social-event POST (sibling of adEventClient). Never follows redirects (SSRF
// hygiene); tight timeout so a slow portal can't stall the flusher.
var socialEventClient = &http.Client{
	Timeout:       5 * time.Second,
	CheckRedirect: func(*http.Request, []*http.Request) error { return http.ErrUseLastResponse },
}

// flushOnce snapshots the buffer and, if non-empty, POSTs it to the portal's
// /__toolbox/social-event ingest. Best-effort: any error is swallowed with at
// most a log line — the engine must never block on the portal. Returns the
// flushed payload so the test can assert the snapshot/clear + shape.
func (s *socialRelay) flushOnce(portal string) socialEventPayload {
	p := s.snapshot()
	if p.empty() {
		return p
	}
	buf, err := json.Marshal(p)
	if err != nil {
		log.Printf("social-event marshal failed: %v", err)
		return p
	}
	url := portalTargetURL(portal, "/__toolbox/social-event")
	resp, err := socialEventClient.Post(url, "application/json", bytes.NewReader(buf))
	if err != nil {
		log.Printf("social-event post failed for %s: %v", url, err)
		return p
	}
	resp.Body.Close()
	return p
}

// ── proxy wiring ──────────────────────────────────────────────────────────

// socialEnabled reports whether cross-site correlation is on (--social-relay →
// Proxy.socialRelayOn, with the buffer + consent log allocated). Nil-safe so the
// CONNECT PoC / tests that build a bare Proxy can call it.
func (px *Proxy) socialEnabled() bool {
	return px != nil && px.socialRelayOn && px.social != nil && px.consent != nil
}

// emitSocial extracts the cross-site tracker edges for a MITM'd flow and buffers
// them for the batched portal POST. clientIP is the client's peer IP; the
// per-client identity is the WG persona hash (macHashOf) — NOT the raw-IP
// fallback, so non-WG flows produce no edges, exactly like the Python addon's
// _client_mac_hash gate. Gated, pure (the buffer.add is O(1) under a short
// mutex), never blocks the flow. reqURL feeds the CMP loader-fragment check.
func (px *Proxy) emitSocial(clientIP, host string, req *http.Request, resp *http.Response) {
	if !px.socialEnabled() || req == nil {
		return
	}
	macHash := macHashOf(clientIP)
	if macHash == "" {
		return // known R3 WG peers only (addon: `if not mac_hash: return`)
	}
	reqURL := req.URL.String()
	if req.URL.Host == "" {
		// A request read off the decrypted conn may carry only a path; rebuild
		// an absolute URL (like the addon's pretty_url) so host-based CMP
		// loader fragments such as "cdn.cookielaw.org" can still match.
		reqURL = "https://" + host + req.URL.RequestURI()
	}
	edges := socialEdgesFor(macHash, req, resp, host, reqURL, "", px.consent)
	px.social.add(edges...)
}

// runFlusher is the background flusher goroutine: every socialFlushInterval it
// drains the buffer to the portal. Start once from main(); runs for the process
// lifetime.
func (s *socialRelay) runFlusher(portal string) {
	t := time.NewTicker(socialFlushInterval)
	defer t.Stop()
	for range t.C {
		s.flushOnce(portal)
	}
}
297
packages/secubox-toolbox-ng/cmd/sbxmitm/social_test.go
Normal file
@@ -0,0 +1,297 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// Cross-engine SOCIAL parity + decision harness — Go side (#662).
//
// Anti-rig: loads testdata/social-cookie-id-fixtures.json (GENERATED by the real
// secubox_toolbox.social.cookie_id_hash) and asserts cookieIDHash reproduces
// every `expect` byte-for-byte — Python is the source of truth, exactly like the
// jar parity harness. The Python side is tests/test_social_parity.py.
//
// The rest exercises the ported decision surface: deny-list, registrableSocial
// (the addon flavour, NOT policy.registrable), the 3rd-party Set-Cookie + Cookie
// edge extraction, consent_state classification, and the relay buffer/flush.
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"testing"
)

type socialCookieFixture struct {
	TrackerDomain string `json:"tracker_domain"`
	CookieName    string `json:"cookie_name"`
	CookieValue   string `json:"cookie_value"`
	Expect        string `json:"expect"`
	Why           string `json:"why"`
}

type socialCookieFile struct {
	Fixtures []socialCookieFixture `json:"fixtures"`
}

// TestCookieIDHashParity: cookieIDHash == the Python-generated expect for every
// fixture. This is the anti-rig that proves the Go hash is byte-identical to
// social.cookie_id_hash (lower-case domain+name, raw value, NUL separators).
func TestCookieIDHashParity(t *testing.T) {
	dir := testdataDir(t)
	raw, err := os.ReadFile(filepath.Join(dir, "social-cookie-id-fixtures.json"))
	if err != nil {
		t.Fatalf("read social fixtures: %v", err)
	}
	var f socialCookieFile
	if err := json.Unmarshal(raw, &f); err != nil {
		t.Fatalf("parse social fixtures: %v", err)
	}
	if len(f.Fixtures) == 0 {
		t.Fatal("no social cookie-id fixtures")
	}
	for _, fx := range f.Fixtures {
		got := cookieIDHash(fx.TrackerDomain, fx.CookieName, fx.CookieValue)
		if got != fx.Expect {
			t.Errorf("cookieIDHash(%q,%q,%q)=%q want %q (%s)",
				fx.TrackerDomain, fx.CookieName, fx.CookieValue, got, fx.Expect, fx.Why)
		}
	}
}

// TestCookieIDHashFolding: domain+name are lower-cased but the value is NOT —
// the explicit invariant the store contract pins.
func TestCookieIDHashFolding(t *testing.T) {
	if cookieIDHash("DoubleClick.NET", "IDE", "AbC") != cookieIDHash("doubleclick.net", "ide", "AbC") {
		t.Error("domain+name must be case-folded")
	}
	if cookieIDHash("d.net", "n", "AbC") == cookieIDHash("d.net", "n", "abc") {
		t.Error("value must NOT be case-folded")
	}
}
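The parity fixtures pin the folding contract (lower-case domain and name, raw value, NUL separators). A sketch of a key built that way. ASSUMPTION: the SHA-256 digest and hex encoding below are chosen for illustration only; the real algorithm is whatever `secubox_toolbox.social.cookie_id_hash` uses, which the fixtures enforce. `cookieIDHashSketch` is hypothetical:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// cookieIDHashSketch folds domain and name to lower case, keeps the value
// verbatim, and joins the parts with NUL so that, for NUL-free domain and
// name, ("a","bc") and ("ab","c") produce different keys.
// ASSUMPTION: SHA-256 + hex is illustrative, not the production digest.
func cookieIDHashSketch(domain, name, value string) string {
	key := strings.ToLower(domain) + "\x00" + strings.ToLower(name) + "\x00" + value
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := cookieIDHashSketch("DoubleClick.NET", "IDE", "AbC")
	b := cookieIDHashSketch("doubleclick.net", "ide", "AbC")
	c := cookieIDHashSketch("doubleclick.net", "ide", "abc")
	fmt.Println(a == b, a == c) // true false
}
```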

func TestIsDenyListed(t *testing.T) {
	deny := []string{"PHPSESSID", "session", " csrftoken ", "__cf_bm", "consent", "locale", "", " "}
	for _, n := range deny {
		if !isDenyListed(n) {
			t.Errorf("isDenyListed(%q) = false, want true", n)
		}
	}
	allow := []string{"IDE", "_ga", "_fbp", "uid", "datr"}
	for _, n := range allow {
		if isDenyListed(n) {
			t.Errorf("isDenyListed(%q) = true, want false", n)
		}
	}
}

// TestRegistrableSocial: the addon flavour — IP literals pass through (NOT ""),
// no port strip semantics needed, the larger multi-label table.
func TestRegistrableSocial(t *testing.T) {
	cases := map[string]string{
		"www.lemonde.fr":        "lemonde.fr",
		"cdn.api.example.co.uk": "example.co.uk",
		"tracker.com":           "tracker.com",
		"a.b.c.doubleclick.net": "doubleclick.net",
		"WWW.Example.COM":       "example.com",
		"sub.example.com.au":    "example.com.au",
		"192.168.1.1":           "192.168.1.1", // IP literal as-is (addon), store drops later
		".trailing.dot.net.":    "dot.net",
		"single":                "single",
		"":                      "",
	}
	for in, want := range cases {
		if got := registrableSocial(in); got != want {
			t.Errorf("registrableSocial(%q)=%q want %q", in, got, want)
		}
	}
}

func TestParseSetCookieNameValue(t *testing.T) {
	cases := []struct {
		in          string
		name, value string
		ok          bool
	}{
		{"IDE=AHWqTUm; Domain=.doubleclick.net; Path=/", "IDE", "AHWqTUm", true},
		{" _ga = GA1.2.3 ; Max-Age=63", "_ga", "GA1.2.3", true},
		{"Secure; HttpOnly", "", "", false},
		{"=novalue", "", "", false},
		{"empty=", "empty", "", true},
	}
	for _, c := range cases {
		n, v, ok := parseSetCookieNameValue(c.in)
		if n != c.name || v != c.value || ok != c.ok {
			t.Errorf("parseSetCookieNameValue(%q)=(%q,%q,%v) want (%q,%q,%v)", c.in, n, v, ok, c.name, c.value, c.ok)
		}
	}
}

func TestExtractSetCookieDomainAttr(t *testing.T) {
	cases := map[string]string{
		"IDE=x; Domain=.doubleclick.net; Path=/": "doubleclick.net",
		"a=b; domain=Example.COM":                "example.com",
		"a=b; Path=/":                            "",
		"a=b":                                    "",
		"a=domainlike=1; Path=/":                 "", // value containing "domain" is not the attr
	}
	for in, want := range cases {
		if got := extractSetCookieDomainAttr(in); got != want {
			t.Errorf("extractSetCookieDomainAttr(%q)=%q want %q", in, got, want)
		}
	}
}

func TestSrcSiteFromReferer(t *testing.T) {
	req := httptest.NewRequest("GET", "https://tracker.io/p.gif", nil)
	if got := srcSiteFromReferer(req); got != "" {
		t.Errorf("no referer → %q want \"\"", got)
	}
	req.Header.Set("Referer", "https://www.lemonde.fr/article?x=1")
	if got := srcSiteFromReferer(req); got != "lemonde.fr" {
		t.Errorf("referer → %q want lemonde.fr", got)
	}
	req.Header.Del("Referer")
	req.Header.Set("Origin", "https://news.example.co.uk")
	if got := srcSiteFromReferer(req); got != "example.co.uk" {
		t.Errorf("origin fallback → %q want example.co.uk", got)
	}
}

// helper: build a response with the given Set-Cookie lines.
func respWithSetCookies(lines ...string) *http.Response {
	h := http.Header{}
	for _, l := range lines {
		h.Add("Set-Cookie", l)
	}
	return &http.Response{Header: h}
}

// TestSocialEdgesThirdParty: a 3rd-party Set-Cookie (Domain= a different eTLD+1)
// on a 1st-party page yields one edge with the right src_site/tracker_domain.
func TestSocialEdgesThirdParty(t *testing.T) {
	req := httptest.NewRequest("GET", "https://ads.doubleclick.net/pixel", nil)
	resp := respWithSetCookies("IDE=AHWqTUm; Domain=.doubleclick.net; Path=/")
	// The Set-Cookie path derives src_site from reqHost. Here the request host
	// and the Domain attr share a registrable domain (doubleclick.net), so the
	// cookie is 1st-party: expect NO edge.
	edges := socialEdgesFor("machash1", req, resp, "ads.doubleclick.net", "https://ads.doubleclick.net/pixel", "", newConsentLog())
	if len(edges) != 0 {
		t.Fatalf("1st-party Set-Cookie should yield 0 edges, got %d", len(edges))
	}

	// Now a genuine 3rd-party: model a page on www.lemonde.fr (reqHost) whose
	// response sets a cookie with Domain=.doubleclick.net, so src_site is
	// lemonde.fr and tracker_domain is doubleclick.net. Referer-based
	// attribution belongs to the request-Cookie path, tested separately.
	resp2 := respWithSetCookies("IDE=AHWqTUm; Domain=.doubleclick.net; Path=/")
	edges = socialEdgesFor("machash1", req, resp2, "www.lemonde.fr", "https://www.lemonde.fr/", "", newConsentLog())
	if len(edges) != 1 {
		t.Fatalf("3rd-party Set-Cookie should yield 1 edge, got %d", len(edges))
	}
	e := edges[0]
	if e.SrcSite != "lemonde.fr" || e.TrackerDomain != "doubleclick.net" {
		t.Errorf("edge src/tracker = %q/%q want lemonde.fr/doubleclick.net", e.SrcSite, e.TrackerDomain)
	}
	if e.CookieIDHashVal != cookieIDHash("doubleclick.net", "IDE", "AHWqTUm") {
		t.Errorf("edge cookie id hash mismatch: %q", e.CookieIDHashVal)
	}
	if e.ConsentState != "none_seen" {
		t.Errorf("consent_state = %q want none_seen", e.ConsentState)
	}
}

// TestSocialEdgesDenyAndIP: a deny-listed cookie name produces no edge, and an
// empty macHash (non-R3 peer) produces none. IP-literal hosts are not exercised
// directly: registrableSocial passes them through (the store drops them later),
// and with no Domain attr an IP src_site equals the IP tracker, so no 3rd-party
// edge arises anyway.
func TestSocialEdgesDenyAndIP(t *testing.T) {
	req := httptest.NewRequest("GET", "https://x/", nil)
	resp := respWithSetCookies("PHPSESSID=abc; Domain=.doubleclick.net")
	edges := socialEdgesFor("m", req, resp, "www.lemonde.fr", "https://www.lemonde.fr/", "", newConsentLog())
	if len(edges) != 0 {
		t.Fatalf("deny-listed cookie should yield 0 edges, got %d", len(edges))
	}
	// empty mac hash → no edges (R3-only gate)
	if e := socialEdgesFor("", req, respWithSetCookies("IDE=x; Domain=.doubleclick.net"), "www.lemonde.fr", "u", "", newConsentLog()); len(e) != 0 {
		t.Fatalf("empty macHash should yield 0 edges, got %d", len(e))
	}
}

// TestSocialEdgesRequestCookiePath: a request TO a tracker carrying a Cookie,
// with a Referer to a different 1st-party, yields an edge attributed to the
// referer's site.
func TestSocialEdgesRequestCookiePath(t *testing.T) {
	req := httptest.NewRequest("GET", "https://ads.doubleclick.net/px", nil)
	req.Header.Set("Cookie", "IDE=AHWqTUm; session=secret")
	req.Header.Set("Referer", "https://www.lemonde.fr/article")
	// No Set-Cookie in the response; src_site = registrableSocial(reqHost) =
	// doubleclick.net; the Set-Cookie loop emits nothing; the request-Cookie tail
	// uses ctxSite=lemonde.fr (referer) != tracker doubleclick.net → edge. The
	// deny-listed `session` cookie is skipped, so exactly 1 edge (IDE).
	edges := socialEdgesFor("m", req, &http.Response{Header: http.Header{}}, "ads.doubleclick.net", "https://ads.doubleclick.net/px", "", newConsentLog())
	if len(edges) != 1 {
		t.Fatalf("request-cookie path should yield 1 edge, got %d", len(edges))
	}
	if edges[0].SrcSite != "lemonde.fr" || edges[0].TrackerDomain != "doubleclick.net" {
		t.Errorf("edge = %q/%q want lemonde.fr/doubleclick.net", edges[0].SrcSite, edges[0].TrackerDomain)
	}
}

// TestConsentLog: loader fragment → pre_consent; CMP cookie → post_consent.
func TestConsentLog(t *testing.T) {
	cl := newConsentLog()
	if got := cl.stateFor("m", "lemonde.fr"); got != "none_seen" {
		t.Errorf("fresh → %q want none_seen", got)
	}
	// CMP loader request observed (no consent cookie yet) → pre_consent.
	cl.update("m", "lemonde.fr", "https://cdn.cookielaw.org/consent/scripttemplates/otSDKStub.js", nil)
	if got := cl.stateFor("m", "lemonde.fr"); got != "pre_consent" {
		t.Errorf("after CMP loader → %q want pre_consent", got)
	}
	// CMP consent cookie observed → post_consent.
	cl.update("m", "lemonde.fr", "https://www.lemonde.fr/", []string{"OptanonConsent=isGpcEnabled=0; Path=/"})
	if got := cl.stateFor("m", "lemonde.fr"); got != "post_consent" {
		t.Errorf("after CMP cookie → %q want post_consent", got)
	}
}

// TestSocialRelayFlush: the buffer batches edges and flushOnce POSTs them to the
// portal /__toolbox/social-event, then clears.
func TestSocialRelayFlush(t *testing.T) {
	var got socialEventPayload
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/__toolbox/social-event" {
			t.Errorf("unexpected path %q", r.URL.Path)
		}
		_ = json.NewDecoder(r.Body).Decode(&got)
		w.WriteHeader(204)
	}))
	defer srv.Close()

	s := newSocialRelay()
	s.add(socialEdge{ClientMacHash: "m", SrcSite: "a.fr", TrackerDomain: "t.com", CookieIDHashVal: "deadbeef", ConsentState: "none_seen"})
	p := s.flushOnce(srv.URL)
	if len(p.Edges) != 1 || len(got.Edges) != 1 {
		t.Fatalf("flush sent %d / server got %d, want 1/1", len(p.Edges), len(got.Edges))
	}
	if got.Edges[0].TrackerDomain != "t.com" {
		t.Errorf("server edge tracker = %q want t.com", got.Edges[0].TrackerDomain)
	}
	// Buffer cleared: a second flush sends nothing.
	if p2 := s.flushOnce(srv.URL); !p2.empty() {
		t.Errorf("second flush should be empty, got %d edges", len(p2.Edges))
	}
}

// TestSocialRelayCap: the buffer never exceeds socialBatchCap.
func TestSocialRelayCap(t *testing.T) {
	s := newSocialRelay()
	for i := 0; i < socialBatchCap+100; i++ {
		s.add(socialEdge{ClientMacHash: "m", SrcSite: "a", TrackerDomain: "t", CookieIDHashVal: "h", ConsentState: "none_seen"})
	}
	if got := s.snapshot(); len(got.Edges) != socialBatchCap {
		t.Errorf("buffer held %d edges, want cap %d", len(got.Edges), socialBatchCap)
	}
}

@@ -389,7 +389,11 @@ func (px *Proxy) handleTransparent(client net.Conn) {
 	// over a replayable conn, then run the shared pipeline dialling the captured
 	// original-dst (NOT the SNI).
 	replay := &prefixConn{prefix: hello, Conn: client}
-	tconn := tls.Server(replay, px.serverTLSConfig())
+	// The capture hook relays the ja4 ClientHello payload for this handshake,
+	// tagged with the REAL transparent peer IP from the raw client conn (#662).
+	// nil when the relay gate is off. Emitted around Decide → blocked/allowed
+	// alike, matching the Python addon's per-tls_clienthello behaviour.
+	tconn := tls.Server(replay, px.serverTLSConfigCapture(px.captureAndEmitJA4(client)))
 	if err := tconn.Handshake(); err != nil {
 		return
 	}
|
||||||
|
|
|
||||||
|
|
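`handleTransparent` has already consumed the ClientHello bytes (`hello`) to inspect them, yet `tls.Server` must read the handshake from its first byte. `prefixConn` handles this by replaying the consumed prefix before reading from the live connection. A Python sketch of the same idea over a byte stream; `PrefixReader` is a hypothetical name, and the Go type wraps a `net.Conn` rather than a file object.

```python
import io
from typing import BinaryIO


class PrefixReader:
    """Replays bytes that were already consumed, then reads from the live stream."""

    def __init__(self, prefix: bytes, stream: BinaryIO) -> None:
        self._prefix = prefix
        self._stream = stream

    def read(self, n: int = -1) -> bytes:
        if n < 0:
            # Read-all: whatever is left of the prefix, then the rest of the stream.
            out, self._prefix = self._prefix + self._stream.read(), b""
            return out
        if self._prefix:
            # Serve only from the replay buffer in this call (short reads are allowed).
            out, self._prefix = self._prefix[:n], self._prefix[n:]
            return out
        return self._stream.read(n)


def demo() -> list[bytes]:
    raw = io.BytesIO(b"\x16\x03\x01HELLO-rest")
    hello = raw.read(8)  # the proxy peeks at the ClientHello bytes first
    replay = PrefixReader(hello, raw)
    return [replay.read(4), replay.read(100), replay.read(100)]
```

Serving the prefix and the live stream in separate calls mirrors `io.Reader` semantics, where a read may legitimately return fewer bytes than requested.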
@@ -1,3 +1,58 @@
+secubox-toolbox-ng (0.1.14-1~bookworm1) bookworm; urgency=medium
+
+  * quic/banner: strip Alt-Svc response header so browsers stop learning/preferring
+    HTTP/3 (h3) and stay on HTTP/2-over-TCP (MITM-able). Complements the nft
+    udp443 reject; addresses sites where browsers ignore the reject and keep
+    retrying QUIC, bypassing inject/adblock/metrics. (ref #662)
+
+ -- Gerald KERMA <devel@cybermind.fr>  Thu, 19 Jun 2026 14:30:00 +0000
+
+secubox-toolbox-ng (0.1.13-1~bookworm1) bookworm; urgency=medium
+
+  * banner: INLINE the banner (server-side bundle fetch, baked literals) instead
+    of <script src>/fetch — defeats site service workers that intercept the
+    same-origin /__toolbox/* requests (leparisien, cnn). Fail-open. (ref #662)
+
+ -- Gerald KERMA <devel@cybermind.fr>  Thu, 19 Jun 2026 13:15:00 +0000
+
+secubox-toolbox-ng (0.1.12-1~bookworm1) bookworm; urgency=medium
+
+  * adlearn: live-reload the blocklist (mtime) so promotions/edits block without
+    a worker restart; emit ad-candidates (3rd-party ad-path) to the portal;
+    autolearn also promotes cross-site trackers from social_edges. Learned
+    trackers are auto-204 + poison-smogged. (ref #662)
+
+ -- Gerald KERMA <devel@cybermind.fr>  Thu, 19 Jun 2026 12:30:00 +0000
+
+secubox-toolbox-ng (0.1.11-1~bookworm1) bookworm; urgency=medium
+
+  * social: ALSO correlate on the block path — blocked 3rd-party trackers still
+    carry the browser's request Cookie (the cross-site evidence); without this
+    the /social graph misses the very trackers it exists to expose (they're 204'd
+    before the allow/mitm correlation). resp=nil request-only, hash-only. (ref #662)
+
+ -- Gerald KERMA <devel@cybermind.fr>  Thu, 19 Jun 2026 11:55:00 +0000
+
+secubox-toolbox-ng (0.1.10-1~bookworm1) bookworm; urgency=medium
+
+  * social: faithfully port the in-process social_graph correlation — the engine
+    computes cross-site tracker edges (byte-exact cookie_id_hash, deny-list,
+    eTLD+1 3rd-party check, CMP consent_state) and relays HASH-ONLY edges
+    (never raw values, WG-only) to the new portal /__toolbox/social-event →
+    social.record_edge → /social graph un-frozen. --social-relay (default on). (ref #662)
+
+ -- Gerald KERMA <devel@cybermind.fr>  Thu, 19 Jun 2026 11:30:00 +0000
+
+secubox-toolbox-ng (0.1.9-1~bookworm1) bookworm; urgency=medium
+
+  * telemetry: relay per-flow metadata to the analysis sidecars (dpi /classify,
+    cookies /inject, threat-analyst /ja4) — restoring the kbin "Qui te piste?"
+    events frozen since the Phase-7 cutover. Fire-and-forget, names-only cookies,
+    gated --analysis-relay (default on). The sidecars enrich + write toolbox
+    events → cumulative-stats live again with real host classification. (ref #662)
+
+ -- Gerald KERMA <devel@cybermind.fr>  Thu, 19 Jun 2026 10:40:00 +0000
+
 secubox-toolbox-ng (0.1.8-1~bookworm1) bookworm; urgency=medium
 
   * demo/csp: only relax + flag 🔓 when the page's effective script directive
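The 0.1.14 entry strips the `Alt-Svc` response header so browsers never learn an HTTP/3 alternative for the origin. Field names are case-insensitive in HTTP, so the filter has to fold case. A Python sketch over a list of `(name, value)` pairs; `strip_alt_svc` is a hypothetical helper, since the actual change lives in the Go engine and is not shown here.

```python
def strip_alt_svc(headers: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the response headers without any Alt-Svc field, whatever its casing."""
    # Keep original order and casing of every other header.
    return [(name, value) for name, value in headers if name.lower() != "alt-svc"]
```

Stripping the header does not evict an alternative a browser already cached from an earlier response (RFC 7838 keeps it until its `ma` lifetime expires), which is why this change complements rather than replaces the nft udp/443 reject.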
packages/secubox-toolbox-ng/testdata/social-cookie-id-fixtures.json (vendored, normal file, 61 lines)
@@ -0,0 +1,61 @@
+{
+  "_comment": "Cross-engine parity fixtures for social.cookie_id_hash (#662). GENERATED by the real secubox_toolbox.social.cookie_id_hash (Python = source of truth); the Go cookieIDHash MUST reproduce every `expect` byte-for-byte. Note: tracker_domain + cookie_name are LOWER-cased before hashing, the cookie_value is NOT; NUL (0x00) separators; UTF-8 with 'replace' errors. See tests/test_social_parity.py (Python) ↔ social_test.go (Go).",
+  "fixtures": [
+    {
+      "tracker_domain": "doubleclick.net",
+      "cookie_name": "IDE",
+      "cookie_value": "AHWqTUm123",
+      "expect": "8e7fadaeb2584768",
+      "why": "plain ascii"
+    },
+    {
+      "tracker_domain": "DoubleClick.NET",
+      "cookie_name": "ide",
+      "cookie_value": "AHWqTUm123",
+      "expect": "8e7fadaeb2584768",
+      "why": "domain+name UPPER folded, value verbatim -> identical hash to #1 (proves domain+name are lower-cased)"
+    },
+    {
+      "tracker_domain": "doubleclick.net",
+      "cookie_name": "IDE",
+      "cookie_value": "ahwqtum123",
+      "expect": "550317c9729652c2",
+      "why": "value lower-cased DIFFERS from #1 (proves the VALUE is NOT folded)"
+    },
+    {
+      "tracker_domain": "ads.example.com",
+      "cookie_name": "_ga",
+      "cookie_value": "GA1.2.999.111",
+      "expect": "89a398ebd72ee863",
+      "why": "GA cookie"
+    },
+    {
+      "tracker_domain": "tracker.io",
+      "cookie_name": "uid",
+      "cookie_value": "Ünîcødé✓",
+      "expect": "3b4923e9d9bb77a2",
+      "why": "unicode value (utf-8 encoded)"
+    },
+    {
+      "tracker_domain": "tracker.io",
+      "cookie_name": "Ünîcödé",
+      "cookie_value": "val",
+      "expect": "d4db5a0d71216313",
+      "why": "unicode cookie NAME (lower-cased + utf-8)"
+    },
+    {
+      "tracker_domain": "",
+      "cookie_name": "x",
+      "cookie_value": "y",
+      "expect": "2081f4f26135019e",
+      "why": "empty domain still hashes (NUL separators)"
+    },
+    {
+      "tracker_domain": "d.net",
+      "cookie_name": "n",
+      "cookie_value": "",
+      "expect": "b0da6b889cb198a1",
+      "why": "empty value"
+    }
+  ]
+}
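The fixtures fix the normalization both engines must share: `tracker_domain` and `cookie_name` are lower-cased, `cookie_value` is kept verbatim, the three fields are joined with NUL separators and encoded as UTF-8 with `errors="replace"`, and the result is 16 hex characters. The underlying hash function is not named in this excerpt, so the sketch below *assumes* SHA-256 truncated to 16 hex characters. It demonstrates the normalization invariants of fixtures 1 to 3 and is not expected to reproduce the `expect` values.

```python
import hashlib

NUL = "\x00"


def cookie_id_hash_sketch(tracker_domain: str, cookie_name: str, cookie_value: str) -> str:
    """Illustrative only: the SHA-256 + 16-hex truncation is an ASSUMPTION."""
    # Domain and name are case-folded; the value is hashed exactly as received.
    material = NUL.join((tracker_domain.lower(), cookie_name.lower(), cookie_value))
    digest = hashlib.sha256(material.encode("utf-8", errors="replace")).hexdigest()
    return digest[:16]
```

For fixture 6 the Go port must also lower-case the non-ASCII cookie name exactly as Python's `str.lower()` does, which is what that fixture exists to check.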
@@ -1,3 +1,19 @@
+secubox-toolbox (2.7.0-1~bookworm1) bookworm; urgency=medium
+
+  * MIDDLE RELEASE — caps the 2.6.x line (ad-intelligence / Anti-Track v2 /
+    anti-bot uTLS) and opens the kbin "first tool of the Swiss-army cyber kit"
+    chapter. kbin now delivers: transparent performance, full-MITM encrypted
+    inspection, ad poison/smog injection, the adware-ban transparency banner,
+    and safe browsing.
+  * docs: kbin use-case consolidated — wiki `Kbin-Toolbox.md`, `FAQ-KBIN-TOR.md`,
+    README positioning blurb.
+  * plan(#683): next chapter staged — kbin Tor endpoint, a quick-switch that
+    re-routes consenting client surfing through Tor (outbound egress, pseudo-
+    network) so the kbin exit is anonymized. Design spec landed; no behaviour
+    change yet (default OFF, fail-closed by design).
+
+ -- Gerald KERMA <devel@cybermind.fr>  Fri, 19 Jun 2026 11:00:00 +0200
+
 secubox-toolbox (2.6.59-1~bookworm1) bookworm; urgency=medium
 
   * ui: cap all admin dashboard lists to top-5 shown — #filtres bypass hosts,
@@ -51,13 +51,19 @@ table inet wg-toolbox {
 
 	chain forward {
 		type filter hook forward priority filter; policy accept;
+		# Phase 6.K / #662 — drop UDP 443 (QUIC/HTTP3) FIRST, before the blanket
+		# outbound accept below. If it sits AFTER the accept it is never reached
+		# (the accept terminates evaluation) → QUIC slips through and the whole
+		# MITM is bypassed (no inject, no ad-block, no metrics, no social). The
+		# REJECT (not drop) forces Chrome/Firefox to fall back to HTTP/2 over TCP
+		# IMMEDIATELY: a silent drop just makes the browser RETRY QUIC for tens of
+		# seconds (observed 199 retry packets, never falling back) — an ICMP
+		# port-unreachable tells it "no QUIC here" at once. First in the chain so
+		# it also breaks existing QUIC sessions (outbound). ORDER IS LOAD-BEARING.
+		iif "wg-toolbox" udp dport 443 counter reject
 		# Outbound from tunnel → internet
 		iif "wg-toolbox" oif "lan0" accept
 		# Return traffic
 		iif "lan0" oif "wg-toolbox" ct state established,related accept
-		# Phase 6.K — drop UDP 443 (QUIC/HTTP3) so browsers fall back to
-		# HTTP/2 over TCP, which our DNAT can intercept. Without this,
-		# Chrome/Firefox prefer QUIC and bypass mitm entirely.
-		iif "wg-toolbox" udp dport 443 counter drop
 	}
 }
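The new comment hinges on first-match evaluation: a terminal verdict (`accept`, `drop`, `reject`) ends evaluation of the chain, so a rule placed after a broader `accept` is never reached by packets that `accept` matched. A toy Python model (not nftables; rule expressions are reduced to predicates and the `Packet` fields are simplified assumptions) shows why the old trailing `drop` never fired for QUIC leaving via `lan0`, while the reordered chain rejects it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Packet:
    iif: str
    oif: str
    proto: str
    dport: int


# (match predicate, verdict): a toy stand-in for an nft rule.
Rule = tuple[Callable[[Packet], bool], str]


def evaluate(chain: list[Rule], pkt: Packet, policy: str = "accept") -> str:
    """First-match evaluation: the first matching rule's verdict ends the chain."""
    for match, verdict in chain:
        if match(pkt):
            return verdict
    return policy  # no rule matched: fall through to the chain policy


def is_tunnel_quic(p: Packet) -> bool:
    return p.iif == "wg-toolbox" and p.proto == "udp" and p.dport == 443


def is_tunnel_outbound(p: Packet) -> bool:
    return p.iif == "wg-toolbox" and p.oif == "lan0"


# Old order: the blanket accept shadows the QUIC drop for traffic leaving via lan0.
OLD_CHAIN: list[Rule] = [(is_tunnel_outbound, "accept"), (is_tunnel_quic, "drop")]
# New order: QUIC is rejected before the accept can match.
NEW_CHAIN: list[Rule] = [(is_tunnel_quic, "reject"), (is_tunnel_outbound, "accept")]
```

In real nftables an `accept` ends only the current base chain (another base chain on the same hook can still drop the packet), whereas `drop` and `reject` are final. The model covers the single-chain case this ruleset uses.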
@@ -221,6 +221,92 @@ def _ad_feed() -> int:
     return len(promoted)
 
 
+# #662 — cross-site-reuse promotion. A tracker_domain seen issuing cookies on
+# >= SOCIAL_MIN_SITES DISTINCT src_site (across peers, recent window) is a
+# BEHAVIOURALLY-confirmed cross-site tracker (the social graph), independent of
+# the ad-path heuristic. Promote it into learned-trackers.txt so the engine
+# blocks (204) + smogs it. Conservative + reuses the SAME allowlist/self guard as
+# _ad_feed (NEVER promote allowlisted or self domains). De-dups against OUT.
+SOCIAL_MIN_SITES = int(os.environ.get("SECUBOX_SOCIAL_MIN_SITES", "3"))
+SOCIAL_WINDOW_HOURS = int(os.environ.get("SECUBOX_SOCIAL_WINDOW_HOURS", "168"))
+
+
+def _social_feed() -> int:
+    """Promote cross-site cookie-reuse trackers (social_edges) into the learned
+    blocklist. A tracker_domain linking >= SOCIAL_MIN_SITES distinct src_site in
+    the last SOCIAL_WINDOW_HOURS is promoted. Allowlist + self domains excluded
+    (reused guard). MERGES into OUT (never overwrites). Returns count promoted, or
+    -1 if unavailable (e.g. no social_edges table). Best-effort: never raises."""
+    cutoff = int(time.time()) - SOCIAL_WINDOW_HOURS * 3600
+    try:
+        con = sqlite3.connect(DB, timeout=5)
+        rows = con.execute(
+            "SELECT tracker_domain, COUNT(DISTINCT src_site) AS sites "
+            "FROM social_edges WHERE ts >= ? "
+            "GROUP BY tracker_domain", (cutoff,)).fetchall()
+        con.close()
+    except Exception as e:
+        sys.stderr.write(f"autolearn: social query failed: {e}\n")
+        return -1
+    # Fold to registrable and aggregate the distinct-site count per eTLD+1 (two
+    # tracker subdomains of the same registrable jointly meet the threshold).
+    by_reg: dict[str, set] = {}
+    try:
+        scon = sqlite3.connect(DB, timeout=5)
+        for td, _sites in rows:
+            reg = registrable(td)
+            if not reg:
+                continue
+            ss = by_reg.setdefault(reg, set())
+            for (s,) in scon.execute(
+                    "SELECT DISTINCT src_site FROM social_edges "
+                    "WHERE ts >= ? AND tracker_domain = ?", (cutoff, td)):
+                if s:
+                    ss.add(s)
+        scon.close()
+    except Exception as e:
+        sys.stderr.write(f"autolearn: social fold failed: {e}\n")
+        return -1
+
+    allow = _load_ad_allowlist()
+    self_doms = {d.strip().lower() for d in
+                 os.environ.get("SECUBOX_SELF_DOMAINS", "secubox.in").split(",")
+                 if d.strip()}
+    promoted: set = set()
+    for reg, sites in by_reg.items():
+        if len(sites) < SOCIAL_MIN_SITES:
+            continue
+        if reg in allow:
+            continue
+        if reg in self_doms or any(reg == d or reg.endswith("." + d) for d in self_doms):
+            continue
+        promoted.add(reg)
+    if not promoted:
+        return 0
+    existing: set = set()
+    try:
+        if os.path.exists(OUT):
+            with open(OUT, encoding="utf-8") as fh:
+                for ln in fh:
+                    ln = ln.strip()
+                    if ln:
+                        existing.add(ln)
+    except Exception as e:
+        sys.stderr.write(f"autolearn: social merge read failed: {e}\n")
+    new = promoted - existing
+    merged = sorted(existing | promoted)[:MAX_ENTRIES]
+    try:
+        os.makedirs(os.path.dirname(OUT), exist_ok=True)
+        tmp = OUT + ".tmp"
+        with open(tmp, "w", encoding="utf-8") as fh:
+            fh.write("\n".join(merged) + ("\n" if merged else ""))
+        os.replace(tmp, OUT)
+    except Exception as e:
+        sys.stderr.write(f"autolearn: social write failed: {e}\n")
+        return -1
+    return len(new)
+
+
 def main() -> int:
     learned: set[str] = set()
     try:
@@ -317,6 +403,11 @@ def main() -> int:
         sys.stderr.write(f"autolearn: {_n_ad} ad-candidate hosts promoted\n")
     except Exception as e:
         sys.stderr.write(f"autolearn: ad feed error: {e}\n")
+    try:
+        _n_social = _social_feed()
+        sys.stderr.write(f"autolearn: {_n_social} cross-site cookie trackers promoted\n")
+    except Exception as e:
+        sys.stderr.write(f"autolearn: social feed error: {e}\n")
     sys.stderr.write(
         f"autolearn: {len(out)} hosts learned ({ti} threat-intel + "
         f"{len(out) - ti} classified cross-site) @ {int(time.time())}"
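`_social_feed` folds every `tracker_domain` to its registrable domain, unions the distinct `src_site` values per registrable, and promotes only those reaching `SOCIAL_MIN_SITES` that are neither allowlisted nor self domains. A pure-Python sketch of that decision over in-memory rows, without SQLite or file merging; `naive_registrable` (last two labels) is a hypothetical stand-in for the real `registrable()`, whose implementation is not shown and which presumably handles multi-label public suffixes such as `co.uk`.

```python
from collections import defaultdict


def naive_registrable(host: str) -> str:
    """HYPOTHETICAL stand-in for registrable(): keeps the last two labels.

    Wrong for multi-label public suffixes (e.g. example.co.uk -> co.uk);
    a real implementation should consult the Public Suffix List.
    """
    labels = [p for p in host.lower().strip(".").split(".") if p]
    return ".".join(labels[-2:]) if len(labels) >= 2 else ""


def social_promotions(
    edges: list[tuple[str, str]],  # (tracker_domain, src_site) rows inside the window
    min_sites: int,
    allow: set[str],
    self_domains: set[str],
) -> set[str]:
    """Registrables seen on >= min_sites distinct sites, minus allowlist and self domains."""
    sites_by_reg: dict[str, set[str]] = defaultdict(set)
    for tracker_domain, src_site in edges:
        reg = naive_registrable(tracker_domain)
        if reg and src_site:
            # Folding first lets sibling subdomains pool their distinct sites.
            sites_by_reg[reg].add(src_site)
    promoted: set[str] = set()
    for reg, sites in sites_by_reg.items():
        if len(sites) < min_sites or reg in allow:
            continue
        if any(reg == d or reg.endswith("." + d) for d in self_domains):
            continue
        promoted.add(reg)
    return promoted
```

Folding before counting is what lets two subdomains of the same registrable jointly meet the threshold, as the comment in the diff describes.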
@@ -57,10 +57,14 @@ router = APIRouter(tags=["toolbox"])
 @router.get("/__toolbox/loader.js")
 async def toolbox_loader_js() -> Response:
     """Static cosmetic loader (applies the banner client-side from the bundle)."""
+    # no-store: the loader is the banner entry point and evolves (SPA re-assert,
+    # CSP proof, …). A long cache (was max-age=3600) pins stale loaders in clients
+    # for up to an hour — so loader changes never reach already-visited sites. It's
+    # 4 KB; serve it fresh every load so updates propagate immediately.
     return Response(
         content=bundlemod.LOADER_JS,
         media_type="application/javascript",
-        headers={"Cache-Control": "public, max-age=3600"},
+        headers={"Cache-Control": "no-store, no-cache, must-revalidate, max-age=0"},
     )
 
 
@@ -74,6 +78,31 @@ async def toolbox_bundle(mh: str = Query(default=""), wg: int = Query(default=0)
     )
 
 
+@router.get("/__toolbox/inline")
+async def toolbox_inline(
+    mh: str = Query(default=""),
+    wg: int = Query(default=0),
+    csp: int = Query(default=0),
+) -> Response:
+    """#662 — COMPLETE self-contained inline banner script BODY.
+
+    Sites with a SERVICE WORKER (leparisien, cnn…) intercept every same-origin
+    request, so the legacy ``<script src="/__toolbox/loader.js">`` + its
+    ``fetch("/__toolbox/bundle")`` are hijacked by the SW (404 / app-shell)
+    before reaching our MITM engine → no banner. The Go engine fetches THIS
+    body server-side at inject time and bakes it into a self-contained
+    ``<script>…</script>`` — no same-origin fetch for the SW to touch.
+
+    ``mh`` / ``wg`` / ``csp`` come from the query params (baked as JS literals,
+    not data-attrs / currentScript); the bundle is ``get_bundle(mh, wg)`` baked
+    as a JSON literal (not fetched). no-store like the loader (it evolves)."""
+    return Response(
+        content=bundlemod.inline_script(mh, bool(wg), bool(csp)),
+        media_type="application/javascript",
+        headers={"Cache-Control": "no-store, no-cache, must-revalidate, max-age=0"},
+    )
+
+
 # #662 — ad-block metrics ingest from the Go MITM engine (sbxmitm). The #662
 # cutover moved the BLOCK decision (204 on ad/tracker hosts) into the Go engine
 # but left the METRICS unported, so the #ads dashboard froze. The engine now
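`/__toolbox/inline` bakes `mh`, `wg`, `csp` and the bundle into the script body as literals. When that body is later wrapped in an inline `<script>…</script>`, a `</script>` or `<!--` inside the data (including `mh`, which arrives as a query parameter) would end or derail the element, so the literals need HTML-safe escaping. `bundlemod.inline_script` is not shown; the helpers below are hypothetical and illustrate one common approach: serialize with `json.dumps(..., ensure_ascii=True)`, which already escapes U+2028/U+2029, then escape `<`, `>` and `&` as `\u003c`, `\u003e`, `\u0026`, which keeps the literal valid JSON and valid JavaScript.

```python
import json


def js_literal(value: object) -> str:
    """Serialize value as a JSON literal that is safe inside an inline <script>."""
    text = json.dumps(value, ensure_ascii=True, separators=(",", ":"))
    # ensure_ascii already emits U+2028/U+2029 as \u2028/\u2029; also escape the
    # HTML-significant characters so "</script>" or "<!--" can never appear.
    return text.replace("<", "\\u003c").replace(">", "\\u003e").replace("&", "\\u0026")


def inline_body(mh: str, wg: bool, csp: bool, bundle: dict) -> str:
    """HYPOTHETICAL shape of a baked banner body: every parameter becomes a literal."""
    return (
        f"(function(){{var MH={js_literal(mh)},WG={js_literal(wg)},"
        f"CSP={js_literal(csp)},BUNDLE={js_literal(bundle)};/* ... */}})();"
    )
```

Because `<`, `>` and `&` only ever occur inside JSON strings, replacing them with `\u` escapes changes the encoding but not the decoded value.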
@@ -109,12 +138,20 @@ async def toolbox_ad_event(request: Request) -> Response:
             return Response(status_code=204)
         blocks = body.get("blocks") or []
         clients = body.get("clients") or []
+        # #662 — the Go engine now also feeds the AUTO-LEARN loop: 3rd-party
+        # ad-path requests it saw on the allow/mitm path (ad_ghost's _AD_PATH
+        # heuristic), recorded as candidates here for secubox-toolbox-autolearn
+        # to promote into learned-trackers.txt at AD_MIN_SITES distinct sites.
+        candidates = body.get("candidates") or []
         if not isinstance(blocks, list):
             blocks = []
         if not isinstance(clients, list):
             clients = []
+        if not isinstance(candidates, list):
+            candidates = []
         blocks = blocks[:_AD_EVENT_ROW_CAP]
         clients = clients[:_AD_EVENT_ROW_CAP]
+        candidates = candidates[:_AD_EVENT_ROW_CAP]
 
         block_rows = [
             (b["ad_host"], b.get("site", ""), "block", int(b.get("hits", 0)), int(b.get("bytes", 0)))
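The ingest handler normalizes each list field the same way: default to `[]`, discard non-lists, cap at `_AD_EVENT_ROW_CAP`, then keep only dict rows carrying the required key. A Python sketch of that pattern as reusable helpers; `bounded_rows`, `candidate_tuples` and the cap value are hypothetical, and the handler in the diff inlines these steps.

```python
from typing import Any

AD_EVENT_ROW_CAP = 5000  # hypothetical; the real _AD_EVENT_ROW_CAP value is not shown


def bounded_rows(body: Any, key: str, required: str, cap: int = AD_EVENT_ROW_CAP) -> list[dict]:
    """At most `cap` dict rows from body[key] that carry a truthy `required` field."""
    if not isinstance(body, dict):
        return []
    rows = body.get(key) or []
    if not isinstance(rows, list):
        return []
    # Cap BEFORE filtering, as the handler does, so the work is bounded by `cap`.
    return [r for r in rows[:cap] if isinstance(r, dict) and r.get(required)]


def candidate_tuples(body: Any) -> list[tuple[str, str, int]]:
    """Shape candidates like cand_rows: (host, site, hits); a bad hits becomes 0."""
    out = []
    for c in bounded_rows(body, "candidates", "host"):
        try:
            hits = int(c.get("hits", 0))
        except (TypeError, ValueError):
            hits = 0
        out.append((c["host"], c.get("site", ""), hits))
    return out
```

One deliberate difference: the sketch turns a malformed `hits` into 0 per row, whereas in the handler as shown a single bad value raises inside the shared `try`, and the broad `except` then discards the whole batch.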
@@ -126,14 +163,102 @@ async def toolbox_ad_event(request: Request) -> Response:
             for c in clients
             if isinstance(c, dict) and c.get("mac_hash") and c.get("ad_host")
         ]
+        cand_rows = [
+            (c["host"], c.get("site", ""), int(c.get("hits", 0)))
+            for c in candidates
+            if isinstance(c, dict) and c.get("host")
+        ]
         if block_rows:
             store.record_ad_blocks(block_rows)
         if client_rows:
             store.record_ad_client_blocks(client_rows)
+        if cand_rows:
+            store.record_ad_candidates(cand_rows)
     except Exception as e:  # never raise into the engine's fire-and-forget POST
         log.debug("ad-event ingest failed: %s", e)
     return Response(status_code=204)
 
 
+# #662 — cross-site cookie-tracker edge ingest from the Go MITM engine (sbxmitm).
+# The #662 Phase-7 cutover decommissioned the in-process Python social_graph addon
+# that fed social.record_edge(), so the kbin /social graph (social_edges →
+# social_nodes/social_links) froze. The engine now computes the SAME 3rd-party
+# cookie-tracker edges (FAITHFUL port of social_graph.py: deny-list, eTLD+1
+# 3rd-party check, cookie_id_hash, CMP consent_state) and POSTs a batch here. We
+# call social.record_edge() per row, which writes raw social_edges; the existing
+# app.py social_fold_loop folds them into nodes/links.
+#
+# Raw cookie VALUES never reach this endpoint — only the truncated cookie_id_hash
+# (privacy/CSPN; this is exactly why the original ran in-process).
+#
+# UNAUTHENTICATED, same trust note as /__toolbox/ad-event: the engine reaches the
+# portal only over the R3 nft perimeter (loopback / WG ingress).
+_SOCIAL_EVENT_ROW_CAP = 5000  # bound the edge list so a misbehaving engine can't flood us
+_SOCIAL_FOLD_DEBOUNCE = 60  # seconds: floor between in-handler safety folds
+_social_last_fold = 0.0  # module-level throttle timestamp
+
+
+@router.post("/__toolbox/social-event")
+async def toolbox_social_event(request: Request) -> Response:
+    """Ingest a batch of cross-site tracker edges from the Go engine. Best-effort:
+    never 500s the engine (it is fire-and-forget) — always returns 204. See the
+    trust note above for why this is unauthenticated."""
+    global _social_last_fold
+    try:
+        # Body-size guard BEFORE parsing (mirrors /__toolbox/ad-event): the legit
+        # payload (≤5000 edges) is well under 2 MB; reject larger outright so a
+        # misbehaving/compromised WG peer can't pressure portal memory.
+        try:
+            clen = int(request.headers.get("content-length") or 0)
+        except (TypeError, ValueError):
+            clen = 0
+        if clen > 2 * 1024 * 1024:
+            return Response(status_code=204)
+        body = await request.json()
+        if not isinstance(body, dict):
+            return Response(status_code=204)
+        edges = body.get("edges") or []
+        if not isinstance(edges, list):
+            edges = []
+        edges = edges[:_SOCIAL_EVENT_ROW_CAP]
+
+        from . import social as _social
+
+        recorded = 0
+        for e in edges:
+            if not isinstance(e, dict):