mirror of
https://github.com/CyberMind-FR/secubox-deb.git
synced 2026-06-29 19:43:10 +00:00
Compare commits
23 Commits
ded89934d0
...
4c6777dc68
| SHA1 |
|---|
| 4c6777dc68 |
| 1a315317e7 |
| 04598482fb |
| be0497e6de |
| 7db7a73d65 |
| 3ade5619d0 |
| a48f43607b |
| 27ba48c1a1 |
| c04a9d0c1c |
| 3009ef93d9 |
| 78ad554ece |
| 895356dc00 |
| 4063ae1a95 |
| 77da033371 |
| 3850da5479 |
| 040e460876 |
| 55f9e4c803 |
| 257fc95182 |
| 591106ec65 |
| 15a668829b |
| 73b8ad36b1 |
| d0db3e87fd |
| 05c659b4ca |

@ -3,6 +3,27 @@

---

## 2026-06-19 — kbin milestone: ToolBoX 2.7.0 (middle release) + Tor chapter staged (#683)

- **End-of-session checkpoint**: docs + positioning + version, no runtime behaviour change.
- **`secubox-toolbox` 2.6.59 → 2.7.0** (middle release): caps the 2.6.x line
  (ad-intelligence / Anti-Track v2 / anti-bot uTLS #662) and opens the **kbin** chapter:
  kbin (`kbin.gk2.secubox.in`, the public ToolBoX portal) framed as the *first tool of the
  CyberMind Swiss-army cyber kit*, with transparent performance, full-encrypted MITM inspection,
  ad poison/smog injection, adware-ban transparency banner, safe browsing.
- **Docs**: new wiki use-case `docs/wiki/Kbin-Toolbox.md`, `docs/FAQ-KBIN-TOR.md`,
  README positioning blurb.
- **Plan #683 (issue + spec)**: kbin **Tor endpoint**, a quick switch that re-routes a consenting
  client's surfing through Tor (outbound egress, pseudo-network) so the kbin exit is anonymized.
  Spec `docs/superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md`. Invariants:
  inspection preserved (Tor after the forging core), fail-closed, opt-in/default-OFF, no DNS
  leak, CSPN audit-logged. Opposite direction of `secubox-exposure` (inbound hidden services);
  reuses its Tor control. Depends on the #662 Go core for the preferred SOCKS5-dialer transport.
- **Caveat recorded**: Tor mode must force `tls_splice` (#649) OFF per-client or asset flows
  leak the real IP.

---

## 2026-06-19 — #662 anti-bot: Chrome TLS fingerprint (uTLS) — defeat DataDome without splice (PR #674)

- lemonde.fr (DataDome) blocked R3 navigation at the 2nd level: the engine re-origined

@ -1,10 +1,26 @@
# TODO — SecuBox-DEB Backlog

-*Updated: 2026-06-13*
+*Updated: 2026-06-19*

---

## 🔥 P0 — Immediate (in flight)

### kbin Tor endpoint — anonymized quick-switch surfing (#683)

> Capstone of the cyber Swiss-army knife: anonymity of the exit. Spec:
> `docs/superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md`.
> Invariants: inspection preserved, fail-closed, opt-in (default OFF), no DNS leak, CSPN audit.

- [ ] **Transport**: Option A, upstream SOCKS5 dialer (Go core #662, *preferred*) vs
  Option B, nft mark → Tor TransPort (pre-#662 fallback).
- [ ] **Tor egress profile**: reuse `secubox-exposure` (bootstrap/NEWNYM), egress-only.
- [ ] **Toolbox API**: `POST /admin/tor/{on,off}` (WG-hash scoped) + `GET /tor/state` +
  `POST /tor/newnym` + per-client SQLite state (TTL 24h).
- [ ] **kbin UI**: 🧅 toggle + state badge + exit-country flag + "new identity" button.
- [ ] **nft leak-guard** + DNS-over-Tor (test exit IP + resolver ≠ Unbound).
- [ ] **`tls_splice` OFF in Tor mode** (#649): otherwise asset flows leak the real IP.
- [ ] **CSPN**: audit-log every toggle; DARK soak (flag present, UI hidden) before the flip.

### ToolBox clients (`clients/`)

- [x] **#531 Android scaffold + CI** — Gradle/Compose one-tap onboarding,

@ -1,5 +1,35 @@
# WIP — Work In Progress

-*Updated: 2026-06-18*
+*Updated: 2026-06-19*

---

## 🔄 2026-06-19: kbin milestone, ToolBoX 2.7.0 + Tor chapter (plan)

End-of-session checkpoint. No runtime behaviour change: docs, positioning, version,
and the plan for the next blade.

- ✅ **ToolBoX 2.7.0** (middle release): closes the 2.6.x line (ad-intelligence /
  Anti-Track v2 / anti-bot uTLS #662) and opens the kbin chapter, "first tool of the
  cyber Swiss-army knife". kbin = transparent perf + full encrypted + poison/smog +
  anti-adware banner + safe browsing.
- ✅ **kbin docs**: wiki [`Kbin-Toolbox.md`](../docs/wiki/Kbin-Toolbox.md),
  [`FAQ-KBIN-TOR.md`](../docs/FAQ-KBIN-TOR.md), README blurb.
- ✅ **Plan #683**: spec
  [`2026-06-19-kbin-tor-anonymized-surfing-design.md`](../docs/superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md),
  a quick-switch Tor endpoint (outbound egress, fail-closed, opt-in, no DNS leak,
  inspection preserved). Depends on the Go core #662.

### ⬜ Next Up: Tor chapter (#683)

- **Decide the transport**: Option A (upstream SOCKS5 dialer via the Go core #662,
  *preferred*) vs Option B (nft mark → Tor TransPort, pre-#662 fallback).
- **Tor egress profile** in `secubox-exposure` (or a dedicated `tor-egress` unit):
  egress-only, no relay/hidden service in this profile.
- **Toolbox API**: `POST /admin/tor/{on,off}` (per client, WG-hash), `GET /tor/state`,
  `POST /tor/newnym` + SQLite state + 🧅 UI banner.
- **nft leak-guard** + DNS-over-Tor (test: exit IP + resolver ≠ local Unbound).
- **Caveat**: in Tor mode, force `tls_splice` OFF for that client (otherwise asset
  flows leak the real IP). DARK soak (flag present, UI hidden) before the flip.

---

24
README.md
@ -57,6 +57,30 @@

---

## 🗡️ kbin: the first tool of the cyber Swiss-army knife

**kbin** (`kbin.gk2.secubox.in`) is the public portal of the SecuBox **ToolBoX**: the
*digital booth* and **first blade of the modular cyber Swiss-army knife** from
[cybermind.fr](https://cybermind.fr). You plug in, browse normally, and the blade
inspects and protects the traffic transparently:

| 🗡️ | Blade |
|----|------|
| ⚡ | **Transparent performance**: only what gets modified is decrypted (selective SNI splice) |
| 🔒 | **Full encrypted**: complete MITM inspection, per-host certificate forging, Chrome uTLS fingerprint |
| ☠️ | **Poison & smog injection**: ad-tech traffic comes out poisoned, not merely blocked |
| 🚫 | **Anti-adware banner**: injected transparency, CSP-immune, SPA-aware |
| 🛡️ | **Safe browsing**: Vortex DNS + nft blacklist + anti-bot detection |

> **Next blade: 🧅 Tor quick-switch mode ([#683](https://github.com/CyberMind-FR/secubox-deb/issues/683)).**
> One tap and the surfing exits through the Tor network (outbound egress, pseudo-network): inspection
> stays intact, only the **exit IP** becomes anonymous. Fail-closed, opt-in, no DNS leak.

- Use case: [docs/wiki/Kbin-Toolbox.md](docs/wiki/Kbin-Toolbox.md)
- FAQ: [docs/FAQ-KBIN-TOR.md](docs/FAQ-KBIN-TOR.md)

---

## License — CyberMind Source-Disclosed (CMSD-1.0)

> **Source disclosed, rights reserved.**

93
docs/FAQ-KBIN-TOR.md
Normal file
@ -0,0 +1,93 @@

# FAQ: kbin & anonymized Tor mode

> kbin (`kbin.gk2.secubox.in`) is the public portal of the SecuBox **ToolBoX**, the first
> tool of the CyberMind cyber Swiss-army knife. This FAQ covers protected browsing and the
> upcoming **Tor quick-switch mode** ([#683](https://github.com/CyberMind-FR/secubox-deb/issues/683)).

---

### What exactly is kbin?

The public portal of `secubox-toolbox`. You join the booth's open AP, give consent,
and all traffic goes through the SecuBox MITM forging pipeline: encrypted inspection,
ad/tracker cleanup, transparency banner, safe browsing. See
[Kbin-Toolbox](wiki/Kbin-Toolbox.md).

### Does kbin see all my traffic? Isn't that dangerous?

It is **consented and ephemeral**. The MAC is hashed with a salt that rotates every 24 h,
no raw cookie value is persisted, and no session ↔ real-identity mapping outlives the TTL.
Three opt-in levels: R0 (full bypass), R1 (passive analysis, recommended), R2/R3
(TLS-break + banner). Without consent, **no** decryption.
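
The idea of a MAC hashed under a salt that rotates every 24 h can be sketched in Go as below. This is only an illustration under stated assumptions: the FAQ does not say which hash construction kbin uses, so HMAC-SHA256 and the helper names `macToken` and `newSalt` are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// macToken returns a pseudonymous token for a client MAC under a given salt.
// Hypothetical helper: HMAC-SHA256 is an assumption, not kbin's documented scheme.
func macToken(salt []byte, mac string) string {
	h := hmac.New(sha256.New, salt)
	h.Write([]byte(mac))
	return hex.EncodeToString(h.Sum(nil))
}

// newSalt draws a fresh random salt. A real service would call it once per
// rotation period (24 h in the FAQ) and discard the previous value.
func newSalt() []byte {
	s := make([]byte, 32)
	if _, err := rand.Read(s); err != nil {
		panic(err)
	}
	return s
}

func main() {
	mac := "aa:bb:cc:dd:ee:ff"
	today, tomorrow := newSalt(), newSalt()
	// Same salt: the token is stable, so a session can be followed within the period.
	fmt.Println(macToken(today, mac) == macToken(today, mac))
	// New salt: the token changes, so yesterday's records cannot be linked to today's.
	fmt.Println(macToken(today, mac) == macToken(tomorrow, mac))
}
```

Once the old salt is discarded, the stored tokens can no longer be recomputed from a MAC, which is one way to get the "nothing outlives the TTL" property described above.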

### What does "transparent performance" mean?

Only what gets modified is decrypted. Pure-asset flows (video, CDN images) are
*spliced* right at the TLS ClientHello (`tls_splice`, #649): the workers do not forge or
decrypt what has no L7 value. Line-rate throughput, near-zero latency.

### What is "poison and smog injection"?

Ad-tech and tracker traffic is not merely blocked: it is **poisoned**. Anti-Track
v2 (#633) returns pseudo-responses, neutralizes preloaded CDN scripts, and at the network
level does IP-drop + DNS-refuse. The advertising profile comes out polluted, not empty,
and indistinguishable from a genuine block on the tracker side.

### What does the anti-adware banner block?

A transparency banner injected into the page: number of trackers seen/blocked,
actors recognized cross-site. It is CSP-immune and SPA-aware (#636/#639, webext #655).
The banner is only the display; the actual blocking comes from the Vortex DNS blocklists + nft blacklist.

---

## Tor mode (plan #683)

### What does Tor mode do?

A 🧅 switch on kbin: one tap and your surfing exits **through the Tor network** instead of
the box's WAN. Anonymous exit IP, masked network identity: "pseudo-network surfing".

### Does kbin stop inspecting/protecting me in Tor mode?

No. Tor sits **after** the MITM forging core, on the upstream transport (SOCKS5
dialer). You keep poison/smog, the banner and safe browsing; **only the exit IP
and the network identity change**.

### If Tor goes down, does traffic fall back to clearnet?

**Never.** The design is **fail-closed**: if Tor is unavailable, traffic is
cut, not sent over clearnet. Anonymity is an invariant, not a best effort.

### Are there DNS leaks?

No. When Tor mode is active, resolution goes **through Tor**, not through the local Unbound.

### Is this the same thing as `secubox-exposure`?

No, opposite direction. `secubox-exposure` publishes Tor **hidden services** (inbound:
exposing an internal service). The kbin Tor endpoint sends your **surfing** out through Tor (outbound).
The Tor control (bootstrap, NEWNYM/new identity) is shared between the two.

### How do I change my exit IP?

"New identity" button (NEWNYM) → new Tor circuit → new exit IP, on the
fly, without reconnecting.
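
As a rough sketch of what a "new identity" request amounts to on the wire, the Go snippet below builds the control-port lines for cookie authentication followed by a NEWNYM signal. The command verbs come from the Tor control protocol; the helper name `newnymCommands` and the two-byte placeholder cookie are assumptions, and the code deliberately stops short of opening the control connection because the port and cookie path are not given in this document.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// newnymCommands builds the control-port lines that authenticate with a cookie
// and ask Tor for fresh circuits. Hypothetical helper name; the verbs
// (AUTHENTICATE, SIGNAL NEWNYM, QUIT) belong to the Tor control protocol.
func newnymCommands(cookie []byte) string {
	var b strings.Builder
	// Cookie authentication sends the cookie bytes hex-encoded.
	fmt.Fprintf(&b, "AUTHENTICATE %s\r\n", hex.EncodeToString(cookie))
	b.WriteString("SIGNAL NEWNYM\r\n")
	b.WriteString("QUIT\r\n")
	return b.String()
}

func main() {
	// Placeholder cookie; a real client reads the daemon's cookie file instead.
	fmt.Printf("%q\n", newnymCommands([]byte{0x01, 0x02}))
}
```

A real caller would write these lines to the control socket and check that each reply starts with `250` before reporting success to the UI.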

### Is it enabled by default?

No. **Opt-in per client** (WG-hash scoped), **default OFF**, honors your R consent
level. Every on/off toggle is logged (immutable CSPN audit-log).

---

## See also

- [Kbin-Toolbox](wiki/Kbin-Toolbox.md): the full use-case page
- [Tor mode spec](superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md)
- [Anti-Track](wiki/Anti-Track.md): blocks/poisons/anonymizes (DNS/IP layer)

---

*CyberMind — Gérald Kerma · LicenseRef-CMSD-1.0*

@ -0,0 +1,99 @@
# Design — kbin Tor endpoint: quick-switch anonymized web surfing

*Spec · 2026-06-19 · issue [#683](https://github.com/CyberMind-FR/secubox-deb/issues/683) · status: PLAN (no code yet)*

## Problem

kbin (the public ToolBoX portal, first tool of the Swiss-army cyber kit) already gives
transparent perf + full-MITM inspection + ad poison/smog + adware-ban banner + safe
browsing. The **egress is still clearnet**: a kbin session exits via the board WAN with the
real IP. The capstone is **anonymity of the exit**: a quick-switch that re-routes a
consenting client's surfing through **Tor** (outbound), turning kbin into a pseudo-network
surfing booth.

This is the **opposite direction** of `secubox-exposure` (which publishes inbound Tor
hidden services). We reuse its Tor control plumbing (bootstrap, NEWNYM) but for egress.

## Invariants (non-negotiable)

1. **Inspection preserved**: Tor sits *after* the MITM forging core, on the upstream
   transport (SOCKS5 dialer). Poison/smog + banner + safe-browsing stay; only the **exit
   IP + network identity** change.
2. **Fail-closed**: if Tor is down/not bootstrapped, traffic is dropped, never falls back
   to clearnet. Anonymity is an invariant, not best-effort.
3. **No DNS leak**: when Tor mode is on, resolution goes through Tor, not local Unbound.
4. **Opt-in, default OFF**: per-client (WG-hash scoped), honors the existing R consent
   level. No silent global toggle.
5. **CSPN**: every Tor on/off decision written to the immutable audit-log; no plaintext
   exit; TLS 1.3 floor unchanged.
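
Invariants 2 and 4 together reduce to a small routing decision, sketched below in Go. The spec is still at PLAN status, so every name here (`route`, `chooseRoute`, `errTorUnavailable`) is hypothetical and the sketch only shows the decision table, not the dialer itself.

```go
package main

import (
	"errors"
	"fmt"
)

// route is the egress decision for one client flow (hypothetical type).
type route int

const (
	routeDirect route = iota // clearnet egress via the box WAN
	routeTor                 // egress via Tor's SOCKS5 listener
	routeDrop                // no egress at all
)

var errTorUnavailable = errors.New("tor mode on but tor not bootstrapped")

// chooseRoute encodes the fail-closed rule: a client that opted in to Tor is
// either routed through Tor or dropped, never downgraded to clearnet.
func chooseRoute(torEnabled, torBootstrapped bool) (route, error) {
	switch {
	case !torEnabled:
		return routeDirect, nil // default OFF: nothing changes for this client
	case torBootstrapped:
		return routeTor, nil
	default:
		return routeDrop, errTorUnavailable
	}
}

func main() {
	for _, c := range [][2]bool{{false, false}, {true, true}, {true, false}} {
		r, err := chooseRoute(c[0], c[1])
		fmt.Println(c[0], c[1], r, err)
	}
}
```

Keeping the drop case as an explicit value, rather than an error alone, makes it harder for a caller to ignore the error and dial directly by accident.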

## Two transport options (decide first)

| Option | Mechanism | Pros | Cons |
|--------|-----------|------|------|
| **A: SOCKS5 upstream dialer** (preferred) | The Go forging core (#662) dials upstream via Tor's SOCKS5 (`127.0.0.1:9050`) when the client is Tor-flagged. | Clean integration with #662; per-flow choice; cert verify + uTLS preserved; DNS-over-Tor native (SOCKS5 remote resolve). | Requires the Go core to land first (#662 dependency). |
| **B: nft mark → Tor TransPort** | Per-client nft mark routes 80/443 to Tor `TransPort`/`DNSPort`; transparent at L3. | Engine-agnostic; works without #662. | Bypasses the forging core unless chained carefully → risk of losing inspection (violates invariant 1). |

**Recommendation:** Option A, gated on #662 Go core. Option B only as a pre-#662 fallback,
and only if the mark routes *through* the MITM TPROXY first, then Tor.

## Components

- **Tor daemon**: `tor.service`, SOCKS5 `9050` + control port (cookie auth). Reuse
  `secubox-exposure` bootstrap; ensure egress-only config (no relay, no hidden service in
  this profile).
- **toolbox API**: `POST /admin/tor/{on,off}` (per-client, kbin-gated for bulk),
  `GET /tor/state` (bootstrapped? exit country? client flag?), `POST /tor/newnym`.
- **Go forging core (#662)**: upstream dialer switch: Tor-flagged client → SOCKS5 dialer
  (remote DNS) instead of direct. uTLS Chrome FP + manual cert verify unchanged.
- **State store**: per-client `tor_enabled` (WG-hash scoped, TTL-bound) in the toolbox
  SQLite (`clients` table extension or a small `tor_flags` table).
- **nft leak-guard**: when a client is Tor-flagged, a guard rule ensures no 80/443 path
  reaches the WAN except via the Tor dialer (defense-in-depth for invariant 2/3).
- **kbin UI**: 🧅 toggle + state badge (bootstrapping / on / exit-country flag) + "new
  identity" button; respects R-level (greyed if R0).

## UX

```
[kbin page] ── tap 🧅 ──▶ POST /admin/tor/on (this client)
                 ▼
       Tor bootstrapped? ──no──▶ "Tor démarre…" (spinner, fail-closed until ready)
                 │yes
                 ▼
       flag client tor_enabled (WG-hash, TTL 24h) + audit-log
                 ▼
       forging core dials upstream via SOCKS5 → exit IP changes
                 ▼
       badge: 🧅 ON · 🌍 <exit-country flag> [Nouvelle identité]
```

## Open questions (resolve next session)

- Per-flow vs per-session Tor? (start per-session/per-client; per-flow later)
- Exit-country selection (`ExitNodes {cc}`) exposed to user, or auto?
- Latency expectation messaging: Tor is slower; the perf banner must set expectations.
- Interaction with `tls_splice` (#649): splice = direct fast-path; in Tor mode, splice
  must be disabled or also routed through Tor (else asset flows leak the real IP).
  **Likely: Tor mode forces splice OFF for that client.**
- Interaction with Anti-Track v2 IP-drop/DNS-refuse: ordering vs Tor resolution.

## Dependencies & sequencing

1. **#662 Go core** lands the upstream dialer abstraction → enables Option A.
2. Tor egress profile in `secubox-exposure` (or a dedicated `tor-egress` unit).
3. toolbox API + state + UI.
4. nft leak-guard + DNS-over-Tor verification (leak test: compare exit IP + DNS resolver).
5. CSPN audit-log wiring + soak DARK (flag exists, UI hidden) → flip.

## Test plan (sketch)

- Leak test: with Tor mode on, `check.torproject.org` confirms Tor; DNS resolver is not the
  local Unbound; real WAN IP never observed upstream.
- Fail-closed test: stop `tor.service` mid-session → traffic drops, no clearnet egress.
- Inspection test: ad-block + banner + poison still fire while on Tor.
- NEWNYM test: exit IP changes after "new identity".

---

*CyberMind — Gérald Kerma · LicenseRef-CMSD-1.0*

94
docs/wiki/Kbin-Toolbox.md
Normal file
@ -0,0 +1,94 @@

# kbin: ToolBoX, the first tool of the cyber Swiss-army knife

**CyberMind · Gondwana · Notre-Dame-du-Cruet · Savoie** | [Home](Home) | [Anti-Track](Anti-Track) | [Modules](Modules)

> **kbin** (`kbin.gk2.secubox.in`) is the public portal of the SecuBox **ToolBoX**:
> the *digital phone booth*. It is the **first tool of the modular cyber Swiss-army
> knife** from [cybermind.fr](https://cybermind.fr): you connect, you browse, and the
> blade inspects, cleans and protects the traffic transparently.

---

## The concept in one sentence

> **Plug in, browse normally: kbin makes your session fast, encrypted, ad-free
> and soon anonymous.**

kbin is the public face of the [`secubox-toolbox`](../../packages/secubox-toolbox/) module.
The client joins the open AP, consents (R1 passive / R2 TLS-break), and all of its traffic
goes through the SecuBox MITM forging pipeline, with no configuration and no mandatory app.

---

## The 5 blades already sharpened

| 🗡️ Blade | What it does | Implementation |
|---------|-----------------|----------------|
| **⚡ Transparent performance** | Line-rate throughput, near-zero latency; only what gets modified is decrypted (selective SNI splice of pure-asset flows). | `tls_splice` addon (#649), R3 workers |
| **🔒 Full encrypted** | Complete MITM inspection of outbound HTTPS: per-host certificate forging, verified certificate chain, Chrome fingerprint (uTLS) on the upstream side. | Go forging core (#662), uTLS HelloChrome |
| **☠️ Poison & smog injection** | Ad-tech / tracker traffic enters the inspection chamber and comes out poisoned/fogged: pseudo-responses, neutralized scripts, IP-drop + DNS-refuse. | Anti-Track v2 (#633), `privacy_guard`, ad-ghoster |
| **🚫 Anti-adware banner** | Transparency banner injected into the page ("you were tracked / X trackers blocked"), CSP-immune, SPA-aware. | banner saga (#636/#639), webext (#655) |
| **🛡️ Safe browsing** | Vortex DNS blocklists, nft blacklist (CrowdSec + threat-intel), passive anti-bot/challenge detection. | Phase 13 enforcement plane, Vortex Unbound |

---

## The next blade: 🧅 Tor quick-switch (plan #683)

This is the **missing tip**: anonymity of the exit.

Today kbin sees, cleans and protects, but traffic still exits through the box's WAN
with the real IP. The **Tor endpoint** adds a switch:

> **One tap on kbin → 🧅 "Tor mode"** → the client's surfing exits **through the Tor network**
> instead of the WAN. Pseudo-network, anonymous exit IP, masked network identity.

Design invariants (see
[spec](../superpowers/specs/2026-06-19-kbin-tor-anonymized-surfing-design.md)):

- **Inspection stays intact**: Tor sits *after* the MITM forging core, on the
  upstream transport (SOCKS5 dialer). Poison/smog + banner + safe browsing are kept;
  only **the exit IP and the network identity** change.
- **Opt-in per client** (WG-hash scoped), **default OFF**, honors the R consent level.
- **Fail-closed**: if Tor goes down, there is **no** clearnet fallback (anonymity is an
  invariant, not a best effort).
- **No DNS leak**: resolution goes through Tor when the mode is active, not through the local Unbound.
- **CSPN**: every Tor on/off toggle is logged (immutable audit-log); no plaintext
  exit.

### Use cases

1. **VILLAGE3B booth**: a visitor wants to look up a sensitive site (health, legal,
   press) from the public kiosk without leaving the box's IP behind. Tap 🧅 → anonymous surfing.
2. **Pseudo-network surfing**: browse as if from another country / another network
   identity, for the duration of an ephemeral 24h session.
3. **Circuit renewal**: "new identity" button (NEWNYM) to change the exit IP
   on the fly.

> **Opposite** direction from `secubox-exposure`: that module publishes Tor *hidden services*
> (inbound); the kbin Tor endpoint sends client surfing out *through* Tor (outbound).

---

## Where it lives

| Item | Location |
|---------|-------------|
| Public portal | `kbin.gk2.secubox.in` → HAProxy → `toolbox_landing` → `10.99.0.1:8088` |
| Operator dashboard | `admin.gk2.secubox.in/toolbox/` |
| Personal map view | `kbin.gk2.secubox.in/social/me` |
| Module | [`packages/secubox-toolbox/`](../../packages/secubox-toolbox/) |
| Tor channel (reused) | [`packages/secubox-exposure/`](../../packages/secubox-exposure/) |

---

## See also

- [Anti-Track](Anti-Track): block/poison/anonymize engine (DNS/IP layer)
- [FAQ kbin & Tor](../FAQ-KBIN-TOR.md)
- Punk Exposure Engine: Tor channel, doctrine in `CLAUDE.md`
- Epic [#662](https://github.com/CyberMind-FR/secubox-deb/issues/662): MITM core migration (Go)
- Plan [#683](https://github.com/CyberMind-FR/secubox-deb/issues/683): kbin Tor endpoint

---

*CyberMind — Gérald Kerma · LicenseRef-CMSD-1.0*

147
packages/secubox-toolbox-ng/cmd/sbxmitm/adcand_test.go
Normal file
@ -0,0 +1,147 @@

// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: ad-candidate learning-feed tests (#662)
//
// The Go cutover blocked from STATIC lists but never emitted LEARNING
// candidates, so a brand-new adware (acotedemoi.com) was never observed → never
// promoted → slipped through forever. These tests prove the engine now ports
// ad_ghost's _AD_PATH heuristic and records a candidate (host,site) for every
// 3rd-party ad-path request on the allow/mitm path — the feed autolearn promotes.
package main

import (
	"path/filepath"
	"testing"
)

func TestAdPathRegex(t *testing.T) {
	hit := []string{
		"/ad/1.gif", "/ads/x", "/adserver/req", "/pagead/conversion",
		"/gampad/ads", "/doubleclick/x", "/beacon", "/pixel.gif",
		"/collect", "/track", "/tracking/p", "/telemetry/v2", "/metric",
		"/PAGEAD/Upper", // case-insensitive
	}
	for _, p := range hit {
		if !adPathRE.MatchString(p) {
			t.Errorf("adPathRE should MATCH %q", p)
		}
	}
	miss := []string{"/", "/index.html", "/api/users", "/static/app.js", "/cart", "/headline"}
	for _, p := range miss {
		if adPathRE.MatchString(p) {
			t.Errorf("adPathRE should NOT match %q", p)
		}
	}
}

// newAdCandTestPolicy builds a Policy with doubleclick.net allowlisted (so the
// allowlist-skip branch is exercised) and nothing learned.
func newAdCandTestPolicy(t *testing.T) *Policy {
	t.Helper()
	pol, err := LoadPolicy(PolicyOpts{
		AllowPath:        writeTemp(t, "doubleclick.net\n"),
		LearnedPath:      writeTemp(t, ""),
		SpliceSeedPath:   writeTemp(t, ""),
		SpliceLearnPath:  writeTemp(t, ""),
		PureTrackersPath: writeTemp(t, ""),
		SelfDomains:      []string{"secubox.in"},
	})
	if err != nil {
		t.Fatalf("LoadPolicy: %v", err)
	}
	return pol
}

func TestMaybeRecordAdCandidate(t *testing.T) {
	pol := newAdCandTestPolicy(t)

	cases := []struct {
		name   string
		host   string // request host
		site   string // referer site (registrable)
		path   string
		want   bool // candidate recorded?
		wantHK string
	}{
		{"3rd-party ad-path → candidate", "metrics.acotedemoi.com", "lemonde.fr", "/collect", true, "metrics.acotedemoi.com"},
		{"3rd-party ad-path /pagead", "ads.foo.io", "news.example", "/pagead/x", true, "ads.foo.io"},
		{"1st-party (same registrable) → no candidate", "static.lemonde.fr", "lemonde.fr", "/ads/x", false, ""},
		{"3rd-party non-ad-path → no candidate", "cdn.acotedemoi.com", "lemonde.fr", "/app.js", false, ""},
		{"no site (no Referer) → no candidate", "metrics.acotedemoi.com", "", "/collect", false, ""},
		{"allowlisted host → no candidate", "ads.doubleclick.net", "lemonde.fr", "/pagead/x", false, ""},
	}

	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			cand := newAdCandidates()
			px := &Proxy{pol: pol, cand: cand, analysisRelay: true}
			px.maybeRecordAdCandidate(tc.host, tc.site, tc.path)
			snap := cand.snapshot()
			if tc.want {
				if len(snap) != 1 {
					t.Fatalf("want 1 candidate, got %d (%+v)", len(snap), snap)
				}
				if snap[0].Host != tc.wantHK {
					t.Fatalf("candidate host = %q, want %q", snap[0].Host, tc.wantHK)
				}
				if snap[0].Site != tc.site {
					t.Fatalf("candidate site = %q, want %q", snap[0].Site, tc.site)
				}
				if snap[0].Hits != 1 {
					t.Fatalf("candidate hits = %d, want 1", snap[0].Hits)
				}
			} else if len(snap) != 0 {
				t.Fatalf("want 0 candidates, got %d (%+v)", len(snap), snap)
			}
		})
	}
}

// TestAdCandidateGatedByRelay proves the feed is gated behind the analysis/ad
// relay flag: with the gate off, nothing is recorded even on a textbook hit.
func TestAdCandidateGatedByRelay(t *testing.T) {
	pol := newAdCandTestPolicy(t)
	cand := newAdCandidates()
	px := &Proxy{pol: pol, cand: cand, analysisRelay: false}
	px.maybeRecordAdCandidate("metrics.acotedemoi.com", "lemonde.fr", "/collect")
	if n := len(cand.snapshot()); n != 0 {
		t.Fatalf("relay off: want 0 candidates, got %d", n)
	}
}

// TestAdCandidateHitsAccumulate proves repeated (host,site) hits coalesce.
func TestAdCandidateHitsAccumulate(t *testing.T) {
	cand := newAdCandidates()
	for i := 0; i < 5; i++ {
		cand.record("x.tracker.io", "site.example")
	}
	snap := cand.snapshot()
	if len(snap) != 1 || snap[0].Hits != 5 {
		t.Fatalf("want 1 row hits=5, got %+v", snap)
	}
	// snapshot clears.
	if n := len(cand.snapshot()); n != 0 {
		t.Fatalf("snapshot should clear: got %d", n)
	}
}

// TestAdCandidatePayloadShape proves the candidates list serialises into the
// extended ad-event payload (host/site/hits keys).
func TestAdCandidatePayloadShape(t *testing.T) {
	cand := newAdCandidates()
	cand.record("x.tracker.io", "site.example")
	rows := cand.snapshot()
	p := adEventPayload{Candidates: rows}
	if p.empty() {
		t.Fatal("payload with candidates must not be empty()")
	}
}

// writeTemp writes content to a fresh temp file and returns its path.
func writeTemp(t *testing.T, content string) string {
	t.Helper()
	f := filepath.Join(t.TempDir(), "list.txt")
	writeFile(t, f, content)
	return f
}

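The gates exercised by the test table above (relay flag, known referer site, third-party host, allowlist skip, ad-like path) can be sketched as one predicate. The body of `maybeRecordAdCandidate` is not part of this diff, so this is an illustration consistent with the tests rather than the engine's code: `isCandidate`, `naiveRegistrable` and the map-based allowlist are hypothetical, and the two-label registrable-domain rule is a simplification that a real implementation would replace with a public-suffix lookup.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// adPathRE is copied from the diff: paths that look like ad/track endpoints.
var adPathRE = regexp.MustCompile(`(?i)/ads?/|/adserver|/pagead|/gampad|/doubleclick|/beacon|/pixel|/collect|/track(ing)?|/telemetry|/metric`)

// naiveRegistrable keeps the last two labels of a host. Assumption for this
// sketch only: it is wrong for suffixes such as co.uk.
func naiveRegistrable(host string) string {
	labels := strings.Split(strings.ToLower(host), ".")
	if len(labels) <= 2 {
		return strings.Join(labels, ".")
	}
	return strings.Join(labels[len(labels)-2:], ".")
}

// isCandidate reports whether a request would be recorded as a learning
// candidate under the gates the tests describe.
func isCandidate(relay bool, allow map[string]bool, host, site, path string) bool {
	if !relay || host == "" || site == "" {
		return false // feed disabled, or the issuing page is unknown
	}
	reg := naiveRegistrable(host)
	if reg == naiveRegistrable(site) || allow[reg] {
		return false // first-party or allowlisted
	}
	return adPathRE.MatchString(path)
}

func main() {
	allow := map[string]bool{"doubleclick.net": true}
	fmt.Println(isCandidate(true, allow, "metrics.acotedemoi.com", "lemonde.fr", "/collect"))
	fmt.Println(isCandidate(true, allow, "static.lemonde.fr", "lemonde.fr", "/ads/x"))
	fmt.Println(isCandidate(true, allow, "ads.doubleclick.net", "lemonde.fr", "/pagead/x"))
}
```

The cheap string checks run before the regular expression, so first-party and allowlisted requests never pay for the pattern match.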
@ -26,10 +26,74 @@ import (
|
|||
"log"
|
||||
"net/http"
|
||||
"net/url"
|
||||
"regexp"
|
||||
"sync"
|
||||
"time"
|
||||
)
|
||||
|
||||
// ── ad-candidate learning feed (#662 auto-learn loop) ─────────────────────────
|
||||
//
|
||||
// The STATIC block list never grows on its own; ad_ghost fed autolearn by
|
||||
// capturing CANDIDATES — 3rd-party requests whose PATH smells like an ad/track
|
||||
// endpoint — into ad_candidates, which secubox-toolbox-autolearn later promotes
|
||||
// into learned-trackers.txt at AD_MIN_SITES distinct sites. The Go cutover
|
||||
// dropped this feed, so new adwares (acotedemoi.com) were never observed. This
|
||||
// restores it in the engine: the allow/mitm hot path records (host,site) when
|
||||
// the request is 3rd-party AND adPathRE matches, buffered + flushed with the
|
||||
// existing ad-event machinery.
|
||||
|
||||
// adPathRE ports ad_ghost._AD_PATH (RE2-safe, case-insensitive). Matches a path
|
||||
// that looks like an ad/track endpoint. Learning only — never a block decision.
|
||||
//
|
||||
// Python: re.compile(r"/ads?/|/adserver|/pagead|/gampad|/doubleclick|/beacon|"
|
||||
// r"/pixel|/collect|/track(ing)?|/telemetry|/metric", re.I)
|
||||
var adPathRE = regexp.MustCompile(`(?i)/ads?/|/adserver|/pagead|/gampad|/doubleclick|/beacon|/pixel|/collect|/track(ing)?|/telemetry|/metric`)
|
||||
|
||||
// adCandMapCap bounds the candidate buffer (mirrors ad_ghost's `len(_cand) <
|
||||
// 20000` guard): NEW keys past the cap are dropped until the next flush clears
|
||||
// it, so a dead portal can never grow memory unbounded.
|
||||
const adCandMapCap = 20000
|
||||
|
||||
// adCandidates is the lock-guarded (host,site)→hits candidate aggregator,
|
||||
// drained by the ad-stats flusher into the ad-event payload's "candidates" list.
|
||||
type adCandidates struct {
|
||||
mu sync.Mutex
|
||||
hit map[adKey]int64
|
||||
}
|
||||
|
||||
func newAdCandidates() *adCandidates { return &adCandidates{hit: map[adKey]int64{}} }
|
||||
|
||||
// record tallies one ad-candidate (host,site). O(1); the cap drops only NEW keys
|
||||
// (existing keys keep accumulating). Empty host is ignored.
|
||||
func (a *adCandidates) record(host, site string) {
|
||||
if host == "" {
|
||||
return
|
||||
}
|
||||
a.mu.Lock()
|
||||
defer a.mu.Unlock()
|
||||
k := adKey{adHost: host, site: site}
|
||||
if _, ok := a.hit[k]; ok {
|
||||
a.hit[k]++
|
||||
} else if len(a.hit) < adCandMapCap {
|
||||
a.hit[k] = 1
|
||||
}
|
||||
}
|
||||
|
||||
// snapshot atomically reads-and-clears the buffer, returning the candidate rows.
|
||||
func (a *adCandidates) snapshot() []adCandidateRow {
|
||||
a.mu.Lock()
|
||||
defer a.mu.Unlock()
|
||||
if len(a.hit) == 0 {
|
||||
return nil
|
||||
}
|
||||
rows := make([]adCandidateRow, 0, len(a.hit))
|
||||
for k, n := range a.hit {
|
||||
rows = append(rows, adCandidateRow{Host: k.adHost, Site: k.site, Hits: n})
|
||||
}
|
||||
a.hit = map[adKey]int64{}
|
||||
return rows
|
||||
}
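The cap and read-and-clear behaviour of `record` and `snapshot` can be exercised in isolation. The sketch below is a simplified stand-in rather than the engine's type: `cappedCounter`, `key`, `drain` and the configurable `limit` field are hypothetical names, used so that the cap can be reached with a handful of keys instead of 20000.

```go
package main

import (
	"fmt"
	"sync"
)

// key is a hypothetical stand-in for adKey.
type key struct{ host, site string }

// cappedCounter mimics adCandidates with a configurable cap (assumption:
// the real type hard-codes the cap as adCandMapCap).
type cappedCounter struct {
	mu    sync.Mutex
	limit int
	hit   map[key]int64
}

func newCappedCounter(limit int) *cappedCounter {
	return &cappedCounter{limit: limit, hit: map[key]int64{}}
}

// record increments existing keys unconditionally; new keys are admitted
// only while the map is below the limit.
func (c *cappedCounter) record(host, site string) {
	if host == "" {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	k := key{host, site}
	if _, ok := c.hit[k]; ok {
		c.hit[k]++
	} else if len(c.hit) < c.limit {
		c.hit[k] = 1
	}
}

// drain hands back the current counts and swaps in a fresh map.
func (c *cappedCounter) drain() map[key]int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := c.hit
	c.hit = map[key]int64{}
	return out
}

func main() {
	c := newCappedCounter(2)
	c.record("a.example", "news.example")
	c.record("b.example", "news.example")
	c.record("c.example", "news.example") // over the cap: dropped
	c.record("a.example", "news.example") // existing key: still counted
	first := c.drain()
	fmt.Println(len(first), first[key{"a.example", "news.example"}])
	fmt.Println(len(c.drain())) // buffer was cleared by the first drain
}
```

Swapping in a fresh map under the lock, rather than deleting keys one by one, keeps the critical section short, which matters if `record` sits on the request hot path as the comments suggest.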
|
||||
|
||||
// refererSite ports the ad_ghost _site_of logic: parse the Referer header as a
|
||||
// URL, take its hostname, and return registrable(hostname). Empty Referer or a
|
||||
// parse failure → "" (the page that issued the blocked request is unknown).
|
||||
|
|
@ -133,9 +197,19 @@ type adClientRow struct {
|
|||
Bytes int64 `json:"bytes"`
|
||||
}
|
||||
|
||||
// adCandidateRow is one learning candidate (host seen issuing ad-path requests
|
||||
// from a 1st-party site). Mirrors the portal /__toolbox/ad-event "candidates"
|
||||
// contract → store.record_ad_candidates([(host, site, hits), ...]).
|
||||
type adCandidateRow struct {
|
||||
Host string `json:"host"`
|
||||
Site string `json:"site"`
|
||||
Hits int64 `json:"hits"`
|
||||
}
|
||||
|
||||
type adEventPayload struct {
|
||||
- Blocks []adBlockRow `json:"blocks"`
- Clients []adClientRow `json:"clients"`
+ Blocks []adBlockRow `json:"blocks"`
+ Clients []adClientRow `json:"clients"`
+ Candidates []adCandidateRow `json:"candidates,omitempty"`
|
||||
}
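The `omitempty` tag on `candidates` is what should keep the wire format unchanged when there is nothing to learn. A minimal sketch of the encoding follows, assuming nil slices for the empty case; the `payload` and `candidateRow` types are hypothetical stand-ins, and the block and client rows are reduced to raw JSON placeholders because their fields are not fully shown in this diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// candidateRow mirrors the three JSON fields of adCandidateRow.
type candidateRow struct {
	Host string `json:"host"`
	Site string `json:"site"`
	Hits int64  `json:"hits"`
}

// payload is a hypothetical stand-in for adEventPayload.
type payload struct {
	Blocks     []json.RawMessage `json:"blocks"`
	Clients    []json.RawMessage `json:"clients"`
	Candidates []candidateRow    `json:"candidates,omitempty"`
}

// encode marshals a payload, returning "" on error.
func encode(p payload) string {
	b, err := json.Marshal(p)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// No candidates: the key is omitted entirely.
	fmt.Println(encode(payload{}))
	// One candidate: the key appears with its rows.
	fmt.Println(encode(payload{Candidates: []candidateRow{
		{Host: "ads.example.com", Site: "news.example", Hits: 3},
	}}))
}
```

In this sketch the nil `blocks` and `clients` slices encode as `null`; whether the engine sends `null` or `[]` depends on how its own snapshot builds the slices, which this hunk does not show.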
|
||||
|
||||
// snapshot atomically reads-and-clears both maps, returning the accumulated rows.
|
||||
|
|
@ -159,7 +233,9 @@ func (a *adStats) snapshot() adEventPayload {
|
|||
}
|
||||
|
||||
// empty reports whether a payload carries no rows (nothing to POST).
|
||||
- func (p adEventPayload) empty() bool { return len(p.Blocks) == 0 && len(p.Clients) == 0 }
+ func (p adEventPayload) empty() bool {
+ return len(p.Blocks) == 0 && len(p.Clients) == 0 && len(p.Candidates) == 0
+ }
|
||||
|
||||
// adEventClient is a short-timeout fire-and-forget client for the ad-event POST.
|
||||
// Sibling of portalClient (banner.go): the portal is a fixed loopback base, so
|
||||
|
|
@ -175,8 +251,15 @@ var adEventClient = &http.Client{
|
|||
// non-2xx) is swallowed with at most a debug log — the metrics are stats, not
|
||||
// security, and the engine must never block on the portal. Exposed (returns the
|
||||
// flushed payload) so the test can assert the snapshot/clear + payload shape.
|
||||
- func (a *adStats) flushOnce(portal string) adEventPayload {
+ //
+ // cand may be nil (the CONNECT PoC / tests with no learning feed); when set its
+ // candidate rows are drained into the SAME payload so the learning feed rides
+ // the existing ad-event channel (one POST per 10s, not two).
+ func (a *adStats) flushOnce(portal string, cand *adCandidates) adEventPayload {
  p := a.snapshot()
+ if cand != nil {
+ p.Candidates = cand.snapshot()
+ }
|
||||
if p.empty() {
|
||||
return p
|
||||
}
|
||||
|
|
@ -198,10 +281,10 @@ func (a *adStats) flushOnce(portal string) adEventPayload {
|
|||
// runAdStatsFlusher is the background flusher goroutine: every adFlushInterval it
|
||||
// drains the aggregator to the portal. Start it once from main() (like the
|
||||
// engine's other startup goroutines). It runs forever (the process lifetime).
|
||||
- func (a *adStats) runAdStatsFlusher(portal string) {
+ func (a *adStats) runAdStatsFlusher(portal string, cand *adCandidates) {
  t := time.NewTicker(adFlushInterval)
  defer t.Stop()
  for range t.C {
- a.flushOnce(portal)
+ a.flushOnce(portal, cand)
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -41,7 +41,7 @@ func TestRecordAdBlockEmptyHostIgnored(t *testing.T) {
|
|||
|
||||
func TestRecordAdBlockPerClientOnlyWhenMacSet(t *testing.T) {
|
||||
a := newAdStats()
|
||||
- a.recordAdBlock("ads.example.com", "site", "") // no mac → no client row
+ a.recordAdBlock("ads.example.com", "site", "") // no mac → no client row
|
||||
a.recordAdBlock("ads.example.com", "site", "mac1") // mac → client row
|
||||
a.recordAdBlock("ads.example.com", "site", "mac1")
|
||||
|
||||
|
|
@ -111,7 +111,7 @@ func TestFlushOncePayloadShapeMatchesContract(t *testing.T) {
|
|||
}))
|
||||
defer srv.Close()
|
||||
|
||||
- a.flushOnce(srv.URL)
+ a.flushOnce(srv.URL, nil)
|
||||
|
||||
if ct != "application/json" {
|
||||
t.Fatalf("Content-Type = %q, want application/json", ct)
|
||||
|
|
@ -145,7 +145,7 @@ func TestFlushOnceEmptySkipsPost(t *testing.T) {
|
|||
w.WriteHeader(http.StatusNoContent)
|
||||
}))
|
||||
defer srv.Close()
|
||||
- a.flushOnce(srv.URL)
+ a.flushOnce(srv.URL, nil)
|
||||
if posted {
|
||||
t.Fatalf("flushOnce on empty aggregator must not POST")
|
||||
}
|
||||
|
|
@ -156,7 +156,7 @@ func TestFlushOnceSwallowsPortalError(t *testing.T) {
|
|||
a.recordAdBlock("ads.example.com", "site", "")
|
||||
// Unreachable portal → must not panic, must still clear the maps (snapshot
|
||||
// happens before the POST).
|
||||
- a.flushOnce("http://127.0.0.1:1")
+ a.flushOnce("http://127.0.0.1:1", nil)
|
||||
if len(a.blocks) != 0 {
|
||||
t.Fatalf("flushOnce must clear maps even on POST failure")
|
||||
}
|
||||
|
|
|
|||
|
|
@ -24,6 +24,7 @@ import (
|
|||
"io"
|
||||
"log"
|
||||
"net/http"
|
||||
"net/url"
|
||||
"strings"
|
||||
"time"
|
||||
)
|
||||
|
|
@ -111,6 +112,94 @@ func injectLoader(body []byte, clientHash string, wg, cspBypassed bool) []byte {
|
|||
return body
|
||||
}
|
||||
|
||||
// ── inline banner (#662, supersedes injectLoader in the live path) ──────────
|
||||
//
|
||||
// Sites with a SERVICE WORKER (leparisien, cnn…) intercept EVERY same-origin
|
||||
// request, so the legacy <script src="/__toolbox/loader.js"> tag and the
|
||||
// fetch("/__toolbox/bundle") it makes are hijacked by the page's SW (404 /
|
||||
// app-shell) BEFORE they reach this engine → the banner never appears. The fix
|
||||
// is to INLINE the whole banner: the engine fetches the COMPLETE script body
|
||||
// from the portal server-side (once per injected HTML response) and bakes it
|
||||
// into a self-contained <script>…</script> with mh/wg/csp + the bundle as JS
|
||||
// literals — so there is NOTHING same-origin for the SW to hijack.
|
||||
//
|
||||
// injectLoader + the /__toolbox/loader.js short-circuit are KEPT (not removed)
|
||||
// for compatibility, but the live inject path now uses the inline banner.
|
||||
|
||||
// fetchInlineBanner fetches the COMPLETE inline banner script BODY from the
|
||||
// portal's /__toolbox/inline endpoint (which bakes mh/wg/csp + the bundle as JS
|
||||
// literals). Returns (body, true) on a 2xx; FAIL-OPEN (returns "", false) on any
|
||||
// error — portal down, timeout, non-2xx, read failure — so the caller simply
|
||||
// skips the inject and serves the page intact (no banner, like today's fail-open
|
||||
// when the portal asset 204s). It NEVER breaks a navigation over a banner.
|
||||
//
|
||||
// wg → "1" else "0"; cspBypassed → csp=1 (the 🔓 proof) else 0; clientHash is
|
||||
// ascii-sanitised exactly like the data-mh attribute was.
|
||||
func fetchInlineBanner(portal, clientHash string, wg, cspBypassed bool) (string, bool) {
|
||||
wgVal := "0"
|
||||
if wg {
|
||||
wgVal = "1"
|
||||
}
|
||||
cspVal := "0"
|
||||
if cspBypassed {
|
||||
cspVal = "1"
|
||||
}
|
||||
q := url.Values{}
|
||||
q.Set("mh", asciiOnly(clientHash))
|
||||
q.Set("wg", wgVal)
|
||||
q.Set("csp", cspVal)
|
||||
target := strings.TrimRight(portal, "/") + "/__toolbox/inline?" + q.Encode()
|
||||
resp, err := portalClient.Get(target)
|
||||
if err != nil {
|
||||
log.Printf("inline banner fetch failed for %s: %v", target, err)
|
||||
return "", false
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
|
||||
log.Printf("inline banner fetch non-2xx (%d) for %s", resp.StatusCode, target)
|
||||
return "", false
|
||||
}
|
||||
body, rerr := io.ReadAll(io.LimitReader(resp.Body, 8<<20))
|
||||
if rerr != nil {
|
||||
log.Printf("inline banner read failed for %s: %v", target, rerr)
|
||||
return "", false
|
||||
}
|
||||
return string(body), true
|
||||
}
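The fail-open contract described above (any error yields `("", false)` and the page is served without a banner) can be tried against a throwaway server. This is a sketch under stated assumptions: `client` stands in for `portalClient` with an assumed 2-second timeout, `inlineURL` and `fetchFailOpen` are hypothetical helper names, and the `asciiOnly` sanitising and logging of the real function are left out.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
	"time"
)

// client is a hypothetical stand-in for portalClient; the timeout is assumed.
var client = &http.Client{Timeout: 2 * time.Second}

// inlineURL builds the /__toolbox/inline URL with mh/wg/csp query parameters.
func inlineURL(portal, mh string, wg, csp bool) string {
	flag := func(b bool) string {
		if b {
			return "1"
		}
		return "0"
	}
	q := url.Values{}
	q.Set("mh", mh)
	q.Set("wg", flag(wg))
	q.Set("csp", flag(csp))
	return strings.TrimRight(portal, "/") + "/__toolbox/inline?" + q.Encode()
}

// fetchFailOpen returns (body, true) on a 2xx and ("", false) on any failure.
func fetchFailOpen(target string) (string, bool) {
	resp, err := client.Get(target)
	if err != nil {
		return "", false
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return "", false
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 8<<20))
	if err != nil {
		return "", false
	}
	return string(body), true
}

func main() {
	// Test server: answers 500 when csp=1, otherwise a tiny script body.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("csp") == "1" {
			http.Error(w, "boom", http.StatusInternalServerError)
			return
		}
		fmt.Fprint(w, "(function(){})();")
	}))
	defer srv.Close()

	fmt.Println(fetchFailOpen(inlineURL(srv.URL+"/", "deadbeef", true, false)))
	fmt.Println(fetchFailOpen(inlineURL(srv.URL, "deadbeef", true, true)))
}
```

Because `url.Values.Encode` sorts by key, the query always comes out as `csp=…&mh=…&wg=…`; the tests in this diff check for substrings such as `wg=1` rather than for a fixed parameter order.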
|
||||
|
||||
// injectInlineBanner inserts a SELF-CONTAINED <script>scriptBody</script> into an
|
||||
// HTML body once. It is idempotent via the SAME bannerGuard marker injectLoader
|
||||
// uses (so a body already carrying either form is never double-injected), and it
|
||||
// uses the SAME placement injectLoader did:
|
||||
// - guard idempotency: body already contains bannerGuard → unchanged.
|
||||
// - after the first (case-insensitive) "<head"'s closing '>'.
|
||||
// - else right BEFORE the first "<body".
|
||||
// - else return the body unchanged (no inject).
|
||||
//
|
||||
// scriptBody is the COMPLETE inline IIFE from fetchInlineBanner (NOT a src tag);
|
||||
// an empty scriptBody is a no-op (returns the body unchanged) so a failed/skipped
|
||||
// fetch is handled gracefully by the caller passing "".
|
||||
func injectInlineBanner(body []byte, scriptBody string) []byte {
|
||||
if scriptBody == "" {
|
||||
return body
|
||||
}
|
||||
if bytes.Contains(body, []byte(bannerGuard)) {
|
||||
return body
|
||||
}
|
||||
script := []byte("<!-- " + bannerGuard + " --><script>" + scriptBody + "</script>")
|
||||
low := bytes.ToLower(body)
|
||||
|
||||
if h := bytes.Index(low, []byte("<head")); h >= 0 {
|
||||
if j := bytes.IndexByte(body[h:], '>'); j >= 0 {
|
||||
return spliceAt(body, script, h+j+1)
|
||||
}
|
||||
}
|
||||
if b := bytes.Index(low, []byte("<body")); b >= 0 {
|
||||
return spliceAt(body, script, b)
|
||||
}
|
||||
return body
|
||||
}
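The placement rules listed in the comment (guard check, after `<head …>`, else before `<body`, else unchanged) can be sketched as a standalone program. Several names here are assumptions: `guard` is a made-up marker because the real `bannerGuard` value is not shown in the diff, and `spliceAt` is a hypothetical re-implementation of the helper the diff calls. One deliberate deviation: the sketch lowers ASCII letters only, because `bytes.ToLower` can change the byte length of non-ASCII input, which would shift offsets computed on the lowered copy. That is a choice made for this sketch, not what the diff does.

```go
package main

import (
	"bytes"
	"fmt"
)

// guard is a hypothetical marker standing in for bannerGuard.
const guard = "sbx-banner"

// asciiLower lowers A-Z only, so the output has the same length as the
// input and byte offsets stay aligned with the original.
func asciiLower(in []byte) []byte {
	out := make([]byte, len(in))
	for i, c := range in {
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A'
		}
		out[i] = c
	}
	return out
}

// spliceAt returns a new slice with ins inserted into body at offset i.
func spliceAt(body, ins []byte, i int) []byte {
	out := make([]byte, 0, len(body)+len(ins))
	out = append(out, body[:i]...)
	out = append(out, ins...)
	return append(out, body[i:]...)
}

// inject applies the guard check and the head/body placement rules.
func inject(body []byte, script string) []byte {
	if script == "" || bytes.Contains(body, []byte(guard)) {
		return body
	}
	tag := []byte("<!-- " + guard + " --><script>" + script + "</script>")
	low := asciiLower(body)
	if h := bytes.Index(low, []byte("<head")); h >= 0 {
		if j := bytes.IndexByte(body[h:], '>'); j >= 0 {
			return spliceAt(body, tag, h+j+1)
		}
	}
	if b := bytes.Index(low, []byte("<body")); b >= 0 {
		return spliceAt(body, tag, b)
	}
	return body
}

func main() {
	fmt.Println(string(inject([]byte(`<HEAD lang="en"><title>x</title></HEAD>`), "1;")))
	fmt.Println(string(inject([]byte(`<p>fragment</p>`), "1;")))
}
```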
|
||||
|
||||
// ── /__toolbox/* reverse-proxy to the portal ─────────────────────────────────
|
||||
|
||||
// isToolboxAssetPath reports whether a request path is one of the banner assets
|
||||
|
|
|
|||
|
|
@ -10,10 +10,19 @@
|
|||
package main
|
||||
|
||||
import (
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"strings"
|
||||
"testing"
|
||||
)
|
||||
|
||||
// inlineTestScript is a stand-in for the COMPLETE inline banner body that
|
||||
// fetchInlineBanner pulls from the portal. The Go engine treats it as an opaque
|
||||
// string (the JS literal-baking is the portal's job, covered by the Python
|
||||
// tests); these tests only assert placement / idempotency / fail-open. Shared
|
||||
// across banner_test, gzip_test, compress_test, cosmetic_test.
|
||||
const inlineTestScript = `(function(){window.__SBX_LOADER__=1;})();`
|
||||
|
||||
func TestInjectLoaderGuardIdempotent(t *testing.T) {
|
||||
// Body already carrying the guard → returned byte-for-byte unchanged.
|
||||
body := []byte("<html><head><!-- " + bannerGuard + " --><script></script></head><body>hi</body></html>")
|
||||
|
|
@ -130,3 +139,141 @@ func TestPortalTargetURL(t *testing.T) {
|
|||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ── #662 inline banner (SW-immune; supersedes injectLoader in the live path) ──
|
||||
|
||||
func TestInjectInlineBannerEmptyScriptNoop(t *testing.T) {
|
||||
// scriptBody == "" (fetch failed/skipped) → no inject, body unchanged.
|
||||
body := []byte(`<html><head></head><body>hi</body></html>`)
|
||||
out := injectInlineBanner(body, "")
|
||||
if string(out) != string(body) {
|
||||
t.Fatalf("empty scriptBody must be a no-op.\n got: %s", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestInjectInlineBannerGuardIdempotent(t *testing.T) {
|
||||
// Body already carrying the guard → returned byte-for-byte unchanged.
|
||||
body := []byte("<html><head><!-- " + bannerGuard + " --><script></script></head><body>hi</body></html>")
|
||||
out := injectInlineBanner(body, inlineTestScript)
|
||||
if string(out) != string(body) {
|
||||
t.Fatalf("guarded body must be unchanged.\n got: %s", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestInjectInlineBannerHeadInsertion(t *testing.T) {
|
||||
body := []byte(`<html><head lang="en"><title>x</title></head><body>hi</body></html>`)
|
||||
out := string(injectInlineBanner(body, inlineTestScript))
|
||||
headOpen := `<head lang="en">`
|
||||
idx := strings.Index(out, headOpen)
|
||||
if idx < 0 {
|
||||
t.Fatalf("head open lost: %s", out)
|
||||
}
|
||||
after := out[idx+len(headOpen):]
|
||||
// An INLINE <script> (not <script src), carrying the body verbatim, right
|
||||
// after the <head>'s '>'.
|
||||
wantTag := `<!-- ` + bannerGuard + ` --><script>` + inlineTestScript + `</script>`
|
||||
if !strings.HasPrefix(after, wantTag) {
|
||||
t.Fatalf("inline tag not inserted right after <head>'s '>'.\n got: %s", after)
|
||||
}
|
||||
if strings.Contains(out, "<script src=") {
|
||||
t.Fatalf("inline banner must NOT be a <script src> tag: %s", out)
|
||||
}
|
||||
if !strings.Contains(out, wantTag+`<title>x</title>`) {
|
||||
t.Fatalf("original head content displaced: %s", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestInjectInlineBannerBodyFallback(t *testing.T) {
|
||||
body := []byte(`<html><body class="x">hi</body></html>`)
|
||||
out := string(injectInlineBanner(body, inlineTestScript))
|
||||
wantTag := `<!-- ` + bannerGuard + ` --><script>` + inlineTestScript + `</script>`
|
||||
if !strings.Contains(out, wantTag+`<body class="x">`) {
|
||||
t.Fatalf("inline tag not inserted right before <body>.\n got: %s", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestInjectInlineBannerNeitherHeadNorBody(t *testing.T) {
|
||||
body := []byte(`<p>just a fragment</p>`)
|
||||
out := injectInlineBanner(body, inlineTestScript)
|
||||
if string(out) != string(body) {
|
||||
t.Fatalf("no head/body → must be unchanged.\n got: %s", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestInjectInlineBannerCaseInsensitiveHead(t *testing.T) {
|
||||
body := []byte(`<HTML><HEAD></HEAD><BODY>hi</BODY></HTML>`)
|
||||
out := string(injectInlineBanner(body, inlineTestScript))
|
||||
if !strings.Contains(out, `<HEAD><!-- `+bannerGuard) {
|
||||
t.Fatalf("case-insensitive <HEAD> match failed: %s", out)
|
||||
}
|
||||
}
|
||||
|
||||
func TestFetchInlineBannerOK(t *testing.T) {
|
||||
// Portal returns a body + 200 → fetchInlineBanner returns (body, true) and
|
||||
// echoes mh/wg/csp into the query.
|
||||
var gotQuery string
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
gotQuery = r.URL.RawQuery
|
||||
w.Header().Set("Content-Type", "application/javascript")
|
||||
_, _ = w.Write([]byte(inlineTestScript))
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
body, ok := fetchInlineBanner(srv.URL, "deadbeef", true, true)
|
||||
if !ok {
|
||||
t.Fatal("fetchInlineBanner must report ok=true on a 200")
|
||||
}
|
||||
if body != inlineTestScript {
|
||||
t.Fatalf("fetchInlineBanner body mismatch: %q", body)
|
||||
}
|
||||
for _, want := range []string{"mh=deadbeef", "wg=1", "csp=1"} {
|
||||
if !strings.Contains(gotQuery, want) {
|
||||
t.Fatalf("query %q missing %q", gotQuery, want)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestFetchInlineBannerWGCSPZero(t *testing.T) {
|
||||
var gotQuery string
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
gotQuery = r.URL.RawQuery
|
||||
_, _ = w.Write([]byte(inlineTestScript))
|
||||
}))
|
||||
defer srv.Close()
|
||||
if _, ok := fetchInlineBanner(srv.URL, "x", false, false); !ok {
|
||||
t.Fatal("ok=true expected")
|
||||
}
|
||||
for _, want := range []string{"wg=0", "csp=0"} {
|
||||
if !strings.Contains(gotQuery, want) {
|
||||
t.Fatalf("query %q missing %q", gotQuery, want)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestFetchInlineBannerFailOpenDeadPortal(t *testing.T) {
|
||||
// A dead portal (closed listener) → fail-open: ("", false) → caller skips the
|
||||
// inject and serves the page intact. No panic, no error surfaced.
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(http.ResponseWriter, *http.Request) {}))
|
||||
url := srv.URL
|
||||
srv.Close() // close BEFORE the fetch → dial error
|
||||
|
||||
body, ok := fetchInlineBanner(url, "x", false, false)
|
||||
if ok {
|
||||
t.Fatal("dead portal must fail open (ok=false)")
|
||||
}
|
||||
if body != "" {
|
||||
t.Fatalf("fail-open body must be empty, got %q", body)
|
||||
}
|
||||
}
|
||||
|
||||
func TestFetchInlineBannerNon2xxFailOpen(t *testing.T) {
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusInternalServerError)
|
||||
_, _ = w.Write([]byte("boom"))
|
||||
}))
|
||||
defer srv.Close()
|
||||
body, ok := fetchInlineBanner(srv.URL, "x", false, false)
|
||||
if ok || body != "" {
|
||||
t.Fatalf("non-2xx must fail open: ok=%v body=%q", ok, body)
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -90,7 +90,7 @@ func TestInjectIntoBodyBrotli(t *testing.T) {
|
|||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
- out, ok := injectIntoBody(enc, "br", "abc123", true, false)
+ out, ok := injectIntoBody(enc, "br", inlineTestScript, true)
|
||||
if !ok {
|
||||
t.Fatal("br inject must report ok=true")
|
||||
}
|
||||
|
|
@ -113,7 +113,7 @@ func TestInjectIntoBodyZstd(t *testing.T) {
|
|||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
- out, ok := injectIntoBody(enc, "zstd", "abc123", true, false)
+ out, ok := injectIntoBody(enc, "zstd", inlineTestScript, true)
|
||||
if !ok {
|
||||
t.Fatal("zstd inject must report ok=true")
|
||||
}
|
||||
|
|
@ -132,7 +132,7 @@ func TestInjectIntoBodyZstd(t *testing.T) {
|
|||
|
||||
func TestInjectIntoBodyBrotliCaseInsensitive(t *testing.T) {
|
||||
enc, _ := brotliBytes([]byte(`<head></head>`))
|
||||
- out, ok := injectIntoBody(enc, "BR", "z", false, false)
+ out, ok := injectIntoBody(enc, "BR", inlineTestScript, false)
|
||||
if !ok {
|
||||
t.Fatal("Content-Encoding BR (upper) must be recognised → ok=true")
|
||||
}
|
||||
|
|
@ -147,7 +147,7 @@ func TestInjectIntoBodyBrotliCaseInsensitive(t *testing.T) {
|
|||
|
||||
func TestInjectIntoBodyBrotliFailOpen(t *testing.T) {
|
||||
bad := []byte("not brotli at all <head></head>")
|
||||
- out, ok := injectIntoBody(bad, "br", "x", false, false)
+ out, ok := injectIntoBody(bad, "br", inlineTestScript, false)
|
||||
if ok {
|
||||
t.Fatal("corrupt br body must fail open (ok=false)")
|
||||
}
|
||||
|
|
@ -158,7 +158,7 @@ func TestInjectIntoBodyBrotliFailOpen(t *testing.T) {
|
|||
|
||||
func TestInjectIntoBodyZstdFailOpen(t *testing.T) {
|
||||
bad := []byte("not zstd at all <head></head>")
|
||||
- out, ok := injectIntoBody(bad, "zstd", "x", false, false)
+ out, ok := injectIntoBody(bad, "zstd", inlineTestScript, false)
|
||||
if ok {
|
||||
t.Fatal("corrupt zstd body must fail open (ok=false)")
|
||||
}
|
||||
|
|
@ -177,7 +177,7 @@ func TestBrotliZstdBombGuard(t *testing.T) {
|
|||
t.Fatal("unbrotliBytes must reject output exceeding gunzipCap")
|
||||
}
|
||||
// fail-open through the inject path.
|
||||
- if out, ok := injectIntoBody(brBomb, "br", "x", false, false); ok || !bytes.Equal(out, brBomb) {
+ if out, ok := injectIntoBody(brBomb, "br", inlineTestScript, false); ok || !bytes.Equal(out, brBomb) {
|
||||
t.Fatal("over-cap br body must fail open with original bytes")
|
||||
}
|
||||
|
||||
|
|
@ -188,7 +188,7 @@ func TestBrotliZstdBombGuard(t *testing.T) {
|
|||
if _, err := unzstdBytes(zsBomb); err == nil {
|
||||
t.Fatal("unzstdBytes must reject output exceeding gunzipCap")
|
||||
}
|
||||
- if out, ok := injectIntoBody(zsBomb, "zstd", "x", false, false); ok || !bytes.Equal(out, zsBomb) {
+ if out, ok := injectIntoBody(zsBomb, "zstd", inlineTestScript, false); ok || !bytes.Equal(out, zsBomb) {
|
||||
t.Fatal("over-cap zstd body must fail open with original bytes")
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -132,27 +132,32 @@ func TestInjectCosmeticCaseInsensitive(t *testing.T) {
|
|||
}
|
||||
}
|
||||
|
||||
- func TestInjectLoaderAndCosmeticCompose(t *testing.T) {
+ func TestInjectInlineBannerAndCosmeticCompose(t *testing.T) {
  // Both markers must be present after composing the two injects (wg client).
+ // #662 — the banner is now the INLINE script (not a <script src> tag).
  body := []byte(`<html><head></head><body>hi</body></html>`)
- out := string(injectHTML(body, "deadbeef", true, false))
+ out := string(injectHTML(body, inlineTestScript, true))
  if !strings.Contains(out, bannerGuard) {
- t.Fatalf("loader marker missing after compose: %s", out)
+ t.Fatalf("banner marker missing after compose: %s", out)
  }
  if !strings.Contains(out, cosmeticGuard) {
  t.Fatalf("cosmetic marker missing after compose: %s", out)
  }
- if !strings.Contains(out, `data-mh="deadbeef"`) {
- t.Fatalf("loader data-mh missing after compose: %s", out)
+ // The inline banner is an inline <script> carrying the baked body, NOT a src.
+ if !strings.Contains(out, "<script>"+inlineTestScript+"</script>") {
+ t.Fatalf("inline banner body missing after compose: %s", out)
  }
+ if strings.Contains(out, "<script src=") {
+ t.Fatalf("inline path must NOT emit a <script src> tag: %s", out)
+ }
|
||||
}
|
||||
|
||||
func TestInjectHTMLNonWGSkipsCosmetic(t *testing.T) {
|
||||
- // Non-WG (non-R3) clients get the loader but NOT the cosmetic style.
+ // Non-WG (non-R3) clients get the banner but NOT the cosmetic style.
  body := []byte(`<html><head></head><body>hi</body></html>`)
- out := string(injectHTML(body, "x", false, false))
+ out := string(injectHTML(body, inlineTestScript, false))
  if !strings.Contains(out, bannerGuard) {
- t.Fatalf("loader marker missing for non-wg: %s", out)
+ t.Fatalf("banner marker missing for non-wg: %s", out)
|
||||
}
|
||||
if strings.Contains(out, cosmeticGuard) {
|
||||
t.Fatalf("cosmetic style must NOT be injected for non-wg client: %s", out)
|
||||
|
|
@ -163,7 +168,7 @@ func TestInjectIntoBodyGzipCarriesCosmetic(t *testing.T) {
|
|||
// The gzip decompress→inject→recompress path must carry BOTH injects for wg.
|
||||
body := []byte(`<html><head></head><body>hi</body></html>`)
|
||||
gz := gzipBytes(body)
|
||||
- out, ok := injectIntoBody(gz, "gzip", "mh1", true, false)
+ out, ok := injectIntoBody(gz, "gzip", inlineTestScript, true)
|
||||
if !ok {
|
||||
t.Fatalf("injectIntoBody(gzip) returned ok=false")
|
||||
}
|
||||
|
|
@ -174,4 +179,8 @@ func TestInjectIntoBodyGzipCarriesCosmetic(t *testing.T) {
|
|||
if !strings.Contains(string(plain), bannerGuard) || !strings.Contains(string(plain), cosmeticGuard) {
|
||||
t.Fatalf("gzip path lost a marker: %s", plain)
|
||||
}
|
||||
// The inline banner script body survives the gzip round-trip.
|
||||
if !strings.Contains(string(plain), "<script>"+inlineTestScript+"</script>") {
|
||||
t.Fatalf("inline banner body lost on gzip path: %s", plain)
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -146,31 +146,39 @@ func zstdBytes(in []byte) ([]byte, error) {
|
|||
}
|
||||
|
||||
// injectHTML applies BOTH HTML transforms in one pass over the DECOMPRESSED
|
||||
- // body: the transparency-banner loader (always) AND, for R3 (wg) clients, the
- // ad/popup-hiding cosmetic <style> (#662 — the cutover left this unported). Both
- // are idempotent (own guard markers) and order-independent; running them in the
- // same decompressed step means the cosmetic style benefits from the gzip
- // handling exactly like the loader. The cosmetic style is gated to wg because it
- // is an R3-tunnel opt-in behaviour (mirrors the Python addon's _is_r3plus gate).
- func injectHTML(plain []byte, clientHash string, wg, cspBypassed bool) []byte {
- out := injectLoader(plain, clientHash, wg, cspBypassed)
+ // body: the transparency-banner (always, via the INLINE script) AND, for R3 (wg)
+ // clients, the ad/popup-hiding cosmetic <style> (#662 — the cutover left this
+ // unported). Both are idempotent (own guard markers) and order-independent;
+ // running them in the same decompressed step means the cosmetic style benefits
+ // from the gzip handling exactly like the banner. The cosmetic style is gated to
+ // wg because it is an R3-tunnel opt-in behaviour (mirrors the Python addon's
+ // _is_r3plus gate).
+ //
+ // #662 — scriptBody is the COMPLETE inline banner IIFE pre-fetched server-side
+ // from the portal (fetchInlineBanner). We INLINE it (injectInlineBanner) instead
+ // of a <script src="/__toolbox/loader.js"> tag so a site's SERVICE WORKER has no
+ // same-origin request to hijack. An empty scriptBody (fetch failed/skipped) makes
+ // the banner inject a no-op — fail-open, page intact. The cosmetic <style> is
+ // already inline and SW-immune, so it is UNCHANGED.
+ func injectHTML(plain []byte, scriptBody string, wg bool) []byte {
+ out := injectInlineBanner(plain, scriptBody)
|
||||
if wg {
|
||||
out = injectCosmetic(out)
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
- // injectIntoBody runs the HTML injection (loader + R3 cosmetic style) over a
- // (possibly gzip-compressed) HTML body, returning the new body bytes to serve
- // and whether the body was rewritten. cspBypassed (#662) is threaded into the
- // loader tag as data-csp="1" when a real CSP was relaxed on this page.
+ // injectIntoBody runs the HTML injection (inline banner + R3 cosmetic style) over
+ // a (possibly compressed) HTML body, returning the new body bytes to serve and
+ // whether the body was rewritten. scriptBody (#662) is the COMPLETE inline banner
+ // IIFE pre-fetched from the portal; "" → the banner inject is skipped (fail-open).
|
||||
//
|
||||
// - encoding == "" (identity): injectHTML runs directly on body; the result
|
||||
// is returned (ok=true). The caller MUST update Content-Length to len(out).
|
||||
// - encoding ∈ {gzip, br, zstd} (case-insensitive): the body is decoded,
|
||||
// injected, then RE-ENCODED in the SAME codec so the client transfer stays
|
||||
// compressed (the tunnel is perf-sensitive) and Content-Encoding is
|
||||
- // UNCHANGED. The caller sets Content-Length to len(out). BOTH the loader and
+ // UNCHANGED. The caller sets Content-Length to len(out). BOTH the banner and
|
||||
// the cosmetic style are injected on the decompressed body, so the cosmetic
|
||||
// CSS lands on compressed pages too (the common case).
|
||||
// - any other encoding (deflate, multi-value, …): pass through untouched,
|
||||
|
|
@ -181,23 +189,23 @@ func injectHTML(plain []byte, clientHash string, wg, cspBypassed bool) []byte {
|
|||
// never broken or corrupted.
|
||||
//
|
||||
// The 32MiB decompression-bomb cap (gunzipCap) is enforced uniformly across
|
||||
- // gzip/br/zstd. idempotency / placement live inside injectLoader/injectCosmetic.
- func injectIntoBody(body []byte, encoding, clientHash string, wg, cspBypassed bool) (out []byte, ok bool) {
+ // gzip/br/zstd. idempotency / placement live inside injectInlineBanner/injectCosmetic.
+ func injectIntoBody(body []byte, encoding, scriptBody string, wg bool) (out []byte, ok bool) {
|
||||
switch strings.ToLower(strings.TrimSpace(encoding)) {
|
||||
case "":
|
||||
- return injectHTML(body, clientHash, wg, cspBypassed), true
+ return injectHTML(body, scriptBody, wg), true
|
||||
case "gzip":
|
||||
plain, err := gunzipBytes(body)
|
||||
if err != nil {
|
||||
return body, false // fail open: serve the original compressed bytes
|
||||
}
|
||||
- return gzipBytes(injectHTML(plain, clientHash, wg, cspBypassed)), true
+ return gzipBytes(injectHTML(plain, scriptBody, wg)), true
|
||||
case "br":
|
||||
plain, err := unbrotliBytes(body)
|
||||
if err != nil {
|
||||
return body, false // fail open
|
||||
}
|
||||
- reenc, err := brotliBytes(injectHTML(plain, clientHash, wg, cspBypassed))
+ reenc, err := brotliBytes(injectHTML(plain, scriptBody, wg))
|
||||
if err != nil {
|
||||
return body, false // fail open: never serve a truncated br frame
|
||||
}
|
||||
|
|
@ -207,7 +215,7 @@ func injectIntoBody(body []byte, encoding, clientHash string, wg, cspBypassed bo
|
|||
if err != nil {
|
||||
return body, false // fail open
|
||||
}
|
||||
- reenc, err := zstdBytes(injectHTML(plain, clientHash, wg, cspBypassed))
+ reenc, err := zstdBytes(injectHTML(plain, scriptBody, wg))
|
||||
if err != nil {
|
||||
return body, false // fail open: never serve a truncated zstd frame
|
||||
}
|
||||
|
|
|
|||
|
|
@ -44,7 +44,7 @@ func TestInjectIntoBodyGzip(t *testing.T) {
|
|||
// End-to-end-ish: HTML with <head>, gzipped, run through the exact transform
|
||||
// the inject path uses. Result must gunzip back to an injected, intact doc.
|
||||
html := `<html><head><title>page</title></head><body>content</body></html>`
|
||||
- out, ok := injectIntoBody(gzipBytes([]byte(html)), "gzip", "abc123", true, false)
+ out, ok := injectIntoBody(gzipBytes([]byte(html)), "gzip", inlineTestScript, true)
|
||||
if !ok {
|
||||
t.Fatal("gzip inject must report ok=true")
|
||||
}
|
||||
|
|
@ -68,7 +68,7 @@ func TestInjectIntoBodyGzip(t *testing.T) {
|
|||
|
||||
func TestInjectIntoBodyGzipCaseInsensitiveEncoding(t *testing.T) {
|
||||
html := `<head></head>`
|
||||
- out, ok := injectIntoBody(gzipBytes([]byte(html)), "GZIP", "z", false, false)
+ out, ok := injectIntoBody(gzipBytes([]byte(html)), "GZIP", inlineTestScript, false)
|
||||
if !ok {
|
||||
t.Fatal("Content-Encoding GZIP (upper) must be recognised → ok=true")
|
||||
}
|
||||
|
|
@ -85,7 +85,7 @@ func TestInjectIntoBodyGzipFailOpen(t *testing.T) {
|
|||
// Bytes labelled gzip but NOT gzip → fail open: original bytes, ok=false,
|
||||
// no panic.
|
||||
bad := []byte("not gzip at all <head></head>")
|
||||
- out, ok := injectIntoBody(bad, "gzip", "x", false, false)
+ out, ok := injectIntoBody(bad, "gzip", inlineTestScript, false)
|
||||
if ok {
|
||||
t.Fatal("corrupt gzip body must fail open (ok=false)")
|
||||
}
|
||||
|
|
@ -97,7 +97,7 @@ func TestInjectIntoBodyGzipFailOpen(t *testing.T) {
|
|||
func TestInjectIntoBodyIdentity(t *testing.T) {
|
||||
// Identity (empty Content-Encoding): inject directly, grown body returned.
|
||||
html := []byte(`<html><head></head><body>hi</body></html>`)
|
||||
- out, ok := injectIntoBody(html, "", "deadbeef", false, false)
+ out, ok := injectIntoBody(html, "", inlineTestScript, false)
|
||||
if !ok {
|
||||
t.Fatal("identity inject must report ok=true")
|
||||
}
|
||||
|
|
@ -113,7 +113,7 @@ func TestInjectIntoBodyUnknownEncodingPassthrough(t *testing.T) {
|
|||
// #662 — gzip/br/zstd are now ALL decoded+re-encoded; deflate (and any other
|
||||
// codec / multi-value AE) remains an unknown encoding we pass through.
|
||||
body := []byte("\x78\x9c some deflate-ish bytes")
|
||||
- out, ok := injectIntoBody(body, "deflate", "x", false, false)
+ out, ok := injectIntoBody(body, "deflate", inlineTestScript, false)
|
||||
if ok {
|
||||
t.Fatal("unknown encoding must pass through (ok=false)")
|
||||
}
|
||||
|
|
@ -131,7 +131,7 @@ func TestGunzipBombGuard(t *testing.T) {
|
|||
t.Fatal("gunzipBytes must reject output exceeding gunzipCap")
|
||||
}
|
||||
// And via the inject path: fail open, original bytes preserved.
|
||||
- out, ok := injectIntoBody(big, "gzip", "x", false, false)
+ out, ok := injectIntoBody(big, "gzip", inlineTestScript, false)
|
||||
if ok {
|
||||
t.Fatal("over-cap gzip body must fail open through injectIntoBody")
|
||||
}
|
||||
|
|
|
|||
|
|
@ -199,12 +199,26 @@ func ja4ish(h *tls.ClientHelloInfo) string {
|
|||
type Proxy struct {
|
||||
ca *CA
|
||||
pol *Policy
|
||||
- jaSink func(string) // JA4 observations (logged; a sidecar in prod)
- jarKey []byte // anti-track HMAC fake-identity seed (nil → poison off)
- poison bool // master gate: poison tracker Set-Cookies (default on when jarKey present)
- portal string // portal base URL for /__toolbox/* reverse-proxy (banner assets)
- ads *adStats // #662 — ad-block metrics aggregator (flushed to the portal)
- cspDemo bool // #662 CONSENTED-DEMONSTRATION: relax a page's CSP so the injected loader runs, and flag the bypass (data-csp=1 → 🔓). Default on.
+ jaSink func(string) // JA4 observations (logged; a sidecar in prod)
+ jarKey []byte // anti-track HMAC fake-identity seed (nil → poison off)
+ poison bool // master gate: poison tracker Set-Cookies (default on when jarKey present)
+ portal string // portal base URL for /__toolbox/* reverse-proxy (banner assets)
+ ads *adStats // #662 — ad-block metrics aggregator (flushed to the portal)
+ cand *adCandidates // #662 — ad-candidate learning feed (flushed with ads to the portal)
+ cspDemo bool // #662 CONSENTED-DEMONSTRATION: relax a page's CSP so the injected loader runs, and flag the bypass (data-csp=1 → 🔓). Default on.
+
+ // analysisRelay gates the per-flow telemetry relay to the dpi/cookies/ja4
+ // analysis sidecar sockets (#662 — restoring the "Qui te piste?" events the
+ // decommissioned Python addons fed). Default on; relay.go is the transport.
+ analysisRelay bool
+
+ // socialRelay gates the cross-site cookie-tracker correlation (#662 — restoring
+ // the kbin /social graph the decommissioned Python social_graph addon fed).
+ // Default on. social.go is the engine; edges are batched + POSTed to the
+ // portal's /__toolbox/social-event ingest. nil → off (CONNECT PoC / tests).
+ socialRelayOn bool
+ social *socialRelay
+ consent *consentLog
|
||||
}
|
||||
|
||||
// recordAdBlock forwards a 204'd ad/tracker block to the engine's metrics
|
||||
|
|
@ -216,12 +230,52 @@ func (px *Proxy) recordAdBlock(adHost, site, macHash string) {
|
|||
}
|
||||
}
|
||||
|
||||
// maybeRecordAdCandidate feeds the auto-learn loop (#662): on the allow/mitm
|
||||
// path (NOT block — already caught; NOT allowlisted/own-infra), it records an
|
||||
// ad-candidate (host, site) when the request is 3rd-party
|
||||
// (registrable(host) != registrable(site)) AND the path smells like an ad/track
|
||||
// endpoint (adPathRE). It is the engine port of ad_ghost's candidate capture —
|
||||
// the feed secubox-toolbox-autolearn promotes into learned-trackers.txt at
|
||||
// AD_MIN_SITES distinct sites. Gated behind the analysis/ad relay flag, O(1) hot
|
||||
// path, fire-and-forget, nil-safe (CONNECT PoC / tests with no feed).
|
||||
func (px *Proxy) maybeRecordAdCandidate(host, site, path string) {
|
||||
if px == nil || px.cand == nil || !px.relayEnabled() || px.pol == nil {
|
||||
return
|
||||
}
|
||||
if site == "" || host == "" {
|
||||
return // no 1st-party context (no Referer) → nothing to attribute.
|
||||
}
|
||||
if px.pol.allowedSafe(host) {
|
||||
return // own-infra / allowlist: never learn our own / trusted hosts.
|
||||
}
|
||||
if registrable(host) == registrable(site) {
|
||||
return // 1st-party request: not a cross-site ad/track signal.
|
||||
}
|
||||
if !adPathRE.MatchString(path) {
|
||||
return // path doesn't look like an ad/track endpoint.
|
||||
}
|
||||
px.cand.record(host, site)
|
||||
}
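The third-party plus ad-path test in `maybeRecordAdCandidate` can be sketched in isolation as follows. `naiveRegistrable` and `demoAdPathRE` are hypothetical stand-ins (the engine's `registrable` and `adPathRE` are not shown in this hunk), and the last-two-labels rule is an assumption that is wrong for multi-label public suffixes such as `co.uk`.

```go
package main

import (
	"regexp"
	"strings"
)

// demoAdPathRE is a hypothetical pattern, not the engine's adPathRE.
var demoAdPathRE = regexp.MustCompile(`(?i)/(ads?|pixel|track(ing)?|beacon|collect)(/|\.|\?|$)`)

// naiveRegistrable keeps the last two DNS labels. Assumption: good enough for
// a demo, wrong for multi-label public suffixes (co.uk, com.au, ...).
func naiveRegistrable(host string) string {
	h := strings.Trim(strings.ToLower(host), ".")
	parts := strings.Split(h, ".")
	if len(parts) <= 2 {
		return h
	}
	return strings.Join(parts[len(parts)-2:], ".")
}

// isAdCandidate reports whether (host, site, path) would be recorded: both
// sides known, cross-site, and an ad/track-looking path.
func isAdCandidate(host, site, path string) bool {
	if host == "" || site == "" {
		return false // no first-party context to attribute to
	}
	if naiveRegistrable(host) == naiveRegistrable(site) {
		return false // first-party request
	}
	return demoAdPathRE.MatchString(path)
}
```

The allowlist check is left out of the sketch on purpose; in the hunk above it runs before the registrable comparison, via `allowedSafe`.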
|
||||
|
||||
func (px *Proxy) serverTLSConfig() *tls.Config {
|
||||
return px.serverTLSConfigCapture(nil)
|
||||
}
|
||||
|
||||
// serverTLSConfigCapture is serverTLSConfig with an extra per-handshake hook:
|
||||
// capture, if non-nil, is invoked inside GetCertificate with the live
|
||||
// *tls.ClientHelloInfo (SNI, SupportedProtos, CipherSuites). The accept-path
|
||||
// handlers use it to relay the ja4 ClientHello payload (relay.go) WITH the
|
||||
// client conn's peer IP — which is known at the handler, not inside the TLS
|
||||
// config. Passing nil yields the plain forging config (CONNECT PoC, tests).
|
||||
func (px *Proxy) serverTLSConfigCapture(capture func(*tls.ClientHelloInfo)) *tls.Config {
|
||||
return &tls.Config{
|
||||
GetCertificate: func(h *tls.ClientHelloInfo) (*tls.Certificate, error) {
|
||||
if px.jaSink != nil {
|
||||
px.jaSink(ja4ish(h)) // capture handshake fingerprint
|
||||
}
|
||||
if capture != nil {
|
||||
capture(h) // ja4 relay material (peer IP threaded in by the handler)
|
||||
}
|
||||
name := h.ServerName
|
||||
if name == "" {
|
||||
name = "unknown.local"
|
||||
|
|
@ -231,6 +285,38 @@ func (px *Proxy) serverTLSConfig() *tls.Config {
|
|||
}
|
||||
}
|
||||
|
||||
// peerIP returns the remote IP (no port) of a client conn, the same basis as
|
||||
// clientHashFromConn. Used as the client_ip field of every relay payload.
|
||||
func peerIP(conn net.Conn) string {
|
||||
if conn == nil {
|
||||
return ""
|
||||
}
|
||||
host, _, err := net.SplitHostPort(conn.RemoteAddr().String())
|
||||
if err != nil {
|
||||
return conn.RemoteAddr().String()
|
||||
}
|
||||
return host
|
||||
}
|
||||
|
||||
// captureAndEmitJA4 returns a GetCertificate capture hook that relays the ja4
|
||||
// ClientHello payload for THIS handshake (once), tagged with the given client
|
||||
// conn's peer IP + mac-hash-aware clientHash. Gated by analysisRelay (emitJA4
|
||||
// checks). The hook copies the ClientHelloInfo fields it needs immediately
|
||||
// (the struct is only valid during the callback). Returns nil when the relay is
|
||||
// off so the plain config is used (no per-handshake allocation).
|
||||
func (px *Proxy) captureAndEmitJA4(rawClient net.Conn) func(*tls.ClientHelloInfo) {
|
||||
if !px.relayEnabled() {
|
||||
return nil
|
||||
}
|
||||
ip := peerIP(rawClient)
|
||||
hash := clientHashFromConn(rawClient)
|
||||
return func(h *tls.ClientHelloInfo) {
|
||||
alpn := append([]string(nil), h.SupportedProtos...)
|
||||
ciphers := append([]uint16(nil), h.CipherSuites...)
|
||||
px.emitJA4(ip, hash, h.ServerName, alpn, ciphers)
|
||||
}
|
||||
}
|
||||
|
||||
func (px *Proxy) handleConnect(w http.ResponseWriter, r *http.Request) {
|
||||
host := r.URL.Hostname()
|
||||
hj, ok := w.(http.Hijacker)
|
||||
|
|
@ -262,7 +348,9 @@ func (px *Proxy) handleConnect(w http.ResponseWriter, r *http.Request) {
|
|||
}
|
||||
|
||||
// MITM: TLS-terminate the client with a forged cert (+ ClientHello capture).
|
||||
tconn := tls.Server(client, px.serverTLSConfig())
|
||||
// The capture hook relays the ja4 ClientHello payload for this handshake,
|
||||
// tagged with the client's peer IP (#662). nil when the relay gate is off.
|
||||
tconn := tls.Server(client, px.serverTLSConfigCapture(px.captureAndEmitJA4(client)))
|
||||
if err := tconn.Handshake(); err != nil {
|
||||
return
|
||||
}
|
||||
|
|
@ -326,6 +414,12 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
|
|||
// per-client breakdown keys on the WG persona hash. recordAdBlock is
|
||||
// O(1) and never blocks the block path.
|
||||
px.recordAdBlock(host, refererSite(req.Header.Get("Referer")), clientHashFromConn(rawClient))
|
||||
// #662 — the cross-site tracking evidence lives PRECISELY on the blocked
|
||||
// trackers: the browser still SENT its 3rd-party Cookie to doubleclick/
|
||||
// adnxs/… before we 204 it. Correlate that request-Cookie here (resp=nil,
|
||||
// request-only) or the /social graph misses the very trackers it exists to
|
||||
// expose. Hash-only, WG-peer only, fire-and-forget — same as the allow path.
|
||||
px.emitSocial(peerIP(rawClient), host, req, nil)
|
||||
writeRaw(tconn, 204, "No Content", map[string]string{"X-SecuBox-Ng": "blocked"}, nil)
|
||||
return
|
||||
}
|
||||
|
|
@ -339,6 +433,24 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
|
|||
// allow — stripping operator headers + asserting opt-out is universally
|
||||
// safe and never touches own-infra correctness).
|
||||
clientHash := clientHashFromConn(rawClient) // mac_hash-aware (WG persona)
|
||||
|
||||
// #662 — relay the DPI classification hint for this MITM'd request (allow|mitm
|
||||
// only; never the block 204 / splice paths). Fire-and-forget BEFORE anonymize
|
||||
// mutates headers, so we relay the client's original User-Agent (the Python
|
||||
// DPIRelay ran on the unmodified request). Gated by --analysis-relay; a
|
||||
// dead/slow dpi.sock can never block or delay the proxy flow.
|
||||
relayIP := peerIP(rawClient)
|
||||
px.emitDPI(relayIP, clientHash, host, req)
|
||||
|
||||
// #662 — feed the auto-learn loop: on this allow/mitm flow, record an
|
||||
// ad-candidate when the request is 3rd-party AND its path smells like an
|
||||
// ad/track endpoint (ad_ghost's _AD_PATH heuristic). site = registrable of
|
||||
// the Referer (the ad_ghost _site_of flavour). Done BEFORE anonymize mutates
|
||||
// headers (so the Referer is the client's original). O(1), gated,
|
||||
// fire-and-forget — a new adware host gets observed here, promoted by
|
||||
// autolearn, then blocked+smogged after the policy live-reloads it.
|
||||
px.maybeRecordAdCandidate(host, refererSite(req.Header.Get("Referer")), req.URL.Path)
|
||||
|
||||
anonymizeRequest(req.Header)
|
||||
|
||||
// #662 — do NOT touch Accept-Encoding. We FORWARD the client's original
|
||||
|
|
@ -379,6 +491,24 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
|
|||
}
|
||||
defer resp.Body.Close()
|
||||
|
||||
// #662 — relay the cookie metadata for this MITM'd response (allow|mitm only).
|
||||
// NAMES ONLY (never values — privacy/CSPN); no-op unless ≥1 Set-Cookie OR ≥1
|
||||
// request Cookie is present. Emitted before poison rewrites Set-Cookie VALUES,
|
||||
// which is irrelevant here (names are unchanged by poison) but keeps the
|
||||
// relayed names byte-for-byte the origin's. Fire-and-forget, gated.
|
||||
px.emitCookies(relayIP, clientHash, req, resp)
|
||||
|
||||
// #662 — cross-site cookie-tracker correlation (restores the kbin /social
|
||||
// graph). FAITHFUL to the decommissioned Python social_graph addon: extract
|
||||
// 3rd-party cookie edges (Set-Cookie + request Cookie), hash the identifier
|
||||
// (cookieIDHash — NEVER the raw value), classify consent_state, and buffer
|
||||
// them for the batched POST to the portal /__toolbox/social-event ingest.
|
||||
// Like the addon, this ONLY fires for known R3 WG peers (macHashOf, not the
|
||||
// raw-IP fallback): non-WG flows yield no edges. allow|mitm only (the block
|
||||
// 204 / splice paths return before here). Gated by --social-relay; pure +
|
||||
// non-blocking (the flush is a background goroutine).
|
||||
px.emitSocial(relayIP, host, req, resp)
|
||||
|
||||
// Poison: only on MITM'd tracker flows (never on allow/own-infra), and only
|
||||
// when the jar key is loaded. Replaces tracking-id Set-Cookie values with a
|
||||
// stable fabricated persona; benign cookies pass through untouched.
|
||||
|
|
@ -409,16 +539,26 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
|
|||
strings.Contains(resp.Header.Get("Content-Type"), "text/html") {
|
||||
// #662 CONSENTED-DEMONSTRATION — ONLY here, on the responses we actually
|
||||
// inject into (2xx text/html, R3/wg gate), and ONLY when the operator
|
||||
// left the demo on, do we relax the page's CSP so the same-origin
|
||||
// /__toolbox/loader.js can execute even on strict-CSP sites. cspBypassed
|
||||
// is true iff there was a real CSP to bypass — it becomes data-csp="1" on
|
||||
// the loader tag and the portal banner renders a 🔓 as the visible proof.
|
||||
// We never strip CSP on non-injected responses.
|
||||
// left the demo on, do we relax the page's CSP so the inline banner can
|
||||
// run even on strict-CSP sites. cspBypassed is true iff there was a real
|
||||
// CSP to bypass — it becomes csp=1 on the inline script and the banner
|
||||
// renders a 🔓 as the visible proof. We never strip CSP on non-injected
|
||||
// responses.
|
||||
cspBypassed := false
|
||||
if px.cspDemo {
|
||||
cspBypassed = relaxCSPForLoader(resp.Header)
|
||||
}
|
||||
if out, ok := injectIntoBody(body, resp.Header.Get("Content-Encoding"), clientHash, wg, cspBypassed); ok {
|
||||
// #662 — INLINE the banner (supersedes the <script src="/__toolbox/
|
||||
// loader.js"> tag): sites with a SERVICE WORKER (leparisien, cnn…) hijack
|
||||
// the same-origin src + its fetch("/__toolbox/bundle") before they reach
|
||||
// this engine, so the banner never appeared. We fetch the COMPLETE script
|
||||
// body from the portal server-side (mh/wg/csp + bundle baked as JS
|
||||
// literals — no same-origin request for the SW to touch) and bake it into
|
||||
// a self-contained <script>…</script>. Fail-open: a dead/slow portal →
|
||||
// scriptBody=="" → the banner inject is skipped and the page is served
|
||||
// intact (the cosmetic <style>, already inline, is unaffected).
|
||||
scriptBody, _ := fetchInlineBanner(px.portal, clientHash, wg, cspBypassed)
|
||||
if out, ok := injectIntoBody(body, resp.Header.Get("Content-Encoding"), scriptBody, wg); ok {
|
||||
body = out
|
||||
// Keep the response framing consistent with the served bytes. The
|
||||
// encoding is unchanged (gzip stays gzip, identity stays identity);
|
||||
|
|
@ -428,6 +568,11 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
|
|||
resp.ContentLength = int64(len(body))
|
||||
}
|
||||
}
|
||||
// #662 — strip Alt-Svc so the browser is never told this origin offers HTTP/3
|
||||
// (h3). With h3 unadvertised it keeps using HTTP/2 over TCP, which we MITM;
|
||||
// otherwise it caches "h3 available" and keeps trying QUIC (UDP 443) — which
|
||||
// bypasses this TCP proxy and is only best-effort blocked by the nft reject.
|
||||
resp.Header.Del("Alt-Svc")
|
||||
writeResponse(tconn, resp, body)
|
||||
}
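The Alt-Svc strip above is a single `Header.Del` call; the sketch below only illustrates its effect on a sample header set. `stripAltSvc` and `demoStripAltSvc` are hypothetical helpers, and the `Alt-Svc` value is an invented example.

```go
package main

import "net/http"

// stripAltSvc removes the Alt-Svc response header so the client is not told
// that an HTTP/3 endpoint exists. It reports whether the header was present.
func stripAltSvc(h http.Header) bool {
	_, had := h["Alt-Svc"]
	h.Del("Alt-Svc")
	return had
}

// demoStripAltSvc builds a sample header set and returns what is left of it.
func demoStripAltSvc() (had bool, altSvc, contentType string) {
	h := http.Header{}
	h.Set("Alt-Svc", `h3=":443"; ma=86400`)
	h.Set("Content-Type", "text/html")
	had = stripAltSvc(h)
	return had, h.Get("Alt-Svc"), h.Get("Content-Type")
}
```

Other headers are untouched, which matches the hunk: only the HTTP/3 advertisement is dropped before `writeResponse`.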
|
||||
|
||||
|
|
@ -445,6 +590,10 @@ func main() {
|
|||
"portal base URL; /__toolbox/loader.js + /__toolbox/bundle are reverse-proxied here (banner assets, served for any MITM'd origin)")
|
||||
cspDemo := flag.Bool("csp-bypass-demo", true,
|
||||
"CONSENTED DEMONSTRATION: relax a page's CSP so the injected transparency-banner loader runs even on strict-CSP sites, and flag the bypass (banner shows 🔓). Only on injected 2xx text/html R3 responses; never on non-injected responses. Set false to never touch CSP.")
|
||||
analysisRelay := flag.Bool("analysis-relay", true,
|
||||
"relay per-flow telemetry (dpi/cookies/ja4) to the analysis sidecar sockets so the kbin \"Qui te piste?\" events refill (#662; replaces the decommissioned Python relay addons). Fire-and-forget; a dead/slow sidecar never affects the proxy. Set false to emit nothing.")
|
||||
socialRelay := flag.Bool("social-relay", true,
|
||||
"compute cross-site cookie-tracker edges and POST them to the portal /__toolbox/social-event ingest so the kbin /social graph refills (#662; replaces the decommissioned Python social_graph addon). Hash-only (never raw cookie values); WG-peer flows only; batched + fire-and-forget — a dead/slow portal never affects the proxy. Set false to emit nothing.")
|
||||
flag.Parse()
|
||||
ca, err := loadCA(*caCert, *caKey)
|
||||
if err != nil {
|
||||
|
|
@ -472,12 +621,26 @@ func main() {
|
|||
poison: *poison,
|
||||
portal: *portal,
|
||||
ads: newAdStats(),
|
||||
cand: newAdCandidates(),
|
||||
cspDemo: *cspDemo,
|
||||
|
||||
analysisRelay: *analysisRelay,
|
||||
|
||||
socialRelayOn: *socialRelay,
|
||||
social: newSocialRelay(),
|
||||
consent: newConsentLog(),
|
||||
}
|
||||
// #662 — start the social-edge flusher: the MITM path buffers cross-site
|
||||
// tracker edges into px.social, drained every 10s to the portal's
|
||||
// /__toolbox/social-event (best-effort, fire-and-forget) so the kbin /social
|
||||
// graph (frozen since the cutover) refills.
|
||||
go px.social.runFlusher(*portal)
|
||||
// #662 — start the ad-block metrics flusher: the block path tallies every
|
||||
// 204 into px.ads, drained every 10s to the portal's /__toolbox/ad-event
|
||||
// (best-effort, fire-and-forget) so the #ads dashboard sees blocks again.
|
||||
go px.ads.runAdStatsFlusher(*portal)
|
||||
// #662 — the candidate feed (px.cand) is drained in the SAME flush so the
|
||||
// learning candidates ride the existing ad-event channel (one POST / 10s).
|
||||
go px.ads.runAdStatsFlusher(*portal, px.cand)
|
||||
if *transparent {
|
||||
// Transparent R3 mode: raw accept loop, each conn carries its pre-DNAT
|
||||
// destination via SO_ORIGINAL_DST (recovered in handleTransparent). The
|
||||
|
|
|
|||
|
|
@ -17,6 +17,8 @@ import (
|
|||
"os"
|
||||
"regexp"
|
||||
"strings"
|
||||
"sync"
|
||||
"time"
|
||||
)
|
||||
|
||||
// ── ad_ghost: static ad/tracker host pattern (port of _AD_HOST) ──────────────
|
||||
|
|
@ -95,19 +97,55 @@ func envOr(key, def string) string {
|
|||
// Policy carries the loaded sets/regex and decides per-host actions. It also
|
||||
// keeps the legacy PoC fields (Inject) so the existing wiring/tests still work.
|
||||
type Policy struct {
|
||||
adHost *regexp.Regexp
|
||||
learned map[string]bool // learned-trackers (host or registrable, lowercased)
|
||||
allow map[string]bool // ad-allowlist (host or registrable, lowercased)
|
||||
spliceSeed map[string]bool // splice seed patterns
|
||||
spliceLearn map[string]bool // splice learned patterns
|
||||
never map[string]bool // pure-trackers ∪ fortknox (splice never-set)
|
||||
selfRegs map[string]bool // own-infra registrable domains
|
||||
selfDomains []string // own-infra (for the host==d || host endswith .d guard)
|
||||
// mu guards the live-reloadable map fields below. Decide/shouldPoison/allowedSafe
|
||||
// take RLock (allowed/blockedByAd/shouldSplice rely on the caller's RLock); maybeReload
|
||||
// takes Lock only when a backing file changed (throttle + stat run under reloadMu).
|
||||
mu sync.RWMutex
|
||||
|
||||
adHost *regexp.Regexp
|
||||
learned map[string]bool // learned-trackers (host or registrable, lowercased)
|
||||
allow map[string]bool // ad-allowlist (host or registrable, lowercased)
|
||||
spliceSeed map[string]bool // splice seed patterns
|
||||
spliceLearn map[string]bool // splice learned patterns
|
||||
never map[string]bool // pure-trackers ∪ fortknox (splice never-set)
|
||||
selfRegs map[string]bool // own-infra registrable domains
|
||||
selfDomains []string // own-infra (for the host==d || host endswith .d guard)
|
||||
|
||||
// ── live-reload state (#662 auto-learn loop) ─────────────────────────────
|
||||
//
|
||||
// The lists are loaded once at startup, then re-read on-disk when their
|
||||
// mtime changes so autolearn promotions / manual edits take effect WITHOUT a
|
||||
// worker restart (mirrors ad_ghost._maybe_reload). The hot path (Decide)
|
||||
// calls maybeReload(): a throttle check, then — at most every reloadThrottle —
|
||||
// a cheap stat() of each backing file. Only a changed file is re-read and its
|
||||
// map atomically swapped under mu.
|
||||
reloadFiles []reloadTarget // backing files + their swap target
|
||||
fortknoxSites []string // kept for rebuilding the never-set on pure-trackers reload
|
||||
reloadMu sync.Mutex // guards lastReloadID + the per-file mtimes
|
||||
lastReloadID int64 // unix-nano of the last throttle pass (0 = never)
|
||||
reloadThrottle time.Duration // min interval between stat passes (0 in tests = eager)
|
||||
|
||||
// Legacy PoC fields kept so non-policy behaviour is unchanged.
|
||||
Inject []byte // banner / ad-CSS marker injected before </head> or </body>
|
||||
}
|
||||
|
||||
// reloadTarget describes one backing file the engine live-reloads: its path, the
|
||||
// last mtime we read, whether comment-stripping applies (loadLines vs
|
||||
// loadLinesRaw), and an applier that swaps the freshly-read set into the right
|
||||
// Policy field (under p.mu, held by the caller). pure-trackers re-derives the
|
||||
// never-set (∪ fortknox) so it stays consistent.
|
||||
type reloadTarget struct {
|
||||
path string
|
||||
stripComm bool
|
||||
lastMtime int64
|
||||
apply func(p *Policy, set map[string]bool)
|
||||
}
|
||||
|
||||
// defaultReloadThrottle is the production stat cadence: a backing-file change
|
||||
// (autolearn runs hourly; a promotion is rare) is observed within ~15s, and the
|
||||
// hot path stats at most ~4×/minute regardless of request rate.
|
||||
const defaultReloadThrottle = 15 * time.Second
|
||||
|
||||
// loadLines mirrors the comment-stripping Python loaders (splice._load_lines,
|
||||
// ad_ghost._allowed's allowlist read): split on first '#', trim, lowercase,
|
||||
// skip blanks. Missing/unreadable file → empty set (best-effort).
|
||||
|
|
@ -196,16 +234,107 @@ func LoadPolicy(opts PolicyOpts) (*Policy, error) {
|
|||
selfDomains = append(selfDomains, d)
|
||||
}
|
||||
|
||||
return &Policy{
|
||||
adHost: re,
|
||||
learned: loadLinesRaw(opts.LearnedPath), // mirrors _learned_set (no comment-strip)
|
||||
allow: loadLines(opts.AllowPath),
|
||||
spliceSeed: loadLines(opts.SpliceSeedPath),
|
||||
spliceLearn: loadLines(opts.SpliceLearnPath),
|
||||
never: never,
|
||||
selfRegs: selfRegs,
|
||||
selfDomains: selfDomains,
|
||||
}, nil
|
||||
p := &Policy{
|
||||
adHost: re,
|
||||
learned: loadLinesRaw(opts.LearnedPath), // mirrors _learned_set (no comment-strip)
|
||||
allow: loadLines(opts.AllowPath),
|
||||
spliceSeed: loadLines(opts.SpliceSeedPath),
|
||||
spliceLearn: loadLines(opts.SpliceLearnPath),
|
||||
never: never,
|
||||
selfRegs: selfRegs,
|
||||
selfDomains: selfDomains,
|
||||
fortknoxSites: append([]string(nil), opts.FortknoxSites...),
|
||||
reloadThrottle: defaultReloadThrottle,
|
||||
}
|
||||
|
||||
// ── register the live-reloadable backing files (#662 auto-learn loop) ─────
|
||||
//
|
||||
// Each entry re-reads its file when its mtime changes and atomically swaps
|
||||
// the map under p.mu (held by maybeReload). learned-trackers + ad-allowlist
|
||||
// are the load-bearing pair (autolearn promotes into learned; the operator
|
||||
// edits the allowlist); the splice seed/learned + pure-trackers files are
|
||||
// reloaded too for consistency (pure-trackers re-derives the never-set).
|
||||
p.reloadFiles = []reloadTarget{
|
||||
{path: opts.LearnedPath, stripComm: false, lastMtime: statMtime(opts.LearnedPath),
|
||||
apply: func(p *Policy, s map[string]bool) { p.learned = s }},
|
||||
{path: opts.AllowPath, stripComm: true, lastMtime: statMtime(opts.AllowPath),
|
||||
apply: func(p *Policy, s map[string]bool) { p.allow = s }},
|
||||
{path: opts.SpliceSeedPath, stripComm: true, lastMtime: statMtime(opts.SpliceSeedPath),
|
||||
apply: func(p *Policy, s map[string]bool) { p.spliceSeed = s }},
|
||||
{path: opts.SpliceLearnPath, stripComm: true, lastMtime: statMtime(opts.SpliceLearnPath),
|
||||
apply: func(p *Policy, s map[string]bool) { p.spliceLearn = s }},
|
||||
{path: opts.PureTrackersPath, stripComm: true, lastMtime: statMtime(opts.PureTrackersPath),
|
||||
apply: func(p *Policy, s map[string]bool) {
|
||||
// pure-trackers ∪ fortknox → never-set (mirrors LoadPolicy above).
|
||||
for _, fk := range p.fortknoxSites {
|
||||
if fk = strings.Trim(strings.ToLower(strings.TrimSpace(fk)), "."); fk != "" {
|
||||
s[fk] = true
|
||||
}
|
||||
}
|
||||
p.never = s
|
||||
}},
|
||||
}
|
||||
return p, nil
|
||||
}
|
||||
|
||||
// statMtime returns the file's mtime in unix-nano, or 0 when the file is missing
|
||||
// or unreadable (best-effort, like the Python loaders: a missing file → empty
|
||||
// set, mtime 0). A file appearing/disappearing therefore registers as a change.
|
||||
func statMtime(path string) int64 {
|
||||
if path == "" {
|
||||
return 0
|
||||
}
|
||||
fi, err := os.Stat(path)
|
||||
if err != nil {
|
||||
return 0
|
||||
}
|
||||
return fi.ModTime().UnixNano()
|
||||
}
|
||||
|
||||
// maybeReload re-reads any backing list whose on-disk mtime changed since the
|
||||
// last pass, swapping the affected map(s) under p.mu. Throttled to at most one
|
||||
// stat pass per p.reloadThrottle (cheap: a time compare + a few stats), so the
|
||||
// Decide hot path pays almost nothing. Concurrency-safe: the throttle/mtime
|
||||
// bookkeeping is under reloadMu and the map swap under mu — Decide's readers
|
||||
// hold mu.RLock, so a swap is atomic w.r.t. any in-flight decision.
|
||||
func (p *Policy) maybeReload() {
|
||||
now := time.Now()
|
||||
p.reloadMu.Lock()
|
||||
if p.reloadThrottle > 0 && p.lastReloadID != 0 &&
|
||||
now.Sub(time.Unix(0, p.lastReloadID)) < p.reloadThrottle {
|
||||
p.reloadMu.Unlock()
|
||||
return
|
||||
}
|
||||
p.lastReloadID = now.UnixNano()
|
||||
|
||||
// Collect the files that changed (stat + re-read under reloadMu, outside mu).
|
||||
type pending struct {
|
||||
idx int
|
||||
set map[string]bool
|
||||
}
|
||||
var changed []pending
|
||||
for i := range p.reloadFiles {
|
||||
rt := &p.reloadFiles[i]
|
||||
if rt.path == "" {
|
||||
continue
|
||||
}
|
||||
m := statMtime(rt.path)
|
||||
if m != rt.lastMtime {
|
||||
rt.lastMtime = m
|
||||
changed = append(changed, pending{idx: i, set: scanLines(rt.path, rt.stripComm)})
|
||||
}
|
||||
}
|
||||
p.reloadMu.Unlock()
|
||||
|
||||
if len(changed) == 0 {
|
||||
return
|
||||
}
|
||||
// Swap the affected maps atomically under the write lock.
|
||||
p.mu.Lock()
|
||||
for _, c := range changed {
|
||||
p.reloadFiles[c.idx].apply(p, c.set)
|
||||
}
|
||||
p.mu.Unlock()
|
||||
}
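The throttle, stat, swap sequence of `maybeReload` can be reduced to a single-file sketch. `liveSet`, `mtimeOf`, `readSet` and `demoLiveReload` are hypothetical names for illustration; the real Policy tracks several files and applies a per-file swap function.

```go
package main

import (
	"bufio"
	"os"
	"strings"
	"sync"
	"time"
)

// liveSet is a hypothetical, single-file reduction of the Policy reload state.
type liveSet struct {
	mu  sync.RWMutex // guards set
	set map[string]bool

	reloadMu  sync.Mutex // guards lastCheck + lastMtime
	path      string
	lastMtime int64
	lastCheck time.Time
	throttle  time.Duration // 0 = stat on every call
}

func mtimeOf(path string) int64 {
	fi, err := os.Stat(path)
	if err != nil {
		return 0 // a missing file reads as mtime 0
	}
	return fi.ModTime().UnixNano()
}

func readSet(path string) map[string]bool {
	out := map[string]bool{}
	f, err := os.Open(path)
	if err != nil {
		return out // best-effort: a missing file means an empty set
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if l := strings.ToLower(strings.TrimSpace(sc.Text())); l != "" {
			out[l] = true
		}
	}
	return out
}

func (s *liveSet) maybeReload() {
	s.reloadMu.Lock()
	now := time.Now()
	if s.throttle > 0 && !s.lastCheck.IsZero() && now.Sub(s.lastCheck) < s.throttle {
		s.reloadMu.Unlock()
		return
	}
	s.lastCheck = now
	m := mtimeOf(s.path)
	if m == s.lastMtime {
		s.reloadMu.Unlock()
		return
	}
	s.lastMtime = m
	fresh := readSet(s.path)
	s.reloadMu.Unlock()

	s.mu.Lock()
	s.set = fresh // swap the whole map; readers never see a partial update
	s.mu.Unlock()
}

func (s *liveSet) has(host string) bool {
	s.maybeReload() // before RLock: sync.RWMutex is not reentrant
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.set[strings.ToLower(host)]
}

// demoLiveReload writes a list, rewrites it with a later mtime, and reports
// membership of the same host before and after the rewrite.
func demoLiveReload() (before, after bool, err error) {
	f, err := os.CreateTemp("", "learned-*.txt")
	if err != nil {
		return false, false, err
	}
	path := f.Name()
	f.Close()
	defer os.Remove(path)
	if err = os.WriteFile(path, []byte("a.example\n"), 0o600); err != nil {
		return false, false, err
	}
	s := &liveSet{path: path, set: readSet(path), lastMtime: mtimeOf(path)}
	before = s.has("ads.example")
	if err = os.WriteFile(path, []byte("a.example\nads.example\n"), 0o600); err != nil {
		return before, false, err
	}
	// Force a distinct mtime so the demo does not depend on timestamp granularity.
	later := time.Now().Add(2 * time.Second)
	if err = os.Chtimes(path, later, later); err != nil {
		return before, false, err
	}
	after = s.has("ads.example")
	return before, after, nil
}
```

With a zero throttle every lookup stats the file, which is the eager mode the struct comment reserves for tests; a production value such as `defaultReloadThrottle` bounds the stat rate instead.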
|
||||
|
||||
// ── registrable: port of ad_ghost._registrable ───────────────────────────────
|
||||
|
|
@ -279,6 +408,11 @@ func hostMatches(host string, patterns map[string]bool) bool {
|
|||
|
||||
// allowed: port of ad_ghost._allowed. Own-infra ALWAYS wins (reflash-safe),
|
||||
// then the operator allowlist (host or registrable).
|
||||
//
|
||||
// LOCK CONTRACT: reads the reloadable allow map — the caller MUST hold at least
|
||||
// p.mu.RLock (Decide / shouldPoison do). Lock-free internally so Decide can call
|
||||
// it alongside shouldSplice/blockedByAd under a single RLock (sync.RWMutex is
|
||||
// not reentrant).
|
||||
func (p *Policy) allowed(host string) bool {
|
||||
h := strings.ToLower(host)
|
||||
reg := registrable(h)
|
||||
|
|
@ -297,7 +431,19 @@ func (p *Policy) allowed(host string) bool {
|
|||
return p.allow[h] || p.allow[reg]
|
||||
}
|
||||
|
||||
// allowedSafe is the lock-taking entry point to allowed() for callers OUTSIDE a
|
||||
// Decide RLock (e.g. the ad-candidate feed). It also picks up a live-reloaded
|
||||
// allowlist via maybeReload, so a freshly-allowlisted host stops being learned.
|
||||
func (p *Policy) allowedSafe(host string) bool {
|
||||
p.maybeReload()
|
||||
p.mu.RLock()
|
||||
defer p.mu.RUnlock()
|
||||
return p.allowed(host)
|
||||
}
|
||||
|
||||
// shouldSplice: port of splice.should_splice (never wins; then seed ∪ learned).
|
||||
// LOCK CONTRACT: reads the reloadable never/spliceSeed/spliceLearn maps — the
|
||||
// caller MUST hold at least p.mu.RLock (Decide does).
|
||||
func (p *Policy) shouldSplice(sni string) bool {
|
||||
s := strings.Trim(strings.ToLower(sni), ".")
|
||||
if s == "" {
|
||||
|
|
@ -312,6 +458,10 @@ func (p *Policy) shouldSplice(sni string) bool {
|
|||
// blockedByAd: port of the ad_ghost requestheaders block decision (sans the
|
||||
// allowlist guard, which Decide applies first): _AD_HOST match OR
|
||||
// registrable/host in learned-trackers.
|
||||
//
|
||||
// LOCK CONTRACT: reads the reloadable learned map — the caller MUST hold at
|
||||
// least p.mu.RLock. Decide and shouldPoison (via isTracker) do; the candidate-
|
||||
// emit path calls it only through those.
|
||||
func (p *Policy) blockedByAd(host string) bool {
|
||||
if p.adHost.MatchString(host) {
|
||||
return true
|
||||
|
|
@ -339,9 +489,16 @@ func (p *Policy) blockedByAd(host string) bool {
|
|||
// sni defaults to host when empty (the live engine splices on SNI == the TLS
|
||||
// host; for the parity harness host and sni are the same value).
|
||||
func (p *Policy) Decide(host, sni string) string {
|
||||
// #662 — pick up autolearn promotions / manual edits without a worker
|
||||
// restart. Throttled to ~every reloadThrottle and best-effort, so the hot
|
||||
// path normally pays only a time compare. Done BEFORE taking the read lock
|
||||
// (maybeReload may take the write lock to swap a changed map).
|
||||
p.maybeReload()
|
||||
if sni == "" {
|
||||
sni = host
|
||||
}
|
||||
p.mu.RLock()
|
||||
defer p.mu.RUnlock()
|
||||
if p.allowed(host) {
|
||||
return "allow"
|
||||
}
|
||||
|
|
|
|||
|
|
@ -148,6 +148,12 @@ func (p *Policy) isTracker(host string) bool {
|
|||
// allowlisted — own-infra flows are left clean (same dark safety as the block
|
||||
// path). The caller additionally requires a loaded jar key.
|
||||
func (p *Policy) shouldPoison(host string) bool {
|
||||
// #662 — consult the same live-reloaded learned set Decide uses, so a host
|
||||
// promoted into learned-trackers (by autolearn) is poisoned (smogged), not
|
||||
// only 204'd, without a worker restart. RLock-guard the reloadable maps
|
||||
// (allowed + isTracker→blockedByAd read them); maybeReload may swap them.
|
||||
p.mu.RLock()
|
||||
defer p.mu.RUnlock()
|
||||
if p.allowed(host) {
|
||||
return false // own-infra / allowlist → never poison
|
||||
}
|
||||
|
|
|
|||
291
packages/secubox-toolbox-ng/cmd/sbxmitm/relay.go
Normal file
|
|
@ -0,0 +1,291 @@
|
|||
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
|
||||
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
|
||||
//
|
||||
// SecuBox-Deb :: toolbox-ng :: per-flow analysis relay (#662)
|
||||
//
|
||||
// Restores the dpi / cookies / ja4 EVENTS that feed the kbin "Qui te piste?"
|
||||
// cumulative-stats page, frozen since the #662 Phase-7 cutover decommissioned
|
||||
// the Python mitmproxy relay addons (packages/secubox-toolbox/mitmproxy_addons/
|
||||
// {dpi,cookies,ja4}.py). The Go engine is now the live R3 MITM core; this file
|
||||
// re-implements EXACTLY what those addons did — extract privacy-safe flow
|
||||
// metadata and fire-and-forget it to the analysis sidecar sockets, which
|
||||
// enrich + write toolbox.db.events keyed by client_mac_hash.
|
||||
//
|
||||
// Transport is the existing emit() helper (sidecar.go): a detached goroutine
|
||||
// with its own 2s timeout — a dead/slow analysis socket can NEVER block, delay,
|
||||
// or break a client flow. The payload builders here are pure (no I/O), O(1)-ish
|
||||
// per flow, and emit NAMES ONLY for cookies (never values — privacy / CSPN).
|
||||
//
|
||||
// Pure standard library — no external modules.
|
||||
package main
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"net/http"
|
||||
"strings"
|
||||
"time"
|
||||
)
|
||||
|
||||
// Stable socket paths — verbatim from the Python addons' TARGET constants
|
||||
// (the http+unix:///run/secubox/<x>.sock/<route> URLs), split into path+route.
|
||||
const (
|
||||
dpiSocket = "/run/secubox/dpi.sock"
|
||||
cookiesSocket = "/run/secubox/cookies.sock"
|
||||
ja4Socket = "/run/secubox/threat-analyst.sock"
|
||||
|
||||
dpiRoute = "/classify"
|
||||
cookiesRoute = "/inject"
|
||||
ja4Route = "/ja4"
|
||||
)
|
||||
|
||||
// Caps + truncation limits, matching the Python addons exactly.
|
||||
const (
|
||||
maxSetCookieNames = 30 // cookies.py _names_only(set_cookies, cap=30)
|
||||
maxCookieNames = 50 // cookies.py sent_names[:50]
|
||||
maxCookieNameLen = 32 // cookies.py name[:32]
|
||||
maxCookieURL = 300 // cookies.py pretty_url[:300]
|
||||
)
|
||||
|
||||
// nowMS returns the current time as unix milliseconds (ts_ms in every payload).
|
||||
func nowMS() int64 { return time.Now().UnixMilli() }
|
||||
|
||||
// ── gate ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
// relayEnabled reports whether per-flow analysis relaying is on (the
|
||||
// --analysis-relay flag → Proxy.analysisRelay). When false, nothing is emitted.
|
||||
// Nil-safe so tests / the CONNECT PoC that build a bare Proxy can call it.
|
||||
func (px *Proxy) relayEnabled() bool {
|
||||
return px != nil && px.analysisRelay
|
||||
}
|
||||
|
||||
// relayEmit is the gated, fire-and-forget emit used by every relay call site.
|
||||
// It NEVER blocks (delegates to emit() which detaches a goroutine with its own
|
||||
// timeout) and emits nothing when the relay gate is off.
|
||||
func (px *Proxy) relayEmit(socketPath, route string, payload []byte) {
|
||||
if !px.relayEnabled() || len(payload) == 0 {
|
||||
return
|
||||
}
|
||||
emit(socketPath, route, payload)
|
||||
}
|
||||
|
||||
// ── dpi payload ──────────────────────────────────────────────────────────────
|
||||
|
||||
// dpiEvent mirrors the JSON the Python DPIRelay.request() emitted. user_agent is
|
||||
// a *string so an absent UA serialises to JSON null (not ""), matching
|
||||
// headers.get("user-agent") → None. scheme + sni are constant "https" / host on
|
||||
// the MITM'd path (we only relay terminated TLS flows).
|
||||
type dpiEvent struct {
|
||||
TSMs int64 `json:"ts_ms"`
|
||||
ClientIP string `json:"client_ip"`
|
||||
MacHash string `json:"client_mac_hash"`
|
||||
Host string `json:"host"`
|
||||
Scheme string `json:"scheme"`
|
||||
Method string `json:"method"`
|
||||
UserAgent *string `json:"user_agent"`
|
||||
SNI string `json:"sni"`
|
||||
}
|
||||
|
||||
// buildDPIPayload builds the /classify payload for one MITM'd request.
|
||||
func buildDPIPayload(clientIP, macHash, host string, req *http.Request) []byte {
|
||||
var ua *string
|
||||
if v := req.Header.Get("User-Agent"); v != "" {
|
||||
ua = &v
|
||||
}
|
||||
ev := dpiEvent{
|
||||
TSMs: nowMS(),
|
||||
ClientIP: clientIP,
|
||||
MacHash: macHash,
|
||||
Host: host,
|
||||
Scheme: "https",
|
||||
Method: req.Method,
|
||||
UserAgent: ua,
|
||||
SNI: host,
|
||||
}
|
||||
b, _ := json.Marshal(ev)
|
||||
return b
|
||||
}
|
||||
|
||||
// emitDPI relays the DPI classification hint for a MITM'd request (gated).
|
||||
func (px *Proxy) emitDPI(clientIP, macHash, host string, req *http.Request) {
|
||||
if !px.relayEnabled() {
|
||||
return
|
||||
}
|
||||
px.relayEmit(dpiSocket, dpiRoute, buildDPIPayload(clientIP, macHash, host, req))
|
||||
}
|
||||
|
||||
// ── cookies payload ──────────────────────────────────────────────────────────
|
||||
|
||||
// cookiesEvent mirrors the JSON the Python CookiesRelay.response() emitted.
|
||||
// NAMES ONLY — never cookie values (privacy / CSPN).
|
||||
type cookiesEvent struct {
|
||||
TSMs int64 `json:"ts_ms"`
|
||||
ClientIP string `json:"client_ip"`
|
||||
MacHash string `json:"client_mac_hash"`
|
||||
URL string `json:"url"`
|
||||
Method string `json:"method"`
|
||||
SetCookieNames []string `json:"set_cookie_names"`
|
||||
CookieNames []string `json:"cookie_names"`
|
||||
SetCookieCount int `json:"set_cookie_count"`
|
||||
CookieCount int `json:"cookie_count"`
|
||||
Status int `json:"status"`
|
||||
}
|
||||
|
||||
// cookiesRelevant reports whether a flow carries any cookie signal worth
|
||||
// relaying: ≥1 Set-Cookie in the response OR ≥1 Cookie in the request. Mirrors
|
||||
// the Python `if not (set_cookies or req_cookies): return`.
|
||||
func cookiesRelevant(req *http.Request, resp *http.Response) bool {
|
||||
if resp != nil && len(resp.Header.Values("Set-Cookie")) > 0 {
|
||||
return true
|
||||
}
|
||||
return req != nil && len(req.Header.Values("Cookie")) > 0
|
||||
}
|
||||
|
||||
// setCookieName extracts the cookie NAME from a Set-Cookie header line: the text
|
||||
// before the first '=' of the first ';'-delimited field, trimmed and capped.
|
||||
// Returns "" for attribute-only / malformed / empty-name lines (skipped).
|
||||
func setCookieName(sc string) string {
|
||||
head := sc
|
||||
if i := strings.IndexByte(sc, ';'); i >= 0 {
|
||||
head = sc[:i]
|
||||
}
|
||||
eq := strings.IndexByte(head, '=')
|
||||
if eq < 0 {
|
||||
return ""
|
||||
}
|
||||
n := strings.TrimSpace(head[:eq])
|
||||
if len(n) > maxCookieNameLen {
|
||||
n = n[:maxCookieNameLen]
|
||||
}
|
||||
return n
|
||||
}
|
||||
|
||||
// parseCookieHeaderNames splits a single "Cookie:" header value into its
|
||||
// individual cookie NAMES (text before each '=' across ';'-separated pairs),
|
||||
// trimmed + capped. Mirrors cookies.py _parse_cookie_header.
|
||||
func parseCookieHeaderNames(value string) []string {
|
||||
var names []string
|
||||
for _, part := range strings.Split(value, ";") {
|
||||
eq := strings.IndexByte(part, '=')
|
||||
if eq < 0 {
|
||||
continue
|
||||
}
|
||||
n := strings.TrimSpace(part[:eq])
|
||||
if len(n) > maxCookieNameLen {
|
||||
n = n[:maxCookieNameLen]
|
||||
}
|
||||
if n != "" {
|
||||
names = append(names, n)
|
||||
}
|
||||
}
|
||||
return names
|
||||
}
|
||||
|
||||
// setCookieNames returns the NAMES of the response Set-Cookie lines, scanning at
// most the first `limit` header lines (Python _names_only(headers[:cap])).
// The parameter is named limit so it does not shadow the builtin cap.
func setCookieNames(setCookies []string, limit int) []string {
	out := make([]string, 0, len(setCookies))
	for i, sc := range setCookies {
		if i >= limit {
			break
		}
		if n := setCookieName(sc); n != "" {
			out = append(out, n)
		}
	}
	return out
}
|
||||
|
||||
// buildCookiesPayload builds the /inject payload for one MITM'd response that
|
||||
// carries a cookie signal. The caller is expected to have checked
|
||||
// cookiesRelevant; building on an empty flow yields empty name lists.
|
||||
func buildCookiesPayload(clientIP, macHash string, req *http.Request, resp *http.Response) []byte {
|
||||
setCookies := resp.Header.Values("Set-Cookie")
|
||||
reqCookies := req.Header.Values("Cookie")
|
||||
|
||||
// Sent cookie names: flatten every Cookie header line, then cap to 50 total.
|
||||
var sent []string
|
||||
for _, ch := range reqCookies {
|
||||
sent = append(sent, parseCookieHeaderNames(ch)...)
|
||||
}
|
||||
if len(sent) > maxCookieNames {
|
||||
sent = sent[:maxCookieNames]
|
||||
}
|
||||
|
||||
u := req.URL.String()
|
||||
if len(u) > maxCookieURL {
|
||||
u = u[:maxCookieURL]
|
||||
}
|
||||
|
||||
ev := cookiesEvent{
|
||||
TSMs: nowMS(),
|
||||
ClientIP: clientIP,
|
||||
MacHash: macHash,
|
||||
URL: u,
|
||||
Method: req.Method,
|
||||
SetCookieNames: setCookieNames(setCookies, maxSetCookieNames),
|
||||
CookieNames: sent,
|
||||
SetCookieCount: len(setCookies),
|
||||
CookieCount: len(reqCookies),
|
||||
Status: resp.StatusCode,
|
||||
}
|
||||
b, _ := json.Marshal(ev)
|
||||
return b
|
||||
}
|
||||
|
||||
// emitCookies relays the cookie metadata for a MITM'd response (gated). No-op
|
||||
// when neither a Set-Cookie nor a request Cookie is present.
|
||||
func (px *Proxy) emitCookies(clientIP, macHash string, req *http.Request, resp *http.Response) {
|
||||
if !px.relayEnabled() || !cookiesRelevant(req, resp) {
|
||||
return
|
||||
}
|
||||
px.relayEmit(cookiesSocket, cookiesRoute, buildCookiesPayload(clientIP, macHash, req, resp))
|
||||
}
|
||||
|
||||
// ── ja4 payload ──────────────────────────────────────────────────────────────
|
||||
|
||||
// ja4Event mirrors the JSON the Python JA4Relay.tls_clienthello() emitted.
// alpn_protocols / cipher_suites are always JSON arrays (never null), matching
// list(ch.alpn_protocols or []). extensions is always null here: this port does
// not read an extension list from crypto/tls' ClientHelloInfo (an Extensions
// field only exists from Go 1.24 on), which gives the same result as the Python
// `if hasattr(ch, "extensions") else None` fallback (the service tolerates it).
|
||||
type ja4Event struct {
|
||||
TSMs int64 `json:"ts_ms"`
|
||||
ClientIP string `json:"client_ip"`
|
||||
MacHash string `json:"client_mac_hash"`
|
||||
SNI string `json:"sni"`
|
||||
ALPN []string `json:"alpn_protocols"`
|
||||
Ciphers []uint16 `json:"cipher_suites"`
|
||||
Extensions *[]int `json:"extensions"` // always nil → JSON null
|
||||
}
|
||||
|
||||
// buildJA4Payload builds the /ja4 payload for one MITM'd TLS ClientHello.
|
||||
func buildJA4Payload(clientIP, macHash, sni string, alpn []string, ciphers []uint16) []byte {
|
||||
if alpn == nil {
|
||||
alpn = []string{}
|
||||
}
|
||||
if ciphers == nil {
|
||||
ciphers = []uint16{}
|
||||
}
|
||||
ev := ja4Event{
|
||||
TSMs: nowMS(),
|
||||
ClientIP: clientIP,
|
||||
MacHash: macHash,
|
||||
SNI: sni,
|
||||
ALPN: alpn,
|
||||
Ciphers: ciphers,
|
||||
Extensions: nil,
|
||||
}
|
||||
b, _ := json.Marshal(ev)
|
||||
return b
|
||||
}
|
||||
|
||||
// emitJA4 relays the captured ClientHello fingerprint material for a MITM'd
|
||||
// handshake (gated). Called once per handshake, before Decide — so blocked and
|
||||
// allowed flows alike are relayed, matching the Python addon which ran on every
|
||||
// tls_clienthello.
|
||||
func (px *Proxy) emitJA4(clientIP, macHash, sni string, alpn []string, ciphers []uint16) {
|
||||
if !px.relayEnabled() {
|
||||
return
|
||||
}
|
||||
px.relayEmit(ja4Socket, ja4Route, buildJA4Payload(clientIP, macHash, sni, alpn, ciphers))
|
||||
}
|
||||
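The payload structs above rely on how `encoding/json` treats nil pointers and nil versus empty slices. The following is a minimal sketch of that behaviour, not part of the diff; `sketchEvent` and `encode` are hypothetical names introduced only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sketchEvent is a hypothetical struct (not from the diff) with the same
// field shapes the relay payloads use.
type sketchEvent struct {
	UserAgent  *string  `json:"user_agent"` // nil pointer encodes as null
	ALPN       []string `json:"alpn"`       // nil slice encodes as null, empty slice as []
	Extensions *[]int   `json:"extensions"` // nil pointer encodes as null
}

// encode marshals the event to a string. The error is ignored because these
// field types cannot fail to marshal.
func encode(ev sketchEvent) string {
	b, _ := json.Marshal(ev)
	return string(b)
}

func main() {
	ua := "Mozilla/5.0"
	// Zero value: every field is nil, so every field is null.
	fmt.Println(encode(sketchEvent{}))
	// A set pointer and an explicitly empty slice: string and [].
	fmt.Println(encode(sketchEvent{UserAgent: &ua, ALPN: []string{}}))
}
```

This is why a builder that must always emit an array has to swap a nil slice for an empty one before marshalling, while a field that should be null when absent can stay a nil pointer.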
355
packages/secubox-toolbox-ng/cmd/sbxmitm/relay_test.go
Normal file
|
|
@ -0,0 +1,355 @@
|
|||
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
|
||||
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
|
||||
//
|
||||
// Unit tests for the per-flow analysis relay payload builders + emit wiring
|
||||
// (#662 — restoring the dpi/cookies/ja4 events that feed "Qui te piste?").
|
||||
package main
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"net"
|
||||
"net/http"
|
||||
"net/url"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"testing"
|
||||
"time"
|
||||
)
|
||||
|
||||
// ── dpi payload ──────────────────────────────────────────────────────────────
|
||||
|
||||
func TestBuildDPIPayload(t *testing.T) {
|
||||
req, _ := http.NewRequest("GET", "https://tracker.example.com/pixel?x=1", nil)
|
||||
req.Header.Set("User-Agent", "Mozilla/5.0 (X11)")
|
||||
p := buildDPIPayload("203.0.113.7", "abcd1234", "tracker.example.com", req)
|
||||
|
||||
var m map[string]any
|
||||
if err := json.Unmarshal(p, &m); err != nil {
|
||||
t.Fatalf("unmarshal: %v\n%s", err, p)
|
||||
}
|
||||
if m["client_ip"] != "203.0.113.7" {
|
||||
t.Errorf("client_ip = %v", m["client_ip"])
|
||||
}
|
||||
if m["client_mac_hash"] != "abcd1234" {
|
||||
t.Errorf("client_mac_hash = %v", m["client_mac_hash"])
|
||||
}
|
||||
if m["host"] != "tracker.example.com" {
|
||||
t.Errorf("host = %v", m["host"])
|
||||
}
|
||||
if m["scheme"] != "https" {
|
||||
t.Errorf("scheme = %v", m["scheme"])
|
||||
}
|
||||
if m["method"] != "GET" {
|
||||
t.Errorf("method = %v", m["method"])
|
||||
}
|
||||
if m["user_agent"] != "Mozilla/5.0 (X11)" {
|
||||
t.Errorf("user_agent = %v", m["user_agent"])
|
||||
}
|
||||
if m["sni"] != "tracker.example.com" {
|
||||
t.Errorf("sni = %v", m["sni"])
|
||||
}
|
||||
// ts_ms present and plausible (a recent unix-millis value).
|
||||
ts, ok := m["ts_ms"].(float64)
|
||||
if !ok || ts < 1_600_000_000_000 {
|
||||
t.Errorf("ts_ms = %v (want recent unix millis)", m["ts_ms"])
|
||||
}
|
||||
}
|
||||
|
||||
// Absent User-Agent → JSON null (not "" and not omitted), mirroring the Python
|
||||
// addon's headers.get("user-agent") → None.
|
||||
func TestBuildDPIPayloadNullUserAgent(t *testing.T) {
|
||||
req, _ := http.NewRequest("GET", "https://h.example/", nil)
|
||||
p := buildDPIPayload("1.2.3.4", "h", "h.example", req)
|
||||
if !strings.Contains(string(p), `"user_agent":null`) {
|
||||
t.Errorf("expected user_agent null, got: %s", p)
|
||||
}
|
||||
}
|
||||
|
||||
// ── cookies payload ──────────────────────────────────────────────────────────
|
||||
|
||||
func TestBuildCookiesPayloadNamesOnly(t *testing.T) {
|
||||
req, _ := http.NewRequest("POST", "https://shop.example.com/cart", nil)
|
||||
req.Header.Add("Cookie", "sessionid=SECRET_VALUE; csrftoken=ANOTHER_SECRET")
|
||||
req.Header.Add("Cookie", "_ga=GA1.2.deadbeef")
|
||||
resp := &http.Response{StatusCode: 200, Header: http.Header{}}
|
||||
resp.Header.Add("Set-Cookie", "_fbp=fb.1.SECRET; Path=/; HttpOnly; SameSite=Lax")
|
||||
resp.Header.Add("Set-Cookie", "uid=PRIVATE; Domain=.example.com")
|
||||
|
||||
p := buildCookiesPayload("10.99.1.5", "wgpersona", req, resp)
|
||||
var m map[string]any
|
||||
if err := json.Unmarshal(p, &m); err != nil {
|
||||
t.Fatalf("unmarshal: %v\n%s", err, p)
|
||||
}
|
||||
if m["url"] != "https://shop.example.com/cart" {
|
||||
t.Errorf("url = %v", m["url"])
|
||||
}
|
||||
if m["method"] != "POST" {
|
||||
t.Errorf("method = %v", m["method"])
|
||||
}
|
||||
if int(m["status"].(float64)) != 200 {
|
||||
t.Errorf("status = %v", m["status"])
|
||||
}
|
||||
if int(m["set_cookie_count"].(float64)) != 2 {
|
||||
t.Errorf("set_cookie_count = %v", m["set_cookie_count"])
|
||||
}
|
||||
if int(m["cookie_count"].(float64)) != 2 {
|
||||
t.Errorf("cookie_count (header lines) = %v", m["cookie_count"])
|
||||
}
|
||||
setNames := toStrings(m["set_cookie_names"])
|
||||
if !equalStrSet(setNames, []string{"_fbp", "uid"}) {
|
||||
t.Errorf("set_cookie_names = %v", setNames)
|
||||
}
|
||||
cookieNames := toStrings(m["cookie_names"])
|
||||
if !equalStrSet(cookieNames, []string{"sessionid", "csrftoken", "_ga"}) {
|
||||
t.Errorf("cookie_names = %v", cookieNames)
|
||||
}
|
||||
// Hard privacy guarantee: NO value leaked anywhere in the payload.
|
||||
raw := string(p)
|
||||
for _, secret := range []string{"SECRET_VALUE", "ANOTHER_SECRET", "deadbeef", "fb.1.SECRET", "PRIVATE", "GA1.2"} {
|
||||
if strings.Contains(raw, secret) {
|
||||
t.Errorf("payload leaked cookie value %q: %s", secret, raw)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Set-Cookie name parse: text before the first '='. Cookie header split on ';'.
|
||||
func TestCookieNameParsing(t *testing.T) {
|
||||
if got := setCookieName("name=val; Path=/; Secure"); got != "name" {
|
||||
t.Errorf("setCookieName = %q", got)
|
||||
}
|
||||
if got := setCookieName(" spaced = v"); got != "spaced" {
|
||||
t.Errorf("setCookieName trim = %q", got)
|
||||
}
|
||||
if got := setCookieName("=novalue"); got != "" {
|
||||
t.Errorf("setCookieName empty name = %q", got)
|
||||
}
|
||||
if got := setCookieName("attributeonly"); got != "" {
|
||||
t.Errorf("setCookieName no eq = %q", got)
|
||||
}
|
||||
|
||||
names := parseCookieHeaderNames("a=1; b=2;c=3")
|
||||
if !equalStrSet(names, []string{"a", "b", "c"}) {
|
||||
t.Errorf("parseCookieHeaderNames = %v", names)
|
||||
}
|
||||
}
|
||||
|
||||
// Caps: ≤30 Set-Cookie names, ≤50 sent cookie names.
|
||||
func TestCookiesPayloadCaps(t *testing.T) {
|
||||
req, _ := http.NewRequest("GET", "https://e.example/", nil)
|
||||
var bigCookie strings.Builder
|
||||
for i := 0; i < 80; i++ {
|
||||
if i > 0 {
|
||||
bigCookie.WriteString("; ")
|
||||
}
|
||||
bigCookie.WriteString("c")
|
||||
bigCookie.WriteByte(byte('0' + i%10))
|
||||
bigCookie.WriteString("_")
|
||||
bigCookie.WriteByte(byte('a' + i%26))
|
||||
bigCookie.WriteString("=v")
|
||||
}
|
||||
req.Header.Add("Cookie", bigCookie.String())
|
||||
resp := &http.Response{StatusCode: 200, Header: http.Header{}}
|
||||
for i := 0; i < 45; i++ {
|
||||
resp.Header.Add("Set-Cookie", "sc"+string(rune('A'+i%26))+string(rune('0'+i%10))+"=v")
|
||||
}
|
||||
p := buildCookiesPayload("1.1.1.1", "h", req, resp)
|
||||
var m map[string]any
|
||||
if err := json.Unmarshal(p, &m); err != nil {
t.Fatalf("unmarshal: %v\n%s", err, p)
}
|
||||
if n := len(toStrings(m["set_cookie_names"])); n > 30 {
|
||||
t.Errorf("set_cookie_names not capped at 30: %d", n)
|
||||
}
|
||||
if n := len(toStrings(m["cookie_names"])); n > 50 {
|
||||
t.Errorf("cookie_names not capped at 50: %d", n)
|
||||
}
|
||||
// raw counts still reflect the real totals.
|
||||
if int(m["set_cookie_count"].(float64)) != 45 {
|
||||
t.Errorf("set_cookie_count = %v", m["set_cookie_count"])
|
||||
}
|
||||
}
|
||||
|
||||
// URL truncated to ≤300 chars.
|
||||
func TestCookiesPayloadURLTruncation(t *testing.T) {
|
||||
long := "https://e.example/" + strings.Repeat("a", 500)
|
||||
u, _ := url.Parse(long)
|
||||
req := &http.Request{Method: "GET", URL: u, Header: http.Header{}}
|
||||
req.Header.Add("Cookie", "x=1")
|
||||
resp := &http.Response{StatusCode: 200, Header: http.Header{}}
|
||||
p := buildCookiesPayload("1.1.1.1", "h", req, resp)
|
||||
var m map[string]any
|
||||
if err := json.Unmarshal(p, &m); err != nil {
t.Fatalf("unmarshal: %v\n%s", err, p)
}
|
||||
if len(m["url"].(string)) > 300 {
|
||||
t.Errorf("url not truncated: %d chars", len(m["url"].(string)))
|
||||
}
|
||||
}
|
||||
|
||||
// cookiesRelevant gates emission: only when ≥1 Set-Cookie OR ≥1 Cookie.
|
||||
func TestCookiesRelevant(t *testing.T) {
|
||||
mk := func(setC, reqC bool) (*http.Request, *http.Response) {
|
||||
req, _ := http.NewRequest("GET", "https://e/", nil)
|
||||
if reqC {
|
||||
req.Header.Add("Cookie", "a=1")
|
||||
}
|
||||
resp := &http.Response{StatusCode: 200, Header: http.Header{}}
|
||||
if setC {
|
||||
resp.Header.Add("Set-Cookie", "x=1")
|
||||
}
|
||||
return req, resp
|
||||
}
|
||||
if r, p := mk(false, false); cookiesRelevant(r, p) {
|
||||
t.Error("no cookies → should not be relevant")
|
||||
}
|
||||
if r, p := mk(true, false); !cookiesRelevant(r, p) {
|
||||
t.Error("set-cookie present → relevant")
|
||||
}
|
||||
if r, p := mk(false, true); !cookiesRelevant(r, p) {
|
||||
t.Error("request cookie present → relevant")
|
||||
}
|
||||
}
|
||||
|
||||
// ── ja4 payload ──────────────────────────────────────────────────────────────
|
||||
|
||||
func TestBuildJA4Payload(t *testing.T) {
|
||||
p := buildJA4Payload("198.51.100.9", "tlspersona", "secure.example.com",
|
||||
[]string{"h2", "http/1.1"}, []uint16{4865, 4866, 49195})
|
||||
var m map[string]any
|
||||
if err := json.Unmarshal(p, &m); err != nil {
|
||||
t.Fatalf("unmarshal: %v\n%s", err, p)
|
||||
}
|
||||
if m["sni"] != "secure.example.com" {
|
||||
t.Errorf("sni = %v", m["sni"])
|
||||
}
|
||||
if m["client_ip"] != "198.51.100.9" {
|
||||
t.Errorf("client_ip = %v", m["client_ip"])
|
||||
}
|
||||
if m["client_mac_hash"] != "tlspersona" {
|
||||
t.Errorf("client_mac_hash = %v", m["client_mac_hash"])
|
||||
}
|
||||
alpn := toStrings(m["alpn_protocols"])
|
||||
if !equalStrSet(alpn, []string{"h2", "http/1.1"}) {
|
||||
t.Errorf("alpn = %v", alpn)
|
||||
}
|
||||
cs := m["cipher_suites"].([]any)
|
||||
if len(cs) != 3 || int(cs[0].(float64)) != 4865 {
|
||||
t.Errorf("cipher_suites = %v", cs)
|
||||
}
|
||||
// extensions: always null (stdlib doesn't expose them).
|
||||
if !strings.Contains(string(p), `"extensions":null`) {
|
||||
t.Errorf("expected extensions null, got: %s", p)
|
||||
}
|
||||
}
|
||||
|
||||
// Empty ALPN / ciphers → JSON empty arrays (mirrors list(... or [])), not null.
|
||||
func TestBuildJA4PayloadEmptySlices(t *testing.T) {
|
||||
p := buildJA4Payload("1.1.1.1", "h", "", nil, nil)
|
||||
raw := string(p)
|
||||
if !strings.Contains(raw, `"alpn_protocols":[]`) {
|
||||
t.Errorf("alpn should be [] not null: %s", raw)
|
||||
}
|
||||
if !strings.Contains(raw, `"cipher_suites":[]`) {
|
||||
t.Errorf("cipher_suites should be [] not null: %s", raw)
|
||||
}
|
||||
}
|
||||
|
||||
// ── gate wiring ──────────────────────────────────────────────────────────────
|
||||
|
||||
// The flag wires into Proxy.analysisRelay and gates emission.
|
||||
func TestAnalysisRelayGate(t *testing.T) {
|
||||
on := &Proxy{analysisRelay: true}
|
||||
off := &Proxy{analysisRelay: false}
|
||||
if !on.relayEnabled() {
|
||||
t.Error("analysisRelay=true → relayEnabled() should be true")
|
||||
}
|
||||
if off.relayEnabled() {
|
||||
t.Error("analysisRelay=false → relayEnabled() should be false")
|
||||
}
|
||||
}
|
||||
|
||||
// relayEmit (the shared path behind emitDPI/emitCookies/emitJA4) respects the
// gate: with analysisRelay=false it delivers nothing to a live socket; with it
// true it delivers.
|
||||
func TestEmitGateRespected(t *testing.T) {
|
||||
sock := filepath.Join(t.TempDir(), "dpi.sock")
|
||||
ln, err := net.Listen("unix", sock)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
defer ln.Close()
|
||||
hits := make(chan struct{}, 4)
|
||||
go func() {
|
||||
for {
|
||||
c, err := ln.Accept()
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
buf := make([]byte, 1024)
|
||||
c.Read(buf)
|
||||
c.Write([]byte("HTTP/1.1 204 No Content\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"))
|
||||
c.Close()
|
||||
hits <- struct{}{}
|
||||
}
|
||||
}()
|
||||
|
||||
// Gate off → nothing delivered.
|
||||
off := &Proxy{analysisRelay: false}
|
||||
off.relayEmit(sock, "/classify", []byte(`{"k":"v"}`))
|
||||
select {
|
||||
case <-hits:
|
||||
t.Fatal("gate off but a payload was delivered")
|
||||
case <-time.After(300 * time.Millisecond):
|
||||
}
|
||||
|
||||
// Gate on → delivered.
|
||||
on := &Proxy{analysisRelay: true}
|
||||
on.relayEmit(sock, "/classify", []byte(`{"k":"v"}`))
|
||||
select {
|
||||
case <-hits:
|
||||
case <-time.After(2 * time.Second):
|
||||
t.Fatal("gate on but nothing delivered")
|
||||
}
|
||||
}
|
||||
|
||||
// ── socket-path consts ─────────────────────────────────────────────────────
|
||||
|
||||
func TestRelaySocketPaths(t *testing.T) {
|
||||
if dpiSocket != "/run/secubox/dpi.sock" {
|
||||
t.Errorf("dpiSocket = %q", dpiSocket)
|
||||
}
|
||||
if cookiesSocket != "/run/secubox/cookies.sock" {
|
||||
t.Errorf("cookiesSocket = %q", cookiesSocket)
|
||||
}
|
||||
if ja4Socket != "/run/secubox/threat-analyst.sock" {
|
||||
t.Errorf("ja4Socket = %q", ja4Socket)
|
||||
}
|
||||
}
|
||||
|
||||
// ── test helpers ───────────────────────────────────────────────────────────
|
||||
|
||||
func toStrings(v any) []string {
|
||||
arr, ok := v.([]any)
|
||||
if !ok {
|
||||
return nil
|
||||
}
|
||||
out := make([]string, 0, len(arr))
|
||||
for _, e := range arr {
|
||||
out = append(out, e.(string))
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func equalStrSet(got, want []string) bool {
|
||||
if len(got) != len(want) {
|
||||
return false
|
||||
}
|
||||
seen := map[string]int{}
|
||||
for _, g := range got {
|
||||
seen[g]++
|
||||
}
|
||||
for _, w := range want {
|
||||
seen[w]--
|
||||
}
|
||||
for _, n := range seen {
|
||||
if n != 0 {
|
||||
return false
|
||||
}
|
||||
}
|
||||
return true
|
||||
}
|
||||
189
packages/secubox-toolbox-ng/cmd/sbxmitm/reload_test.go
Normal file
|
|
@ -0,0 +1,189 @@
|
|||
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
|
||||
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
|
||||
//
|
||||
// SecuBox-Deb :: toolbox-ng :: policy live-reload tests (#662 auto-learn loop)
|
||||
//
|
||||
// The #662 Go cutover loaded the BLOCK/SPLICE lists ONCE at startup, so an
|
||||
// autolearn promotion (or a manual edit) of learned-trackers.txt never took
|
||||
// effect until a worker restart — the very thing that made new adwares slip
|
||||
// through forever. These tests prove the mtime-based live-reload: after the
|
||||
// throttle window, a host appended to learned-trackers.txt flips Decide from
|
||||
// "mitm" to "block" with NO restart. Concurrency is exercised under -race.
|
||||
package main
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"sync"
|
||||
"sync/atomic"
|
||||
"testing"
|
||||
"time"
|
||||
)
|
||||
|
||||
// writeFile is a tiny helper that (re)writes a backing list file with content.
|
||||
func writeFile(t *testing.T, path, content string) {
|
||||
t.Helper()
|
||||
if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
|
||||
t.Fatalf("write %s: %v", path, err)
|
||||
}
|
||||
}
|
||||
|
||||
// bumpMtime forces the file's mtime forward so the reload's stat sees a change
|
||||
// even on coarse-granularity filesystems or sub-second test runs.
|
||||
func bumpMtime(t *testing.T, path string, d time.Duration) {
|
||||
t.Helper()
|
||||
ft := time.Now().Add(d)
|
||||
if err := os.Chtimes(path, ft, ft); err != nil {
|
||||
t.Fatalf("chtimes %s: %v", path, err)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMaybeReloadPicksUpAppendedLearnedTracker is the linchpin test: a host that
|
||||
// initially Decides "mitm" must flip to "block" once it is appended to
|
||||
// learned-trackers.txt and the throttle window elapses — without reloading the
|
||||
// Policy from scratch.
|
||||
func TestMaybeReloadPicksUpAppendedLearnedTracker(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
learned := filepath.Join(dir, "learned-trackers.txt")
|
||||
allow := filepath.Join(dir, "ad-allowlist.txt")
|
||||
writeFile(t, learned, "")
|
||||
writeFile(t, allow, "")
|
||||
|
||||
pol, err := LoadPolicy(PolicyOpts{
|
||||
LearnedPath: learned,
|
||||
AllowPath: allow,
|
||||
// keep the splice/never paths in the temp dir so missing-file behaviour
|
||||
// (empty set) is deterministic.
|
||||
SpliceSeedPath: filepath.Join(dir, "seed"),
|
||||
SpliceLearnPath: filepath.Join(dir, "slearn"),
|
||||
PureTrackersPath: filepath.Join(dir, "pure"),
|
||||
SelfDomains: []string{"secubox.in"},
|
||||
})
|
||||
if err != nil {
|
||||
t.Fatalf("LoadPolicy: %v", err)
|
||||
}
|
||||
// Make the reload eager for the test (no 15s wait): zero throttle.
|
||||
pol.reloadThrottle = 0
|
||||
|
||||
const host = "acotedemoi.com"
|
||||
if got := pol.Decide(host, host); got != "mitm" {
|
||||
t.Fatalf("before promotion: Decide(%q) = %q, want mitm", host, got)
|
||||
}
|
||||
|
||||
// Promote: append the host and bump mtime forward.
|
||||
writeFile(t, learned, host+"\n")
|
||||
bumpMtime(t, learned, 2*time.Second)
|
||||
|
||||
if got := pol.Decide(host, host); got != "block" {
|
||||
t.Fatalf("after promotion: Decide(%q) = %q, want block", host, got)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMaybeReloadThrottled proves the throttle: with a non-zero throttle window,
|
||||
// a change made just after a reload is NOT observed until the window elapses,
|
||||
// keeping the hot path cheap (one stat per ~window, not per request).
|
||||
func TestMaybeReloadThrottled(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
learned := filepath.Join(dir, "learned-trackers.txt")
|
||||
writeFile(t, learned, "")
|
||||
|
||||
pol, err := LoadPolicy(PolicyOpts{LearnedPath: learned, AllowPath: filepath.Join(dir, "allow")})
|
||||
if err != nil {
|
||||
t.Fatalf("LoadPolicy: %v", err)
|
||||
}
|
||||
pol.reloadThrottle = time.Hour // effectively "never re-stat during the test"
|
||||
|
||||
// Prime the throttle clock with one Decide (does the initial stat).
|
||||
_ = pol.Decide("x.example", "x.example")
|
||||
|
||||
const host = "tracker.example"
|
||||
writeFile(t, learned, host+"\n")
|
||||
bumpMtime(t, learned, 2*time.Second)
|
||||
|
||||
if got := pol.Decide(host, host); got != "mitm" {
|
||||
t.Fatalf("throttled: Decide(%q) = %q, want mitm (change not yet observed)", host, got)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMaybeReloadAllowlist proves the allowlist file is live-reloaded too: a
|
||||
// host the ad-host regex would block ("doubleclick.net") flips block→allow once
|
||||
// appended to the allowlist and the window elapses.
|
||||
func TestMaybeReloadAllowlist(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
learned := filepath.Join(dir, "learned-trackers.txt")
|
||||
allow := filepath.Join(dir, "ad-allowlist.txt")
|
||||
writeFile(t, learned, "")
|
||||
writeFile(t, allow, "")
|
||||
|
||||
pol, err := LoadPolicy(PolicyOpts{LearnedPath: learned, AllowPath: allow})
|
||||
if err != nil {
|
||||
t.Fatalf("LoadPolicy: %v", err)
|
||||
}
|
||||
pol.reloadThrottle = 0
|
||||
|
||||
const host = "doubleclick.net"
|
||||
if got := pol.Decide(host, host); got != "block" {
|
||||
t.Fatalf("before allow: Decide(%q) = %q, want block", host, got)
|
||||
}
|
||||
writeFile(t, allow, host+"\n")
|
||||
bumpMtime(t, allow, 2*time.Second)
|
||||
if got := pol.Decide(host, host); got != "allow" {
|
||||
t.Fatalf("after allow: Decide(%q) = %q, want allow", host, got)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMaybeReloadConcurrent runs Decide from many goroutines while the backing
|
||||
// learned file is rewritten concurrently. Under `go test -race` this proves the
|
||||
// RWMutex-guarded swap is data-race-free.
|
||||
func TestMaybeReloadConcurrent(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
learned := filepath.Join(dir, "learned-trackers.txt")
|
||||
writeFile(t, learned, "seed.example\n")
|
||||
|
||||
pol, err := LoadPolicy(PolicyOpts{LearnedPath: learned, AllowPath: filepath.Join(dir, "allow")})
|
||||
if err != nil {
|
||||
t.Fatalf("LoadPolicy: %v", err)
|
||||
}
|
||||
pol.reloadThrottle = 0 // force a stat on every Decide → maximal contention
|
||||
|
||||
var wg sync.WaitGroup
|
||||
var blocks int64
|
||||
stop := make(chan struct{})
|
||||
|
||||
// Writer: keep appending hosts + bumping mtime.
|
||||
wg.Add(1)
|
||||
go func() {
|
||||
defer wg.Done()
|
||||
i := 0
|
||||
for {
|
||||
select {
|
||||
case <-stop:
|
||||
return
|
||||
default:
|
||||
}
|
||||
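// t.Fatalf must only run on the test goroutine, so this writer goroutine
// reports failures with t.Errorf and stops instead of using the helpers.
if err := os.WriteFile(learned, []byte("seed.example\nh"+itoa(i)+".example\n"), 0o644); err != nil {
t.Errorf("write %s: %v", learned, err)
return
}
ft := time.Now().Add(time.Duration(i+1) * time.Second)
if err := os.Chtimes(learned, ft, ft); err != nil {
t.Errorf("chtimes %s: %v", learned, err)
return
}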
|
||||
i++
|
||||
}
|
||||
}()
|
||||
|
||||
// Readers: hammer Decide on the seed (stable → always block) + a live host.
|
||||
for r := 0; r < 8; r++ {
|
||||
wg.Add(1)
|
||||
go func() {
|
||||
defer wg.Done()
|
||||
for j := 0; j < 2000; j++ {
|
||||
if pol.Decide("seed.example", "seed.example") == "block" {
|
||||
atomic.AddInt64(&blocks, 1)
|
||||
}
|
||||
pol.Decide("h0.example", "h0.example")
|
||||
}
|
||||
}()
|
||||
}
|
||||
time.Sleep(50 * time.Millisecond)
|
||||
close(stop)
|
||||
wg.Wait()
|
||||
if blocks == 0 {
|
||||
t.Fatal("expected the stable seed host to block at least once")
|
||||
}
|
||||
}
|
||||
|
|
@ -18,7 +18,9 @@
|
|||
// avatar → /run/secubox/avatar.sock POST /fingerprint
|
||||
// ja4 → /run/secubox/threat-analyst.sock POST /ja4
|
||||
// soc_relay → /run/secubox/soc.sock POST /event
|
||||
- // social_graph: in-process (no socket) — correlated inside the engine, not emitted.
+ // social_graph: correlated in-process (social.go) — edges (hash-only, never raw
+ // cookie values) are NOT emitted to a module socket but POSTed to the portal
+ // /__toolbox/social-event ingest (the social store lives in the toolbox/portal).
|
||||
//
|
||||
// emit takes the full socket PATH (not an http+unix:// URL) plus the route in
|
||||
// the payload's destination; callers build the path from the table above.
|
||||
|
|
|
|||
605
packages/secubox-toolbox-ng/cmd/sbxmitm/social.go
Normal file
|
|
@ -0,0 +1,605 @@
|
|||
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
|
||||
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
|
||||
//
|
||||
// SecuBox-Deb :: toolbox-ng :: cross-site cookie-tracker correlation (#662)
|
||||
//
|
||||
// Restores the kbin "/social" cross-site tracker graph, frozen since the #662
|
||||
// Phase-7 cutover decommissioned the in-process Python `social_graph` addon
|
||||
// (packages/secubox-toolbox/mitmproxy_addons/social_graph.py). The graph reads
|
||||
// social_nodes/social_links in toolbox.db, folded from raw social_edges — and
|
||||
// the edges stopped flowing when the Python addon was retired.
|
||||
//
|
||||
// This is a FAITHFUL Go port of the addon's correlation logic:
|
||||
// - cookieIDHash : byte-exact port of social.cookie_id_hash (Python = source
|
||||
// of truth, proven by social_test.go ↔ tests/test_social_parity.py over a
|
||||
// shared fixture — the same anti-rig discipline as jar.go).
|
||||
// - isDenyListed + the _DEFAULT_DENY_COOKIES set (social.py).
|
||||
// - registrableSocial : the addon's _registrable_domain eTLD+1 helper
|
||||
// (DIFFERENT from policy.go's registrable() — IP literals pass through,
|
||||
// no port strip, a larger multi-label-TLD table; the graph correctness
|
||||
// depends on this exact flavour, so it is replicated verbatim and NOT
|
||||
// consolidated with policy.registrable).
|
||||
// - the 3rd-party decision (tracker_domain != src_site on eTLD+1) on BOTH the
|
||||
// response Set-Cookie path and the request Cookie path, mirroring the
|
||||
// addon's response()+request hooks.
|
||||
// - the CMP consent-platform detection → consent_state ∈ {none_seen,
|
||||
// pre_consent, post_consent} via a per-(peer,site) in-memory log.
|
||||
//
|
||||
// Privacy/CSPN invariant (the reason the original ran in-process): raw cookie
|
||||
// VALUES NEVER leave the engine — only the truncated SHA-256 cookieIDHash is
|
||||
// emitted. The edges are POSTed fire-and-forget to the portal's
|
||||
// /__toolbox/social-event ingest (sibling of /__toolbox/ad-event), which calls
|
||||
// social.record_edge(). Best-effort throughout; a dead/slow portal can never
|
||||
// block or delay a client flow.
|
||||
//
|
||||
// Pure standard library — no external modules, no go.sum.
|
||||
package main
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"crypto/sha256"
|
||||
"encoding/hex"
|
||||
"encoding/json"
|
||||
"log"
|
||||
"net/http"
|
||||
"strings"
|
||||
"sync"
|
||||
"time"
|
||||
)
|
||||
|
||||
// ── registrableSocial: port of social_graph._registrable_domain ─────────────
|
||||
//
|
||||
// Python (mitmproxy_addons/social_graph.py):
|
||||
//
|
||||
// h = (host or "").lower().strip(".")
|
||||
// if not h or h.replace(".", "").isdigit(): return h # raw IP → as-is
|
||||
// parts = h.split(".")
|
||||
// if len(parts) < 2: return h
|
||||
// last_two = ".".join(parts[-2:])
|
||||
// if last_two in _MULTI_LABEL_TLDS and len(parts) >= 3: return ".".join(parts[-3:])
|
||||
// return last_two
|
||||
//
|
||||
// This DIFFERS from policy.registrable (ad_ghost flavour): no port strip, IP
|
||||
// literals pass through unchanged (the store later drops IP trackers via
|
||||
// _is_ip), and the multi-label-TLD table below is the addon's larger set. The
|
||||
// graph's 3rd-party comparison is done with THIS function, so it must match the
|
||||
// addon exactly.
|
||||
var socialMultiLabelTLDs = map[string]bool{
|
||||
"co.uk": true, "ac.uk": true, "gov.uk": true, "org.uk": true, "net.uk": true,
|
||||
"co.jp": true, "ne.jp": true, "ac.jp": true,
|
||||
"com.au": true, "net.au": true, "org.au": true,
|
||||
"com.br": true, "com.cn": true, "com.hk": true, "com.tw": true, "com.mx": true,
|
||||
}
|
||||
|
||||
func registrableSocial(host string) string {
|
||||
h := strings.Trim(strings.ToLower(host), ".")
|
||||
if h == "" {
|
||||
return h
|
||||
}
|
||||
// h.replace(".","").isdigit() → all-digit (IPv4-ish) → return as-is.
|
||||
if isAllDigits(strings.ReplaceAll(h, ".", "")) {
|
||||
return h
|
||||
}
|
||||
parts := strings.Split(h, ".")
|
||||
if len(parts) < 2 {
|
||||
return h
|
||||
}
|
||||
last2 := strings.Join(parts[len(parts)-2:], ".")
|
||||
if socialMultiLabelTLDs[last2] && len(parts) >= 3 {
|
||||
return strings.Join(parts[len(parts)-3:], ".")
|
||||
}
|
||||
return last2
|
||||
}
|
||||
|
||||
// ── cookieIDHash: BYTE-EXACT port of social.cookie_id_hash ───────────────────
|
||||
//
|
||||
// Python (secubox_toolbox/social.py):
|
||||
//
|
||||
// h = sha256()
|
||||
// h.update(tracker_domain.lower().encode("utf-8","replace")); h.update(b"\x00")
|
||||
// h.update(cookie_name.lower().encode("utf-8","replace")); h.update(b"\x00")
|
||||
// h.update(cookie_value.encode("utf-8","replace"))
|
||||
// return h.hexdigest()[:16]
|
||||
//
|
||||
// CRITICAL: tracker_domain + cookie_name are LOWER-cased; the cookie_value is
|
||||
// NOT. NUL (0x00) separators between the three fields. Go strings are already
|
||||
// UTF-8, and strings.ToLower is byte-identical to Python str.lower for the
|
||||
// ASCII + Latin domain/name inputs the fixtures exercise (incl. the Ünîcödé
|
||||
// case, verified at parity). hex of the first 8 digest bytes == hexdigest()[:16].
|
||||
func cookieIDHash(trackerDomain, cookieName, cookieValue string) string {
|
||||
h := sha256.New()
|
||||
h.Write([]byte(strings.ToLower(trackerDomain)))
|
||||
h.Write([]byte{0x00})
|
||||
h.Write([]byte(strings.ToLower(cookieName)))
|
||||
h.Write([]byte{0x00})
|
||||
h.Write([]byte(cookieValue)) // value NOT lower-cased
|
||||
sum := h.Sum(nil)
|
||||
return hex.EncodeToString(sum)[:16]
|
||||
}
|
||||
|
||||
// ── deny-list: port of social._DEFAULT_DENY_COOKIES + is_deny_listed ─────────
|
||||
//
|
||||
// Names whose presence on a flow is NEVER recorded as a tracker identifier
|
||||
// (session / csrf / auth / cloudflare / consent / locale). Replicated verbatim
|
||||
// from social.py; matched case-insensitively after trimming.
|
||||
var socialDenyCookies = map[string]bool{
|
||||
// session
|
||||
"phpsessid": true, "jsessionid": true, "asp.net_sessionid": true, "ci_session": true,
|
||||
"express.sid": true, "connect.sid": true, "sails.sid": true, "django_session": true,
|
||||
"laravel_session": true, "flask_session": true, "session": true, "sessionid": true,
|
||||
// csrf
|
||||
"_csrf": true, "_csrf_token": true, "xsrf-token": true, "csrftoken": true, "csrf": true,
|
||||
"x-csrf-token": true, "anti-csrf-token": true,
|
||||
// auth (1st-party)
|
||||
"auth": true, "auth_token": true, "access_token": true, "refresh_token": true, "bearer": true,
|
||||
"remember_token": true, "remember_me": true, "_oauth2_proxy": true,
|
||||
// cloudflare / consent / locale (low signal)
|
||||
"__cf_bm": true, "cf_clearance": true, "consent": true, "cookieconsent_status": true,
|
||||
"locale": true, "lang": true, "language": true, "_locale": true,
|
||||
}
|
||||
|
||||
// isDenyListed mirrors social.is_deny_listed (default-deny set only; the engine
|
||||
// does not load the TOML extra_deny override). An empty name is deny-listed
|
||||
// (Python returns True for a blank name).
|
||||
func isDenyListed(cookieName string) bool {
|
||||
name := strings.ToLower(strings.TrimSpace(cookieName))
|
||||
if name == "" {
|
||||
return true
|
||||
}
|
||||
return socialDenyCookies[name]
|
||||
}
|
||||
|
||||
// ── cookie parsers: port of _parse_set_cookie / _parse_cookie_header ─────────
|
||||
|
||||
// parseSetCookieNameValue mirrors social_graph._parse_set_cookie: name=value is
|
||||
// the text up to the first ';'; the name is everything before the first '=',
|
||||
// trimmed; the value is the rest of that first field, trimmed. Returns ok=false
|
||||
// for an attribute-only / nameless / empty line.
|
||||
func parseSetCookieNameValue(header string) (name, value string, ok bool) {
|
||||
field := header
|
||||
if i := strings.IndexByte(field, ';'); i >= 0 {
|
||||
field = field[:i]
|
||||
}
|
||||
eq := strings.IndexByte(field, '=')
|
||||
if eq < 0 {
|
||||
return "", "", false
|
||||
}
|
||||
name = strings.TrimSpace(field[:eq])
|
||||
value = strings.TrimSpace(field[eq+1:])
|
||||
if name == "" {
|
||||
return "", "", false
|
||||
}
|
||||
return name, value, true
|
||||
}
|
||||
|
||||
// cookiePair is one (name,value) parsed from a request Cookie header.
|
||||
type cookiePair struct{ name, value string }
|
||||
|
||||
// parseCookieHeader mirrors social_graph._parse_cookie_header: split on ';',
|
||||
// each "name=value" yields a trimmed (name,value); nameless pairs are dropped.
|
||||
func parseCookieHeader(header string) []cookiePair {
|
||||
var out []cookiePair
|
||||
for _, part := range strings.Split(header, ";") {
|
||||
eq := strings.IndexByte(part, '=')
|
||||
if eq < 0 {
|
||||
continue
|
||||
}
|
||||
name := strings.TrimSpace(part[:eq])
|
||||
value := strings.TrimSpace(part[eq+1:])
|
||||
if name != "" {
|
||||
out = append(out, cookiePair{name: name, value: value})
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
// extractSetCookieDomainAttr mirrors social_graph._extract_domain_attr: pull the
|
||||
// "; Domain=…" attribute from a Set-Cookie line, trimmed, leading dot stripped,
|
||||
// lower-cased. Returns "" when absent.
|
||||
func extractSetCookieDomainAttr(setCookie string) string {
|
||||
low := strings.ToLower(setCookie)
|
||||
idx := strings.Index(low, "domain")
|
||||
for idx >= 0 {
|
||||
// require it to be an attribute (preceded by ';' after optional spaces),
|
||||
// mirroring the Python regex `;\s*domain\s*=`.
|
||||
j := idx + len("domain")
|
||||
// skip spaces, then '='
|
||||
k := j
|
||||
for k < len(setCookie) && (setCookie[k] == ' ' || setCookie[k] == '\t') {
|
||||
k++
|
||||
}
|
||||
if k < len(setCookie) && setCookie[k] == '=' {
|
||||
// confirm a ';' (or start) precedes `domain` (after spaces).
|
||||
p := idx - 1
|
||||
for p >= 0 && (setCookie[p] == ' ' || setCookie[p] == '\t') {
|
||||
p--
|
||||
}
|
||||
if p < 0 || setCookie[p] == ';' {
|
||||
rest := setCookie[k+1:]
|
||||
if e := strings.IndexByte(rest, ';'); e >= 0 {
|
||||
rest = rest[:e]
|
||||
}
|
||||
val := strings.ToLower(strings.TrimLeft(strings.TrimSpace(rest), "."))
|
||||
if val == "" {
|
||||
return ""
|
||||
}
|
||||
return val
|
||||
}
|
||||
}
|
||||
next := strings.Index(low[idx+1:], "domain")
|
||||
if next < 0 {
|
||||
return ""
|
||||
}
|
||||
idx = idx + 1 + next
|
||||
}
|
||||
return ""
|
||||
}
|
||||
|
||||
// srcSiteFromReferer mirrors social_graph._src_site_from_referer: take Referer
|
||||
// (else Origin), strip scheme/path/query, return registrableSocial of the host.
|
||||
func srcSiteFromReferer(req *http.Request) string {
|
||||
ref := req.Header.Get("Referer")
|
||||
if ref == "" {
|
||||
ref = req.Header.Get("Origin")
|
||||
}
|
||||
if ref == "" {
|
||||
return ""
|
||||
}
|
||||
s := ref
|
||||
if i := strings.Index(s, "://"); i >= 0 {
|
||||
s = s[i+3:]
|
||||
}
|
||||
if i := strings.IndexByte(s, '/'); i >= 0 {
|
||||
s = s[:i]
|
||||
}
|
||||
if i := strings.IndexByte(s, '?'); i >= 0 {
|
||||
s = s[:i]
|
||||
}
|
||||
return registrableSocial(s)
|
||||
}
|
||||
|
||||
// ── consent-state detection: port of the _consent_log machinery ──────────────
|
||||
//
|
||||
// CMP (Consent Management Platform) cookie name prefixes + loader URL fragments,
|
||||
// verbatim from social_graph._CMP_COOKIE_PREFIXES / _CMP_LOADER_FRAGMENTS. Seen
|
||||
// on a flow → the site runs a CMP (has_cmp) and, for a cookie, consent recorded
|
||||
// (consented). consent_state classifies a tracker edge as pre/post/none-consent.
|
||||
var cmpCookiePrefixes = []string{
|
||||
"optanonconsent", "onetrustconsent", "optanonalertboxclosed", // OneTrust
|
||||
"didomi_token", "euconsent-v2", // Didomi / IAB TCF
|
||||
"__qca", "quantcast", // Quantcast
|
||||
"sp_choice", "consentuid", "_sp_", // Sourcepoint
|
||||
}
|
||||
|
||||
var cmpLoaderFragments = []string{
|
||||
"cdn.cookielaw.org", "onetrust.com", // OneTrust
|
||||
"sdk.privacy-center.org", "didomi.io", // Didomi
|
||||
"quantcast.mgr.consensu.org", "quantcast.com/choice", // Quantcast
|
||||
"sourcepoint.mgr.consensu.org", "sp-prod.net", // Sourcepoint
|
||||
}
|
||||
|
||||
// consentObservation is the per-(peer,site) state, mirroring the Python dict
|
||||
// {"has_cmp": bool, "consented": bool}.
|
||||
type consentObservation struct {
|
||||
hasCMP bool
|
||||
consented bool
|
||||
}
|
||||
|
||||
// consentKey mirrors social_graph._consent_key = (mac_hash, site).
|
||||
type consentKey struct{ macHash, site string }
|
||||
|
||||
// consentLog is the bounded in-memory per-(peer,site) observation log, mirroring
|
||||
// the module-level _consent_log + its 20k soft-cap wholesale clear. The Go proxy
|
||||
// is genuinely concurrent (Python relied on the GIL), so all access is
|
||||
// mutex-guarded.
|
||||
type consentLog struct {
|
||||
mu sync.Mutex
|
||||
log map[consentKey]consentObservation
|
||||
}
|
||||
|
||||
const consentLogCap = 20000 // mirrors social_graph._consent_log soft cap
|
||||
|
||||
func newConsentLog() *consentLog {
|
||||
return &consentLog{log: map[consentKey]consentObservation{}}
|
||||
}
|
||||
|
||||
// update mirrors social_graph._update_consent_log: observe whether this flow
|
||||
// reveals a CMP loader (URL fragment, both request and response side) or a CMP
|
||||
// cookie (either direction) for the (peer,site) pair, and fold it into the log.
|
||||
// - url is flow.request.pretty_url (lower-cased here).
|
||||
// - cookieBlobs are the raw request Cookie + response Set-Cookie header lines.
|
||||
func (cl *consentLog) update(macHash, site, url string, cookieBlobs []string) {
|
||||
cl.mu.Lock()
|
||||
defer cl.mu.Unlock()
|
||||
if len(cl.log) > consentLogCap {
|
||||
cl.log = map[consentKey]consentObservation{}
|
||||
}
|
||||
key := consentKey{macHash: macHash, site: site}
|
||||
st := cl.log[key]
|
||||
|
||||
lurl := strings.ToLower(url)
|
||||
for _, frag := range cmpLoaderFragments {
|
||||
if strings.Contains(lurl, frag) {
|
||||
st.hasCMP = true
|
||||
break
|
||||
}
|
||||
}
|
||||
for _, blob := range cookieBlobs {
|
||||
low := strings.ToLower(blob)
|
||||
for _, pref := range cmpCookiePrefixes {
|
||||
if strings.Contains(low, pref) {
|
||||
st.hasCMP = true
|
||||
st.consented = true
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
cl.log[key] = st
|
||||
}
|
||||
|
||||
// stateFor mirrors social_graph._consent_state_for: post_consent if a consent
|
||||
// cookie was seen here, pre_consent if a CMP is present but no consent cookie
|
||||
// yet, none_seen otherwise.
|
||||
func (cl *consentLog) stateFor(macHash, site string) string {
|
||||
cl.mu.Lock()
|
||||
defer cl.mu.Unlock()
|
||||
st, ok := cl.log[consentKey{macHash: macHash, site: site}]
|
||||
if !ok {
|
||||
return "none_seen"
|
||||
}
|
||||
if st.consented {
|
||||
return "post_consent"
|
||||
}
|
||||
if st.hasCMP {
|
||||
return "pre_consent"
|
||||
}
|
||||
return "none_seen"
|
||||
}
|
||||
|
||||
// ── edge extraction: port of SocialGraph.response()+request() hook logic ──────
|
||||
|
||||
// socialEdge is one cross-site tracker edge, mirroring the kwargs the Python
|
||||
// social.record_edge accepts; serialised straight into the ingest batch.
|
||||
type socialEdge struct {
|
||||
ClientMacHash string `json:"client_mac_hash"`
|
||||
SrcSite string `json:"src_site"`
|
||||
TrackerDomain string `json:"tracker_domain"`
|
||||
CookieIDHashVal string `json:"cookie_id_hash_val"`
|
||||
JA4Hash string `json:"ja4_hash,omitempty"`
|
||||
ConsentState string `json:"consent_state"`
|
||||
}
|
||||
|
||||
// socialEdgesFor extracts the cross-site tracker edges for ONE MITM'd flow,
|
||||
// mirroring SocialGraph.response() + the request-Cookie tail. Pure (no I/O): the
|
||||
// caller emits the returned edges. macHash MUST be the WG persona hash (the
|
||||
// addon only fires for known R3 peers — empty macHash yields no edges). reqHost
|
||||
// is flow.request.host; reqURL is flow.request.pretty_url (for CMP loader
|
||||
// detection); ja4 is the captured fingerprint (may be "").
|
||||
//
|
||||
// Decision logic, faithful to the addon:
|
||||
// - src_site = registrableSocial(reqHost); skip if empty.
|
||||
// - update the consent log for (macHash, src_site), derive consent_state.
|
||||
// - Set-Cookie path (first 50): for each non-deny-listed cookie, tracker_domain
|
||||
// = registrableSocial(Domain= attr OR reqHost); emit IFF tracker_domain != ""
|
||||
// and != src_site (3rd-party).
|
||||
// - Cookie path: only when a Referer/Origin context site exists and differs
|
||||
// from the tracker (= registrableSocial(reqHost)); cap 5 Cookie headers ×
|
||||
// 50 pairs; emit per non-deny-listed cookie with the context site's
|
||||
// consent_state.
|
||||
func socialEdgesFor(macHash string, req *http.Request, resp *http.Response, reqHost, reqURL, ja4 string, cl *consentLog) []socialEdge {
|
||||
if macHash == "" || cl == nil {
|
||||
return nil
|
||||
}
|
||||
srcSite := registrableSocial(reqHost)
|
||||
if srcSite == "" {
|
||||
return nil
|
||||
}
|
||||
|
||||
// Gather the cookie blobs (both directions) for the CMP cookie check, then
|
||||
// fold the consent observation BEFORE deriving consent_state (matches the
|
||||
// addon's ordering: _update_consent_log then _consent_state_for).
|
||||
var setCookies []string
|
||||
if resp != nil {
|
||||
setCookies = resp.Header.Values("Set-Cookie")
|
||||
}
|
||||
var reqCookies []string
|
||||
if req != nil {
|
||||
reqCookies = req.Header.Values("Cookie")
|
||||
}
|
||||
blobs := make([]string, 0, len(reqCookies)+len(setCookies))
|
||||
blobs = append(blobs, reqCookies...)
|
||||
blobs = append(blobs, setCookies...)
|
||||
cl.update(macHash, srcSite, reqURL, blobs)
|
||||
consentState := cl.stateFor(macHash, srcSite)
|
||||
|
||||
var edges []socialEdge
|
||||
|
||||
// Set-Cookie path — first 50 lines (matches the addon's [:50]).
|
||||
for i, sc := range setCookies {
|
||||
if i >= 50 {
|
||||
break
|
||||
}
|
||||
name, value, ok := parseSetCookieNameValue(sc)
|
||||
if !ok || isDenyListed(name) {
|
||||
continue
|
||||
}
|
||||
domainAttr := extractSetCookieDomainAttr(sc)
|
||||
issuer := domainAttr
|
||||
if issuer == "" {
|
||||
issuer = reqHost
|
||||
}
|
||||
trackerDomain := registrableSocial(issuer)
|
||||
if trackerDomain == "" || trackerDomain == srcSite {
|
||||
continue // 1st-party Set-Cookie: not a cross-site tracker signal.
|
||||
}
|
||||
edges = append(edges, socialEdge{
|
||||
ClientMacHash: macHash,
|
||||
SrcSite: srcSite,
|
||||
TrackerDomain: trackerDomain,
|
||||
CookieIDHashVal: cookieIDHash(trackerDomain, name, value),
|
||||
JA4Hash: ja4,
|
||||
ConsentState: consentState,
|
||||
})
|
||||
}
|
||||
|
||||
// Request-Cookie path — only when this request is itself for a 3rd-party
|
||||
// tracker and we have a differing 1st-party context from the Referer/Origin.
|
||||
if len(reqCookies) == 0 {
|
||||
return edges
|
||||
}
|
||||
trackerDomain := registrableSocial(reqHost)
|
||||
if trackerDomain == "" {
|
||||
return edges
|
||||
}
|
||||
ctxSite := srcSiteFromReferer(req)
|
||||
if ctxSite == "" || ctxSite == trackerDomain {
|
||||
return edges
|
||||
}
|
||||
ctxConsent := cl.stateFor(macHash, ctxSite)
|
||||
for i, hdr := range reqCookies {
|
||||
if i >= 5 { // addon caps Cookie headers at [:5]
|
||||
break
|
||||
}
|
||||
pairs := parseCookieHeader(hdr)
|
||||
for j, p := range pairs {
|
||||
if j >= 50 { // and pairs at [:50]
|
||||
break
|
||||
}
|
||||
if isDenyListed(p.name) {
|
||||
continue
|
||||
}
|
||||
edges = append(edges, socialEdge{
|
||||
ClientMacHash: macHash,
|
||||
SrcSite: ctxSite,
|
||||
TrackerDomain: trackerDomain,
|
||||
CookieIDHashVal: cookieIDHash(trackerDomain, p.name, p.value),
|
||||
JA4Hash: ja4,
|
||||
ConsentState: ctxConsent,
|
||||
})
|
||||
}
|
||||
}
|
||||
return edges
|
||||
}
|
||||
|
||||
// ── relay: batch + POST to the portal /__toolbox/social-event ingest ─────────
|
||||
|
||||
const (
|
||||
socialFlushInterval = 10 * time.Second // drain cadence (sibling of adFlushInterval)
|
||||
socialBatchCap = 5000 // max edges held between flushes (drop excess)
|
||||
)
|
||||
|
||||
// socialEventPayload mirrors the portal /__toolbox/social-event JSON contract.
|
||||
type socialEventPayload struct {
|
||||
Edges []socialEdge `json:"edges"`
|
||||
}
|
||||
|
||||
func (p socialEventPayload) empty() bool { return len(p.Edges) == 0 }
|
||||
|
||||
// socialRelay buffers extracted edges and flushes them to the portal. Bounded:
|
||||
// once the buffer holds socialBatchCap edges, NEW edges are dropped until the
|
||||
// next flush clears it (a dead portal can never grow memory unbounded). Edges
|
||||
// carry ONLY the cookieIDHash — never raw values (privacy/CSPN).
|
||||
type socialRelay struct {
|
||||
mu sync.Mutex
|
||||
buf []socialEdge
|
||||
}
|
||||
|
||||
func newSocialRelay() *socialRelay { return &socialRelay{} }
|
||||
|
||||
// add appends edges to the buffer under the cap. Never blocks the flow.
|
||||
func (s *socialRelay) add(edges ...socialEdge) {
|
||||
if len(edges) == 0 {
|
||||
return
|
||||
}
|
||||
s.mu.Lock()
|
||||
defer s.mu.Unlock()
|
||||
for _, e := range edges {
|
||||
if len(s.buf) >= socialBatchCap {
|
||||
return
|
||||
}
|
||||
s.buf = append(s.buf, e)
|
||||
}
|
||||
}
|
||||
|
||||
// snapshot atomically reads-and-clears the buffer.
|
||||
func (s *socialRelay) snapshot() socialEventPayload {
|
||||
s.mu.Lock()
|
||||
defer s.mu.Unlock()
|
||||
if len(s.buf) == 0 {
|
||||
return socialEventPayload{}
|
||||
}
|
||||
p := socialEventPayload{Edges: s.buf}
|
||||
s.buf = nil
|
||||
return p
|
||||
}
|
||||
|
||||
// socialEventClient is the short-timeout fire-and-forget client for the
|
||||
// social-event POST (sibling of adEventClient). Never follows redirects (SSRF
|
||||
// hygiene); tight timeout so a slow portal can't stall the flusher.
|
||||
var socialEventClient = &http.Client{
|
||||
Timeout: 5 * time.Second,
|
||||
CheckRedirect: func(*http.Request, []*http.Request) error { return http.ErrUseLastResponse },
|
||||
}
|
||||
|
||||
// flushOnce snapshots the buffer and, if non-empty, POSTs it to the portal's
|
||||
// /__toolbox/social-event ingest. Best-effort: any error is swallowed with at
|
||||
// most a log line — the engine must never block on the portal. Returns the
|
||||
// flushed payload so the test can assert the snapshot/clear + shape.
|
||||
func (s *socialRelay) flushOnce(portal string) socialEventPayload {
|
||||
p := s.snapshot()
|
||||
if p.empty() {
|
||||
return p
|
||||
}
|
||||
buf, err := json.Marshal(p)
|
||||
if err != nil {
|
||||
log.Printf("social-event marshal failed: %v", err)
|
||||
return p
|
||||
}
|
||||
url := portalTargetURL(portal, "/__toolbox/social-event")
|
||||
resp, err := socialEventClient.Post(url, "application/json", bytes.NewReader(buf))
|
||||
if err != nil {
|
||||
log.Printf("social-event post failed for %s: %v", url, err)
|
||||
return p
|
||||
}
|
||||
resp.Body.Close()
|
||||
return p
|
||||
}
|
||||
|
||||
// ── proxy wiring ──────────────────────────────────────────────────────────
|
||||
|
||||
// socialEnabled reports whether cross-site correlation is on (--social-relay →
|
||||
// Proxy.socialRelayOn, with the buffer + consent log allocated). Nil-safe so the
|
||||
// CONNECT PoC / tests that build a bare Proxy can call it.
|
||||
func (px *Proxy) socialEnabled() bool {
|
||||
return px != nil && px.socialRelayOn && px.social != nil && px.consent != nil
|
||||
}
|
||||
|
||||
// emitSocial extracts the cross-site tracker edges for a MITM'd flow and buffers
|
||||
// them for the batched portal POST. clientIP is the client's peer IP; the per-
|
||||
// client identity is the WG persona hash (macHashOf) — NOT the raw-IP fallback,
|
||||
// so non-WG flows produce no edges, exactly like the Python addon's
|
||||
// _client_mac_hash gate. Gated and non-blocking: the only side effects are
// short mutex-guarded updates to the consent log and the relay buffer. reqURL
// feeds the CMP loader-fragment check.
|
||||
func (px *Proxy) emitSocial(clientIP, host string, req *http.Request, resp *http.Response) {
|
||||
if !px.socialEnabled() || req == nil {
|
||||
return
|
||||
}
|
||||
macHash := macHashOf(clientIP)
|
||||
if macHash == "" {
|
||||
return // known R3 WG peers only (addon: `if not mac_hash: return`)
|
||||
}
|
||||
reqURL := req.URL.String()
|
||||
edges := socialEdgesFor(macHash, req, resp, host, reqURL, "", px.consent)
|
||||
px.social.add(edges...)
|
||||
}
|
||||
|
||||
// runFlusher is the background flusher goroutine: every socialFlushInterval it
|
||||
// drains the buffer to the portal. Start once from main(); runs for the process
|
||||
// lifetime.
|
||||
func (s *socialRelay) runFlusher(portal string) {
|
||||
t := time.NewTicker(socialFlushInterval)
|
||||
defer t.Stop()
|
||||
for range t.C {
|
||||
s.flushOnce(portal)
|
||||
}
|
||||
}
|
||||
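The 3rd-party decision and the cookie identifier described in social.go can be illustrated with a small standalone sketch. It is not the shipped code: `siteOf`, `hashCookieID`, and the two-entry `multiLabel` table are hypothetical names chosen for this example, and `siteOf` deliberately omits the IP-literal pass-through that `registrableSocial` performs. It only shows, under those assumptions, how an eTLD+1 comparison gates an edge and how the truncated SHA-256 folds the domain and name but not the value.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// multiLabel is a deliberately tiny, hypothetical stand-in for the larger
// socialMultiLabelTLDs table.
var multiLabel = map[string]bool{"co.uk": true, "com.au": true}

// siteOf is an illustrative eTLD+1 helper (hypothetical name). Unlike
// registrableSocial it has no special case for IP literals.
func siteOf(host string) string {
	h := strings.Trim(strings.ToLower(host), ".")
	parts := strings.Split(h, ".")
	if len(parts) < 2 {
		return h
	}
	last2 := strings.Join(parts[len(parts)-2:], ".")
	if multiLabel[last2] && len(parts) >= 3 {
		return strings.Join(parts[len(parts)-3:], ".")
	}
	return last2
}

// hashCookieID (hypothetical name) follows the scheme described for
// cookieIDHash: lower-cased domain, NUL, lower-cased name, NUL, raw value,
// then the first 8 digest bytes as 16 hex characters.
func hashCookieID(domain, name, value string) string {
	h := sha256.New()
	h.Write([]byte(strings.ToLower(domain)))
	h.Write([]byte{0x00})
	h.Write([]byte(strings.ToLower(name)))
	h.Write([]byte{0x00})
	h.Write([]byte(value)) // the value is hashed as-is
	return hex.EncodeToString(h.Sum(nil)[:8])
}

func main() {
	src := siteOf("www.lemonde.fr")       // page host
	tracker := siteOf(".doubleclick.net") // Domain= attribute of a Set-Cookie
	// An edge is only worth emitting when the tracker differs from the page site.
	fmt.Println(src, tracker, tracker != "" && tracker != src)

	a := hashCookieID("DoubleClick.NET", "IDE", "AbC")
	b := hashCookieID(tracker, "ide", "AbC")
	c := hashCookieID(tracker, "ide", "abc")
	fmt.Println(len(a), a == b, a == c) // prints: 16 true false
}
```

Only the 16-character identifier appears in the output; the raw cookie value never does, which is the property the privacy invariant in the file header relies on.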
297
packages/secubox-toolbox-ng/cmd/sbxmitm/social_test.go
Normal file
|
|
@@ -0,0 +1,297 @@
|
|||
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
|
||||
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
|
||||
//
|
||||
// Cross-engine SOCIAL parity + decision harness — Go side (#662).
|
||||
//
|
||||
// Anti-rig: loads testdata/social-cookie-id-fixtures.json (GENERATED by the real
|
||||
// secubox_toolbox.social.cookie_id_hash) and asserts cookieIDHash reproduces
|
||||
// every `expect` byte-for-byte — Python is the source of truth, exactly like the
|
||||
// jar parity harness. The Python side is tests/test_social_parity.py.
|
||||
//
|
||||
// The rest exercises the ported decision surface: deny-list, registrableSocial
|
||||
// (the addon flavour, NOT policy.registrable), the 3rd-party Set-Cookie + Cookie
|
||||
// edge extraction, consent_state classification, and the relay buffer/flush.
|
||||
package main
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"testing"
|
||||
)
|
||||
|
||||
type socialCookieFixture struct {
|
||||
TrackerDomain string `json:"tracker_domain"`
|
||||
CookieName string `json:"cookie_name"`
|
||||
CookieValue string `json:"cookie_value"`
|
||||
Expect string `json:"expect"`
|
||||
Why string `json:"why"`
|
||||
}
|
||||
|
||||
type socialCookieFile struct {
|
||||
Fixtures []socialCookieFixture `json:"fixtures"`
|
||||
}
|
||||
|
||||
// TestCookieIDHashParity: cookieIDHash == the Python-generated expect for every
|
||||
// fixture. This is the anti-rig that proves the Go hash is byte-identical to
|
||||
// social.cookie_id_hash (lower-case domain+name, raw value, NUL separators).
|
||||
func TestCookieIDHashParity(t *testing.T) {
|
||||
dir := testdataDir(t)
|
||||
raw, err := os.ReadFile(filepath.Join(dir, "social-cookie-id-fixtures.json"))
|
||||
if err != nil {
|
||||
t.Fatalf("read social fixtures: %v", err)
|
||||
}
|
||||
var f socialCookieFile
|
||||
if err := json.Unmarshal(raw, &f); err != nil {
|
||||
t.Fatalf("parse social fixtures: %v", err)
|
||||
}
|
||||
if len(f.Fixtures) == 0 {
|
||||
t.Fatal("no social cookie-id fixtures")
|
||||
}
|
||||
for _, fx := range f.Fixtures {
|
||||
got := cookieIDHash(fx.TrackerDomain, fx.CookieName, fx.CookieValue)
|
||||
if got != fx.Expect {
|
||||
t.Errorf("cookieIDHash(%q,%q,%q)=%q want %q (%s)",
|
||||
fx.TrackerDomain, fx.CookieName, fx.CookieValue, got, fx.Expect, fx.Why)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestCookieIDHashFolding: domain+name are lower-cased but the value is NOT —
|
||||
// the explicit invariant the store contract pins.
|
||||
func TestCookieIDHashFolding(t *testing.T) {
|
||||
if cookieIDHash("DoubleClick.NET", "IDE", "AbC") != cookieIDHash("doubleclick.net", "ide", "AbC") {
|
||||
t.Error("domain+name must be case-folded")
|
||||
}
|
||||
if cookieIDHash("d.net", "n", "AbC") == cookieIDHash("d.net", "n", "abc") {
|
||||
t.Error("value must NOT be case-folded")
|
||||
}
|
||||
}
|
||||
|
||||
func TestIsDenyListed(t *testing.T) {
|
||||
deny := []string{"PHPSESSID", "session", " csrftoken ", "__cf_bm", "consent", "locale", "", " "}
|
||||
for _, n := range deny {
|
||||
if !isDenyListed(n) {
|
||||
t.Errorf("isDenyListed(%q) = false, want true", n)
|
||||
}
|
||||
}
|
||||
allow := []string{"IDE", "_ga", "_fbp", "uid", "datr"}
|
||||
for _, n := range allow {
|
||||
if isDenyListed(n) {
|
||||
t.Errorf("isDenyListed(%q) = true, want false", n)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestRegistrableSocial: the addon flavour — IP literals pass through (NOT ""),
|
||||
// no port strip semantics needed, the larger multi-label table.
|
||||
func TestRegistrableSocial(t *testing.T) {
|
||||
cases := map[string]string{
|
||||
"www.lemonde.fr": "lemonde.fr",
|
||||
"cdn.api.example.co.uk": "example.co.uk",
|
||||
"tracker.com": "tracker.com",
|
||||
"a.b.c.doubleclick.net": "doubleclick.net",
|
||||
"WWW.Example.COM": "example.com",
|
||||
"sub.example.com.au": "example.com.au",
|
||||
"192.168.1.1": "192.168.1.1", // IP literal as-is (addon), store drops later
|
||||
".trailing.dot.net.": "dot.net",
|
||||
"single": "single",
|
||||
"": "",
|
||||
}
|
||||
for in, want := range cases {
|
||||
if got := registrableSocial(in); got != want {
|
||||
t.Errorf("registrableSocial(%q)=%q want %q", in, got, want)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseSetCookieNameValue(t *testing.T) {
|
||||
cases := []struct {
|
||||
in string
|
||||
name, value string
|
||||
ok bool
|
||||
}{
|
||||
{"IDE=AHWqTUm; Domain=.doubleclick.net; Path=/", "IDE", "AHWqTUm", true},
|
||||
{" _ga = GA1.2.3 ; Max-Age=63", "_ga", "GA1.2.3", true},
|
||||
{"Secure; HttpOnly", "", "", false},
|
||||
{"=novalue", "", "", false},
|
||||
{"empty=", "empty", "", true},
|
||||
}
|
||||
for _, c := range cases {
|
||||
n, v, ok := parseSetCookieNameValue(c.in)
|
||||
if n != c.name || v != c.value || ok != c.ok {
|
||||
t.Errorf("parseSetCookieNameValue(%q)=(%q,%q,%v) want (%q,%q,%v)", c.in, n, v, ok, c.name, c.value, c.ok)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestExtractSetCookieDomainAttr(t *testing.T) {
|
||||
cases := map[string]string{
|
||||
"IDE=x; Domain=.doubleclick.net; Path=/": "doubleclick.net",
|
||||
"a=b; domain=Example.COM": "example.com",
|
||||
"a=b; Path=/": "",
|
||||
"a=b": "",
|
||||
"a=domainlike=1; Path=/": "", // value containing "domain" is not the attr
|
||||
}
|
||||
for in, want := range cases {
|
||||
if got := extractSetCookieDomainAttr(in); got != want {
|
||||
t.Errorf("extractSetCookieDomainAttr(%q)=%q want %q", in, got, want)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestSrcSiteFromReferer(t *testing.T) {
|
||||
req := httptest.NewRequest("GET", "https://tracker.io/p.gif", nil)
|
||||
if got := srcSiteFromReferer(req); got != "" {
|
||||
t.Errorf("no referer → %q want \"\"", got)
|
||||
}
|
||||
req.Header.Set("Referer", "https://www.lemonde.fr/article?x=1")
|
||||
if got := srcSiteFromReferer(req); got != "lemonde.fr" {
|
||||
t.Errorf("referer → %q want lemonde.fr", got)
|
||||
}
|
||||
req.Header.Del("Referer")
|
||||
req.Header.Set("Origin", "https://news.example.co.uk")
|
||||
if got := srcSiteFromReferer(req); got != "example.co.uk" {
|
||||
t.Errorf("origin fallback → %q want example.co.uk", got)
|
||||
}
|
||||
}
|
||||
|
||||
// helper: build a response with the given Set-Cookie lines.
|
||||
func respWithSetCookies(lines ...string) *http.Response {
|
||||
h := http.Header{}
|
||||
for _, l := range lines {
|
||||
h.Add("Set-Cookie", l)
|
||||
}
|
||||
return &http.Response{Header: h}
|
||||
}
|
||||
|
||||
// TestSocialEdgesThirdParty: a 3rd-party Set-Cookie (Domain= a different eTLD+1)
|
||||
// on a 1st-party page yields one edge with the right src_site/tracker_domain.
|
||||
func TestSocialEdgesThirdParty(t *testing.T) {
|
||||
req := httptest.NewRequest("GET", "https://ads.doubleclick.net/pixel", nil)
|
||||
resp := respWithSetCookies("IDE=AHWqTUm; Domain=.doubleclick.net; Path=/")
|
||||
// reqHost (ads.doubleclick.net) determines src_site, and the Domain attribute
// resolves to the same registrable domain (doubleclick.net), so this
// Set-Cookie is 1st-party: expect NO edge.
|
||||
edges := socialEdgesFor("machash1", req, resp, "ads.doubleclick.net", "https://ads.doubleclick.net/pixel", "", newConsentLog())
|
||||
if len(edges) != 0 {
|
||||
t.Fatalf("1st-party Set-Cookie should yield 0 edges, got %d", len(edges))
|
||||
}
|
||||
|
||||
// Now a genuine 3rd-party case on the Set-Cookie path, where src_site comes
// from reqHost (the Referer is only consulted on the request-Cookie path):
// reqHost=www.lemonde.fr with a Set-Cookie carrying Domain=.doubleclick.net.
|
||||
resp2 := respWithSetCookies("IDE=AHWqTUm; Domain=.doubleclick.net; Path=/")
|
||||
edges = socialEdgesFor("machash1", req, resp2, "www.lemonde.fr", "https://www.lemonde.fr/", "", newConsentLog())
|
||||
if len(edges) != 1 {
|
||||
t.Fatalf("3rd-party Set-Cookie should yield 1 edge, got %d", len(edges))
|
||||
}
|
||||
e := edges[0]
|
||||
if e.SrcSite != "lemonde.fr" || e.TrackerDomain != "doubleclick.net" {
|
||||
t.Errorf("edge src/tracker = %q/%q want lemonde.fr/doubleclick.net", e.SrcSite, e.TrackerDomain)
|
||||
}
|
||||
if e.CookieIDHashVal != cookieIDHash("doubleclick.net", "IDE", "AHWqTUm") {
|
||||
t.Errorf("edge cookie id hash mismatch: %q", e.CookieIDHashVal)
|
||||
}
|
||||
if e.ConsentState != "none_seen" {
|
||||
t.Errorf("consent_state = %q want none_seen", e.ConsentState)
|
||||
}
|
||||
}
|
||||
|
||||
// TestSocialEdgesDenyAndIP: deny-listed names produce no edge; IP-literal hosts
|
||||
// produce no edge (registrableSocial returns the IP, store drops it — but src
|
||||
// derivation: an IP src_site == IP tracker → not 3rd party anyway).
|
||||
func TestSocialEdgesDenyAndIP(t *testing.T) {
|
||||
req := httptest.NewRequest("GET", "https://x/", nil)
|
||||
resp := respWithSetCookies("PHPSESSID=abc; Domain=.doubleclick.net")
|
||||
edges := socialEdgesFor("m", req, resp, "www.lemonde.fr", "https://www.lemonde.fr/", "", newConsentLog())
|
||||
if len(edges) != 0 {
|
||||
t.Fatalf("deny-listed cookie should yield 0 edges, got %d", len(edges))
|
||||
}
|
||||
// empty mac hash → no edges (R3-only gate)
|
||||
if e := socialEdgesFor("", req, respWithSetCookies("IDE=x; Domain=.doubleclick.net"), "www.lemonde.fr", "u", "", newConsentLog()); len(e) != 0 {
|
||||
t.Fatalf("empty macHash should yield 0 edges, got %d", len(e))
|
||||
}
|
||||
}
|
||||
|
||||
// TestSocialEdgesRequestCookiePath: a request TO a tracker carrying a Cookie,
|
||||
// with a Referer to a different 1st-party, yields an edge attributed to the
|
||||
// referer's site.
|
||||
func TestSocialEdgesRequestCookiePath(t *testing.T) {
|
||||
req := httptest.NewRequest("GET", "https://ads.doubleclick.net/px", nil)
|
||||
req.Header.Set("Cookie", "IDE=AHWqTUm; session=secret")
|
||||
req.Header.Set("Referer", "https://www.lemonde.fr/article")
|
||||
// No Set-Cookie in the response; src_site = registrableSocial(reqHost) =
|
||||
// doubleclick.net; the Set-Cookie loop emits nothing; the request-Cookie tail
|
||||
// uses ctxSite=lemonde.fr (referer) != tracker doubleclick.net → edge. The
|
||||
// deny-listed `session` cookie is skipped, so exactly 1 edge (IDE).
|
||||
edges := socialEdgesFor("m", req, &http.Response{Header: http.Header{}}, "ads.doubleclick.net", "https://ads.doubleclick.net/px", "", newConsentLog())
|
||||
if len(edges) != 1 {
|
||||
t.Fatalf("request-cookie path should yield 1 edge, got %d", len(edges))
|
||||
}
|
||||
if edges[0].SrcSite != "lemonde.fr" || edges[0].TrackerDomain != "doubleclick.net" {
|
||||
t.Errorf("edge = %q/%q want lemonde.fr/doubleclick.net", edges[0].SrcSite, edges[0].TrackerDomain)
|
||||
}
|
||||
}
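The request-Cookie path tested above can be sketched in Python as follows. This is a minimal illustration, not the engine's code: `registrable` here is a naive last-two-labels stand-in for a real eTLD+1 lookup, the deny-list is an assumed subset (only `session` and `PHPSESSID` appear in the tests), and `request_cookie_edges` is a hypothetical name.

```python
from urllib.parse import urlsplit

# Assumed subset of the deny-list; the real list is not shown in this diff.
DENY_NAMES = frozenset({"session", "phpsessid"})


def registrable(host: str) -> str:
    """Naive eTLD+1: keep the last two labels (assumption; a real
    implementation would consult the Public Suffix List)."""
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host.lower()


def request_cookie_edges(req_host: str, referer: str, cookie_header: str):
    """Return (src_site, tracker_domain, cookie_name) tuples for a request
    that carries cookies to a host other than the referring site."""
    tracker = registrable(req_host)
    src = registrable(urlsplit(referer).hostname or "")
    if not src or src == tracker:
        return []  # no referer, or same site: not a 3rd-party request
    edges = []
    for part in cookie_header.split(";"):
        name, _, _value = part.strip().partition("=")
        if not name or name.lower() in DENY_NAMES:
            continue  # deny-listed session cookies carry no tracking evidence
        edges.append((src, tracker, name))
    return edges
```

Called with the values from the test (`ads.doubleclick.net`, a `lemonde.fr` referer, and `IDE=...; session=...`), the sketch yields a single edge for `IDE`, mirroring the assertion in the Go test.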
|
||||
|
||||
// TestConsentLog: loader fragment → pre_consent; CMP cookie → post_consent.
|
||||
func TestConsentLog(t *testing.T) {
|
||||
cl := newConsentLog()
|
||||
if got := cl.stateFor("m", "lemonde.fr"); got != "none_seen" {
|
||||
t.Errorf("fresh → %q want none_seen", got)
|
||||
}
|
||||
// CMP loader request observed (no consent cookie yet) → pre_consent.
|
||||
cl.update("m", "lemonde.fr", "https://cdn.cookielaw.org/consent/scripttemplates/otSDKStub.js", nil)
|
||||
if got := cl.stateFor("m", "lemonde.fr"); got != "pre_consent" {
|
||||
t.Errorf("after CMP loader → %q want pre_consent", got)
|
||||
}
|
||||
// CMP consent cookie observed → post_consent.
|
||||
cl.update("m", "lemonde.fr", "https://www.lemonde.fr/", []string{"OptanonConsent=isGpcEnabled=0; Path=/"})
|
||||
if got := cl.stateFor("m", "lemonde.fr"); got != "post_consent" {
|
||||
t.Errorf("after CMP cookie → %q want post_consent", got)
|
||||
}
|
||||
}
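The state transitions checked by TestConsentLog can be sketched as a small Python state machine. The detection rules below (a `cdn.cookielaw.org` URL fragment and the `OptanonConsent` cookie name) are taken from the test inputs only; the real matcher is not shown in this diff, and `ConsentLog` is a hypothetical stand-in.

```python
# Assumed detection hints, taken from the test inputs only.
CMP_LOADER_HINTS = ("cdn.cookielaw.org",)
CMP_COOKIE_NAMES = ("OptanonConsent",)


class ConsentLog:
    """Tracks a per-(client, site) consent state:
    none_seen -> pre_consent -> post_consent."""

    def __init__(self):
        self._state = {}

    def state_for(self, mac_hash: str, site: str) -> str:
        return self._state.get((mac_hash, site), "none_seen")

    def update(self, mac_hash, site, url, set_cookies=None):
        key = (mac_hash, site)
        # A CMP consent cookie means the user has answered the prompt.
        for raw in set_cookies or []:
            name = raw.split("=", 1)[0].strip()
            if name in CMP_COOKIE_NAMES:
                self._state[key] = "post_consent"
                return
        # A CMP loader request without a consent cookie: prompt is loading.
        # Never downgrade a site that already reached post_consent.
        if any(hint in url for hint in CMP_LOADER_HINTS):
            if self._state.get(key) != "post_consent":
                self._state[key] = "pre_consent"
```

Refusing to downgrade from `post_consent` is a design choice of this sketch; the Go test does not exercise that case.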
|
||||
|
||||
// TestSocialRelayFlush: the buffer batches edges and flushOnce POSTs them to the
|
||||
// portal /__toolbox/social-event, then clears.
|
||||
func TestSocialRelayFlush(t *testing.T) {
|
||||
var got socialEventPayload
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
if r.URL.Path != "/__toolbox/social-event" {
|
||||
t.Errorf("unexpected path %q", r.URL.Path)
|
||||
}
|
||||
_ = json.NewDecoder(r.Body).Decode(&got)
|
||||
w.WriteHeader(204)
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
s := newSocialRelay()
|
||||
s.add(socialEdge{ClientMacHash: "m", SrcSite: "a.fr", TrackerDomain: "t.com", CookieIDHashVal: "deadbeef", ConsentState: "none_seen"})
|
||||
p := s.flushOnce(srv.URL)
|
||||
if len(p.Edges) != 1 || len(got.Edges) != 1 {
|
||||
t.Fatalf("flush sent %d / server got %d, want 1/1", len(p.Edges), len(got.Edges))
|
||||
}
|
||||
if got.Edges[0].TrackerDomain != "t.com" {
|
||||
t.Errorf("server edge tracker = %q want t.com", got.Edges[0].TrackerDomain)
|
||||
}
|
||||
// Buffer cleared: a second flush sends nothing.
|
||||
if p2 := s.flushOnce(srv.URL); !p2.empty() {
|
||||
t.Errorf("second flush should be empty, got %d edges", len(p2.Edges))
|
||||
}
|
||||
}
|
||||
|
||||
// TestSocialRelayCap: the buffer never exceeds socialBatchCap.
|
||||
func TestSocialRelayCap(t *testing.T) {
|
||||
s := newSocialRelay()
|
||||
for i := 0; i < socialBatchCap+100; i++ {
|
||||
s.add(socialEdge{ClientMacHash: "m", SrcSite: "a", TrackerDomain: "t", CookieIDHashVal: "h", ConsentState: "none_seen"})
|
||||
}
|
||||
if got := s.snapshot(); len(got.Edges) != socialBatchCap {
|
||||
t.Errorf("buffer held %d edges, want cap %d", len(got.Edges), socialBatchCap)
|
||||
}
|
||||
}
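The batching behaviour covered by TestSocialRelayFlush and TestSocialRelayCap can be sketched in Python as a bounded buffer with a swap-and-send flush. The cap value and the drop-newest policy when full are assumptions; the tests only establish that the buffer never exceeds the cap and that a flush empties it. `SocialRelay` and `send` are hypothetical names.

```python
import threading

SOCIAL_BATCH_CAP = 5000  # assumed value; the real constant is not shown


class SocialRelay:
    """Bounded edge buffer: add() drops edges once the cap is reached,
    flush_once() hands the whole batch to `send` and clears the buffer."""

    def __init__(self, cap: int = SOCIAL_BATCH_CAP):
        self._cap = cap
        self._edges = []
        self._lock = threading.Lock()

    def add(self, edge: dict) -> None:
        with self._lock:
            if len(self._edges) < self._cap:
                self._edges.append(edge)

    def flush_once(self, send) -> list:
        # Swap the buffer out under the lock, send outside it so a slow
        # receiver never blocks producers.
        with self._lock:
            batch, self._edges = self._edges, []
        if batch:
            send({"edges": batch})
        return batch
```

Swapping the list under the lock keeps `add()` cheap on the request path, which matters if the relay is fed from every intercepted flow.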
|
||||
|
|
@ -389,7 +389,11 @@ func (px *Proxy) handleTransparent(client net.Conn) {
|
|||
// over a replayable conn, then run the shared pipeline dialling the captured
|
||||
// original-dst (NOT the SNI).
|
||||
replay := &prefixConn{prefix: hello, Conn: client}
|
||||
-	tconn := tls.Server(replay, px.serverTLSConfig())
+	// The capture hook relays the ja4 ClientHello payload for this handshake,
+	// tagged with the REAL transparent peer IP from the raw client conn (#662).
+	// nil when the relay gate is off. Emitted around Decide → blocked/allowed
+	// alike, matching the Python addon's per-tls_clienthello behaviour.
+	tconn := tls.Server(replay, px.serverTLSConfigCapture(px.captureAndEmitJA4(client)))
|
||||
if err := tconn.Handshake(); err != nil {
|
||||
return
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1,3 +1,58 @@
|
|||
secubox-toolbox-ng (0.1.14-1~bookworm1) bookworm; urgency=medium
|
||||
|
||||
* quic/banner: strip Alt-Svc response header so browsers stop learning/preferring
|
||||
HTTP/3 (h3) and stay on HTTP/2-over-TCP (MITM-able). Complements the nft
|
||||
udp443 reject; addresses sites where browsers ignore the reject and keep
|
||||
retrying QUIC, bypassing inject/adblock/metrics. (ref #662)
|
||||
|
||||
-- Gerald KERMA <devel@cybermind.fr> Thu, 19 Jun 2026 14:30:00 +0000
|
||||
|
||||
secubox-toolbox-ng (0.1.13-1~bookworm1) bookworm; urgency=medium
|
||||
|
||||
* banner: INLINE the banner (server-side bundle fetch, baked literals) instead
|
||||
of <script src>/fetch — defeats site service workers that intercept the
|
||||
same-origin /__toolbox/* requests (leparisien, cnn). Fail-open. (ref #662)
|
||||
|
||||
-- Gerald KERMA <devel@cybermind.fr> Thu, 19 Jun 2026 13:15:00 +0000
|
||||
|
||||
secubox-toolbox-ng (0.1.12-1~bookworm1) bookworm; urgency=medium
|
||||
|
||||
* adlearn: live-reload the blocklist (mtime) so promotions/edits block without
|
||||
a worker restart; emit ad-candidates (3rd-party ad-path) to the portal;
|
||||
autolearn also promotes cross-site trackers from social_edges. Learned
|
||||
trackers are auto-204 + poison-smogged. (ref #662)
|
||||
|
||||
-- Gerald KERMA <devel@cybermind.fr> Thu, 19 Jun 2026 12:30:00 +0000
|
||||
|
||||
secubox-toolbox-ng (0.1.11-1~bookworm1) bookworm; urgency=medium
|
||||
|
||||
* social: ALSO correlate on the block path — blocked 3rd-party trackers still
|
||||
carry the browser's request Cookie (the cross-site evidence); without this
|
||||
the /social graph misses the very trackers it exists to expose (they're 204'd
|
||||
before the allow/mitm correlation). resp=nil request-only, hash-only. (ref #662)
|
||||
|
||||
-- Gerald KERMA <devel@cybermind.fr> Thu, 19 Jun 2026 11:55:00 +0000
|
||||
|
||||
secubox-toolbox-ng (0.1.10-1~bookworm1) bookworm; urgency=medium
|
||||
|
||||
* social: faithfully port the in-process social_graph correlation — the engine
|
||||
computes cross-site tracker edges (byte-exact cookie_id_hash, deny-list,
|
||||
eTLD+1 3rd-party check, CMP consent_state) and relays HASH-ONLY edges
|
||||
(never raw values, WG-only) to the new portal /__toolbox/social-event →
|
||||
social.record_edge → /social graph un-frozen. --social-relay (default on). (ref #662)
|
||||
|
||||
-- Gerald KERMA <devel@cybermind.fr> Thu, 19 Jun 2026 11:30:00 +0000
|
||||
|
||||
secubox-toolbox-ng (0.1.9-1~bookworm1) bookworm; urgency=medium
|
||||
|
||||
* telemetry: relay per-flow metadata to the analysis sidecars (dpi /classify,
|
||||
cookies /inject, threat-analyst /ja4) — restoring the kbin "Qui te piste?"
|
||||
events frozen since the Phase-7 cutover. Fire-and-forget, names-only cookies,
|
||||
gated --analysis-relay (default on). The sidecars enrich + write toolbox
|
||||
events → cumulative-stats live again with real host classification. (ref #662)
|
||||
|
||||
-- Gerald KERMA <devel@cybermind.fr> Thu, 19 Jun 2026 10:40:00 +0000
|
||||
|
||||
secubox-toolbox-ng (0.1.8-1~bookworm1) bookworm; urgency=medium
|
||||
|
||||
* demo/csp: only relax + flag 🔓 when the page's effective script directive
|
||||
|
|
|
|||
61
packages/secubox-toolbox-ng/testdata/social-cookie-id-fixtures.json
vendored
Normal file
|
|
@ -0,0 +1,61 @@
|
|||
{
|
||||
"_comment": "Cross-engine parity fixtures for social.cookie_id_hash (#662). GENERATED by the real secubox_toolbox.social.cookie_id_hash (Python = source of truth); the Go cookieIDHash MUST reproduce every `expect` byte-for-byte. Note: tracker_domain + cookie_name are LOWER-cased before hashing, the cookie_value is NOT; NUL (0x00) separators; UTF-8 with 'replace' errors. See tests/test_social_parity.py (Python) ↔ social_test.go (Go).",
|
||||
"fixtures": [
|
||||
{
|
||||
"tracker_domain": "doubleclick.net",
|
||||
"cookie_name": "IDE",
|
||||
"cookie_value": "AHWqTUm123",
|
||||
"expect": "8e7fadaeb2584768",
|
||||
"why": "plain ascii"
|
||||
},
|
||||
{
|
||||
"tracker_domain": "DoubleClick.NET",
|
||||
"cookie_name": "ide",
|
||||
"cookie_value": "AHWqTUm123",
|
||||
"expect": "8e7fadaeb2584768",
|
||||
"why": "domain+name UPPER folded, value verbatim -> identical hash to #1 (proves domain+name are lower-cased)"
|
||||
},
|
||||
{
|
||||
"tracker_domain": "doubleclick.net",
|
||||
"cookie_name": "IDE",
|
||||
"cookie_value": "ahwqtum123",
|
||||
"expect": "550317c9729652c2",
|
||||
"why": "value lower-cased DIFFERS from #1 (proves the VALUE is NOT folded)"
|
||||
},
|
||||
{
|
||||
"tracker_domain": "ads.example.com",
|
||||
"cookie_name": "_ga",
|
||||
"cookie_value": "GA1.2.999.111",
|
||||
"expect": "89a398ebd72ee863",
|
||||
"why": "GA cookie"
|
||||
},
|
||||
{
|
||||
"tracker_domain": "tracker.io",
|
||||
"cookie_name": "uid",
|
||||
"cookie_value": "Ünîcødé✓",
|
||||
"expect": "3b4923e9d9bb77a2",
|
||||
"why": "unicode value (utf-8 encoded)"
|
||||
},
|
||||
{
|
||||
"tracker_domain": "tracker.io",
|
||||
"cookie_name": "Ünîcödé",
|
||||
"cookie_value": "val",
|
||||
"expect": "d4db5a0d71216313",
|
||||
"why": "unicode cookie NAME (lower-cased + utf-8)"
|
||||
},
|
||||
{
|
||||
"tracker_domain": "",
|
||||
"cookie_name": "x",
|
||||
"cookie_value": "y",
|
||||
"expect": "2081f4f26135019e",
|
||||
"why": "empty domain still hashes (NUL separators)"
|
||||
},
|
||||
{
|
||||
"tracker_domain": "d.net",
|
||||
"cookie_name": "n",
|
||||
"cookie_value": "",
|
||||
"expect": "b0da6b889cb198a1",
|
||||
"why": "empty value"
|
||||
}
|
||||
]
|
||||
}
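The normalisation rules stated in the fixture `_comment` (domain and name lower-cased, value verbatim, NUL separators, UTF-8 with `replace` errors) can be sketched as below. The digest algorithm and the 16-hex-character truncation are assumptions: the fixtures show 16 hex characters but the diff does not name the hash function, so this sketch uses SHA-256 and is not expected to reproduce the `expect` values.

```python
import hashlib


def cookie_id_hash(tracker_domain: str, cookie_name: str, cookie_value: str) -> str:
    """Illustrative only: SHA-256 and the 16-char truncation are assumed."""
    material = "\x00".join([
        tracker_domain.lower(),  # folded
        cookie_name.lower(),     # folded
        cookie_value,            # NOT folded: the value is the identifier
    ])
    digest = hashlib.sha256(material.encode("utf-8", "replace")).hexdigest()
    return digest[:16]
```

The NUL separators keep field boundaries unambiguous, so `("ab", "c", ...)` and `("a", "bc", ...)` cannot collide by concatenation.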
|
||||
|
|
@ -1,3 +1,19 @@
|
|||
secubox-toolbox (2.7.0-1~bookworm1) bookworm; urgency=medium
|
||||
|
||||
* MIDDLE RELEASE — caps the 2.6.x line (ad-intelligence / Anti-Track v2 /
|
||||
anti-bot uTLS) and opens the kbin "first tool of the Swiss-army cyber kit"
|
||||
chapter. kbin now delivers: transparent performance, full-MITM encrypted
|
||||
inspection, ad poison/smog injection, the adware-ban transparency banner,
|
||||
and safe browsing.
|
||||
* docs: kbin use-case consolidated — wiki `Kbin-Toolbox.md`, `FAQ-KBIN-TOR.md`,
|
||||
README positioning blurb.
|
||||
* plan(#683): next chapter staged — kbin Tor endpoint, a quick-switch that
|
||||
re-routes consenting client surfing through Tor (outbound egress, pseudo-
|
||||
network) so the kbin exit is anonymized. Design spec landed; no behaviour
|
||||
change yet (default OFF, fail-closed by design).
|
||||
|
||||
-- Gerald KERMA <devel@cybermind.fr> Fri, 19 Jun 2026 11:00:00 +0200
|
||||
|
||||
secubox-toolbox (2.6.59-1~bookworm1) bookworm; urgency=medium
|
||||
|
||||
* ui: cap all admin dashboard lists to top-5 shown — #filtres bypass hosts,
|
||||
|
|
|
|||
|
|
@ -51,13 +51,19 @@ table inet wg-toolbox {
|
|||
|
||||
chain forward {
|
||||
type filter hook forward priority filter; policy accept;
|
||||
+        # Phase 6.K / #662 — drop UDP 443 (QUIC/HTTP3) FIRST, before the blanket
+        # outbound accept below. If it sits AFTER the accept it is never reached
+        # (the accept terminates evaluation) → QUIC slips through and the whole
+        # MITM is bypassed (no inject, no ad-block, no metrics, no social). The
+        # REJECT (not drop) forces Chrome/Firefox to fall back to HTTP/2 over TCP
+        # IMMEDIATELY: a silent drop just makes the browser RETRY QUIC for tens of
+        # seconds (observed 199 retry packets, never falling back) — an ICMP
+        # port-unreachable tells it "no QUIC here" at once. First in the chain so
+        # it also breaks existing QUIC sessions (outbound). ORDER IS LOAD-BEARING.
+        iif "wg-toolbox" udp dport 443 counter reject
         # Outbound from tunnel → internet
         iif "wg-toolbox" oif "lan0" accept
         # Return traffic
         iif "lan0" oif "wg-toolbox" ct state established,related accept
-        # Phase 6.K — drop UDP 443 (QUIC/HTTP3) so browsers fall back to
-        # HTTP/2 over TCP, which our DNAT can intercept. Without this,
-        # Chrome/Firefox prefer QUIC and bypass mitm entirely.
-        iif "wg-toolbox" udp dport 443 counter drop
|
||||
}
|
||||
}
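Why the rule order in the forward chain matters can be illustrated with a toy first-match evaluator in Python. This is a simplified model, not nftables itself: it only captures that a terminal verdict (accept, drop, reject) ends evaluation of the chain, and all names here are hypothetical.

```python
def evaluate(chain, packet, policy="accept"):
    """Return the first terminal verdict whose matcher accepts the packet,
    falling back to the chain policy."""
    for match, verdict in chain:
        if match(packet):
            return verdict
    return policy


def is_quic(p):
    return p["iif"] == "wg-toolbox" and p["proto"] == "udp" and p["dport"] == 443


def is_outbound(p):
    return p["iif"] == "wg-toolbox" and p["oif"] == "lan0"


# New order: QUIC rule first, then the blanket outbound accept.
fixed = [(is_quic, "reject"), (is_outbound, "accept")]
# Old order: the accept shadows the QUIC rule, which is never reached.
shadowed = [(is_outbound, "accept"), (is_quic, "drop")]

quic_packet = {"iif": "wg-toolbox", "oif": "lan0", "proto": "udp", "dport": 443}
```

Evaluating `quic_packet` against `shadowed` returns the accept verdict, which is the bypass the changed ruleset removes; against `fixed` it returns reject.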
|
||||
|
|
|
|||
|
|
@ -221,6 +221,92 @@ def _ad_feed() -> int:
|
|||
return len(promoted)
|
||||
|
||||
|
||||
# #662 — cross-site-reuse promotion. A tracker_domain seen issuing cookies on
# >= SOCIAL_MIN_SITES DISTINCT src_site (across peers, recent window) is a
# BEHAVIOURALLY-confirmed cross-site tracker (the social graph), independent of
# the ad-path heuristic. Promote it into learned-trackers.txt so the engine
# blocks (204) + smogs it. Conservative + reuses the SAME allowlist/self guard as
# _ad_feed (NEVER promote allowlisted or self domains). De-dups against OUT.
SOCIAL_MIN_SITES = int(os.environ.get("SECUBOX_SOCIAL_MIN_SITES", "3"))
SOCIAL_WINDOW_HOURS = int(os.environ.get("SECUBOX_SOCIAL_WINDOW_HOURS", "168"))


def _social_feed() -> int:
    """Promote cross-site cookie-reuse trackers (social_edges) into the learned
    blocklist. A tracker_domain linking >= SOCIAL_MIN_SITES distinct src_site in
    the last SOCIAL_WINDOW_HOURS is promoted. Allowlist + self domains excluded
    (reused guard). MERGES into OUT (never overwrites). Returns count promoted, or
    -1 if unavailable (e.g. no social_edges table). Best-effort: never raises."""
    cutoff = int(time.time()) - SOCIAL_WINDOW_HOURS * 3600
    try:
        con = sqlite3.connect(DB, timeout=5)
        rows = con.execute(
            "SELECT tracker_domain, COUNT(DISTINCT src_site) AS sites "
            "FROM social_edges WHERE ts >= ? "
            "GROUP BY tracker_domain", (cutoff,)).fetchall()
        con.close()
    except Exception as e:
        sys.stderr.write(f"autolearn: social query failed: {e}\n")
        return -1
    # Fold to registrable and aggregate the distinct-site count per eTLD+1 (two
    # tracker subdomains of the same registrable jointly meet the threshold).
    by_reg: dict[str, set] = {}
    try:
        scon = sqlite3.connect(DB, timeout=5)
        for td, _sites in rows:
            reg = registrable(td)
            if not reg:
                continue
            ss = by_reg.setdefault(reg, set())
            for (s,) in scon.execute(
                    "SELECT DISTINCT src_site FROM social_edges "
                    "WHERE ts >= ? AND tracker_domain = ?", (cutoff, td)):
                if s:
                    ss.add(s)
        scon.close()
    except Exception as e:
        sys.stderr.write(f"autolearn: social fold failed: {e}\n")
        return -1

    allow = _load_ad_allowlist()
    self_doms = {d.strip().lower() for d in
                 os.environ.get("SECUBOX_SELF_DOMAINS", "secubox.in").split(",")
                 if d.strip()}
    promoted: set = set()
    for reg, sites in by_reg.items():
        if len(sites) < SOCIAL_MIN_SITES:
            continue
        if reg in allow:
            continue
        if reg in self_doms or any(reg == d or reg.endswith("." + d) for d in self_doms):
            continue
        promoted.add(reg)
    if not promoted:
        return 0
    existing: set = set()
    try:
        if os.path.exists(OUT):
            with open(OUT, encoding="utf-8") as fh:
                for ln in fh:
                    ln = ln.strip()
                    if ln:
                        existing.add(ln)
    except Exception as e:
        sys.stderr.write(f"autolearn: social merge read failed: {e}\n")
    new = promoted - existing
    merged = sorted(existing | promoted)[:MAX_ENTRIES]
    try:
        os.makedirs(os.path.dirname(OUT), exist_ok=True)
        tmp = OUT + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            fh.write("\n".join(merged) + ("\n" if merged else ""))
        os.replace(tmp, OUT)
    except Exception as e:
        sys.stderr.write(f"autolearn: social write failed: {e}\n")
        return -1
    return len(new)
|
||||
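The promotion rule in `_social_feed` reduces to a distinct-site threshold plus two exclusion guards. A minimal in-memory sketch, assuming the tracker names are already folded to their registrable domain and using a hypothetical `promote` helper in place of the SQLite queries and file merge:

```python
def promote(edges, min_sites=3, allow=frozenset(),
            self_domains=frozenset({"secubox.in"})):
    """edges: iterable of (src_site, tracker_domain) pairs.
    Returns the set of trackers seen on >= min_sites distinct sites,
    minus allowlisted and self domains."""
    sites_by_tracker = {}
    for src_site, tracker in edges:
        if src_site:
            sites_by_tracker.setdefault(tracker, set()).add(src_site)

    promoted = set()
    for tracker, sites in sites_by_tracker.items():
        if len(sites) < min_sites:
            continue  # not enough cross-site evidence yet
        if tracker in allow:
            continue  # never promote allowlisted domains
        if any(tracker == d or tracker.endswith("." + d) for d in self_domains):
            continue  # never promote our own domains
        promoted.add(tracker)
    return promoted
```

Counting distinct sites rather than raw hits means a tracker that appears a thousand times on one site is never promoted, which keeps the rule conservative.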
|
||||
|
||||
def main() -> int:
|
||||
learned: set[str] = set()
|
||||
try:
|
||||
|
|
@ -317,6 +403,11 @@ def main() -> int:
|
|||
sys.stderr.write(f"autolearn: {_n_ad} ad-candidate hosts promoted\n")
|
||||
except Exception as e:
|
||||
sys.stderr.write(f"autolearn: ad feed error: {e}\n")
|
||||
try:
|
||||
_n_social = _social_feed()
|
||||
sys.stderr.write(f"autolearn: {_n_social} cross-site cookie trackers promoted\n")
|
||||
except Exception as e:
|
||||
sys.stderr.write(f"autolearn: social feed error: {e}\n")
|
||||
sys.stderr.write(
|
||||
f"autolearn: {len(out)} hosts learned ({ti} threat-intel + "
|
||||
f"{len(out) - ti} classified cross-site) @ {int(time.time())}"
|
||||
|
|
|
|||
|
|
@ -57,10 +57,14 @@ router = APIRouter(tags=["toolbox"])
|
|||
@router.get("/__toolbox/loader.js")
|
||||
async def toolbox_loader_js() -> Response:
|
||||
"""Static cosmetic loader (applies the banner client-side from the bundle)."""
|
||||
+    # no-store: the loader is the banner entry point and evolves (SPA re-assert,
+    # CSP proof, …). A long cache (was max-age=3600) pins stale loaders in clients
+    # for up to an hour — so loader changes never reach already-visited sites. It's
+    # 4 KB; serve it fresh every load so updates propagate immediately.
     return Response(
         content=bundlemod.LOADER_JS,
         media_type="application/javascript",
-        headers={"Cache-Control": "public, max-age=3600"},
+        headers={"Cache-Control": "no-store, no-cache, must-revalidate, max-age=0"},
|
||||
)
|
||||
|
||||
|
||||
|
|
@ -74,6 +78,31 @@ async def toolbox_bundle(mh: str = Query(default=""), wg: int = Query(default=0)
|
|||
)
|
||||
|
||||
|
||||
@router.get("/__toolbox/inline")
|
||||
async def toolbox_inline(
|
||||
mh: str = Query(default=""),
|
||||
wg: int = Query(default=0),
|
||||
csp: int = Query(default=0),
|
||||
) -> Response:
|
||||
"""#662 — COMPLETE self-contained inline banner script BODY.
|
||||
|
||||
Sites with a SERVICE WORKER (leparisien, cnn…) intercept every same-origin
|
||||
request, so the legacy ``<script src="/__toolbox/loader.js">`` + its
|
||||
``fetch("/__toolbox/bundle")`` are hijacked by the SW (404 / app-shell)
|
||||
before reaching our MITM engine → no banner. The Go engine fetches THIS
|
||||
body server-side at inject time and bakes it into a self-contained
|
||||
``<script>…</script>`` — no same-origin fetch for the SW to touch.
|
||||
|
||||
``mh`` / ``wg`` / ``csp`` come from the query params (baked as JS literals,
|
||||
not data-attrs / currentScript); the bundle is ``get_bundle(mh, wg)`` baked
|
||||
as a JSON literal (not fetched). no-store like the loader (it evolves)."""
|
||||
return Response(
|
||||
content=bundlemod.inline_script(mh, bool(wg), bool(csp)),
|
||||
media_type="application/javascript",
|
||||
headers={"Cache-Control": "no-store, no-cache, must-revalidate, max-age=0"},
|
||||
)
|
||||
|
||||
|
||||
# #662 — ad-block metrics ingest from the Go MITM engine (sbxmitm). The #662
|
||||
# cutover moved the BLOCK decision (204 on ad/tracker hosts) into the Go engine
|
||||
# but left the METRICS unported, so the #ads dashboard froze. The engine now
|
||||
|
|
@ -109,12 +138,20 @@ async def toolbox_ad_event(request: Request) -> Response:
|
|||
return Response(status_code=204)
|
||||
blocks = body.get("blocks") or []
|
||||
clients = body.get("clients") or []
|
||||
# #662 — the Go engine now also feeds the AUTO-LEARN loop: 3rd-party
|
||||
# ad-path requests it saw on the allow/mitm path (ad_ghost's _AD_PATH
|
||||
# heuristic), recorded as candidates here for secubox-toolbox-autolearn
|
||||
# to promote into learned-trackers.txt at AD_MIN_SITES distinct sites.
|
||||
candidates = body.get("candidates") or []
|
||||
if not isinstance(blocks, list):
|
||||
blocks = []
|
||||
if not isinstance(clients, list):
|
||||
clients = []
|
||||
if not isinstance(candidates, list):
|
||||
candidates = []
|
||||
blocks = blocks[:_AD_EVENT_ROW_CAP]
|
||||
clients = clients[:_AD_EVENT_ROW_CAP]
|
||||
candidates = candidates[:_AD_EVENT_ROW_CAP]
|
||||
|
||||
block_rows = [
|
||||
(b["ad_host"], b.get("site", ""), "block", int(b.get("hits", 0)), int(b.get("bytes", 0)))
|
||||
|
|
@ -126,14 +163,102 @@ async def toolbox_ad_event(request: Request) -> Response:
|
|||
for c in clients
|
||||
if isinstance(c, dict) and c.get("mac_hash") and c.get("ad_host")
|
||||
]
|
||||
cand_rows = [
|
||||
(c["host"], c.get("site", ""), int(c.get("hits", 0)))
|
||||
for c in candidates
|
||||
if isinstance(c, dict) and c.get("host")
|
||||
]
|
||||
if block_rows:
|
||||
store.record_ad_blocks(block_rows)
|
||||
if client_rows:
|
||||
store.record_ad_client_blocks(client_rows)
|
||||
if cand_rows:
|
||||
store.record_ad_candidates(cand_rows)
|
||||
except Exception as e: # never raise into the engine's fire-and-forget POST
|
||||
log.debug("ad-event ingest failed: %s", e)
|
||||
return Response(status_code=204)
|
||||
|
||||
|
||||
# #662 — cross-site cookie-tracker edge ingest from the Go MITM engine (sbxmitm).
|
||||
# The #662 Phase-7 cutover decommissioned the in-process Python social_graph addon
|
||||
# that fed social.record_edge(), so the kbin /social graph (social_edges →
|
||||
# social_nodes/social_links) froze. The engine now computes the SAME 3rd-party
|
||||
# cookie-tracker edges (FAITHFUL port of social_graph.py: deny-list, eTLD+1
|
||||
# 3rd-party check, cookie_id_hash, CMP consent_state) and POSTs a batch here. We
|
||||
# call social.record_edge() per row, which writes raw social_edges; the existing
|
||||
# app.py social_fold_loop folds them into nodes/links.
|
||||
#
|
||||
# Raw cookie VALUES never reach this endpoint — only the truncated cookie_id_hash
|
||||
# (privacy/CSPN; this is exactly why the original ran in-process).
|
||||
#
|
||||
# UNAUTHENTICATED, same trust note as /__toolbox/ad-event: the engine reaches the
|
||||
# portal only over the R3 nft perimeter (loopback / WG ingress).
|
||||
_SOCIAL_EVENT_ROW_CAP = 5000 # bound the edge list so a misbehaving engine can't flood us
|
||||
_SOCIAL_FOLD_DEBOUNCE = 60 # seconds: floor between in-handler safety folds
|
||||
_social_last_fold = 0.0 # module-level throttle timestamp
|
||||
|
||||
|
||||
@router.post("/__toolbox/social-event")
async def toolbox_social_event(request: Request) -> Response:
    """Ingest a batch of cross-site tracker edges from the Go engine. Best-effort:
    never 500s the engine (it is fire-and-forget) — always returns 204. See the
    trust note above for why this is unauthenticated."""
    global _social_last_fold
    try:
        # Body-size guard BEFORE parsing (mirrors /__toolbox/ad-event): the legit
        # payload (≤5000 edges) is well under 2 MB; reject larger outright so a
        # misbehaving/compromised WG peer can't pressure portal memory.
        try:
            clen = int(request.headers.get("content-length") or 0)
        except (TypeError, ValueError):
            clen = 0
        if clen > 2 * 1024 * 1024:
            return Response(status_code=204)
        body = await request.json()
        if not isinstance(body, dict):
            return Response(status_code=204)
        edges = body.get("edges") or []
        if not isinstance(edges, list):
            edges = []
        edges = edges[:_SOCIAL_EVENT_ROW_CAP]

        from . import social as _social

        recorded = 0
        for e in edges:
            if not isinstance(e, dict):
                continue
            try:
                _social.record_edge(
                    client_mac_hash=e.get("client_mac_hash") or "",
                    src_site=e.get("src_site") or "",
                    tracker_domain=e.get("tracker_domain") or "",
                    cookie_id_hash_val=e.get("cookie_id_hash_val") or "",
                    ja4_hash=e.get("ja4_hash") or None,
                    consent_state=e.get("consent_state") or "none_seen",
                )
                recorded += 1
            except Exception as row_err:  # one bad row never fails the batch
                log.debug("social-event row failed: %s", row_err)

        # Safety fold: the app.py social_fold_loop already folds every 5 min, but
        # fold here too (debounced to ≤ once / 60 s via a module-level timestamp)
        # so a freshly-ingested edge surfaces in the d3 graph promptly even between
        # loop ticks. Cheap (indexed window scan) and self-throttling; a fold
        # failure is swallowed (the loop will catch up).
        if recorded:
            now = time.time()
            if now - _social_last_fold >= _SOCIAL_FOLD_DEBOUNCE:
                _social_last_fold = now
                try:
                    _social.fold_recent(window_seconds=600)
                except Exception as fold_err:
                    log.debug("social-event fold failed: %s", fold_err)
    except Exception as e:  # never raise into the engine's fire-and-forget POST
        log.debug("social-event ingest failed: %s", e)
    return Response(status_code=204)
|
||||
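The debounced safety fold in the social-event handler is a plain minimum-interval throttle. A small sketch of the same idea, with an injectable clock so it can be tested without sleeping; `Debouncer` is a hypothetical helper, and it uses `time.monotonic` where the handler uses `time.time()` with a module-level timestamp:

```python
import time


class Debouncer:
    """Allow an action at most once per `min_interval` seconds."""

    def __init__(self, min_interval: float, clock=time.monotonic):
        self._min_interval = min_interval
        self._clock = clock
        self._last = None  # no action has run yet

    def ready(self) -> bool:
        now = self._clock()
        if self._last is None or now - self._last >= self._min_interval:
            # Stamp BEFORE running the action, as the handler does, so a
            # failing fold is not retried on every request.
            self._last = now
            return True
        return False
```

A handler would then guard the expensive step with `if recorded and debouncer.ready(): ...`, leaving the periodic loop to cover anything the throttle skips.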
|
||||
|
||||
# Cap geo/UA enrichment on /admin/clients/rich to the rows the UI actually shows
|
||||
# (top-5 + headroom). Beyond this, clients get bare fields — avoids ~51 cached
|
||||
# geo lookups per poll (ref #644).
|
||||
|
|
@ -2994,6 +3119,14 @@ async def admin_clients_rich() -> dict:
|
|||
# Use module-level imports so monkeypatching in tests works correctly.
|
||||
_av = avatar_analysis
|
||||
_geo = geo
|
||||
# Phase 6 (#662) : map each WG client to its REAL external (pre-tunnel)
|
||||
# endpoint IP so the flag reflects the client's true origin country, not
|
||||
# the internal 10.99.1.x (which GeoIPs to nothing). Best-effort, cached.
|
||||
try:
|
||||
from . import wg as _wg
|
||||
_wg_eps = _wg.wg_endpoints()
|
||||
except Exception:
|
||||
_wg_eps = {}
|
||||
rows = store.list_clients()
|
||||
rows = sorted(rows, key=lambda r: (r.get("last_seen") or 0), reverse=True)
|
||||
now = _t.time()
|
||||
|
|
@ -3030,7 +3163,13 @@ async def admin_clients_rich() -> dict:
|
|||
except Exception:
|
||||
pass
|
||||
try:
|
||||
-            gi = _geo.lookup(r.get("ip") or "")
+            # PRIVACY : the external endpoint IP is used transiently for the
+            # GeoIP lookup ONLY — it is NEVER stored or returned in the API
+            # response. The appliance is privacy-focused: country-granularity
+            # only (flag / ISO), never the raw client origin IP. Fall back to
+            # the stored (internal) IP for non-WG / captive clients.
+            geo_key = _wg_eps.get(r.get("mac_hash") or "") or (r.get("ip") or "")
+            gi = _geo.lookup(geo_key)
|
||||
flag = gi.get("flag", "") or ""
|
||||
country_iso = gi.get("country_iso", "") or ""
|
||||
asn_org = gi.get("asn_org", "") or ""
|
||||
|
|
|
|||
|
|
@ -103,26 +103,31 @@ def get_bundle(client_id: str, is_wg: bool = False) -> dict:
|
|||
"tracker_patterns": TRACKER_PATTERNS, "ts": int(time.time())}
|
||||
|
||||
|
||||
-# Cosmetic client-side loader. Served static + cached; applies the transparency
-# banner from the bundle off the page's critical render path. Per-page stats
-# (trackers, cookies) are derived in-browser (Resource Timing / document.cookie),
-# so the proxy never scans the body. Self-guarded, dismissible, fail-silent.
-LOADER_JS = r"""(function(){
-  "use strict";
-  if (window.__SBX_LOADER__) return; window.__SBX_LOADER__ = 1;
-  var s = document.currentScript || {};
-  var ds = s.dataset || {};
-  var mh = ds.mh || "", wg = ds.wg || "0";
-  // #662 CONSENTED-DEMONSTRATION: the engine relaxed this page's CSP so this
-  // loader could run even under a strict policy, and stamped data-csp="1" on our
-  // <script>. When set, the banner shows a 🔓 as VISIBLE proof the page's CSP was
-  // bypassed to inject. Absent → no proof emoji (page had no CSP to bypass).
-  var csp = ds.csp || "";
-  // SPA support (#662): cache the bundle + remember an explicit dismiss, so the
-  // banner can be re-asserted after client-side navigation / DOM re-renders
-  // (cnn, youtube… swap content without reloading → the one-shot loader would
-  // otherwise vanish). Re-assert never fights a user who clicked ✕.
-  var bundle = null, dismissed = false;
+# ── shared banner JS body (#662) ─────────────────────────────────────────────
+#
+# The render + SPA-re-assert + dismiss + countTrackers + 🔓 cspProof logic is
+# IDENTICAL between the legacy src-loader (LOADER_JS, fetched as
+# /__toolbox/loader.js → fetch()es /__toolbox/bundle) and the new INLINE banner
+# (inline_script(), baked into the page by the Go engine at inject time). To
+# avoid drift, that logic lives ONCE in _BANNER_CORE; each caller differs only in
+# its PRELUDE — how `bundle`, `mh`, `wg`, `csp`, `dismissed` are obtained:
+#
+#   * LOADER_JS → reads data-mh/data-wg/data-csp off document.currentScript and
+#                 fetch()es the bundle (legacy; kept working for the
+#                 /__toolbox/loader.js route).
+#   * inline    → mh/wg/csp/bundle are baked as JS LITERALS (no currentScript,
+#                 no fetch) so a site's SERVICE WORKER has nothing same-origin to
+#                 hijack (leparisien, cnn… run a SW that 404s our assets).
+#
+# _BANNER_CORE assumes `mh`, `wg`, `csp`, `bundle`, `dismissed` are already
+# declared by the prelude and runs render/SPA off them.
+
+# render + SPA-re-assert + dismiss + countTrackers + 🔓 cspProof. Shared verbatim
+# by both preludes. References `mh`, `wg`, `csp`, `bundle`, `dismissed` from the
+# enclosing prelude scope. Defines ensure() + installs the history/popstate hooks
+# + 2s poll; the prelude calls ensure() (inline) or sets `bundle` then ensure()s
+# (src-loader).
+_BANNER_CORE = r"""
|
||||
function ready(fn){ if (document.body) { fn(); } else { setTimeout(function(){ready(fn);}, 30); } }
|
||||
function esc(t){ return String(t).replace(/[&<>"]/g, function(c){
|
||||
return {"&":"&amp;","<":"&lt;",">":"&gt;","\"":"&quot;"}[c]; }); }
|
||||
|
|
@ -168,10 +173,6 @@ LOADER_JS = r"""(function(){
|
|||
// ensure(): (re)render the banner if it's absent and the bundle is loaded and
|
||||
// the user hasn't dismissed it. Cheap (a getElementById guard inside render).
|
||||
function ensure(){ if (bundle && !dismissed) ready(function(){ render(bundle); }); }
|
||||
-  fetch("/__toolbox/bundle?mh=" + encodeURIComponent(mh) + "&wg=" + encodeURIComponent(wg), {credentials:"omit"})
-    .then(function(r){ return r.json(); })
-    .then(function(b){ bundle = b; ensure(); })
-    .catch(function(){});
|
||||
// SPA re-assert: wrap history nav + popstate (defer so the framework settles),
|
||||
// plus a light 2s poll as a catch-all for DOM re-renders that drop the banner.
|
||||
["pushState","replaceState"].forEach(function(m){
|
||||
|
|
@ -182,5 +183,93 @@ LOADER_JS = r"""(function(){
|
|||
});
|
||||
window.addEventListener("popstate", function(){ setTimeout(ensure, 150); });
|
||||
setInterval(ensure, 2000);
|
||||
"""
|
||||
|
||||
|
||||
def _js_str(value: str) -> str:
    """JS string LITERAL for an arbitrary string. json.dumps yields a valid JS
    string; we additionally escape ``</`` → ``<\\/`` so a value can never close
    the surrounding inline <script> (e.g. a value of "</script>")."""
    return json.dumps(value).replace("</", "<\\/")


def _js_json(obj) -> str:
    """JS object LITERAL for a JSON-serialisable object, hardened against a
    ``</script>`` breakout: json.dumps is valid JS, and escaping ``</`` → ``<\\/``
    means no nested string (pin, report_url…) can terminate the inline script."""
    return json.dumps(obj, ensure_ascii=False).replace("</", "<\\/")
|
||||
|
||||
|
||||
def inline_script(mh: str, wg: bool, csp: bool) -> str:
|
||||
"""Build the COMPLETE self-contained inline banner script BODY (#662).
|
||||
|
||||
Service-worker survival: sites like leparisien / cnn register a SW that
|
||||
intercepts every same-origin request — so the legacy
|
||||
``<script src="/__toolbox/loader.js">`` + its ``fetch("/__toolbox/bundle")``
|
||||
are hijacked by the SW (404 / app-shell) before reaching our MITM engine, and
|
||||
the banner never appears. The fix is to bake EVERYTHING as JS literals so the
|
||||
inline script makes NO same-origin request the SW can touch:
|
||||
|
||||
* ``bundle`` is ``get_bundle(mh, wg)`` baked as a JSON literal (not fetched),
|
||||
* ``mh`` / ``wg`` / ``csp`` are baked as string literals (NOT data-attrs /
|
||||
currentScript — the null-currentScript-in-async bug killed #653),
|
||||
* NO ``document.currentScript``, NO ``fetch()``.
|
||||
|
||||
Returns an IIFE string suitable for ``<script>…</script>``. The single-run
|
||||
guard (``window.__SBX_LOADER__``), the ``#sbx-banner`` element-id guard, the
|
||||
dismissed flag, the history pushState/replaceState/popstate hooks + 2s poll,
|
||||
and the 🔓 proof when ``csp`` is set are all preserved (from _BANNER_CORE).
|
||||
"""
|
||||
bundle_obj = get_bundle(mh, bool(wg))
|
||||
prelude = (
|
||||
"(function(){\n"
|
||||
' "use strict";\n'
|
||||
" if (window.__SBX_LOADER__) return; window.__SBX_LOADER__ = 1;\n"
|
||||
# Baked literals — no currentScript / dataset, no fetch (SW-immune).
|
||||
" var mh = " + _js_str(mh or "") + ";\n"
|
||||
" var wg = " + _js_str("1" if wg else "0") + ";\n"
|
||||
# csp=="1" → the engine relaxed a real CSP to inject; render the 🔓 proof.
|
||||
" var csp = " + _js_str("1" if csp else "0") + ";\n"
|
||||
" var bundle = " + _js_json(bundle_obj) + ";\n"
|
||||
" var dismissed = false;\n"
|
||||
)
|
||||
# Inline path renders on the first tick — the bundle is already present (no
|
||||
# async fetch to wait on), so ensure() can run immediately.
|
||||
return prelude + _BANNER_CORE + " ensure();\n})();"
|
||||
|
||||
|
||||
# Cosmetic client-side loader. Served static + cached; applies the transparency
|
||||
# banner from the bundle off the page's critical render path. Per-page stats
|
||||
# (trackers, cookies) are derived in-browser (Resource Timing / document.cookie),
|
||||
# so the proxy never scans the body. Self-guarded, dismissible, fail-silent.
|
||||
#
|
||||
# Legacy src-loader (#620): kept working for the /__toolbox/loader.js route. The
|
||||
# INLINE path (inline_script) supersedes it in the live engine inject path because
|
||||
# a site service-worker hijacks the same-origin src + fetch (#662).
|
||||
_LOADER_PRELUDE = r"""(function(){
|
||||
"use strict";
|
||||
if (window.__SBX_LOADER__) return; window.__SBX_LOADER__ = 1;
|
||||
var s = document.currentScript || {};
|
||||
var ds = s.dataset || {};
|
||||
var mh = ds.mh || "", wg = ds.wg || "0";
|
||||
// #662 CONSENTED-DEMONSTRATION: the engine relaxed this page's CSP so this
|
||||
// loader could run even under a strict policy, and stamped data-csp="1" on our
|
||||
// <script>. When set, the banner shows a 🔓 as VISIBLE proof the page's CSP was
|
||||
// bypassed to inject. Absent → no proof emoji (page had no CSP to bypass).
|
||||
var csp = ds.csp || "";
|
||||
// SPA support (#662): cache the bundle + remember an explicit dismiss, so the
|
||||
// banner can be re-asserted after client-side navigation / DOM re-renders
|
||||
// (cnn, youtube… swap content without reloading → the one-shot loader would
|
||||
// otherwise vanish). Re-assert never fights a user who clicked ✕.
|
||||
var bundle = null, dismissed = false;
|
||||
"""
|
||||
|
||||
# The legacy src-loader fetches the bundle (same-origin), then ensure()s. The
|
||||
# render + SPA logic is the SAME _BANNER_CORE the inline path uses (no drift).
|
||||
LOADER_JS = _LOADER_PRELUDE + _BANNER_CORE + r"""
|
||||
fetch("/__toolbox/bundle?mh=" + encodeURIComponent(mh) + "&wg=" + encodeURIComponent(wg), {credentials:"omit"})
|
||||
.then(function(r){ return r.json(); })
|
||||
.then(function(b){ bundle = b; ensure(); })
|
||||
.catch(function(){});
|
||||
})();
|
||||
"""
|
||||
|
|
|
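The `</` escaping applied by `_js_str` and `_js_json` can be checked in isolation. The sketch below is a minimal standalone illustration, not the module's code: `js_literal` is a hypothetical helper name and the payload is made up for the example.

```python
import json


def js_literal(obj) -> str:
    """Serialise obj as a JS literal that cannot close an inline <script>."""
    # json.dumps output is valid JS. Inside a JS string "<\/" still reads as
    # "</", but the HTML parser no longer sees a "</script>" end tag.
    return json.dumps(obj, ensure_ascii=False).replace("</", "<\\/")


# Hypothetical hostile value, similar in spirit to the breakout test's pin.
payload = {"pin": "</script><img src=x onerror=alert(1)>"}
baked = js_literal(payload)

# "\/" is also a valid JSON escape, so the baked text still parses back to
# the original object without undoing the replacement first.
restored = json.loads(baked)
```

Because the escaped form remains valid JSON, a test that extracts the baked literal can parse it directly; undoing the replacement beforehand, as `_baked_bundle` does, is harmless but not strictly required.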
|||
|
|
@ -32,11 +32,95 @@ DB = Path("/var/lib/secubox/toolbox/toolbox.db")
|
|||
CACHE_FILE = Path("/var/lib/secubox/toolbox/cumulative-cache.json")
|
||||
CACHE_TTL_SECONDS = 60 # refresh every minute
|
||||
|
||||
# Live analysis-module event stores (post-#662 Phase-7 cutover). The legacy
|
||||
# toolbox.db `events` table froze at the cutover; the live counts + hosts now
|
||||
# live in each analysis module, exposed over its own unix socket.
|
||||
# GET /mitm-events/stats?since_seconds=N -> {"kind":..,"count":n,...}
|
||||
# GET /mitm-events?limit=N -> {"events":[{...payload...}],"count":n}
|
||||
_MITM_MODULES = [
|
||||
("dpi", "/run/secubox/dpi.sock"),
|
||||
("cookies", "/run/secubox/cookies.sock"),
|
||||
("ja4", "/run/secubox/threat-analyst.sock"),
|
||||
]
|
||||
# dpi socket is the one carrying host/sni payloads for top-hosts aggregation.
|
||||
_DPI_SOCK = "/run/secubox/dpi.sock"
|
||||
|
||||
|
||||
def _now() -> int:
|
||||
return int(time.time())
|
||||
|
||||
|
||||
def _uds_get_json(sock_path: str, path: str, timeout: int = 2) -> dict | None:
    """GET a JSON document over a unix socket. Returns the parsed dict, or
    None on any error (never raises). Mirrors api._pull_mitm_module_events's
    UDSConnection pattern."""
    import socket as _sock
    import http.client as _hc

    try:
        class UDSConnection(_hc.HTTPConnection):
            def connect(self):
                self.sock = _sock.socket(_sock.AF_UNIX, _sock.SOCK_STREAM)
                self.sock.settimeout(self.timeout)
                self.sock.connect(sock_path)

        conn = UDSConnection("localhost", timeout=timeout)
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            if resp.status != 200:
                return None
            raw = resp.read().decode("utf-8", errors="ignore")[:1000000]
            return json.loads(raw)
        finally:
            conn.close()
    except Exception as e:
        log.debug("uds get %s%s failed: %s", sock_path, path, e)
        return None
|
||||
|
||||
|
||||
def _live_event_counts(window_seconds: int) -> dict | None:
|
||||
"""Query each analysis module's GET /mitm-events/stats for its event count
|
||||
in the window. Returns {"dpi":n,"cookies":n,"ja4":n} (missing/error module
|
||||
omitted). Returns None only if EVERY module call failed (caller falls back
|
||||
to the legacy toolbox.db query)."""
|
||||
counts: dict[str, int] = {}
|
||||
any_ok = False
|
||||
for kind, sock_path in _MITM_MODULES:
|
||||
data = _uds_get_json(
|
||||
sock_path, f"/mitm-events/stats?since_seconds={int(window_seconds)}"
|
||||
)
|
||||
if data is None:
|
||||
continue
|
||||
any_ok = True
|
||||
# Prefer the module's self-reported kind; fall back to our tag.
|
||||
k = data.get("kind") or kind
|
||||
try:
|
||||
counts[k] = int(data.get("count", 0))
|
||||
except (TypeError, ValueError):
|
||||
counts[k] = 0
|
||||
return counts if any_ok else None
|
||||
|
||||
|
||||
def _live_top_hosts(limit: int = 5000, top: int = 25) -> list | None:
    """Aggregate top hosts from the dpi module's recent events. Returns a list
    of {"host":..,"count":..} (same shape as the legacy top_hosts_7d), or None
    if the dpi module call failed."""
    data = _uds_get_json(_DPI_SOCK, f"/mitm-events?limit={int(limit)}")
    if data is None:
        return None
    host_counter: Counter = Counter()
    for ev in data.get("events", []) or []:
        try:
            p = ev.get("payload") or {}
            h = p.get("host") or p.get("sni")
            if h:
                host_counter[h] += 1
        except Exception:
            pass
    return [{"host": h, "count": n} for h, n in host_counter.most_common(top)]
|
||||
|
||||
|
||||
def _safe_query(db, sql: str, params: tuple = ()) -> list:
|
||||
try:
|
||||
cur = db.execute(sql, params)
|
||||
|
|
@ -74,30 +158,48 @@ def compute() -> dict:
|
|||
out["sessions"]["all_time"] = (_safe_query(c,
|
||||
"SELECT COUNT(DISTINCT mac_hash) FROM clients") or [(0,)])[0][0]
|
||||
|
||||
# Event counts by source (last 7 days for relevance)
|
||||
for row in _safe_query(c,
|
||||
"SELECT source, COUNT(*) as n FROM events WHERE ts > ? GROUP BY source",
|
||||
(d7d,)):
|
||||
out["events"][row["source"]] = row["n"]
|
||||
out["events"]["total_7d"] = sum(out["events"].values())
|
||||
# Event counts by source (last 7 days for relevance).
|
||||
# Post-#662 Phase-7: the live counts live in the analysis modules'
|
||||
# own stores (queried over unix sockets). The legacy toolbox.db
|
||||
# `events` table froze at the cutover, so prefer the live path and
|
||||
# only fall back to the frozen table if EVERY module call fails.
|
||||
live_counts = _live_event_counts(86400 * 7)
|
||||
if live_counts is not None:
|
||||
out["events"].update(live_counts)
|
||||
else:
|
||||
for row in _safe_query(c,
|
||||
"SELECT source, COUNT(*) as n FROM events WHERE ts > ? GROUP BY source",
|
||||
(d7d,)):
|
||||
out["events"][row["source"]] = row["n"]
|
||||
out["events"]["total_7d"] = sum(
|
||||
v for v in out["events"].values() if isinstance(v, int)
|
||||
)
|
||||
|
||||
# Top hosts (anonymized — just hostnames, no mac_hash)
|
||||
host_counter = Counter()
|
||||
for row in _safe_query(c,
|
||||
"SELECT payload FROM events WHERE source='dpi' AND ts > ? LIMIT 5000",
|
||||
(d7d,)):
|
||||
try:
|
||||
p = json.loads(row["payload"])
|
||||
h = p.get("host") or p.get("sni")
|
||||
if h:
|
||||
host_counter[h] += 1
|
||||
except Exception:
|
||||
pass
|
||||
out["top_hosts_7d"] = [
|
||||
{"host": h, "count": n}
|
||||
for h, n in host_counter.most_common(15)
|
||||
]
|
||||
# Top hosts (anonymized — just hostnames, no mac_hash).
|
||||
# Live path: aggregate the dpi module's recent events; fall back to
|
||||
# the frozen toolbox.db `events` table only if the dpi call fails.
|
||||
live_hosts = _live_top_hosts()
|
||||
if live_hosts is not None:
|
||||
out["top_hosts_7d"] = live_hosts
|
||||
else:
|
||||
host_counter = Counter()
|
||||
for row in _safe_query(c,
|
||||
"SELECT payload FROM events WHERE source='dpi' AND ts > ? LIMIT 5000",
|
||||
(d7d,)):
|
||||
try:
|
||||
p = json.loads(row["payload"])
|
||||
h = p.get("host") or p.get("sni")
|
||||
if h:
|
||||
host_counter[h] += 1
|
||||
except Exception:
|
||||
pass
|
||||
out["top_hosts_7d"] = [
|
||||
{"host": h, "count": n}
|
||||
for h, n in host_counter.most_common(15)
|
||||
]
|
||||
|
||||
# Risk score / level distributions read the `clients` table (not
|
||||
# the frozen `events` table), so they stay on toolbox.db for now.
|
||||
# Risk score distribution (last 7d)
|
||||
score_buckets = {"low": 0, "medium": 0, "high": 0}
|
||||
for row in _safe_query(c,
|
||||
|
|
|
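The live top-hosts path reduces to a per-host count over event payloads. The following is a minimal sketch of that aggregation on in-memory data, assuming the event shape described in the comments above (`{"events": [{"payload": {...}}]}`). The function name `top_hosts` and the sample events are illustrative only.

```python
from __future__ import annotations

from collections import Counter


def top_hosts(events: list[dict], top: int = 25) -> list[dict]:
    """Count events per host (falling back to SNI), most frequent first."""
    counter: Counter = Counter()
    for ev in events or []:
        payload = ev.get("payload") or {}
        host = payload.get("host") or payload.get("sni")
        if host:
            counter[host] += 1
    return [{"host": h, "count": n} for h, n in counter.most_common(top)]


# Made-up sample events for illustration.
sample = [
    {"payload": {"host": "a.example"}},
    {"payload": {"sni": "b.example"}},
    {"payload": {"host": "a.example"}},
    {"payload": {}},  # neither host nor sni: ignored
    {},               # no payload at all: ignored
]
result = top_hosts(sample, top=2)
```

Note that in the diff the live path defaults to the top 25 hosts while the legacy fallback keeps `most_common(15)`, so the length of `top_hosts_7d` can differ depending on which path ran.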
|||
|
|
@ -233,3 +233,84 @@ def revoke_client(client_pubkey: str) -> bool:
|
|||
def _now_ts() -> float:
|
||||
import time
|
||||
return time.time()
|
||||
|
||||
|
||||
# Phase 6 (#662) : map each WG peer to its REAL external (pre-tunnel) endpoint IP
|
||||
# so the admin client table can show the client's true origin country flag —
|
||||
# the stored client IP is the internal 10.99.1.x which GeoIPs to nothing.
|
||||
|
||||
import hashlib as _hashlib
|
||||
import ipaddress as _ipaddress
|
||||
|
||||
_ENDPOINTS_CACHE: dict[str, str] = {}
|
||||
_ENDPOINTS_TS: float = 0.0
|
||||
_ENDPOINTS_TTL = 30.0 # endpoints change rarely; don't shell out per request/row
|
||||
|
||||
|
||||
def _is_private_or_loopback(ip: str) -> bool:
    """True for RFC1918 / loopback / link-local / ULA: non-routable, no
    meaningful country (a client on the local LAN has no public geo)."""
    try:
        a = _ipaddress.ip_address(ip)
    except ValueError:
        return True
    return (
        a.is_private          # 10/8, 172.16/12, 192.168/16, fc00::/7
        or a.is_loopback      # 127/8, ::1
        or a.is_link_local    # 169.254/16, fe80::/10
        or a.is_unspecified
    )


def _strip_endpoint_port(endpoint: str) -> str | None:
    """`IP:port` or `[IPv6]:port` → bare IP. None for `(none)` / malformed."""
    ep = (endpoint or "").strip()
    if not ep or ep == "(none)":
        return None
    if ep.startswith("["):  # IPv6 literal: [2001:db8::1]:51820
        host = ep[1:].split("]", 1)[0]
        return host or None
    # IPv4 (or bare host): split off the last :port
    return ep.rsplit(":", 1)[0] or None
|
||||
|
||||
|
||||
def wg_endpoints() -> dict[str, str]:
|
||||
"""Return {mac_hash: external_ip} for every WG peer with a real, routable
|
||||
endpoint, derived from `wg show wg-toolbox dump`.
|
||||
|
||||
mac_hash = sha256(pubkey)[:16] — the SAME derivation used when the peer is
|
||||
registered (api.wg_profile_new). The external IP is the peer's pre-tunnel
|
||||
endpoint, i.e. its true public origin. RFC1918 / loopback / link-local
|
||||
endpoints and `(none)` are skipped (no meaningful country).
|
||||
|
||||
Best-effort : empty dict on any error or if `wg` is missing. Cached ~30s.
|
||||
"""
|
||||
global _ENDPOINTS_CACHE, _ENDPOINTS_TS
|
||||
now = _now_ts()
|
||||
if _ENDPOINTS_CACHE and (now - _ENDPOINTS_TS) < _ENDPOINTS_TTL:
|
||||
return _ENDPOINTS_CACHE
|
||||
out: dict[str, str] = {}
|
||||
try:
|
||||
proc = subprocess.run(
|
||||
["wg", "show", WG_INTERFACE, "dump"],
|
||||
capture_output=True, text=True, timeout=2, check=False,
|
||||
)
|
||||
lines = proc.stdout.splitlines()
|
||||
# First line is the interface (privkey, pubkey, port, fwmark) — skip it.
|
||||
# Peer lines: pubkey presharedkey endpoint allowed-ips ...
|
||||
for line in lines[1:]:
|
||||
fields = line.split("\t")
|
||||
if len(fields) < 3:
|
||||
continue
|
||||
pubkey = fields[0].strip()
|
||||
ip = _strip_endpoint_port(fields[2])
|
||||
if not pubkey or not ip or _is_private_or_loopback(ip):
|
||||
continue
|
||||
mac_hash = _hashlib.sha256(pubkey.encode()).hexdigest()[:16]
|
||||
out[mac_hash] = ip
|
||||
except Exception as e: # missing wg, timeout, permission, parse error
|
||||
log.debug("wg_endpoints unavailable: %s", e)
|
||||
return _ENDPOINTS_CACHE or {}
|
||||
_ENDPOINTS_CACHE = out
|
||||
_ENDPOINTS_TS = now
|
||||
return out
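The parsing that `wg_endpoints()` performs can be exercised without a WireGuard interface. The sketch below is a standalone approximation under stated assumptions: it takes the dump text as a string instead of shelling out to `wg`, and the helper names (`endpoint_ip`, `is_routable`, `parse_dump`) and peer keys are hypothetical. The `sha256(pubkey)[:16]` derivation follows the docstring above.

```python
from __future__ import annotations

import hashlib
import ipaddress


def endpoint_ip(endpoint: str) -> str | None:
    """`IP:port` or `[IPv6]:port` to bare IP; None for `(none)` or empty."""
    ep = (endpoint or "").strip()
    if not ep or ep == "(none)":
        return None
    if ep.startswith("["):
        return ep[1:].split("]", 1)[0] or None
    return ep.rsplit(":", 1)[0] or None


def is_routable(ip: str) -> bool:
    """False for private, loopback, link-local, unspecified or unparsable IPs."""
    try:
        a = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return not (a.is_private or a.is_loopback or a.is_link_local or a.is_unspecified)


def parse_dump(dump: str) -> dict[str, str]:
    """Map sha256(pubkey)[:16] to the external IP for each routable peer."""
    out: dict[str, str] = {}
    for line in dump.splitlines()[1:]:  # first line describes the interface
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        pubkey, ip = fields[0].strip(), endpoint_ip(fields[2])
        if pubkey and ip and is_routable(ip):
            out[hashlib.sha256(pubkey.encode()).hexdigest()[:16]] = ip
    return out


# Made-up dump: one public IPv4 peer, one without endpoint, one LAN, one IPv6.
dump = "\n".join([
    "\t".join(["srvPrivKey", "srvPubKey", "51820", "off"]),
    "\t".join(["peerA", "(none)", "88.163.66.208:51820", "10.99.1.2/32"]),
    "\t".join(["peerB", "(none)", "(none)", "10.99.1.3/32"]),
    "\t".join(["peerC", "(none)", "192.168.1.50:41234", "10.99.1.4/32"]),
    "\t".join(["peerD", "(none)", "[2606:4700:4700::1111]:51820", "10.99.1.5/32"]),
])
peers = parse_dump(dump)
```

Only the peers with a public endpoint survive; the `(none)` and RFC1918 entries are dropped, which is the behaviour the geo-flag tests below pin.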
|
||||
|
|
|
|||
68
packages/secubox-toolbox/tests/test_ad_event_candidates.py
Normal file
|
|
@ -0,0 +1,68 @@
|
|||
# tests/test_ad_event_candidates.py
|
||||
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
|
||||
"""#662 — /__toolbox/ad-event accepts a "candidates" list (the Go engine's
|
||||
auto-learn feed) → store.record_ad_candidates(). Never 500s the engine."""
|
||||
import asyncio
|
||||
import json
|
||||
|
||||
from secubox_toolbox import api, store
|
||||
|
||||
|
||||
class _FakeRequest:
|
||||
"""Minimal Request stand-in: headers + an async json() body."""
|
||||
|
||||
def __init__(self, body: dict, content_length=None):
|
||||
self._body = body
|
||||
cl = content_length
|
||||
if cl is None:
|
||||
cl = len(json.dumps(body).encode())
|
||||
self.headers = {"content-length": str(cl)}
|
||||
|
||||
async def json(self):
|
||||
return self._body
|
||||
|
||||
|
||||
def test_candidates_ingested(monkeypatch):
|
||||
captured = {}
|
||||
monkeypatch.setattr(store, "record_ad_candidates", lambda rows: captured.setdefault("cand", list(rows)))
|
||||
monkeypatch.setattr(store, "record_ad_blocks", lambda rows: None)
|
||||
monkeypatch.setattr(store, "record_ad_client_blocks", lambda rows: None)
|
||||
|
||||
body = {
|
||||
"blocks": [],
|
||||
"clients": [],
|
||||
"candidates": [
|
||||
{"host": "metrics.acotedemoi.com", "site": "lemonde.fr", "hits": 3},
|
||||
{"host": "ads.foo.io", "site": "news.example", "hits": 1},
|
||||
{"site": "no-host.example", "hits": 9}, # missing host → skipped
|
||||
{"host": "", "site": "x", "hits": 2}, # empty host → skipped
|
||||
],
|
||||
}
|
||||
resp = asyncio.run(api.toolbox_ad_event(_FakeRequest(body)))
|
||||
assert resp.status_code == 204
|
||||
rows = captured.get("cand")
|
||||
assert rows == [
|
||||
("metrics.acotedemoi.com", "lemonde.fr", 3),
|
||||
("ads.foo.io", "news.example", 1),
|
||||
]
|
||||
|
||||
|
||||
def test_candidates_absent_is_noop(monkeypatch):
|
||||
called = {"cand": False}
|
||||
monkeypatch.setattr(store, "record_ad_candidates", lambda rows: called.__setitem__("cand", True))
|
||||
monkeypatch.setattr(store, "record_ad_blocks", lambda rows: None)
|
||||
monkeypatch.setattr(store, "record_ad_client_blocks", lambda rows: None)
|
||||
|
||||
resp = asyncio.run(api.toolbox_ad_event(_FakeRequest({"blocks": [], "clients": []})))
|
||||
assert resp.status_code == 204
|
||||
assert called["cand"] is False # no candidates key → record_ad_candidates not called
|
||||
|
||||
|
||||
def test_candidates_bad_payload_never_500s(monkeypatch):
|
||||
monkeypatch.setattr(store, "record_ad_candidates", lambda rows: (_ for _ in ()).throw(RuntimeError("boom")))
|
||||
monkeypatch.setattr(store, "record_ad_blocks", lambda rows: None)
|
||||
monkeypatch.setattr(store, "record_ad_client_blocks", lambda rows: None)
|
||||
|
||||
body = {"candidates": [{"host": "x.io", "site": "s", "hits": 1}]}
|
||||
resp = asyncio.run(api.toolbox_ad_event(_FakeRequest(body)))
|
||||
assert resp.status_code == 204 # store raised, but the endpoint swallows it
|
||||
98
packages/secubox-toolbox/tests/test_autolearn_socialfeed.py
Normal file
|
|
@ -0,0 +1,98 @@
|
|||
# tests/test_autolearn_socialfeed.py
|
||||
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
|
||||
"""#662 — cross-site-reuse promotion: a tracker_domain seen on >= N distinct
|
||||
src_site across recent social_edges is a behaviourally-confirmed cross-site
|
||||
tracker and gets promoted into learned-trackers.txt. Allowlist + self guard
|
||||
reused from _ad_feed; merges (never overwrites)."""
|
||||
import sqlite3
|
||||
import importlib.util
|
||||
import pathlib
|
||||
import time
|
||||
|
||||
|
||||
def _load_autolearn():
|
||||
p = pathlib.Path(__file__).resolve().parents[1] / "sbin" / "secubox-toolbox-autolearn"
|
||||
spec = importlib.util.spec_from_loader("autolearn", loader=None)
|
||||
mod = importlib.util.module_from_spec(spec)
|
||||
exec(compile(p.read_text(), str(p), "exec"), mod.__dict__)
|
||||
return mod
|
||||
|
||||
|
||||
def _mk_db(db):
|
||||
con = sqlite3.connect(db)
|
||||
con.executescript(
|
||||
"CREATE TABLE social_edges("
|
||||
" id INTEGER PRIMARY KEY AUTOINCREMENT, ts INTEGER NOT NULL,"
|
||||
" client_mac_hash TEXT, src_site TEXT NOT NULL,"
|
||||
" tracker_domain TEXT NOT NULL, cookie_id_hash TEXT,"
|
||||
" ja4_hash TEXT, consent_state TEXT DEFAULT 'none_seen');")
|
||||
return con
|
||||
|
||||
|
||||
def test_social_feed_promotes_cross_site_tracker(tmp_path, monkeypatch):
|
||||
db = tmp_path / "t.db"
|
||||
con = _mk_db(db)
|
||||
now = int(time.time())
|
||||
rows = [
|
||||
# tracker.io: 3 distinct src_sites (>= SOCIAL_MIN_SITES=3) → promote
|
||||
(now, "m1", "cnn.com", "tracker.io"),
|
||||
(now, "m1", "bbc.com", "tracker.io"),
|
||||
(now, "m2", "lemonde.fr", "tracker.io"),
|
||||
# twosite.net: only 2 distinct sites → NOT promoted
|
||||
(now, "m1", "cnn.com", "twosite.net"),
|
||||
(now, "m1", "bbc.com", "twosite.net"),
|
||||
# safe.cdn.net: 3 sites but ALLOWLISTED → excluded
|
||||
(now, "m1", "a.com", "safe.cdn.net"),
|
||||
(now, "m1", "b.com", "safe.cdn.net"),
|
||||
(now, "m1", "c.com", "safe.cdn.net"),
|
||||
# secubox.in: 3 sites but SELF domain → excluded
|
||||
(now, "m1", "a.com", "secubox.in"),
|
||||
(now, "m1", "b.com", "secubox.in"),
|
||||
(now, "m1", "c.com", "secubox.in"),
|
||||
# stale.io: 3 sites but OUTSIDE the recent window → excluded
|
||||
(now - 999999, "m1", "a.com", "stale.io"),
|
||||
(now - 999999, "m1", "b.com", "stale.io"),
|
||||
(now - 999999, "m1", "c.com", "stale.io"),
|
||||
]
|
||||
con.executemany(
|
||||
"INSERT INTO social_edges(ts,client_mac_hash,src_site,tracker_domain) "
|
||||
"VALUES(?,?,?,?)", rows)
|
||||
con.commit()
|
||||
con.close()
|
||||
|
||||
allow = tmp_path / "ad-allowlist.txt"
|
||||
allow.write_text("safe.cdn.net\n")
|
||||
out = tmp_path / "learned-trackers.txt"
|
||||
out.write_text("preexisting.tracker.com\n")
|
||||
|
||||
monkeypatch.setenv("SECUBOX_AUTOLEARN_DB", str(db))
|
||||
monkeypatch.setenv("SECUBOX_AUTOLEARN_OUT", str(out))
|
||||
monkeypatch.setenv("SECUBOX_AD_ALLOWLIST", str(allow))
|
||||
monkeypatch.setenv("SECUBOX_SOCIAL_MIN_SITES", "3")
|
||||
monkeypatch.setenv("SECUBOX_SOCIAL_WINDOW_HOURS", "168")
|
||||
|
||||
al = _load_autolearn()
|
||||
n = al._social_feed()
|
||||
|
||||
lines = out.read_text().split()
|
||||
assert "tracker.io" in lines # 3 distinct sites, recent → promoted
|
||||
assert "twosite.net" not in lines # below threshold
|
||||
assert "safe.cdn.net" not in lines # allowlisted
|
||||
assert "secubox.in" not in lines # self domain
|
||||
assert "stale.io" not in lines # outside window
|
||||
assert "preexisting.tracker.com" in lines # merge, not overwrite
|
||||
assert len(lines) == len(set(lines)) # no dups
|
||||
assert n == 1
|
||||
|
||||
|
||||
def test_social_feed_no_table_is_safe(tmp_path, monkeypatch):
|
||||
db = tmp_path / "empty.db"
|
||||
sqlite3.connect(db).close() # no social_edges table
|
||||
out = tmp_path / "learned-trackers.txt"
|
||||
out.write_text("x.tracker.com\n")
|
||||
monkeypatch.setenv("SECUBOX_AUTOLEARN_DB", str(db))
|
||||
monkeypatch.setenv("SECUBOX_AUTOLEARN_OUT", str(out))
|
||||
al = _load_autolearn()
|
||||
n = al._social_feed()
|
||||
assert n == -1 # gated/unavailable, not a crash
|
||||
assert "x.tracker.com" in out.read_text() # file untouched
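The promotion rule these tests describe (a tracker domain seen on at least N distinct source sites within a recent window) can be expressed as one grouped query. This is a sketch only: the actual `_social_feed` implementation is not shown in this diff, so the function name `cross_site_trackers`, its parameters and the reduced table schema are assumptions made for illustration.

```python
import sqlite3
import time


def cross_site_trackers(con, min_sites=3, window_hours=168, exclude=frozenset()):
    """Tracker domains seen on >= min_sites distinct src_site in the window."""
    since = int(time.time()) - window_hours * 3600
    rows = con.execute(
        "SELECT tracker_domain FROM social_edges WHERE ts >= ? "
        "GROUP BY tracker_domain HAVING COUNT(DISTINCT src_site) >= ? "
        "ORDER BY tracker_domain",
        (since, min_sites),
    ).fetchall()
    # Allowlisted or self domains are filtered after the query.
    return [domain for (domain,) in rows if domain not in exclude]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE social_edges(ts INTEGER, src_site TEXT, tracker_domain TEXT)")
now = int(time.time())
con.executemany("INSERT INTO social_edges VALUES(?,?,?)", [
    (now, "cnn.com", "tracker.io"),
    (now, "bbc.com", "tracker.io"),
    (now, "lemonde.fr", "tracker.io"),
    (now, "cnn.com", "twosite.net"),       # only two sites
    (now, "bbc.com", "twosite.net"),
    (now, "a.com", "safe.cdn.net"),        # three sites but excluded
    (now, "b.com", "safe.cdn.net"),
    (now, "c.com", "safe.cdn.net"),
    (now - 999999, "a.com", "stale.io"),   # outside the 168h window
    (now - 999999, "b.com", "stale.io"),
    (now - 999999, "c.com", "stale.io"),
])
promoted = cross_site_trackers(con, exclude={"safe.cdn.net"})
con.close()
```

Counting `DISTINCT src_site` rather than rows is what keeps a tracker that fires many times on a single site from being promoted.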
|
||||
|
|
@ -48,6 +48,9 @@ def test_get_bundle_caches(monkeypatch):
|
|||
|
||||
|
||||
def test_loader_js_is_served_string():
|
||||
assert "addEventListener" not in bundle.LOADER_JS # uses currentScript pattern
|
||||
# The legacy src-loader uses the currentScript pattern and fetch()es the
|
||||
# bundle same-origin (the inline path #662 supersedes it in the live engine
|
||||
# but /__toolbox/loader.js still serves this).
|
||||
assert "currentScript" in bundle.LOADER_JS
|
||||
assert "__toolbox/bundle" in bundle.LOADER_JS
|
||||
assert bundle.LOADER_JS.strip().startswith("(function()")
|
||||
|
|
|
|||
138
packages/secubox-toolbox/tests/test_inline_banner.py
Normal file
|
|
@ -0,0 +1,138 @@
|
|||
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
|
||||
# Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
|
||||
# Source-Disclosed License — All rights reserved except as expressly granted.
|
||||
# See LICENCE-CMSD-1.0.md for terms.
|
||||
|
||||
"""SecuBox-Deb :: toolbox :: inline (SW-immune) banner script tests (#662).
|
||||
|
||||
The inline banner survives sites with a SERVICE WORKER (leparisien, cnn…): the
|
||||
engine bakes the bundle + mh/wg/csp as JS literals so there is NO same-origin
|
||||
fetch the SW can hijack. These tests pin that contract:
|
||||
* a valid baked `var bundle = {...}` (JSON), mh/wg/csp literals,
|
||||
* the 🔓 proof gated by csp,
|
||||
* NO currentScript (the #653 null-in-async bug) and NO fetch(,
|
||||
* `</script>` is escaped (no inline-script breakout),
|
||||
* get_bundle is called with (mh, bool(wg)).
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
|
||||
|
||||
from secubox_toolbox import api, bundle # noqa: E402
|
||||
|
||||
|
||||
def _baked_bundle(script: str) -> dict:
|
||||
"""Extract + parse the baked `var bundle = {...};` JSON from an inline script.
|
||||
Undoes the `</` → `<\\/` breakout escaping before parsing as JSON."""
|
||||
m = re.search(r"var bundle = (\{.*?\});\n", script, re.S)
|
||||
assert m, "no baked `var bundle = {...};` in inline script"
|
||||
return json.loads(m.group(1).replace("<\\/", "</"))
|
||||
|
||||
|
||||
def test_inline_bakes_valid_bundle_json():
|
||||
s = bundle.inline_script("x", wg=True, csp=True)
|
||||
b = _baked_bundle(s)
|
||||
assert b["v"] == 1
|
||||
assert b["client_id"] == "x"
|
||||
# wg=True → public report URL (proves get_bundle was called with wg=True)
|
||||
assert b["report_url"] == bundle.REPORT_URL_PUBLIC + "?mh=x"
|
||||
assert isinstance(b["tracker_patterns"], list) and b["tracker_patterns"]
|
||||
|
||||
|
||||
def test_inline_bakes_mh_wg_csp_literals():
|
||||
s = bundle.inline_script("deadbeef", wg=True, csp=True)
|
||||
assert 'var mh = "deadbeef";' in s
|
||||
assert 'var wg = "1";' in s
|
||||
assert 'var csp = "1";' in s
|
||||
s0 = bundle.inline_script("deadbeef", wg=False, csp=False)
|
||||
assert 'var wg = "0";' in s0
|
||||
assert 'var csp = "0";' in s0
|
||||
|
||||
|
||||
def test_inline_csp_literal_and_proof_logic():
|
||||
# The 🔓 literal lives in the shared render core, gated at runtime by
|
||||
# csp === "1". csp=1 → var csp = "1" so render shows the proof.
|
||||
s1 = bundle.inline_script("x", wg=False, csp=True)
|
||||
assert "\U0001f513" in s1 # 🔓 present in the render logic
|
||||
assert 'var csp = "1";' in s1 # runtime gate ON
|
||||
# csp=0 → gate OFF (no proof rendered), even though the literal is in core.
|
||||
s0 = bundle.inline_script("x", wg=False, csp=False)
|
||||
assert 'var csp = "0";' in s0
|
||||
|
||||
|
||||
def test_inline_has_no_currentscript_no_fetch():
|
||||
# #653 root cause: document.currentScript is null in an async context. The
|
||||
# inline script MUST NOT read it, and MUST NOT fetch() (SW would hijack it).
|
||||
s = bundle.inline_script("x", wg=True, csp=True)
|
||||
assert "currentScript" not in s
|
||||
assert "fetch(" not in s
|
||||
|
||||
|
||||
def test_inline_keeps_guards_and_spa_hooks():
|
||||
s = bundle.inline_script("x", wg=True, csp=True)
|
||||
assert "window.__SBX_LOADER__" in s # single-run guard
|
||||
assert 'getElementById("sbx-banner")' in s # element-id guard
|
||||
assert "dismissed" in s
|
||||
assert "pushState" in s and "replaceState" in s and "popstate" in s
|
||||
assert "setInterval(ensure, 2000)" in s
|
||||
assert "countTrackers" in s
|
||||
|
||||
|
||||
def test_inline_escapes_script_breakout():
|
||||
# A bundle value that literally contains </script> must NOT close the inline
|
||||
# <script> — it must be escaped to <\/script>.
|
||||
orig = bundle._read_pin
|
||||
bundle._read_pin = lambda: "</script><img src=x onerror=alert(1)>"
|
||||
bundle._cache.clear()
|
||||
try:
|
||||
s = bundle.inline_script("z", wg=False, csp=False)
|
||||
finally:
|
||||
bundle._read_pin = orig
|
||||
bundle._cache.clear()
|
||||
# The IIFE close is the only legitimate "})();"; nothing before the final
|
||||
# close should contain a raw "</script>".
|
||||
head = s[: s.rfind("})();")]
|
||||
assert "</script>" not in head
|
||||
assert "<\\/script>" in head # escaped form present
|
||||
|
||||
|
||||
def test_inline_get_bundle_called_with_bool_wg(monkeypatch):
|
||||
seen = {}
|
||||
|
||||
def fake_get_bundle(mh, is_wg=False):
|
||||
seen["args"] = (mh, is_wg)
|
||||
return {"v": 1, "client_id": mh, "level": "r1", "pin": "",
|
||||
"report_url": "http://x", "tracker_patterns": ["doubleclick"],
|
||||
"ts": 0}
|
||||
|
||||
monkeypatch.setattr(bundle, "get_bundle", fake_get_bundle)
|
||||
bundle.inline_script("abc", wg=1, csp=0) # wg passed as truthy int
|
||||
assert seen["args"] == ("abc", True) # coerced to bool
|
||||
|
||||
|
||||
def test_legacy_loader_still_intact():
|
||||
# The src-loader must keep working: it reads currentScript + data-attrs and
|
||||
# fetch()es the bundle (the inline path supersedes it in the live engine, but
|
||||
# the /__toolbox/loader.js route still serves it).
|
||||
assert "currentScript" in bundle.LOADER_JS
|
||||
assert "fetch(" in bundle.LOADER_JS
|
||||
assert "function render" in bundle.LOADER_JS
|
||||
assert "window.__SBX_LOADER__" in bundle.LOADER_JS
|
||||
|
||||
|
||||
def test_inline_route_returns_javascript_body():
|
||||
import asyncio
|
||||
resp = asyncio.run(api.toolbox_inline(mh="abc", wg=1, csp=1))
|
||||
assert resp.status_code == 200
|
||||
assert "javascript" in resp.media_type
|
||||
assert "no-store" in resp.headers.get("Cache-Control", "")
|
||||
body = resp.body.decode("utf-8")
|
||||
assert "window.__SBX_LOADER__" in body
|
||||
assert "currentScript" not in body
|
||||
assert "fetch(" not in body
|
||||
assert 'var mh = "abc";' in body
|
||||
assert 'var csp = "1";' in body
|
||||
48
packages/secubox-toolbox/tests/test_social_parity.py
Normal file
|
|
@ -0,0 +1,48 @@
|
|||
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
|
||||
# Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
|
||||
"""Cross-engine SOCIAL parity harness — Python side (#662).
|
||||
|
||||
Loads the SAME ``social-cookie-id-fixtures.json`` the Go core uses
|
||||
(``../secubox-toolbox-ng/testdata``) and asserts ``social.cookie_id_hash``
|
||||
reproduces each fixture's ``expect``.
|
||||
|
||||
Python is the source of truth: the ``expect`` values were GENERATED by this very
|
||||
``social.cookie_id_hash``. The Go side (cmd/sbxmitm/social_test.go) must
|
||||
reproduce them byte-for-byte. Both files reading identical inputs is what makes
|
||||
the parity meaningful — the same anti-rig discipline as the jar parity harness.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
|
||||
from secubox_toolbox import social
|
||||
|
||||
_HERE = os.path.dirname(os.path.abspath(__file__))
|
||||
# tests/ → packages/secubox-toolbox → packages → packages/secubox-toolbox-ng
|
||||
_NG_TESTDATA = os.path.normpath(
|
||||
os.path.join(_HERE, "..", "..", "secubox-toolbox-ng", "testdata"))
|
||||
_FIXTURES = os.path.join(_NG_TESTDATA, "social-cookie-id-fixtures.json")
|
||||
|
||||
|
||||
def _load():
|
||||
with open(_FIXTURES, encoding="utf-8") as f:
|
||||
return json.load(f)
|
||||
|
||||
|
||||
def test_cookie_id_hash_parity():
|
||||
data = _load()
|
||||
assert data["fixtures"], "no fixtures"
|
||||
failures = []
|
||||
for fx in data["fixtures"]:
|
||||
got = social.cookie_id_hash(
|
||||
fx["tracker_domain"], fx["cookie_name"], fx["cookie_value"])
|
||||
if got != fx["expect"]:
|
||||
failures.append((fx, got))
|
||||
assert not failures, f"cookie_id_hash drift: {failures}"
|
||||
|
||||
|
||||
def test_cookie_id_hash_invariants():
|
||||
# domain + name are lower-cased; the value is NOT.
|
||||
assert social.cookie_id_hash("A.NET", "N", "v") == social.cookie_id_hash("a.net", "n", "v")
|
||||
assert social.cookie_id_hash("a.net", "n", "V") != social.cookie_id_hash("a.net", "n", "v")
|
||||
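The invariants tested above (domain and name are case-insensitive, the value is case-sensitive) and the fixture-driven parity check can be illustrated with a minimal sketch. The real `social.cookie_id_hash` is not shown in this diff, so `sketch_cookie_id_hash`, its SHA-256 digest, the NUL separator, and the 16-character truncation are all assumptions made for illustration only; `check_parity` is likewise a hypothetical helper.

```python
import hashlib
import json


def sketch_cookie_id_hash(tracker_domain: str, cookie_name: str, cookie_value: str) -> str:
    """Hypothetical stand-in for social.cookie_id_hash (digest and layout are assumed)."""
    # Domain and name are normalised to lower case; the value is kept verbatim.
    material = "\x00".join([tracker_domain.lower(), cookie_name.lower(), cookie_value])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


def check_parity(fixtures_json: str) -> list:
    """Return the fixtures whose recomputed hash differs from the recorded ``expect``."""
    data = json.loads(fixtures_json)
    return [
        fx for fx in data["fixtures"]
        if sketch_cookie_id_hash(
            fx["tracker_domain"], fx["cookie_name"], fx["cookie_value"]) != fx["expect"]
    ]
```

Because both engines read one shared fixture file, a second implementation only passes if it normalises the same fields and hashes the same bytes, which is the point of the parity harness.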
118
packages/secubox-toolbox/tests/test_wg_endpoints_geoflag.py
Normal file
|
|
@ -0,0 +1,118 @@
|
|||
# SPDX-License-Identifier: LicenseRef-CMSD-1.0
# Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>

"""Phase 6 (#662) — per-client country flag from the REAL external WG endpoint IP."""
import asyncio
import hashlib
from types import SimpleNamespace

from secubox_toolbox import api
from secubox_toolbox import wg


# A `wg show wg-toolbox dump` blob. First line = interface (skipped).
# Peer fields are TAB-separated: pubkey psk endpoint allowed-ips ...
_PUB_PUBLIC = "cVZ7s8d2pubkeyAAAAAAAAAAAAAAAAAAAAAAAAAAAA="  # real public endpoint
_PUB_NONE = "noneZZZpubkeyBBBBBBBBBBBBBBBBBBBBBBBBBBBBB="  # endpoint (none)
_PUB_LAN = "lanZZZZZpubkeyCCCCCCCCCCCCCCCCCCCCCCCCCCCC="  # RFC1918 endpoint
_PUB_V6 = "v6ZZZZZZpubkeyDDDDDDDDDDDDDDDDDDDDDDDDDDDDD="  # IPv6 endpoint

_DUMP = "\t".join([
    "srvPrivKey", "srvPubKey", "51820", "off",
]) + "\n" + "\n".join([
    "\t".join([_PUB_PUBLIC, "(none)", "88.163.66.208:51820", "10.99.1.2/32", "0", "0", "0"]),
    "\t".join([_PUB_NONE, "(none)", "(none)", "10.99.1.3/32", "0", "0", "0"]),
    "\t".join([_PUB_LAN, "(none)", "192.168.1.50:41234", "10.99.1.4/32", "0", "0", "0"]),
    "\t".join([_PUB_V6, "(none)", "[2606:4700:4700::1111]:51820", "10.99.1.5/32", "0", "0", "0"]),
])


def _hash(pub: str) -> str:
    return hashlib.sha256(pub.encode()).hexdigest()[:16]


def _fake_run(blob):
    def _run(cmd, **kw):
        return SimpleNamespace(stdout=blob, stderr="", returncode=0)
    return _run


def test_wg_endpoints_parsing(monkeypatch):
    # Bust the 30s cache between tests.
    wg._ENDPOINTS_CACHE, wg._ENDPOINTS_TS = {}, 0.0
    monkeypatch.setattr(wg.subprocess, "run", _fake_run(_DUMP))

    eps = wg.wg_endpoints()

    # Public IPv4 endpoint → mapped, port stripped.
    assert eps[_hash(_PUB_PUBLIC)] == "88.163.66.208"
    # mac_hash derivation matches the known pubkey→hash.
    assert _hash(_PUB_PUBLIC) == "ad32e736309b1348"
    # IPv6 endpoint → bracket + port stripped (global IPv6 kept).
    assert eps[_hash(_PUB_V6)] == "2606:4700:4700::1111"
    # `(none)` endpoint skipped.
    assert _hash(_PUB_NONE) not in eps
    # RFC1918 LAN endpoint skipped (no meaningful country).
    assert _hash(_PUB_LAN) not in eps


def test_wg_endpoints_besteffort_empty(monkeypatch):
    wg._ENDPOINTS_CACHE, wg._ENDPOINTS_TS = {}, 0.0

    def _boom(*a, **k):
        raise FileNotFoundError("wg not installed")

    monkeypatch.setattr(wg.subprocess, "run", _boom)
    assert wg.wg_endpoints() == {}


def test_clients_rich_uses_external_endpoint_flag(monkeypatch):
    wg_pub = "clientPubKeyEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE="
    wg_mac = _hash(wg_pub)
    rows = [
        # WG client: stored ip is internal 10.99.1.x, has a public endpoint.
        {"mac_hash": wg_mac, "ip": "10.99.1.7", "state": "active",
         "level": "r3", "score": 0, "last_seen": 100.0, "first_seen": 0.0},
        # Non-WG / captive client: no endpoint → falls back to stored ip.
        {"mac_hash": "captive01", "ip": "203.0.113.9", "state": "active",
         "level": "r1", "score": 0, "last_seen": 50.0, "first_seen": 0.0},
    ]
    monkeypatch.setattr(api.store, "list_clients", lambda: rows)
    monkeypatch.setattr(api.store, "latest_user_agent", lambda mh: "")

    # External endpoint for the WG client only. admin_clients_rich does a lazy
    # `from . import wg`, so patching the wg module attribute is what takes effect.
    import secubox_toolbox.wg as _wgmod
    monkeypatch.setattr(_wgmod, "wg_endpoints", lambda: {wg_mac: "88.163.66.208"})

    seen_keys = []

    def fake_lookup(key):
        seen_keys.append(key)
        if key == "88.163.66.208":
            return {"flag": "🇫🇷", "country_iso": "FR", "asn_org": "Orange"}
        if key == "203.0.113.9":
            return {"flag": "🇺🇸", "country_iso": "US", "asn_org": "Example"}
        return {"flag": "", "country_iso": "", "asn_org": ""}

    monkeypatch.setattr(api.geo, "lookup", fake_lookup)

    out = asyncio.run(api.admin_clients_rich())
    clients = {c["mac_hash"]: c for c in out["clients"]}

    # WG client: flag derived from the EXTERNAL IP, not the internal 10.99.1.7.
    assert clients[wg_mac]["flag"] == "🇫🇷"
    assert clients[wg_mac]["country_iso"] == "FR"
    assert "88.163.66.208" in seen_keys
    assert "10.99.1.7" not in seen_keys  # internal IP never geo-looked-up

    # Non-WG client: falls back to the stored ip.
    assert clients["captive01"]["flag"] == "🇺🇸"
    assert "203.0.113.9" in seen_keys

    # PRIVACY: the raw external IP must NOT appear anywhere in the response.
    import json
    dumped = json.dumps(out, default=str)
    assert "88.163.66.208" not in dumped
    # The stored (internal) ip is still the only ip field exposed.
    assert clients[wg_mac]["ip"] == "10.99.1.7"
|
||||
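The behaviour these tests pin down (skip the interface line, strip the port and IPv6 brackets, drop `(none)` and non-public endpoints, key by a truncated SHA-256 of the pubkey, then prefer the external endpoint over the stored IP for the geo lookup) can be sketched as follows. This is not the actual `wg.wg_endpoints` or `api.admin_clients_rich` code, which this diff does not show; `sketch_parse_endpoints`, `peer_key`, and `geo_key` are hypothetical names, and using `ipaddress.is_global` as the "public" test is an assumption.

```python
import hashlib
import ipaddress


def peer_key(pubkey: str) -> str:
    """Truncated SHA-256 of the pubkey, mirroring the test helper _hash."""
    return hashlib.sha256(pubkey.encode()).hexdigest()[:16]


def sketch_parse_endpoints(dump: str) -> dict:
    """Map hashed peer pubkey -> external endpoint IP (hypothetical sketch)."""
    endpoints = {}
    for line in dump.splitlines()[1:]:  # the first line describes the interface
        fields = line.split("\t")
        if len(fields) < 3 or fields[2] == "(none)":
            continue  # malformed line or peer without a known endpoint
        # Drop the trailing ":port", then the brackets around an IPv6 host.
        host = fields[2].rsplit(":", 1)[0].strip("[]")
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            continue
        if not ip.is_global:  # assumption: private/LAN endpoints carry no country
            continue
        endpoints[peer_key(fields[0])] = str(ip)
    return endpoints


def geo_key(mac_hash: str, stored_ip: str, endpoints: dict) -> str:
    """IP used for the geo lookup: the external endpoint if known, else the stored ip."""
    return endpoints.get(mac_hash, stored_ip)
```

The value returned by `geo_key` would only be passed to the geo lookup; as the privacy assertion in the test requires, it should never be copied into the API response itself.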