Compare commits

3 Commits

Author SHA1 Message Date
CyberMind
e0cd433485
Merge pull request #671 from CyberMind-FR/fix/662-gzip-banner
fix(#662): inject banner into compressed HTML (gzip decode/re-encode)
2026-06-18 19:38:41 +02:00
8ffe54ee0d chore: changelog 0.1.3 — gzip banner inject (ref #662) 2026-06-18 19:37:43 +02:00
449b28f8a1 fix(toolbox-ng): inject banner into gzip HTML, not just identity (ref #662)
The Go MITM engine's transparency banner only appeared on UNCOMPRESSED
HTML. Browsers send `Accept-Encoding: gzip, br`, so most pages came back
gzip/brotli-compressed; the engine passed the compressed body straight
through and injectLoader (which scans for <head>/<body>) silently no-oped
on the binary blob. Proven on-board: identity HTML → banner present;
gzip HTML → banner absent.
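To make the root cause concrete, here is a minimal sketch of why a plain tag scan finds nothing in a gzip body. The helper names `gzipOf` and `hasHeadTag` are illustrative assumptions, not part of the engine; `hasHeadTag` only stands in for the kind of byte search injectLoader is described as doing.

```go
package main

import (
	"bytes"
	"compress/gzip"
)

// gzipOf compresses p in memory with the default gzip level.
// (Hypothetical helper, named for this sketch only.)
func gzipOf(p []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	_, _ = zw.Write(p)
	_ = zw.Close()
	return buf.Bytes()
}

// hasHeadTag reports whether b contains a literal "<head>" tag, which is
// the kind of plaintext marker a tag scanner looks for.
func hasHeadTag(b []byte) bool {
	return bytes.Contains(b, []byte("<head>"))
}
```

For a small HTML page, `hasHeadTag` is expected to be true on the plain bytes and false on `gzipOf` of the same bytes, because the Huffman-coded stream no longer carries the tag as literal text.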

Two-part fix, stdlib-only (compress/gzip; brotli/zstd are not in the
stdlib, which is why we constrain the wire to gzip):

1. mitmPipeline now pins the upstream request to `Accept-Encoding: gzip`
   (Set, not Del — Del would make Go's Transport auto-decompress and lose
   wire compression to the client for ALL resources). This guarantees
   every response is gzip or identity. Applies to both CONNECT and
   transparent paths (shared pipeline).

2. New gzip.go inject helper: in the existing 2xx + text/html gate,
   injectIntoBody gunzips → injectLoader → re-gzips when Content-Encoding
   is gzip (keeping the client transfer compressed), injects directly on
   identity, and fails open (original bytes untouched) on corrupt/unknown
   encoding or a decompression bomb (32MiB inflate cap). Content-Length /
   resp.ContentLength are updated to match the served bytes so the grown
   body is not truncated.
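The Set-versus-Del distinction in step 1 can be observed with a small experiment. This is a sketch under assumptions: `newGzipUpstream` and `fetch` are hypothetical helpers, and the local test server stands in for a real upstream. It relies on the documented `net/http` Transport behaviour that gzip is decoded transparently only when the Transport added `Accept-Encoding` itself.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
	"net/http"
	"net/http/httptest"
)

// newGzipUpstream starts a local stand-in for an upstream site that always
// answers with a gzip-compressed HTML body. (Hypothetical test server.)
func newGzipUpstream(page []byte) *httptest.Server {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	_, _ = zw.Write(page)
	_ = zw.Close()
	compressed := buf.Bytes()

	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/html")
		w.Header().Set("Content-Encoding", "gzip")
		_, _ = w.Write(compressed)
	}))
}

// fetch performs one GET. A non-empty acceptEncoding is set explicitly on the
// request (the "Set" case); an empty one leaves the header absent (the "Del"
// case). It returns the body as the Transport hands it over and the
// Content-Encoding header visible to the caller.
func fetch(url, acceptEncoding string) (body []byte, contentEncoding string, err error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, "", err
	}
	if acceptEncoding != "" {
		req.Header.Set("Accept-Encoding", acceptEncoding)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, "", err
	}
	defer resp.Body.Close()
	body, err = io.ReadAll(resp.Body)
	return body, resp.Header.Get("Content-Encoding"), err
}
```

With `fetch(srv.URL, "gzip")` the caller should see the raw gzip bytes and `Content-Encoding: gzip`; with `fetch(srv.URL, "")` the Transport should hand back the decoded page and strip the header, which is the loss of wire compression the commit message warns about.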

Non-HTML / non-2xx responses still pass through byte-for-byte (possibly
still gzip). Poison Set-Cookie + anonymize unchanged. Idempotency guard
stays inside injectLoader.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 19:37:03 +02:00
4 changed files with 298 additions and 1 deletions


@@ -0,0 +1,109 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: gzip-aware banner injection (#662)
//
// The transparency-banner inject (injectLoader) scans the HTML body for
// <head>/<body>. Browsers send `Accept-Encoding: gzip, br`, so most upstream
// responses come back COMPRESSED — and a compressed body has no plaintext
// <head>/<body> for injectLoader to find, so it silently no-ops (the banner
// vanished on every gzip page). mitmPipeline now pins the upstream request to
// `Accept-Encoding: gzip` (dropping br/zstd/deflate we cannot decode with the
// stdlib), so every response is either gzip or identity.
//
// This file holds the gzip helpers + the single inject-path transform that
// decompresses (if gzip) → injectLoader → recompresses, fail-open on any error
// so a banner asset never breaks the page.
//
// Pure standard library — compress/gzip only; no external modules (brotli/zstd
// are NOT in the stdlib, which is exactly why we constrain the wire to gzip).
package main

import (
	"bytes"
	"compress/gzip"
	"io"
	"strings"
)

// gunzipCap bounds the decompressed output so a maliciously-crafted gzip body
// (a "decompression bomb") cannot blow the worker's memory. The upstream body
// itself is already read under an 8MiB LimitReader; 32MiB of inflated HTML is a
// generous ceiling for a single page. Exceeding it → treated as an error
// (caller fails open and serves the original compressed bytes).
const gunzipCap = 32 << 20
// gunzipBytes inflates a gzip-compressed body. It is defensive on two axes:
// - a malformed/non-gzip input returns an error (caller fails open),
// - the decompressed output is capped at gunzipCap; if the stream would
// exceed it, that is reported as an error too (decompression-bomb guard).
func gunzipBytes(in []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(in))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	// Read up to gunzipCap+1 so we can tell "exactly at the cap" (fine) from
	// "the stream is bigger than the cap" (bomb → error).
	out, err := io.ReadAll(io.LimitReader(zr, gunzipCap+1))
	if err != nil {
		return nil, err
	}
	if len(out) > gunzipCap {
		return nil, errGunzipTooLarge
	}
	return out, nil
}

// errGunzipTooLarge is returned by gunzipBytes when the decompressed stream
// exceeds gunzipCap (decompression-bomb guard).
var errGunzipTooLarge = errString("gunzip output exceeds cap")
// errString is a tiny stdlib-only error type (avoids importing errors/fmt for
// one sentinel).
type errString string
func (e errString) Error() string { return string(e) }
// gzipBytes compresses in with the default gzip level. It never errors: the
// gzip.Writer only writes into an in-memory bytes.Buffer, which cannot fail.
func gzipBytes(in []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	_, _ = zw.Write(in)
	_ = zw.Close()
	return buf.Bytes()
}

// injectIntoBody runs the transparency-banner injection over a (possibly
// gzip-compressed) HTML body, returning the new body bytes to serve and whether
// the body was rewritten.
//
// - encoding == "" (identity): injectLoader runs directly on body; the result
// is returned (ok=true). The caller MUST update Content-Length to len(out).
// - encoding == "gzip" (case-insensitive): the body is gunzipped, injected,
// then RE-gzipped so the client transfer stays compressed (the tunnel is
// perf-sensitive). The caller keeps Content-Encoding: gzip and sets
// Content-Length to len(out).
// - any other encoding (br/zstd/deflate — should not occur after the upstream
// Accept-Encoding pin, but be safe): pass through untouched, ok=false.
//
// Fail-open: if gunzip fails (corrupt / not-actually-gzip / bomb), the ORIGINAL
// bytes are returned with ok=false so the page is never broken.
//
// idempotency / placement live entirely inside injectLoader (unchanged).
func injectIntoBody(body []byte, encoding, clientHash string, wg bool) (out []byte, ok bool) {
	switch strings.ToLower(strings.TrimSpace(encoding)) {
	case "":
		return injectLoader(body, clientHash, wg), true
	case "gzip":
		plain, err := gunzipBytes(body)
		if err != nil {
			return body, false // fail open: serve the original compressed bytes
		}
		injected := injectLoader(plain, clientHash, wg)
		return gzipBytes(injected), true
	default:
		return body, false // unknown encoding we cannot decode → pass through
	}
}
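The cap+1 read used by gunzipBytes is easier to see with a small, adjustable limit. The following is a sketch of the same technique, not the engine's code: `inflateCapped`, `deflateAll` and `errTooLarge` are hypothetical names, and the limit is a parameter only so the boundary can be exercised cheaply.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"io"
)

// errTooLarge is a sentinel for this sketch; the real file uses its own type.
var errTooLarge = errors.New("inflated output exceeds limit")

// inflateCapped decodes a gzip stream but refuses to return more than limit
// bytes. Reading limit+1 bytes distinguishes "exactly at the limit" (allowed)
// from "larger than the limit" (rejected).
func inflateCapped(in []byte, limit int64) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(in))
	if err != nil {
		return nil, err // not a gzip stream
	}
	defer zr.Close()

	out, err := io.ReadAll(io.LimitReader(zr, limit+1))
	if err != nil {
		return nil, err // corrupt or truncated stream
	}
	if int64(len(out)) > limit {
		return nil, errTooLarge
	}
	return out, nil
}

// deflateAll gzips p in memory so the sketch can build its own inputs.
func deflateAll(p []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	_, _ = zw.Write(p)
	_ = zw.Close()
	return buf.Bytes()
}
```

With a 100-byte payload, `inflateCapped(z, 100)` should succeed and `inflateCapped(z, 99)` should return `errTooLarge`. Memory use stays bounded by the limit rather than by whatever size the compressed stream claims to inflate to.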


@@ -0,0 +1,152 @@
// SPDX-License-Identifier: LicenseRef-CMSD-1.0
// Copyright (c) 2026 CyberMind — Gérald Kerma <devel@cybermind.fr>
//
// SecuBox-Deb :: toolbox-ng :: gzip-aware banner injection tests (#662)
//
// Covers the LIVE bug: the banner only injected into UNCOMPRESSED HTML, so
// gzip pages (the common case — browsers send Accept-Encoding: gzip,br) lost
// the banner. These tests pin the decompress→inject→recompress transform and
// its fail-open behaviour.
package main

import (
	"bytes"
	"strings"
	"testing"
)

func TestGzipRoundTrip(t *testing.T) {
	cases := [][]byte{
		[]byte(""),
		[]byte("hello world"),
		[]byte(`<html><head><title>x</title></head><body>hi</body></html>`),
		bytes.Repeat([]byte("AB"), 100000), // larger, compressible payload
	}
	for _, x := range cases {
		got, err := gunzipBytes(gzipBytes(x))
		if err != nil {
			t.Fatalf("gunzipBytes(gzipBytes(%d bytes)) errored: %v", len(x), err)
		}
		if !bytes.Equal(got, x) {
			t.Fatalf("round-trip mismatch: got %d bytes, want %d bytes", len(got), len(x))
		}
	}
}

func TestGunzipNonGzipFails(t *testing.T) {
	// Plain bytes that are not a gzip stream → error, no panic.
	if _, err := gunzipBytes([]byte("this is definitely not gzip")); err == nil {
		t.Fatal("gunzipBytes on non-gzip input must error")
	}
}

func TestInjectIntoBodyGzip(t *testing.T) {
	// End-to-end-ish: HTML with <head>, gzipped, run through the exact transform
	// the inject path uses. Result must gunzip back to an injected, intact doc.
	html := `<html><head><title>page</title></head><body>content</body></html>`
	out, ok := injectIntoBody(gzipBytes([]byte(html)), "gzip", "abc123", true)
	if !ok {
		t.Fatal("gzip inject must report ok=true")
	}
	plain, err := gunzipBytes(out)
	if err != nil {
		t.Fatalf("re-gzipped output must gunzip cleanly: %v", err)
	}
	s := string(plain)
	if !strings.Contains(s, bannerGuard) {
		t.Fatalf("banner guard %q absent after gzip inject:\n%s", bannerGuard, s)
	}
	// Document otherwise intact: original head/body content preserved.
	if !strings.Contains(s, "<title>page</title>") || !strings.Contains(s, "<body>content</body>") {
		t.Fatalf("original document content displaced:\n%s", s)
	}
	// The loader tag landed inside <head>.
	if !strings.Contains(s, `<head><!-- `+bannerGuard) {
		t.Fatalf("loader tag not inserted right after <head>:\n%s", s)
	}
}

func TestInjectIntoBodyGzipCaseInsensitiveEncoding(t *testing.T) {
	html := `<head></head>`
	out, ok := injectIntoBody(gzipBytes([]byte(html)), "GZIP", "z", false)
	if !ok {
		t.Fatal("Content-Encoding GZIP (upper) must be recognised → ok=true")
	}
	plain, err := gunzipBytes(out)
	if err != nil {
		t.Fatalf("gunzip failed: %v", err)
	}
	if !strings.Contains(string(plain), bannerGuard) {
		t.Fatalf("banner absent for upper-case GZIP encoding: %s", plain)
	}
}

func TestInjectIntoBodyGzipFailOpen(t *testing.T) {
	// Bytes labelled gzip but NOT gzip → fail open: original bytes, ok=false,
	// no panic.
	bad := []byte("not gzip at all <head></head>")
	out, ok := injectIntoBody(bad, "gzip", "x", false)
	if ok {
		t.Fatal("corrupt gzip body must fail open (ok=false)")
	}
	if !bytes.Equal(out, bad) {
		t.Fatalf("fail-open must return the ORIGINAL bytes untouched")
	}
}

func TestInjectIntoBodyIdentity(t *testing.T) {
	// Identity (empty Content-Encoding): inject directly, grown body returned.
	html := []byte(`<html><head></head><body>hi</body></html>`)
	out, ok := injectIntoBody(html, "", "deadbeef", false)
	if !ok {
		t.Fatal("identity inject must report ok=true")
	}
	if !bytes.Contains(out, []byte(bannerGuard)) {
		t.Fatalf("banner absent on identity inject: %s", out)
	}
	if len(out) <= len(html) {
		t.Fatalf("identity inject must GROW the body: got %d, was %d", len(out), len(html))
	}
}

func TestInjectIntoBodyUnknownEncodingPassthrough(t *testing.T) {
	// br/zstd/deflate (shouldn't occur after the Accept-Encoding pin) → untouched.
	body := []byte("\x1f\x8b some br-ish bytes")
	out, ok := injectIntoBody(body, "br", "x", false)
	if ok {
		t.Fatal("unknown encoding must pass through (ok=false)")
	}
	if !bytes.Equal(out, body) {
		t.Fatalf("unknown-encoding passthrough must be byte-for-byte")
	}
}

func TestGunzipBombGuard(t *testing.T) {
	// A body that inflates beyond gunzipCap must be rejected (not OOM the worker).
	// gzip of >32MiB of zeros compresses to a small blob but inflates past the
	// cap → gunzipBytes returns an error → inject path fails open.
	big := gzipBytes(make([]byte, gunzipCap+1024))
	if _, err := gunzipBytes(big); err == nil {
		t.Fatal("gunzipBytes must reject output exceeding gunzipCap")
	}
	// And via the inject path: fail open, original bytes preserved.
	out, ok := injectIntoBody(big, "gzip", "x", false)
	if ok {
		t.Fatal("over-cap gzip body must fail open through injectIntoBody")
	}
	if !bytes.Equal(out, big) {
		t.Fatal("over-cap fail-open must return the original compressed bytes")
	}
}

func TestGunzipExactlyAtCap(t *testing.T) {
	// A body that inflates to EXACTLY gunzipCap is allowed (boundary).
	payload := make([]byte, gunzipCap)
	got, err := gunzipBytes(gzipBytes(payload))
	if err != nil {
		t.Fatalf("exactly-at-cap payload must be allowed: %v", err)
	}
	if len(got) != gunzipCap {
		t.Fatalf("at-cap length mismatch: got %d, want %d", len(got), gunzipCap)
	}
}
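The corrupt-input tests above use bytes that were never gzip. A related shape, a stream cut short in transit, can be sketched the same way. This is an illustrative sketch only: `compress`, `inflate` and `truncate` are hypothetical stand-ins rather than the helpers under test, and it only assumes that `compress/gzip` reports an error when a stream ends early.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
)

// compress gzips p in memory (stand-in for a gzipBytes-style helper).
func compress(p []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	_, _ = zw.Write(p)
	_ = zw.Close()
	return buf.Bytes()
}

// inflate fully decodes a gzip stream and returns the first error met
// (stand-in for a gunzipBytes-style helper, without the size cap).
func inflate(in []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(in))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// truncate returns the first half of a compressed stream, simulating an
// upstream body that was cut short before the gzip trailer arrived.
func truncate(z []byte) []byte {
	return z[:len(z)/2]
}
```

A full stream should round-trip through `inflate`, while `inflate(truncate(z))` should return a non-nil error, which is the condition a fail-open caller would use to serve the original bytes instead.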


@@ -37,6 +37,7 @@ import (
 	"net"
 	"net/http"
 	"os"
+	"strconv"
 	"strings"
 	"sync"
 	"time"
@@ -324,6 +325,16 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
 	clientHash := clientHashFromConn(rawClient) // mac_hash-aware (WG persona)
 	anonymizeRequest(req.Header)
 
+	// #662 — pin the upstream Accept-Encoding to gzip (overwrite, dropping
+	// br/zstd/deflate we cannot decode with the stdlib). This guarantees every
+	// response is either gzip or identity, so the inject path can reliably
+	// gunzip→inject→re-gzip the HTML. We Set (not Del): Del would make Go's
+	// Transport auto-decompress and re-serve identity, losing wire compression
+	// to the client for ALL resources (incl. non-injected ones). Set keeps the
+	// Transport in pass-through mode so non-HTML bodies stay compressed
+	// end-to-end. Browsers always accept gzip, so relaying gzip back is safe.
+	req.Header.Set("Accept-Encoding", "gzip")
+
 	// proxy upstream, inject into HTML bodies.
 	up := &http.Client{Timeout: 30 * time.Second}
 	if dialHost != "" {
@@ -356,9 +367,25 @@ func (px *Proxy) mitmPipeline(tconn *tls.Conn, rawClient net.Conn, host, verdict
 	// Inject the transparency-banner loader only on 2xx text/html responses
 	// (mirrors the Python addon, which skips non-200). The loader's same-origin
 	// <script src="/__toolbox/loader.js"> is served by the short-circuit above.
+	//
+	// #662 — the body may be gzip-compressed (we pinned Accept-Encoding: gzip
+	// upstream). injectIntoBody gunzips→injects→re-gzips when Content-Encoding
+	// is gzip, injects directly when identity, and fails open (untouched) on a
+	// corrupt/unknown encoding. Only on a successful rewrite do we update the
+	// framing: writeResponse emits Content-Length from len(body), but a stale
+	// resp.ContentLength / Content-Encoding could mislead downstream — so we
+	// keep them consistent with the bytes we actually serve.
 	if resp.StatusCode >= 200 && resp.StatusCode < 300 &&
 		strings.Contains(resp.Header.Get("Content-Type"), "text/html") {
-		body = injectLoader(body, clientHash, wg)
+		if out, ok := injectIntoBody(body, resp.Header.Get("Content-Encoding"), clientHash, wg); ok {
+			body = out
+			// Keep the response framing consistent with the served bytes. The
+			// encoding is unchanged (gzip stays gzip, identity stays identity);
+			// only the length changed because injection grew the body. A stale
+			// Content-Length would truncate/corrupt the response.
+			resp.Header.Set("Content-Length", strconv.Itoa(len(body)))
+			resp.ContentLength = int64(len(body))
+		}
 	}
 	writeResponse(tconn, resp, body)
 }
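The framing update in this hunk touches two places where `net/http` records a body length. As a minimal sketch (the helper names `syncLength` and `staleThenSynced` are hypothetical and not part of the patch), keeping both in step looks like this:

```go
package main

import (
	"net/http"
	"strconv"
)

// syncLength records len(body) in both places net/http keeps a body length:
// the Content-Length header and the Response.ContentLength field.
func syncLength(resp *http.Response, body []byte) {
	if resp.Header == nil {
		resp.Header = make(http.Header)
	}
	resp.Header.Set("Content-Length", strconv.Itoa(len(body)))
	resp.ContentLength = int64(len(body))
}

// staleThenSynced builds a response whose framing still describes an
// oldLen-byte body, applies syncLength for a newLen-byte body, and returns
// the header value and the field afterwards.
func staleThenSynced(oldLen, newLen int) (header string, field int64) {
	resp := &http.Response{
		Header:        http.Header{"Content-Length": []string{strconv.Itoa(oldLen)}},
		ContentLength: int64(oldLen),
	}
	syncLength(resp, make([]byte, newLen))
	return resp.Header.Get("Content-Length"), resp.ContentLength
}
```

After `staleThenSynced(10, 25)` both values should describe 25 bytes. Updating only one of them would leave the response describing two different lengths, which is the inconsistency the comment in the hunk guards against.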


@@ -1,3 +1,12 @@
+secubox-toolbox-ng (0.1.3-1~bookworm1) bookworm; urgency=medium
+
+  * banner: inject into COMPRESSED HTML too. Pin upstream Accept-Encoding to gzip
+    (stdlib can't brotli), and in the inject path gunzip → injectLoader → re-gzip
+    (32MiB inflate cap, fail-open on corrupt). Fixes missing banner on the common
+    gzip/br case; non-HTML passes through untouched. (ref #662)
+
+ -- Gerald KERMA <devel@cybermind.fr>  Wed, 18 Jun 2026 19:45:00 +0000
+
 secubox-toolbox-ng (0.1.2-1~bookworm1) bookworm; urgency=medium
 
   * banner: port the real transparency-banner inject — inject the loader