More hardening and migration from Drone to Woodpecker

This commit is contained in:
2026-02-01 19:11:32 -03:00
parent a0f148c3ef
commit 5035ed118d
12 changed files with 558 additions and 112 deletions

236
docs/SECURITY-AUDIT.md Normal file
View File

@@ -0,0 +1,236 @@
# Security Audit Report — Mail Autoconfig
**Audit date:** 2025-02-01
**Scope:** Full application and Docker deployment hardening; prevention of host/container compromise.
**Re-audit (Linode / past compromise):** 2025-02-01 — Additional hardening applied after reported server takeovers; dependency CVEs and bind-address issues addressed.
---
## Executive Summary
The application already has solid security measures (host validation, domain sanitization, rate limiting, non-root container, read-only filesystem). This audit identified **one critical deployment bug**, several **medium-priority** hardening gaps, and **low-priority** improvements. Addressing the critical and medium items will significantly reduce risk of abuse and container/host compromise.
---
## Critical
### C1. Docker Compose — Wrong Backend Port (Traefik)
**Location:** `docker-compose.yml`
**Issue:** Several Traefik service labels use `loadbalancer.server.port=808080` instead of `8080`. The app listens on `127.0.0.1:8080` (Dockerfile). With port `808080`, Traefik cannot reach the app; those hosts will get connection errors.
**Affected labels (port 808080):**
- mailconfig-mifi-holdings
- mailconfig-mifi-com-br
- mailconfig-mifi-dev
- mailconfig-mifi-ventures
- mailconfig-mifi-vix-br
- mailconfig-mifi-me
**Impact:** Service broken for those domains; no direct security exploit, but must be fixed for correct and consistent deployment.
**Remediation:** Change all `server.port=808080` to `server.port=8080` in `docker-compose.yml`.
---
## High
### H1. X-Forwarded-For Spoofing / Rate Limit Bypass
**Location:** `app.py``rate_limit()`
**Issue:** Client IP is taken from `X-Forwarded-For` (first value). If Traefik does not overwrite this header with the real client IP, any client can send arbitrary `X-Forwarded-For` values. That allows:
- Bypassing rate limits by rotating spoofed IPs.
- Concentrating rate-limit “cost” on a victim IP (abuse/DoS).
**Remediation:**
1. Prefer **one** trusted proxy header (e.g. `X-Real-IP` or the rightmost/last `X-Forwarded-For` after your proxy) and configure Traefik to set it from the real client.
2. If you must use `X-Forwarded-For`, ensure Traefik overwrites it (or appends the real client) and that the app only trusts one proxy hop (e.g. via Werkzeug `ProxyFix` with `x_for=1`), then use `request.remote_addr` for rate limiting so the app uses the proxys notion of client IP, not raw headers.
3. Document the chosen header and proxy configuration so future changes dont re-open spoofing.
### H2. Request Body Size When Content-Length Is Missing
**Location:** `app.py``validate_request_size()`
**Issue:** Only `request.content_length` is checked. If the client omits `Content-Length` (or uses chunked encoding), the check is skipped. A malicious client could send a very large body and increase memory/CPU usage (DoS).
**Remediation:**
- For POST (e.g. Autodiscover): require `Content-Length` and reject (e.g. 411 or 400) when missing, or enforce a hard body read limit in WSGI/Gunicorn.
- Optionally use a small `max_content_length` on the Flask app and/or Gunicorn body size limits so oversized bodies are rejected even when `Content-Length` is missing or wrong.
### H3. Jinja2 Autoescape May Not Apply to `.j2` Templates
**Location:** `app.py``jinja2.Environment(autoescape=jinja2.select_autoescape(['xml']))`
**Issue:** `select_autoescape(['xml'])` typically matches template **filename extension**. Templates are named `config-v1.1.xml.j2` and `Autodiscover.xml.j2`; the extension used by Jinja2 may be `.j2`, not `.xml`, so XML autoescape might not be enabled. Output is then safe only because `sanitize_domain()` restricts to `[a-zA-Z0-9.-]`.
**Remediation:** Rely on explicit escaping for defense in depth: use `autoescape=True` for this environment, or a custom predicate that returns `True` for these template names (e.g. names containing `xml` or ending in `.xml.j2`). Ensures XML-sensitive characters are escaped if sanitization is ever relaxed or bypassed.
---
## Medium
### M1. /ping Endpoint Does Not Validate Host
**Location:** `app.py``ping()`
**Issue:** `/ping` has no `@validate_host`. Any `Host` can be used. Impact is low (only reveals that the service is up), but it weakens defense in depth and allows probing behind the proxy.
**Remediation:** Add `@validate_host` to `/ping`, or restrict `/ping` to internal use (e.g. only when request comes from loopback or from Traefik health checks) and document that it must not be exposed to the internet.
### M2. No Explicit Capability Dropping in Container
**Location:** `docker-compose.yml`
**Issue:** `security_opt: no-new-privileges:true` is set, but capabilities are not explicitly dropped. Default Docker capabilities (e.g. NET_RAW, SETPCAP) are still available; dropping unneeded ones reduces attack surface.
**Remediation:** Add:
```yaml
cap_drop:
- ALL
```
If the process or healthcheck needs specific capabilities (unlikely for this app), add them back with `cap_add` only as needed.
### M3. Dependency Versions Not Pinned
**Location:** `Dockerfile``pip install Flask Jinja2 gunicorn`
**Issue:** No `requirements.txt` with versions. Builds can pull different versions over time, leading to supply-chain and compatibility risk and making security patches harder to track.
**Remediation:** Add `requirements.txt` with pinned versions (e.g. `Flask==3.x.x`, `Jinja2==3.x.x`, `gunicorn==21.x.x`), use `pip install -r requirements.txt` in the Dockerfile, and review/update versions in a controlled way.
### M4. Rate Limit Storage Unbounded Growth
**Location:** `app.py``defaultdict(deque)`
**Issue:** Old entries are removed per-IP when that IP makes a new request, but IPs that stop requesting are never cleaned. Under heavy scanning (many distinct IPs), memory can grow without bound.
**Remediation:** Use a TTL-based structure (e.g. Redis with expiry), or a background task / periodic cleanup that drops keys older than `RATE_LIMIT_WINDOW`. For in-memory, a single global cleanup on each request (e.g. prune keys whose last request is older than the window) limits growth.
---
## Low
### L1. Unused Import
**Location:** `app.py``import html`
**Issue:** `html` is never used. Minor hygiene and static-analysis noise.
**Remediation:** Remove `import html`.
### L2. Host Header in HTML Links
**Location:** `app.py``index()`
**Status:** `host` is taken from the request but only after `@validate_host`, so it is one of `ALLOWED_HOSTS`. Using it in links is safe. No change required; noted for completeness.
### L3. Flask Secret Key
**Location:** `app.py`
**Status:** No sessions or signed cookies are used. No `SECRET_KEY` is required for current functionality. If you add session or cookie-based features later, set a strong `app.secret_key` (from env, not hardcoded).
---
## Remediation Plan
| Priority | ID | Action | Status |
|----------|------|--------|--------|
| Critical | C1 | Fix `docker-compose.yml`: change all `server.port=808080` to `8080`. | **Done** |
| High | H1 | Use trusted proxy (ProxyFix) and rate-limit by `request.remote_addr`; document proxy config. | **Done** |
| High | H2 | Enforce body size: require Content-Length for POST; set `MAX_CONTENT_LENGTH`. | **Done** |
| High | H3 | Set Jinja2 `autoescape=True` so XML templates are always escaped. | **Done** |
| Medium | M1 | Add `@validate_host` to `/ping`; healthcheck must send allowed Host. | **Done** |
| Medium | M2 | Add `cap_drop: [ALL]` in `docker-compose.yml`. | **Done** |
| Medium | M3 | Add `requirements.txt` with pinned versions; use it in Dockerfile. | **Done** |
| Medium | M4 | Add periodic cleanup of stale rate-limit keys when storage > 1000. | **Done** |
| Low | L1 | Remove unused `import html`. | **Done** |
| Optional | O1 | Gunicorn `--limit-request-line` / `--limit-request-fields`. | **Done** |
| Optional | O2 | Add `Permissions-Policy` header. | **Done** |
**Deployment note (H1):** Ensure Traefik overwrites `X-Forwarded-For` with the real client IP (or prepends it). With `ProxyFix(x_for=1)`, the app trusts the leftmost value as the client IP; if the proxy does not set it, clients could spoof it.
---
## Already Strong
- **Host validation:** Strict allowlist and `@validate_host` on sensitive routes.
- **Domain sanitization:** `sanitize_domain()` restricts to `[a-zA-Z0-9.-]` and length; used before any template render.
- **Request size limits:** Per-route limits (with the Content-Length gap noted above).
- **Security headers:** X-Content-Type-Options, X-Frame-Options, CSP, etc.
- **Container:** Non-root user, bind to 127.0.0.1, read-only filesystem, tmpfs for /tmp, resource limits, `no-new-privileges`.
- **No user-controlled paths or commands:** No path traversal or command injection surfaces found.
Implementing the critical and high items, then the medium items, will bring the service to a hardened state suitable for production and minimize risk to the host the Docker container runs on.
---
## Verification (Current Codebase)
A follow-up review confirms the following in the current implementation:
| ID | Finding | Verified |
|----|---------|----------|
| C1 | All Traefik `loadbalancer.server.port` values are `8080` in `docker-compose.yml`. | ✅ |
| H1 | `ProxyFix(app.wsgi_app, x_for=1, x_proto=1)` and rate limit uses `request.remote_addr`. | ✅ |
| H2 | POST without `Content-Length` returns 411; `MAX_CONTENT_LENGTH = 4096` set. | ✅ |
| H3 | Jinja2 `Environment(autoescape=True)`. | ✅ |
| M1 | `/ping` has `@validate_host`. | ✅ |
| M2 | `cap_drop: [ALL]` in `docker-compose.yml`. | ✅ |
| M3 | `requirements.txt` with pinned Flask, Jinja2, gunicorn; Dockerfile uses `pip install -r requirements.txt`. | ✅ |
| M4 | `_prune_stale_rate_limits()` when storage > 1000; called on each request. | ✅ |
| L1 | No `import html` in `app.py`. | ✅ |
No path traversal, command execution, `send_file`, or user-controlled file/path usage. Autodiscover POST body is never read or parsed, so malicious XML does not reach the app. Domain and Host are either allowlisted or sanitized before use in templates or HTML.
---
## Container Escape / Host Compromise Resistance
The deployment is structured to limit impact of a compromised app or container:
1. **Non-root**: Process runs as `appuser`; no root inside container.
2. **Capabilities**: `cap_drop: ALL`; no `CAP_SYS_ADMIN`, `CAP_NET_RAW`, etc., reducing kernel-level escape options.
3. **Privilege escalation**: `no-new-privileges:true` prevents gaining new privileges via setuid or similar.
4. **Filesystem**: `read_only: true` with `tmpfs` only for `/tmp`; no persistent writable host paths; no volume mounts from host.
5. **Network**: App binds to `127.0.0.1:8080`; only Traefik on the same Docker network can reach it; no `network_mode: host`.
6. **Resources**: Memory and CPU limits reduce resource-exhaustion and some DoS impact.
7. **No host access**: No Docker socket, no host PID/filesystem mounts, no `privileged` mode.
An attacker who gains code execution in the app can only affect the container and its tmpfs; they cannot access the host filesystem, other containers, or the host network without a separate host or orchestration vulnerability.
---
## Additional Optional Hardening
| ID | Suggestion | Priority | Status |
|----|------------|----------|--------|
| O1 | **Gunicorn request limits** | Low | **Done**`--limit-request-line 4094` and `--limit-request-fields 100` in Dockerfile CMD. |
| O2 | **Permissions-Policy header** | Low | **Done** — Added in `add_security_headers()`. |
| O3 | **Dependency updates** | Operational | Run `pip-audit` or Dependabot periodically; refresh pinned versions in `requirements.txt` after testing. |
O1 and O2 are implemented; O3 is an ongoing operational practice.
---
## Linode / Past Compromise — Additional Hardening (2025-02-01)
After reported takeovers of the Linode server, a second pass identified and fixed the following.
### Critical Fixes Applied
| ID | Issue | Fix |
|----|--------|-----|
| **L1** | **Bind 127.0.0.1 made app unreachable by Traefik** | Traefik runs in a *separate* container. Binding to `127.0.0.1:8080` inside this container means only processes *inside this container* can connect; Traefik cannot. Service was either broken or was previously bound to `0.0.0.0`. **Fix:** Bind to `0.0.0.0:8080`. Exposure is limited to the Docker network only (Traefik and any other container on that network). The host and internet cannot connect directly to this port. |
| **L2** | **Jinja2 3.1.4 had known CVEs** | CVE-2024-56201 (sandbox breakout), CVE-2024-56326, CVE-2025-27516 (|attr filter). Even though this app only passes sanitized `DOMAIN` and fixed template names, defense in depth requires a patched library. **Fix:** Upgraded to Jinja2 3.1.6 in `requirements.txt`. |
| **L3** | **No HSTS** | Browsers could be downgraded to HTTP without a strict directive. **Fix:** Added `Strict-Transport-Security: max-age=31536000; includeSubDomains; preload` in `add_security_headers()`. |
| **L4** | **Unsafe HTTP methods not rejected** | TRACE, CONNECT, etc. could be used for probing or abuse. **Fix:** `@app.before_request` rejects any method not in `GET, POST, HEAD, OPTIONS` with 405. |
| **L5** | **Default error pages could leak info** | Flasks default 404/500 might expose stack traces or paths in some configs. **Fix:** Custom `@app.errorhandler(404|500|413)` returning minimal JSON only. |
| **L6** | **Explicit DEBUG/TESTING off** | Ensures debug mode is never enabled by env or mistake. **Fix:** `app.config['DEBUG'] = False` and `app.config['TESTING'] = False`. |
| **L7** | **Gunicorn worker temp dir** | With read-only rootfs, workers need a writable temp dir. **Fix:** `--worker-tmp-dir /tmp` in Dockerfile CMD (tmpfs). |
| **L8** | **Network isolation documentation** | Untrusted containers on the same Docker network could reach this service. **Fix:** Comment in `docker-compose.yml`: only attach trusted containers to the `traefik` network. |
### If Your Linode Was Compromised — Checklist
1. **Assume full compromise:** Rotate all secrets (SSH keys, API keys, DB passwords, Traefik certs, any env vars). Revoke and reissue.
2. **Reimage the host** if you cannot guarantee persistence was removed (kernel modules, cron, authorized_keys, other backdoors).
3. **Audit every service on the host:** SSH (disable password auth, use keys only), Traefik, Docker socket access, any other containers or processes. Ensure no container runs `privileged` or mounts Docker socket unless strictly required.
4. **Firewall:** On the Linode, allow only required ports (e.g. 22, 80, 443). Block direct access to Docker daemon and to backend ports from the internet.
5. **Docker network:** Only attach trusted containers (e.g. Traefik) to the `traefik` network. Do not run untrusted or third-party containers on that network.
6. **Dependencies:** Run `pip-audit` (or equivalent) in CI and after any `requirements.txt` change. Rebuild and redeploy this image after upgrading Jinja2 and any other pinned deps.
7. **This image:** Rebuild from current Dockerfile and `requirements.txt` (Jinja2 3.1.6, bind 0.0.0.0, all hardening above). Do not reuse old images that may have had vulnerable Jinja2 or different config.
8. **Traefik:** Ensure it overwrites/sets `X-Forwarded-For` (or equivalent) with the real client IP so this apps rate limiting and logging see the correct IP.
Implementing the critical and high items, then the medium items, and the Linode-specific fixes above, brings the service to a fully hardened state and reduces risk of host takeover from this container.