fix: the /embed and /detect post endpoints accept fi... in server.py #4

Open
orbisai0security wants to merge 13 commits into facex-engine:main from orbisai0security:fix-v-001-file-upload-validation

@orbisai0security

Summary

Fix a critical-severity security issue in docker/server.py.

Vulnerability

| Field | Value |
|---|---|
| ID | V-001 |
| Severity | CRITICAL |
| Scanner | multi_agent_ai |
| Rule | V-001 |
| File | docker/server.py:80 |

Description: The /embed and /detect POST endpoints accept file uploads without validating the file type (via magic bytes), enforcing a file size limit, or sanitizing the content before passing it to the underlying C/C++ image processing pipeline. The endpoints require no authentication, so any remote attacker can upload a crafted file designed to trigger memory corruption in the native code.
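
The checks named in the description can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `validate_upload`, `MAX_UPLOAD_BYTES`, the 10 MiB cap, and the choice of JPEG and PNG as accepted formats are all assumptions made for the example.

```python
# Hypothetical limit; the real value used in docker/server.py is not shown in this PR.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024

# Leading "magic bytes" that identify each accepted image format (assumed list).
MAGIC_SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def validate_upload(data: bytes) -> str:
    """Return the detected format name, or raise ValueError if the upload is rejected."""
    if len(data) == 0:
        raise ValueError("empty upload")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("upload exceeds size limit")
    # Trust the file's leading bytes, not the client-supplied name or Content-Type.
    for name, signature in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return name
    raise ValueError("unsupported file type")
```

A handler would call `validate_upload` on the request body and respond with a 4xx error on `ValueError` before any native code sees the bytes. In a real server the size cap should also be applied while the body is being read, so an oversized upload is never fully buffered.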

Changes

  • docker/server.py

Verification

  • Build passes
  • Scanner re-scan confirms fix
  • LLM code review passed

Automated security fix by OrbisAI Security

noreply and others added 13 commits May 13, 2026 18:03
…spoof

Recognition (MobileFaceNet)
- 4 size variants (nano / tiny / standard / xs) trained from scratch on
  MS1M-RefineV2 with ArcFace head using the numerically-stable
  angle-addition margin (no acos), bf16 autocast.
- LFW accuracy after YuNet 5-pt alignment: 95.6 / 96.9 / 98.3 / 99.1.
- BN folded into conv weights at export time; .bin format for the nn2
  C engine and ONNX export for the browser pipeline.
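
The "angle-addition margin (no acos)" mentioned above presumably refers to computing cos(θ + m) from cos θ directly through the identity cos(θ + m) = cos θ · cos m − sin θ · sin m. A minimal scalar sketch, assuming a margin of m = 0.5 (the common ArcFace default, not stated in this commit) and a hypothetical function name:

```python
import math


def arcface_margin_cos(cos_theta: float, m: float = 0.5) -> float:
    """Return cos(theta + m) given cos(theta), without calling acos.

    Assumes theta lies in [0, pi], so sin(theta) is non-negative.
    """
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    return cos_theta * math.cos(m) - sin_theta * math.sin(m)
```

Training code typically applies this only to the target-class logit and adds a fallback for θ + m > π; both are omitted here.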

Landmarks
- Own 98-point WFLW landmark ConvNet (1.15M params, MobileFaceNet-style
  backbone + Linear head).
- 478-point 3D landmark model distilled from MediaPipe FaceMesh with
  TPS-rendered supervision. Final error xy 0.54 px, z 0.51.
- WFLW->MP hybrid TPS template for pose-robust dense mesh rendering.

Face detector
- Own YuNet-style detector trained from scratch on WIDER FACE.
- ~100K params, FCOS-style anchor-free heads at strides 8/16/32,
  PReLU activations, GIoU bbox + focal cls loss + landmark L1 (when
  available). 320x320 input.
- Replaces OpenCV YuNet in the browser demo (facex_detect.onnx, 401 KB).
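
For reference, the GIoU term used in the bbox loss can be sketched for a single pair of axis-aligned boxes. The `(x1, y1, x2, y2)` box layout and the function name are assumptions for illustration; the training code itself is not part of this PR.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    if union <= 0.0 or enclose <= 0.0:
        return 0.0

    return inter / union - (enclose - union) / enclose
```

The corresponding loss is `1 - giou(pred, target)`, which stays informative even when the boxes do not overlap.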

Anti-spoof
- Integrated MiniFASNet ensemble (V2 @ scale 2.7 + V1SE @ scale 4.0)
  from MinivisionAI Silent-Face-Anti-Spoofing (Apache 2.0).
- Correct preprocessing: non-square crop bbox*scale, raw [0,255] BGR
  (their ToTensor has .div(255) commented out — critical detail).

nn2 (Pure C YOLO inference engine, MiniFASNet port added)
- New AVX-512 ops: PReLU (per-channel + per-element 1D), global avg
  pool, channel multiply (SE excitation), Linear FC, Softmax, in-place
  add for residuals.
- Full MiniFASNet forward pass: 1x1 expand -> 3x3 DW -> 1x1 project
  with PReLU, SE-block for V1SE (GAP -> FC down -> ReLU -> FC up ->
  sigmoid -> per-channel multiply), residual add.
- PyTorch -> .bin converter with BN-fold and BN1d->Linear-fold.
- 2.03x speedup vs ONNX Runtime on the V2+V1SE ensemble (CPU AVX-512).
- Byte-identical predictions to ONNX/PyTorch on sample images.
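
The BN-fold step in the converter can be illustrated with the standard per-channel formula: with scale = γ / sqrt(σ² + ε), the folded weight is w · scale and the folded bias is (b − μ) · scale + β. The sketch below uses plain Python lists and a hypothetical `fold_bn` helper; it is not the converter shipped in nn2.

```python
import math


def fold_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding conv/linear layer.

    weights: one flat list of weights per output channel.
    bias, gamma, beta, mean, var: one value per output channel.
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b
```

After folding, the layer followed by BatchNorm and the folded layer alone produce the same output at inference time, which removes one pass over the activations.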

Browser demo (wasm/demo_mesh.html)
- onnxruntime-web pipeline: detect -> 98pt landmark -> 478pt 3D mesh
  -> embedding -> anti-spoof, all in WebAssembly.
- Modes: hybrid (default), true 3D, dense TPS, scan, holo, detect-only,
  face ID, off.
- EMA smoothing on 98-pt landmarks (alpha=0.35 with 25 px snap), on
  anti-spoof P(live), and on face-ID cosine similarity for pose-tolerant
  matching.
- Face-ID uses tiny (1.8 MB, 512-dim) recognition with threshold 0.15.
- Anti-spoof uses wider crop with aspect-correct resize.
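
The landmark smoothing can be sketched as below. The commit only gives alpha = 0.35 and a 25 px snap, so the interpretation here is an assumption: alpha weights the new measurement, and a jump larger than the snap distance bypasses smoothing. The demo itself runs in the browser; Python is used only for illustration.

```python
import math


def smooth_point(prev, new, alpha=0.35, snap_px=25.0):
    """EMA-smooth one 2D landmark; snap to the new position on large jumps."""
    if prev is None:
        return new  # first frame: nothing to smooth against
    dist = math.hypot(new[0] - prev[0], new[1] - prev[1])
    if dist > snap_px:
        return new  # large motion: follow immediately instead of lagging
    return (
        prev[0] + alpha * (new[0] - prev[0]),
        prev[1] + alpha * (new[1] - prev[1]),
    )
```

Snapping keeps the mesh from trailing behind fast head movement, while the EMA suppresses frame-to-frame jitter when the face is nearly still.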

Model protection
- AES-256-GCM encryption of all ONNX weights, served as *.enc.
- WebCrypto API decrypts in browser at session start; inference stays
  100% client-side.
- 256-bit key split into two arrays, XORed at use, key buffer wiped
  after all sessions are loaded.
- Plain weights moved out of the served wasm/ directory.
- .gitignore strips *.onnx, *.bin, *.pt, *.enc, .model_key and all
  training data so nothing sensitive lands in the repo.
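
The key-splitting scheme described above can be sketched as follows. The function names are hypothetical and the demo implements this in browser JavaScript, so this Python version only illustrates the idea: store two arrays, XOR them when the key is needed, then zero the buffer.

```python
import secrets


def split_key(key: bytes):
    """Split a key into two arrays whose XOR reproduces it."""
    mask = secrets.token_bytes(len(key))
    masked = bytes(k ^ m for k, m in zip(key, mask))
    return mask, masked


def join_key(part_a: bytes, part_b: bytes) -> bytearray:
    """Recombine the two arrays into a mutable key buffer."""
    return bytearray(a ^ b for a, b in zip(part_a, part_b))


def wipe(buf: bytearray) -> None:
    """Overwrite the key buffer with zeros once it is no longer needed."""
    for i in range(len(buf)):
        buf[i] = 0
```

Usage follows the order in the commit message: call `join_key` right before decryption, load every session, then call `wipe` on the returned buffer.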

Training pipeline
- WIDER FACE downloader + dataset loader (RetinaFace 5-pt landmark
  format with WIDER official bbox-only fallback).
- Anti-spoof scene-composited dataset (real + print + screen attacks
  rendered as 112x112 scenes with face placed inside phone bezel /
  paper sheet, with hand silhouettes and synthetic backgrounds).
- Export scripts for every model variant; encrypt_models.py for
  AES-GCM encryption of all served weights.

Replaces the old detection-only demo with the complete browser pipeline:
face detector -> 98-point landmarks -> 478-point 3D mesh -> recognition
-> MiniFASNet anti-spoof. All models AES-256-GCM encrypted, decrypted by
WebCrypto in the browser. Inference stays 100% client-side.

- docs/demo/index.html: full demo (hybrid mesh, true-3D, dense TPS,
  scan, holo, detect-only, face-ID modes)
- docs/demo/*.enc: 7 encrypted ONNX models (~17 MB total)
- docs/demo/*.json: mesh topology + WFLW->MP hybrid template
- README rewritten around the demo and ownership of every model
- .gitignore whitelists docs/demo/*.enc so GitHub Pages can serve them

Demo is live on GitHub Pages:
  https://facex-engine.github.io/facex/demo/

Highlights now in the README:
- Every model in the demo is trained from scratch (4 recognition variants,
  98pt landmark, 478pt 3D mesh, custom face detector on WIDER FACE)
- AES-256-GCM weight encryption with WebCrypto in-browser decryption
- The whole surveillance pipeline is pure C, zero-deps, flashable to
  firmware: NexusDecode (184 KB, 46x FFmpeg), NexusEncode H.265, NXV
  surveillance video format (3x smaller than H.265), nn2 YOLO engine
  (1.5x ONNX), FaceX (148 KB native / 17 MB WASM)
- Whole stack <2 MB binary, runs 70 IP cameras on a single i5 core

README
- LFW badge 99.73% -> 99.07% (our xs, ours-trained, not EdgeFace)
- Drop the outdated 'EdgeFace-XS / ConvNeXt + XCA' architecture; document
  the real MobileFaceNet topology actually shipped
- 'Models' section listing all 4 recognition variants, our face detector,
  98-pt landmarks, 478-pt 3D mesh, and the third-party MiniFASNet
- Bench tables split: recognition + anti-spoof (with the nn2 2x speedup)
  + face detection (own, WIDER FACE)
- License: trained weights are Apache 2.0 (we own them); only MiniFASNet
  is upstream, also Apache
- FAQ: detection is now YES, encryption explained, ESP32/ARM roadmap
- Repo layout updated with training/, nn2/, docs/demo/, wasm/tools/
- Drop the bogus facex-sdk.js / 48KB-WASM browser snippet; show the
  WebCrypto -> ORT decryption path that actually runs

Demo
- FaceX logo in the header, with subtle glow
- Hybrid-mesh line width 0.04+0.11*z -> 0.18+0.42*z for stronger
  visual impact + slightly more alpha
- Logo also bundled into /docs/demo for GitHub Pages

… hook

- 'Where it runs' table: browser / x86 / Apple Silicon / ARM Linux / NXP iMX /
  ESP32-P4 / firmware, with status per target (shipping / in PR / draft)
- NexusDecode clarified as H.264 *and* H.265 (was reading like H.264 only)
- 'Why FaceID with FaceX instead of cloud APIs' table: monthly cost for
  100K MAU vs AWS Rekognition / Azure Face / Google Vision / FaceTec
- GDPR / HIPAA / 152-FZ / KZ-data-localization angle: frames never leave
  the device -> compliance by construction
- 'Where it's been deployed' note with direct contact hook

Automated security fix generated by Orbis Security AI