Epstein Unredactor
Technical Documentation for the Epstein PDF Analysis Tool
Redaction Detection
OpenCV-based row scanning detects precise redaction boundaries in 816x1056px scanned PDF page images.
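Row scanning of this kind can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: it uses plain Python lists instead of OpenCV so it runs without dependencies, and the function names, the darkness threshold, and the minimum box size are all assumptions. In a real pipeline the grayscale rows would more likely come from something like `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), hypothetical layout
Span = Tuple[int, int]           # (start column, end column), end exclusive


def find_dark_runs(row: Sequence[int], threshold: int = 40,
                   min_width: int = 20) -> List[Span]:
    """Return column spans where consecutive pixels are darker than threshold."""
    runs: List[Span] = []
    start = None
    for x, value in enumerate(row):
        if value < threshold:
            if start is None:
                start = x  # a dark run begins here
        elif start is not None:
            if x - start >= min_width:
                runs.append((start, x))
            start = None
    # Handle a run that reaches the right edge of the image.
    if start is not None and len(row) - start >= min_width:
        runs.append((start, len(row)))
    return runs


def detect_redaction_boxes(image: Sequence[Sequence[int]], threshold: int = 40,
                           min_width: int = 20, min_height: int = 5) -> List[Box]:
    """Scan rows top to bottom and merge identical dark runs into boxes."""
    open_boxes: Dict[Span, int] = {}  # span -> first row where it appeared
    boxes: List[Box] = []
    for y, row in enumerate(image):
        runs = set(find_dark_runs(row, threshold, min_width))
        # Close every open box whose span does not continue on this row.
        for span in list(open_boxes):
            if span not in runs:
                y0 = open_boxes.pop(span)
                if y - y0 >= min_height:
                    boxes.append((span[0], y0, span[1] - span[0], y - y0))
        # Open a box for every span seen for the first time.
        for span in runs:
            open_boxes.setdefault(span, y)
    # Close boxes that reach the bottom edge of the image.
    height = len(image)
    for span, y0 in open_boxes.items():
        if height - y0 >= min_height:
            boxes.append((span[0], y0, span[1] - span[0], height - y0))
    return sorted(boxes, key=lambda b: (b[1], b[0]))
```

The sketch requires a run to have exactly the same left and right edges on consecutive rows, which is a simplification. Scanned pages contain noise and slight skew, so a practical detector would probably tolerate a few pixels of drift between rows.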
Width Matching
High-precision HarfBuzz text shaping measures candidate names against detected pixel widths.
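The matching step can be illustrated with a small sketch. It does not call HarfBuzz: a hypothetical per-character advance table stands in for the shaper, so kerning and ligatures are ignored. The function names, the `tolerance` parameter, and the advance values are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def measure_width(text: str, advances: Dict[str, float],
                  default_advance: float = 8.0) -> float:
    """Sum per-character advances (a stand-in for real text shaping)."""
    return sum(advances.get(ch, default_advance) for ch in text)


def rank_candidates(candidates: Sequence[str], target_width: float,
                    advances: Dict[str, float],
                    tolerance: float = 1.5) -> List[Tuple[str, float]]:
    """Keep candidates whose width is within tolerance, closest match first."""
    matches: List[Tuple[str, float]] = []
    for name in candidates:
        error = abs(measure_width(name, advances) - target_width)
        if error <= tolerance:
            matches.append((name, error))
    return sorted(matches, key=lambda match: match[1])
```

A width match only narrows the candidate list. Several names can share nearly the same rendered width, so the result is a set of plausible fits rather than an identification.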
WebGL Visualization
GPU-accelerated masks provide real-time interactive overlays for visual verification.
Epstein Unredactor Documentation
Welcome to the technical documentation for the Epstein Unredactor, a tool for analyzing redacted documents. This guide covers its internal logic, architecture, and deployment.
Core Concepts
The tool operates on a "Core + Plugin" architecture:
- Core: Handles PDF parsing, image extraction, and basic redaction box detection.
- Plugins: Optional features like WebGL masking and typography tools are isolated into independent Django apps for modularity.
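In Django terms, this split could be expressed in the settings module by keeping the core apps fixed and appending plugin apps on demand. The sketch below is an assumption about how such a layout might look; the app labels `core`, `plugins.webgl_mask`, and `plugins.typography` and the helper `build_installed_apps` are hypothetical names, not taken from the project.

```python
from typing import Dict, List, Sequence

# Always-on apps: Django built-ins plus a hypothetical "core" app.
CORE_APPS: List[str] = [
    "django.contrib.contenttypes",
    "django.contrib.staticfiles",
    "core",
]

# Optional plugins, keyed by a short name (all labels are hypothetical).
PLUGIN_APPS: Dict[str, str] = {
    "webgl_mask": "plugins.webgl_mask",
    "typography": "plugins.typography",
}


def build_installed_apps(enabled_plugins: Sequence[str]) -> List[str]:
    """Return core apps followed by the requested plugin apps."""
    unknown = set(enabled_plugins) - set(PLUGIN_APPS)
    if unknown:
        raise ValueError(f"Unknown plugins: {sorted(unknown)}")
    return CORE_APPS + [PLUGIN_APPS[name] for name in enabled_plugins]


# Example: enable only the WebGL masking plugin.
INSTALLED_APPS = build_installed_apps(["webgl_mask"])
```

Because each plugin is a separate app, leaving it out of `INSTALLED_APPS` removes its models, templates, and static files from the deployment without touching the core.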
Navigation
- Architecture Overview: Understand the high-level system design.
- Backend Logic: Deep dive into the Python processing pipeline.
- Frontend Implementation: Explore the vanilla JS and WebGL rendering engine.
- API Reference: Detailed documentation of all JSON endpoints.
- Setup & Deployment: Instructions for local development and production.