Epstein Unredactor

Technical Documentation for the Epstein PDF Analysis Tool

Redaction Detection

OpenCV-based row scanning detects redaction boundaries with pixel precision in scanned PDF page images (816x1056 px).
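As a rough illustration of row scanning, the sketch below walks a grayscale page one row at a time, records horizontal runs of dark pixels, and merges identical runs on consecutive rows into boxes. It is a pure-Python stand-in, not the tool's OpenCV code: the function names, the thresholds (`dark_threshold`, `min_width`, `min_height`), and the list-of-lists image format are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels
Span = Tuple[int, int]           # (start_col, end_col), end exclusive


def dark_runs(row: Sequence[int], dark_threshold: int, min_width: int) -> List[Span]:
    """Return column spans of consecutive dark pixels at least min_width wide."""
    runs: List[Span] = []
    start = None
    for x, value in enumerate(row):
        if value <= dark_threshold:
            if start is None:
                start = x
        elif start is not None:
            if x - start >= min_width:
                runs.append((start, x))
            start = None
    # A run may extend to the right edge of the row.
    if start is not None and len(row) - start >= min_width:
        runs.append((start, len(row)))
    return runs


def find_redaction_boxes(
    image: Sequence[Sequence[int]],
    dark_threshold: int = 40,   # assumed: 0 is black, 255 is white
    min_width: int = 20,        # assumed minimum box width
    min_height: int = 8,        # assumed minimum box height
) -> List[Box]:
    """Merge identical dark spans on consecutive rows into rectangles."""
    open_spans: Dict[Span, int] = {}  # span -> first row where it appeared
    boxes: List[Box] = []

    def close(span: Span, end_row: int) -> None:
        first_row = open_spans.pop(span)
        height = end_row - first_row
        if height >= min_height:
            boxes.append((span[0], first_row, span[1] - span[0], height))

    for y, row in enumerate(image):
        current = set(dark_runs(row, dark_threshold, min_width))
        # Spans that stopped on this row become finished boxes.
        for span in [s for s in open_spans if s not in current]:
            close(span, y)
        # Spans seen for the first time start a new candidate box.
        for span in current:
            open_spans.setdefault(span, y)
    # Close anything still open at the bottom edge of the page.
    for span in list(open_spans):
        close(span, len(image))
    return sorted(boxes, key=lambda b: (b[1], b[0]))


# Synthetic 100x60 white page with one black bar at x=10..49, y=20..31.
page = [[255] * 100 for _ in range(60)]
for y in range(20, 32):
    for x in range(10, 50):
        page[y][x] = 0

boxes = find_redaction_boxes(page)
print(boxes)  # [(10, 20, 40, 12)]
```

Requiring the same span on every row is a deliberate simplification. Real scans are noisy, so a practical detector would likely tolerate a few pixels of jitter at the box edges.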

Width Matching

High-precision HarfBuzz text shaping measures candidate names against detected pixel widths.
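The matching step can be pictured as: compute the rendered width of each candidate string, then keep the candidates whose width falls within a tolerance of the detected box width. The sketch below is only an approximation of that idea. It replaces HarfBuzz shaping with a hypothetical per-character advance table, so it ignores kerning and ligatures that a real shaper would account for. The function names, the advance values, the tolerance, and the placeholder names are all assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def text_width_px(
    text: str,
    advances: Dict[str, int],
    units_per_em: int,
    font_size_px: float,
    default_advance: int = 500,  # assumed fallback advance in font units
) -> float:
    """Sum per-character advances (font units) and scale to pixels."""
    total_units = sum(advances.get(ch, default_advance) for ch in text)
    return total_units * font_size_px / units_per_em


def match_candidates(
    candidates: Iterable[str],
    target_width_px: float,
    advances: Dict[str, int],
    units_per_em: int = 1000,
    font_size_px: float = 16.0,
    tolerance_px: float = 1.5,   # assumed tolerance
) -> List[Tuple[str, float]]:
    """Return (candidate, width) pairs within tolerance, closest first."""
    matches = []
    for name in candidates:
        width = text_width_px(name, advances, units_per_em, font_size_px)
        if abs(width - target_width_px) <= tolerance_px:
            matches.append((name, width))
    matches.sort(key=lambda m: abs(m[1] - target_width_px))
    return matches


# Hypothetical advance table: narrow glyphs get 250 units, the rest 500.
ADVANCES = {" ": 250, "i": 250, "l": 250}

# Placeholder names, not drawn from any document.
names = ["John Doe", "Jill Roe", "Jane Q. Public"]

result = match_candidates(names, target_width_px=60.5, advances=ADVANCES)
print(result)  # [('John Doe', 60.0)]
```

Because many strings can share nearly the same width, a width match narrows the candidate list but does not by itself identify the redacted text.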

WebGL Visualization

GPU-accelerated masks provide real-time interactive overlays for visual verification.

Epstein Unredactor Documentation

Welcome to the technical documentation for the Epstein Unredactor. This guide covers the internal logic, architecture, and deployment of the tool, which analyzes redacted documents.

Core Concepts

The tool operates on a "Core + Plugin" architecture:

Core: Handles PDF parsing, image extraction, and basic redaction box detection.
Plugins: Optional features like WebGL masking and typography tools are isolated into independent Django apps for modularity.
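One way to picture the Core + Plugin split is a Django settings helper that always loads the core app and appends plugin apps only when they are enabled. This is a sketch under assumptions, not the project's actual settings: the app labels (`core`, `plugins.webgl_mask`, `plugins.typography`) and the `build_installed_apps` helper are hypothetical names chosen for the example.

```python
from typing import Iterable, List

# Always-on apps. "core" is a hypothetical label for the PDF parsing,
# image extraction, and redaction box detection app.
CORE_APPS = [
    "django.contrib.staticfiles",
    "core",
]

# Hypothetical plugin registry: short name -> Django app label.
PLUGIN_APPS = {
    "webgl_mask": "plugins.webgl_mask",
    "typography": "plugins.typography",
}


def build_installed_apps(enabled_plugins: Iterable[str]) -> List[str]:
    """Return the core apps followed by the apps for each enabled plugin."""
    apps = list(CORE_APPS)
    for name in enabled_plugins:
        if name not in PLUGIN_APPS:
            raise ValueError(f"Unknown plugin: {name!r}")
        label = PLUGIN_APPS[name]
        if label not in apps:  # ignore duplicates in the input
            apps.append(label)
    return apps


# In settings.py, the result would be assigned to INSTALLED_APPS.
INSTALLED_APPS = build_installed_apps(["webgl_mask"])
print(INSTALLED_APPS)
```

Keeping each plugin in its own app means the core can run with an empty plugin list, and a plugin can be dropped without touching core code.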

Navigation

Architecture Overview: Understand the high-level system design.
Backend Logic: Deep dive into the Python processing pipeline.
Frontend Implementation: Explore the vanilla JS and WebGL rendering engine.
API Reference: Detailed documentation of all JSON endpoints.
Setup & Deployment: Instructions for local development and production.
