Epstein Unredactor

Redaction Processing — Backend Logic

This folder documents the core logic modules distributed across the guesser_core, webgl_mask, and text_tool Django apps.

Module Pipeline

flowchart TD
    subgraph guesser_core
        PR["ProcessRedactions"]
        BD["BoxDetector"]
        SW["SurroundingWordWidth"]
    end
    
    subgraph webgl_mask
        AV["artifact_visualizer"]
    end
    
    subgraph text_tool
        WC["width_calculator"]
        EF["extract_fonts"]
    end

    PDF["PDF Bytes"] --> PR
    PR --> BD
    PR --> SW
    
    PDF --> AV
    AV -.->|"depends on core logic"| BD
    
    EF -.- PR
    
    style PR fill:#2d333b,stroke:#81c995
    style BD fill:#2d333b,stroke:#8ab4f8
    style SW fill:#2d333b,stroke:#f28b82
    style AV fill:#2d333b,stroke:#fdd663
    style WC fill:#2d333b,stroke:#c58af9
    style EF fill:#2d333b,stroke:#c58af9

Module Reference

| App | Module | Description |
| --- | --- | --- |
| guesser_core | BoxDetector | Row-scan detection of black rectangular boxes |
| guesser_core | SurroundingWordWidth | Refine box edges using positions of nearby words |
| guesser_core | ProcessRedactions | Orchestrator: coordinates detection + refinement |
| webgl_mask | artifact_visualizer | Async generation of grayscale mask PNGs |
| text_tool | width_calculator | HarfBuzz text shaping for width measurement |
| text_tool | extract_fonts | Dominant font detection and mapping |
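
To make the "row-scan detection" idea in the BoxDetector entry concrete, here is a minimal sketch in pure Python. It is not the BoxDetector implementation: the function names, the threshold, and the minimum sizes are assumptions chosen for illustration, and the image is assumed to be a list of rows of 0-255 grayscale values. Each row is scanned for long dark runs, and runs that repeat on consecutive rows are merged into one rectangle.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def dark_runs(row: List[int], threshold: int, min_width: int) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges where the row stays below threshold."""
    runs: List[Tuple[int, int]] = []
    start = None
    for x, value in enumerate(row):
        if value < threshold:
            if start is None:
                start = x
        elif start is not None:
            if x - start >= min_width:
                runs.append((start, x))
            start = None
    # Close a run that reaches the right edge of the row.
    if start is not None and len(row) - start >= min_width:
        runs.append((start, len(row)))
    return runs


def detect_boxes(image: List[List[int]], threshold: int = 40,
                 min_width: int = 3, min_height: int = 2) -> List[Box]:
    """Hypothetical row-scan detector: merge identical dark runs on adjacent rows."""
    open_boxes: Dict[Tuple[int, int], int] = {}  # (x0, x1) -> first row of the run
    boxes: List[Box] = []
    # The extra empty row acts as a sentinel that closes any box touching the bottom.
    for y, row in enumerate(image + [[]]):
        runs = set(dark_runs(row, threshold, min_width))
        for span in list(open_boxes):
            if span not in runs:
                y0 = open_boxes.pop(span)
                if y - y0 >= min_height:
                    boxes.append((span[0], y0, span[1], y))
        for span in runs:
            open_boxes.setdefault(span, y)
    return sorted(boxes)


# Usage: a 10x6 white image with a black block covering columns 2-6, rows 1-3.
image = [[255] * 10 for _ in range(6)]
for y in range(1, 4):
    for x in range(2, 7):
        image[y][x] = 0
print(detect_boxes(image))
```

The sketch requires a run to have exactly the same start and end columns on every row. Scanned pages usually have ragged or antialiased edges, so a real detector would need some tolerance when matching runs between rows.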

Processing Order

  1. Receive PDF or image bytes from the Django view
  2. Extract embedded page images from PDF using PyMuPDF (extract_page_image_bytes)
  3. Detect black rectangular boxes in each image (BoxDetector)
  4. Refine box edges by measuring gaps to surrounding text words (SurroundingWordWidth)
  5. Return structured JSON with redaction coordinates, text spans, and base64 page images
  6. On demand: Generate grayscale mask PNGs for individual pages (artifact_visualizer)
  7. On demand: Measure pixel widths of candidate names using HarfBuzz (width_calculator)
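
Steps 4 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the helper names, the word dictionaries with a `bbox` key, and the JSON field names (`redactions`, `spans`, `page_image`, `left_gap`, `right_gap`) are all hypothetical. The sketch only measures the gap from a detected box to the nearest word on each side of the same line and packages the result. How SurroundingWordWidth turns those gaps into adjusted edges is not covered here.

```python
import base64
import json
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlaps_vertically(a: Box, b: Box) -> bool:
    """True when two boxes share some vertical extent (same text line)."""
    return min(a[3], b[3]) > max(a[1], b[1])


def neighbour_gaps(box: Box, words: List[Dict]) -> Dict[str, Optional[float]]:
    """Measure the gap to the closest word on each side of the box."""
    same_line = [w for w in words if overlaps_vertically(box, w["bbox"])]
    left = [box[0] - w["bbox"][2] for w in same_line if w["bbox"][2] <= box[0]]
    right = [w["bbox"][0] - box[2] for w in same_line if w["bbox"][0] >= box[2]]
    return {
        "left_gap": min(left) if left else None,
        "right_gap": min(right) if right else None,
    }


def build_response(page_png: bytes, boxes: List[Box], words: List[Dict]) -> str:
    """Package boxes, text spans, and a base64 page image as a JSON string."""
    redactions = []
    for box in boxes:
        entry = {"bbox": list(box)}
        entry.update(neighbour_gaps(box, words))
        redactions.append(entry)
    payload = {
        "redactions": redactions,
        "spans": words,
        # Raw bytes are not JSON-serializable, so the image is base64-encoded.
        "page_image": base64.b64encode(page_png).decode("ascii"),
    }
    return json.dumps(payload)


# Usage with made-up coordinates: one redaction box between two words.
words = [
    {"text": "Mr.", "bbox": (10, 100, 40, 112)},
    {"text": "arrived", "bbox": (130, 100, 180, 112)},
    {"text": "other", "bbox": (10, 130, 50, 142)},  # different line, ignored
]
box = (45, 99, 124, 113)
print(neighbour_gaps(box, words))
```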
Last Updated: 3/28/26, 2:37 PM
Contributors: JaguarM