Redaction Processing — Backend Logic
This folder documents the core logic modules distributed across the guesser_core, webgl_mask, and text_tool Django apps.
Module Pipeline
flowchart TD
subgraph guesser_core
PR["ProcessRedactions"]
BD["BoxDetector"]
SW["SurroundingWordWidth"]
end
subgraph webgl_mask
AV["artifact_visualizer"]
end
subgraph text_tool
WC["width_calculator"]
EF["extract_fonts"]
end
PDF["PDF Bytes"] --> PR
PR --> BD
PR --> SW
PDF --> AV
AV -.->|"depends on core logic"| BD
EF -.- PR
style PR fill:#2d333b,stroke:#81c995
style BD fill:#2d333b,stroke:#8ab4f8
style SW fill:#2d333b,stroke:#f28b82
style AV fill:#2d333b,stroke:#fdd663
style WC fill:#2d333b,stroke:#c58af9
style EF fill:#2d333b,stroke:#c58af9
Module Reference
| App | Module | Description |
|---|---|---|
| guesser_core | BoxDetector | Row-scan detection of black rectangular boxes |
| guesser_core | SurroundingWordWidth | Refinement of box edges using the positions of nearby words |
| guesser_core | ProcessRedactions | Orchestrator that coordinates detection and refinement |
| webgl_mask | artifact_visualizer | Async generation of grayscale mask PNGs |
| text_tool | width_calculator | HarfBuzz text shaping for width measurement |
| text_tool | extract_fonts | Dominant font detection and mapping |
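
As a rough illustration of the row-scan idea listed for BoxDetector, the sketch below scans a grayscale image row by row and groups identical dark runs on consecutive rows into rectangles. It is not the module's actual code: the function names (`dark_runs`, `detect_boxes`), the default thresholds, and the rule that runs on adjacent rows must match exactly are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def dark_runs(row: List[int], threshold: int, min_width: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans of consecutive pixels at or below `threshold`."""
    runs = []
    start = None
    for x, value in enumerate(row):
        if value <= threshold:
            if start is None:
                start = x
        elif start is not None:
            if x - start >= min_width:
                runs.append((start, x))
            start = None
    # Close a run that reaches the right edge of the row.
    if start is not None and len(row) - start >= min_width:
        runs.append((start, len(row)))
    return runs


def detect_boxes(gray: List[List[int]], threshold: int = 40,
                 min_width: int = 3, min_height: int = 2) -> List[Box]:
    """Scan rows top to bottom, merging identical dark runs on adjacent rows."""
    open_boxes: Dict[Tuple[int, int], int] = {}  # (x0, x1) -> first row of the run
    boxes: List[Box] = []
    # The trailing empty row is a sentinel that closes every box still open.
    for y, row in enumerate(gray + [[]]):
        runs = set(dark_runs(row, threshold, min_width))
        for span in list(open_boxes):
            if span not in runs:
                y0 = open_boxes.pop(span)
                if y - y0 >= min_height:
                    boxes.append((span[0], y0, span[1], y))
        for span in runs:
            open_boxes.setdefault(span, y)
    return sorted(boxes)


# A white 8x6 page with one black rectangle covering x 2..5, y 1..3.
page = [[255] * 8 for _ in range(6)]
for y in range(1, 4):
    for x in range(2, 6):
        page[y][x] = 0

print(detect_boxes(page))  # [(2, 1, 6, 4)]
```

Real scans are noisy, so a production detector would likely tolerate small differences between rows rather than require exact span matches.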
Processing Order
- Receive PDF or image bytes from the Django view
- Extract embedded page images from PDF using PyMuPDF (extract_page_image_bytes)
- Detect black rectangular boxes in each image (BoxDetector)
- Refine box edges by measuring gaps to surrounding text words (SurroundingWordWidth)
- Return structured JSON with redaction coordinates, text spans, and base64 page images
- On demand: Generate grayscale mask PNGs for individual pages (artifact_visualizer)
- On demand: Measure pixel widths of candidate names using HarfBuzz (width_calculator)
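
The detect, refine, and return steps above can be sketched as a small orchestration function. This is a minimal sketch under stated assumptions, not the ProcessRedactions implementation: `process_pages`, the `detect` and `refine` callables, and the JSON field names (`pages`, `redactions`, `spans`, `image_base64`) are hypothetical, and page images and word boxes are assumed to have been extracted already.

```python
import base64
import json
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in image pixels
Word = Dict[str, object]         # e.g. {"text": "Dear", "bbox": [x0, y0, x1, y1]}


def process_pages(
    pages: List[bytes],
    words_per_page: List[List[Word]],
    detect: Callable[[bytes], List[Box]],
    refine: Callable[[Box, List[Word]], Box],
) -> str:
    """Run detection, then refinement, per page and serialize the result as JSON."""
    result = []
    for index, (image, words) in enumerate(zip(pages, words_per_page)):
        # Each detected box is refined against the words found on the same page.
        boxes = [refine(box, words) for box in detect(image)]
        result.append({
            "page": index,
            "redactions": [dict(zip(("x0", "y0", "x1", "y1"), box)) for box in boxes],
            "spans": words,
            # Raw image bytes are not JSON-safe, so they are sent as base64 text.
            "image_base64": base64.b64encode(image).decode("ascii"),
        })
    return json.dumps({"pages": result})
```

Passing the detector and refiner in as callables keeps the orchestration step independent of how either one is implemented, which also makes it easy to exercise with stubs.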