Epstein Unredactor

API Reference

The Django backend exposes its HTTP endpoints through three modular apps: guesser_core, text_tool, and webgl_mask.

Note: All POST endpoints use @csrf_exempt — no CSRF token is required. There is no authentication.

Endpoints

guesser_core (Base Viewer)

| Method | Path | Description |
| --- | --- | --- |
| GET | / | Serves the single-page application |
| POST | /analyze-pdf | Upload a PDF or image for redaction analysis |
| GET | /analyze-default | Processes the bundled default PDF |

text_tool (Typography Plugin)

| Method | Path | Description |
| --- | --- | --- |
| POST | /widths | Calculate pixel widths for candidate text strings |
| GET | /fonts-list | List available font files |

webgl_mask (GPU Visualization Plugin)

| Method | Path | Description |
| --- | --- | --- |
| POST | /webgl/masks | Generate all redaction masks for an uploaded PDF |
| GET | /webgl/masks?default=true | Generate all masks for the default PDF |

POST /analyze-pdf

Upload a file (PDF or image) for redaction box detection and text span extraction.

Request

  • Content-Type: multipart/form-data
  • Body: a form field named "file" containing the uploaded file

Supported formats:

  • PDF (application/pdf)
  • Images: PNG, JPEG, TIFF, BMP, WebP
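
The request described above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's own client: the helper names and the base URL http://localhost:8000 are assumptions, and only the form field name "file" and the endpoint path come from this page.

```python
import uuid
import urllib.request


def build_multipart(filename: str, data: bytes, mime: str = "application/pdf"):
    """Encode one upload as multipart/form-data under the form field "file"."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def make_upload_request(base_url: str, path: str, filename: str, data: bytes,
                        mime: str = "application/pdf") -> urllib.request.Request:
    """Build (but do not send) a POST request for an upload endpoint."""
    body, content_type = build_multipart(filename, data, mime)
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


# Hypothetical usage; the base URL is an assumption about a local dev server.
request = make_upload_request(
    "http://localhost:8000", "/analyze-pdf", "document.pdf", b"%PDF-1.4 ..."
)
```

Sending the request with urllib.request.urlopen(request) and parsing the body with json.load would then yield the response documented below. The same helper should also fit POST /webgl/masks, which takes the same form field.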

Response — 200 OK

{
  "redactions": [
    {
      "page": 1,
      "x": 203.0,
      "y": 438.0,
      "width": 121.53,
      "height": 16.0,
      "area": 1944.48
    }
  ],
  "spans": [
    {
      "page": 1,
      "text": "Confidential",
      "font": {
        "size": 12.0,
        "flags": 0,
        "matched_font": "TimesNewRomanPSMT"
      }
    }
  ],
  "pdf_fonts": ["TimesNewRomanPSMT", "TimesNewRomanPS-BoldMT"],
  "suggested_scale": 133,
  "suggested_size": 12.0,
  "suggested_font": "times.ttf",
  "page_images": ["base64-encoded-PNG-string", null, "..."],
  "page_image_type": "image/png",
  "page_width": 816,
  "page_height": 1056,
  "num_pages": 3
}
| Field | Type | Description |
| --- | --- | --- |
| redactions | array | Detected redaction boxes sorted by page, then y, then x. Coordinates are in the embedded image's pixel space. |
| spans | array | Text spans with font metadata (PDF only, always [] for images) |
| pdf_fonts | array | Base-font names declared in the PDF, sorted by number of pages they appear on (most common first). [] for images. |
| suggested_scale | int | Recommended "Scale %" for the width calculator. 133 for standard 816 px / 612 pt letter pages. See Scale & Size Detection. |
| suggested_size | float | Dominant body-text font size in points, detected from text spans. 12.0 when unknown. |
| suggested_font | string or null | .ttf filename of the dominant font (e.g. "times.ttf"). null if the font could not be matched to an available file. |
| page_images | array | Base64-encoded PNG for each page (one per page, null if no embedded image found on that page) |
| page_image_type | string | MIME type of the page images; always "image/png" |
| page_width / page_height | int | Pixel dimensions of the page images (816 × 1056 for standard PDFs; actual image dimensions for raw image uploads) |
| num_pages | int | Total number of pages |
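
As an illustration of consuming this response, the sketch below groups redaction boxes by page and checks that a box's area equals width × height, as it does in the sample above (121.53 × 16.0 = 1944.48). The function names are hypothetical, and the area relationship is inferred from the sample values only; the backend source may compute it differently.

```python
import math
from collections import defaultdict


def group_redactions_by_page(response: dict) -> dict:
    """Map page number -> list of redaction boxes, keeping response order."""
    pages = defaultdict(list)
    for box in response.get("redactions", []):
        pages[box["page"]].append(box)
    return dict(pages)


def area_matches(box: dict, tol: float = 0.01) -> bool:
    """Assumption: area is width * height, as in the documented sample."""
    return math.isclose(box["width"] * box["height"], box["area"], abs_tol=tol)


# Trimmed copy of the sample response shown above.
sample = {
    "redactions": [
        {"page": 1, "x": 203.0, "y": 438.0,
         "width": 121.53, "height": 16.0, "area": 1944.48}
    ],
    "num_pages": 3,
}

by_page = group_redactions_by_page(sample)
```

Pages without redactions simply do not appear as keys in the result, so a caller would use by_page.get(page, []) when iterating over all num_pages pages.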

Errors

| Status | Reason |
| --- | --- |
| 400 | No file uploaded or no file selected |
| 500 | Processing error (detail in response body) |

POST /widths

Calculate pixel widths for a list of text strings using HarfBuzz text shaping.

Request

  • Content-Type: application/json
{
  "strings": ["Jeffrey Epstein", "Ghislaine Maxwell"],
  "font": "times.ttf",
  "size": 12,
  "scale": 133,
  "kerning": true,
  "ligatures": true,
  "force_uppercase": false
}
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| strings | array | [] | Text strings to measure |
| font | string | "times.ttf" | Font filename from assets/fonts/ |
| size | number | 12 | Font size in points |
| scale | number | 135 | Scale percentage (divided by 100 internally to get scale_factor) |
| kerning | bool | true | Enable OpenType kern feature |
| ligatures | bool | true | Enable liga/clig features |
| force_uppercase | bool | false | Measure uppercase version of each string |
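
A request with these fields could be assembled as follows. This is a sketch using only the standard library; the helper name and the base URL are assumptions, and scale is passed explicitly rather than relying on the server-side default.

```python
import json
import urllib.request


def build_widths_request(base_url: str, strings: list, font: str = "times.ttf",
                         size: float = 12, scale: float = 133,
                         kerning: bool = True, ligatures: bool = True,
                         force_uppercase: bool = False) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /widths."""
    payload = {
        "strings": list(strings),
        "font": font,
        "size": size,
        "scale": scale,
        "kerning": kerning,
        "ligatures": ligatures,
        "force_uppercase": force_uppercase,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/widths",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage against an assumed local dev server.
widths_request = build_widths_request(
    "http://localhost:8000", ["first candidate", "second candidate"]
)
```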

The width formula applied by the backend is:

pixel_width = (advance / upem) × size × (scale / 100)

With scale = 133 and size set to the document's body-text size, this matches the pixel-space width of that text as it appears in the embedded page images.
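
The arithmetic can be checked with a small sketch. The pixel_width function mirrors the formula above; scale_from_page is an assumption about how a value of 133 follows from an 816 px wide image of a 612 pt wide page, and the advance and upem values are made up for illustration rather than taken from a real font.

```python
def scale_from_page(page_width_px: float, page_width_pt: float) -> int:
    """Assumed derivation: pixels per point, expressed as a whole percentage."""
    return round(page_width_px / page_width_pt * 100)


def pixel_width(advance: float, upem: int, size: float, scale: float) -> float:
    """pixel_width = (advance / upem) * size * (scale / 100)."""
    return (advance / upem) * size * (scale / 100)


scale = scale_from_page(816, 612)  # 816 / 612 = 1.333..., which rounds to 133

# Hypothetical shaped run: total advance of 1024 font units in a 2048-upem font.
width = pixel_width(advance=1024, upem=2048, size=12, scale=scale)
# 0.5 em at 12 pt is 6 pt; at 133 % that is approximately 7.98 px.
```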

Response — 200 OK

{
  "results": [
    { "text": "Jeffrey Epstein", "width": 89.472 },
    { "text": "Ghislaine Maxwell", "width": 107.136 }
  ]
}
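
One way a client might use these results is to rank candidates by how closely their width matches a detected redaction box. The sketch below is an assumption about client-side usage, not the logic in api.js; the function name and the optional tolerance parameter are hypothetical. It reuses the two widths from the sample response and the 121.53 px box width from the /analyze-pdf sample.

```python
def rank_candidates(results: list, target_width: float, tolerance=None) -> list:
    """Sort measured strings by absolute width error against a redaction box."""
    ranked = sorted(
        (
            {
                "text": r["text"],
                "width": r["width"],
                "error": abs(r["width"] - target_width),
            }
            for r in results
        ),
        key=lambda r: r["error"],
    )
    if tolerance is not None:
        # Keep only candidates whose width is within `tolerance` pixels.
        ranked = [r for r in ranked if r["error"] <= tolerance]
    return ranked


# Widths copied from the sample response; the labels are placeholders.
results = [
    {"text": "first candidate", "width": 89.472},
    {"text": "second candidate", "width": 107.136},
]
ranked = rank_candidates(results, target_width=121.53)
```

With these numbers the second candidate ranks first, although its error of roughly 14 px is still large, so a tolerance of a few pixels would reject both.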

GET /fonts-list

Returns a JSON array of available .ttf font filenames from assets/fonts/.

Response — 200 OK

["times.ttf", "arial.ttf", "courier_new.ttf", "calibri.ttf"]
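
Because suggested_font in the /analyze-pdf response can be null, a client may want to reconcile it with this list before calling /widths. The following is a hypothetical helper, not part of the project; the "times.ttf" fallback is taken from the documented default of the font field.

```python
def pick_font(suggested_font, available: list, fallback: str = "times.ttf"):
    """Choose a font filename that the backend reports as available."""
    if suggested_font and suggested_font in available:
        return suggested_font
    if fallback in available:
        return fallback
    # Last resort: first listed font, or None when the list is empty.
    return available[0] if available else None


fonts = ["times.ttf", "arial.ttf", "courier_new.ttf", "calibri.ttf"]
chosen = pick_font(None, fonts)
```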

POST /webgl/masks

Generates redaction masks for an entire document. It is kept separate from /analyze-pdf so that clients can request masks asynchronously and the main layout response is not delayed by mask generation.

Request

  • Content-Type: multipart/form-data
  • Body: a form field named "file" containing the same PDF previously sent to /analyze-pdf.

Response — 200 OK

{
  "mask_images": [
    "base64-encoded-PNG-mask-string",
    null,
    "base64-encoded-PNG-mask-string"
  ]
}
| Field | Type | Description |
| --- | --- | --- |
| mask_images | array | Base64-encoded grayscale PNG masks, one entry per page. null indicates no redactions on that page. |
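
A client could decode this array as sketched below. The sketch assumes that each entry is plain base64 (not a data: URI) and that array index 0 corresponds to page 1, matching the 1-based "page" values in the /analyze-pdf sample; neither assumption is confirmed by this page. The function name is hypothetical.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_masks(response: dict) -> dict:
    """Map 1-based page number -> raw PNG bytes, skipping null entries."""
    masks = {}
    for index, encoded in enumerate(response.get("mask_images", [])):
        if encoded is None:
            continue  # no mask for this page
        data = base64.b64decode(encoded)
        if not data.startswith(PNG_SIGNATURE):
            raise ValueError(f"mask_images[{index}] is not a PNG")
        masks[index + 1] = data
    return masks
```

The returned bytes can be written to disk or handed to an image library of choice for further processing.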

GET /webgl/masks?default=true

Utility endpoint to fetch masks for the bundled default demonstration PDF.

Response — 200 OK

Returns the same schema as POST /webgl/masks.

Last Updated: 3/28/26, 1:35 PM
Contributors: JaguarM