Epstein Unredactor

width_calculator.py

width_calculator.py provides precision text-width measurement for candidate name matching.


Functions

get_text_widths(texts, font_name, font_size, force_uppercase, scale_factor, kerning, ligatures)

Calculates pixel widths for a list of text strings.

Parameters:

Parameter        Type        Default      Description
texts            list[str]   no default   Strings to measure
font_name        str         "times.ttf"  Font filename
font_size        int/float   12           Font size in points
force_uppercase  bool        False        Convert text to uppercase before measuring
scale_factor     float       1.35         Multiplier applied to the raw advance width
kerning          bool        True         Enable OpenType kern feature
ligatures        bool        True         Enable liga and clig features

Output:

[{"text": "Jeffrey Epstein", "width": 89.472}, ...]
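
As a rough illustration of how this output might be consumed for candidate matching, the sketch below filters a result list by how closely each width matches a measured redaction box. The helper name filter_by_width, the tolerance, and every width other than the 89.472 sample above are hypothetical and not part of width_calculator.py.

```python
def filter_by_width(results, box_width, tolerance=1.0):
    """Keep entries whose width is within `tolerance` pixels of `box_width`.

    `results` has the shape returned by get_text_widths():
    a list of {"text": str, "width": float} dicts.
    """
    return [r for r in results if abs(r["width"] - box_width) <= tolerance]


# Sample data: the first entry is the documented example, the second is made up.
results = [
    {"text": "Jeffrey Epstein", "width": 89.472},
    {"text": "Hypothetical Name", "width": 104.25},
]

matches = filter_by_width(results, box_width=90.0, tolerance=1.0)
print([m["text"] for m in matches])
```

A fixed pixel tolerance is only one possible design; a tolerance proportional to the box width would be an equally reasonable choice.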

Font Resolution

The font is searched in this order:

  1. Direct path (font_name as-is)
  2. assets/fonts/{font_name}
  3. assets/fonts/{font_name}.ttf

System font directories are intentionally excluded to ensure consistent results across environments.
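
The search order above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name resolve_font, the fonts_dir parameter, and the choice to return None on failure are all assumptions.

```python
from pathlib import Path


def resolve_font(font_name, fonts_dir="assets/fonts"):
    """Return the first existing candidate path, or None if nothing matches."""
    candidates = [
        Path(font_name),                       # 1. direct path, as given
        Path(fonts_dir) / font_name,           # 2. assets/fonts/{font_name}
        Path(fonts_dir) / f"{font_name}.ttf",  # 3. assets/fonts/{font_name}.ttf
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # System font directories are deliberately not consulted.
    return None
```

Under this sketch, both resolve_font("times.ttf") and resolve_font("times") would find assets/fonts/times.ttf when that file exists.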


HarfBuzz Engine (Primary)

When uharfbuzz is available:

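import uharfbuzz as hb

# font_data: raw bytes of the resolved .ttf file
face = hb.Face(font_data)
font = hb.Font(face)
upem = face.upem   # units per em

buf = hb.Buffer()
buf.add_str(text)
buf.guess_segment_properties()   # infer script, language and direction

# features: dict mapping OpenType feature tags to booleans, e.g. {"kern": True}
hb.shape(font, buf, features)

# Sum the horizontal advances (in font units), then convert to pixels
total_advance = sum(pos.x_advance for pos in buf.glyph_positions)
pixel_width = (total_advance / upem) * font_size * scale_factor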

Features controlled:

Feature  Enabled  Disabled by
kern     Default  kerning=False
liga     Default  ligatures=False
clig     Default  ligatures=False
dlig     Never    ligatures=False
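
One plausible way to turn the two boolean parameters into the features argument is sketched below. The helper name build_features is hypothetical, and the real module may build its feature dict differently; the mapping itself simply restates the table above.

```python
def build_features(kerning=True, ligatures=True):
    """Map the two boolean flags to an OpenType feature dict."""
    return {
        "kern": kerning,    # pair kerning
        "liga": ligatures,  # standard ligatures
        "clig": ligatures,  # contextual ligatures
        "dlig": False,      # discretionary ligatures are never enabled
    }


print(build_features(kerning=False))
```

A dict of feature tags to booleans is the shape that uharfbuzz's hb.shape() accepts as its third argument.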

Pillow Fallback

If HarfBuzz fails or is not installed, the function falls back to Pillow: it loads the font with ImageFont.truetype() and measures each string with font.getlength(). This path does not support fine-grained kerning/ligature control.
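
The fallback pattern can be sketched independently of either library. In the illustration below, measure_with_fallback and both stub engines are hypothetical stand-ins; the real module calls uharfbuzz and Pillow directly, and its exact error handling is not documented here.

```python
def measure_with_fallback(text, primary, fallback):
    """Try `primary(text)`; on any failure, return `fallback(text)` instead."""
    try:
        return primary(text)
    except Exception:
        # Covers both a missing uharfbuzz install (ImportError raised inside
        # the primary callable) and shaping errors at run time.
        return fallback(text)


def broken_engine(text):
    raise ImportError("uharfbuzz is not installed")  # simulated failure


def stub_fallback(text):
    return 6.0 * len(text)  # made-up width, stands in for font.getlength()


width = measure_with_fallback("abc", broken_engine, stub_fallback)
print(width)
```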


get_available_fonts()

Scans the assets/fonts/ directory and returns a list of .ttf filenames.

Output: ["times.ttf", "arial.ttf", ...]

Used by the /fonts-list API endpoint to populate the frontend font dropdown.
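
A directory scan of this kind might look like the sketch below. The name list_ttf_fonts is hypothetical, and the sorting, case-insensitive extension match, and empty-list result for a missing directory are assumptions rather than documented behavior of get_available_fonts().

```python
from pathlib import Path


def list_ttf_fonts(fonts_dir="assets/fonts"):
    """Return the sorted .ttf filenames directly inside `fonts_dir`."""
    directory = Path(fonts_dir)
    if not directory.is_dir():
        return []  # assumption: a missing directory yields an empty list
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == ".ttf"
    )
```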


Scale Factor

scale_factor is the multiplier that converts a text width in points (the raw advance in font units, divided by upem and multiplied by the font size) into the image pixel width used by the redaction overlay coordinates.

Formula

pixel_width = (advance / upem) × font_size_pt × scale_factor
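
The formula translates directly into code. In this minimal sketch the function name advance_to_pixels and the sample numbers are hypothetical; only the arithmetic comes from the formula above.

```python
def advance_to_pixels(advance, upem, font_size_pt, scale_factor):
    """Convert a summed advance in font units to a pixel width."""
    return (advance / upem) * font_size_pt * scale_factor


# Hypothetical run whose total advance equals exactly one em (2048 units).
# One em at 12 pt is 12 pt wide, so the result is 12 * scale_factor pixels.
print(advance_to_pixels(advance=2048, upem=2048, font_size_pt=12, scale_factor=4 / 3))
```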

For the width to match a redaction box measured in the 816 × 1056 px embedded page images:

scale_factor = img_width_px / page_width_pt
             = 816 / 612
             = 4/3
             ≈ 1.3333

This is equivalent to converting from 72 dpi (PDF points) to 96 dpi (screen pixels): 96 / 72 = 4/3.
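
The equivalence of the two ratios can be checked with exact rational arithmetic. The helper name scale_from_dimensions is hypothetical, and the 792 pt page height is the standard US Letter value rather than a figure stated on this page.

```python
from fractions import Fraction


def scale_from_dimensions(img_px, page_pt):
    """Exact pixels-per-point ratio for one page dimension."""
    return Fraction(img_px, page_pt)


width_scale = scale_from_dimensions(816, 612)    # page width
height_scale = scale_from_dimensions(1056, 792)  # page height (US Letter)
dpi_scale = Fraction(96, 72)                     # 96 dpi pixels per 72 dpi point

print(width_scale, height_scale, dpi_scale)
```

All three fractions reduce to 4/3, so the scale is the same along both axes.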

How the frontend sets scale_factor

The /analyze-pdf response includes suggested_scale (an integer percentage). views.py divides it by 100 before passing it to get_text_widths():

scale_factor = scale / 100.0   # e.g. 133 / 100 = 1.33

The auto-detected value suggested_scale = 133 yields scale_factor = 1.33, a close approximation of the exact 4/3 ≈ 1.3333, which maps 12 pt Times New Roman to its pixel width in the embedded page images.
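
Because suggested_scale is an integer percentage, the resulting multiplier is rounded to two decimals. The sketch below quantifies that rounding against the exact 4/3; the helper name percent_to_scale_factor is hypothetical and only mirrors the division performed in views.py.

```python
def percent_to_scale_factor(suggested_scale):
    """Convert an integer percentage such as 133 to a multiplier such as 1.33."""
    return suggested_scale / 100.0


exact = 4 / 3
rounded = percent_to_scale_factor(133)
relative_error = abs(rounded - exact) / exact

print(f"{rounded:.2f} vs {exact:.4f}: relative error {relative_error:.4f}")
```

The relative error is about 0.0025, which is roughly a quarter of a pixel on a 100 px wide box.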

Note: The function signature's default scale_factor=1.35 is a legacy approximation of 4/3. In normal operation the frontend always supplies an explicit scale from the suggested_scale auto-detection, so the default is rarely used.

For a full derivation of the correct scale value and why the old formula ((median_size / 12) × (816/612)² × 100 ≈ 178) was incorrect, see Scale & Size Detection.

Last Updated: 4/6/26, 10:34 AM
Contributors: JaguarM