Structural Diff API
A REST API that compares an AI-generated transcript against its annotator post-edit — detecting row-level structural changes (splits, merges, modifications, additions, deletions), with per-column diff detail, CER/WER/SegER/SER/SACR scoring, and a composite quality grade per batch.
Quick Start
No SDK needed. Send a POST request with two arrays of transcript rows and receive the full diff as JSON. The API is in testing phase; request an API key to get started.
1. Verify the service is live:
curl https://structural-diff-engine.onrender.com/v1/health
2. Run a comparison:
curl -X POST https://structural-diff-engine.onrender.com/v1/diff \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_API_KEY" \
-d '{
"original": [
{ "speaker": "Alice", "start_time": 0, "end_time": 1, "transcript": "Hello world" },
{ "speaker": "Bob", "start_time": 1, "end_time": 3, "transcript": "Good morning everyone" }
],
"reworked": [
{ "speaker": "Alice", "start_time": 0, "end_time": 1, "transcript": "Hello there" },
{ "speaker": "Bob", "start_time": 1, "end_time": 2, "transcript": "Good morning" },
{ "speaker": "Bob", "start_time": 2, "end_time": 3, "transcript": "everyone" }
]
}'
Base URL
All endpoints are prefixed with /v1.
https://structural-diff-engine.onrender.com
Authentication
Include your API key in the x-api-key request header on every call to /v1/diff.
curl -H "x-api-key: YOUR_API_KEY" -H "Content-Type: application/json" \
-X POST https://structural-diff-engine.onrender.com/v1/diff -d '{...}'
Rate Limits
Two independent tiers are enforced per API key, falling back to the client IP address when no key is present. Exceeding either tier returns 429 Too Many Requests.
| Tier | Limit | Response header |
|---|---|---|
| Burst | 10 requests / minute | RateLimit-Limit |
| Window | 60 requests / 15 minutes | RateLimit-Remaining |
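A client that may hit either tier can retry on 429 with a growing delay. The following is a minimal sketch, not part of the API: `call_with_backoff`, its parameters, and the 6-second starting delay (one request slot under the 10-per-minute burst tier) are assumptions chosen for illustration. The `send` callable stands in for whatever HTTP call your client makes and is expected to return a `(status_code, body)` pair.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 6.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it returns a non-429 status or attempts run out.

    send        -- hypothetical callable returning (http_status, parsed_body)
    base_delay  -- assumed starting delay in seconds; doubles after each 429
    sleep       -- injectable so the function can be exercised without waiting
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    # Every attempt was rate limited; hand the last 429 back to the caller.
    return status, body
```

Because the delay function is injected, the retry logic can be checked in a unit test without real waiting or network access.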
Endpoints
GET /v1/health
Lightweight liveness probe. No authentication required. Returns service version and uptime.
/v1/health · No auth
{ "status": "ok", "version": "1.0.0", "uptime": 42, "timestamp": "..." }
POST /v1/diff
Compare two arrays of transcript rows. Returns row-level results with quality scores. Max payload: 5 MB · Max rows: 30,000.
/v1/diff · Auth required
Request Body
| Name | Type | Description |
|---|---|---|
| original* | array | Row objects from the baseline / original version. |
| reworked* | array | Row objects to compare against. |
| config | object | Optional algorithm overrides. See Config Options. |
| headers | string[] | Column names; required when using 2-D array input. |
| columnMapping | object | Column index map for 2-D array input. See Column Mapping. |
Row object fields
All fields are optional except transcript. Unknown fields are passed through unchanged.
| Name | Type | Description |
|---|---|---|
| transcript* | string | The text content of the row. |
| speaker | string | Speaker name or ID. |
| start_time | number or string | Segment start time in seconds. |
| end_time | number or string | Segment end time in seconds. |
| non_speech_events | string | Annotations such as [music], [laughter]. |
| emotion | string | Emotion label. |
| language | string | Language code (e.g. "en", "ar"). |
| locale | string | Locale code (e.g. "en-US"). |
| accent | string | Accent tag. |
| file_name | string | Source file name. Pass-through only; not used by the diff algorithm. |
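The request body and row rules above can be checked client-side before sending. The sketch below is illustrative only: `build_diff_body` is a hypothetical helper, and it assumes the 30,000-row limit applies to each array separately and that 5 MB means 5 × 1024 × 1024 bytes, neither of which the reference states explicitly.

```python
import json

MAX_ROWS = 30_000             # documented row limit (assumed to apply per array)
MAX_BYTES = 5 * 1024 * 1024   # documented 5 MB payload limit (assumed binary megabytes)


def build_diff_body(original, reworked, config=None) -> bytes:
    """Validate row objects and return the UTF-8 encoded JSON request body."""
    for name, rows in (("original", original), ("reworked", reworked)):
        if len(rows) > MAX_ROWS:
            raise ValueError(f"{name} has {len(rows)} rows; the limit is {MAX_ROWS}")
        for i, row in enumerate(rows):
            # transcript is the only required row field and must be a string.
            if not isinstance(row.get("transcript"), str):
                raise ValueError(f'{name}[{i}] is missing a string "transcript"')
    body = {"original": original, "reworked": reworked}
    if config:
        body["config"] = config
    encoded = json.dumps(body, ensure_ascii=False).encode("utf-8")
    if len(encoded) > MAX_BYTES:
        raise ValueError(f"payload is {len(encoded)} bytes; the limit is {MAX_BYTES}")
    return encoded
```

The returned bytes can be passed as the body of a POST to /v1/diff with the Content-Type and x-api-key headers set.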
Response Shape
All successful responses use this envelope:
{
"status": "success",
"requestId": "550e8400-e29b-41d4-a716-446655440000",
"timestamp": "2026-04-08T21:00:00.000Z",
"data": {
"results": [
{
"status": "MODIFIED",
"originalRow": { "transcript": "Hello world", ... },
"reworkedRow": { "transcript": "Hello there", ... },
"notes": "transcript changed"
},
{
"status": "SPLIT",
"originalRow": { "transcript": "Good morning everyone", ... },
"reworkedRows": [ { "transcript": "Good morning" }, { "transcript": "everyone" } ],
"notes": "split into 2 rows"
}
],
"scores": { "overallCER": 0.12, "overallWER": 0.18, "SegER": 0.33, "transcriptCER": 0.12, "transcriptWER": 0.18, "SER": 0.05, "transcriptSER": 0.04, "SACR": null },
"composite": { "grade": 3.8, "label": "Good", "percent": "12.3" },
"meta": { "originalRows": 2, "reworkedRows": 3, "headers": [...] }
}
}
Diff statuses
| Status | Meaning |
|---|---|
| UNCHANGED | Row is identical in both versions. |
| MODIFIED | Row exists in both versions but content changed. |
| ADDED | Row is only present in the reworked version. |
| DELETED | Row is only present in the original version. |
| SPLIT | One original row was divided into two or more reworked rows. |
| MERGED | Two or more original rows were combined into one reworked row. |
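Once a response arrives, the statuses above are usually the first thing to summarise. A minimal sketch, assuming only the `data.results[].status` field shown in the response envelope; `count_statuses` and `structural_changes` are hypothetical helper names.

```python
from collections import Counter

# Statuses that change the row structure rather than row content.
STRUCTURAL = {"ADDED", "DELETED", "SPLIT", "MERGED"}


def count_statuses(results):
    """Return a Counter mapping each diff status to its number of results."""
    return Counter(item["status"] for item in results)


def structural_changes(results):
    """Return only the results whose status alters the row structure."""
    return [item for item in results if item["status"] in STRUCTURAL]


# Trimmed version of the sample response shown in this reference.
results = [
    {"status": "MODIFIED", "notes": "transcript changed"},
    {"status": "SPLIT", "notes": "split into 2 rows"},
]
counts = count_statuses(results)
print(counts["MODIFIED"], counts["SPLIT"], counts["DELETED"])  # 1 1 0
```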
Scores
| Name | Type | Description |
|---|---|---|
| overallCER | number | Character Error Rate across all columns (0–1, lower is better). |
| overallWER | number | Word Error Rate across all columns (0–1). |
| SegER | number | Segmentation Error Rate: boundary events (splits, merges, added and deleted rows) / expected segment count (0–1, lower is better). |
| transcriptCER | number | CER computed on the transcript column only. |
| transcriptWER | number | WER computed on the transcript column only. |
| SER | number | Sentence Error Rate: MODIFIED rows / (UNCHANGED + MODIFIED). Fraction of comparable rows with any edit (0–1). |
| transcriptSER | number | SER computed on sentences within the transcript column text. |
| SACR | number | Speaker Attribution Change Rate: speaker-changed rows / MODIFIED rows. null when no speaker column is detected. |
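The two ratio definitions in the table, SER and SACR, can be reproduced by hand as a sanity check. This is a worked sketch of the stated formulas only, not the engine's implementation; the function names are hypothetical, and returning `None` for an empty denominator is an assumption, since the reference does not say what the API returns in that case.

```python
from typing import Optional


def sentence_error_rate(unchanged: int, modified: int) -> Optional[float]:
    """SER = MODIFIED rows / (UNCHANGED + MODIFIED) rows."""
    comparable = unchanged + modified
    if comparable == 0:
        return None  # assumption: empty-denominator behaviour is undocumented
    return modified / comparable


def speaker_attribution_change_rate(speaker_changed: int, modified: int) -> Optional[float]:
    """SACR = speaker-changed rows / MODIFIED rows."""
    if modified == 0:
        return None  # assumption: empty-denominator behaviour is undocumented
    return speaker_changed / modified


# A batch with 8 unchanged rows and 2 modified rows, one of which changed speaker.
print(sentence_error_rate(unchanged=8, modified=2))                    # 0.2
print(speaker_attribution_change_rate(speaker_changed=1, modified=2))  # 0.5
```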
Composite grade
| Name | Type | Description |
|---|---|---|
| grade | number | Numeric score (1.0–5.0, higher is better) averaged across enabled metrics. |
| label | string | Human-readable label, one of: "Excellent", "Good", "Acceptable", "Below Average", "Poor", "Unacceptable". |
| percent | string | Average error percentage across the enabled scoring metrics. |
| enabledMetrics | string[] | Array of metric names that contributed to this composite (e.g. ["CER", "Transcript CER", "WER", "Transcript WER", "SegER", "SER", "Transcript SER"]). Empty when all metrics are disabled. |
Response meta
| Name | Type | Description |
|---|---|---|
| originalRows | number | Number of rows in the original array. |
| reworkedRows | number | Number of rows in the reworked array. |
| headers | string[] | Column header names used for this diff. |
Config Options
Pass a config object in the request body to override algorithm defaults. All fields are optional.
| Name | Type | Default | Description |
|---|---|---|---|
| simpleMode | boolean | false | Disable split and merge detection. Pure row-by-row diff. |
| enableSplits | boolean | true | Enable split row detection. |
| enableMerges | boolean | true | Enable merge row detection. |
| enableCER | boolean | true | Compute Character Error Rate. |
| enableWER | boolean | true | Compute Word Error Rate. |
| enableSegER | boolean | true | Compute Segmentation Error Rate (splits, merges, boundary events). |
| enableSER | boolean | true | Compute Sentence Error Rate. |
| stripDiacritics | boolean | true | Normalise Arabic/accented characters before comparison. |
| positionalMode | boolean | false | Compare rows strictly by position, skipping alignment. |
| ignoreColNames | string[] | [] | Column names excluded from MODIFIED detection. |
| enableInlineDiff | boolean | true | Include transcriptDiff on MODIFIED rows. Set false to skip char-level diff and reduce response size. |
| structuralTransforms | TransformRule[] | [] | Pre-comparison find/replace rules applied to both sides before similarity scoring (max 20 rules). |
| enableTranscriptCER | boolean | true | Compute CER restricted to the transcript column only. Independent of enableCER. |
| enableTranscriptWER | boolean | true | Compute WER restricted to the transcript column only. Independent of enableWER. |
| enableTranscriptSER | boolean | true | Transcript-column sentence-level SER; counts changed sentences across MODIFIED, SPLIT, and MERGED rows. |
| enableSACR | boolean | true | Compute Speaker Attribution Change Rate (speaker-changed MODIFIED rows / total MODIFIED rows). Auto-detects speaker column; returns null when none found. |
| speakerColName | string | auto-detect | Override auto-detection of the speaker column. Case-insensitive match (e.g. "spk_id"). |
| enableComposite | boolean | true | Compute composite quality grade (1–5 average across enabled metrics). |
| cerInComposite | boolean | true | Include overallCER in composite grade. CER is still computed and returned when false. |
| werInComposite | boolean | true | Include overallWER in composite grade. WER is still computed and returned when false. |
| segerInComposite | boolean | true | Include SegER in composite grade. SegER is still computed and returned when false. |
| serInComposite | boolean | true | Include SER in composite grade. SER is still computed and returned when false. |
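Since every option is optional, a config object only needs the keys you want to override. The sketch below shows one way to catch misspelled option names and wrong types before sending; `make_config` is a hypothetical helper, and the shape of a TransformRule is not described in this reference, so rules are passed through unchecked apart from the documented 20-rule cap.

```python
# Boolean options listed in the Config Options table.
BOOLEAN_OPTIONS = {
    "simpleMode", "enableSplits", "enableMerges", "enableCER", "enableWER",
    "enableSegER", "enableSER", "stripDiacritics", "positionalMode",
    "enableInlineDiff", "enableTranscriptCER", "enableTranscriptWER",
    "enableTranscriptSER", "enableSACR", "enableComposite",
    "cerInComposite", "werInComposite", "segerInComposite", "serInComposite",
}
MAX_TRANSFORM_RULES = 20  # documented cap on structuralTransforms


def make_config(**overrides):
    """Return a config dict containing only validated overrides."""
    config = {}
    for key, value in overrides.items():
        if key in BOOLEAN_OPTIONS:
            if not isinstance(value, bool):
                raise TypeError(f"{key} must be a boolean")
        elif key == "ignoreColNames":
            if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
                raise TypeError("ignoreColNames must be a list of strings")
        elif key == "speakerColName":
            if not isinstance(value, str):
                raise TypeError("speakerColName must be a string")
        elif key == "structuralTransforms":
            if not isinstance(value, list):
                raise TypeError("structuralTransforms must be a list")
            if len(value) > MAX_TRANSFORM_RULES:
                raise ValueError(f"at most {MAX_TRANSFORM_RULES} transform rules are allowed")
        else:
            raise KeyError(f"unknown config option: {key}")
        config[key] = value
    return config


# Row-by-row diff that ignores file_name changes and keeps CER out of the grade.
config = make_config(simpleMode=True, ignoreColNames=["file_name"], cerInComposite=False)
```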
Column Mapping
When original / reworked are 2-D arrays (arrays of arrays) instead of objects, supply headers and/or columnMapping to tell the engine which index carries each field.
{
"original": [[0, 1, "Alice", "Hello world"]],
"headers": ["start_time", "end_time", "speaker", "transcript"],
"columnMapping": { "transcript": 3, "speaker": 2, "start_time": 0, "end_time": 1 }
}
| Name | Type | Description |
|---|---|---|
| transcript* | integer | 0-based column index of the transcript field. |
| speaker | integer | 0-based column index of the speaker field. |
| start_time | integer | 0-based column index of start time. |
| end_time | integer | 0-based column index of end time. |
| nse | integer | 0-based column index of non-speech events. |
| extraCols | integer[] | Additional column indices to include (max 20). |
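If your data starts out as row objects but you want to send the more compact 2-D form, the conversion is mechanical. A minimal sketch, assuming every row carries every header field; `to_2d_payload` is a hypothetical helper, and it only fills the four mapping keys that share a name with a row field.

```python
MAPPED_FIELDS = ("transcript", "speaker", "start_time", "end_time")


def to_2d_payload(original, reworked, headers):
    """Convert row objects into 2-D arrays plus headers and columnMapping."""
    if "transcript" not in headers:
        raise ValueError('headers must include "transcript"')

    def to_rows(rows):
        # One inner list per row, ordered by headers; raises KeyError if a field is absent.
        return [[row[h] for h in headers] for row in rows]

    column_mapping = {f: headers.index(f) for f in MAPPED_FIELDS if f in headers}
    return {
        "original": to_rows(original),
        "reworked": to_rows(reworked),
        "headers": list(headers),
        "columnMapping": column_mapping,
    }


payload = to_2d_payload(
    [{"start_time": 0, "end_time": 1, "speaker": "Alice", "transcript": "Hello world"}],
    [{"start_time": 0, "end_time": 1, "speaker": "Alice", "transcript": "Hello there"}],
    ["start_time", "end_time", "speaker", "transcript"],
)
print(payload["original"])  # [[0, 1, 'Alice', 'Hello world']]
```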
Error Reference
All errors use a uniform envelope:
{
"status": "error",
"requestId": "550e8400-...",
"timestamp": "2026-04-08T21:00:00.000Z",
"error": {
"code": "VALIDATION_ERROR",
"message": "Validation failed",
"details": [{ "field": "original", "message": "\"original\" is required" }]
}
}
| HTTP | Code | Cause |
|---|---|---|
| 400 | BAD_REQUEST | Malformed JSON body |
| 401 | UNAUTHORIZED | Missing or invalid x-api-key header |
| 404 | NOT_FOUND | Unknown endpoint |
| 413 | PAYLOAD_TOO_LARGE | Request body exceeds 5 MB |
| 422 | VALIDATION_ERROR | Body failed schema validation (see details array) |
| 429 | RATE_LIMIT_EXCEEDED | Burst or window rate limit hit |
| 500 | INTERNAL_SERVER_ERROR | Unexpected server or engine error |
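Because success and error responses from /v1/diff share a predictable envelope, a client can turn them into either data or an exception in one place. The sketch below is an assumption-laden illustration: `DiffApiError` and `unwrap` are hypothetical names, and treating RATE_LIMIT_EXCEEDED and INTERNAL_SERVER_ERROR as retryable is a client-side judgment call, not something the API prescribes.

```python
class DiffApiError(Exception):
    """Raised when a /v1/diff response carries the error envelope."""

    # Assumption: only these codes are worth retrying automatically.
    RETRYABLE_CODES = {"RATE_LIMIT_EXCEEDED", "INTERNAL_SERVER_ERROR"}

    def __init__(self, http_status, code, message, details=None, request_id=None):
        super().__init__(f"{http_status} {code}: {message}")
        self.http_status = http_status
        self.code = code
        self.details = details or []
        self.request_id = request_id

    @property
    def retryable(self):
        return self.code in self.RETRYABLE_CODES


def unwrap(http_status, payload):
    """Return payload["data"] on success, otherwise raise DiffApiError."""
    if payload.get("status") == "success":
        return payload["data"]
    error = payload.get("error", {})
    raise DiffApiError(
        http_status,
        error.get("code", "UNKNOWN"),
        error.get("message", ""),
        error.get("details"),
        payload.get("requestId"),
    )
```

Keeping the `details` array on the exception lets callers report which field failed validation on a 422.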
Request Tracing
Provide an x-request-id header to correlate requests across your system. Alphanumeric characters, hyphens, and underscores only, max 64 characters. The value is echoed back in the response headers.
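The character and length rules for x-request-id are easy to check before a request goes out. A minimal sketch; `is_valid_request_id` is a hypothetical helper, and rejecting the empty string is an assumption, since only the maximum length is documented.

```python
import re

# Letters, digits, hyphens, and underscores; 1 to 64 characters.
REQUEST_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_request_id(value: str) -> bool:
    """Return True when value satisfies the documented x-request-id rules."""
    return REQUEST_ID_RE.fullmatch(value) is not None


print(is_valid_request_id("job-2026-01-batch-3"))  # True
print(is_valid_request_id("job 2026"))             # False (contains a space)
```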
curl -H "x-request-id: job-2026-01-batch-3" \
-H "x-api-key: YOUR_KEY" \
-X POST https://structural-diff-engine.onrender.com/v1/diff -d '{...}'
Get API Access
The API is available to agencies and teams during the testing phase. Keys are provisioned individually; reach out to receive your key and start integrating.