
Config Parameters

Know exactly which flag to flip and why.

The default config works well for most transcripts. These parameters exist to handle specific annotation workflows: Arabic QA, positional-only comparison, metadata column exclusion, structural detection control, and metric selection. Most sections below show the exact input/output difference a flag produces.

When to customize the config

Start with no config. Run a diff and inspect the results. Only reach for a config flag when you see a specific problem:

Flag · Reach for it when
stripDiacritics: false · Diacritical accuracy is itself a QA criterion in Arabic transcripts (the default, true, already keeps diacritic-only edits from inflating the MODIFIED count)
simpleMode · Pure content QA where you know the annotator made no structural changes
ignoreColNames · Metadata columns (confidence score, category) differ between QA layers but aren't the comparison target
positionalMode · Debugging unexpected alignments, or processing very large uniform datasets
enableSplits: false · Project guidelines prohibit splits at this annotation layer
enableInlineDiff: false · Large batches where only statuses and scores are needed; suppresses transcript diff computation for speed
enableCER / enableWER: false · Speed optimization for large batches: skip Levenshtein computation when only structural or sentence-level metrics are needed
enableComposite: false · Suppress the aggregate grade when your system consumes individual metric values directly
structuralTransforms · Rows have ID prefixes, URLs, or phone formats that vary between layers but aren't part of the transcript content

simpleMode

By default the engine runs an 8-pass alignment algorithm that matches rows by similarity across the full transcript, even if they moved positions. simpleMode disables this: row 0 is compared to row 0, row 1 to row 1, strictly by position.

Default (simpleMode: false): the engine detects that one long segment was split into two and labels it SPLIT.

Request

{
  "original": [
    { "speaker": "Candidate", "words": "For new users we relied on content-based filtering. For new items we used metadata clustering to find similar items." }
  ],
  "reworked": [
    { "speaker": "Candidate", "words": "For new users, we relied on content-based filtering." },
    { "speaker": "Candidate", "words": "For new items, we used metadata clustering to find similar items." }
  ]
}

Config

/* config: {} (default) */

API Result

json
{
  "results": [
    {
      "status": "SPLIT",
      "notes": "split into 2 rows",
      "originalRow": { "words": "For new users we relied on content-based filtering..." },
      "reworkedRows": [
        { "words": "For new users, we relied on content-based filtering." },
        { "words": "For new items, we used metadata clustering..." }
      ]
    }
  ]
}

With simpleMode: true, the engine compares row 0 to row 0 (finds a mismatch → MODIFIED) and sees an extra row in reworked (→ ADDED). The structural intent is lost, but every character change is visible.

json
{
  "results": [
    {
      "status": "MODIFIED",
      "notes": "words changed",
      "snapData": ["Candidate", "For new users we relied on content-based filtering..."],
      "currData": ["Candidate", "For new users, we relied on content-based filtering."],
      "transcriptDiff": [
        { "type": "EQUAL",  "text": "For new users" },
        { "type": "INSERT", "text": "," },
        { "type": "EQUAL",  "text": " we relied on content-based filtering." },
        { "type": "DELETE", "text": " For new items we used metadata clustering to find similar items." }
      ]
    },
    {
      "status": "ADDED",
      "notes": "new row in reworked",
      "currData": ["Candidate", "For new items, we used metadata clustering..."]
    }
  ]
}

Use when you're confident the annotator made zero structural changes — only text corrections and punctuation. Also useful when you want raw character diffs without any structural interpretation.

simpleMode is faster on very large datasets because it skips alignment. The trade-off is inflated MODIFIED/ADDED/DELETED counts in cases where SPLIT/MERGED labels would be more accurate.

enableSplits / enableMerges

Finer-grained alternatives to simpleMode. Instead of disabling all structural detection, you can disable only one type.

SPLIT

enableSplits: false — SPLIT candidates are instead emitted as MODIFIED (truncated match) + ADDED (leftover rows). Use when your annotation guidelines at this layer prohibit splits, so surfacing them as individual changes is more actionable.

MERGED

enableMerges: false — MERGE candidates become MODIFIED (first original row) + DELETED (absorbed originals). Use when merges are not permitted at this layer and you want each deleted row flagged explicitly.

json
{
  "config": {
    "enableSplits": false,
    "enableMerges": true
  }
}

These flags are most useful in multi-layer QA pipelines where each layer has its own permitted operations. Disabling an operation you don't expect to see makes any unexpected occurrence surface as distinct ADDED/DELETED flags instead of being grouped into a single SPLIT or MERGED result.
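The fallback described above can be pictured as a small post-processing step over a detected SPLIT or MERGE candidate. This is a minimal sketch under stated assumptions, not the engine's code: the Emitted shape and the emitSplit / emitMerge helpers are hypothetical, and pairing the original row with the first reworked row (the "truncated match") is an assumption. The merge side follows the description of MODIFIED on the first original row.

```typescript
// Hypothetical result shape, reduced to what this sketch needs.
type Status = "SPLIT" | "MERGED" | "MODIFIED" | "ADDED" | "DELETED";

interface Emitted {
  status: Status;
  originalIndex?: number; // row index on the original side
  reworkedIndex?: number; // row index on the reworked side
}

// Split candidate: one original row matched to two or more reworked rows.
function emitSplit(orig: number, reworked: number[], enableSplits: boolean): Emitted[] {
  if (enableSplits) return [{ status: "SPLIT", originalIndex: orig }];
  // Assumed pairing: the first reworked row carries the MODIFIED match,
  // every leftover reworked row is reported as ADDED.
  return [
    { status: "MODIFIED", originalIndex: orig, reworkedIndex: reworked[0] },
    ...reworked.slice(1).map((r): Emitted => ({ status: "ADDED", reworkedIndex: r })),
  ];
}

// Merge candidate: two or more original rows matched to one reworked row.
function emitMerge(originals: number[], rew: number, enableMerges: boolean): Emitted[] {
  if (enableMerges) return [{ status: "MERGED", reworkedIndex: rew }];
  // The first original row is MODIFIED; each absorbed original row is DELETED.
  return [
    { status: "MODIFIED", originalIndex: originals[0], reworkedIndex: rew },
    ...originals.slice(1).map((o): Emitted => ({ status: "DELETED", originalIndex: o })),
  ];
}

// Same flags as the config example above: splits are unpacked, merges stay grouped.
const config = { enableSplits: false, enableMerges: true };
console.log(emitSplit(0, [0, 1], config.enableSplits).map((e) => e.status).join(", ")); // MODIFIED, ADDED
console.log(emitMerge([2, 3], 2, config.enableMerges).map((e) => e.status).join(", ")); // MERGED
```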

stripDiacritics

Before comparison, the engine normalises Arabic and accented characters by stripping diacritical marks. For Arabic this includes harakat (short vowels: fathah, dammah, kasrah), tanwin, shadda, sukun, and hamza variants (U+064B–U+065F, U+0670). For Latin text it strips combining accent characters (U+0300–U+036F). This flag is ON by default.

Common Arabic QA scenario: an annotator normalises the text per written Arabic style guides (adding harakat, normalising hamza). With the default (stripDiacritics: true), only lexical and segmentation differences are counted. Override to false when diacritical accuracy is itself a QA criterion.
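A minimal sketch of this normalisation, assuming the engine decomposes text to Unicode NFD before removing the listed code-point ranges. The page does not say so, but without a decomposition step a precomposed letter such as أ (U+0623) would not reduce to bare alef. The function name is hypothetical.

```typescript
// Marks listed on this page: Arabic U+064B–U+065F (harakat, tanwin, shadda,
// sukun, combining hamza) and U+0670 (superscript alef), plus Latin combining
// accents U+0300–U+036F.
const DIACRITIC_MARKS = /[\u064B-\u065F\u0670\u0300-\u036F]/g;

// Assumption: NFD decomposition runs first, so precomposed letters expose their
// marks, e.g. أ (U+0623) becomes ا + U+0654 and é (U+00E9) becomes e + U+0301.
// NFC afterwards recomposes whatever is left.
function stripDiacritics(text: string): string {
  return text.normalize("NFD").replace(DIACRITIC_MARKS, "").normalize("NFC");
}

// Both sides are normalised before comparison, so diacritic-only edits vanish.
const original = "مرحبا بكم في نشرة الاخبار";
const reworked = "مرحباً بكم في نشرة الأخبار";
console.log(stripDiacritics(original) === stripDiacritics(reworked)); // true
```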

Default behavior (stripDiacritics: true, no config needed): the row is UNCHANGED. مرحبا → مرحباً and الاخبار → الأخبار differ only in diacritical marks, which are stripped before comparison, so the stripped forms are identical.

Request

{
  "original": [{ "speaker": "المذيع", "transcript": "مرحبا بكم في نشرة الاخبار" }],
  "reworked": [{ "speaker": "المذيع", "transcript": "مرحباً بكم في نشرة الأخبار" }]
}

Config

/* config: {} (default — stripDiacritics: true) */

API Result

json
{ "status": "UNCHANGED", "notes": "high similarity match (diacritics stripped)" }

With stripDiacritics: false (override): the row is MODIFIED because the ً in مرحباً and the hamza in الأخبار are no longer stripped, so the raw character differences are flagged.

json
{ "status": "MODIFIED", "notes": "transcript changed",
  "transcriptDiff": [
    { "type": "EQUAL",  "text": "مرحب" },
    { "type": "DELETE", "text": "ا" },
    { "type": "INSERT", "text": "اً" },
    { "type": "EQUAL",  "text": " بكم في نشرة ال" },
    { "type": "DELETE", "text": "ا" },
    { "type": "INSERT", "text": "أ" },
    { "type": "EQUAL",  "text": "خبار" }
  ]
}
json
{ "config": { "stripDiacritics": false } }

The default (true) works for most Arabic transcript QA. Override with stripDiacritics: false only when you are explicitly verifying that an annotator correctly added or removed diacritical marks — i.e., when diacritical precision is a tracked quality criterion.

positionalMode

Skips the similarity-based alignment algorithm entirely. Each original row at index N is compared to the reworked row at index N. If the arrays are different lengths, extra rows are ADDED or DELETED.

Default: if an annotator corrected a sentence and it moved from position 4 to position 6, the engine will still match them (MODIFIED). With positionalMode, row 4 in original is compared to row 4 in reworked — which may be a completely different sentence — producing a confusing MODIFIED with a large diff.

positionalMode produces misleading results when rows have been reordered. Only use it when you can guarantee the annotator did not add, remove, or reorder any rows.

Use for debugging: run positionalMode and compare it with the default results to understand which rows the alignment matched. Also useful for very uniform datasets (e.g., word-by-word alignment data) where position alone defines the correct pairing.

json
{ "config": { "positionalMode": true } }
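Positional pairing is simple enough to sketch in a few lines. This hypothetical positionalDiff assumes rows are flat string records compared by exact cell equality; the real comparison is also subject to flags such as ignoreColNames and stripDiacritics. Run on the split example from the simpleMode section, it reproduces the MODIFIED + ADDED outcome.

```typescript
// Rows as plain string records; the engine's real row model is richer.
type Row = Record<string, string>;
type PositionalStatus = "UNCHANGED" | "MODIFIED" | "ADDED" | "DELETED";

// Exact cell-by-cell equality over the union of both rows' columns.
function sameRow(a: Row, b: Row): boolean {
  return Object.keys(a).concat(Object.keys(b)).every((k) => a[k] === b[k]);
}

// Row N is compared with row N; surplus rows on either side become DELETED/ADDED.
function positionalDiff(original: Row[], reworked: Row[]): PositionalStatus[] {
  const statuses: PositionalStatus[] = [];
  const n = Math.max(original.length, reworked.length);
  for (let i = 0; i < n; i++) {
    if (i >= reworked.length) statuses.push("DELETED");
    else if (i >= original.length) statuses.push("ADDED");
    else statuses.push(sameRow(original[i], reworked[i]) ? "UNCHANGED" : "MODIFIED");
  }
  return statuses;
}

// The simpleMode example: one long row split into two.
const orig: Row[] = [
  { speaker: "Candidate", words: "For new users we relied on content-based filtering. For new items we used metadata clustering to find similar items." },
];
const rew: Row[] = [
  { speaker: "Candidate", words: "For new users, we relied on content-based filtering." },
  { speaker: "Candidate", words: "For new items, we used metadata clustering to find similar items." },
];
console.log(positionalDiff(orig, rew).join(", ")); // MODIFIED, ADDED
```

Swapping two otherwise identical rows yields MODIFIED at both positions, which is exactly the misleading result the warning above describes.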

ignoreColNames

An array of column names to exclude from MODIFIED detection. A row is only MODIFIED if a non-ignored column changed. The ignored columns are still included in the response (snapData / currData) but do not trigger MODIFIED status.

Scenario: your data has a confidence column set by the annotation tool. QA Layer 1 might record confidence: 0.88 while QA Layer 2 records confidence: 0.91 for the same utterance. Without ignoreColNames, every such row is MODIFIED even if the transcript is identical. With ignoreColNames: ["confidence"], those rows are UNCHANGED as expected.

Without ignoreColNames

Request

{
  "original": [
    { "transcript": "The patient reports mild chest pain.", "speaker": "Doctor", "confidence": 0.88, "category": "symptom" }
  ],
  "reworked": [
    { "transcript": "The patient reports mild chest pain.", "speaker": "Doctor", "confidence": 0.94, "category": "complaint" }
  ]
}

Config

/* config: {} */

API Result

json
{ "status": "MODIFIED", "notes": "confidence, category changed" }

With ignoreColNames

json
// request: { "config": { "ignoreColNames": ["confidence", "category"] } }
{ "status": "UNCHANGED", "notes": "exact match (after ignoring confidence, category)" }

Use whenever your schema includes metadata columns that change independently of transcript content: confidence scores, reviewer IDs, batch numbers, internal category tags, auto-generated timestamps.
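The column filter can be sketched as follows, using the medical example above. changedColumns and rowStatus are hypothetical helpers; exact, case-sensitive column-name matching is an assumption, and the real MODIFIED check also applies the other comparison settings.

```typescript
type Cell = string | number;
type Row = Record<string, Cell>;

// Columns whose values differ, skipping ignored ones (case-sensitive match assumed).
function changedColumns(a: Row, b: Row, ignoreColNames: string[] = []): string[] {
  const ignored = new Set(ignoreColNames);
  const columns = Array.from(new Set(Object.keys(a).concat(Object.keys(b))));
  return columns.filter((c) => !ignored.has(c) && a[c] !== b[c]);
}

// A row is MODIFIED only when a non-ignored column changed.
function rowStatus(a: Row, b: Row, ignoreColNames: string[] = []): "UNCHANGED" | "MODIFIED" {
  return changedColumns(a, b, ignoreColNames).length > 0 ? "MODIFIED" : "UNCHANGED";
}

const before: Row = { transcript: "The patient reports mild chest pain.", speaker: "Doctor", confidence: 0.88, category: "symptom" };
const after: Row = { transcript: "The patient reports mild chest pain.", speaker: "Doctor", confidence: 0.94, category: "complaint" };
console.log(changedColumns(before, after).join(", ")); // confidence, category
console.log(rowStatus(before, after, ["confidence", "category"])); // UNCHANGED
```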

enableInlineDiff

Controls whether the engine computes a character-level inline diff for MODIFIED rows. When enabled (default), each MODIFIED row in the response includes a transcriptDiff array that you can use to render highlighted changes in your review UI. Disabling it skips the diff computation entirely.

With enableInlineDiff: false, MODIFIED rows still appear in results (their status and notes are still reported), but the transcriptDiff field is absent. Use this when you only need status counts and scores and want to reduce response payload size.

json
{ "config": { "enableInlineDiff": false } }

Each transcriptDiff segment has the shape { type: "EQUAL" | "INSERT" | "DELETE", text: string }. Reconstruct the original by joining all non-INSERT spans; reconstruct the reworked by joining all non-DELETE spans. Note: type values are UPPERCASE.

json
// transcriptDiff format — type is UPPERCASE, field is "text"
[
  { "type": "EQUAL",  "text": "Hello " },
  { "type": "DELETE", "text": "world" },
  { "type": "INSERT", "text": "there" }
]
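Reconstructing both sides from the segments, as described above, takes two filters. A minimal sketch; the helper names and the [-deleted-] / {+inserted+} markers (borrowed from git's word-diff output) are illustrative choices, not part of the API.

```typescript
// Segment shape as documented above: uppercase type, "text" field.
interface DiffSegment {
  type: "EQUAL" | "INSERT" | "DELETE";
  text: string;
}

// Original side = every span except insertions.
function reconstructOriginal(diff: DiffSegment[]): string {
  return diff.filter((s) => s.type !== "INSERT").map((s) => s.text).join("");
}

// Reworked side = every span except deletions.
function reconstructReworked(diff: DiffSegment[]): string {
  return diff.filter((s) => s.type !== "DELETE").map((s) => s.text).join("");
}

// Plain-text rendering for a review UI, using git word-diff style markers.
function renderInline(diff: DiffSegment[]): string {
  return diff
    .map((s) => (s.type === "DELETE" ? `[-${s.text}-]` : s.type === "INSERT" ? `{+${s.text}+}` : s.text))
    .join("");
}

const example: DiffSegment[] = [
  { type: "EQUAL", text: "Hello " },
  { type: "DELETE", text: "world" },
  { type: "INSERT", text: "there" },
];
console.log(reconstructOriginal(example)); // Hello world
console.log(reconstructReworked(example)); // Hello there
console.log(renderInline(example));        // Hello [-world-]{+there+}
```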

Disable (enableInlineDiff: false) when processing large batches where you only need CER/WER/SegER/SER scores and status counts, not the per-character diff. This reduces both server CPU and network payload. Re-enable for interactive review UIs where editors need to see exactly what changed.

The diff uses LCS (Longest Common Subsequence). For very long segments (combined original + reworked length > CHAR_DIFF_LIMIT), it automatically falls back from character-level to word-level tokens — still returned as the same array format.
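The page names LCS but not the exact implementation. The sketch below is a textbook dynamic-programming LCS diff that returns segments in the documented shape and switches to word tokens once the combined length exceeds the limit. Character tokens as UTF-16 code units, whitespace-preserving word tokens, and the tie-breaking rule are all assumptions, so segment boundaries may differ from what the API returns.

```typescript
interface DiffSegment {
  type: "EQUAL" | "INSERT" | "DELETE";
  text: string;
}

// Character tokens (UTF-16 code units), or word tokens that keep whitespace runs.
function tokenize(s: string, wordLevel: boolean): string[] {
  return wordLevel ? s.split(/(\s+)/).filter((t) => t.length > 0) : s.split("");
}

function lcsDiff(a: string, b: string, charDiffLimit = 1500): DiffSegment[] {
  const wordLevel = a.length + b.length > charDiffLimit; // fallback rule from the text
  const x = tokenize(a, wordLevel);
  const y = tokenize(b, wordLevel);

  // dp[i][j] = LCS length of x[i..] and y[j..]; an (m+1) x (n+1) table.
  const dp: number[][] = [];
  for (let i = 0; i <= x.length; i++) dp.push(new Array<number>(y.length + 1).fill(0));
  for (let i = x.length - 1; i >= 0; i--) {
    for (let j = y.length - 1; j >= 0; j--) {
      dp[i][j] = x[i] === y[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }

  // Walk the table, coalescing consecutive tokens of the same type.
  const out: DiffSegment[] = [];
  const push = (type: DiffSegment["type"], text: string): void => {
    const last = out[out.length - 1];
    if (last && last.type === type) last.text += text;
    else out.push({ type, text });
  };
  let i = 0;
  let j = 0;
  while (i < x.length && j < y.length) {
    if (x[i] === y[j]) { push("EQUAL", x[i]); i++; j++; }
    else if (dp[i + 1][j] >= dp[i][j + 1]) { push("DELETE", x[i]); i++; }
    else { push("INSERT", y[j]); j++; }
  }
  while (i < x.length) push("DELETE", x[i++]);
  while (j < y.length) push("INSERT", y[j++]);
  return out;
}

// A tiny limit forces word-level tokens and reproduces the format example above.
console.log(JSON.stringify(lcsDiff("Hello world", "Hello there", 5)));
// [{"type":"EQUAL","text":"Hello "},{"type":"DELETE","text":"world"},{"type":"INSERT","text":"there"}]
```

The table size is why a cap like CHAR_DIFF_LIMIT matters: at 1500 combined characters the worst case is roughly 750 × 750 cells, while word tokens shrink both dimensions.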

Scoring flags

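The boolean flags below control which metrics the engine computes, which computed metrics feed the composite grade, and whether a composite grade is returned. All default to true. Disabling an enable* flag skips that metric's entire computation loop, and its field is null in the response, not 0; setting an *InComposite flag to false only removes an already-computed metric from the composite average.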

Flag · Default · What it measures · When to disable
enableCER · true
Character Error Rate across all columns (overallCER). Computed via Levenshtein on serialized row strings. Use enableTranscriptCER to control the transcript-column-only variant independently.
When you only need structural or sentence-level metrics. Levenshtein is O(m×n), so skipping CER + WER on large batches (5 000+ long segments) produces a measurable latency reduction.
enableTranscriptCER · true
Computes transcriptCER, the CER restricted to the transcript column only. Independent of enableCER: you can disable enableCER (suppress overallCER) while keeping transcriptCER, or vice versa.
When you need overallCER but not the transcript-column breakdown, or vice versa.
enableWER · true
Word Error Rate across all columns (overallWER). Tokenizes on whitespace after stripping punctuation. Use enableTranscriptWER to control the transcript-column-only variant independently.
Same conditions as enableCER. Typically disabled together with it.
enableTranscriptWER · true
Computes transcriptWER, the WER restricted to the transcript column only. Independent of enableWER.
When you need overallWER but not the transcript-column breakdown, or vice versa.
enableSegER · true
Segmentation Error Rate: boundary events (splits + merges + added + deleted rows) / expected segment count. The structural quality signal, independent of text content.
Only when your pipeline cares purely about lexical changes and structural events are irrelevant (uncommon in transcript QA).
enableSER · true
Sentence Error Rate: MODIFIED rows / (UNCHANGED + MODIFIED rows), i.e. the fraction of comparable rows with any edit. Returns null, not 0, when the denominator is 0 (all rows are structural events with no comparable pairs).
When you only need character/word error rates and a sentence-level signal adds no value.
enableTranscriptSER · true
Sentence-level SER within the transcript column text. Splits transcript content into sentences (on . ! ? ؟ and newlines) and counts changed sentences across MODIFIED, MERGED, and SPLIT rows. Analogous to transcriptWER but at the sentence level rather than the word level: it measures sentence churn within the document, not row-level edits.
When row-level SER (enableSER) is sufficient and paragraph-level sentence granularity adds no diagnostic value.
enableSACR · true
Speaker Attribution Change Rate: rows where the speaker column changed / MODIFIED rows. Auto-detects columns named speaker, talker, or spk. Returns null automatically when no speaker column is found, so enabling this on speakerless datasets costs nothing.
Only when you want to explicitly suppress the SACR field from the response regardless of whether a speaker column is present.
enableComposite · true
Averages the per-metric grades (1–5 scale) of enabled metrics that returned a non-null value. CER, Transcript CER, WER, Transcript WER, SegER, SER, and Transcript SER contribute; SACR is excluded because many datasets lack a speaker column, making SACR-inclusive composites incomparable across batches.
When your system reads individual metric fields directly and does not display an aggregate grade.
cerInComposite · true
Include CER in the composite grade calculation. When false, CER is still computed and returned in the response but does not affect the composite score.
When CER is a diagnostic-only metric and you want the composite to reflect the remaining metrics only.
werInComposite · true
Include WER in the composite grade calculation. When false, WER is still computed but excluded from the composite average.
When WER is tracked for reference but should not penalize the composite (e.g., very short segments where word count is unreliable).
segerInComposite · true
Include SegER in the composite grade calculation. When false, SegER is computed but excluded from the composite average.
When segmentation quality is a separate concern reviewed independently and should not drag down the overall composite grade.
serInComposite · true
Include SER in the composite grade calculation. When false, SER is still computed but used as a standalone diagnostic without affecting the composite.
When sentence-level error rate is a diagnostic signal only and you want the remaining metrics to determine the composite.
transcriptCerInComposite · true
Include Transcript CER in the composite grade calculation. When false, transcriptCER is still computed and returned but does not affect the composite score.
When transcript-column CER is a reference metric only and you want the composite to reflect the other metrics.
transcriptWerInComposite · true
Include Transcript WER in the composite grade calculation. When false, transcriptWER is still computed but excluded from the composite average.
When transcript-column WER is tracked for reference but should not influence the composite grade.
transcriptSerInComposite · true
Include Transcript SER in the composite grade calculation. When false, transcriptSER is computed but used as a standalone sentence-level diagnostic without affecting the composite.
When paragraph-level sentence churn is a secondary diagnostic signal and you want the remaining metrics to determine the composite.
json
{
  "config": {
    "enableCER": false,
    "enableWER": false,
    "enableSACR": false
  }
}
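The CER and WER rows above both reduce to a Levenshtein distance over tokens. A minimal sketch for a single string pair, assuming the standard definition (edit distance divided by reference length, with the original as reference) and an assumed punctuation set for WER; how the engine serializes rows and aggregates across a batch is not covered, and the function names are hypothetical.

```typescript
// Classic Levenshtein distance over token arrays: O(m×n) time, O(n) memory.
function levenshtein(ref: string[], hyp: string[]): number {
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// Assumed punctuation set: ASCII marks plus Arabic comma, semicolon, question mark.
const PUNCTUATION = /[.,!?;:"'()\u060C\u061B\u061F]/g;

// Strip punctuation, then tokenize on whitespace (as the WER row describes).
function words(s: string): string[] {
  return s.replace(PUNCTUATION, "").split(/\s+/).filter((w) => w.length > 0);
}

// Returning null for an empty reference is an assumption of this sketch.
function cer(original: string, reworked: string): number | null {
  return original.length === 0 ? null : levenshtein(original.split(""), reworked.split("")) / original.length;
}

function wer(original: string, reworked: string): number | null {
  const ref = words(original);
  return ref.length === 0 ? null : levenshtein(ref, words(reworked)) / ref.length;
}

console.log(cer("kitten", "sitting")); // 0.5
console.log(wer("For new users we relied", "For new users, we relied.")); // 0
```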
The composite is dynamic: disable WER or set werInComposite: false and the average re-normalizes over the remaining contributing metrics. Use cerInComposite, werInComposite, segerInComposite, serInComposite, transcriptCerInComposite, transcriptWerInComposite, and transcriptSerInComposite to track a metric without letting it influence the final grade. The response includes an enabledMetrics array listing exactly which metrics contributed.
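The rate definitions from the table and the re-normalizing composite fit in a few functions. A sketch under stated assumptions: each metric's 1–5 grade is taken as an input because the rate-to-grade mapping is not documented here, metric keys such as "WER" stand in for the werInComposite-style flags, SegER counts one boundary event per SPLIT or MERGED result, and all names are hypothetical.

```typescript
interface StatusCounts {
  UNCHANGED: number;
  MODIFIED: number;
  ADDED: number;
  DELETED: number;
  SPLIT: number;
  MERGED: number;
}

// SER: MODIFIED / (UNCHANGED + MODIFIED); null, not 0, when nothing is comparable.
function ser(c: StatusCounts): number | null {
  const comparable = c.UNCHANGED + c.MODIFIED;
  return comparable === 0 ? null : c.MODIFIED / comparable;
}

// SegER: boundary events / expected segment count. One event per SPLIT or MERGED
// result, and a caller-supplied expected count, are assumptions of this sketch.
function segER(c: StatusCounts, expectedSegments: number): number | null {
  const events = c.SPLIT + c.MERGED + c.ADDED + c.DELETED;
  return expectedSegments === 0 ? null : events / expectedSegments;
}

// Composite: mean of the non-null grades that are opted in; SACR never counts.
function composite(
  grades: Record<string, number | null>,
  inComposite: Record<string, boolean> = {},
): { grade: number | null; enabledMetrics: string[] } {
  const enabledMetrics = Object.keys(grades).filter(
    (m) => m !== "SACR" && grades[m] !== null && inComposite[m] !== false,
  );
  const sum = enabledMetrics.reduce((acc, m) => acc + (grades[m] as number), 0);
  return { grade: enabledMetrics.length > 0 ? sum / enabledMetrics.length : null, enabledMetrics };
}

const counts: StatusCounts = { UNCHANGED: 6, MODIFIED: 2, ADDED: 1, DELETED: 0, SPLIT: 1, MERGED: 0 };
console.log(ser(counts));       // 0.25
console.log(segER(counts, 10)); // 0.2
// Excluding WER re-normalizes the average over CER and SegER.
const result = composite({ CER: 4, WER: 3, SegER: 5, SER: null, SACR: 2 }, { WER: false });
console.log(result.grade, result.enabledMetrics.join(",")); // 4.5 CER,SegER
```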
SACR does not need manual opt-out for speakerless datasets. If no column matches speaker, talker, or spk (case-insensitive), SACR is null regardless of the flag. The flag exists only to suppress the field when you want to exclude speaker data from the response entirely. To override auto-detection and specify the speaker column explicitly, pass speakerColName: "<column-name>" in the config object (case-insensitive match, max 100 characters).
SACR only counts MODIFIED rows — the only case where the engine can directly compare original vs. reworked speaker. If a reworker both splits a segment and relabels the speaker at the same time (e.g., one AI-annotated row becomes two rows with a different speaker), the engine classifies this as MODIFIED + ADDED rather than a single split event. The relabeled ADDED row is invisible to SACR. SACR is reliable for direct speaker corrections; it will under-count when speaker relabeling is bundled with resegmentation.
The engine automatically detects and skips header rows in scoring. If every cell value in a row equals its column name (e.g., the first data row contains "transcript", "start", "end", matching the schema), the row is counted as UNCHANGED but excluded from all metric denominators (CER, WER, SER, transcriptSER). This prevents an included header row from silently deflating error rates when callers pass raw parsed data without slicing off the header.
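Speaker-column auto-detection and the header-row rule are small enough to sketch. detectSpeakerColumn and isHeaderRow are hypothetical names; choosing the first matching column when several match, and comparing header cells by exact string equality, are assumptions.

```typescript
type Row = Record<string, string>;

// Column names SACR auto-detects, matched case-insensitively.
const SPEAKER_NAMES = ["speaker", "talker", "spk"];

// Returns the speaker column, or null (SACR is then null). An explicit
// speakerColName overrides auto-detection; first match wins (assumed).
function detectSpeakerColumn(columns: string[], speakerColName?: string): string | null {
  if (speakerColName !== undefined && speakerColName.length > 100) {
    throw new RangeError("speakerColName is limited to 100 characters");
  }
  const wanted = speakerColName !== undefined ? [speakerColName.toLowerCase()] : SPEAKER_NAMES;
  for (const col of columns) {
    if (wanted.indexOf(col.toLowerCase()) !== -1) return col;
  }
  return null;
}

// Header-row rule: every cell equals its own column name. Such a row is
// reported as UNCHANGED but kept out of metric denominators.
function isHeaderRow(row: Row): boolean {
  const columns = Object.keys(row);
  return columns.length > 0 && columns.every((c) => row[c] === c);
}

console.log(detectSpeakerColumn(["transcript", "Speaker", "start"])); // Speaker
console.log(isHeaderRow({ transcript: "transcript", start: "start", end: "end" })); // true
```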

structuralTransforms

An array of find/replace rules applied to the transcript text BEFORE the similarity scoring algorithm runs. This lets the engine align rows that differ only in predictable, non-content prefixes or formats (e.g., ID tags, URL prefixes, phone number formats).

Each rule: { find: string, replace: string, isRegex: boolean }. Plain string rules do a literal find-replace. Regex rules (isRegex: true) support standard JavaScript regex syntax (case-insensitive). Up to 20 rules per request.

json
{
  "config": {
    "structuralTransforms": [
      { "find": "^ID-\\d+:\\s*", "replace": "", "isRegex": true },
      { "find": "https?://[^\\s]+",  "replace": "[URL]", "isRegex": true }
    ]
  }
}
Transforms apply to SIMILARITY SCORING only — not to the cell data returned in snapData / currData. A row where only the ID prefix changed ("ID-001: Hello" vs "ID-002: Hello") will still show as MODIFIED because the raw transcript content differs. The transforms ensure the rows are correctly ALIGNED (not misidentified as ADDED+DELETED), but the column diff can still flag the prefix change.

Use when your original and reworked data share a common schema but rows include auto-generated IDs, batch prefixes, or formatting that the annotator changed as part of their work. Without transforms, the alignment algorithm treats rows with different prefixes as entirely different — potentially producing false ADDED/DELETED pairs instead of MODIFIED.
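A minimal sketch of how the rules could be applied to the text used for similarity scoring. The case-insensitive regex matching and the 20-rule cap come from the description above; replacing every occurrence (the g flag for regex rules, split/join for literal rules) is an assumption, and applyTransforms is a hypothetical name.

```typescript
interface StructuralTransform {
  find: string;
  replace: string;
  isRegex: boolean;
}

const MAX_TRANSFORMS = 20; // documented per-request limit

// Output feeds similarity scoring only; the snapData / currData cells stay untouched.
function applyTransforms(text: string, rules: StructuralTransform[]): string {
  if (rules.length > MAX_TRANSFORMS) {
    throw new RangeError(`at most ${MAX_TRANSFORMS} structuralTransforms per request`);
  }
  return rules.reduce((acc, rule) => {
    if (rule.find === "") return acc; // nothing to match
    return rule.isRegex
      ? acc.replace(new RegExp(rule.find, "gi"), rule.replace) // case-insensitive; "g" assumed
      : acc.split(rule.find).join(rule.replace); // literal, every occurrence (assumed)
  }, text);
}

// The rules from the config example above.
const rules: StructuralTransform[] = [
  { find: "^ID-\\d+:\\s*", replace: "", isRegex: true },
  { find: "https?://[^\\s]+", replace: "[URL]", isRegex: true },
];
console.log(applyTransforms("ID-001: Hello", rules) === applyTransforms("ID-002: Hello", rules)); // true
console.log(applyTransforms("See https://example.com/a now", rules)); // See [URL] now
```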

Expert similarity & timing thresholds

The first seven parameters below control the matching algorithm's sensitivity; CHAR_DIFF_LIMIT controls when the inline diff falls back to word-level tokens. The defaults are tuned for standard-length transcription segments (5–30 seconds, 10–60 words). Adjust them only when you've looked at the raw similarity scores and know the default thresholds produce wrong matches.

Parameter · Type · Default · Effect · When to adjust
SIM_CONFIDENT
number (0–1)
0.70
Two rows this similar or closer are a definite match — committed in the high-similarity pass.
Raise to require very close text matches before committing. Lower if you have very short utterances that can't achieve high similarity.
SIM_MODERATE
number (0–1)
0.40
Plausible match — accepted when timing also confirms.
Lower if annotators rewrite sentences significantly while keeping the same meaning.
SIM_WEAK
number (0–1)
0.20
Tentative match — only accepted with very strong timing evidence.
Lower to 0.10–0.15 for very short segments (single words, disfluencies) that can't achieve 0.20 similarity.
TIME_EXACT_TOL
number (s)
0.05
Timestamps ≤ this apart count as exact match.
Increase to 0.5–1.0 if your annotation tool rounds timestamps to whole seconds.
TIME_FUZZY_TOL
number (s)
2.5
Timestamps ≤ this apart count as fuzzy match.
Increase when annotators shift segment boundaries significantly.
SPLIT_COMBINED_MIN
number (0–1)
0.35
Min combined text score to accept a SPLIT detection.
Raise to reduce false splits. Lower if your content has very short target segments.
MERGE_COMBINED_MIN
number (0–1)
0.65
Min combined text score to accept a MERGE detection.
Raise to reduce false merges. Lower for datasets with many legitimate merges.
CHAR_DIFF_LIMIT
integer (100–50000)
1500
Max combined character length before falling back to word-level diff.
Increase for batches with very long segments (300-word utterances). Decrease to force word-level diffs for all segments and save CPU on massive batches.
json
{
  "config": {
    "SIM_WEAK": 0.15,
    "TIME_EXACT_TOL": 1.0,
    "SPLIT_COMBINED_MIN": 0.70
  }
}
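Before sending overrides, a client can merge them with the defaults and check the documented ranges. A sketch with hypothetical names; the defaults and the 0–1 and 100–50000 bounds come from the table above, while the ordering checks (SIM_WEAK ≤ SIM_MODERATE ≤ SIM_CONFIDENT, 0 ≤ TIME_EXACT_TOL ≤ TIME_FUZZY_TOL) are assumptions the API may not enforce.

```typescript
// Defaults copied from the table above.
const THRESHOLD_DEFAULTS = {
  SIM_CONFIDENT: 0.7,
  SIM_MODERATE: 0.4,
  SIM_WEAK: 0.2,
  TIME_EXACT_TOL: 0.05,
  TIME_FUZZY_TOL: 2.5,
  SPLIT_COMBINED_MIN: 0.35,
  MERGE_COMBINED_MIN: 0.65,
  CHAR_DIFF_LIMIT: 1500,
};
type Thresholds = typeof THRESHOLD_DEFAULTS;

const UNIT_INTERVAL = ["SIM_CONFIDENT", "SIM_MODERATE", "SIM_WEAK", "SPLIT_COMBINED_MIN", "MERGE_COMBINED_MIN"] as const;

function resolveThresholds(overrides: Partial<Thresholds> = {}): Thresholds {
  const t: Thresholds = { ...THRESHOLD_DEFAULTS, ...overrides };
  for (const k of UNIT_INTERVAL) {
    if (!(t[k] >= 0 && t[k] <= 1)) throw new RangeError(`${k} must be within 0–1`);
  }
  if (!Number.isInteger(t.CHAR_DIFF_LIMIT) || t.CHAR_DIFF_LIMIT < 100 || t.CHAR_DIFF_LIMIT > 50000) {
    throw new RangeError("CHAR_DIFF_LIMIT must be an integer in 100–50000");
  }
  // Sanity checks below are assumptions, not documented API rules.
  if (!(t.TIME_EXACT_TOL >= 0 && t.TIME_EXACT_TOL <= t.TIME_FUZZY_TOL)) {
    throw new RangeError("expected 0 <= TIME_EXACT_TOL <= TIME_FUZZY_TOL");
  }
  if (!(t.SIM_WEAK <= t.SIM_MODERATE && t.SIM_MODERATE <= t.SIM_CONFIDENT)) {
    throw new RangeError("expected SIM_WEAK <= SIM_MODERATE <= SIM_CONFIDENT");
  }
  return t;
}

// The override example above.
const t = resolveThresholds({ SIM_WEAK: 0.15, TIME_EXACT_TOL: 1.0, SPLIT_COMBINED_MIN: 0.7 });
console.log(t.SIM_WEAK, t.SIM_CONFIDENT, t.CHAR_DIFF_LIMIT); // 0.15 0.7 1500
```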
Config Parameters Guide · Structural Diff API · Built by Mohamed Yaakoubi