Full Demo

A real transcript. A real post-edit. A real diff.

This demo uses a real podcast transcript with non-standard column names to illustrate the full integration loop: adapting the data format, calling the API, and interpreting every field of the response. The use case is a QA coordinator auditing an annotator's post-editing work before accepting a batch.

The scenario

AI annotation projects for transcription and translation typically have several QA layers. Each layer is a handoff point where a human reviews or edits the output of the previous layer. The coordinator needs an objective record of what changed at each handoff.

Layer 0: AI output

The platform exports an AI-generated transcript. This is your original array.

Layer 1: Annotator edit

An annotator post-edits the transcript according to the project style guide (punctuation, number formatting, verbatim corrections, speaker IDs, timestamps). This is your reworked array.

Layer 2: QA coordinator

The coordinator calls POST /v1/diff with both arrays. The diff report is attached to the batch before delivery to the client.

In this demo, the original is a 9-row podcast transcript about machine translation (EP101). The reworked version reflects typical annotator corrections: punctuation standardization, number formatting, one structural split, and one structural merge.

The data

The demo transcript uses the column names exported by a typical annotation platform. These do not match the API's standard field names:

Platform column   API field
─────────────────────────────
Talker          → speaker
BeginTime       → start_time
FinishTime      → end_time
Utterance       → transcript
NoiseTag        → non_speech_events
Mood            → emotion
Show            (passed through unchanged)
Subject         (passed through unchanged)

The API needs transcript, and optionally speaker, start_time, and end_time, for its alignment algorithm. Unknown fields (Show, Subject) are passed through unchanged.

Adapting the column names

This is a one-time key remapping done before the API call. You only need to map the fields the engine uses; the rest pass through automatically.

JavaScript

const KEY_MAP = {
  Talker:    'speaker',
  BeginTime: 'start_time',
  FinishTime:'end_time',
  Utterance: 'transcript',
  NoiseTag:  'non_speech_events',
  Mood:      'emotion',
}

const adapt = row =>
  Object.fromEntries(
    Object.entries(row).map(([k, v]) => [KEY_MAP[k] ?? k, v])
  )

const originalAdapted = originalData.map(adapt)
const reworkedAdapted = reworkedData.map(adapt)

Python

KEY_MAP = {
    "Talker":    "speaker",
    "BeginTime": "start_time",
    "FinishTime": "end_time",
    "Utterance": "transcript",
    "NoiseTag":  "non_speech_events",
    "Mood":      "emotion",
}

def adapt(row):
    return {KEY_MAP.get(k, k): v for k, v in row.items()}

original_adapted = [adapt(r) for r in original_data]
reworked_adapted = [adapt(r) for r in reworked_data]
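
To make the pass-through behaviour concrete, the sketch below applies the same remapping to a single exported row. The Show and Subject values are made up for illustration; only the column names come from the mapping table.

```python
KEY_MAP = {
    "Talker": "speaker",
    "BeginTime": "start_time",
    "FinishTime": "end_time",
    "Utterance": "transcript",
    "NoiseTag": "non_speech_events",
    "Mood": "emotion",
}

def adapt(row):
    # Rename known platform columns; leave every other key untouched.
    return {KEY_MAP.get(k, k): v for k, v in row.items()}

# Hypothetical exported row: the Show and Subject values are illustrative.
sample_row = {
    "Talker": "Sarah Mitchell",
    "BeginTime": 0.00,
    "FinishTime": 4.20,
    "Utterance": "Welcome to The Language Lab, the podcast where we break "
                 "down how AI is changing the way we communicate.",
    "NoiseTag": "[intro jingle]",
    "Mood": "warm",
    "Show": "The Language Lab",
    "Subject": "machine translation",
}

adapted = adapt(sample_row)
print(sorted(adapted))
```

The renamed keys replace the platform names, while Show and Subject keep both their names and their values.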

The request

After adapting the column names, the full POST request body looks like this (truncated for readability):

Original array (9 rows — AI output)

[
  { "speaker": "Sarah Mitchell", "start_time": 0.00,  "end_time": 4.20,  "transcript": "Welcome to The Language Lab, the podcast where we break down how AI is changing the way we communicate.",                          "non_speech_events": "[intro jingle]", "emotion": "warm"       },
  { "speaker": "Sarah Mitchell", "start_time": 4.20,  "end_time": 8.00,  "transcript": "Today we have two fantastic guests joining us to talk about machine translation and quality assurance.",                             "non_speech_events": "",               "emotion": "enthusiastic" },
  { "speaker": "James Park",     "start_time": 8.00,  "end_time": 11.50, "transcript": "Thanks Sarah glad to be here I have been looking forward to this conversation for weeks.",                                           "non_speech_events": "",               "emotion": "friendly"   },
  { "speaker": "Elena Rossi",    "start_time": 11.50, "end_time": 15.00, "transcript": "Same here this is such an important topic right now especially with how fast the field is evolving.",                                 "non_speech_events": "",               "emotion": "engaged"    },
  { "speaker": "Sarah Mitchell", "start_time": 15.00, "end_time": 19.80, "transcript": "James lets start with you. Your team recently published a paper on neural machine translation for low resource languages.",           "non_speech_events": "",               "emotion": "curious"    },
  { "speaker": "James Park",     "start_time": 19.80, "end_time": 26.50, "transcript": "Yes so our main finding was that back translation combined with careful data augmentation can boost BLEU scores by up to twelve points for languages with under fifty thousand parallel sentences.", "non_speech_events": "", "emotion": "analytical" },
  { "speaker": "James Park",     "start_time": 26.50, "end_time": 30.00, "transcript": "The trick is selecting the right seed data and not just throwing everything at the model.",                                          "non_speech_events": "",               "emotion": "technical"  },
  { "speaker": "Elena Rossi",    "start_time": 30.00, "end_time": 34.50, "transcript": "That resonates with our work at the localization lab where we focus on Arabic dialect adaptation.",                                   "non_speech_events": "",               "emotion": "thoughtful" },
  { "speaker": "Elena Rossi",    "start_time": 34.50, "end_time": 39.00, "transcript": "Standard Arabic models completely fail when you feed them Tunisian or Moroccan dialect input.",                                       "non_speech_events": "",               "emotion": "concerned"  }
]

Reworked array (9 rows — annotator post-edit)

[
  { "speaker": "Sarah Mitchell", "start_time": 0.00,  "end_time": 4.20,  "transcript": "Welcome to The Language Lab, the podcast where we break down how AI is changing the way we communicate.",                 "non_speech_events": "[intro jingle]", "emotion": "warm"        },
  { "speaker": "Sarah Mitchell", "start_time": 4.20,  "end_time": 8.00,  "transcript": "Today, we have two fantastic guests joining us to talk about machine translation and quality assurance.",                  "non_speech_events": "",               "emotion": "enthusiastic" },
  { "speaker": "James Park",     "start_time": 8.00,  "end_time": 11.50, "transcript": "Thanks, Sarah. Glad to be here — I've been looking forward to this conversation for weeks.",                              "non_speech_events": "",               "emotion": "friendly"    },
  { "speaker": "Elena Rossi",    "start_time": 11.50, "end_time": 15.00, "transcript": "Same here. This is such an important topic right now, especially with how fast the field is evolving.",                    "non_speech_events": "",               "emotion": "engaged"     },
  { "speaker": "Sarah Mitchell", "start_time": 15.00, "end_time": 19.80, "transcript": "James, let's start with you. Your team recently published a paper on neural machine translation for low-resource languages.", "non_speech_events": "",               "emotion": "curious"     },
  { "speaker": "James Park",     "start_time": 19.80, "end_time": 23.20, "transcript": "Yes, so our main finding was that back-translation combined with careful data augmentation can boost BLEU scores by up to 12 points.", "non_speech_events": "", "emotion": "analytical"  },
  { "speaker": "James Park",     "start_time": 23.20, "end_time": 26.50, "transcript": "This holds for languages with under 50,000 parallel sentences.",                                                           "non_speech_events": "",               "emotion": "analytical"  },
  { "speaker": "James Park",     "start_time": 26.50, "end_time": 30.00, "transcript": "The trick is selecting the right seed data and not just throwing everything at the model.",                                "non_speech_events": "",               "emotion": "technical"   },
  { "speaker": "Elena Rossi",    "start_time": 30.00, "end_time": 39.00, "transcript": "That resonates with our work at the localization lab where we focus on Arabic dialect adaptation — standard Arabic models completely fail when you feed them Tunisian or Moroccan dialect input.", "non_speech_events": "", "emotion": "thoughtful" }
]

curl -X POST https://structural-diff-engine.onrender.com/v1/diff \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "x-request-id: batch-ep101-layer1-layer2-qa" \
  -d '{
    "original": <originalAdapted array>,
    "reworked": <reworkedAdapted array>
  }'

For the full dataset (9 original rows, 9 reworked rows after the split and the merge), the body is about 8 KB, well under the 5 MB limit. Include x-request-id to correlate this call with the batch ID in your system.
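
For a Python integration, the same call could be assembled with the standard library, roughly as follows. This is a sketch rather than an official client: the helper name build_diff_request is hypothetical, and treating the 5 MB limit as 5 × 1024 × 1024 bytes is an assumption. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request

API_URL = "https://structural-diff-engine.onrender.com/v1/diff"
MAX_BODY_BYTES = 5 * 1024 * 1024  # assumed reading of the 5 MB limit

def build_diff_request(original, reworked, api_key, request_id):
    """Build (but do not send) the POST request for /v1/diff."""
    body = json.dumps({"original": original, "reworked": reworked}).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"body is {len(body)} bytes, over the 5 MB limit")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "x-request-id": request_id,
        },
    )

# Tiny placeholder arrays; in practice pass the adapted original and reworked arrays.
req = build_diff_request(
    [{"transcript": "Today we have two fantastic guests."}],
    [{"transcript": "Today, we have two fantastic guests."}],
    api_key="YOUR_API_KEY",
    request_id="batch-ep101-layer1-layer2-qa",
)
print(req.get_method(), req.full_url, len(req.data), "bytes")

# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     report = json.load(resp)
```

Checking the body size before sending keeps an oversized batch from failing only after upload.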

Walking through the response

The results array has one entry per aligned original row or merged group, plus trace entries for the source rows of any merge. Here is what each row returns and why:

{
  "status": "success",
  "requestId": "batch-ep101-layer1-layer2-qa",
  "data": {
    "results": [
      { "status": "UNCHANGED", "notes": "exact match",          ... },  // row 0 — Sarah intro
      { "status": "MODIFIED",  "notes": "transcript changed",   ... },  // row 1 — comma added
      { "status": "MODIFIED",  "notes": "transcript changed",   ... },  // row 2 — James punctuation
      { "status": "MODIFIED",  "notes": "transcript changed",   ... },  // row 3 — Elena punctuation
      { "status": "MODIFIED",  "notes": "transcript changed",   ... },  // row 4 — Sarah question
      { "status": "SPLIT",     "notes": "split into 2 rows",    ... },  // row 5 — James finding (split)
      { "status": "UNCHANGED", "notes": "exact match",          ... },  // row 6 — James trick
      { "status": "MERGED",    "notes": "merged from 2 rows",   ... },  // rows 7+8 — Elena merged
      { "status": "MERGED",    "notes": "Source row 1/2 ...",   ... },  // ← trace entry, skip in counts
      { "status": "MERGED",    "notes": "Source row 2/2 ...",   ... }   // ← trace entry, skip in counts
    ],
    "scores": {
      "CER": 0.09,
      "WER": 0.14,
      "SegER": 0.22,
      "SER": 0.67,
      "cerT": 0.09,
      "werT": 0.14
    },
    "composite": {
      "score": 3.9,
      "grade": "B",
      "label": "Good"
    },
    "meta": {
      "originalRows": 9,
      "reworkedRows": 9
    }
  }
}
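
Because the merge trace entries share the MERGED status, a per-status tally has to skip them. The sketch below assumes that trace entries are the ones whose notes begin with "Source row", as in the sample response; the real API may expose a more explicit marker, so treat the filter and the helper name count_statuses as hypothetical.

```python
from collections import Counter

def count_statuses(results):
    """Tally statuses, skipping merge trace entries.

    Assumption: trace entries are those whose notes start with
    "Source row", as in the sample response.
    """
    return Counter(
        r["status"] for r in results
        if not r.get("notes", "").startswith("Source row")
    )

# The ten entries of the sample response, reduced to status and notes.
results = [
    {"status": "UNCHANGED", "notes": "exact match"},
    {"status": "MODIFIED", "notes": "transcript changed"},
    {"status": "MODIFIED", "notes": "transcript changed"},
    {"status": "MODIFIED", "notes": "transcript changed"},
    {"status": "MODIFIED", "notes": "transcript changed"},
    {"status": "SPLIT", "notes": "split into 2 rows"},
    {"status": "UNCHANGED", "notes": "exact match"},
    {"status": "MERGED", "notes": "merged from 2 rows"},
    {"status": "MERGED", "notes": "Source row 1/2 ..."},
    {"status": "MERGED", "notes": "Source row 2/2 ..."},
]

counts = count_statuses(results)
print(dict(counts))
```

For the sample this gives 2 UNCHANGED, 4 MODIFIED, 1 SPLIT, and 1 MERGED.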

UNCHANGED

Row 0: Sarah's intro

Exact match. The annotator did not touch this row.

MODIFIED

Row 1: "Today we have..."

The annotator added a comma after "Today". transcriptDiff highlights the insertion.

MODIFIED

Row 2: James's thanks

Multiple punctuation corrections, plus the contraction "I have" → "I've". The whole sentence was restructured for verbatim style.

MODIFIED

Row 3: Elena's "Same here"

A period was added after "Same here" and a comma before "especially". A classic verbatim punctuation rule.

MODIFIED

Row 4: Sarah's question

Comma after "James", apostrophe restored in "let's", hyphen added in "low-resource". Three distinct corrections.

SPLIT

Row 5: James's long finding

Original: 31 words in a single sentence. Reworked: split into two segments at a natural boundary. "twelve" → "12" and "fifty thousand" → "50,000" were corrected as well. SegER is incremented.

UNCHANGED

Row 6: James's "The trick"

Already well formed. The annotator left it intact.

MERGED

Rows 7+8: Elena's two statements

The annotator merged two consecutive Elena rows into one segment joined by an em-dash. Both original rows are absorbed.

Reading the scores

The scores object gives a quantitative summary of the batch. Here is how to interpret the numbers in the context of this demo:

CER ≈ 0.09: About 9% of characters changed. That is low and expected: punctuation and number corrections inside long segments.

WER ≈ 0.14: About 14% of words changed. Higher than CER because a swap ("twelve" → "12") counts as a whole changed word.
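
The gap between the two rates can be illustrated with a generic edit-distance calculation on row 1 of the demo. This is only a sketch: the engine's actual tokenization and normalization are not documented here, so whitespace splitting and the helper names edit_distance and error_rate are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def error_rate(ref_units, hyp_units):
    # Edits divided by the length of the reference (original) sequence.
    return edit_distance(ref_units, hyp_units) / max(len(ref_units), 1)

original = ("Today we have two fantastic guests joining us to talk about "
            "machine translation and quality assurance.")
reworked = ("Today, we have two fantastic guests joining us to talk about "
            "machine translation and quality assurance.")

wer = error_rate(original.split(), reworked.split())  # word-level
cer = error_rate(list(original), list(reworked))      # character-level
print(f"WER={wer:.4f} CER={cer:.4f}")
```

One inserted comma is a single character edit, but under whitespace tokenization it turns "Today" into a different token, so one of the row's 16 words counts as changed and the word-level rate comes out higher.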

SegER ≈ 0.22: About 22% of original rows had structural events (1 split + 1 merge out of 9 = 2/9). Typical for a first annotation pass.

SER ≈ 0.67: About 67% of comparable rows (UNCHANGED + MODIFIED) contain at least one modification: 4 MODIFIED out of 6 comparable rows. It measures how often rows are edited, independently of structural events.
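
The two row-level ratios can be reproduced from the status counts of the sample response. The formulas below are inferred from the descriptions in this section, not taken from the engine's source, so treat them as an approximation; the function names are hypothetical.

```python
def seg_er(counts, original_rows):
    # Structural events (splits + merges) over the number of original rows.
    return (counts["SPLIT"] + counts["MERGED"]) / original_rows

def ser(counts):
    # Edited rows over comparable rows (UNCHANGED + MODIFIED).
    comparable = counts["UNCHANGED"] + counts["MODIFIED"]
    return counts["MODIFIED"] / comparable if comparable else 0.0

# Status counts from the demo, with merge trace entries excluded.
counts = {"UNCHANGED": 2, "MODIFIED": 4, "SPLIT": 1, "MERGED": 1}

print(round(seg_er(counts, original_rows=9), 2))  # 0.22
print(round(ser(counts), 2))                      # 0.67
```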

Composite grade B / "Good" (score ≈ 3.9): The weighted formula rewards a low edit rate and penalizes structural changes. A grade of B indicates that the annotator made real corrections while the overall volume stayed under control.

Interpreting grades in a QA context: A = near-perfect AI output. B = good QA pass, expected corrections. C = significant post-editing required. D/F = the batch may need full re-annotation.
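
In a batch-acceptance script, this interpretation could be attached to the composite object as a simple lookup. The mapping text mirrors the guidance above; the function name describe_grade and the fallback string are hypothetical.

```python
GRADE_GUIDANCE = {
    "A": "near-perfect AI output",
    "B": "good QA pass, expected corrections",
    "C": "significant post-editing required",
    "D": "batch may need full re-annotation",
    "F": "batch may need full re-annotation",
}

def describe_grade(composite):
    """Return the QA interpretation for a composite object from the response."""
    return GRADE_GUIDANCE.get(composite.get("grade"), "unknown grade")

# The composite object from the sample response.
composite = {"score": 3.9, "grade": "B", "label": "Good"}
print(describe_grade(composite))  # good QA pass, expected corrections
```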

Full Demo · Structural Diff API · Developed by Mohamed Yaakoubi
