Architecture
Evaluator Hierarchy
All evaluator base classes live in pathbench.evaluator as abstract base
classes (ABCs). The ABC a class inherits from defines exactly which inputs
score() receives. When adding a new evaluator, pick the right ABC before
writing any logic.
| ABC | | Use for |
|---|---|---|
| | | Pre-computed scores |
| | | Audio-only metrics (CPP, SNR, …) |
| | | ASR/FA-based metrics |
| | | Reference comparison (NAD, ESTOI, …) |
| | | FA-trimmed reference metrics |
| | | Speaker-level aggregation |
| | | Speaker-level + language (VSA) |
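As a rough illustration of how the ABC fixes the inputs of score(), here is a minimal sketch. The class names ReferenceFreeEvaluator and ReferenceTxtEvaluator appear elsewhere on this page, but the signatures shown here, and the DurationProxy metric, are assumptions made for illustration and may not match the real pathbench code.

```python
from abc import ABC, abstractmethod


class ReferenceFreeEvaluator(ABC):
    """Assumed signature: scores an utterance from its audio alone."""

    @abstractmethod
    def score(self, audio_path: str) -> float: ...


class ReferenceTxtEvaluator(ABC):
    """Assumed signature: scores an utterance from audio plus transcription."""

    @abstractmethod
    def score(self, audio_path: str, transcription: str) -> float: ...


class DurationProxy(ReferenceFreeEvaluator):
    """Hypothetical toy metric: uses the path length as a stand-in score."""

    def score(self, audio_path: str) -> float:
        return float(len(audio_path))
```

Because score() is abstract, instantiating an ABC directly raises TypeError, while a subclass that implements score() with the matching inputs can be constructed and called.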
FA-Trimming: Decorator Pattern
Forced-alignment silence trimming is never baked into evaluators. Instead,
wrappers in evaluator.py handle it:
- TrimmedReferenceFreeEvaluator – wraps any ReferenceFreeEvaluator, presents a ReferenceTxtEvaluator interface
- TrimmedReferenceFreeSpeakerEvaluator – speaker-level equivalent
- TrimmedLanguageAwareSpeakerEvaluator – language-aware speaker-level equivalent
The trimmer is FATrimmer in pathbench/vad.py. If
trimming fails or a segment offset is specified, it falls back to plain
librosa.load().
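The wrapper-plus-fallback behaviour can be sketched as follows. This is not the pathbench implementation: AudioOnlyMetric, TrimmedWrapper, and strip_silence are hypothetical names, audio is modelled as a plain list of samples instead of being loaded with librosa, and the trimmer simply drops leading and trailing zeros.

```python
from typing import Callable, List, Optional

Samples = List[float]


class AudioOnlyMetric:
    """Hypothetical stand-in for a reference-free evaluator."""

    def score(self, samples: Samples) -> float:
        # Mean absolute amplitude; guards against empty input.
        return sum(abs(s) for s in samples) / max(len(samples), 1)


class TrimmedWrapper:
    """Wraps an audio-only metric and exposes an (audio, text) interface."""

    def __init__(self, inner, trim: Callable[[Samples, str], Samples]):
        self.inner = inner
        self.trim = trim

    def score(self, samples: Samples, transcription: str,
              start_time: Optional[float] = None) -> float:
        # A segment offset skips trimming entirely.
        if start_time is None:
            try:
                samples = self.trim(samples, transcription)
            except Exception:
                pass  # trimming failed: fall back to the untrimmed audio
        return self.inner.score(samples)


def strip_silence(samples: Samples, transcription: str) -> Samples:
    """Toy trimmer (assumption): removes leading and trailing zeros."""
    if not transcription:
        raise ValueError("alignment needs a transcription")
    voiced = [i for i, s in enumerate(samples) if s != 0.0]
    if not voiced:
        raise ValueError("no speech found")
    return samples[voiced[0]: voiced[-1] + 1]
```

The wrapped metric never sees the transcription or the trimming logic, which is what keeps trimming out of the evaluator itself.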
TrimmedNADEvaluator is an exception – it
implements its own two-pass trimming logic directly, because the fallback must
be group-consistent (all references fall back together).
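One way to read "group-consistent" is an all-or-nothing rule: attempt every trim first, then commit only if none failed. The sketch below illustrates that idea under stated assumptions. trim_group and strip_zeros are hypothetical helpers, a failed trim is signalled by None, and the actual two-pass logic in TrimmedNADEvaluator may differ.

```python
from typing import Callable, List, Optional

Samples = List[float]


def trim_group(group: List[Samples],
               trim: Callable[[Samples], Optional[Samples]]) -> List[Samples]:
    """Return every item trimmed, or every item untrimmed if any trim fails."""
    # Pass 1: attempt to trim each item, recording failures as None.
    attempts = [trim(samples) for samples in group]
    # Pass 2: commit the trimmed versions only if all attempts succeeded.
    if any(result is None for result in attempts):
        return list(group)
    return attempts


def strip_zeros(samples: Samples) -> Optional[Samples]:
    """Toy trimmer (assumption): drops leading/trailing zeros, None on failure."""
    voiced = [i for i, s in enumerate(samples) if s != 0.0]
    return samples[voiced[0]: voiced[-1] + 1] if voiced else None
```

With this rule, a group never mixes trimmed and untrimmed audio, so scores computed across the references stay comparable.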
Dataset Format
Each dataset directory uses Kaldi-style plain text files:
- wav.scp – utt_id -> audio_file_path
- text – utt_id -> transcription
- utt2spk – utt_id -> speaker_id
- segments – utt_id -> recording_id start_time end_time (optional)
- spk2score – speaker_id -> float (ground truth; N/A for unavailable)
- spk2gender – speaker_id -> m|f
- language – single line, two- or three-letter language code (en, nl, it, es, cmn)
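For orientation, files in this layout can be parsed with a few lines of standard-library Python. The helpers read_table and read_scores below are hypothetical and not part of pathbench. They assume one whitespace-separated record per line, with the first token as the key.

```python
from pathlib import Path
from typing import Dict, Optional


def read_table(path: Path) -> Dict[str, str]:
    """Parse 'key value' lines; the value keeps any internal spaces."""
    table: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        table[parts[0]] = parts[1] if len(parts) > 1 else ""
    return table


def read_scores(path: Path) -> Dict[str, Optional[float]]:
    """Parse spk2score, mapping the N/A marker to None."""
    return {spk: None if raw == "N/A" else float(raw)
            for spk, raw in read_table(path).items()}
```

read_table covers wav.scp, text, utt2spk, and spk2gender directly. A segments line would need its value split further into recording_id, start_time, and end_time.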
Dataset loads these files and, when iterated, yields tuples of
(utt_id, audio_path, transcription, ref_audio_list, start_time, end_time).
Reference audio is matched by shared transcription text and, optionally, by
gender.
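The matching rule can be sketched as an index keyed by transcription with an optional gender filter. build_ref_index and find_refs are hypothetical names, and exact string equality on the transcription is an assumption. The real Dataset may normalise text before comparing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

RefIndex = Dict[str, List[Tuple[str, str]]]


def build_ref_index(refs: Iterable[Tuple[str, str, str]]) -> RefIndex:
    """Index (audio_path, transcription, gender) records by transcription."""
    index: RefIndex = defaultdict(list)
    for path, text, gender in refs:
        index[text].append((path, gender))
    return index


def find_refs(index: RefIndex, transcription: str,
              gender: Optional[str] = None) -> List[str]:
    """Return reference paths sharing the transcription, optionally by gender."""
    return [path for path, ref_gender in index.get(transcription, [])
            if gender is None or ref_gender == gender]
```

An utterance whose transcription has no counterpart among the references simply gets an empty list.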