Metric Evaluators
Concrete evaluator implementations for various acoustic and speech quality metrics.
CPP (Cepstral Peak Prominence)
- pathbench.cpp_evaluator.cpp_func(x, fs, normOpt, double_log=False)
Computes the cepstral peak prominence (CPP) of a given signal.
- Parameters:
x (ndarray) – The audio signal
fs (int) – The sampling frequency (Hz)
normOpt (str) – Normalisation type: ‘line’, ‘mean’ or ‘nonorm’
double_log (bool) – If True, uses the legacy double-log formulation (incorrect but kept for comparison). If False (default), uses the standard CPP formulation.
- Returns:
cpp – The CPP values with their corresponding time values
- Return type:
ndarray
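To make the two formulations concrete, here is a minimal single-frame sketch in NumPy. It is not pathbench's `cpp_func`. The function name `cpp_frame`, the Hann window, the 60 to 330 Hz peak-search range, and treating ‘line’ normalisation as a linear regression of the cepstrum over that range are all assumptions.

```python
import numpy as np


def cpp_frame(frame, fs, double_log=False, f0_min=60.0, f0_max=330.0):
    """Hypothetical single-frame CPP sketch (not pathbench's cpp_func)."""
    windowed = frame * np.hanning(len(frame))
    # Log magnitude spectrum in dB; the epsilon avoids log(0).
    log_spec = 20 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    # Cepstrum: transform of the log spectrum. For a real, symmetric log
    # spectrum the inverse FFT equals the forward FFT up to a 1/N scale.
    ceps = np.abs(np.fft.ifft(log_spec))
    if double_log:
        # Legacy variant: an extra log applied on top of the cepstrum.
        ceps = 20 * np.log10(ceps + 1e-12)
    # Restrict the peak search to quefrencies of plausible F0 values.
    lo = int(np.ceil(fs / f0_max))
    hi = min(int(np.floor(fs / f0_min)), len(ceps) // 2)
    q = np.arange(lo, hi + 1) / fs  # quefrency in seconds
    c = ceps[lo:hi + 1]
    # Assumed 'line' normalisation: linear regression of cepstrum on quefrency.
    slope, intercept = np.polyfit(q, c, 1)
    peak = int(np.argmax(c))
    return float(c[peak] - (slope * q[peak] + intercept))
```

A strongly periodic frame should produce a clear cepstral peak above the regression line, whereas white noise should not.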
- class pathbench.cpp_evaluator.CPPEvaluator(normOpt='line')
Bases: ReferenceFreeEvaluator
Cepstral Peak Prominence (standard formulation). Reference-free.
- class pathbench.cpp_evaluator.CPPDoubleLogEvaluator(normOpt='line')
Bases: CPPEvaluator
Legacy CPP evaluator using the double-log formulation (incorrect but kept for comparison).
The standard CPP is: peak(FFT(log(spectrum))) - regression_line
This version uses: peak(log(FFT(log(spectrum)))) - regression_line
- class pathbench.cpp_evaluator.PraatCPPEvaluator(pitch_floor=60.0, pitch_ceiling=330.0, time_averaging_window=0.02, quefrency_averaging_window=0.0005)
Bases: ReferenceFreeEvaluator
CPP evaluator using Praat’s built-in PowerCepstrogram implementation via parselmouth.
Praat’s implementation is widely used as a reference in clinical voice research. The evaluator uses Praat’s “Get CPPS” command, which computes smoothed Cepstral Peak Prominence (CPPS) following the methodology of Hillenbrand et al. (1994).
Reference: https://www.fon.hum.uva.nl/praat/manual/PowerCepstrogram__Get_CPPS___.html
STOI / ESTOI
- class pathbench.reference_evaluator.STOI(reference_words, test_words, normalization_method, centroid_ind, frame_deletion=True, fs=16000)
Bases: object
- static thirdoct(fs, N_fft, number_of_bands, mn)
Extracts a one-third octave band representation.
- Parameters:
fs – sampling frequency (Hz)
N_fft – number of bins for the FFT
number_of_bands – number of one-third octave bands, marked as J in the paper
mn
- Returns:
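The band decomposition behind a one-third octave representation can be sketched as a binary matrix that maps FFT bins to bands. This sketch assumes, as in the original STOI reference code, that `mn` is the centre frequency of the lowest band in Hz, with band edges a sixth of an octave either side of each centre. The name `thirdoct_sketch` is hypothetical, and each bin is simply assigned to the band whose edges contain it; this is not pathbench's implementation.

```python
import numpy as np


def thirdoct_sketch(fs, n_fft, number_of_bands, mn):
    """Hypothetical sketch: map FFT bins to one-third octave bands."""
    # Frequencies (Hz) of the non-negative FFT bins.
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    # Band centres are spaced a third of an octave apart, starting at mn.
    centres = mn * 2.0 ** (np.arange(number_of_bands) / 3)
    # Shared edges: each edge sits a sixth of an octave from its neighbouring centres.
    edges = mn * 2.0 ** ((np.arange(number_of_bands + 1) - 0.5) / 3)
    low, high = edges[:-1, None], edges[1:, None]
    # Row j is 1 for every bin whose frequency lies in [low_j, high_j).
    band_matrix = ((freqs >= low) & (freqs < high)).astype(float)
    return band_matrix, centres
```

In STOI, multiplying such a matrix with the squared STFT magnitudes (bins × frames) and taking the square root gives one envelope value per band and frame.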
- align_dtw(control, test, frame_deletion, test_time)
Aligns two time-frequency (TF) representations using dynamic time warping (DTW).
- Parameters:
control (np.ndarray) – control signal to align with
test (np.ndarray) – test (pathological) signal to align
frame_deletion (bool) – whether to delete repeated frames. Possibly useful because it aligns two samples of identical length, so there is no need to decide which one to align to (TODO: check).
test_time (bool) – purpose undocumented (TODO: check)
- Returns:
DTW frame paths
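The alignment step can be illustrated with a textbook DTW over frame sequences. This is a minimal sketch, not pathbench's `align_dtw`. The name `dtw_path`, the Euclidean frame distance, and the symmetric step pattern (diagonal, vertical, horizontal) are assumptions, and the `frame_deletion` and `test_time` options are not modelled.

```python
import numpy as np


def dtw_path(control, test):
    """Hypothetical DTW sketch. Inputs are (frames x features) arrays."""
    n, m = len(control), len(test)
    # Pairwise Euclidean distances between every control and test frame.
    dist = np.linalg.norm(control[:, None, :] - test[None, :, :], axis=-1)
    # acc[i, j] = cheapest cumulative cost of aligning the first i and j frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the final cell to recover the warping path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(acc[n, m]), path[::-1]
```

For example, aligning `[[0], [1], [2]]` with `[[0], [0], [1], [2]]` absorbs the repeated first test frame with a horizontal step, giving zero cost and the path `[(0, 0), (0, 1), (1, 2), (2, 3)]`.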
- class pathbench.reference_evaluator.ReferenceEvaluator(**kwargs)
Bases: object
Deprecated. Kept for backward compatibility. Use ReferenceAudioEvaluator instead.
- class pathbench.reference_evaluator.PSTOIEvaluator(**kwargs)
Bases: ReferenceAudioEvaluator
An evaluator that uses PSTOI to compute a score.
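As background on the STOI family this section is named after: these measures correlate short-time band envelopes of a reference and a test signal and average the correlations. The sketch below shows only that core step on already aligned (bands × frames) envelopes. The name `envelope_correlation_score`, the 30-frame segment length, and the omission of STOI's clipping step are assumptions, and this is not pathbench's PSTOI.

```python
import numpy as np


def envelope_correlation_score(ref_env, test_env, seg_len=30, eps=1e-12):
    """Hypothetical STOI-style core: mean short-time envelope correlation."""
    scores = []
    # Slide a window of seg_len frames over the aligned envelopes.
    for start in range(ref_env.shape[1] - seg_len + 1):
        x = ref_env[:, start:start + seg_len]
        y = test_env[:, start:start + seg_len]
        # Remove each band's mean, then take the per-band correlation coefficient.
        xc = x - x.mean(axis=1, keepdims=True)
        yc = y - y.mean(axis=1, keepdims=True)
        num = np.sum(xc * yc, axis=1)
        den = np.linalg.norm(xc, axis=1) * np.linalg.norm(yc, axis=1) + eps
        scores.append(num / den)
    # Average over bands and segments.
    return float(np.mean(scores))
```

Identical envelopes score close to 1, while unrelated envelopes score close to 0.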
NAD (Neural Acoustic Distance)
- class pathbench.nad_evaluator.NADEvaluator(model_id='facebook/wav2vec2-large', layer=10)
Bases: ReferenceAudioEvaluator
An evaluator that computes the Normalized Alignment Distance (NAD) using DTW on wav2vec2 features.
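The distance can be sketched as a DTW cost over frame-level features divided by the length of the warping path. Normalising by path length, using the Euclidean frame distance, and the name `normalized_dtw_distance` are all assumptions. In `NADEvaluator` the features would come from a wav2vec 2.0 layer (layer 10 by default); here any (frames × features) arrays can be passed in.

```python
import numpy as np


def normalized_dtw_distance(ref_feats, test_feats):
    """Hypothetical sketch: DTW cost divided by the warping-path length."""
    n, m = len(ref_feats), len(test_feats)
    dist = np.linalg.norm(ref_feats[:, None, :] - test_feats[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    # steps[i, j] = number of aligned frame pairs on the best path to (i, j).
    steps = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Pick the cheapest predecessor (ties favour the diagonal).
            pi, pj = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda p: acc[p])
            acc[i, j] = dist[i - 1, j - 1] + acc[pi, pj]
            steps[i, j] = steps[pi, pj] + 1
    return float(acc[n, m] / steps[n, m])
```

Dividing by the path length keeps the score comparable between utterances of different durations.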
- class pathbench.nad_evaluator.TrimmedNADEvaluator(model_id='facebook/wav2vec2-large', layer=10, trimmer=None)
Bases: ReferenceTxtAndAudioEvaluator
An evaluator that computes the Normalized Alignment Distance (NAD) using DTW on wav2vec2 features. Falls back to untrimmed audio for the whole group if trimming or featurization fails for any member of the group.
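The all-or-nothing fallback described above can be sketched as follows: process every member of the group, and if any member fails, process the whole group untrimmed instead. The helper `process_group_with_fallback` and its two callables are hypothetical stand-ins, not pathbench functions.

```python
def process_group_with_fallback(group, trim_and_featurize, featurize):
    """Hypothetical sketch: trim all members, or none if any member fails."""
    try:
        return [trim_and_featurize(audio) for audio in group], True
    except Exception:
        # One failure anywhere means no member of the group is trimmed.
        return [featurize(audio) for audio in group], False
```

Falling back for the whole group rather than per member keeps all members of a group processed the same way.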
P-ESTOI with Forced Alignment
- class pathbench.p_estoi_evaluator.ForcedAlignmentPESTOIEvaluator(model_id='facebook/wav2vec2-xlsr-53-espeak-cv-ft', **kwargs)
Bases: ReferenceEvaluator
An evaluator that uses P-ESTOI to compute a score after trimming silence using forced alignment.
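Trimming with a forced alignment amounts to cutting the waveform to the span from the first aligned segment's start to the last segment's end. This sketch assumes the alignment is a list of `(label, start_s, end_s)` tuples; the helper `trim_to_alignment` and its `pad` argument are hypothetical, and producing the alignment with the wav2vec 2.0 model is not shown.

```python
import numpy as np


def trim_to_alignment(audio, fs, segments, pad=0.0):
    """Hypothetical helper: keep audio from the first segment start to the last segment end."""
    start_s = min(start for _, start, _ in segments) - pad
    end_s = max(end for _, _, end in segments) + pad
    # Convert seconds to sample indices and clamp to the signal bounds.
    start = max(0, int(np.floor(start_s * fs)))
    end = min(len(audio), int(np.ceil(end_s * fs)))
    return audio[start:end]
```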
VSA (Vowel Space Area)
- class pathbench.vsa_evaluator.VSAEvaluator(gender=None, visualize=False)
Bases: LanguageAwareSpeakerEvaluator
An evaluator that computes the Vowel Space Area (VSA) for a speaker.
The algorithm is based on: -
The vowel formant data used for initialization come from:
- English: Hillenbrand, J., Getty, L. A., Clark, M. J., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. JASA, 97(5), 3099-3111.
- Dutch: Adank et al. We use Southern Dutch formant values because of the Belgian context.
- Italian: Bertinetto, P. M. (?). “The sound pattern of Standard Italian, as compared with the varieties spoken in Florence, Milan and Rome.”
- Spanish: Bradlow. “A comparative acoustic study of English and Spanish vowels.”
- Mandarin: Yang. “Vowel production by Mandarin speakers of English.”
When contributing new values, please include the reference to the source paper and cross-check the values against it. A possible future approach is to use control speakers for formant initialisation.
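Vowel space area is commonly computed as the area of the polygon spanned by the corner vowels in the F1 × F2 plane, in Hz². The sketch below uses the shoelace formula. The function name, the dictionary input format, and ordering the vertices by angle around their centroid (valid for convex vowel polygons) are assumptions. This is not necessarily how `VSAEvaluator` estimates formants or builds its polygon.

```python
import numpy as np


def vowel_space_area(formants):
    """Hypothetical sketch: polygon area (Hz^2) spanned by vowels in the F1 x F2 plane.

    formants: dict mapping a vowel label to its (F1, F2) pair in Hz.
    """
    pts = np.array(list(formants.values()), dtype=float)
    # Order vertices by angle around the centroid so a convex polygon is traced once.
    centre = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - centre[1], pts[:, 0] - centre[0])
    pts = pts[np.argsort(angles)]
    x, y = pts[:, 0], pts[:, 1]
    # Shoelace formula: half the absolute sum of cross products of consecutive vertices.
    return float(0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1))))
```

Sorting by angle means the vowels can be supplied in any order, so a triangle (/i/, /a/, /u/) and a quadrilateral variant are handled by the same code.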
ASR-based Evaluators
- class pathbench.asr_evaluators.ASREvaluator(model_id)
Bases: ReferenceTxtEvaluator
Computes the word error rate (WER) using an ASR model.
- class pathbench.asr_evaluators.PEREvaluator(language)
Bases: ReferenceTxtEvaluator
Computes the phoneme error rate (PER) using a language-specific ASR model.
- class pathbench.asr_evaluators.DirectPEREvaluator
Bases: ReferenceTxtEvaluator
Computes the phoneme error rate (PER) using the espeak-cv-ft model directly.
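WER and PER share the same arithmetic: the Levenshtein distance between the reference and hypothesis token sequences, divided by the number of reference tokens. Only the tokens differ (words for WER, phonemes for PER). A minimal sketch with the hypothetical name `error_rate`; the ASR models that produce the hypotheses are not shown.

```python
def error_rate(reference, hypothesis):
    """Token-level edit distance divided by the reference length."""
    # prev[j] = edit distance between the reference prefix so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_tok in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,                         # deletion
                curr[j - 1] + 1,                     # insertion
                prev[j - 1] + (ref_tok != hyp_tok),  # substitution or match
            )
        prev = curr
    return prev[-1] / max(len(reference), 1)
```

For example, `error_rate("the cat sat".split(), "the cat sit".split())` counts one substitution over three reference words, giving a WER of 1/3; passing phoneme lists instead gives a PER.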
Articulatory Precision
- class pathbench.articulatory_precision_evaluator.PhoneticConfidenceEvaluator(model_id='facebook/wav2vec2-xlsr-53-espeak-cv-ft', use_exp=False)
Bases: ReferenceFreeEvaluator
An evaluator that scores based on the model’s average confidence in its own greedy-decoded phoneme sequence (no reference text used).
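A sketch of the idea, assuming frame-level CTC logits of shape (frames × vocabulary) with the blank token at index 0: take the argmax per frame, merge repeats, drop blanks, and average the winning probabilities over the non-blank frames. The function name, the blank index, and averaging over non-blank frames are assumptions, and the effect of `use_exp` is not modelled.

```python
import numpy as np


def greedy_phoneme_confidence(logits, blank_id=0):
    """Hypothetical sketch: greedy CTC decode plus mean confidence of non-blank frames."""
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    best = probs.argmax(axis=-1)
    # Greedy CTC decoding: merge repeated ids, then drop blanks.
    sequence = [
        int(tok) for k, tok in enumerate(best)
        if tok != blank_id and (k == 0 or tok != best[k - 1])
    ]
    non_blank = best != blank_id
    if not non_blank.any():
        return sequence, float("nan")
    return sequence, float(probs.max(axis=-1)[non_blank].mean())
```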
- class pathbench.articulatory_precision_evaluator.ArticulatoryPrecisionEvaluator(model_id='facebook/wav2vec2-xlsr-53-espeak-cv-ft')
Bases: ReferenceTxtEvaluator
An evaluator that uses a wav2vec 2.0 model to compute articulatory precision.
Articulatory Precision with Double ASR
- class pathbench.artp_double_asr_evaluator.ArtPDoubleASREvaluator(language, model_id='facebook/wav2vec2-xlsr-53-espeak-cv-ft')
Bases: ReferenceFreeEvaluator
An evaluator that uses a wav2vec 2.0 model to compute articulatory precision.
F0 Range / Pitch
- class pathbench.f0_range_evaluator.StdPitchEvaluator
Bases: ReferenceFreeEvaluator
An evaluator that computes the standard deviation of the pitch in semitones.
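Converting F0 to semitones, st = 12 · log2(f0 / f_ref), makes the spread independent of the speaker's absolute pitch: changing the reference f_ref only shifts every value, so the standard deviation does not depend on it. In this sketch, unvoiced frames are assumed to be marked as 0 or NaN, the function name and 100 Hz default reference are assumptions, and the population standard deviation (`np.std` default) is used.

```python
import numpy as np


def pitch_std_semitones(f0_hz, ref_hz=100.0):
    """Hypothetical sketch: standard deviation of voiced F0 in semitones."""
    f0 = np.asarray(f0_hz, dtype=float)
    # Keep voiced frames only (unvoiced assumed to be 0 or NaN).
    voiced = f0[np.isfinite(f0) & (f0 > 0)]
    semitones = 12 * np.log2(voiced / ref_hz)
    return float(np.std(semitones))
```

For example, frames at 100 Hz and 200 Hz lie one octave (12 semitones) apart, so their population standard deviation is 6 semitones regardless of `ref_hz`.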
WADA-SNR
Speech Rate
- class pathbench.speech_rate.WpmEvaluator
Bases: ReferenceTxtEvaluator
An evaluator that scores based on the speech rate (words per minute).
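Words per minute is the word count divided by the duration in minutes. A minimal sketch with the hypothetical helper `words_per_minute`. Counting words from the reference text is an assumption suggested by the `ReferenceTxtEvaluator` base, and how the duration is measured (for example with or without leading and trailing silence) is not stated here.

```python
def words_per_minute(text, duration_s):
    """Hypothetical sketch: whitespace-separated word count per minute of audio."""
    n_words = len(text.split())
    # Multiply before dividing to keep the arithmetic exact for simple inputs.
    return n_words * 60.0 / duration_s
```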
- class pathbench.speech_rate.PraatSpeechRateEvaluator
Bases: ReferenceFreeEvaluator
An evaluator that scores based on the speech rate (syllables per second) using a Python translation of a Praat script by de Jong and Wempe.