Metric Evaluators¶

Concrete evaluator implementations for various acoustic and speech quality metrics.

CPP (Cepstral Peak Prominence)¶

pathbench.cpp_evaluator.cpp_func(x, fs, normOpt, double_log=False)[source]¶

Computes cepstral peak prominence for a given signal

Parameters:

x (ndarray) – The audio signal
fs (integer) – The sampling frequency
normOpt (string) – ‘line’, ‘mean’ or ‘nonorm’ for selecting normalisation type
double_log (bool) – If True, uses the legacy double-log formulation (incorrect but kept for comparison). If False (default), uses the standard CPP formulation.

Returns:

cpp – The CPP with time values

Return type:

ndarray

class pathbench.cpp_evaluator.CPPEvaluator(normOpt='line')[source]¶

Bases: ReferenceFreeEvaluator

Cepstral Peak Prominence (standard formulation). Reference-free.

score(utterance_id, audio_path, start_time=0.0, end_time=-1.0)[source]¶

Return type:: Optional[float]

class pathbench.cpp_evaluator.CPPDoubleLogEvaluator(normOpt='line')[source]¶

Bases: CPPEvaluator

Legacy CPP evaluator using the double-log formulation (incorrect but kept for comparison).

The standard CPP is: peak(FFT(log(spectrum))) - regression_line

This version uses: peak(log(FFT(log(spectrum)))) - regression_line
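The difference between the two formulations can be sketched on an already computed cepstrum. The function `cpp_from_cepstrum` and the toy values below are hypothetical and only illustrate the peak-minus-regression-line step; they are not the code behind `cpp_func`, which also handles framing and the other `normOpt` choices.

```python
import math

def cpp_from_cepstrum(quefrency, cepstrum, double_log=False):
    """Peak height above a straight line fitted to the cepstrum (illustrative only)."""
    # Legacy variant: take the log of the cepstrum before peak picking.
    values = [math.log(c) for c in cepstrum] if double_log else list(cepstrum)

    # Ordinary least-squares line through (quefrency, values).
    n = len(values)
    mean_q = sum(quefrency) / n
    mean_v = sum(values) / n
    sxx = sum((q - mean_q) ** 2 for q in quefrency)
    sxy = sum((q - mean_q) * (v - mean_v) for q, v in zip(quefrency, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_q

    # Prominence = peak value minus the line evaluated at the peak's quefrency.
    peak_idx = max(range(n), key=lambda i: values[i])
    return values[peak_idx] - (slope * quefrency[peak_idx] + intercept)

# Toy cepstrum with a single clear peak (assumed values, not real speech).
quefrency = [0.002, 0.004, 0.006, 0.008, 0.010]
cepstrum = [1.0, 1.2, 4.0, 1.1, 0.9]
standard = cpp_from_cepstrum(quefrency, cepstrum)
legacy = cpp_from_cepstrum(quefrency, cepstrum, double_log=True)
```

On this toy input the extra logarithm compresses the peak, so the legacy value comes out smaller than the standard one.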

class pathbench.cpp_evaluator.PraatCPPEvaluator(pitch_floor=60.0, pitch_ceiling=330.0, time_averaging_window=0.02, quefrency_averaging_window=0.0005)[source]¶

Bases: ReferenceFreeEvaluator

CPP evaluator using Praat’s built-in PowerCepstrogram implementation via parselmouth.

This is the reference implementation used in clinical voice research. Uses Praat’s “Get CPPS” command which computes smoothed Cepstral Peak Prominence following the methodology of Hillenbrand et al. (1994).

Reference: https://www.fon.hum.uva.nl/praat/manual/PowerCepstrogram__Get_CPPS___.html

score(utterance_id, audio_path, start_time=0.0, end_time=-1.0)[source]¶

Return type:: Optional[float]
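A minimal sketch of how the two Praat commands can be chained through parselmouth is given below. The helper name `praat_cpps` is hypothetical, and every argument other than the four constructor parameters listed above (time step, maximum frequency, pre-emphasis, tolerance, interpolation, trend settings) is an assumption based on commonly used Praat settings, not a value confirmed for `PraatCPPEvaluator`. The sketch ignores `start_time` and `end_time`.

```python
def praat_cpps(audio_path, pitch_floor=60.0, pitch_ceiling=330.0,
               time_averaging_window=0.02, quefrency_averaging_window=0.0005):
    """Smoothed CPP via Praat commands (sketch; requires praat-parselmouth)."""
    # Imported lazily so the module loads even without parselmouth installed.
    import parselmouth
    from parselmouth.praat import call

    sound = parselmouth.Sound(audio_path)
    # Arguments: pitch floor (Hz), time step (s), maximum frequency (Hz),
    # pre-emphasis from (Hz). The last three are assumed values.
    cepstrogram = call(sound, "To PowerCepstrogram", pitch_floor, 0.002, 5000.0, 50.0)
    return call(
        cepstrogram, "Get CPPS",
        False,                       # subtract trend before smoothing (assumed)
        time_averaging_window,
        quefrency_averaging_window,
        pitch_floor, pitch_ceiling,  # peak search pitch range (Hz)
        0.05, "Parabolic",           # tolerance, interpolation (assumed)
        0.001, 0.0,                  # trend line quefrency range in s (assumed)
        "Straight", "Robust",        # trend type, fit method (assumed)
    )
```

Calling `praat_cpps("sample.wav")` on a local recording would return a single CPPS value in dB; the file name here is only a placeholder.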

STOI / ESTOI¶

class pathbench.reference_evaluator.STOI(reference_words, test_words, normalization_method, centroid_ind, frame_deletion=True, fs=16000)[source]¶

Bases: object

static thirdoct(fs, N_fft, number_of_bands, mn)[source]¶

Extracts a one-third octave band representation

Parameters:

fs – sampling frequency (Hz)
N_fft – number of bins for the FFT
number_of_bands – number of one-third octave bands, marked as J in the paper
mn – center frequency of the first one-third octave band (Hz)

Returns:
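The band layout behind a one-third octave representation can be sketched as follows. The helper `third_octave_bands` is hypothetical, and it assumes that `mn` is the center frequency of the lowest band; the values 150 Hz and 15 bands are the commonly cited STOI configuration and are not confirmed defaults of this class. The sketch only computes band frequencies, not the FFT-bin matrix that `thirdoct` presumably returns.

```python
def third_octave_bands(mn, number_of_bands):
    """Lower edge, centre and upper edge (Hz) of consecutive one-third octave bands."""
    bands = []
    for k in range(number_of_bands):
        centre = mn * 2 ** (k / 3)
        lower = centre * 2 ** (-1 / 6)   # geometric mean with the band below
        upper = centre * 2 ** (1 / 6)    # geometric mean with the band above
        bands.append((lower, centre, upper))
    return bands

bands = third_octave_bands(mn=150.0, number_of_bands=15)
```

Every third band doubles the centre frequency, and the upper edge of one band coincides with the lower edge of the next.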

static difference_oct(X, Y)[source]¶

log_octave_transform_extractor(word_set)[source]¶

align_dtw(control, test, frame_deletion, test_time)[source]¶

Aligns two TF representations together using dynamic time warping (DTW)

Parameters:

control – control signal to align with (np.ndarray)
test – test (pathological) signal to align with (np.ndarray)
frame_deletion (bool) – whether to delete repeated frames. This may be useful because the two aligned samples then have identical length, so there is no need to decide which one to align to (TODO: check)
test_time (bool) – undocumented (TODO: check)

Returns:

DTW frame paths
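For readers unfamiliar with DTW, a textbook version of the alignment can be sketched as below. The function `dtw_path` is hypothetical and uses a plain Euclidean frame distance; it does not reproduce the `frame_deletion` or `test_time` behaviour of `align_dtw`.

```python
import math

def dtw_path(control, test):
    """Return (total_cost, path) aligning two sequences of feature vectors."""
    n, m = len(control), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(control[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end to recover the frame-to-frame path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path

# One-dimensional toy features; the test sequence repeats its middle frame.
control = [(0.0,), (1.0,), (2.0,)]
test = [(0.0,), (1.0,), (1.0,), (2.0,)]
total, path = dtw_path(control, test)
```

In the resulting path the control frame with index 1 is matched twice. Repeated indices of this kind are presumably what `frame_deletion` removes.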

ref_create()[source]¶

Creates the global reference signal for the comparison from the reference signals, which should contain a common word/utterance. NOTE: the global reference is not exactly the same as the centroid; the centroid is what is used to create the global reference.

Returns:

stoi_calculation(N, X, Y, frame_shift, subject_id)[source]¶

estoi_calculation(N, X, Y, frame_shift, subject_id)[source]¶

STOI_value()[source]¶

class pathbench.reference_evaluator.ReferenceEvaluator(**kwargs)[source]¶

Bases: object

Deprecated. Kept for backward compatibility. Use ReferenceAudioEvaluator instead.

class pathbench.reference_evaluator.PSTOIEvaluator(**kwargs)[source]¶

Bases: ReferenceAudioEvaluator

An evaluator that uses PSTOI to compute a score.

score(utterance_id, audio_path, reference_audios, start_time=0.0, end_time=-1.0)[source]¶

Computes the PSTOI score.

Return type:: Optional[float]

class pathbench.reference_evaluator.ESTOIEvaluator(**kwargs)[source]¶

Bases: ReferenceAudioEvaluator

An evaluator that uses P-ESTOI to compute a score.

score(utterance_id, audio_path, reference_audios, start_time=0.0, end_time=-1.0)[source]¶

Computes the P-ESTOI score.

Return type:: Optional[float]

NAD (Neural Acoustic Distance)¶

class pathbench.nad_evaluator.NADEvaluator(model_id='facebook/wav2vec2-large', layer=10)[source]¶

Bases: ReferenceAudioEvaluator

An evaluator that computes the Neural Acoustic Distance (NAD) using DTW on wav2vec2 features.

score(utterance_id, audio_path, reference_audios, start_time=0.0, end_time=-1.0)[source]¶

Computes the average DTW distance between test and reference audio.

Return type:: Optional[float]
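The averaging step can be sketched on plain Python lists standing in for wav2vec2 frame features. The helpers `dtw_distance` and `average_distance` are hypothetical, and dividing the cumulative cost by the summed sequence lengths is an assumption about the normalisation, not a documented detail of `NADEvaluator`.

```python
import math
from statistics import mean

def dtw_distance(a, b):
    """Cumulative DTW cost between two feature sequences, divided by len(a) + len(b)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)          # row 0 of the cost matrix
    for x in a:
        curr = [inf]                       # column 0 is unreachable after row 0
        for j, y in enumerate(b, start=1):
            curr.append(math.dist(x, y) + min(prev[j - 1], prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / (len(a) + len(b))

def average_distance(test_features, reference_features):
    """Mean normalised DTW distance from one test utterance to all references."""
    if not reference_features:
        return None
    return mean(dtw_distance(test_features, ref) for ref in reference_features)

# Two-dimensional toy features (assumed values).
test_features = [(0.0, 0.0), (1.0, 1.0)]
references = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 1.0), (1.0, 2.0)]]
nad = average_distance(test_features, references)
```

Returning `None` for an empty reference list mirrors the `Optional[float]` return type of `score`.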

class pathbench.nad_evaluator.TrimmedNADEvaluator(model_id='facebook/wav2vec2-large', layer=10, trimmer=None)[source]¶

Bases: ReferenceTxtAndAudioEvaluator

An evaluator that computes the Neural Acoustic Distance (NAD) using DTW on wav2vec2 features. Falls back to untrimmed audio for the whole group if trimming or featurization fails for any member of the group.

score(utterance_id, audio_path, transcription, language, reference_audios, start_time=0.0, end_time=-1.0)[source]¶

Computes the average DTW distance. If trimming/featurizing fails for any audio in a group (test or any reference), it falls back to untrimmed for all.

Return type:: Optional[float]
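The all-or-nothing fallback described above can be sketched as follows. Both `prepare_group` and the toy trimmer `strip_silence` are hypothetical stand-ins; the real evaluator works on audio and wav2vec2 features rather than lists of numbers.

```python
def prepare_group(test_audio, reference_audios, trim):
    """Trim every audio in the group, or none of them if any trim fails."""
    group = [test_audio, *reference_audios]
    try:
        trimmed = [trim(audio) for audio in group]
    except Exception:
        return group, False          # fall back to untrimmed audio for all
    if any(t is None for t in trimmed):
        return group, False
    return trimmed, True

def strip_silence(samples):
    """Toy trimmer: drop leading/trailing zeros; fail on all-silent input."""
    voiced = [i for i, s in enumerate(samples) if s != 0]
    if not voiced:
        raise ValueError("nothing left after trimming")
    return samples[voiced[0]:voiced[-1] + 1]

ok_group, ok_flag = prepare_group([0, 1, 2, 0], [[0, 3, 0]], strip_silence)
bad_group, bad_flag = prepare_group([0, 1, 2, 0], [[0, 0, 0]], strip_silence)
```

Treating the group as a unit keeps the test and reference audio comparable: either all of them are trimmed or none are.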

P-ESTOI with Forced Alignment¶

class pathbench.p_estoi_evaluator.ForcedAlignmentPESTOIEvaluator(model_id='facebook/wav2vec2-xlsr-53-espeak-cv-ft', **kwargs)[source]¶

Bases: ReferenceEvaluator

An evaluator that uses P-ESTOI to compute a score after trimming silence using forced alignment.

score(utterance_id, audio_path, transcription, language, reference_audios, start_time, end_time, **kwargs)[source]¶

Computes the P-ESTOI score after trimming silence.

Return type:: Optional[float]

VSA (Vowel Space Area)¶

class pathbench.vsa_evaluator.VSAEvaluator(gender=None, visualize=False)[source]¶

Bases: LanguageAwareSpeakerEvaluator

An evaluator that computes the Vowel Space Area (VSA) for a speaker.

The algorithm is based on: -

The vowel formant data for initialization is from:

- English: Hillenbrand, J., Getty, L. A., Clark, M. J., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. JASA, 97(5), 3099-3111.
- Dutch: Adank et al. We use southern Dutch formant values due to the Belgian context.
- Italian: Bertinetto, P. M. (?) “The sound pattern of Standard Italian, as compared with the varieties spoken in Florence, Milan and Rome.”
- Spanish: Bradlow: A comparative acoustic study of English and Spanish vowels
- Mandarin: Yang: Vowel production by Mandarin speakers of English

When contributing new values, please include the reference for the paper and cross-check it. A future method could perhaps use controls for formant initialisation.
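One common way to turn vowel formants into an area is the shoelace formula over the corner vowels, sketched below. This is an assumption about the area step only: the docstring does not state which algorithm `VSAEvaluator` uses, and the formant values here are hypothetical, not taken from the cited papers.

```python
def polygon_area(points):
    """Shoelace formula for a polygon given as ordered (F2, F1) vertices."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Hypothetical mean formants in Hz for three corner vowels: (F2, F1).
corner_vowels = {"i": (2300.0, 300.0), "a": (1300.0, 750.0), "u": (900.0, 350.0)}
vsa = polygon_area([corner_vowels[v] for v in ("i", "a", "u")])
```

The result is in Hz². A smaller area is usually read as a more centralised, less distinct vowel space.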

ASR-based Evaluators¶

class pathbench.asr_evaluators.ASREvaluator(model_id)[source]¶

Bases: ReferenceTxtEvaluator

Computes WER using an ASR model.

score(utterance_id, audio_path, transcription, language, start_time=0.0, end_time=-1.0)[source]¶

Return type:: Optional[float]
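Word error rate itself is the word-level edit distance divided by the number of reference words. The sketch below shows that computation on two strings; the function names are hypothetical, the lower-casing is an assumed normalisation, and the ASR decoding step that produces the hypothesis is left out.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]

def word_error_rate(reference, hypothesis):
    """WER = edit distance over words / number of reference words."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        return None
    return edit_distance(ref_words, hyp_words) / len(ref_words)

wer = word_error_rate("the quick brown fox", "the quick brown box jumps")
```

The same `edit_distance` applied to phoneme tokens instead of words gives a phoneme error rate (PER).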

class pathbench.asr_evaluators.PEREvaluator(language)[source]¶

Bases: ReferenceTxtEvaluator

Computes PER using a language-specific ASR model.

score(utterance_id, audio_path, transcription, language, start_time=0.0, end_time=-1.0)[source]¶

Return type:: Optional[float]

class pathbench.asr_evaluators.DirectPEREvaluator[source]¶

Bases: ReferenceTxtEvaluator

Computes PER using the espeak-cv-ft model directly.

score(utterance_id, audio_path, transcription, language, start_time=0.0, end_time=-1.0)[source]¶

Return type:: Optional[float]

class pathbench.asr_evaluators.DoubleASREvaluator(language)[source]¶

Bases: ReferenceFreeEvaluator

Computes PER between greedy and LM-based CTC decoding.

score(utterance_id, audio_path, start_time=0.0, end_time=-1.0)[source]¶

Return type:: Optional[float]

Articulatory Precision¶

class pathbench.articulatory_precision_evaluator.PhoneticConfidenceEvaluator(model_id='facebook/wav2vec2-xlsr-53-espeak-cv-ft', use_exp=False)[source]¶

Bases: ReferenceFreeEvaluator

An evaluator that scores based on the model’s average confidence in its own greedy-decoded phoneme sequence (no reference text used).

score(utterance_id, audio_path, start_time=0.0, end_time=-1.0)[source]¶

Computes the phonetic confidence score.

Return type:: Optional[float]
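A rough sketch of an average-confidence score over a greedy decoding is shown below. It assumes per-frame log-probabilities from a CTC model, skips frames whose best label is the blank, and treats `use_exp` as switching between the mean log-probability and its exponential. All three choices are assumptions, and the helper `phonetic_confidence` is hypothetical.

```python
import math

def phonetic_confidence(frame_log_probs, blank_id=0, use_exp=False):
    """Mean log-probability of the greedy label over non-blank frames."""
    chosen = []
    for frame in frame_log_probs:
        best = max(range(len(frame)), key=lambda k: frame[k])
        if best != blank_id:            # skip CTC blank frames (assumption)
            chosen.append(frame[best])
    if not chosen:
        return None
    avg = sum(chosen) / len(chosen)
    return math.exp(avg) if use_exp else avg

# Three frames over a two-symbol vocabulary: index 0 is the blank.
frames = [
    [math.log(0.9), math.log(0.1)],     # blank wins, frame is ignored
    [math.log(0.2), math.log(0.8)],
    [math.log(0.2), math.log(0.8)],
]
log_score = phonetic_confidence(frames)
prob_score = phonetic_confidence(frames, use_exp=True)
```

With `use_exp=True` the result is the geometric mean of the chosen probabilities, which is easier to read than a negative log value.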

class pathbench.articulatory_precision_evaluator.ArticulatoryPrecisionEvaluator(model_id='facebook/wav2vec2-xlsr-53-espeak-cv-ft')[source]¶

Bases: ReferenceTxtEvaluator

An evaluator that uses a wav2vec 2.0 model to compute articulatory precision.

score(utterance_id, audio_path, transcription, language, start_time=0.0, end_time=-1.0)[source]¶

Computes the articulatory precision score.

Return type:: Optional[float]

Articulatory Precision with Double ASR¶

class pathbench.artp_double_asr_evaluator.ArtPDoubleASREvaluator(language, model_id='facebook/wav2vec2-xlsr-53-espeak-cv-ft')[source]¶

Bases: ReferenceFreeEvaluator

An evaluator that uses a wav2vec 2.0 model to compute articulatory precision.

score(utterance_id, audio_path, start_time=0.0, end_time=-1.0)[source]¶

Computes the articulatory precision score.

Return type:: Optional[float]

F0 Range / Pitch¶

class pathbench.f0_range_evaluator.StdPitchEvaluator[source]¶

Bases: ReferenceFreeEvaluator

An evaluator that computes the standard deviation of the pitch in semitones.

score(utterance_id, audio_path, start_time=0.0, end_time=-1.0)[source]¶

Return type:: Optional[float]
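The semitone conversion behind such a score can be sketched as follows. The helper `pitch_std_semitones` is hypothetical; it assumes unvoiced frames are marked with 0 Hz and uses the population standard deviation, neither of which is confirmed for `StdPitchEvaluator`. Pitch tracking itself is left out.

```python
import math
from statistics import pstdev

def pitch_std_semitones(f0_hz, reference_hz=100.0):
    """Standard deviation of voiced F0 values after converting Hz to semitones."""
    voiced = [f for f in f0_hz if f > 0]        # 0 marks unvoiced frames here
    if len(voiced) < 2:
        return None
    semitones = [12.0 * math.log2(f / reference_hz) for f in voiced]
    return pstdev(semitones)

# One octave apart (100 Hz and 200 Hz) with an unvoiced frame in between.
score = pitch_std_semitones([100.0, 0.0, 200.0])
```

The choice of `reference_hz` shifts all semitone values by the same amount, so it does not affect the standard deviation.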

class pathbench.f0_range_evaluator.F0RangeEvaluator[source]¶

Bases: ReferenceFreeSpeakerEvaluator

An evaluator that computes the F0 range for a speaker.

score(audio_files)[source]¶

Return type:: Optional[float]

WADA-SNR¶

pathbench.wada_snr.wada_snr(wav)[source]¶

Return type:: float32

class pathbench.wada_snr.WadaSnrEvaluator[source]¶

Bases: ReferenceFreeEvaluator

An evaluator that scores based on the WADA SNR of the audio.

score(utterance_id, audio_path, start_time=0.0, end_time=-1.0)[source]¶

Returns the WADA SNR of the audio file.

Return type:: Optional[float]

Speech Rate¶

class pathbench.speech_rate.WpmEvaluator[source]¶

Bases: ReferenceTxtEvaluator

An evaluator that scores based on the speech rate (words per minute).

score(utterance_id, audio_path, transcription, language, start_time=0.0, end_time=-1.0)[source]¶

Returns the speech rate in words per minute (WPM).

Return type:: Optional[float]
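The words-per-minute arithmetic is simple enough to show directly. The helper below is hypothetical: it counts whitespace-separated words and requires explicit segment times, whereas the evaluator's default `end_time=-1.0` presumably stands for the end of the file.

```python
def words_per_minute(transcription, start_time, end_time):
    """Word count divided by segment duration in minutes."""
    duration = end_time - start_time
    if duration <= 0:
        return None
    return len(transcription.split()) / duration * 60.0

# Five words spoken in 2.5 seconds.
wpm = words_per_minute("she had your dark suit", start_time=0.5, end_time=3.0)
```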

class pathbench.speech_rate.PraatSpeechRateEvaluator[source]¶

Bases: ReferenceFreeEvaluator

An evaluator that scores based on the speech rate (syllables per second) using a Python translation of a Praat script by de Jong and Wempe.

score(utterance_id, audio_path, start_time=0.0, end_time=-1.0)[source]¶

Returns the speech rate in syllables per second.

Return type:: Optional[float]

Age Evaluator¶

class pathbench.age_evaluator.Spk2AgeEvaluator(spk2age, utt2spk)[source]¶

Bases: LookupEvaluator

An evaluator that uses a pre-computed spk2age mapping.

score(utterance_id)[source]¶

Return type:: Optional[float]
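A lookup evaluator of this kind amounts to chaining two mappings. The sketch below assumes plain dictionaries for `utt2spk` and `spk2age` and a hypothetical helper `lookup_age`; how `Spk2AgeEvaluator` actually loads or stores these mappings is not documented here.

```python
def lookup_age(utterance_id, spk2age, utt2spk):
    """Return the speaker's age for an utterance, or None if either lookup fails."""
    speaker = utt2spk.get(utterance_id)
    age = spk2age.get(speaker)
    return float(age) if age is not None else None

# Hypothetical identifiers and ages.
utt2spk = {"utt_001": "spk_a", "utt_002": "spk_b"}
spk2age = {"spk_a": 63}
known = lookup_age("utt_001", spk2age, utt2spk)
unknown = lookup_age("utt_002", spk2age, utt2spk)
```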