The robustness of this task can be improved by applying transforms like SPLICE and CMVN. An architecture of composed weighted finite-state transducers is used for this task by the open-source package Kaldi, which does not support the frequently used CTC loss function in alignment during this task. Classical algorithms for this task extract 39 MFCC features for each time frame of a sliding window, then feed them into an (*) GMM-HMM. Context-dependent models are used in this task to account for allophones. Probabilistic methods for this task compute the product of an acoustic model and a language model to select the most likely word sequence given a sound signal. For 10 points, speech synthesis is the reverse of what natural language processing task used by digital assistants like Siri to process user input? ■END■
ANSWER: speech recognition [or automatic speech recognition or ASR; accept transcription or speech-to-text or STT; accept speech alignment; prompt on speech processing or natural language processing until read; prompt on vocal recognition or voice recognition; prompt on answers like understanding speech; reject "text-to-speech" or "TTS" or "speech synthesis"; reject "speaker identification" or "speaker verification" or "vocal identification"]
<VD, Other Science>