22 March 2018
Time: 2:00 - 3:00pm
Venue: Engineering W128D.2
Introducing NordFA – Forced Alignment of Nordic Languages
Queen Mary University of London
Forced alignment is a tool that has helped advance phonetic research on major languages, especially English, for over a decade. But a divide persists for smaller languages. Researchers of Nordic phonetics and phonology devote comparatively more resources to manual segmentation, which can take up to 400 times real time, or 30 seconds per phone (Yuan, Ryant, Liberman, Stolcke, Mitra & Wang, 2013). The divide is even more pronounced when comparing the utility of large corpora like DanPASS (Grønnum, 2009) with ones like the Philadelphia Neighborhood Corpus (Labov & Rosenfelder, 2011). The former has very little segmented material; the latter is entirely segmented.
To address this, I introduce Forced Alignment of Swedish (SweFA) and Forced Alignment of Danish (DanFA) as part of the Forced Alignment of Nordic Languages adaptation suite (NordFA). Norwegian is not part of the release but is planned for the future.
NordFA is adapted from the original architecture of Forced Alignment and Vowel Extraction (FAVE; Rosenfelder, Fruehwald, Evanini & Yuan, 2011). It incorporates the Hidden Markov Model Toolkit (HTK; Young, Woodland & Byrne, 1993) to create a phonetically-segmented TextGrid for Praat (Boersma & Weenink, 2017) from sound files and their orthographic transcriptions. Like the original FAVE – and unlike Prosodylab-aligner (Gorman, Howell, & Wagner, 2011) or MAUS (Schiel, 2015) – NordFA does not require the one-by-one input of intonational phrases. It accepts an entire sound file of any duration.
SweFA is the more developed of the two and currently exists for Stockholm Swedish. Its phonetic dictionary contains more than 2.9 million entries, which include elided and syncopated pronunciations (konstnärerna >> konsnärna), inflected forms (prata, pratar), and the most common compound words (otrevlig, jättetrevlig). It also includes multiethnolectal entries. In addition, it has a “powersandher” that identifies retroflex coalescence (för sig >> fö rsig) and apocope (ringde >> ringd). It codes vowels for lexical pitch accent 1, lexical pitch accent 2, and compound-word pitch accent 2.
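The retroflex coalescence example above (för sig >> fö rsig) can be illustrated with a small sketch. This is not SweFA's implementation: the function name `resyllabify_r`, the set of triggering consonants, and the purely orthographic treatment are all assumptions made for illustration.

```python
# Hypothetical illustration only; not the rule set used by SweFA.
# Assumption: a word-final "r" coalesces with a following word-initial
# s, d, t, n or l, and is therefore moved across the word boundary.
RETROFLEX_TARGETS = set("sdtnl")


def resyllabify_r(words):
    """Move a word-final 'r' onto the next word when that word begins
    with one of the assumed triggering consonants."""
    out = list(words)
    for i in range(len(out) - 1):
        current, following = out[i], out[i + 1]
        if (
            len(current) > 1
            and current.endswith("r")
            and following[:1] in RETROFLEX_TARGETS
        ):
            out[i] = current[:-1]          # "för" -> "fö"
            out[i + 1] = "r" + following   # "sig" -> "rsig"
    return out


print(resyllabify_r(["för", "sig"]))
```

A real system would operate on phonemic transcriptions rather than spelling, since orthographic "r" does not always correspond to a pronounced /r/.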
Tested on a casual speech recording of young multiethnolectal men in Stockholm, the phonetic dictionary covered 99.8% of all words (n=6,284). Compared with manual alignment for 606 monophones, mean boundary displacements were 0.021 seconds at onsets and 0.020 seconds at offsets. Root mean square deviations were 0.030 seconds and 0.029 seconds for onsets and offsets, respectively.
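For readers unfamiliar with these accuracy measures, the following is a minimal sketch of how a mean boundary displacement and a root mean square deviation could be computed from paired boundary times. It assumes the displacement is taken as an absolute difference in seconds, which the abstract does not specify; the function name `boundary_errors` and the sample values are invented for illustration.

```python
import math


def boundary_errors(manual, automatic):
    """Return (mean absolute displacement, RMSD) in seconds for
    paired manual and automatic boundary times."""
    if not manual or len(manual) != len(automatic):
        raise ValueError("need two non-empty lists of equal length")
    diffs = [a - m for m, a in zip(manual, automatic)]
    # Assumption: displacement is measured as an absolute difference.
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_abs, rmsd


# Invented boundary times (seconds), not data from the study.
manual_onsets = [0.10, 0.50, 1.00]
aligned_onsets = [0.12, 0.47, 1.01]
mean_abs, rmsd = boundary_errors(manual_onsets, aligned_onsets)
print(f"mean displacement: {mean_abs:.3f} s, RMSD: {rmsd:.3f} s")
```

The RMSD weights large misalignments more heavily than the mean does, which is why it is never smaller than the mean absolute displacement.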
DanFA’s pronunciation dictionary contains over 200,000 entries and covers 99.5% (n=53,976) of the dialogue transcriptions in DanPASS (Grønnum, 2009). Multiethnolectal slang and schwa-assimilated pronunciations are not yet included but will be part of the next release. A prototype test on 144 Copenhagen monophones has yielded promising results.
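The dictionary coverage figures reported for SweFA and DanFA can be understood as the share of transcribed word tokens that have an entry in the pronunciation dictionary. The sketch below assumes coverage is counted per token and that lookup is case-insensitive; neither detail is stated in the abstract, and the name `dictionary_coverage` and the sample words are hypothetical.

```python
def dictionary_coverage(tokens, dictionary):
    """Return the fraction of word tokens found in the dictionary."""
    if not tokens:
        raise ValueError("no tokens to evaluate")
    # Assumption: matching ignores case and counts every token,
    # including repeated words.
    covered = sum(1 for token in tokens if token.lower() in dictionary)
    return covered / len(tokens)


# Toy example; "xyzzy" stands in for an out-of-dictionary word.
entries = {"jeg", "kan", "ikke", "se"}
transcript = ["Jeg", "kan", "ikke", "se", "xyzzy"]
print(f"coverage: {dictionary_coverage(transcript, entries):.1%}")
```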
Boersma, P. & Weenink, D. (2017). Praat: doing phonetics by computer [computer program]. Version 6.0.29, retrieved 24 May 2017 from http://www.praat.org/
Gorman, K., Howell, J. & Wagner, M. (2011). Prosodylab-aligner: A tool for forced alignment of laboratory speech. Canadian Acoustics, 39(3), 192-193.
Grønnum, N. (2009). A Danish phonetically annotated spontaneous speech corpus (DanPASS). Speech Communication, 51(7), 594-603.
Labov, W. & Rosenfelder, I. (2011). The Philadelphia Neighborhood Corpus of LING 560 studies, 1972-2010. With support of NSF contract 921643.
Rosenfelder, I., Fruehwald, J., Evanini, K. & Yuan, J. (2011). F.A.V.E. (Forced Alignment and Vowel Extraction) [computer program]. Retrieved from http://fave.ling.upenn.edu.
Schiel, F. (2015). A statistical model for predicting pronunciation. In Proceedings of the International Congress of Phonetic Sciences, Glasgow, United Kingdom, paper 195.
Young, S. J., Woodland, P. C. & Byrne, W. J. (1993). HTK Version 1.5: User, Reference and Programmer Manual. Washington DC: Entropic Research Laboratories.
Yuan, J., Ryant, N., Liberman, M., Stolcke, A., Mitra, V. & Wang, W. (2013). Automatic phonetic segmentation using boundary models. In Proceedings of Interspeech, 2306-2310.