Konstanz Prosodically Annotated Infant-Directed Speech (KIDS) Corpus

Description of KIDS

The KIDS corpus is the first prosodically annotated corpus of infant-directed speech in German. It is intended as a tool for formulating hypotheses and modeling acquisition processes in the prosodic domain and at the prosody-syntax interface. This multi-layered corpus consists of 524 intonation phrases (IPs) directed to infants younger than one year (196 IPs extracted from the CHILDES database; 328 IPs from our own recordings). Pitch accents (n=832) and boundary tones (n=1048) were labeled according to GToBI (Grice, Baumann & Benzmüller 2005). In addition, we annotated the presence of unaccented syllables and pitch targets before and after the accented syllable. This additional, theory-neutral annotation is important because it is not yet known whether infants are more sensitive to the pitch movement leading up to the accented syllable (the onglide) or to the pitch movement following it (the offglide). The corpus therefore captures the tonal context on both sides of the accented syllable. We also tagged the word-prosodic structure of all accented words (e.g., trochaic, iambic) and the syntactic category of both accented and unaccented words (e.g., noun, verb, adjective).
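To make the layering described above more concrete, the following is a minimal sketch of how one annotated IP might be represented in Python. It is not the corpus's actual file format: all class names, field names, and example values are hypothetical and chosen only for illustration, and only the layers named in the description are modeled. Storing two boundary tones per IP is likewise an assumption, suggested by the counts (1048 boundary tones for 524 IPs).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AccentedWord:
    """One pitch-accented word. All field names are hypothetical."""
    word: str
    pitch_accent: str           # GToBI pitch accent label, e.g. "L+H*"
    onglide: Optional[str]      # pitch target before the accented syllable, if any
    offglide: Optional[str]     # pitch target after the accented syllable, if any
    word_prosody: str           # e.g. "trochaic", "iambic"
    syntactic_category: str     # e.g. "noun", "verb", "adjective"


@dataclass
class IntonationPhrase:
    """One intonation phrase (IP). Field names are hypothetical."""
    source: str                 # "CHILDES" or "own recordings"
    boundary_tones: List[str]   # assumed: one initial and one final tone per IP
    accents: List[AccentedWord]


def count_by_category(ips: List[IntonationPhrase]) -> Dict[str, int]:
    """Count pitch-accented words per syntactic category across IPs."""
    counts = Counter(
        accent.syntactic_category for ip in ips for accent in ip.accents
    )
    return dict(counts)


# Invented example record, not taken from the corpus.
example_ip = IntonationPhrase(
    source="own recordings",
    boundary_tones=["%", "L-%"],
    accents=[
        AccentedWord(
            word="Mama",
            pitch_accent="L+H*",
            onglide="L",
            offglide="L",
            word_prosody="trochaic",
            syntactic_category="noun",
        )
    ],
)

print(count_by_category([example_ip]))
```

Keeping the onglide and offglide as separate optional fields mirrors the theory-neutral intent of the annotation: either side of the accented syllable can be queried on its own, without committing to a particular accent analysis.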

Example Annotation

Figure 1: An example annotation showing a smoothed pitch contour and all ten annotation layers, together with the corresponding sound file.