Annotation - Data Analysis

Two trained annotators (the first and last authors of Zahner, Schönhuber, Grijzenhout & Braun (2016)) labeled the corpus together using Praat (Boersma & Weenink 2014). For each wav file, a corresponding TextGrid file consisting of ten tiers was created (see the example annotation below). In the following, we specify the information provided on each tier and indicate whether it is annotated on an interval tier or a point tier.

  1. Intended representation of utterance 
    (orthographic transcription in German; interval tier)

  2. Actual realization of utterance 
    (orthographic transcription; e.g., "habn" for "haben"; interval tier)

  3. Word category of both accented and unaccented words 
    (simple categories, e.g., "adj" for adjective; (word category labels); point tier)
     
  4. Word category of both accented and unaccented words 
    (more detailed categories, following the guidelines of the STTS (Stuttgart-Tübingen Tagset; Schiller, Teufel, Stöckert & Thielen 1999), e.g., "ADJA" for an adjective in attributive position or "ADJD" for an adjective used predicatively or adverbially; (word category labels); point tier)

  5. Accented syllables 
    (orthographic transcription; interval tier)

  6. Word-prosodic structure of the accented word 
    (point tier)
     
    • S: primary stressed syllable (e.g., S for "Maus")
    • W: unstressed, weak syllable (e.g., SW for "'Mama" or WS for "Mu'sik")
    • s: secondary stressed syllable (typically in compounds, e.g., SWsW for "'Sandel,eimer")
  7. Prosodic domain of accent 
    (indication of availability of unaccented syllables to the left or right of the accented syllable (=a) on which leading or trailing tones could be realized; 1 = unaccented syllables available; 0 = no unaccented syllables available; analysis is performed irrespective of word boundaries; point tier)

    • 0a0: accented syllable is immediately surrounded by other accented syllables or boundary tones (e.g., % "NEIN" %; capitalization indicates the accented syllable; % indicates an IP boundary)
    • 1a1: accented syllable has at least one unaccented syllable to its right and its left (e.g., "geSCHLAfen", "was MACHST du", "der RAsselt")
    • 0a1: accented syllable has at least one unaccented syllable to its right and is preceded by another pitch accent or boundary tone (e.g., % "KAtze", % "SCHAU mal", % "HINsetzen")
    • 1a0: accented syllable has at least one unaccented syllable to its left and is immediately followed by another pitch accent or boundary tone (e.g., "MuSIK" %, "mit SAND" %)
  8. GToBI annotation 
    (pitch accents, IP boundaries, and ip boundaries are annotated; point tier)
     
  9. Tritonal pattern analysis 
    (for the 1a1 condition (unaccented material available on both sides of the accented syllable): indication of the tonal surrounding on both sides of the accented syllable; point tier)
    For more details on the motivation for this analysis and the precise labeling conventions, see Zahner, Pohl & Braun (2015) and Zahner, Schönhuber, Grijzenhout & Braun (2016).

    - Similar to ToBI (Silverman et al. 1992), the tone associated with the accented syllable is marked by an asterisk (e.g., LH*L, HH*L)

    - If the preceding or following tonal target is not associated with a syllable adjacent to the accented syllable, this separation of tonal targets is indicated by ".." (e.g., LH*..L)
     
  10. Comments
    (e.g., "overlaid speech", "onomatopoetica", "breathy voice", "extraordinary wide/narrow pitch range"; point tier)

More details on the data analysis can be found in the paper introducing the KIDS Corpus (Zahner, Schönhuber, Grijzenhout & Braun 2016).
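For readers who want to process the tier 9 labels automatically, the sketch below splits a tritonal label such as "LH*..L" into its three tonal targets and records whether each flanking target is adjacent to the accented syllable. It is an illustration only. The function name `parse_tritonal`, the restriction to single "L" and "H" targets, and the placement of ".." before the starred tone (as in "L..H*L") are assumptions, since the description above shows only "LH*L", "HH*L", and "LH*..L". The cited papers remain the reference for the full label inventory.

```python
import re

# Assumption: each tonal target is a single "L" or "H"; ".." may separate the
# starred (accentual) tone from the preceding or the following target.
TRITONAL = re.compile(
    r"^(?P<pre>[LH])(?P<pre_gap>\.\.)?(?P<acc>[LH])\*(?P<post_gap>\.\.)?(?P<post>[LH])$"
)


def parse_tritonal(label):
    """Split a tier 9 label into its tonal targets and adjacency flags."""
    match = TRITONAL.match(label)
    if match is None:
        raise ValueError(f"not a recognized tritonal label: {label!r}")
    return {
        "preceding": match["pre"],
        "accented": match["acc"],
        "following": match["post"],
        # A ".." marks a target that is NOT on a syllable adjacent to the accent.
        "preceding_adjacent": match["pre_gap"] is None,
        "following_adjacent": match["post_gap"] is None,
    }


parsed = parse_tritonal("LH*..L")
```

Here `parsed["following_adjacent"]` is `False`, reflecting the ".." between the starred tone and the following low target.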

 

 

Example annotation


Figure 1: An example annotation showing a smoothed pitch contour and all ten annotation layers, together with the corresponding sound file.