\label{appendix:a}
## Stimulus Considerations {-}
The following list summarizes the considerations that guided the construction of the stimulus set (see the full set in Table \@ref(tab:targetlist)):
* Glides were excluded from the experimental set due to their complex status, which is dependent on both structure and theory. A glide in the nucleus is considered a vowel (sometimes called *semi-vowel*), while a glide immediately adjacent to a nuclear vowel may be analyzed as a vowel in the nucleus, or as a consonant in the onset or coda positions, depending on language and analysis (namely, this depends on whether the language is considered to feature *diphthongs* or not, in itself not always a simple determination). Furthermore, clusters with glides in C~1~ are predicted to be ill-formed in all the models we consider, while clusters with glides in C~2~ are predicted to be well-formed in all of them. We therefore also do not expect them to be very informative with respect to our comparison.
* For the class of liquids, only the lateral /l/ is used, disregarding the sub-class of *rhotics*, which are phonetically very varied and highly inconsistent across languages in terms of phonetic detail [see, e.g., @lindau1980storysk; @ladefoged1996rhotics; @wiese2001phonology]. In that context, it is important to note that the set of stimuli used in this study was created with the intention of being used with speakers of many different languages, in which the relevant segments can map to native segments to a comparable degree. There are therefore no liquid plateaus in the experimental set.
* The alveolar /s/ is used for the class of voiceless sibilants (voiceless coronal fricatives). In C~1~ positions, the post-alveolar /ʃ/ is also used to control for potential language-specific effects that may appear due to selective restrictions on /s/. For example, in German /ʃC/ onset clusters can be licit, while /sC/ onset clusters occur only marginally in loanwords.
* Stops are used only in C~2~ position, and only voiceless stops are used in order to keep the size of the stimulus set reasonably small. Stops in C~1~ position are avoided since it is also the phrase-initial position of the stimuli, which is practically devoid of acoustic cues for the closure phase of the stop. Within the stream of speech, the movement of articulators towards the target of a stop's closure phase leaves auditory traces from the preceding segment and into the closure of the stop, containing important information about the identity of the stop [e.g. @barry1984place]. In that sense, a stop in C~1~ position at the beginning of a phrase contains only a transient release burst.
Furthermore, note that all the stop-initial clusters (with the exception of *stop-stop* plateaus) are generally well-formed according to all the sonority models tested here, such that their added value in this comparison would have been smaller than their cost (in terms of the size of the stimulus set).
* The set includes one instance of the dorsal consonant /k/ instead of the coronal /t/ as an alternative to a labial-coronal cluster, which would have required a labial liquid. Instead, the *coronal-dorsal* cluster /lk/ is used to retain the same direction of a labial-coronal cluster---both are front-to-back in terms of places of articulation. There are no other dorsals in the set (i.e., no fricative, nasal or liquid dorsal), as these tend to be relatively more marked and less consistent between languages.
* Lastly, sequences of obstruents that differ in voicing are avoided due to the cross-linguistic tendency of obstruent clusters to agree in voicing [see, e.g., @cho1990typologysk], although note that German allows /ʃv/ and /t͡sv/
clusters while banning /ʃf/.
## Audio Recordings {-}
Audio stimuli for the experiment were recorded by a phonetically trained native Hebrew speaker, AA, and a phonetically trained native German speaker, HN, in a sound-attenuated booth at the phonetics laboratory of the University of Cologne.
Speech was recorded via a head-mounted condenser headset microphone (AKG C420), capturing mono digital audio at a 44.1 kHz sample rate and 24-bit depth with a *Metric Halo MIO 2882* audio interface. Selected audio takes were treated at the original high resolution for DC offset correction and compression of ultra-low frequencies below 52 Hz (to compensate for some room reverberation effects). The audio was then downgraded from 24-bit to 16-bit depth with dithering (*Goodhertz Good Dither*) for use in the perception task running on *OpenSesame* 3.1.9 [@mathot2012opensesame]. The audio that was submitted to analyses by the *APP Detector* (see *Obtaining Periodic Energy Data* below) was also downsampled to a 16 kHz sample rate. Finally, all audio takes, at all resolutions, were normalized to the same RMS target of -20 dBFS (*dB Full Scale*).
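For illustration, the RMS normalization step can be reproduced in a few lines of base R. The following is a minimal sketch, not the actual processing chain (which used dedicated audio tools); the signal `x` is a hypothetical mono sample vector scaled to the range $[-1, 1]$.

```r
## Minimal sketch of RMS normalization to a -20 dBFS target (illustrative only).
## `x` stands in for a mono signal with samples scaled to [-1, 1].
set.seed(1)
x <- 0.1 * sin(2 * pi * 220 * seq(0, 1, by = 1 / 44100)) +
  rnorm(44101, sd = 0.005)

rms_dbfs  <- 20 * log10(sqrt(mean(x^2)))     # current RMS level re. digital full scale
target_db <- -20                             # the -20 dBFS target used for all takes
x_norm    <- x * 10^((target_db - rms_dbfs) / 20)
20 * log10(sqrt(mean(x_norm^2)))             # check: approximately -20
```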
To record the stimuli, the combined 87 word-like stimuli (29 targets and 58 fillers) were embedded within carrier sentences in non-final position and produced with default declarative intonation in order to maintain consistent prosody. Carrier sentences were also designed to minimize potential effects of resyllabification as well as co-articulation by controlling the segmental makeup immediately preceding and following target words (see examples in (\@ref(ex:sentence1)--\@ref(ex:sentence2))).
The original sentence elicitation lists are available at the OSF repository in mixed Hebrew, German and phonemic transcripts in the form of slide presentations that were used in this self-paced task (see *Supplementary Data* below).
\begin{exe}
\ex \textbf{/ze maʁˈgiʃ \emph{CCal} kaˈʁe.ga/} (Hebrew: ‘it feels (like) \emph{CCal} at the moment’) \label{ex:sentence1}
\ex \textbf{er muss \emph{CCal} kaufen} (German: ‘he must buy \emph{CCal}’) \label{ex:sentence2}
\end{exe}
## Obtaining Periodic Energy Data {-}
Continuous measurements of periodic energy were extracted from the acoustic signals using the *Aperiodicity, Periodicity and Pitch (APP) Detector*, a computer program introduced in @deshmukh2003detectionsk and developed further in subsequent publications [@deshmukh2005use; @vishnubhotla2007detection].
The APP Detector measures the spectral distribution of periodic energy from digital audio files with a 16 kHz sample rate, effectively measuring periodic energy up to 8 kHz (the Nyquist frequency).
The periodic energy data were exported from the APP Detector's Matlab analysis tables into R [@R-base] for further data manipulation, visualization, modelling, and statistical analysis.
To obtain the periodic energy curve, it is necessary to first sum over the different frequencies that the APP Detector measures at each time point (every 10 ms) to create a time series of *periodic power*.
Next, a smoothed curve is fitted to the periodic power time series with *Tukey's (Running Median) Smoothing* ("3RS3R"), to eliminate small-scale fluctuations in the periodic power curve.
Finally, the periodic power time series is log-transformed to yield *periodic energy*, as shown in Equation \@ref(eq:perLog).
Within the log-transform function we can plug a value that reflects the threshold of effective voicing periodicity to set a meaningful zero for the periodic energy floor---the *periodic floor* in \@ref(eq:perLog).
This is similar to the standard *dB SPL* measurement (SPL stands for *Sound Pressure Level*), which plugs a generic value representing the threshold of human hearing in terms of sound pressure into the denominator of the log-transform function. In this way, SPL provides a shared reference, with a meaningful zero value, for different dB measurements.
In the case of the present periodic energy measurement, the floor threshold is not a universal constant but a calibration that allows us to take the audio quality and the inner workings of the APP Detector into account.
The effective periodicity threshold for the log-transform of the periodic energy time series was determined by extracting the maximal periodic power value obtained for voiceless portions in the set (to be sure that there was no marginal voicing in this sample, only voiceless C~1~ consonants that precede another voiceless consonant in C~2~ were measured). In this way, the value $\mathit{0}$ in our periodic energy curve is optimally calibrated to reflect the floor of pitch-related periodic components in the signal.
\begin{equation}
periodic~energy = 10 \cdot \log_{10}\!\left(\frac{periodic~power}{periodic~floor}\right) \label{eq:perLog}
\end{equation}
Note that the periodic energy curve is smoothed further with Local Polynomial Regression Fitting (*loess*) in the figures. This is only used for additional visual clarity. All the acoustic analyses are based on the periodic energy curve before this final aesthetic smoothing (and after the other processes mentioned above). All the periodic energy curves in the figures are normalized to fit a $0$--$1$ scale.
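For illustration, the following is a minimal R sketch of the processing steps described above: summing across frequency bands, Tukey smoothing, the log-transform against the periodic floor, and the figure-only loess smoothing and $0$--$1$ normalization. It is not the project's analysis code; the band matrix `pp_bands` and the floor value are toy stand-ins for the APP Detector output and the calibrated threshold.

```r
## Illustrative sketch of the periodic energy pipeline (toy values, not the OSF code).
## `pp_bands`: one row per 10-ms frame, one column per frequency band of periodic power.
set.seed(1)
pp_bands <- matrix(abs(rnorm(200 * 8, mean = 0.02, sd = 0.02)), nrow = 200, ncol = 8)

periodic_power <- rowSums(pp_bands)                                    # sum across frequency bands
periodic_power <- as.numeric(smooth(periodic_power, kind = "3RS3R"))   # Tukey's running-median smoothing

## stand-in for the calibrated threshold (in the study: the maximal periodic power
## found in assuredly voiceless stretches); values at or below it are floored to zero here
periodic_floor  <- 0.05
periodic_energy <- 10 * log10(pmax(periodic_power, periodic_floor) / periodic_floor)

## figure-only post-processing: loess smoothing and 0--1 normalization
t_sec   <- seq_along(periodic_energy) * 0.01                 # 10-ms analysis frames
pe_fig  <- predict(loess(periodic_energy ~ t_sec, span = 0.1))
pe_norm <- (pe_fig - min(pe_fig)) / (max(pe_fig) - min(pe_fig))
```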
The code for all of the above processes is available in the OSF repository (see *Supplementary Data* below).
## Perception Task Details {-}
Recall that the experiments were designed as a two-alternative forced-choice perception task, in which accuracy and response time information were collected.
To normalize response times, the response-time measurement in each trial started at the midpoint of the transition from C~2~ to /a/, illustrated by the location of the dash in *(ə)C~1~(ə)**C~2~--a**l*. This zero time point was determined individually for each of the 87 stimuli, capitalizing on the fact that all stimuli share the rime *al*, which is fully predictable in the context of the experiment, in contrast to the unpredictability of the preceding material (the predictability of the *al* rime was assumed to become evident already in the training phase, before any data were collected for analysis). Manual segmentations conducted by the first author were used to determine this point for each target. Ultimately, response times shorter than 100 ms (i.e., 100 ms after the zero point between C~2~ and /a/) were considered too fast to be valid and were therefore excluded. This threshold led to only one observation being excluded, from Experiment 2.
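As a sketch, the alignment and exclusion rule could be implemented in R as follows; the data frame and its values are hypothetical, with `zero_ms` standing for the per-stimulus zero point from the manual segmentations.

```r
## Toy sketch of the response-time alignment and 100 ms exclusion rule (not the project code).
resp <- data.frame(
  stimulus = c("item_01", "item_02", "item_03"),   # hypothetical stimulus labels
  rt_raw   = c(640, 310, 980),                     # ms from audio onset (toy values)
  zero_ms  = c(280, 255, 300)                      # per-stimulus C2-to-/a/ zero point
)
resp$rt <- resp$rt_raw - resp$zero_ms   # response time relative to the zero point
resp    <- resp[resp$rt >= 100, ]       # responses faster than 100 ms are excluded
```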
Participants were seated in a quiet room in front of a laptop computer (a MacBook Air 13-inch, Early 2014) running the experiment on OpenSesame 3.1.9 [@mathot2012opensesame], where they listened to the stimuli through a set of closed headphones (Sennheiser HD 201), fed directly from the laptop's internal audio interface. After verifying that participants shared a standard understanding of the notion of the syllable with a few German or Hebrew examples of words with one and two syllables (e.g., German *See*, *Spaß*, *Quark*, *Angst* vs. *Schu-le*, *Kin-der*, *Bre-zel*, *Pflau-men*), they were instructed to listen to nonce words in an "unknown" foreign language.
The German-speaking listeners in Experiments 1--2 heard the speech recordings of the native Hebrew speaker, and the Hebrew-speaking listeners in Experiment 3 heard the speech recordings of the native German speaker, in order to promote bottom-up inferences (see Subsection \@ref(sec:rationale)).
Participants were instructed to indicate quickly and accurately whether they heard one or two syllables by using their left and right index fingers to choose '1' or '2' at the location of the 'F' (for '1') and 'J' (for '2') keys on a QWERTY keyboard layout (the relevant keys were covered with salient red-on-white '1' and '2' stickers).
A training session of ten trials preceded the experimental blocks, allowing the participants to familiarize themselves with the task, and allowing the experimenter to adjust listening volume and monitor potential problems and misunderstandings regarding the task.
This study was carried out in accordance with the recommendations of the Local Ethics Committee
of the University of Cologne
with written informed consent from all subjects. The experiments complied with the June 1964 Declaration of Helsinki (issued by the World Medical Association and entitled “Ethical Principles for Medical Research Involving Human Subjects”), as last revised.
Ethics approval was obtained by the Principal Investigator (Prof. Dr. Martine Grice).
## Supplementary Data {-}
Supplementary data to this article can be found online on *Open Science Framework* (OSF) at <https://osf.io/y477r/>.
## References {-}