---
title : "Modeling sonority in terms of pitch intelligibility with the Nucleus Attraction Principle"
shorttitle : "Modeling sonority with the Nucleus Attraction Principle"
author:
  - name          : "Aviad Albert"
    affiliation   : "1"
    corresponding : yes    # Define only one corresponding author
    address       : "Department of Linguistics -- Phonetics, University of Cologne, Herbert-Lewin-Straße 6, 50931 Cologne, Germany"
    email         : "[email protected]"
  - name          : "Bruno Nicenboim"
    affiliation   : "2,3"

affiliation:
  - id            : "1"
    institution   : "Department of Linguistics -- Phonetics, University of Cologne"
  - id            : "2"
    institution   : "Department of Cognitive Science and Artificial Intelligence, Tilburg University"
  - id            : "3"
    institution   : "Department of Linguistics, University of Potsdam"
authornote: |
  Author Note: In accordance with the Peer Reviewers' Openness Initiative (opennessinitiative.org), all materials and scripts associated with this manuscript were made available during the review process and will remain available at the following OSF repository: https://osf.io/y477r/.
abstract: |
  *Sonority* is a fundamental notion in phonetics and phonology, central to many descriptions of the syllable and to various useful predictions in phonotactics.
  Although widely accepted, sonority lacks a clear basis in speech articulation or perception, given that traditional formal principles in linguistic theory are often exclusively based on discrete units in symbolic representation and are typically not designed to be compatible with auditory perception, sensorimotor control, or general cognitive capacities.
  Moreover, traditional sonority principles exhibit systematic gaps in empirical coverage.
  Against this backdrop, we propose combining symbol-based and signal-based models to account for sonority in a complementary manner.
  We claim that sonority is primarily a perceptual phenomenon related to pitch, driving the optimization of syllables as pitch-bearing units in all language systems. We suggest a measurable acoustic correlate for sonority in terms of *periodic energy*, and we provide a novel principle that can account for syllabic well-formedness, the *Nucleus Attraction Principle* (NAP).
  We present perception experiments that test our two NAP-based models against four traditional sonority models, using a Bayesian data analysis approach to test and compare them. Our symbolic NAP model outperforms all the other models we test, while our continuous bottom-up NAP model ties for second place with the best-performing traditional models.
  We interpret the results as providing strong support for our proposals:
  (i) the designation of periodic energy as the acoustic correlate of sonority;
  (ii) the incorporation of continuous entities in phonological models of perception; and
  (iii) the dual-model strategy that separately analyzes symbol-based top-down processes and signal-based bottom-up processes in speech perception.
keywords : "Sonority; Pitch intelligibility; Periodic energy; Bayesian data analysis; Speech perception; Phonetics and Phonology"
wordcount : "X"
bibliography : ["bibs/r-references.bib", "bibs/methods.bib", "bibs/phon.bib", "bibs/phon_sk.bib"]
appendix:
  - "./appendix_a.Rmd"
  - "./appendix_b.Rmd"
floatsintext : no
figurelist : no
tablelist : no
footnotelist : no
linenumbers : no
mask : no
draft : no
numbersections : yes
documentclass : "apa6"
classoption : "doc"
output            :
  papaja::apa6_pdf:
    latex_engine: xelatex
    includes:
      in_header: load.tex
    keep_tex: yes
---
```{r setup, include = FALSE}
library("papaja")
```
```{r knitush, cache=FALSE,include=FALSE}
# global chunk options
knitr::opts_chunk$set(cache=TRUE, autodep=TRUE,fig.path='figure/graphics-', fig.align='center')#, dev="cairo_pdf")
# options(tinytex.clean=TRUE)
```
```{r libraries, message = FALSE}
library(R.matlab)
library(ggplot2)
library(dplyr)
library(Cairo)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
library(stringr)
library(readr)
library(purrr)
library(tidyr)
library(loo)
library(brms)
library(ggrepel)
```
```{r prepare-mat-per, include=FALSE}
# 60 x length of the audio file binned each 10 ms; 60 frequency bins with 10 ms for each column
dir_mats <- c("data_tables/APPd_txt_matrices/AA/",
"data_tables/APPd_txt_matrices/HN/")
files_mat <- list.files(path=dir_mats, pattern="*.txt",full.names=TRUE)
#creates a dataframe with 3 columns, the name of the syllable, syl, the time point, t, and the periodic energy, p
per_df <- map_dfr(files_mat, function(f){
#this is like a loop, it takes each file inside the f and does the following:
# Read the file
mat <- read_csv(f,col_names = FALSE,
col_types = cols(.default = col_double()))
#Sum of every column of the matrix
# vector_weights <- c(rep(1,8),rep(1.1,2),rep(1.3,3),rep(1.4,4),rep(1.5,5), rep(1.4,3), rep(1.3,5),rep(1.2,6),rep(1.1,10),rep(1,15))
#mat <- mat * vector_weights
per <- colSums(mat)
#Extract the name of the syllable form the filename: looks for //(syllable).
filename <- str_match(f,"//(.*?)_(.*?)\\.")
syl <- filename[,2]
speaker <- filename[,3]
tibble(syl=syl,t=(0:(length(per)-1))*10,per=per,speaker=speaker)
})
# Periodic energy at every time point
per_df_full <- per_df %>% group_by(syl,speaker) %>%
mutate(smooth_per = unclass(smooth(per,"3RS3R")))
#head(per_df_full)
# ```
#
# ```{r prepare-seg, include=FALSE}
dir_seg <- c("data_tables/praat_seg/AA/",
"data_tables/praat_seg/HN/")
files_praat <- list.files(path=dir_seg, pattern="*.txt",full.names=T)
seg_df <- map_dfr(files_praat, function(f){
#this is like a loop, it takes each file inside the f and does the following:
# filename <- str_match(f,"/([^/]*?)_(.*?)_(.*?)\\.")
filename <- str_match(f,"//(.*?)_(.*?)_(.*?)\\.")
seg <- read_tsv(f, col_types =cols(
rowLabel = col_character(),
tmin = col_double(),
text = col_double(),
tmax = col_double()
)) %>% select(-rowLabel) %>%
mutate(syl = filename[,2],
speaker = filename[,3],
text = str_extract_all(syl, ".")[[1]],
position=row_number(),
t = map2(tmin,tmax, ~ round(seq(.x,.y,.01)*1000)) ) %>%
tidyr::unnest(cols = c(t)) %>%
select(-tmin, -tmax)
})
syl_info <- left_join(per_df_full,seg_df,by = c("syl", "t", "speaker"))
## head(syl_info)
## syl t per speaker smooth_per text position
## <chr> <dbl> <dbl> <chr> <dbl> <chr> <int>
## 1 cefal 0 0 AA 0 c 1
## 2 cefal 10 2.37 AA 2.37 c 1
## 3 cefal 20 4.97 AA 4.97 c 1
## 4 cefal 30 5.80 AA 4.99 c 1
## 5 cefal 40 4.99 AA 4.99 c 1
## 6 cefal 50 4.06 AA 4.06 c 1
# ```
#
# ```{r log-transform, include=FALSE}
subset_voiceless_thresh <- filter(syl_info,
#nchar(as.character(syl))>=4,
position==1,
syl %in% c("cfal","cpal","fsal","ftal","sfal","spal"))
per_thresh <- max(subset_voiceless_thresh$per)
syl_info <- group_by(syl_info,syl,speaker) %>%
mutate(log_per = ifelse(smooth_per<per_thresh, 0,
10*log10(smooth_per/per_thresh)))
## print(syl_info,n=100)
# ```
#
# ```{r, cog}
syl_info <- syl_info %>% group_by(syl, speaker) %>%
mutate(com_syl = sum(log_per*t)/sum(log_per), # position of CoM of the whole syllable in time
t_left_syl = ifelse(t <= com_syl,t,0),
com_onset = sum(log_per*t_left_syl)/sum(log_per*(t_left_syl>0)),
NAP_bu = -(com_syl - com_onset), #flipped sign
NAP_bu_rel = -(com_syl - com_onset)/com_syl) %>%
select(-t_left_syl)%>%
select(NAP_bu, everything())
# ```
#
# ```{r, monosyl, include=FALSE}
monosyl_info_AA <- filter(syl_info,
nchar(as.character(syl))==4,
speaker == "AA")
monosyl_info_AA$text[which(monosyl_info_AA$text=="c")] <- "ʃ"
monosyl_info_AA$syl <- as.factor(monosyl_info_AA$syl)
monosyl_info_AA <- mutate(group_by(monosyl_info_AA,syl),
## loess smoothing
smog_per = predict(loess(log_per~t, data=monosyl_info_AA$syl, span=0.19, degree = 1, na.rm=T)))
### change negatives to 0
monosyl_info_AA$smog_per[(monosyl_info_AA$smog_per<0)]=0
monosyl_info_AA <- mutate(group_by(monosyl_info_AA, syl, position),
pos_mid = round(mean(t),-1),
pos_end = ifelse(position<4, max(t), NA))
monosyl_info_AA <- mutate(group_by(monosyl_info_AA, syl),
ylim_com_ons = ifelse(t==round(com_onset,-1)&t<lead(t), smog_per, NA),
ylim_com_syl = ifelse(t==round(com_syl,-1)&t<lead(t), smog_per, NA),
x_com_ons = ifelse(t==round(com_onset,-1)&t<lead(t), com_onset, NA),
x_com_syl = ifelse(t==round(com_syl,-1)&t<lead(t), com_syl, NA))
```
```{r monosylHN, include=FALSE}
monosyl_info_HN <- filter(syl_info,
nchar(as.character(syl))==4,
speaker == "HN")
monosyl_info_HN$text[which(monosyl_info_HN$text=="c")] <- "ʃ"
monosyl_info_HN$syl <- as.factor(monosyl_info_HN$syl)
monosyl_info_HN <- mutate(group_by(monosyl_info_HN,syl),
## loess smoothing
smog_per = predict(loess(log_per~t, data=monosyl_info_HN$syl, span=0.19, degree = 1, na.rm=T)))
### change negatives to 0
monosyl_info_HN$smog_per[(monosyl_info_HN$smog_per<0)]=0
monosyl_info_HN <- mutate(group_by(monosyl_info_HN, syl, position),
pos_mid = round(mean(t),-1),
pos_end = ifelse(position<4, max(t), NA))
monosyl_info_HN <- mutate(group_by(monosyl_info_HN, syl),
ylim_com_ons = ifelse(t==round(com_onset,-1)&t<lead(t), smog_per, NA),
ylim_com_syl = ifelse(t==round(com_syl,-1)&t<lead(t), smog_per, NA),
x_com_ons = ifelse(t==round(com_onset,-1)&t<lead(t), com_onset, NA),
x_com_syl = ifelse(t==round(com_syl,-1)&t<lead(t), com_syl, NA))
```
<!-- ## Data analysis {#sec:datanlysis} -->
```{r setZeroRT, include=FALSE, message = FALSE}
syl_t <- filter(syl_info) %>%
# Aviad: no idea why we are subsetting by "a"
filter(text=="a" ) %>%
group_by(NAP_bu, syl, speaker) %>%
summarize(tmin = min(t))
other_scores <- read_tsv(file="data_tables/CCal_model_predictions_fix.tsv",
col_types = cols(
syl = col_character(),
SSP = col_integer(),
SSP_obs = col_integer(),
MSD = col_integer(),
MSD_obs = col_integer(),
NAP_td = col_integer()
))
scores_full <- left_join(syl_t, other_scores) %>% ungroup() %>%
mutate(NAP_bu = ifelse(nchar(syl)<5,NAP_bu,NA_real_))
```
```{r openSesameFunction, include=FALSE}
read_list_opensesame <- function(list_files, speaker = "AA", scores_tbl= scores_full){
GLIDES <- c("wlal","wnal","wzal","wsal","wtal","jmal","jval","jfal","jpal", "welal","wenal","wezal","wesal","wetal","jemal","jeval","jefal","jepal")
scores_speaker <- scores_tbl[scores_tbl$speaker == speaker,]
map_dfr(list_files, ~{
message(.x)
suppressMessages (read_csv(.x)) %>%
filter(practice =="no") %>%
mutate(subj = str_match(logfile, '-([0-9]*)\\.csv')[,2],
RT = response_time /1000) %>%
select(subj, stimulus,RT,correct) })%>%
mutate(stimulus = str_replace_all(stimulus, 'ʃ','c')) %>% # ʃ
filter(!stimulus %in% GLIDES) %>% left_join(scores_speaker,by=c("stimulus"="syl")) %>%
mutate(corrRT = (RT - tmin/1000) * 1000, #in milliseconds
type = factor(ifelse(nchar(stimulus)==4,"CCal",
ifelse(str_detect(stimulus,"^e.*" ),"eCCal","CeCal")),
levels=c("CeCal","eCCal","CCal")),
response = case_when(RT>=3 ~ 0,
correct ==1 & type == "CCal" | correct ==0 & type != "CCal" ~ 1,
TRUE ~ 2))
}
```
```{r readPilot, include=FALSE}
opensesame_pilot <- "data_tables/exploratry_results"
files_pilot <-
c(list.files(path=paste0(opensesame_pilot,"/list1"), pattern="*.csv",full.names = TRUE),
list.files(path=paste0(opensesame_pilot,"/list2"), pattern="*.csv",full.names = TRUE))
data_pilot_all <- read_list_opensesame(files_pilot, speaker="AA")
data_pilot <- data_pilot_all %>%
filter(corrRT > 100)
N_below_100_pilot <- data_pilot_all %>%
filter(corrRT < 100)
#0
# Sanity checks
N_trials_pilot <- 58
all(summarize(group_by(data_pilot,subj),N=n()) %>% pull(N)== N_trials_pilot)
summarize(group_by(data_pilot,subj,type),N=n()) %>% ungroup %>% distinct(type, N)
data_pilot %>% group_by(type,correct) %>% summarize(mean(corrRT))
subj_accuracyPilot <- summarize(group_by(data_pilot,subj,nchar = nchar(stimulus)),n(),accuracy = mean(correct))
subj_accuracyPilot %>% group_by(nchar) %>% summarize(mean(accuracy))
bad_subj <- subj_accuracyPilot %>% filter(nchar==5) %>% summarize(acc=mean(accuracy)) %>%
filter(acc < .75)
```
```{r brmsPilot, include=FALSE}
library(brms)
run_brms <- function(data, chains = 4, iter = 3000, warmup=1000){
data <- data %>% filter(type=="CCal", response ==1) %>%
mutate_at(c("SSP","SSP_obs","MSD","MSD_obs","NAP_td"), ~ factor(., ordered = TRUE)) %>%
mutate(sNAP_bu = NAP_bu -.5)
null_priors <- c(prior(normal(6, 2), class = Intercept),
prior(normal(.5, .2), class = sigma))
effect_priors <- c(null_priors, prior(normal(0,1), class = b),
prior(normal(0,1), class = sd),
prior(lkj(2), class = cor))
message("NAP_bu...")
NAP_bu <- brm(corrRT ~ 1 + sNAP_bu + (NAP_bu|subj), data=data,
prior = effect_priors,
family = lognormal(),
control = list(adapt_delta=.999,max_treedepth =12),
chains =chains, iter =iter, warmup = warmup)
message("null...")
null <- brm(corrRT ~ 1 + (1|subj), data=data,
prior = null_priors,
family = lognormal(),
control = list(adapt_delta=.999,max_treedepth =12),
chains =chains, iter =iter, warmup = warmup)
message("SSP...")
SSP <- brm(corrRT ~ 1 + mo(SSP) +(mo(SSP)|subj), data=data,
prior = effect_priors,
family = lognormal(),
control = list(adapt_delta=.999,max_treedepth =12),
chains =chains, iter =iter, warmup = warmup)
message("SSP_obs...")
SSP_obs <- brm(corrRT ~ 1 + mo(SSP_obs) +(mo(SSP_obs)|subj), data=data,
prior = effect_priors,
family = lognormal(),
control = list(adapt_delta=.999,max_treedepth =12),
chains =chains, iter =iter, warmup = warmup)
message("MSD...")
MSD <- brm(corrRT ~ 1 + mo(MSD) +(mo(MSD)|subj), data=data,
prior = effect_priors,
family = lognormal(),
control = list(adapt_delta=.999,max_treedepth =12),
chains =chains, iter =iter, warmup = warmup)
message("MSD_obs...")
MSD_obs <- brm(corrRT ~ 1 + mo(MSD_obs) +(mo(MSD_obs)|subj), data=data,
prior = effect_priors,
family = lognormal(),
control = list(adapt_delta=.9995,max_treedepth =12),
chains =chains, iter =iter, warmup = warmup)
message("NAP_td...")
NAP_td <- brm(corrRT ~ 1 + mo(NAP_td) +(mo(NAP_td)|subj), data=data,
prior = effect_priors,
family = lognormal(),
control = list(adapt_delta=.999,max_treedepth =12),
chains =chains, iter =iter, warmup = warmup)
list(data = data, models = list(NAP_bu = NAP_bu, null = null, SSP=SSP, SSP_obs =SSP_obs, MSD = MSD, MSD_obs =MSD_obs, NAP_td = NAP_td))
}
```
```{r mpilot, include=FALSE}
if(file.exists("data_tables/RDS/m_pilot.RDS")){
m_pilot <- readRDS("data_tables/RDS/m_pilot.RDS")
} else {
m_pilot <- run_brms(data_pilot)
saveRDS(m_pilot, file = "data_tables/RDS/m_pilot.RDS")
}
if(file.exists("data_tables/RDS/kfold_pilot.RDS")){
kfold_pilot <- readRDS("data_tables/RDS/kfold_pilot.RDS")
} else {
kfold_pilot <- map(m_pilot$models, kfold, folds = "stratified", group = "subj", K = 15)
saveRDS(kfold_pilot, file = "data_tables/RDS/kfold_pilot.RDS")
}
## loo::compare(x=kfold_pilot)
```
<!-- real data -->
```{r readGer, include=FALSE}
opensesame_german <- "data_tables/confirmatory_results"
files_german <-
list.files(path=opensesame_german,pattern="*.csv",full.names = TRUE)
data_german_all <- read_list_opensesame(files_german, speaker="AA")
data_german <- data_german_all %>%
filter(corrRT > 100)
N_below_100 <- data_german_all %>%
filter(corrRT < 100)
## Sanity checks
N_trials_german <- 232
all(summarize(group_by(data_german,subj),N=n()) %>% pull(N)== N_trials_german)
summarize(group_by(data_german,subj,type),N=n()) %>% ungroup %>% distinct(type, N)
data_german %>% group_by(type,correct) %>% summarize(mean(corrRT))
subj_accuracyGer <- summarize(group_by(data_german,subj,nchar = nchar(stimulus)),n(),accuracy = mean(correct))
subj_accuracyGer %>% group_by(nchar) %>% summarize(mean(accuracy))
bad_subj <- subj_accuracyGer %>% filter(nchar==5) %>% summarize(acc=mean(accuracy)) %>%
filter(acc < .75)
mono_acc <- subj_accuracyGer %>% filter(nchar==4)
bi_acc <- subj_accuracyGer %>% filter(nchar==5)
bi_acc_excluded <- subj_accuracyGer %>% filter(nchar==5) %>% filter(accuracy > .74)
mean_mono_acc <- mean(mono_acc$accuracy)
mean_bi_acc <- mean(bi_acc$accuracy)
mean_bi_acc_excluded <- mean(bi_acc_excluded$accuracy)
# 1 bad subject
data_german <- data_german %>% filter(!subj %in% bad_subj)
```
```{r brms-models, include=FALSE}
if(file.exists("data_tables/RDS/m_german.RDS")){
m_german <- readRDS("data_tables/RDS//m_german.RDS")
} else {
m_german <- run_brms(data_german, iter=4000, warmup=2000)
saveRDS(m_german, file = "data_tables/RDS//m_german.RDS")
}
```
```{r loo-models, include=FALSE}
if(file.exists("data_tables/RDS/kfold_german.RDS")){
kfold_german <- readRDS("data_tables/RDS/kfold_german.RDS")
} else {
kfold_german <- map(m_german$models, kfold, folds = "stratified", group = "subj", K = 15)
saveRDS(kfold_german, file = "data_tables/RDS/kfold_german.RDS")
}
```
<!-- ```{r weigths-models, include=FALSE} -->
<!-- ## ## loo::compare(x=loo_pilot) -->
<!-- ## loo::compare(x=kfold_german) -->
<!-- ## loo::compare(x=kfold_german[-1]) -->
<!-- ## loo::compare(x=kfold_german[-1][-6]) -->
<!-- ## loo::compare(x=kfold_german[c("MSD_obs","SSP_obs","null")]) -->
<!-- ## loo::loo_model_weights(x=loo_german) -->
<!-- ## compare(loo_german$NAP_td, loo_german$SSP_obs) -->
<!-- ## if(file.exists("data_tables/RDS/weights.RDS")){ -->
<!-- ## weights <- readRDS("data_tables/RDS/weights.RDS") -->
<!-- ## } else { -->
<!-- ## weights <-loo_model_weights(,kfold_german$MSD_obs) -->
<!-- ## xx <- ll_matrix[,c(2,3,4,5,6)] -->
<!-- ## wxx <- loo::stacking_weights(xx) -->
<!-- ## names(wxx) <- colnames(xx) -->
<!-- ## saveRDS(weights, file = "data_tables/RDS/weights.RDS") -->
<!-- ## } -->
<!-- ``` -->
```{r loo-proc, include=FALSE}
## w_a <- model_weights(m_german$models$MSD,
## m_german$models$MSD_obs,
## m_german$models$SSP,
## ## m_german$models$SSP_obs,
## ## m_german$models$NAP_bu,
## ## m_german$models$NAP_td,
## ## m_german$models$null,
## weights = "loo")
data_german_s <- m_german$data
## loos <- loo_german %>% map_dfc( ~
## .x$pointwise[,"elpd_loo"]
## ) %>% {setNames(.,paste0("elpd_",colnames(.)))}
## data_german_s <- data_german_s %>% bind_cols(loos)
## data_g_summary <- data_german_s %>%
## group_by(stimulus, NAP_bu, NAP_td) %>%
## summarize_at(vars(starts_with("elpd")), mean)
## loo_model_weights(loo_german)
predictions <- m_german$models %>% map(~
predict(.x,summary=FALSE) %>%
array_branch(margin = 1) %>%
map_dfr( ~ {
data_german_s %>%
mutate(pred= .x) %>%
group_by(stimulus, NAP_bu , SSP, SSP_obs , MSD , MSD_obs , NAP_td) %>%
summarize(pred = mean(pred))
} )
)
```
```{r fitplots, eval =TRUE, warning=FALSE, include=FALSE}
data_RT <- data_german_s %>%
mutate(NAP = round(NAP_bu,7)) %>%
group_by(stimulus,NAP, NAP_bu , SSP, SSP_obs , MSD , MSD_obs , NAP_td) %>%
summarize(corrRT = mean(corrRT)) %>%
## summarize(corrRT = mean(log(corrRT * 1000))) %>%
bind_rows(tibble(NAP = seq(30,220,10))) %>%
# bind_rows(tibble(NAP = seq(0.367355,.4825569,0.01))) %>%
filter(!is.na(stimulus))
```
```{r readHeb, include=FALSE}
opensesame_hebrew <- "data_tables/confirmatory_results_Heb"
files_hebrew <-
list.files(path=opensesame_hebrew,pattern="*.csv",full.names = TRUE)
data_hebrew_all <- read_list_opensesame(files_hebrew, speaker="HN")
data_hebrew <- data_hebrew_all %>%
filter(corrRT > 100)
N_below_100 <- data_hebrew_all %>%
filter(corrRT < 100)
## Sanity checks
data_hebrew %>% group_by(type,correct) %>% summarize(mean(corrRT))
subj_accuracyHeb <- summarize(group_by(data_hebrew,subj,nchar = nchar(stimulus)),n(),accuracy = mean(correct))
subj_accuracyHeb %>% group_by(nchar) %>% summarize(mean(accuracy))
bad_subj <- subj_accuracyHeb %>% filter(nchar==5) %>% summarize(acc=mean(accuracy)) %>%
filter(acc < .75)
mono_acc <- subj_accuracyHeb %>% filter(nchar==4)
bi_acc <- subj_accuracyHeb %>% filter(nchar==5)
bi_acc_excluded <- subj_accuracyHeb %>% filter(nchar==5) %>% filter(accuracy > .74)
mean_mono_acc <- mean(mono_acc$accuracy)
mean_bi_acc <- mean(bi_acc$accuracy)
mean_bi_acc_excluded <- mean(bi_acc_excluded$accuracy)
# NO bad subject!
data_hebrew <- data_hebrew %>% filter(!subj %in% bad_subj)
# saveRDS(data_hebrew, "data_hebrew.RDS")
```
```{r brms-models-heb, include=FALSE}
if(file.exists("data_tables/RDS/m_hebrew.RDS")){
m_hebrew <- readRDS("data_tables/RDS//m_hebrew.RDS")
} else {
m_hebrew <- run_brms(data_hebrew, iter=4000, warmup=2000)
saveRDS(m_hebrew, file = "data_tables/RDS//m_hebrew.RDS")
}
```
```{r loo-models-heb, include=FALSE}
if(file.exists("data_tables/RDS/kfold_hebrew.RDS")){
kfold_hebrew <- readRDS("data_tables/RDS/kfold_hebrew.RDS")
} else {
kfold_hebrew <- map(m_hebrew$models, kfold, folds = "stratified", group = "subj", K = 15)
saveRDS(kfold_hebrew, file = "data_tables/RDS/kfold_hebrew.RDS")
}
```
```{r loo-proc_heb, include=FALSE}
data_hebrew_s <- m_hebrew$data
predictions_heb <- m_hebrew$models %>% map(~
predict(.x,summary=FALSE,ndraws = 2000) %>%
array_branch(margin = 1) %>%
map_dfr( ~ {
data_hebrew_s %>%
mutate(pred= .x) %>%
group_by(stimulus, NAP_bu , SSP, SSP_obs , MSD , MSD_obs , NAP_td) %>%
summarize(pred = mean(pred))
} )
)
```
```{r fitplots_heb, eval =TRUE, warning=FALSE, include=FALSE}
data_RT_heb <- data_hebrew_s %>%
mutate(NAP = round(NAP_bu,7)) %>%
group_by(stimulus,NAP, NAP_bu , SSP, SSP_obs , MSD , MSD_obs , NAP_td) %>%
summarize(corrRT = mean(corrRT)) %>%
## summarize(corrRT = mean(log(corrRT * 1000))) %>%
bind_rows(tibble(NAP = seq(30,220,10))) %>%
# bind_rows(tibble(NAP = seq(0.367355,.4825569,0.01))) %>%
filter(!is.na(stimulus))
```
<!-- participantsStuff -->
```{r participantsRead, include=FALSE}
## pilot
subjects_pilot <- read.csv("data_tables/subjects/subjects_pilot.csv") %>% select(-X)
## Ger (main)
subjects_main <- read.csv("data_tables/subjects/subjects_main.csv") %>% select(-X) %>% distinct(subject_nr, .keep_all = TRUE)
subjects_main$subject_education <- as.character(subjects_main$subject_education)
subjects_main$subject_education[subjects_main$subject_education=="undergrad"] <- "undergraduate"
subjects_main$subject_education <- as.factor(subjects_main$subject_education)
## Heb
subjects_heb <- read.csv("data_tables/subjects/subjects_heb.csv") %>% select(-X) %>% distinct(subject_nr, .keep_all = TRUE)
subjects_heb$subject_education <- as.character(subjects_heb$subject_education)
subjects_heb$subject_education[subjects_heb$subject_education=="high-school"] <- "school"
subjects_heb$subject_education[subjects_heb$subject_education=="undergrad"] <- "undergraduate"
subjects_heb$subject_education <- as.factor(subjects_heb$subject_education)
```
# Introduction
The following work models the contribution of *sonority* to phonology in a manner that attempts to be compatible with general auditory perception and cognition, as well as with linguistic theory. This is in contrast to many of the traditional formal principles in linguistic theory that do not tend to specify how they interact with general systems of human capacity, such as auditory perception or sensorimotor control. Traditional linguistic principles are mostly generalizations that explain linguistic typologies as they have been depicted in writing systems.
The prevailing sonority-related principles such as the *Sonority Sequencing Principle* (SSP) and its derivatives are, indeed, generalizations of the latter type.
Our novel proposals in this study achieve better empirical coverage than the different versions of the SSP that we test in a set of perception tasks, while at the same time providing a more comprehensive explanation of the notion of sonority in phonology and phonetics than common formal linguistic principles.
We assume here that a proper model of speech perception involves a bottom-up route and a related, yet separate, top-down route. While both inference routes are capable of selecting between discrete alternatives (e.g., selecting different consonants or syllabic parses), they arrive there in two very different ways: bottom-up processes are based on continuous events and top-down processes are based on existing sets of symbolic entities. In that sense, top-down models are more reminiscent of traditional formal principles in phonology that, more often than not, cover processes that start and end with discrete symbols.
Sonority in our account is strongly related to pitch perception and the major role of pitch in all language systems. Pitch in speech is mediated by syllable-size units, regardless of its role as the lexical *tone* in tone languages or the post-lexical *tune* in intonation systems.
We hypothesize that sonority is related to the strength and clarity of pitch perception, serving as a measure of *pitch intelligibility* that acts as a universal drive to optimize the pitch-bearing ability of syllabic units (see Section \@ref(sec:sonPitch)).
We suggest a measurable acoustic correlate for sonority in terms of *periodic energy*, and we propose a novel principle, the *Nucleus Attraction Principle* (NAP), that accounts for syllabic well-formedness based on general principles of competition in real-time (that is, the extent to which the different portions of the speech signal are good candidates for the syllabic nucleus).
We implement NAP with two complementary models (see Section \@ref(sec:modelimp)): (i) a bottom-up model that directly analyzes continuous acoustic signals; and (ii) a top-down model that is based on discrete units of consonants and vowels.
We present a series of syllable count tasks in Section \@ref(sec:experiments),
designed to test our two NAP-based models (applying NAP with bottom-up and top-down approaches) against four traditional sonority models, which combine two common sonority hierarchies with two common sonority principles---the *Sonority Sequencing Principle* (SSP) and the *Minimum Sonority Distance* (MSD).
We use a Bayesian data analysis approach to test and compare the six different sonority models. Whereas all the different models are found to be capable of predicting the experimental results to a good extent, the symbolic top-down version of NAP is shown to be the superior model. The bottom-up model of NAP comes in second alongside a few of the traditional models.
We consider this a strong performance for the bottom-up model, which is based solely on continuous acoustic signals and still has many potential paths for improvement (e.g., better digital signal processing and improved procedures to estimate competition).
Interestingly, some of the results suggest a relatively high degree of complementarity between the two NAP models, even though they represent the same principle. This is a desirable result for our framework, which advocates the need for two complementary models to account for both dynamic and symbolic processes.
Our set of proposals has many advantages over traditional sonority accounts, including methodological aspects, theoretical perspectives, and, essentially, a better empirical coverage (see Subsection \@ref(sec:discussionResults) for a summary of the experimental results).
Some of the major implications of this study are addressed in Section \@ref(sec:genDiscussion), where we discuss the division of labor between sonority and other phonotactic factors, demonstrated with a holistic account of the phenomenon of */s/-stop clusters* (Subsection \@ref(sec:division)).
We also discuss the contribution of this work to previous theoretical efforts to incorporate continuous entities in phonology (Subsection \@ref(sec:lingMod)).
Finally, we discuss the debate on the universality of sonority in light of our work (Subsection \@ref(sec:projection)), before we present our conclusions in Section \@ref(sec:conclusions).
In the remainder of this Introduction, we briefly present the relevant background on traditional sonority hierarchies and principles, to cover the basics of their rationale and common application.
## Sonority Background {#sec:sonback}
### Sonority hierarchies {#sec:hierarchies}
A sonority hierarchy is a single scale on which all consonants and vowels can be ranked relative to each other. Early versions of current sonority hierarchies often date back to @sievers1893grundzugesk; @jespersen1899fonetik, and @whitney1865relation.[^cf-debrosses]
While the phonetic basis of sonority hierarchies remains controversial, phonological sonority hierarchies have been primarily based on repeated observations that revealed systematic behaviors of segmental distribution and syllabic organization within and across languages. The general consensus regarding the phonological sonority hierarchy thus stems from attested cross-linguistic phonotactic behaviors of different segmental classes, such as, for instance,
the relatively high frequency of stop-liquid sequences in the onset of complex syllables (e.g., /kl/ in the English word ***cl**ean*)
as well as the opposite liquid-stop sequences in the mirror-image coda position of complex syllables (e.g., /lk/ in the English word *mi**lk***),
against the very low frequency of the opposite scenarios, which posit /lk/ in complex onsets and /kl/ in complex codas.
[^cf-debrosses]: @ohala1992alternatives goes even further back to @debrosses1765traite.
Although there are many different proposals for sonority hierarchies [@parker2002quantifying found more than 100 distinct sonority hierarchies in the literature], a very basic hierarchy that seems to reach a considerable consensus, and is often cited in relation to Clements's [-@clements1990role] seminal paper is given in (\@ref(ex:scale)).[^cf-liquid]
[^cf-liquid]: The group of *liquids* is the most loosely defined, as it includes both lateral *approximants* (namely /l/) and various types of rhotics such as *trills* (/r,ʀ,ʁ/), *taps* (namely /ɾ/), and alveolar and retroflex approximants (/ɹ,ɻ/).
\begin{exe}
\ex \emph{Obstruents} $<$ \emph{Nasals} $<$ \emph{Liquids} $<$ \emph{Glides} $<$ \emph{Vowels} \label{ex:scale}
\end{exe}
The ordering of different speech sounds along the sonority hierarchy is assumed to be universal, in line with the common assumption that sonority has a phonetic basis in perception and/or articulation, yet the patterning of segmental classes as distinct groups along the scale is considered to be language-specific, i.e., based on phonological categorization.
For example, voiceless *stops* may be considered universally lower than voiced *fricatives* on the sonority hierarchy, yet for some languages and analyses the relevant patterning of stops and fricatives along the sonority hierarchy may group them together as belonging to the same general class of *obstruents*.
Classes along the sonority hierarchy are most commonly modeled as a series of integers (often referred to as sonority indices) reflecting the ordinal nature of phonological interpretations of the sonority hierarchy.
(ref:hierarchy-caption) (\#tab:hierarchy) Traditional phonological sonority hierarchies
(ref:hierarchy-caption2) Index values reflect the ordinal ranking of categories in sonority hierarchies. The obstruents in *H~col~* are collapsed into one category (bottom four rows = 1), while in *H~exp~* they are expanded into four distinct levels.
\begin{table}[tbp]
\begin{center}
\begin{threeparttable}
\caption{(ref:hierarchy-caption)}
\begin{tabular}{cclcclcclccl}
\toprule
\multicolumn{2}{c}{\textbf{Sonority index values}} & \multicolumn{1}{l}{\textbf{Segmental classes}} & \multicolumn{1}{l}{\textbf{Phonemic examples}}\\
\multicolumn{1}{c}{\emph{H\textsubscript{col}} hierarchy} & \multicolumn{1}{c}{\emph{H\textsubscript{exp}} hierarchy} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{}\\
\midrule
5 & 8 & Vowels & \multicolumn{1}{l}{/u, i, o, e, a/}\\
4 & 7 & Glides & \multicolumn{1}{l}{/w, j/}\\
3 & 6 & Liquids & \multicolumn{1}{l}{/l, r/}\\
2 & 5 & Nasals & \multicolumn{1}{l}{/m, n/}\\
\textbf{1} & \textbf{4} & Voiced Fricatives & \multicolumn{1}{l}{/v, z/}\\
\textbf{1}& \textbf{3} & Voiced Stops & \multicolumn{1}{l}{/b, d, g/}\\
\textbf{1}& \textbf{2} & Voiceless Fricatives & \multicolumn{1}{l}{/f, s/}\\
\textbf{1}&\textbf{1} & Voiceless Stops & \multicolumn{1}{l}{/p, t, k/}\\
\bottomrule
\addlinespace
\end{tabular}
\begin{tablenotes}[para]
\normalsize{\textit{Note.} (ref:hierarchy-caption2)}
\end{tablenotes}
\end{threeparttable}
\end{center}
\end{table}
The main differences that result from variation of the basic hierarchy in (\@ref(ex:scale)) concern the class of obstruents, which may contain voiced and voiceless variants of stops and fricatives (to mention just the most prominent distinctions).
Note that vowels are often also commonly divided into subgroups along the sonority hierarchy [see @gordon2012sonority], but these distinctions will be irrelevant in the context of this paper.
It is therefore not uncommon to expand the class of obstruents, whereby stops are lower than fricatives and voiceless consonants are lower than voiced ones.
Also note that the ranking of voiceless fricatives in relation to voiced stops may be disputed, depending on whether manner distinctions or voicing distinctions take precedence. The version we adopt here is in line with @parker2008sound in suggesting that voicing distinctions take precedence. In any case, these differences are not assumed to bear any crucial consequences on the results of this study.
The two variants of the sonority index values given in Table \@ref(tab:hierarchy) reflect two ends of a spectrum of common sonority hierarchies, ranging from hierarchies that collapse all obstruents together into a single class (resulting in the same sonority index value for all obstruents), to hierarchies that expand the class of obstruents by employing voicing distinctions as well as distinctions between stops and fricatives (resulting in multiple sonority index values within the class of obstruents). In what follows we will refer to these two versions of the sonority hierarchy as *H~col~* for the sonority hierarchy with a single collapsed class of obstruents, and *H~exp~* for the sonority hierarchy that exhibits multiple subclasses based on the expanded class of obstruents.
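For concreteness, the two hierarchies in Table \@ref(tab:hierarchy) can be encoded as ordinal index vectors. The sketch below (illustrative only, not part of the analysis pipeline; the class labels and index values simply restate the table) shows one possible encoding in R:

```{r hierarchy-indices-sketch, eval=FALSE}
# Illustrative encoding of the two sonority hierarchies in Table 1 (sketch only)
H_col <- c("Voiceless Stops" = 1, "Voiceless Fricatives" = 1,
           "Voiced Stops"    = 1, "Voiced Fricatives"    = 1,
           "Nasals" = 2, "Liquids" = 3, "Glides" = 4, "Vowels" = 5)
H_exp <- c("Voiceless Stops" = 1, "Voiceless Fricatives" = 2,
           "Voiced Stops"    = 3, "Voiced Fricatives"    = 4,
           "Nasals" = 5, "Liquids" = 6, "Glides" = 7, "Vowels" = 8)
```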
### Sonority principles {#sec:principles}
Sequencing principles can be understood as a mapping scheme between the ranks of a sonority hierarchy and the linear order of symbolic speech segments.
Modern formulations of such principles, which use the ordinal sonority hierarchy to generalize over the phonotactics of consonantal sequences in terms of *sonority slopes* were developed mainly throughout the 1970s and 1980s in seminal works such as @hooper1976introduction; @steriade1982greek; @selkirk1984majorsk; @harris1983syllablesk; @mohanan1986theory, and @clements1990role.
(ref:slopes-pl-lp) Schematic depiction of the sonority slopes of two onset clusters, *plV* and *lpV*. The red line denotes the sonority slope of the onset cluster (i.e., the two onset consonants), while the grey line denotes the slope between the second consonant and the vowel at the nucleus position (always a rise in these cases). The angle of the red lines reflects the well-formed rising sonority slope of the onset cluster in *plV* and the ill-formed falling sonority slope of the onset cluster in *lpV*. Image taken from @albertIPsonoritysk.
```{r slopes-pl-lp, fig.cap = "(ref:slopes-pl-lp)", fig.asp = .45, out.width = '100%', dev="cairo_pdf"}
seg_type = c("Vowels","Glides","Liquids","Nasals","Voiced Fricatives","Voiced Stop", "Voiceless Fricatives","Voiceless Stops")
seg_token = c("p","l","V","l","p","V")
slopes_plot <- function(df) {
ggplot(df, aes(x=x,y=y, linetype=line, color=line)) +
geom_segment(aes(x=0, xend=7.5, y=-1, yend=-1), color="grey", size=.2, alpha=.5, linetype = "solid") +
geom_segment(aes(x=4.5, xend=4.5, y=-.5, yend=8.5), color="black", size=.2, alpha=.2, linetype = "dotted") +
geom_text(data=tibble(seg_token=seg_type,y=8:1,x=0.2),
aes(label=seg_token,x=x,y=y, hjust=0),inherit.aes = FALSE, size=4, family = "Charis SIL") +
geom_text(data=tibble(seg_token=seg_token,y=0,x=2:7),
aes(label=seg_token,x=x,y=y, hjust=0),inherit.aes = FALSE, size=4, family = "Charis SIL") +
geom_text(data=tibble(seg_token="Sonority slopes: different types",y=10.75,x=4.5),
aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL") +
geom_text(data=tibble(seg_token="well-formed",y=9.5,x=3),
aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL", fontface = "italic") +
geom_text(data=tibble(seg_token="onset rise",y=8.75,x=3),
aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL") +
geom_text(data=tibble(seg_token="ill-formed",y=9.5,x=6),
aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL", fontface = "italic") +
geom_text(data=tibble(seg_token="onset fall",y=8.75,x=6),
aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL") +
geom_line() +
scale_x_continuous("",breaks=2:7, labels=NULL) +
scale_y_continuous("",breaks=1:8, labels=NULL) +
scale_linetype_manual(values=c("solid","solid","solid","solid")) +
scale_color_manual(values=c("red","grey","red","grey")) +
geom_point(color="black", size=1) +
theme(legend.position = "none", axis.line.y = element_blank(), panel.grid = element_blank(), panel.background = element_blank(), axis.ticks = element_blank()) +
coord_cartesian(xlim=c(0.2,7.5))
}
tribble(~x , ~y, ~line,
2, 1, "a",
3, 6, "a",
3, 6, "b",
4, 8, "b",
5,6, "c",
6,1, "c",
6,1, "d",
7,8, "d") %>%
slopes_plot()
```
The most basic and widely used sonority-based principle that employs sonority slopes to derive phonotactic predictions in terms of syllabic well-formedness is the *Sonority Sequencing Principle* (SSP). The SSP is a simple yet powerful generalization that has been used in countless theoretical accounts. The SSP assumes that sequences of segments within syllables preferably rise towards the nucleus of the syllable, where sonority is expected to reach the local maximum.
Consequently, sequences of segments should preferably rise in sonority from the consonant(s) in the syllabic onset to the syllable's nucleus (most often a vowel) and fall from the nucleus to the consonant(s) in the syllabic coda.
In this paper we focus on syllable-initial onset consonant clusters that precede a vowel, whereby a rising sonority slope (e.g., *plV*) is considered well-formed and a falling sonority slope (e.g., *lpV*) is considered ill-formed. Sonority plateaus (e.g., *pkV*) fare in between, giving way to various interpretations depending on the language and analysis. As such, plateaus may be considered as ill- or well-formed [e.g., @blevins1995syllable], although they are generally interpreted as denoting a third, mid-level of well-formedness.
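To illustrate how the SSP yields this three-way classification for onset clusters, the sketch below (illustrative only; the segment-to-index mapping `son` is a hypothetical assignment in the spirit of *H~exp~*) labels a two-consonant onset by the sign of its sonority slope:

```{r ssp-sketch, eval=FALSE}
# Illustrative SSP classification of two-consonant onsets (sketch only).
# 'son' is a hypothetical segment-to-index mapping based on an H_exp-style hierarchy.
son <- c(p = 1, t = 1, k = 1, f = 2, s = 2, b = 3, d = 3, g = 3,
         v = 4, z = 4, m = 5, n = 5, l = 6, r = 6, w = 7, j = 7)
ssp_onset <- function(c1, c2) {
  slope <- son[[c2]] - son[[c1]]
  if (slope > 0) "rise (well-formed)"
  else if (slope == 0) "plateau (intermediate)"
  else "fall (ill-formed)"
}
ssp_onset("p", "l")  # "rise (well-formed)"
ssp_onset("l", "p")  # "fall (ill-formed)"
ssp_onset("p", "k")  # "plateau (intermediate)"
```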
The *Minimum Sonority Distance* [MSD; @steriade1982greek; @selkirk1984majorsk] is a well-known elaboration on the preferred angle of sonority slopes compared to basic applications of the SSP, given that the SSP makes no distinction between different angles of rising or falling slopes. The MSD was designed to prefer onset rises with steep slopes over onset rises with shallow slopes, under the assumption that consonantal sequences in the onset are preferred with a larger sonority distance between them. For instance, *plV* has a steeper rise compared to *bnV* and it is therefore better-formed according to the MSD (see Figure \@ref(fig:slopes-pl-bn)).[^cf-sdp]
[^cf-sdp]: The *Sonority Dispersion Principle* [SDP; @clements1990role; @clements1992sonority] is a slightly different yet related principle that prefers onset rises with a large distance and an equal dispersion of sonority index values across the consonantal sequence and the following vowel. The results of the SDP are highly contingent on the given sonority hierarchy and it is not very clear how to apply the SDP with onset sonority falls [among other problems listed in @parker2002quantifying 22--24]. The SDP is therefore not comparable as a model that can generate the full set of well-formedness predictions for onset clusters. Indeed, the SDP is mostly invoked in relation to the status of the onset versus the coda (not directly related to consonantal clusters), where it is used to highlight the assumption that onsets prefer to maximize sonority distance from the following nucleus while codas prefer to minimize sonority distance from the preceding nucleus.
(ref:slopes-pl-bn) Schematic depiction of the sonority slopes of two onset clusters, *plV* and *bnV* (the red solid line denotes the sonority slope of the onset clusters). The angle of the red lines reflects a steeper rise for *plV* (left) compared with *bnV* (right), due to the larger sonority distance between the consonants in *plV*. Image taken from @albertIPsonoritysk.
```{r slopes-pl-bn, fig.cap = "(ref:slopes-pl-bn)", fig.asp = .4, out.width = '100%', dev="cairo_pdf"}
seg_type = c("Vowels","Glides","Liquids","Nasals","Voiced Fricatives","Voiced Stop", "Voiceless Fricatives","Voiceless Stops")
seg_token = c("p","l","V","b","n","V")
slopes_plot <- function(df) {
ggplot(df, aes(x=x,y=y, linetype=line, color=line)) +
geom_segment(aes(x=0, xend=7.5, y=-1, yend=-1), color="grey", size=.2, alpha=.5, linetype = "solid") +
geom_segment(aes(x=4.5, xend=4.5, y=-.5, yend=8.5), color="black", size=.2, alpha=.2, linetype = "dotted") +
geom_text(data=tibble(seg_token=seg_type,y=8:1,x=0.2),
aes(label=seg_token,x=x,y=y, hjust=0),inherit.aes = FALSE, size=4, family = "Charis SIL") +
geom_text(data=tibble(seg_token=seg_token,y=0,x=2:7),
aes(label=seg_token,x=x,y=y, hjust=0),inherit.aes = FALSE, size=4, family = "Charis SIL") +
geom_text(data=tibble(seg_token="Sonority rises: different slopes",y=10.75,x=4.5),
aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL") +
geom_text(data=tibble(seg_token="steep rise",y=9.25,x=3),
aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL") +
geom_text(data=tibble(seg_token="shallow rise",y=9.25,x=6),
aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL") +
geom_line() +
scale_x_continuous("",breaks=2:7, labels=NULL) +
scale_y_continuous("",breaks=1:8, labels=NULL) +
scale_linetype_manual(values=c("solid","solid","solid","solid")) +
scale_color_manual(values=c("red","grey","red","grey")) +
geom_point(color="black", size=1) +
theme(legend.position = "none", axis.line.y = element_blank(), panel.grid = element_blank(), panel.background = element_blank(), axis.ticks = element_blank()) +
coord_cartesian(xlim=c(0.2,7.5))
}
tribble(~x , ~y, ~line,
2, 1, "a",
3, 6, "a",
3, 6, "b",
4, 8,"b",
5,3,"c",
6,5,"c",
6,5,"d",
7,8,"d") %>%
slopes_plot()
```
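As a worked version of the comparison in Figure \@ref(fig:slopes-pl-bn), the sketch below (illustrative only; it reuses the same kind of hypothetical *H~exp~*-style indices as the SSP sketch above) computes the sonority distance that the MSD evaluates for the two onsets:

```{r msd-sketch, eval=FALSE}
# Illustrative MSD scoring (sketch only): a larger sonority distance between
# the two onset consonants corresponds to a steeper, better-formed rise.
son <- c(p = 1, b = 3, n = 5, l = 6)  # hypothetical H_exp-style indices
msd_onset <- function(c1, c2) son[[c2]] - son[[c1]]
msd_onset("p", "l")  # 5: steep rise, preferred
msd_onset("b", "n")  # 2: shallow rise, dispreferred relative to /pl/
```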
# Sonority, Pitch and the Nucleus Attraction Principle (NAP) {#sec:sonPitch}
## Sonority and Pitch Intelligibility {#sec:pitchintelligibility}
The observation that sonority summarizes an essential quality that is related to vowels and their propensity to deliver a relatively steady harmonic structure, highlighting pitch and formant information, is by no means new. Previous proposals already defined sonority as either relating to vowels in some general way, or more specifically relating to voicing or glottal fold vibration, or to the clarity/strength of the formants.[^cf-list] A few previous accounts went even further, by addressing the function of this vowel-centric feature, suggesting that sonority may be related to periodic energy or pitch/tone [@lass1988phonology; @nathan1989preliminaries; @puppel1992sonority; @ladefoged1997linguistic; @heselwood1998unusual]. What all these proposals share, explicitly or implicitly, is a recurring insight about a strong link between the preferred type of segmental material in syllabic nuclei and a set of features that conspire to optimize pitch intelligibility, a property which characterizes vowels more than consonants.
[^cf-list]: A partial list of some prominent examples includes @sigurd1955rank; @jakobson1956fundamentals; @chomsky1968spesk; @foley1972rule; @ladefoged1971preliminaries; @allen1973accentsk; @fujimura1975syllable; @Donegan1978onthenatural; @ultan1978typological; @price1980sonority; @lindblom1983production; @anderson1986suprasegmental; @vennemann1988preferencesk; @levitt1991syllable; @pierrehumbert1992lenition; @fujimura1997acoustic; @stemberger1997handbook; @boersma1998functional; @zhang2001effects; @howe2004harmonic; @clements2009does; @sharma2018significance.
Pitch is an indispensable communicative dimension of all linguistic sound systems [@pike1945intonationsk; @bolinger1978intonation; @house1990tonal; @cutler1997prosody], whether it is lexically determined as in linguistic tone,
or post-lexically employed to convey intonation, i.e., the linguistic tune [see typological accounts of prosodic systems in @jun2005prosodicsk; @jun2015prosodicsk].
Tones are used to distinguish lexical items while tunes are used to demarcate units, to modulate semantics (e.g., information structure and sentence modality) and to
express a vast array of non-propositional meanings (e.g., discourse-pragmatic intention, emotional state, socio-indexical identity, and attitudinal stance). The importance of pitch to human communication cannot be overstated [or in the words of @pike1945intonationsk 20: "There are no pitchless sentences"].
Crucially, linguistic pitch events are known to target syllable-sized units as their "docking site", regardless of the type of pitch event, whether they are lexical tones or post-lexical tunes.
These linguistic pitch events are commonly considered to associate with *Tone-Bearing Units* [@leben1973suprasegmental], that are either syllables or *moras*.[^cf-mora]
These associations between the text on the one hand and tone or tune on the other hand are widely assumed to be mediated by syllabic/moraic units.
For example, intonation pitch contours that highlight and modulate whole words and phrases essentially target privileged syllables---*heads* (stressed syllables) and *edges* (syllables at initial and final positions of prosodic words and phrases)---to achieve their communicative goal on textual material of various sizes [@ladd2008intonational; @roettger2019tune].
This tone-bearing role of syllables and moras is the hallmark of many prominent theories regarding tone and intonation, following from *Autosegmental* and *Autosegmental-Metrical* models of phonology [e.g., @liberman1975intonationalsk; @goldsmith1976autosegmental; @pierrehumbert1980phoneticssk; @ladd2008intonational].
[^cf-mora]: Moras are used to represent quantitative differences between light and heavy syllables (weight sensitivity), such that light syllables contain one mora while heavier syllables contain two (and sometimes even three) moras [see @hyman1984atheory; @mccarthy1990footsk; @hayes1989compensatory; @ito1989prosodic; @zec1995sonority; @zec2003prosodic].
The functionally motivated conclusion that emerges with respect to sonority is therefore that syllables require a pitch-bearing nucleus and that sonority is a scalar measure of the ability to bear pitch. In other words, sonority is, most likely, a measure of pitch intelligibility.
This hypothesis comes with an underlying assumption that syllables have followed an evolutionary trajectory that shaped them to optimally carry pitch in their nuclei. Sonority, according to this description, serves as the tool that governs the requirement for intelligible pitch as a fundamental characteristic in the design of the building blocks of prosody.
It is important to note that this view of sonority is explicitly and exclusively based on perception, rather than articulation of speech. However, it does not exclude articulation-based description of syllables under the assumption that restrictions on syllabic structure must be derived from both the perception and the articulation of speech. A case in point is the *Articulatory Phonology* framework
(see Section \@ref(sec:synthesis)),
with its valuable descriptions of temporal coordination and phase relations between motor gestures, which can be effectively linked to syllabic organization [see, e.g., @goldstein2007syllablesk; @goldstein2009coupled; @shaw2009syllabificationsk; @gafos2014stochastic; @hermes2017variabilitysk].
### Pitch intelligibility and periodic energy {#sec:periodicenergy}
Pitch is a psychophysical phenomenon based on perception and cognition [see @plomp1976aspects; @plack2005psychophysics]. We can extract perception-related measurements from acoustics, i.e., not directly from the perceived sensation of a human subject but from the digitally-analyzed description of the physical sound in space.
Using acoustics to cover auditory psychophysical phenomena is not a straightforward task. It requires a consistent and reliable association between acoustics on the one hand, and perception and cognition on the other hand.
This task is further complicated by a complex phenomenon like pitch, which is sensitive to various aspects of the rich acoustic signal as well as to our top-down expectations with regard to learned regularities of pitch behavior [e.g., @houtsma1995pitch; @shepard2001pitch; @moore2013anintro 203-243; @mcpherson2018diversity].
Fortunately, there are strong links between pitch and acoustic markers given the important role of periodicity in pitch. This is well-known from the extensive use of acoustic F0 measurements to estimate perceived pitch height, based on periodicities in the signal, using techniques such as autocorrelation [e.g., @boersma1993accurate].
To estimate perceived pitch intelligibility from acoustic signals, we need to obtain a measure of periodic energy, which is a measurement of the acoustic power of periodic components in the signal. It may be helpful to think of this as a measurement of general intensity that excludes the contribution of aperiodic noise and transient bursts.
To conclude, our ability to detect periodicity in acoustic signals allows us to extract good estimates of F0 and periodic energy from speech data. We stand on firm grounds when we map these acoustic markers to perception in terms of pitch height and pitch intelligibility (respectively).
Given a causal link between perceived pitch height and linguistic tone and intonation contours, it is reasonable and, indeed, commonplace, to assume by transitivity that acoustic F0 maintains a causal link to linguistic tone and intonation.
Likewise, given a causal link between perceived pitch intelligibility and linguistic sonority, it should be reasonable to assume by transitivity that acoustic periodic energy maintains a causal link with the linguistic notion of sonority.
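As a rough illustration of the kind of measurement intended here (this is not the periodic-energy measure used in our analyses; the sampling rate, frame and hop sizes, and the synthetic test signal are all assumptions made for the example), the sketch below scales frame-wise signal energy by a periodicity-strength estimate derived from the normalized autocorrelation, so that aperiodic noise and transient bursts contribute little:

```{r periodic-energy-sketch, eval=FALSE}
# Rough illustrative proxy for frame-wise periodic energy (sketch only):
# total frame energy weighted by a periodicity-strength estimate taken from
# the normalized autocorrelation peak.
sr <- 16000                                     # sampling rate (Hz), assumed
x  <- sin(2 * pi * 120 * (0:(sr - 1)) / sr) +   # 1 s of a 120 Hz "voiced" tone
      rnorm(sr, sd = 0.1)                       # plus some aperiodic noise
frame_len <- round(0.04 * sr)                   # 40 ms frames
hop       <- round(0.01 * sr)                   # 10 ms hop, as in our matrices
starts    <- seq(1, length(x) - frame_len, by = hop)
periodic_energy <- sapply(starts, function(i) {
  frame <- x[i:(i + frame_len - 1)]
  ac    <- acf(frame, lag.max = frame_len - 1, plot = FALSE)$acf[, 1, 1]
  # periodicity strength: highest normalized autocorrelation beyond ~2 ms lag
  strength <- max(ac[-(1:round(0.002 * sr))])
  mean(frame^2) * max(strength, 0)              # energy weighted by periodicity
})
```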
## The Problem of Intensity-Based Accounts {#sec:correlusions}
Although no strong consensus has ever been reached with respect to the phonetic basis of sonority, acoustic *intensity* is perhaps the most widely assumed correlate of linguistic sonority. This is evident from the many influential studies on sonority that consider acoustic intensity as its phonetic correlate [e.g., @sievers1893grundzugesk; @heffner1969generalsk; @ladefoged1975acourse; @clements1990role; @blevins1995syllable; @parker2008sound; and @gordon2012sonority, to name just a few prominent examples].[^cf-parkerIntensity]
[^cf-parkerIntensity]: In his overview of existing literature, @parker2002quantifying found close to 100 different proposals for correlates of sonority in the literature, and he tested five leading proposals in laboratory conditions: *intensity*, *intraoral air pressure*, *F~1~ frequency*, *total air flow*, and *duration*. In his study, the tightest correlations with sonority classes were obtained for acoustic intensity measurements, a conclusion that was repeated and elaborated upon in @parker2008sound.
The main problem with intensity-based accounts is related to the distinction between causation and correlation.
Establishing causation from acoustic signals necessitates a theory that can reliably map acoustic markers to operations or processes in sensorimotor speech articulation and/or auditory speech perception. The problem with accounts that are based on acoustic intensity is that causation cannot be established given that the physical intensity of the acoustic signal does not consistently map to any aspect of human auditory perception, not even perceived loudness.
### Acoustic intensity $≠$ perceived loudness {#sec:intensity}
The acoustic signal has certain physical qualities contributing to its overall power, but they have different effects on the perceptual system of the human hearer. This discrepancy between acoustic intensity and perceived loudness is a well-known problem, playing a role at different dimensions of the mapping between acoustics and perception. The prominent points of departure between acoustic intensity and perceived loudness include
the following:
(i) loudness perception differs for sine waves with the same intensity level at different frequencies [e.g., @fletcher1933loudness; @plack1995loudness; @suzuki2004equal];
(ii) loudness perception differs for comparable sounds at different durations [e.g., @turk1996processing; @seshadri2009perceived; @olsen2010loudness; @moore2013anintro 143];
and (iii) loudness perception differs for otherwise comparable periodic (harmonic) vs. aperiodic (noise) sounds, and band-pass filtered noise, just like sine waves, is not uniformly loud across the frequency spectrum [e.g., @hellman1972asymmetry; @bao2010psychoacousticsk; @moore2013anintro 140].
Acoustic intensity is therefore a physical description of sound waves in space which does not consistently relate to how loud we perceive these sounds, or to any other perceptual phenomenon for that matter.
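As a minimal illustration of point (i) above, consider the following R sketch (base R only; the choice of a 100 Hz and a 3000 Hz tone is arbitrary): two sine waves are generated with effectively identical RMS intensity, so their physical intensity in dB is the same, yet equal-loudness data [e.g., @fletcher1933loudness; @suzuki2004equal] indicate that they would not be perceived as equally loud.
```{r intensity-loudness-sketch, echo = TRUE, eval = FALSE}
# Two sine waves with (effectively) identical RMS intensity at different
# frequencies: equal physical intensity does not entail equal perceived
# loudness (cf. equal-loudness contours).
sr     <- 44100                       # sampling rate (Hz)
time_s <- seq(0, 1, by = 1 / sr)      # one second of time points
f_low  <- 100                         # low-frequency tone (Hz), arbitrary choice
f_high <- 3000                        # high-frequency tone (Hz), arbitrary choice
tone_low  <- sin(2 * pi * f_low  * time_s)
tone_high <- sin(2 * pi * f_high * time_s)
rms <- function(x) sqrt(mean(x^2))
db  <- function(x, ref = 1) 20 * log10(rms(x) / ref)
# (effectively) the same dB value for both tones...
db(tone_low)
db(tone_high)
# ...yet, per equal-loudness contours, a ~100 Hz tone at this intensity is
# perceived as considerably less loud than a ~3 kHz tone at the same intensity.
```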
### Loudness is not a good candidate for sonority {#sec:loudness}
Note also that the relevance of perceived loudness to syllabic organization requires some sort of functional explanation, which seems to be lacking.
Adjacent speech sounds differ systematically in acoustic intensity, yet it makes sense to assume that the different sounds that compose coherent speech are perceived as having comparable loudness, that is, that these intensity differences are largely neutralized in perception.
The literature on perceived loudness supports this assumption: speech portions with relatively low acoustic intensity, like voiceless fricatives, routinely appear next to portions with relatively high acoustic intensity, like vowels, and our auditory system perceives the aperiodic high-mid frequencies of many obstruents as exceptionally loud compared to the periodic low-mid frequency ranges of vowel sounds, thus compensating in perception for the physical differences in acoustic intensity.
Given the above, we should anticipate that perceived loudness will not be a good candidate for the acoustic correlate of sonority hierarchies, as a measure of perceived loudness would bring the different speech sounds closer together on its scale, essentially diminishing the distinctions provided by the differences in acoustic intensity (which is typically stronger for vocalic speech sounds).
Indeed, although good approximations of perceived loudness from acoustic signals are available [e.g., @seshadri2009perceived; @skovenborg2012loudnesssk; @lund2014loudnesssk; @itu2015algorithmssk], we are not aware of any attempts to employ such measures for sonority.[^cf-equal]
[^cf-equal]: Note that terms like "loudness" may be used to mean different things by different authors. For example, @arrabothu2015usingsk extracts acoustic measurements that are designed to reflect the "impulse-like excitation" of voiced speech sounds (following @seshadri2009perceived). @arrabothu2015usingsk refers to these measurements as "loudness of speech" under the assumption that vowels are louder than voiceless sounds, an assumption which is derived in large part from the classic literature on sonority.
Rather than attempting to map acoustic intensity to perception in terms of perceived loudness, prominent studies that successfully use intensity-based measures as correlates of sonority and syllabicity [e.g., @Pfitzinger1996syllablesk; @fant2000source; @tilsen2013speech; @rasanen2018pre] tend to enhance the discrepancy between intensity and loudness by discriminating in favor of low frequency bands (where most of the energy of vocalic elements is found) and against high-mid bands (where most of the energy of obstruents is found).
The signal manipulations behind such metrics are not typically motivated by general auditory perception. However, they are often tightly linked to the perceptual quality that is identified with sonority in this work---the capacity to perceive pitch.
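A minimal sketch of the general logic behind such band-weighted metrics is given below (assuming the `signal` package and an arbitrary 1 kHz cutoff; this illustrates the shared rationale rather than reimplementing any of the cited measures): the waveform is low-pass filtered before a smoothed energy envelope is computed, so that low-frequency, largely sonorant energy dominates the resulting contour while the high-mid frequency noise of obstruents is attenuated.
```{r band-weighted-sketch, echo = TRUE, eval = FALSE}
# A hypothetical band-weighted "sonorant energy" envelope: emphasize low
# frequencies (vocalic energy) and attenuate high-mid frequencies
# (obstruent noise) before computing a smoothed intensity contour.
library(signal)
band_weighted_envelope <- function(x, sr, cutoff = 1000, win_ms = 20) {
  # 4th-order Butterworth low-pass at `cutoff` Hz (frequency normalized by Nyquist)
  lp    <- butter(4, cutoff / (sr / 2), type = "low")
  x_low <- filtfilt(lp, x)                       # zero-phase low-pass filtering
  # short-term energy (squared amplitude), smoothed with a moving average
  win <- round(sr * win_ms / 1000)
  stats::filter(x_low^2, rep(1 / win, win), sides = 2)
}
# e.g., given a numeric waveform `wav` sampled at 16 kHz:
# env <- band_weighted_envelope(wav, sr = 16000)
```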
## The Nucleus Attraction Principle {#sec:nap}
At the heart of all sonority-based principles lies the idea that the most sonorous segment in a sequence is contained within the nucleus of the syllable. This idea in fact postulates a link between the amount of sonority and the nucleus position of the syllable. We adopt this fundamental insight that guides all other sonority principles in the development of the Nucleus Attraction Principle. However, instead of adding further formal assumptions about non-overlapping segments with fixed sonority values and corresponding sonority slopes in symbolic time, the link between sonority and the syllabic nucleus is simply modeled as a dynamic process in real time. All the portions of the speech signal compete against each other for available syllabic nuclei in this process.
Sonority is therefore conceived as the quality that is capable of *attracting* the nucleus. The varying quantities of this quality, which temporally fluctuate along the stream of speech, determine which portions of speech are prone to succeed in attracting nuclei given their superior local sonority *mass*. The speech portions that fall between those successful attractors are syllabified in the margins of syllables, at onset and coda positions.[^cf-attraction]
In fact, by modeling the link between sonority and the syllabic nucleus in dynamic terms, there is no need for further theoretical postulates about sonority slopes or about discrete segmental categories of consonants and vowels in order to determine the well-formedness of syllabic structures. Syllabic ill-formedness in NAP-based models is positively correlated with the degree of nucleus competition that a given syllabified portion incurs.
[^cf-attraction]: This notion of prosodic *attraction* is, in fact, well-established in phonological theory, with descriptions of *weight sensitivity* in the stress systems of many unrelated languages, in which the stress is said to be attracted to heavy syllables.
Heaviness is mainly the product of a longer vowel in the nucleus, and in some languages heaviness may also result from a (preferably sonorant) consonant in the coda [e.g., @mccarthy1979formalsk; @hayes1980metrical; @prince1990quantitative; @gordon2006syllableweight]. There are also analyses in which vowel qualities that are considered more sonorous can contribute to heaviness and attract the stress [@zec1995sonority; @zec2003prosodic; @kenstowicz1997quality; @delacy2002formal; @gordon2012sonority].
Viewed with NAP in mind, attraction of stress in weight sensitive systems is simply the special case of a regular procedure, whereby weight---i.e., sonority---attracts syllabic nuclei.
It is important to note that the informativeness of NAP-based models is not derived from identifying the winner of the nucleus competition, but from quantifying the degree of competition within different portions of speech that stand for potential syllabic parses.
NAP-based models can analyze speech parts that are parsed together as a single syllabic unit in order to estimate the degree of competition they give rise to when they compete for a single nucleus.
In discrete terms, NAP-based models can quantify different sequences of segments to reflect how strongly they compete for a single nucleus.
Either way, the higher the degree of internal competition, the more ill-formed a syllable is predicted to result from this parse.
To simplify this further with respect to the subset of instances discussed in this work (i.e., syllables with complex consonantal onset clusters), it is possible to say that the winner of the nucleus competition is always the only vowel in the structure. The determination of ill-formedness in these cases is based on quantifying the amount of competition that the winning vowel has to withstand given different consonantal clusters in the onset of the same syllable.
It is also worth noting that we do not expect serious competition to arise from a consonant adjacent to the vowel in the same syllable.
Nucleus competition, much like sonority slopes, has a limited impact on syllables with simple onsets or codas, **C**V(C) or (C)V**C**. Principles like SSP and NAP play a role chiefly when sequences of consonants are syllabified within a single syllable as complex onset or coda clusters, **CC**V(C) or (C)V**CC**. The phonotactics of these possible consonantal sequences are determined to a large extent by sonority principles. We interpret this aspect of cluster phonotactics such that sequences within syllables are avoided the more they increase the potential competition for the nucleus in the process of syllabifying/parsing the stream of speech.
### Schematic NAP sketches {#sec:NAPsketch}
(ref:nap-depictions) Schematic depictions of competition scenarios with symbolic CCV structures. Nucleus competition can be understood as the competition between the blue and the purple areas under the sonority curve. The two examples in the top row---*plV* and *lpV*---suggest a replication of successful traditional predictions, while the three examples in the bottom row---*spV*, *sfV* and *nmV*---suggest a divergence from SSP-type models (see text for more details). Image taken from @albertIPsonoritysk.
```{r nap-depictions, fig.cap = "(ref:nap-depictions)", out.width = '100%', fig.align = 'center'}
knitr::include_graphics("external_figures/napComb150.png")
```
To understand the rationale of NAP, a series of schematic sketches are presented in Figure \@ref(fig:nap-depictions), accompanied by an impressionistic description. These will eventually be implemented within formal models that are described in detail in Section \@ref(sec:modelimp).
The five examples with specified consonantal clusters exhibit their related sonorant energy depicted as the *area under the curve*, whereby the curve itself is an idealized depiction of schematic sonority.
The purple area in each syllable in Figure \@ref(fig:nap-depictions) denotes the sonorant energy of the winning vowel in the nucleus position while the blue area denotes the sonorant energy of the losing portions in the onset.
Consider for example the pair *plV* and *lpV*, with schematic NAP-related depictions in the top row of Figure \@ref(fig:nap-depictions) (and with more traditional sonority slopes in Figure \@ref(fig:slopes-pl-lp)). A consonantal onset cluster with a well-formed rising sonority slope like *plV* should also be considered well-formed under NAP, due to the very low potential for competition between the minimally-sonorous marginal onset consonant /p/ and the non-adjacent vowel that wins the competition for the nucleus. The intervening /l/ in this case promotes a continuous rise in sonority from /p/ to V, which contributes to the formation of a single energy mass with a clear peak.
Likewise, a consonantal onset cluster with an ill-formed falling sonority slope like *lpV* should also be considered ill-formed under NAP, due to the strong potential for competition between the sonorous marginal onset consonant /l/ and the non-adjacent vowel, especially given the intervening /p/ that leads to a discontinuity in the sonority trajectory between /l/ and V, which contributes to the formation of a bimodal distribution of energy with two clear (competing) peaks.
Unlike the examples above, where the rationale of NAP is expected to replicate successful predictions of the SSP with cases like *plV* and *lpV*, NAP is expected to diverge from traditional sonority sequencing principles in
some cases, as illustrated by the examples in the bottom row of Figure \@ref(fig:nap-depictions). Under NAP, neither */s/-stop* clusters with an ill-formed falling sonority slope like *spV* nor voiceless obstruent plateaus like *sfV* are expected to incur strong syllable-internal competition (hence, they are in fact well-formed), due to the low potential for competition between the minimally-sonorous onset consonant /s/ and the non-adjacent vowel that wins the competition. Here, the intervening voiceless obstruents /p,f/ retain a minimally sonorous trajectory throughout the whole onset, which contributes to the formation of only low-level peaks, or *shelves* (see blue portions), that are barely enough to compete with a vowel (purple portions).
At the same time, a relatively strong competition potential (i.e., worse-formed syllable) is predicted under NAP for nasal plateaus like *nmV* when compared to obstruent plateaus like *sfV*. This should be expected given the strong potential for competition between the sonorous marginal onset consonant /n/ and the non-adjacent winning vowel. Here, the intervening nasal /m/ retains a relatively level sonorous trajectory, which contributes to the formation of a sonorous shelf throughout the onset.
Importantly, such differences in the well-formedness of different plateau types cannot be covered by SSP-based accounts since they treat all plateaus as incurring the same violation.
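To make the area-under-the-curve intuition behind Figure \@ref(fig:nap-depictions) concrete, the following toy sketch (with made-up schematic sonority levels; it is not one of the formal models presented in Section \@ref(sec:modelimp)) builds idealized sonority curves for the five clusters above and reports two crude indicators: the pre-nuclear ("blue") mass relative to the nuclear ("purple") mass, and the depth of the deepest dip separating a pre-nuclear peak from the nucleus.
```{r nap-competition-sketch, echo = TRUE, eval = FALSE}
# Toy illustration of the NAP rationale with made-up schematic sonority
# levels (NOT one of the formal NAP models presented later).
sonority <- c(p = 1, f = 2, s = 2, m = 5, n = 5, l = 6, V = 10)
nap_sketch <- function(syllable, n_per_seg = 100) {
  segs  <- strsplit(syllable, "")[[1]]
  curve <- rep(sonority[segs], each = n_per_seg)             # idealized step curve
  curve <- stats::filter(curve, rep(1 / 50, 50), sides = 2)  # light smoothing
  curve <- as.numeric(curve[!is.na(curve)])
  nuc   <- which.max(curve)                   # the vowel "wins" the nucleus
  # "blue" onset mass relative to the "purple" nuclear mass
  mass_ratio <- sum(curve[1:(nuc - 1)]) / sum(curve[nuc:length(curve)])
  # how far the pre-nuclear curve rises above the lowest point separating it
  # from the nucleus; values > 0 indicate a competing (bimodal) second peak
  suffix_min <- rev(cummin(rev(curve[1:nuc])))
  dip_depth  <- max(curve[1:nuc] - suffix_min)
  c(mass_ratio = mass_ratio, dip_depth = dip_depth)
}
t(sapply(c("plV", "lpV", "spV", "sfV", "nmV"), nap_sketch))
```
With these made-up values, *lpV* should be the only cluster to exhibit a pronounced pre-nuclear dip (i.e., a second, competing peak), the voiceless plateaus *spV* and *sfV* should exhibit considerably less onset mass than the nasal plateau *nmV*, and *plV*, despite its sonorous /l/, should exhibit no dip at all, in line with the impressionistic descriptions above.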
In Section \@ref(sec:modelimp) we show how the impressionistic descriptions of NAP that were provided thus far can be implemented within formal models that attempt to capture the essence of NAP with estimations of its effects on either continuous acoustic signals or symbolic consonant and vowel classes.
# NAP Implementations {#sec:modelimp}
## Complementary NAP Models {#sec:complementary}
NAP essentially describes a bottom-up process, illustrating the parsing of the stream of speech into syllables as the end point of a process that starts in perception.
A bottom-up perspective on modeling NAP is therefore relatively straightforward, as the model mirrors the process NAP describes: it analyzes continuous acoustic data at the input to derive well-formedness predictions at the output.
A bottom-up NAP model has no capacity to exploit the power of abstraction, so it essentially has no "memory"; it is a mechanistic dynamic model that describes syllabic parsing.
This means that a bottom-up model can be only designed to analyze concrete speech tokens. Unlike models of traditional sonority principles, a bottom-up model of NAP cannot determine the well-formedness of an abstract syllable as it is depicted in symbolic form. It will therefore give slightly different scores to different renditions of the same syllable, even by the same speaker.
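To give a sense of what analyzing continuous acoustic data at the input may involve, the following sketch derives a crude periodic-energy-like contour from a raw waveform in base R, taking the height of the normalized autocorrelation peak within a plausible pitch range as a rough per-frame voicing strength and multiplying it by the frame's energy. This is only a self-contained approximation for illustration; it is not the measurement procedure adopted in this work.
```{r periodic-energy-sketch, echo = TRUE, eval = FALSE}
# A crude approximation of a periodic-energy-like contour:
# per frame, (autocorrelation peak within the pitch range) * (frame energy).
# NOT the periodic energy measurement procedure used in this work.
periodic_energy_sketch <- function(x, sr, frame_ms = 40, hop_ms = 10,
                                   f0_range = c(75, 500)) {
  frame  <- round(sr * frame_ms / 1000)                 # frame length (samples)
  hop    <- round(sr * hop_ms  / 1000)                  # hop size (samples)
  lags   <- seq(floor(sr / f0_range[2]), ceiling(sr / f0_range[1]))
  starts <- seq(1, length(x) - frame, by = hop)
  sapply(starts, function(i) {
    seg <- x[i:(i + frame - 1)]
    if (stats::sd(seg) == 0) return(0)  # silence contributes no periodic energy
    ac <- acf(seg, lag.max = max(lags), plot = FALSE)$acf[-1]
    # voicing strength (autocorrelation peak in the pitch range) * frame energy
    max(0, max(ac[lags])) * mean(seg^2)
  })
}
# e.g., given a numeric waveform `wav` sampled at 16 kHz:
# pe <- periodic_energy_sketch(wav, sr = 16000)
```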
A NAP-based model operating on abstracted symbolic units is used as a separate, complementary top-down model.
Top-down inferences are based on learned regularities and categorical abstractions that reflect linguistic experience. To that end, knowledge about consonantal inventories and the probabilities of consonantal co-occurrence and distribution with respect to position in the syllable is assumed to be acquired and then stored in abstract symbolic forms which are available for top-down inferences. In that sense, top-down inferences in perception are based on the distributional probability of recognized symbols.
The above description of top-down inferences, which are detached from the functional aspects of the bottom-up route, echoes models of the language user as a *statistical learner* [see, e.g., @christiansen1999power; @frisch2001psychologicalsk; @tremblay2013processing] and, more specifically, it is very much in line with models of *phonotactic learners* [see, e.g., @coleman1997stochastic; @vitevitch2004webbasedsk; @bailey2001determinants; @hayes2008maximum; @hayes2011interpreting; @albright2009feature; @daland2011explaining; @jarosz2017inputsk; @mayer2019phonotacticsk].
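As a minimal illustration of this kind of distributional inference (a unigram count over a made-up toy set of attested onsets; it is not the symbolic NAP model presented in Subsection \@ref(sec:naptdmodel)), an onset cluster can be scored by its smoothed log relative frequency among attested onsets:
```{r phonotactic-learner-sketch, echo = TRUE, eval = FALSE}
# A toy distributional (top-down) phonotactic score: smoothed log relative
# frequency of an onset cluster in a made-up set of attested onsets.
toy_onsets <- c("pl", "pl", "pr", "tr", "tr", "tr", "sp", "st", "sn")
score_onset <- function(cluster, attested, alpha = 1) {
  counts <- table(attested)
  k <- if (cluster %in% names(counts)) counts[[cluster]] else 0
  # additive (Laplace) smoothing so that unattested clusters receive a
  # finite (but low) log-probability estimate
  log((k + alpha) / (length(attested) + alpha * (length(counts) + 1)))
}
sapply(c("tr", "sp", "lp", "nm"), score_onset, attested = toy_onsets)
```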
That said, the current project does not explore the statistical nature of top-down inferences. Instead, it operationalizes the rationale behind NAP with symbolic machinery, yielding the symbolic NAP model that is used here to estimate top-down inferences.
This choice allows the presentation of a top-down model with stronger explanatory value with regard to NAP, as it uses an architecture similar to that of standard sonority principles, helping to elucidate NAP's core ideas in a familiar vocabulary (see Subsection \@ref(sec:naptdmodel)).
The symbolic NAP model is also necessary for the application of NAP in typological, diachronic, and many traditional and current studies, where speech data is transcribed into strings of discrete symbols (see Anonymous, in press).
Moreover, it should be noted that the distributional patterns of recognizable symbols in a cognitively plausible top-down architecture are not informative with respect to their various sources, which include a host of universal and idiosyncratic phonotactic pressures. A true top-down statistical learner is thus inherently "contaminated" by all the different sources that contribute to phonotactics in a given system, without a clear distinction between sonority and other factors. Thus, it remains an open question whether top-down inferences that target only sonority-based phonotactics can be modeled in a more direct and principled way than the one presented here with the symbolic model of NAP.
As two complementary inference routes, the top-down and bottom-up models should not be considered equal. The bottom-up route is the source of learned linguistic distinctions and it is functionally motivated by the laws of physics and the limitations of the perceptual and cognitive systems.
In contrast, the top-down route is based on linguistic experience and superficial inferences that reflect the history of the symbols in the system (i.e., the distributional probabilities of recognizable recurring patterns and their extensions by analogy). In other words, top-down inferences reflect functionally motivated behaviors only indirectly, as the outcome of learning the superficial expressions of functionally-motivated (bottom-up) dynamics.
@durvasula2015illusory, @wilson2013bayesian, @wilson2014effects and @daland2018on present different yet comparable approaches, in that both bottom-up ("phonetic") and top-down ("phonological") streams of speech processing are considered in order to account for perception patterns of non-native consonantal clusters [see also @berent2009listeners; @berent2012language].
@daland2018on even suggest a Bayesian approach to integrating the two streams. Independent support for such dual-route modeling in language processing can also be found in neurolinguistic studies like @hickok2007cortical and @poeppel2014current.
Our analysis differs from the above-mentioned studies in various ways. Importantly, we do not attempt to integrate the top-down and bottom-up inference routes in this work (see discussion in Subsection \@ref(sec:compofmind)), and we focus on modeling the bottom-up route with a continuous entity that remains (quasi-)continuous in the model (the periodic energy time-series), under the assumption that it retains a reliable and consistent link to perception and/or articulation (here targeting pitch intelligibility in perception), as well as to linguistic processing (i.e., sonority).
## Model Implementations in Dynamic and Symbolic Terms {#sec:modelimpOLD}