CogSciNAP_reSubmit_again.Rmd

---
title             : "Modeling sonority in terms of pitch intelligibility with the Nucleus Attraction Principle"
shorttitle        : "Modeling sonority with the Nucleus Attraction Principle"

author: 
  - name          : "Aviad Albert"
    affiliation   : "1"
    corresponding : yes    # Define only one corresponding author
    address       : "Department of Linguistics -- Phonetics, University of Cologne, Herbert-Lewin-Straße 6, 50931 Cologne, Germany"
    email         : "a.albert@uni-koeln.de"
  - name          : "Bruno Nicenboim"
    affiliation   : "2,3"

affiliation:
  - id            : "1"
    institution   : "Department of Linguistics -- Phonetics, University of Cologne"
  - id            : "2"
    institution   : "Department of Cognitive Science and Artificial Intelligence, Tilburg University"
  - id            : "3"
    institution   : "Department of Linguistics, University of Potsdam"

authornote: |
  Author Note: In accordance with the Peer Reviewers' Openness Initiative (opennessinitiative.org), all materials and scripts associated with this manuscript were made available during the review process and will remain available at the following OSF repository: https://osf.io/y477r/.

abstract: |
  *Sonority* is a fundamental notion in phonetics and phonology, central to many descriptions of the syllable and 
  to various useful
  predictions in phonotactics.
  Although widely-accepted, sonority lacks a clear basis in speech articulation or perception, given that traditional formal principles in linguistic theory are often exclusively based on discrete units in symbolic representation and are typically not designed to be compatible with auditory perception, sensorimotor control, or general cognitive capacities. 
  On top of that, traditional sonority principles also exhibit systematic gaps in empirical coverage.
    Against this backdrop, we propose an incorporation of symbol-based and signal-based models to adequately account for sonority in a complementary manner.
  We claim that sonority is primarily a perceptual phenomenon related to pitch, driving the optimization of syllables as pitch-bearing units in all language systems. We suggest a measurable acoustic correlate for sonority in terms of *periodic energy*, and we provide a novel principle that can account for syllabic well-formedness, the *Nucleus Attraction Principle* (NAP).
  We present perception experiments that test our two NAP-based models against four traditional sonority models and we use a Bayesian data analysis approach to test and compare them. Our symbolic NAP model outperforms all the other models we test, while our continuous bottom-up NAP model manages to reach the second place, along with the best performing traditional models.
  We interpret the results as providing strong support for our proposals:
  (i) the designation of periodic energy as the acoustic correlate of sonority; 
  (ii) the incorporation of continuous entities in phonological models of perception; and 
  (iii) the dual-model strategy that separately analyzes symbol-based top-down processes and signal-based bottom-up processes in speech perception.

keywords          : "Sonority; Pitch intelligibility; Periodic energy; Bayesian data analysis; Speech perception; Phonetics and Phonology"
wordcount         : "X"

bibliography      : ["bibs/r-references.bib", "bibs/methods.bib", "bibs/phon.bib", "bibs/phon_sk.bib"]
appendix:
  - "./appendix_a.Rmd"
  - "./appendix_b.Rmd"
floatsintext      : no
figurelist        : no
tablelist         : no
footnotelist      : no
linenumbers       : no
mask              : no
draft             : no
numbersections    : yes

documentclass     : "apa6"
classoption       : "doc"
output            :
  papaja::apa6_pdf:
    latex_engine: xelatex
    includes:
      in_header: load.tex
keep_tex: yes
---

```{r setup, include = FALSE}
library("papaja")
```

```{r knitush, cache=FALSE,include=FALSE}
# global chunk options
knitr::opts_chunk$set(cache=TRUE, autodep=TRUE,fig.path='figure/graphics-', fig.align='center')#, dev="cairo_pdf")

# options(tinytex.clean=TRUE)
```

```{r libraries, message = FALSE}
library(R.matlab)
library(ggplot2)
library(dplyr)
library(Cairo)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores =  parallel::detectCores())
library(stringr)
library(readr)
library(purrr)
library(tidyr)
library(loo)
library(brms)
library(ggrepel)
```


```{r prepare-mat-per, include=FALSE}
# 60 x length of the audio file binned each 10 ms; 60 frequency bins with 10 ms for each column 
dir_mats <- c("data_tables/APPd_txt_matrices/AA/",
              "data_tables/APPd_txt_matrices/HN/")
files_mat <- list.files(path=dir_mats, pattern="*.txt",full.names=TRUE)

#creates a dataframe with 3 columns, the name of the syllable, syl, the time point, t, and the periodic energy, p
per_df <- map_dfr(files_mat, function(f){
#this is like a loop, it takes each file inside the f and does the following:
    # Read the file
    mat <- read_csv(f,col_names = FALSE,
                    col_types = cols(.default = col_double()))
    #Sum of every column of the matrix
   # vector_weights <- c(rep(1,8),rep(1.1,2),rep(1.3,3),rep(1.4,4),rep(1.5,5), rep(1.4,3), rep(1.3,5),rep(1.2,6),rep(1.1,10),rep(1,15))
    #mat <- mat * vector_weights
    per <- colSums(mat)
    #Extract the name of the syllable form the filename: looks for //(syllable).
    filename <- str_match(f,"//(.*?)_(.*?)\\.")
    syl <- filename[,2] 
    speaker <- filename[,3] 
    tibble(syl=syl,t=(0:(length(per)-1))*10,per=per,speaker=speaker)
})
# Periodic energy at every time point
per_df_full <-  per_df %>%  group_by(syl,speaker) %>% 
                mutate(smooth_per = unclass(smooth(per,"3RS3R"))) 
#head(per_df_full)

# ```
# 
# ```{r prepare-seg, include=FALSE}

dir_seg <- c("data_tables/praat_seg/AA/",
             "data_tables/praat_seg/HN/")

files_praat <- list.files(path=dir_seg, pattern="*.txt",full.names=T)
seg_df <- map_dfr(files_praat, function(f){  
#this is like a loop, it takes each file inside the f and does the following:

    # filename <- str_match(f,"/([^/]*?)_(.*?)_(.*?)\\.")
    filename <- str_match(f,"//(.*?)_(.*?)_(.*?)\\.")
    seg <- read_tsv(f, col_types =cols(
                          rowLabel = col_character(),
                          tmin = col_double(),
                          text = col_double(),
                          tmax = col_double()
                      )) %>% select(-rowLabel) %>%
        mutate(syl = filename[,2],
               speaker = filename[,3],
               text = str_extract_all(syl, ".")[[1]],
               position=row_number(),
               t = map2(tmin,tmax, ~ round(seq(.x,.y,.01)*1000))               ) %>%
        tidyr::unnest(cols = c(t)) %>%
        select(-tmin, -tmax)
})

syl_info <- left_join(per_df_full,seg_df,by = c("syl", "t", "speaker"))

## head(syl_info)

## syl       t   per speaker smooth_per text  position
## <chr> <dbl> <dbl> <chr>        <dbl> <chr>    <int>
## 1 cefal     0  0    AA            0    c            1
## 2 cefal    10  2.37 AA            2.37 c            1
## 3 cefal    20  4.97 AA            4.97 c            1
## 4 cefal    30  5.80 AA            4.99 c            1
## 5 cefal    40  4.99 AA            4.99 c            1
## 6 cefal    50  4.06 AA            4.06 c            1
# ```
# 
# ```{r log-transform, include=FALSE}

subset_voiceless_thresh <- filter(syl_info,
                                  #nchar(as.character(syl))>=4,
                                  position==1,
                                  syl %in% c("cfal","cpal","fsal","ftal","sfal","spal"))
per_thresh <- max(subset_voiceless_thresh$per)

syl_info <- group_by(syl_info,syl,speaker) %>%
    mutate(log_per = ifelse(smooth_per<per_thresh, 0,
                            10*log10(smooth_per/per_thresh)))

## print(syl_info,n=100)

# ```
# 
# ```{r, cog}

syl_info <- syl_info %>% group_by(syl, speaker) %>%
    mutate(com_syl = sum(log_per*t)/sum(log_per), # position of CoM of the whole syllable in time
           t_left_syl = ifelse(t <= com_syl,t,0),
           com_onset = sum(log_per*t_left_syl)/sum(log_per*(t_left_syl>0)),
           NAP_bu = -(com_syl - com_onset), #flipped sign
           NAP_bu_rel = -(com_syl - com_onset)/com_syl) %>%
    select(-t_left_syl)%>%
    select(NAP_bu, everything())
# ```
# 
# ```{r, monosyl, include=FALSE}

monosyl_info_AA <- filter(syl_info,
                       nchar(as.character(syl))==4,
                       speaker == "AA")

monosyl_info_AA$text[which(monosyl_info_AA$text=="c")] <- "ʃ"

monosyl_info_AA$syl <- as.factor(monosyl_info_AA$syl)
monosyl_info_AA <- mutate(group_by(monosyl_info_AA,syl),
                       ##  loess smoothing
                       smog_per = predict(loess(log_per~t, data=monosyl_info_AA$syl, span=0.19, degree = 1, na.rm=T)))

### change negatives to 0
monosyl_info_AA$smog_per[(monosyl_info_AA$smog_per<0)]=0

monosyl_info_AA <- mutate(group_by(monosyl_info_AA, syl, position),
                       pos_mid = round(mean(t),-1),
                       pos_end = ifelse(position<4, max(t), NA))

monosyl_info_AA <- mutate(group_by(monosyl_info_AA, syl),
                       ylim_com_ons = ifelse(t==round(com_onset,-1)&t<lead(t), smog_per, NA),
                       ylim_com_syl = ifelse(t==round(com_syl,-1)&t<lead(t), smog_per, NA),
                       x_com_ons = ifelse(t==round(com_onset,-1)&t<lead(t), com_onset, NA),
                       x_com_syl = ifelse(t==round(com_syl,-1)&t<lead(t), com_syl, NA))

```

```{r monosylHN, include=FALSE}

monosyl_info_HN <- filter(syl_info,
                       nchar(as.character(syl))==4,
                       speaker == "HN")

monosyl_info_HN$text[which(monosyl_info_HN$text=="c")] <- "ʃ"

monosyl_info_HN$syl <- as.factor(monosyl_info_HN$syl)
monosyl_info_HN <- mutate(group_by(monosyl_info_HN,syl),
                       ##  loess smoothing
                       smog_per = predict(loess(log_per~t, data=monosyl_info_HN$syl, span=0.19, degree = 1, na.rm=T)))

### change negatives to 0
monosyl_info_HN$smog_per[(monosyl_info_HN$smog_per<0)]=0

monosyl_info_HN <- mutate(group_by(monosyl_info_HN, syl, position),
                       pos_mid = round(mean(t),-1),
                       pos_end = ifelse(position<4, max(t), NA))

monosyl_info_HN <- mutate(group_by(monosyl_info_HN, syl),
                       ylim_com_ons = ifelse(t==round(com_onset,-1)&t<lead(t), smog_per, NA),
                       ylim_com_syl = ifelse(t==round(com_syl,-1)&t<lead(t), smog_per, NA),
                       x_com_ons = ifelse(t==round(com_onset,-1)&t<lead(t), com_onset, NA),
                       x_com_syl = ifelse(t==round(com_syl,-1)&t<lead(t), com_syl, NA))

```

<!-- ## Data analysis {#sec:datanlysis} -->

```{r setZeroRT, include=FALSE, message = FALSE}
syl_t <- filter(syl_info) %>%
  # Aviad: no idea why we are subsetting by "a"
    filter(text=="a" ) %>%
    group_by(NAP_bu, syl, speaker) %>%
    summarize(tmin = min(t))

other_scores <- read_tsv(file="data_tables/CCal_model_predictions_fix.tsv", 
                           col_types = cols(
                               syl = col_character(),
                               SSP = col_integer(),
                               SSP_obs = col_integer(),
                               MSD = col_integer(),
                               MSD_obs = col_integer(),
                               NAP_td = col_integer()
                           ))

scores_full <- left_join(syl_t, other_scores) %>% ungroup() %>%
    mutate(NAP_bu = ifelse(nchar(syl)<5,NAP_bu,NA_real_))
```

```{r openSesameFunction, include=FALSE}

read_list_opensesame <- function(list_files, speaker = "AA", scores_tbl= scores_full){

    GLIDES <- c("wlal","wnal","wzal","wsal","wtal","jmal","jval","jfal","jpal",               "welal","wenal","wezal","wesal","wetal","jemal","jeval","jefal","jepal")

    scores_speaker <- scores_tbl[scores_tbl$speaker == speaker,]

    map_dfr(list_files, ~{
        message(.x)
       suppressMessages (read_csv(.x)) %>%
            filter(practice =="no") %>%
           mutate(subj = str_match(logfile, '-([0-9]*)\\.csv')[,2],
                   RT = response_time /1000) %>%
            select(subj, stimulus,RT,correct) })%>%
        mutate(stimulus = str_replace_all(stimulus, 'ʃ','c')) %>% # ʃ
        filter(!stimulus %in% GLIDES) %>% left_join(scores_speaker,by=c("stimulus"="syl")) %>%
        mutate(corrRT = (RT - tmin/1000) * 1000, #in milliseconds
               type = factor(ifelse(nchar(stimulus)==4,"CCal",
                             ifelse(str_detect(stimulus,"^e.*" ),"eCCal","CeCal")),
                             levels=c("CeCal","eCCal","CCal")),
               response = case_when(RT>=3 ~ 0,
                                    correct ==1 & type == "CCal"  | correct ==0 & type != "CCal" ~ 1,
                                    TRUE ~ 2))
}
```

```{r readPilot, include=FALSE}

opensesame_pilot <- "data_tables/exploratry_results"

files_pilot <-
    c(list.files(path=paste0(opensesame_pilot,"/list1"), pattern="*.csv",full.names = TRUE),
      list.files(path=paste0(opensesame_pilot,"/list2"), pattern="*.csv",full.names = TRUE))

data_pilot_all <- read_list_opensesame(files_pilot, speaker="AA")

data_pilot <- data_pilot_all %>%
    filter(corrRT > 100)

N_below_100_pilot <- data_pilot_all %>%
    filter(corrRT < 100)
#0

# Sanity checks
N_trials_pilot <- 58
all(summarize(group_by(data_pilot,subj),N=n()) %>% pull(N)== N_trials_pilot)
summarize(group_by(data_pilot,subj,type),N=n()) %>% ungroup %>% distinct(type, N)

data_pilot %>% group_by(type,correct) %>% summarize(mean(corrRT))

subj_accuracyPilot <- summarize(group_by(data_pilot,subj,nchar = nchar(stimulus)),n(),accuracy = mean(correct))
subj_accuracyPilot %>% group_by(nchar) %>% summarize(mean(accuracy))

bad_subj <- subj_accuracyPilot %>% filter(nchar==5) %>% summarize(acc=mean(accuracy)) %>%
    filter(acc < .75)
```

```{r brmsPilot, include=FALSE}
library(brms)
run_brms <- function(data, chains = 4, iter = 3000, warmup=1000){
    data <- data %>% filter(type=="CCal", response ==1) %>%
        mutate_at(c("SSP","SSP_obs","MSD","MSD_obs","NAP_td"), ~ factor(., ordered = TRUE)) %>%
        mutate(sNAP_bu = NAP_bu -.5)
null_priors <- c(prior(normal(6, 2), class = Intercept),
                         prior(normal(.5, .2), class = sigma))
effect_priors <-  c(null_priors, prior(normal(0,1), class = b),
                    prior(normal(0,1), class = sd),
                    prior(lkj(2), class = cor))

    message("NAP_bu...")
    NAP_bu <- brm(corrRT ~ 1 +  sNAP_bu + (NAP_bu|subj), data=data,
               prior = effect_priors,
               family =  lognormal(), 
               control = list(adapt_delta=.999,max_treedepth =12),
               chains =chains, iter =iter, warmup = warmup)

    message("null...")
    null <- brm(corrRT ~ 1 + (1|subj), data=data,
               prior = null_priors,
               family =  lognormal(), 
               control = list(adapt_delta=.999,max_treedepth =12),
    chains =chains, iter =iter, warmup = warmup)

message("SSP...")
    SSP <- brm(corrRT ~ 1 +  mo(SSP) +(mo(SSP)|subj), data=data,
               prior = effect_priors,
               family =  lognormal(), 
               control = list(adapt_delta=.999,max_treedepth =12),
chains =chains, iter =iter, warmup = warmup)

message("SSP_obs...")
    SSP_obs <- brm(corrRT ~ 1 +  mo(SSP_obs) +(mo(SSP_obs)|subj), data=data,
               prior = effect_priors,
               family =  lognormal(), 
               control = list(adapt_delta=.999,max_treedepth =12),
chains =chains, iter =iter, warmup = warmup)

message("MSD...")
    MSD <- brm(corrRT ~ 1 +  mo(MSD) +(mo(MSD)|subj), data=data,
               prior = effect_priors,
               family =  lognormal(), 
               control = list(adapt_delta=.999,max_treedepth =12),
chains =chains, iter =iter, warmup = warmup)

message("MSD_obs...")
    MSD_obs <- brm(corrRT ~ 1 +  mo(MSD_obs) +(mo(MSD_obs)|subj), data=data,
                   prior = effect_priors,
                   family =  lognormal(), 
               control = list(adapt_delta=.9995,max_treedepth =12),
chains =chains, iter =iter, warmup = warmup)

message("NAP_td...")
    NAP_td <- brm(corrRT ~ 1 +  mo(NAP_td) +(mo(NAP_td)|subj), data=data,
               prior = effect_priors,
               family =  lognormal(), 
               control = list(adapt_delta=.999,max_treedepth =12),
chains =chains, iter =iter, warmup = warmup)

list(data = data, models = list(NAP_bu = NAP_bu, null = null, SSP=SSP, SSP_obs =SSP_obs, MSD = MSD, MSD_obs =MSD_obs, NAP_td = NAP_td))
}
```

```{r mpilot, include=FALSE}

 if(file.exists("data_tables/RDS/m_pilot.RDS")){
     m_pilot <- readRDS("data_tables/RDS/m_pilot.RDS")
 } else {
     m_pilot <- run_brms(data_pilot)
     saveRDS(m_pilot, file = "data_tables/RDS/m_pilot.RDS")
 }
 if(file.exists("data_tables/RDS/kfold_pilot.RDS")){
     
     kfold_pilot <- readRDS("data_tables/RDS/kfold_pilot.RDS")
 } else {
     kfold_pilot <- map(m_pilot$models, kfold, folds = "stratified", group = "subj", K = 15)
     saveRDS(kfold_pilot, file = "data_tables/RDS/kfold_pilot.RDS")
 }

## loo::compare(x=kfold_pilot)

```

<!-- real data -->

```{r readGer, include=FALSE}
opensesame_german <- "data_tables/confirmatory_results"

files_german <-
    list.files(path=opensesame_german,pattern="*.csv",full.names = TRUE)

data_german_all <- read_list_opensesame(files_german, speaker="AA")

data_german <- data_german_all %>%
    filter(corrRT > 100)

N_below_100 <- data_german_all %>%
    filter(corrRT < 100)

## Sanity checks
N_trials_german <- 232 
all(summarize(group_by(data_german,subj),N=n()) %>% pull(N)== N_trials_german)
summarize(group_by(data_german,subj,type),N=n()) %>% ungroup %>% distinct(type, N)

data_german %>% group_by(type,correct) %>% summarize(mean(corrRT))

subj_accuracyGer <- summarize(group_by(data_german,subj,nchar = nchar(stimulus)),n(),accuracy = mean(correct))
subj_accuracyGer %>% group_by(nchar) %>% summarize(mean(accuracy))

bad_subj <- subj_accuracyGer %>% filter(nchar==5) %>% summarize(acc=mean(accuracy)) %>%
    filter(acc < .75)

mono_acc <- subj_accuracyGer %>% filter(nchar==4)
bi_acc <- subj_accuracyGer %>% filter(nchar==5)
bi_acc_excluded <- subj_accuracyGer %>% filter(nchar==5) %>% filter(accuracy > .74)

mean_mono_acc <- mean(mono_acc$accuracy)
mean_bi_acc <- mean(bi_acc$accuracy)
mean_bi_acc_excluded <- mean(bi_acc_excluded$accuracy)

# 1 bad subject

data_german <- data_german %>% filter(!subj %in% bad_subj)

```

```{r brms-models, include=FALSE}

if(file.exists("data_tables/RDS/m_german.RDS")){
    m_german <- readRDS("data_tables/RDS//m_german.RDS")
 } else {
     m_german <- run_brms(data_german, iter=4000, warmup=2000)
     saveRDS(m_german, file = "data_tables/RDS//m_german.RDS")
 }
 
```

```{r loo-models, include=FALSE}
if(file.exists("data_tables/RDS/kfold_german.RDS")){
    kfold_german <- readRDS("data_tables/RDS/kfold_german.RDS")
} else {
     kfold_german <- map(m_german$models, kfold, folds = "stratified", group = "subj", K = 15)
     saveRDS(kfold_german, file = "data_tables/RDS/kfold_german.RDS")
    }
```

<!-- ```{r weigths-models, include=FALSE} -->

<!-- ## ## loo::compare(x=loo_pilot) -->
<!-- ## loo::compare(x=kfold_german) -->
<!-- ## loo::compare(x=kfold_german[-1]) -->
<!-- ## loo::compare(x=kfold_german[-1][-6]) -->
<!-- ## loo::compare(x=kfold_german[c("MSD_obs","SSP_obs","null")]) -->


<!-- ## loo::loo_model_weights(x=loo_german) -->
<!-- ## compare(loo_german$NAP_td, loo_german$SSP_obs) -->
<!-- ## if(file.exists("data_tables/RDS/weights.RDS")){ -->
<!-- ##     weights <- readRDS("data_tables/RDS/weights.RDS") -->
<!-- ## } else { -->
<!--     ## weights <-loo_model_weights(,kfold_german$MSD_obs) -->

<!--         ## xx <- ll_matrix[,c(2,3,4,5,6)] -->
<!--     ## wxx <- loo::stacking_weights(xx) -->
<!--     ## names(wxx) <-  colnames(xx) -->

<!--     ## saveRDS(weights, file = "data_tables/RDS/weights.RDS") -->
<!-- ## } -->
<!-- ``` -->

```{r loo-proc, include=FALSE}

## w_a <- model_weights(m_german$models$MSD,
##                    m_german$models$MSD_obs,
##                    m_german$models$SSP,
##                   ##  m_german$models$SSP_obs,
##                   ## m_german$models$NAP_bu,
##                   ## m_german$models$NAP_td,
##                   ## m_german$models$null,
##                    weights = "loo")


data_german_s <- m_german$data


## loos <- loo_german %>% map_dfc( ~
##     .x$pointwise[,"elpd_loo"]
##     ) %>% {setNames(.,paste0("elpd_",colnames(.)))}

## data_german_s <-  data_german_s %>% bind_cols(loos)


## data_g_summary <- data_german_s %>%
##     group_by(stimulus, NAP_bu, NAP_td) %>%
##     summarize_at(vars(starts_with("elpd")), mean)


## loo_model_weights(loo_german)

predictions <- m_german$models %>% map(~
                     predict(.x,summary=FALSE) %>%
                     array_branch(margin = 1) %>%
                     map_dfr( ~ {
                         data_german_s %>%
                             mutate(pred= .x) %>%
                             group_by(stimulus, NAP_bu , SSP, SSP_obs , MSD , MSD_obs , NAP_td) %>%
                             summarize(pred = mean(pred))
                     } )
                 )

```

```{r fitplots, eval =TRUE, warning=FALSE, include=FALSE}
data_RT <- data_german_s %>%
    mutate(NAP = round(NAP_bu,7)) %>%
    group_by(stimulus,NAP, NAP_bu , SSP, SSP_obs , MSD , MSD_obs , NAP_td) %>%
    summarize(corrRT = mean(corrRT))  %>%
    ## summarize(corrRT = mean(log(corrRT * 1000)))  %>%
    bind_rows(tibble(NAP = seq(30,220,10)))  %>%
#    bind_rows(tibble(NAP = seq(0.367355,.4825569,0.01))) %>%
    filter(!is.na(stimulus))
    
```


```{r readHeb, include=FALSE}
opensesame_hebrew <- "data_tables/confirmatory_results_Heb"

files_hebrew <-
    list.files(path=opensesame_hebrew,pattern="*.csv",full.names = TRUE)

data_hebrew_all <- read_list_opensesame(files_hebrew, speaker="HN")

data_hebrew <- data_hebrew_all %>%
    filter(corrRT > 100)

N_below_100 <- data_hebrew_all %>%
    filter(corrRT < 100)

## Sanity checks

data_hebrew %>% group_by(type,correct) %>% summarize(mean(corrRT))

subj_accuracyHeb <- summarize(group_by(data_hebrew,subj,nchar = nchar(stimulus)),n(),accuracy = mean(correct))
subj_accuracyHeb %>% group_by(nchar) %>% summarize(mean(accuracy))

bad_subj <- subj_accuracyHeb %>% filter(nchar==5) %>% summarize(acc=mean(accuracy)) %>%
    filter(acc < .75)

mono_acc <- subj_accuracyHeb %>% filter(nchar==4)
bi_acc <- subj_accuracyHeb %>% filter(nchar==5)
bi_acc_excluded <- subj_accuracyHeb %>% filter(nchar==5) %>% filter(accuracy > .74)

mean_mono_acc <- mean(mono_acc$accuracy)
mean_bi_acc <- mean(bi_acc$accuracy)
mean_bi_acc_excluded <- mean(bi_acc_excluded$accuracy)

# NO bad subject!

data_hebrew <- data_hebrew %>% filter(!subj %in% bad_subj)
# saveRDS(data_hebrew, "data_hebrew.RDS")
```

```{r brms-models-heb, include=FALSE}

if(file.exists("data_tables/RDS/m_hebrew.RDS")){
    m_hebrew <- readRDS("data_tables/RDS//m_hebrew.RDS")
 } else {
     m_hebrew <- run_brms(data_hebrew, iter=4000, warmup=2000)
     saveRDS(m_hebrew, file = "data_tables/RDS//m_hebrew.RDS")
 }
 
```

```{r loo-models-heb, include=FALSE}
if(file.exists("data_tables/RDS/kfold_hebrew.RDS")){
    kfold_hebrew <- readRDS("data_tables/RDS/kfold_hebrew.RDS")
} else {
     kfold_hebrew <- map(m_hebrew$models, kfold, folds = "stratified", group = "subj", K = 15)
     saveRDS(kfold_hebrew, file = "data_tables/RDS/kfold_hebrew.RDS")
    }
```


```{r loo-proc_heb, include=FALSE}

data_hebrew_s <- m_hebrew$data


predictions_heb <- m_hebrew$models %>% map(~
                     predict(.x,summary=FALSE,ndraws = 2000)  %>%
                     array_branch(margin = 1) %>%
                     map_dfr( ~ {
                         data_hebrew_s %>%
                             mutate(pred= .x) %>%
                             group_by(stimulus, NAP_bu , SSP, SSP_obs , MSD , MSD_obs , NAP_td) %>%
                             summarize(pred = mean(pred))
                     } )
                 )

```

```{r fitplots_heb, eval =TRUE, warning=FALSE, include=FALSE}

data_RT_heb <- data_hebrew_s %>%
    mutate(NAP = round(NAP_bu,7)) %>%
    group_by(stimulus,NAP, NAP_bu , SSP, SSP_obs , MSD , MSD_obs , NAP_td) %>%
    summarize(corrRT = mean(corrRT))  %>%
    ## summarize(corrRT = mean(log(corrRT * 1000)))  %>%
    bind_rows(tibble(NAP = seq(30,220,10)))  %>%
#    bind_rows(tibble(NAP = seq(0.367355,.4825569,0.01))) %>%
    filter(!is.na(stimulus))
    
```

<!-- participantsStuff -->
```{r participantsRead, include=FALSE}

## pilot
subjects_pilot <- read.csv("data_tables/subjects/subjects_pilot.csv") %>% select(-X)

## Ger (main)
subjects_main <- read.csv("data_tables/subjects/subjects_main.csv") %>% select(-X) %>% distinct(subject_nr, .keep_all = TRUE)

subjects_main$subject_education <- as.character(subjects_main$subject_education)
subjects_main$subject_education[subjects_main$subject_education=="undergrad"] <- "undergraduate"
subjects_main$subject_education <- as.factor(subjects_main$subject_education)

## Heb
subjects_heb <- read.csv("data_tables/subjects/subjects_heb.csv") %>% select(-X) %>% distinct(subject_nr, .keep_all = TRUE)

subjects_heb$subject_education <- as.character(subjects_heb$subject_education)
subjects_heb$subject_education[subjects_heb$subject_education=="high-school"] <- "school"
subjects_heb$subject_education[subjects_heb$subject_education=="undergrad"] <- "undergraduate"
subjects_heb$subject_education <- as.factor(subjects_heb$subject_education)

```

# Introduction

The following work models the contribution of *sonority* to phonology in a manner that attempts to be compatible with general auditory perception and cognition, as well as with linguistic theory. This is in contrast to many of the traditional formal principles in linguistic theory that do not tend to specify how they interact with general systems of human capacity, such as auditory perception or sensorimotor control. Traditional linguistic principles are mostly generalizations that explain linguistic typologies as they have been depicted in writing systems. 

The prevailing sonority-related principles such as the *Sonority Sequencing Principle* (SSP) and its derivatives are, indeed, generalizations of the latter type.
Our novel proposals in this study achieve a better empirical coverage compared to different versions of the SSP that we test in a set of perception tasks, while, at the same time, provide a more comprehensive explanation than common formal linguistic principles to the notion of sonority in phonology and phonetics.

We assume here that a proper model of speech perception involves a bottom-up route and a related, yet separate, top-down route. While both inference routes are capable of selecting between discrete alternatives (e.g., selecting different consonants or syllabic parses), they arrive there in two very different ways: bottom-up processes are based on continuous events and top-down processes are based on existing sets of symbolic entities. In that sense, top-down models are more reminiscent of traditional formal principles in phonology that, more often than not, cover processes that start and end with discrete symbols.

Sonority in our account is strongly related to pitch perception and the major role of pitch in all language systems. Pitch in speech is mediated by syllable-size units, regardless of its role as the lexical *tone* in tone languages or the post-lexical *tune* in intonation systems. 
We hypothesize that sonority is related to the strength and clarity of pitch perception, serving as a measure of *pitch intelligibility* that acts as a universal drive to optimize the pitch-bearing ability of syllabic units (see Section \@ref(sec:sonPitch)). 

We suggest a measurable acoustic correlate for sonority in terms of *periodic energy*, and we propose a novel principle, the *Nucleus Attraction Principle* (NAP), that accounts for syllabic well-formedness based on general principles of competition in real-time (that is, the extent to which the different portions of the speech signal are good candidates for the syllabic nucleus). 
We implement NAP with two complementary models (see Section \@ref(sec:modelimp)): (i) a bottom-up model that directly analyzes continuous acoustic signals; and (ii) a top-down model that is based on discrete units of consonants and vowels. 

We present a series of syllable count tasks in Section \@ref(sec:experiments), 
designed in order to test our two NAP-based models (applying NAP with bottom-up and top-down approaches) against four traditional sonority models, considering two types of common sonority hierarchies combined with two types of common sonority principles---the *Sonority Sequencing Principle* (SSP) and the *Minimum Sonority Distance* (MSD).

We use a Bayesian data analysis approach to test and compare the six different sonority models. Whereas all the different models are found to be capable of predicting the experimental results to a good extent, the symbolic top-down version of NAP is shown to be the superior model. The bottom-up model of NAP comes in second alongside a few of the traditional models. 
We consider this to be a very good performance of the bottom-up model, which is based solely on continuous acoustic signals and still has many potential paths for improvement (e.g., better digital signal processing and improved procedures to estimate competition).
Interestingly, some of the results suggest a relatively high degree of complementarity between the two NAP models, even though they represent the same principle. This is a desirable result for our framework, which advocates the need for two complementary models to account for both dynamic and symbolic processes.

Our set of proposals has many advantages over traditional sonority accounts, including methodological aspects, theoretical perspectives, and, essentially, a better empirical coverage (see Subsection \@ref(sec:discussionResults) for a summary of the experimental results). 
Some of the major implications of this study are adderessed in Section \@ref(sec:genDiscussion), where we discuss the division of labor between sonority and other phonotactic factors, demonstrated with a holistic account of the phenomenon of */s/-stop clusters* (Subsection \@ref(sec:division)). 
We also discuss the contribution of this work to previous theoretical efforts to incorporate continuous entities in phonology (Subsection \@ref(sec:lingMod)).
Finally, we discuss the debate on the universality of sonority in light of our work (Subsection \@ref(sec:projection)), before we present our conclusions in Section \@ref(sec:conclusions).

In the remainder of this Introduction, we briefly present the relevant background on traditional sonority hierarchies and principles, to cover the basics of their rationale and common application.

## Sonority Background {#sec:sonback}

### Sonority hierarchies {#sec:hierarchies}

A sonority hierarchy is a single scale on which all consonants and vowels can be ranked relative to each other. Early versions of current sonority hierarchies often date back to @sievers1893grundzugesk; @jespersen1899fonetik, and @whitney1865relation.[^cf-debrosses]
While the phonetic basis of sonority hierarchies remains controversial, phonological sonority hierarchies have been primarily based on repeated observations that revealed systematic behaviors of segmental distribution and syllabic organization within and across languages. The general consensus regarding the phonological sonority hierarchy thus stems from attested cross-linguistic phonotactic behaviors of different segmental classes, such as, for instance, 
the relatively high frequency of stop-liquid sequences in the onset of complex syllables (e.g., /kl/ in the English word ***cl**ean*) 
as well as the opposite liquid-stop sequences in the mirror-image coda position of complex syllables (e.g., /lk/ in the English word *mi**lk***),
against the very low frequency of the opposite scenarios, which posit /lk/ in complex onsets and /kl/ in complex codas.

[^cf-debrosses]: @ohala1992alternatives goes even further back to @debrosses1765traite.

Although there are many different proposals for sonority hierarchies [@parker2002quantifying found more than 100 distinct sonority hierarchies in the literature], a very basic hierarchy that seems to reach a considerable consensus, and is often cited in relation to Clements's [-@clements1990role] seminal paper is given in (\@ref(ex:scale)).[^cf-liquid]

[^cf-liquid]: The group of *liquids* is the most loosely defined, as it includes both lateral *approximants* (namely /l/) and various types of rhotics such as *trills* (/r,ʀ,ʁ/), *taps* (namely /ɾ/), and alveolar and retroflex approximants (/ɹ,ɻ/). 

\begin{exe} 
\ex \emph{Obstruents} $<$ \emph{Nasals} $<$ \emph{Liquids} $<$ \emph{Glides} $<$ \emph{Vowels}  \label{ex:scale} 
\end{exe} 

The ordering of different speech sounds along the sonority hierarchy is assumed to be universal, in line with the common assumption that sonority has a phonetic basis in perception and/or articulation, yet the patterning of segmental classes as distinct groups along the scale is considered to be language-specific, i.e., based on phonological categorization. 
For example, voiceless *stops* may be considered universally lower than voiced *fricatives* on the sonority hierarchy, yet for some languages and analyses the relevant patterning of stpos and fricatives along the sonority hierarchy may consider them together as belonging to the same general class of *obstruents*.
Classes along the sonority hierarchy are most commonly modeled as a series of integers (often referred to as sonority indices) reflecting the ordinal nature of phonological interpretations of the sonority hierarchy. 

(ref:hierarchy-caption) (\#tab:hierarchy) Traditional phonological sonority hierarchies
(ref:hierarchy-caption2) Index values reflect the ordinal ranking of categories in sonority hierarchies. The obstruents in *H~col~* are collapsed into one category (bottom four rows = 1), while in *H~exp~* they are expanded into four distinct levels.
\begin{table}[tbp]
\begin{center}
\begin{threeparttable}
\caption{(ref:hierarchy-caption)}
\begin{tabular}{cclcclcclccl}
\toprule
\multicolumn{2}{c}{\textbf{Sonority index values}} & \multicolumn{1}{l}{\textbf{Segmental classes}} & \multicolumn{1}{l}{\textbf{Phonemic examples}}\\
\multicolumn{1}{c}{\emph{H\textsubscript{col}} hierarchy} & \multicolumn{1}{c}{\emph{H\textsubscript{exp}} hierarchy} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{}\\
\midrule
5 & 8 & Vowels & \multicolumn{1}{l}{/u, i, o, e, a/}\\
4 & 7 & Glides & \multicolumn{1}{l}{/w, j/}\\
3 & 6 & Liquids & \multicolumn{1}{l}{/l, r/}\\
2 & 5 & Nasals & \multicolumn{1}{l}{/m, n/}\\
\textbf{1} & \textbf{4} & Voiced Fricatives & \multicolumn{1}{l}{/v, z/}\\
\textbf{1}& \textbf{3} & Voiced Stops & \multicolumn{1}{l}{/b, d, g/}\\
\textbf{1}& \textbf{2} & Voiceless Fricatives & \multicolumn{1}{l}{/f, s/}\\
\textbf{1}&\textbf{1} & Voiceless Stops & \multicolumn{1}{l}{/p, t, k/}\\
\bottomrule
\addlinespace
\end{tabular}
\begin{tablenotes}[para]
\normalsize{\textit{Note.} (ref:hierarchy-caption2)}
\end{tablenotes}
\end{threeparttable}
\end{center}
\end{table}


The main differences that result from variation of the basic hierarchy in (\@ref(ex:scale)) concern the class of obstruents, which may contain voiced and voiceless variants of stops and fricatives (to mention just the most prominent distinctions).
Note that vowels are often also commonly divided into subgroups along the sonority hierarchy [see @gordon2012sonority], but these distinctions will be irrelevant in the context of this paper.
It is therefore not uncommon to expand the class of obstruents, whereby stops are lower than fricatives and voiceless consonants are lower than voiced ones.
Also note that the ranking of voiceless fricatives in relation to voiced stops may be disputed, depending on whether manner distinctions or voicing distinctions take precedence. The version we adopt here is in line with @parker2008sound in suggesting that voicing distinctions take precedence. In any case, these differences are not assumed to bear any crucial consequences on the results of this study.

The two variants of the sonority index values given in Table \@ref(tab:hierarchy) reflect two ends of a spectrum of common sonority hierarchies, ranging from hierarchies that collapse all obstruents together into a single class (resulting in the same sonority index value for all obstruents), to hierarchies that expand the class of obstruents by employing voicing distinctions as well as distinctions between stops and fricatives (resulting in multiple sonority index values within the class of obstruents). In what follows we will refer to these two versions of the sonority hierarchy as *H~col~* for the sonority hierarchy with a single collapsed class of obstruents, and *H~exp~* for the sonority hierarchy that exhibits multiple subclasses based on the  expanded class of obstruents.

### Sonority principles {#sec:principles}

Sequencing principles can be understood as a mapping scheme between the ranks of a sonority hierarchy and the linear order of symbolic speech segments.
Modern formulations of such principles, which use the ordinal sonority hierarchy to generalize over the phonotactics of consonantal sequences in terms of *sonority slopes* were developed mainly throughout the 1970s and 1980s in seminal works such as @hooper1976introduction; @steriade1982greek; @selkirk1984majorsk; @harris1983syllablesk; @mohanan1986theory, and @clements1990role.

(ref:slopes-pl-lp) Schematic depiction of the sonority slopes of two onset clusters, *plV* and *lpV*. The red line denotes the sonority slope of the onset cluster (i.e., the two onset consonants), while the grey line denotes the slope between the second consonant and the vowel at the nucleus position (always a rise in these cases). The angle of the red lines reflects the well-formed rising sonority slope of the onset cluster in *plV* and the ill-formed falling sonority slope of the onset cluster in *lpv*. Image taken from @albertIPsonoritysk.
```{r slopes-pl-lp, fig.cap = "(ref:slopes-pl-lp)", fig.asp = .45, out.width = '100%', dev="cairo_pdf"}
seg_type = c("Vowels","Glides","Liquids","Nasals","Voiced Fricatives","Voiced Stop", "Voiceless Fricatives","Voiceless Stops")
seg_token = c("p","l","V","l","p","V") 
slopes_plot <- function(df) {
  ggplot(df, aes(x=x,y=y, linetype=line, color=line)) + 
  geom_segment(aes(x=0, xend=7.5, y=-1, yend=-1), color="grey", size=.2, alpha=.5, linetype = "solid") +
  geom_segment(aes(x=4.5, xend=4.5, y=-.5, yend=8.5), color="black", size=.2, alpha=.2, linetype = "dotted") +
  geom_text(data=tibble(seg_token=seg_type,y=8:1,x=0.2),
            aes(label=seg_token,x=x,y=y, hjust=0),inherit.aes = FALSE, size=4, family = "Charis SIL") +
  geom_text(data=tibble(seg_token=seg_token,y=0,x=2:7),
            aes(label=seg_token,x=x,y=y, hjust=0),inherit.aes = FALSE, size=4, family = "Charis SIL") +
  geom_text(data=tibble(seg_token="Sonority slopes: different types",y=10.75,x=4.5),
            aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL") +
  geom_text(data=tibble(seg_token="well-formed",y=9.5,x=3),
            aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL", fontface = "italic") +
  geom_text(data=tibble(seg_token="onset rise",y=8.75,x=3),
            aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL") +
  geom_text(data=tibble(seg_token="ill-formed",y=9.5,x=6),
            aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL", fontface = "italic") +
  geom_text(data=tibble(seg_token="onset fall",y=8.75,x=6),
            aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL") +
  geom_line() +
  scale_x_continuous("",breaks=2:7, labels=NULL) + 
  scale_y_continuous("",breaks=1:8, labels=NULL) +
  scale_linetype_manual(values=c("solid","solid","solid","solid")) +
  scale_color_manual(values=c("red","grey","red","grey")) +
  geom_point(color="black", size=1) +
  theme(legend.position = "none", axis.line.y = element_blank(), panel.grid = element_blank(), panel.background = element_blank(), axis.ticks = element_blank()) + 
  coord_cartesian(xlim=c(0.2,7.5))
}
tribble(~x , ~y, ~line,
        2, 1, "a",
        3, 6, "a",
        3, 6, "b",
        4, 8, "b",
        5,6, "c",
        6,1, "c",
        6,1, "d",
        7,8, "d") %>%
  slopes_plot()
```

The most basic and widely used sonority-based principle that employs sonority slopes to derive phonotactic predictions in terms of syllabic well-formedness is the *Sonority Sequencing Principle* (SSP). The SSP is a simple yet powerful generalization that has been used in countless theoretical accounts. The SSP assumes that sequences of segments within syllables preferably rise towards the nucleus of the syllable, where sonority is expected to reach the local maximum.
Consequently, sequences of segments should preferably rise in sonority from the consonant(s) in the syllabic onset to the syllable's nucleus (most often a vowel) and fall from the nucleus to the consonant(s) in the syllabic coda. 
In this paper we focus on syllable-initial onset consonant clusters that precede a vowel, whereby a rising sonority slope (e.g., *plV*) is considered well-formed and a falling sonority slope (e.g., *lpV*) is considered ill-formed. Sonority plateaus (e.g., *pkV*) fare in between, giving way to various interpretations depending on the language and analysis. As such, plateaus may be considered as ill- or well-formed [e.g., @blevins1995syllable], although they are generally interpreted as denoting a third, mid-level of well-formedness.

The *Minimum Sonority Distance* [MSD; @steriade1982greek; @selkirk1984majorsk] is a well-known elaboration on the preferred angle of sonority slopes compared to basic applications of the SSP, given that the SSP makes no distinction between different angles of rising or falling slopes. The MSD was designed to prefer onset rises with steep slopes over onset rises with shallow slopes, under the assumption that consonantal sequences in the onset are preferred with a larger sonority distance between them. For instance, *plV* has a steeper rise compared to *bnV* and it is therefore better-formed according to the MSD (see Figure \@ref(fig:slopes-pl-bn)).[^cf-sdp] 

[^cf-sdp]: The *Sonority Dispersion Principle* [SDP; @clements1990role; @clements1992sonority] is a slightly different yet related principle that prefers onset rises with a large distance and an equal dispersion of sonority index values across the consonantal sequence and the following vowel. The results of the SDP are highly contingent on the given sonority hierarchy and it is not very clear how to apply the SDP with onset sonority falls [among other problems listed in @parker2002quantifying 22--24]. The SDP is therefore not comparable as a model that can generate the full set of well-formedness predictions for onset clusters. Indeed, the SDP is mostly invoked in relation to the status of the onset versus the coda (not directly related to consonantal clusters), where it is used to highlight the assumption that onsets prefer to maximize sonority distance from the following nucleus while codas prefer to minimize sonority distance from the preceding nucleus.

(ref:slopes-pl-bn) Schematic depiction of the sonority slopes of two onset clusters, *plV* and *bnV* (the red solid line denotes the sonority slope of the onset clusters). The angle of the red lines reflects a steeper rise for *plV* (left) compared with *bnV* (right), due to the larger sonority distance between the consonants in *plV*. Image taken from @albertIPsonoritysk.
```{r slopes-pl-bn, fig.cap = "(ref:slopes-pl-bn)", fig.asp = .4, out.width = '100%', dev="cairo_pdf"}
seg_type = c("Vowels","Glides","Liquids","Nasals","Voiced Fricatives","Voiced Stop", "Voiceless Fricatives","Voiceless Stops")
seg_token = c("p","l","V","b","n","V") 
slopes_plot <- function(df) {
  ggplot(df, aes(x=x,y=y, linetype=line, color=line)) + 
  geom_segment(aes(x=0, xend=7.5, y=-1, yend=-1), color="grey", size=.2, alpha=.5, linetype = "solid") +
  geom_segment(aes(x=4.5, xend=4.5, y=-.5, yend=8.5), color="black", size=.2, alpha=.2, linetype = "dotted") +
  geom_text(data=tibble(seg_token=seg_type,y=8:1,x=0.2),
            aes(label=seg_token,x=x,y=y, hjust=0),inherit.aes = FALSE, size=4, family = "Charis SIL") +
  geom_text(data=tibble(seg_token=seg_token,y=0,x=2:7),
            aes(label=seg_token,x=x,y=y, hjust=0),inherit.aes = FALSE, size=4, family = "Charis SIL") +
  geom_text(data=tibble(seg_token="Sonority rises: different slopes",y=10.75,x=4.5),
            aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL") +
  geom_text(data=tibble(seg_token="steep rise",y=9.25,x=3),
            aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL") +
  geom_text(data=tibble(seg_token="shallow rise",y=9.25,x=6),
            aes(label=seg_token,x=x,y=y, hjust=0.5),inherit.aes = FALSE, size=4, family = "Charis SIL") +
  geom_line() +
  scale_x_continuous("",breaks=2:7, labels=NULL) + 
  scale_y_continuous("",breaks=1:8, labels=NULL) +
  scale_linetype_manual(values=c("solid","solid","solid","solid")) +
  scale_color_manual(values=c("red","grey","red","grey")) +
  geom_point(color="black", size=1) +
  theme(legend.position = "none", axis.line.y = element_blank(), panel.grid = element_blank(), panel.background = element_blank(), axis.ticks = element_blank()) + 
  coord_cartesian(xlim=c(0.2,7.5))
}
tribble(~x , ~y, ~line,
        2, 1, "a",
        3, 6, "a",
        3, 6, "b",
        4, 8,"b",
        5,3,"c",
        6,5,"c",
        6,5,"d",
        7,8,"d") %>%
  slopes_plot()
```

#     Sonority, Pitch and the Nucleus Attraction Principle (NAP) {#sec:sonPitch}

##		Sonority and Pitch Intelligibility {#sec:pitchintelligibility}

The observation that sonority summarizes an essential quality that is related to vowels and their propensity to deliver a relatively steady harmonic structure, highlighting pitch and formant information, is by no means new. Previous proposals already defined sonority as either relating to vowels in some general way, or more specifically relating to voicing or glottal fold vibration, or to the clarity/strength of the formants.[^cf-list] A few previous accounts went even further, by addressing the function of this vowel-centric feature, suggesting that sonority may be related to periodic energy or pitch/tone [@lass1988phonology; @nathan1989preliminaries; @puppel1992sonority; @ladefoged1997linguistic; @heselwood1998unusual]. What all these proposals share, explicitly or implicitly, is a recurring insight about a strong link between the preferred type of segmental material in syllabic nuclei and a set of features that conspire to optimize pitch intelligibility, a property which characterizes vowels more than consonants. 

[^cf-list]: A partial list of some prominent examples includes @sigurd1955rank; @jakobson1956fundamentals; @chomsky1968spesk; @foley1972rule; @ladefoged1971preliminaries; @allen1973accentsk; @fujimura1975syllable; @Donegan1978onthenatural; @ultan1978typological; @price1980sonority; @lindblom1983production; @anderson1986suprasegmental; @vennemann1988preferencesk; @levitt1991syllable; @pierrehumbert1992lenition; @fujimura1997acoustic; @stemberger1997handbook; @boersma1998functional; @zhang2001effects; @howe2004harmonic; @clements2009does; @sharma2018significance.

Pitch is an indispensable communicative dimension of all linguistic sound systems [@pike1945intonationsk; @bolinger1978intonation; @house1990tonal; @cutler1997prosody], whether it is lexically determined as in linguistic tone, 
or post-lexically employed to convey intonation, i.e., the linguistic tune [see typological accounts of prosodic systems in @jun2005prosodicsk; @jun2015prosodicsk]. 
Tones are used to distinguish lexical items while tunes are used to demarcate units, to modulate semantics (e.g., information structure and sentence modality) and to
express a vast array of non-propositional meanings (e.g., discourse-pragmatic intention, emotional state, socio-indexical identity, and attitudinal stance). The importance of pitch to human communication cannot be overstated [or in the words of @pike1945intonationsk 20: "There are no pitchless sentences"].

Crucially, linguistic pitch events are known to target syllable-sized units as their "docking site", regardless of the type of pitch event, whether they are lexical tones or post-lexical tunes. 
These linguistic pitch events are commonly considered to associate with *Tone-Bearing Units* [@leben1973suprasegmental], that are either syllables or *moras*.[^cf-mora] 
These associations between the text on the one hand and tone or tune on the other hand are widely assumed to be mediated by syllabic/moraic units.
For example, intonation pitch contours that highlight and modulate whole words and phrases essentially target privileged syllables---*heads* (stressed syllables) and *edges* (syllables at initial and final positions of prosodic words and phrases)---to achieve their communicative goal on textual material of various sizes [@ladd2008intonational; @roettger2019tune].
This tone-bearing role of syllables and moras is the hallmark of many prominent theories regarding tone and intonation, following from *Autosegmental* and *Autosegmental-Metrical* models of phonology [e.g., @liberman1975intonationalsk; @goldsmith1976autosegmental; @pierrehumbert1980phoneticssk; @ladd2008intonational].

[^cf-mora]: Moras are used to represent quantitative differences between light and heavy syllables (weight sensitivity), such that light syllables contain one mora while heavier syllables contain two (and sometimes even three) moras [see @hyman1984atheory; @mccarthy1990footsk; @hayes1989compensatory; @ito1989prosodic; @zec1995sonority; @zec2003prosodic].

The functionally motivated conclusion that emerges with respect to sonority is therefore that syllables require a pitch-bearing nucleus and that sonority is a scalar measure of the ability to bear pitch. In other words, sonority is, most likely, a measure of pitch intelligibility. 
This hypothesis comes with an underlying assumption that syllables have followed an evolutionary trajectory that shaped them to optimally carry pitch in their nuclei. Sonority, according to this description, serves as the tool that governs the requirement for intelligible pitch as a fundamental characteristic in the design of the building blocks of prosody.

It is important to note that this view of sonority is explicitly and exclusively based on perception, rather than articulation of speech. However, it does not exclude articulation-based description of syllables under the assumption that restrictions on syllabic structure must be derived from both the perception and the articulation of speech. A case in point is the *Articulatory Phonology* framework 
(see Section \@ref(sec:synthesis)),
with its valuable descriptions of temporal coordination and phase relations between motor gestures, which can be effectively linked to syllabic organization [see, e.g., @goldstein2007syllablesk; @goldstein2009coupled; @shaw2009syllabificationsk; @gafos2014stochastic; @hermes2017variabilitysk]. 

###     Pitch intelligibility and periodic energy {#sec:periodicenergy}

Pitch is a psychophysical phenomenon based on perception and cognition [see @plomp1976aspects; @plack2005psychophysics]. We can extract perception-related measurements from acoustics, i.e., not directly from the perceived sensation of a human subject but from the digitally-analyzed description of the physical sound in space. 
Using acoustics to cover auditory psychophysical phenomena is not a straightforward task. It requires a consistent and reliable association between acoustics on the one hand, and perception and cognition on the other hand.
This task is potentially complicated further with a complex phenomenon like pitch, which is sensitive to various aspects of the rich acoustic signal as well as to our top-down expectations with regards to learned regularities of pitch behavior [e.g., @houtsma1995pitch; @shepard2001pitch; @moore2013anintro 203-243; @mcpherson2018diversity].  

Fortunately, there are strong links between pitch and acoustic markers given the important role of periodicity in pitch. This is well-known from the extensive use of acoustic F0 measurements to estimate perceived pitch height, based on periodicities in the signal, using techniques such as autocorrelation  [e.g., @boersma1993accurate]. 
To estimate perceived pitch intelligibility from acoustic signals, we need to obtain a measure of periodic energy, which is a measurement of the acoustic power of periodic components in the signal. It may be helpful to think of this as a measurement of general intensity that excludes the contribution of aperiodic noise and transient bursts.

To conclude, our ability to detect periodicity in acoustic signals allows us to extract good estimates of F0 and periodic energy from speech data. We stand on firm grounds when we map these acoustic markers to perception in terms of pitch height and pitch intelligibility (respectively).
Given a causal link between perceived pitch height and linguistic tone and intonation contours, it is reasonable and, indeed, commonplace, to assume by transitivity that acoustic F0 maintains a causal link to linguistic tone and intonation.
Likewise, given a causal link between perceived pitch intelligibility and linguistic sonority, it should be reasonable to assume by transitivity that acoustic periodic energy maintains a causal link with the linguistic notion of sonority.  

##   The Problem of Intensity-Based Accounts {#sec:correlusions}

Although no strong consensus has ever been reached with respect to the phonetic basis of sonority, acoustic *intensity* is perhaps the most widely assumed correlate of linguistic sonority. This is evident from the many influential studies on sonority that consider acoustic intensity as its phonetic correlate [e.g., @sievers1893grundzugesk; @heffner1969generalsk; @ladefoged1975acourse; @clements1990role; @blevins1995syllable; @parker2008sound; and @gordon2012sonority, to name just a few prominent examples].[^cf-parkerIntensity]

[^cf-parkerIntensity]: In his overview of existing literature, @parker2002quantifying found close to 100 different proposals for correlates of sonority in the literature, and he tested five leading proposals in laboratory conditions: *intensity*, *intraoral air pressure*, *F~1~ frequency*, *total air flow*, and *duration*. In his study, the tightest correlations with sonority classes were obtained for acoustic intensity measurements, a conclusion that was repeated and elaborated upon in @parker2008sound. 

The main problem with intensity-based accounts is related to the distinction between causation and correlation. 
Establishing causation from acoustic signals necessitates a theory that can reliably map acoustic markers to operations or processes in sensorimotor speech articulation and/or auditory speech perception. The problem with accounts that are based on acoustic intensity is that causation cannot be established given that the physical intensity of the acoustic signal does not consistently map to any aspect of human auditory perception, not even perceived loudness. 

###  Acoustic intensity $≠$ perceived loudness {#sec:intensity}

The acoustic signal has certain physical qualities contributing to its overall power, but they have different effects on the perceptual system of the human hearer. This discrepancy between acoustic intensity and perceived loudness is a well-known problem, playing a role at different dimensions of the mapping between acoustics and perception. The prominent points of departure between acoustic intensity and perceived loudness include
the following:
(i) loudness perception differs for sine waves with the same intensity level at different frequencies [e.g., @fletcher1933loudness; @plack1995loudness; @suzuki2004equal];
(ii) loudness perception differs for comparable sounds at different durations [e.g., @turk1996processing; @seshadri2009perceived; @olsen2010loudness; @moore2013anintro 143];
and (iii) loudness perception differs for otherwise comparable periodic (harmonic) vs. aperiodic (noise) sounds, and band-pass filtered noise, just like sine waves, is not uniformly loud across the frequency spectrum [e.g., @hellman1972asymmetry; @bao2010psychoacousticsk; @moore2013anintro 140].

Acoustic intensity is therefore a physical description of sound waves in space which does not consistently relate to how loud we perceive these sounds, or to any other perceptual phenomenon for that matter.


###  Loudness is not a good candidate for sonority {#sec:loudness}

Note also that the relevance of perceived loudness to syllabic organization requires some sort of functional explanation, which seems to be lacking. 
The systematic differences in intensity of adjacent speech sounds imply that these differences are neutralized in perception, as it should make sense to assume that the different sounds that compose coherent speech are perceived as having comparable loudness. 
The literature on perceived loudness supports this assumption given that speech portions with relatively low acoustic intensity, like voiceless fricatives, appear in speech next to portions with relatively high acoustic intensity, like vowels.
Our auditory system perceives the aperiodic high-mid frequencies of many obstruents as exceptionally loud compared to the periodic low-mid frequency ranges of vowel sounds, thus compensating in perception for physical differences in acoustic intensity.

Given the above, we should anticipate that perceived loudness will not be a good candidate for the acoustic correlate of sonority hierarchies, as a measure of perceived loudness would bring the different speech sounds closer together on its scale, essentially diminishing the distinctions provided by the differences in acoustic intensity (which is typically stronger for vocalic speech sounds).
Indeed, although good approximations of perceived loudness from acoustic signals are available [e.g., @seshadri2009perceived; @skovenborg2012loudnesssk; @lund2014loudnesssk; @itu2015algorithmssk], we are not aware of any attempts to employ such measures for sonority.[^cf-equal]

[^cf-equal]: Note that terms like "loudness" may be used to mean different things by different authors. For example, @arrabothu2015usingsk extracts acoustic measurements that are designed to reflect the "impulse-like excitation" of voiced speech sounds (following @seshadri2009perceived). @arrabothu2015usingsk refers to these measurements as "loudness of speech" under the assumption that vowels are louder than voiceless sounds, an assumption which is derived in large part from the classic literature on sonority.

Rather than attempting to map acoustic intensity to perception in terms of perceived loudness, prominent studies that successfully use intensity-based measures as correlates of sonority and syllabicity [e.g., @Pfitzinger1996syllablesk; @fant2000source; @tilsen2013speech; @rasanen2018pre] tend to enhance the discrepancy between intensity and loudness by dicriminating in favor of low frequency bands (where most of the energy of vocalic elements is found) and against high-mid bands (where most of the energy of obstruents is found). 
The signal manipulation behind such metrics is not typically based on grounds of general auditory perception. However, they are often tightly linked to the perceptual quality that is identified with sonority in this work---the capacity to perceive pitch.

##      The Nucleus Attraction Principle {#sec:nap}

At the heart of all sonority-based principles lies the idea that the most sonorous segment in a sequence is contained within the nucleus of the syllable. This idea in fact postulates a link between the amount of sonority and the nucleus position of the syllable. We adopt this fundamental insight that guides all other sonority principles in the development of the Nucleus Attraction Principle. However, instead of adding further formal assumptions about non-overlapping segments with fixed sonority values and corresponding sonority slopes in symbolic time, the link between sonority and the syllabic nucleus is simply modeled as a dynamic process in real time. All the portions of the speech signal compete against each other for available syllabic nuclei in this process. 

Sonority is therefore conceived as the quality that is capable of *attracting* the nucleus. The varying quantities of this quality, which temporally fluctuate along the stream of speech, determine which portions of speech are prone to succeed in attracting nuclei given their superior local sonority *mass*. The speech portions that fall between those successful attractors are syllabified in the margins of syllables, at onset and coda positions.[^cf-attraction]
In fact, by modeling the link between sonority and the syllabic nucleus in dynamic terms it is not necessary to add further theoretical postulates about sonority slopes or discrete segmental categories of consonants and vowels in order to determine well-formedness of syllabic structures. Syllabic ill-formedness in NAP-based models is positively correlated with the degree of nucleus competition that a given syllabified portion incurs. 

[^cf-attraction]: This notion of prosodic *attraction* is, in fact, well-established in phonological theory, with descriptions of *weight sensitivity* in the stress systems of many unrelated languages, in which the stress is said to be attracted to heavy syllables.
Heaviness is mainly the product of a longer vowel in the nucleus, and in some languages heaviness may also result from a (preferably sonorant) consonant in the coda [e.g., @mccarthy1979formalsk; @hayes1980metrical; @prince1990quantitative; @gordon2006syllableweight]. There are also analyses in which vowel qualities that are considered more sonorous can contribute to heaviness and attract the stress [@zec1995sonority; @zec2003prosodic; @kenstowicz1997quality; @delacy2002formal; @gordon2012sonority].
Viewed with NAP in mind, attraction of stress in weight sensitive systems is simply the special case of a regular procedure, whereby weight---i.e., sonority---attracts syllabic nuclei. 

It is important to note that the informativeness of NAP-based models is not derived from identifying the winner of the nucleus competition, but from quantifying the degree of competition within different portions of speech that stand for potential syllabic parses. 
NAP-based models can analyze speech parts that are parsed together as a single syllabic unit in order to estimate the degree of competition they give rise to when they compete for a single nucleus. 
In discrete terms, NAP-based models can quantify different sequences of segments to reflect how strongly they compete for a single nucleus. 
Either way, the higher the degree of internal competition, the more ill-formed a syllable is predicted to result from this parse.
To simplify this further with respect to the subset of instances discussed in this work (i.e., syllables with complex consonantal onset clusters), it is possible to say that the winner of the nucleus competition is always the only vowel in the structure. The determination of ill-formedness in these cases is based on quantifying the amount of competition that the winning vowel has to withstand given different consonantal clusters in the onset of the same syllable.

It should be also useful to note that we do not expect serious competition to arise from a consonant adjacent to the vowel in the same syllable.
Nucleus competition, much like sonority slopes, has a limited impact on syllables with simple onsets or codas, **C**V(C) or (C)V**C**. Principles like SSP and NAP play a role chiefly when sequences of consonants are syllabified within a single syllable as complex onset or coda clusters, **CC**V(C) or (C)V**CC**. The phonotactics of these possible consonantal sequences are determined to a large extent by sonority principles. We interpret this aspect of cluster phonotactics such that sequences within syllables are avoided the more they increase the potential competition for the nucleus in the process of syllabifying/parsing the stream of speech.

###     Schematic NAP sketches {#sec:NAPsketch}

(ref:nap-depictions) Schematic depictions of competition scenarios with symbolic CCV structures. Nucleus competition can be understood as the competition between the blue and the purple areas under the sonority curve. The two examples in the top row---*plV* and *lpV*---suggest a replication of successful traditional predictions, while the three examples in the bottom row---*spV*, *sfV* and *nmV*---suggest a divergence from SSP-type models (see text for more details). Image taken from @albertIPsonoritysk.
```{r nap-depictions, fig.cap = "(ref:nap-depictions)", out.width = '100%', fig.align = 'center'}

knitr::include_graphics(rep(c("external_figures/napComb150.png")))

```

To understand the rationale of NAP, a series of schematic sketches are presented in Figure \@ref(fig:nap-depictions), accompanied by an impressionistic description. These will eventually be implemented within formal models that are described in detail in Section \@ref(sec:modelimp).
The five examples with specified consonantal clusters exhibit their related sonorant energy depicted as the *area under the curve*, whereby the curve itself is an idealized depiction of schematic sonority.
The purple area in each syllable in Figure \@ref(fig:nap-depictions) denotes the sonorant energy of the winning vowel in the nucleus position while the blue area denotes the sonorant energy of the losing portions in the onset. 

Consider for example the pair *plV* and *lpV*, with schematic NAP-related depictions in the top row of Figure \@ref(fig:nap-depictions) (and with more traditional sonority slopes in Figure \@ref(fig:slopes-pl-lp)). A consonantal onset cluster with a well-formed rising sonority slope like *plV* should be also considered well-formed under NAP due to the very low potential of competition between the minimally-sonorous marginal onset consonant /p/ and the non-adjacent vowel that wins the competition for the nucleus. The intervening /l/ in this case promotes a continuous rise in sonority from /p/ to V, which contributes to the formation of a single energy mass with a clear peak. 
Likewise, a consonantal onset cluster with an ill-formed falling sonority slope like *lpV* should be also considered ill-formed under NAP due to the strong potential for competition between the marginal sonorous onset consonant /l/ and the non-adjacent vowel, especially given the intervening /p/ that leads to discontinuity in the sonority trajectory between /l/ and V, which contributes to the formation of a bimodal distribution of energy with two clear (competing) peaks.

Unlike the examples above, where the rationale of NAP is expected to replicate successful predictions of the SSP with cases like *plV* and *lpV*, NAP is expected to diverge from traditional sonority sequencing principles in 
some cases, as illustrated by the examples in the bottom row of Figure \@ref(fig:nap-depictions). Under NAP, neither */s/-stop* clusters with an ill-formed falling sonority like *spV* nor voiceless obstruent plateaus like *sfV* are expected to incur a strong competition syllable-internally (hence, they are in fact well-formed) due to the low potential for competition between the minimally-sonorous onset consonant /s/ and the non-adjacent vowel that wins the competition. Here, the intervening voiceless obstruents /p,f/ retain a minimally sonorous trajectory throughout the whole onset, which contributes to the formation of only low-level peaks, or *shelves* (see blue portions), that are barely enough to compete with a vowel (purple portions).
At the same time, a relatively strong competition potential (i.e., worse-formed syllable) is predicted under NAP for nasal plateaus like *nmV* when compared to obstruent plateaus like *sfV*. This should be expected given the strong potential for competition between the sonorous marginal onset consonant /n/ and the non-adjacent winning vowel. Here, the intervening nasal /m/ retains a relatively level sonorous trajectory, which contributes to the formation of a sonorous shelf throughout the onset.
Importantly, such differences in the well-formedness of different plateau types cannot be covered by SSP-based accounts since they treat all plateaus as incurring the same violation.

In Section \@ref(sec:modelimp) we show how the impressionistic descriptions of NAP that were provided thus far can be implemented within formal models that atttempt to capture the essence of NAP with estimations of its effects on either continuous acoustic signals or symbolic consonant and vowel classes.

#     NAP Implementations {#sec:modelimp}

##      Complementary NAP Models {#sec:complementary}

NAP essentially describes a bottom-up process, illustrating the parsing of the stream of speech into syllables as the end point of a process that starts in perception.
A bottom-up perspective on modeling NAP is therefore relatively straightforward as it requires a similar approach to the process NAP describes by analyzing continuous acoustic data at the input to derive well-formedness predictions at the output.
A bottom-up approach for NAP models has no capacity to exploit the power of abstraction, so it essentially has no "memory". It is a mechanistic dynamic model 
that describes syllabic parsing.
This means that a bottom-up model can be only designed to analyze concrete speech tokens. Unlike models of traditional sonority principles, a bottom-up model of NAP cannot determine the well-formedness of an abstract syllable as it is depicted in symbolic form. It will therefore give slightly different scores to different renditions of the same syllable, even by the same speaker.

A NAP-based model operating on abstracted symbolic units is used as a separate, complementary top-down model.
Top-down inferences are based on learned regularities and categorical abstractions that reflect linguistic experience. To that end, knowledge about consonantal inventories and the probabilities of consonantal co-occurrence and distribution with respect to position in the syllable is assumed to be acquired and then stored in abstract symbolic forms which are available for top-down inferences. In that sense, top-down inferences in perception are based on the distributional probability of recognized symbols.

The above description of top-down inferences, which are detached from the functional aspects of the bottom-up route, echo models of the language user as a *statistical learner* [see, e.g., @christiansen1999power; @frisch2001psychologicalsk; @tremblay2013processing] and, more specifically, they are very much in line with models of *phonotactic learners* [see, e.g., @coleman1997stochastic; @vitevitch2004webbasedsk; @bailey2001determinants; @hayes2008maximum; @hayes2011interpreting; @albright2009feature; @daland2011explaining; @jarosz2017inputsk; @mayer2019phonotacticsk].

That said, the current project does not explore the statistical nature of top-down inferences. Instead, it operationalizes the rationale behind NAP with symbolic machinery to present the symbolic model of NAP which is used here to estimate top-down inferences. 
This choice allows the presentation of a top-down model with a stronger explanatory value with regards to NAP as it uses a similar architecture to that of standard sonority principles, helping to elucidate NAP's core ideas while using a familiar vocabulary (see Subsection \@ref(sec:naptdmodel)). 
The symbolic NAP model is also necessary for the application of NAP in typological, diachronic, and many traditional and current studies, where speech data is transcribed into strings of discrete symbols (see Anonymous, in press). 

Moreover, it should be noted that the distributional patterns of recognizable symbols in a cognitively plausible top-down architecture are not informative with respect to their various sources, which include a host of universal and idiosyncratic phonotactic pressures. A true top-down statistical learner is thus inherently "contaminated" by all the different sources that contribute to phonotactics in a given system, without a clear distinction between sonority and other factors. Thus, it remains an open question whether top-down inferences that target only sonority-based phonotactics can be modeled in a more direct and principled way than the one presented here with the symbolic model of NAP.

As two complementary inference routes, the top-down and bottom-up models should not be considered equal. The bottom-up route is the source of learned linguistic distinctions and it is functionally motivated by the laws of physics and the limitations of the perceptual and cognitive systems.
In contrast, the top-down route is based on linguistic experience and superficial inferences that reflect the history of the symbols in the system (i.e., the distributional probabilities of recognizable recurring patterns and their extensions by analogy). In other words, top-down inferences reflect functionally motivated behaviors only indirectly, as the outcome of learning the superficial expressions of functionally-motivated (bottom-up) dynamics.

@durvasula2015illusory; @wilson2013bayesian; @wilson2014effects and @daland2018on present different yet comparable approaches, in that both bottom-up ("phonetic") and top-down ("phonological") streams of speech are considered in order to account for perception patterns of non-native consonantal clusters [see also @berent2009listeners; @berent2012language]. 
@daland2018on even suggest a Bayesian approach to integrate of the two streams. Independent support for such dual-route modeling in language processing can be also found in neurolinguistic studies like @hickok2007cortical and @poeppel2014current.
Our analysis differs from the above-mentioned studies in various ways. Importantly, we do not attempt to integrate the top-down and bottom-up inference routes in this work (see discussion in Subsection \@ref(sec:compofmind)) and we focus on modeling the bottom-up route with a continuous entity that remains (quasi-)continuous in the model (the periodic energy time-series), under the assumption that it retains a reliable and consistent link to either perception or articulation (targeting pitch intelligibility in perception) as well as linguistic processing (i.e., sonority). 

##     Model Implementations in Dynamic and Symbolic Terms {#sec:modelimpOLD}

In order to compare the different proposals, four types of traditional sonority models are considered alongside the two NAP models.
For traditional models we use the two types of sonority hierarchies that were presented in Subsection \@ref(sec:hierarchies), where the class of obstruents is either collapsed into a single level (*H~col~*) or expanded to include distinctions between voiced and voiceless obstruents, and between stops and fricatives (*H~exp~*).
Both hierarchies are applied with each of the two main variants of traditional sonority principles, the SSP and the MSD (see Subsection \@ref(sec:principles)).
The four traditional sonority models under discussion are therefore a combination of a sonority principle (either SSP or MSD) and a sonority hierarchy (either *H~col~* or *H~exp~*). Accordingly, they are referred to as *SSP~col~* , *SSP~exp~*, *MSD~col~*, and *MSD~exp~*.

The two NAP models use periodic energy as the correlate of sonority, and periodic energy is applied either continuously through acoustics (bottom-up model), or in a discrete manner using symbols (top-down model). These two NAP models are referred to as *NAP~td~* for the top-down model and *NAP~bu~* for the bottom-up one.

To demonstrate the different sonority models, this study focuses on complex onset clusters of the general form CCV, where C denotes consonants in onset position and V denotes a vowel in nucleus position. Traditional sonority models inspect the sonority slope of the onset cluster to determine well-formedness of CCV syllables, while NAP-based models apply the notion of competition to determine well-formedness.

In the following subsections, we will elaborate on the methods for obtaining well-formedness scores, starting with the ordinal scores obtained from the four traditional sonority models (Subsection \@ref(sec:traditionalmodels)), and the symbolic NAP model *NAP~td~* (Subsection \@ref(sec:naptdmodel)). The implementation of the continuous model *NAP~bu~* follows in Subsection \@ref(sec:napbu). 

It is important to note that the implementations of NAP in the following subsections present procedures for estimated results, based on the rationale of NAP and the study materials (i.e., syllables with a CCV structure). These procedures are not general and they are not supposed to simulate any perceptual or cognitive mechanism, neither top-down nor bottom-up.

### Traditional sonority models {#sec:traditionalmodels}

Implementation of traditional sonority principles like the SSP is based on a calculation of the sonority slope over a given sequence of segments. Speech segments in these models have fixed index values on the sonority hierarchy, based on their class membership, as in the *H~col~* and *H~exp~* hierarchies (see Table \@ref(tab:hierarchy)). Sonority index values are usually expressed in terms of integers since they reflect an ordinal scale, and, for this reason, the mathematical operations that these models employ should be restricted to basic arithmetic functions of addition and subtraction. Sonority slopes can be therefore obtained straightforwardly by a subtraction between the corresponding sonority indices of two adjacent consonants. In onset clusters with two consonants (CCV) this can be simply achieved by the formula $C_2 – C_1$, which yields positive results for rising sonority slopes, negative results for falling sonority slopes, or a zero for plateaus. This calculation is applied to the two SSP models, *SSP~col~* and *SSP~exp~* (see examples in Table \@ref(tab:ordinalscores)).

The exact same formula is also used to obtain scores for the Minimum Sonority Distance models, *MSD~col~* and *MSD~exp~*, which elaborate on the well-formedness of onset rises. 
MSD models differ from the SSP in the interpretation of positive values (that reflect rising sonority slopes). While under the SSP all positive scores map to a single score (i.e., all rises are well-formed to the same extent), under the MSD higher positive scores are preferred over lower positive scores to reflect the preference for a larger sonority distance (or a steeper slope) in a rising onset configuration (see examples in Table \@ref(tab:ordinalscores)). 

### The top-down symbolic NAP model {#sec:naptdmodel}

The symbolic version of NAP, which is used to derive predictions for the top-down NAP (*NAP~td~*), shares a similar architecture with common SSP-based models. Crucially, it also reflects the novelties of the current proposal, both in terms of the sonority hierarchy it assumes, and in terms of the design of the sonority principle. *NAP~td~* uses a sonority hierarchy that is based on the periodic energy potential of different phoneme classes as the basis of distinct categorical patterning (see the following subsection, \@ref(sec:snaphierarchy)). Furthermore, *NAP~td~* models syllabic well-formedness with the notion of nucleus competition, rather than the formal notion of sonority slopes as in traditional SSP-type models (see Subsection \@ref(sec:snapimplementation)).

#### The sonority hierarchy in NAP~td~ {#sec:snaphierarchy}

The symbolic sonority hierarchy in NAP uses the basic ratio between periodic and aperiodic energy in the speech signal to divide all speech sounds into three distinct groups, reflecting the coarse, yet reliable differences in potential periodic energy mass of different abstract speech sound categories. To achieve that, we rely on the following general characteristics: (i) the main source of periodic energy in speech stems from vocal fold vibrations when voicing occurs; and (ii) aperiodic energy in speech is mostly the outcome of the turbulent airflow that is driven by articulatory friction (i.e., fricatives) and from articulatory closure in oral stops, which  results in transient bursts when released [see @rosen1992temporal].

The ratio between periodic and aperiodic components in speech sounds readily yields the following three distinct groups: (i) voiceless obstruents that consist of mostly aperiodic energy are the least sonorous type of speech sounds; (ii) sonorant consonants and vowels that consist of mostly periodic energy are the most sonorous type of speech sounds; and (iii) voiced obstruents that consist of both periodic and aperiodic energy belong in the middle of this ternary scale (see \@ref(ex:napscale)).

\begin{exe}
\ex \emph{Voiceless Obstruents} $<$ \emph{Voiced Obstruents} $<$ \emph{Sonorants}  \label{ex:napscale}
\end{exe}

A further distinction in NAP's sonority hierarchy is based on the general presence or absence of articulatory contact, which can be referred to as the distinction between *contoids* and *vocoids* [@pike1943phonetics].
A free and open vocal tract contributes to a potentially stronger and longer vocalic signal that can qualitatively enhance the potential periodic energy mass.
This distinction effectively separates the sonorants into *sonorant vocoids* (glides and vowels) and *sonorant contoids* (nasals and liquids).
See Table \@ref(tab:napscale) for the full sonority hierarchy in the symbolic model of NAP.

(ref:napscale-caption) (\#tab:napscale) The symbolic sonority hierarchy in *NAP~td~*
(ref:napscale-caption2) Index values reflect the ordinal ranking of categories in the sonority hierarchy. The distinctions between categories in the symbolic NAP hierarchy are based on the characteristic ratio between periodic and aperiodic energy, and on articulatory contact, both taken to reflect the potential of the periodic energy mass, i.e., the potential for nucleus attraction.
\begin{table}[tbp]
\begin{center}
\begin{threeparttable}
\caption{(ref:napscale-caption)}
\begin{tabular}{cclcclcclccl}
\toprule
Sonority index & \multicolumn{1}{c}{Segmental classes} & \multicolumn{1}{c}{Periodic:Aperiodic} & \multicolumn{1}{c}{Articulatory contact}\\
\midrule
4 & \textbf{Sonorant Vocoids} & \multicolumn{1}{c}{1:0} & $-$\\
 & (\emph{glides}, \emph{vowels}) &  & \\
3 & \textbf{Sonorant Contoids} & \multicolumn{1}{c}{1:0} & $+$\\
 & (\emph{nasals}, \emph{liquids}) &  & \\
2 & \textbf{Voiced Obstruents} & \multicolumn{1}{c}{1:1} & $+$\\
 & (\emph{stops}, \emph{fricatives}) &  & \\
1 & \textbf{Voiceless Obstruents} & \multicolumn{1}{c}{0:1} & $+$\\
 & (\emph{stops}, \emph{fricatives}) &  & \\
\bottomrule
\addlinespace
\end{tabular}
\begin{tablenotes}[para]
\normalsize{\textit{Note.} (ref:napscale-caption2)}
\end{tablenotes}
\end{threeparttable}
\end{center}
\end{table}

The symbolic sonority hierarchy in NAP reconciles perceptual and articulatory approaches to sonority by modeling their mutual contribution to enhancing pitch intelligibility (or periodic energy mass, in acoustic terms). This hierarchy is similar to a few proposals for sonority hierarchies that combined levels of voicing/periodicity with degree of vocal tract opening [e.g., @lass1988phonology; @miller2012sonority; and @sharma2018significance].

The complete 4-place sonority hierarchy of *NAP~td~* in Table \@ref(tab:napscale) also reflects a basic typology of nucleus types, which supports the use of this scale as a qualitative measure for nucleus attraction potentials. Sonorant vocoids like glides and vowels can attract the nucleus in all languages we know (a glide is considered a vowel when syllabified in the nucleus position), while sonorant contoids like nasals or liquids can be syllabic (i.e., attract the nucleus) only in a subset of languages, of which a smaller subset may allow obstruents to attract nuclei [but see @easterday2019highly for some divergent patterns with syllabic obstruents relative to syllabic liquids].

#### NAP~td~ implementation {#sec:snapimplementation}

When assessing C~1~C~2~V syllables under the NAP framework, we essentially aim to measure the competition potential between C~1~ and V given C~2~. In and of itself, C~2~ is not considered a competitor due to its proximity to the vowel, as discussed in Subsection \@ref(sec:nap).
The issue of competition may be therefore expressed by the following two questions:
(i) what is the potential periodic energy mass of C~1~ (i.e., how sonorous is C~1~, or what is the intercept of the cluster that determines the starting point of the slope); and
(ii) how much of the energy in C~1~ is potentially lost, gained or maintained in C~2~, before peaking at the vowel (i.e., what is the sonority slope).
Assessing this relationship between C~1~ and V given C~2~ can be achieved by the combination of two subtraction formulas: 
(i) a calculation of the difference between C~1~ and the non-adjacent vowel, to reflect the potential strength of C~1~ in terms of the intercept relative to the nucleus; and
(ii) a calculation of the slope between adjacent C~1~ and C~2~, as in SSP-based models, to reflect the trajectories of fluctuating energy towards the peak. 
This can be summarized with the formula in \@ref(eq:naptdeq), see examples in Table (\@ref(tab:ordinalscores)).

\begin{equation}
  (V - C_1) + (C_2 - C_1)  \label{eq:naptdeq}
\end{equation}

A somewhat similar formula can be found in Fullwood's [-@fullwood2014perceptualsk] *Sonority Angle*, which builds on Flemming's [-@flemming2008asymmetriessk] *Sonority Rise* [see also @fleischhacker2002cluster].[^cf-thnk] @flemming2008asymmetriessk and @fullwood2014perceptualsk combine measurements of C~1~C~2~ and C~1~V slopes to analyze the distance between alternatives with and without a vowel epenthesis (C~1~C~2~ vs. C~1~**V**C~2~). 
By incorporating the sonority distance between consonants in C~1~ and vowels, these formulas are capable of mitigating some of the problems that the standard SSP formula has in accounting for either putatively ill-formed sonority slopes with an overall low sonority (e.g., /s/-stop clustrs) or putatively well-formed sonority slopes with an overall high sonority (e.g., sonorant plateaus and rises).

[^cf-thnk]: We thank an anonymous reviewer for noting these works.

###   Ordinal sonority scores {#sec:ordinalscores}

Table \@ref(tab:ordinalscores) demonstrates and compares the scores of the five ordinal models (2$\times$SSP, 2$\times$MSD and *NAP~td~*) 
with different CCV cluster types. It shows that the main difference between the two sonority hierarchies, *H~exp~* and *H~col~*, concerns fricative-stop clusters like the */s/-stop* cluster *spV*, which are considered as either an onset fall (with the *H~exp~* hierarchy) or an onset plateau (with the *H~col~* hierarchy). When the MSD is applied, the two sonority hierarchies also show differences in ranking within onset rises, given their different treatment of obstruents. In models that use the *H~exp~* hierarchy there are four levels of obstruents (voiced and voiceless stops and fricatives) which are collapsed into one level in models that use the *H~col~* hierarchy. 
This results in five distinct sonority rise scores in the *MSD~exp~* model, but only two in the *MSD~col~* model (where some of the trends also differ, e.g., *smV* vs. *vlV* in the two MSD-based models).

(ref:ordinalscores-caption) (\#tab:ordinalscores) Ordinal sonority scores
(ref:ordinalscores-caption2) Well-formedness scores with ordinal models. The table demonstrates the predictions we obtain using the two traditional sonority hierarchies, *H~col~* and *H~exp~*, with each of the two traditional sonority principles, SSP and MSD. Numbers in brackets next to "Rise" reflect MSD's ranking of onset rises by distance---higher values indicate better-formed rises. The scores derived from *NAP~td~* on the right column are taken to directly reflect the nucleus competition potential, where higher scores are better-formed.
\begin{table}[tbp]
\begin{center}
\begin{threeparttable}
\caption{(ref:ordinalscores-caption)}
\begin{tabular}{cclcclcclcclcclccl}
\toprule
\multicolumn{1}{l}{} & \multicolumn{4}{c}{Traditional sonority principles} & \multicolumn{1}{c}{Symbolic NAP}\\

\multicolumn{1}{l}{Onset} & \multicolumn{2}{c}{\emph{exp} hierarchy} & \multicolumn{2}{c}{\emph{col} hierarchy} & \multicolumn{1}{c}{\textbf{\emph{NAP\textsubscript{td}}}}\\

\multicolumn{1}{l}{clusters} & \multicolumn{1}{c}{$C_2-C_1$} & \multicolumn{1}{c}{\textbf{\emph{SSP(MSD)\textsubscript{exp}}}} & \multicolumn{1}{c}{$C_2-C_1$} & \multicolumn{1}{c}{\textbf{\emph{SSP(MSD)\textsubscript{col}}}} & \multicolumn{1}{c}{$(V-C_1)+(C_2-C_1)$}\\

\midrule
\multicolumn{1}{l}{\textbf{pl}V} & 6$-$1 $=$ 5 & \multicolumn{1}{c}{\textbf{Rise (5)}} & 3$-$1 $=$ 2 & \textbf{Rise (2)} & \multicolumn{1}{c}{(4$-$1)$+$(3$-$1) $=$ \textbf{5}}\\
\multicolumn{1}{l}{\textbf{fl}V} & 6$-$2 $=$ 4 & \multicolumn{1}{c}{\textbf{Rise (4)}} & 3$-$1 $=$ 2 & \textbf{Rise (2)} & \multicolumn{1}{c}{(4$-$1)$+$(3$-$1) $=$ \textbf{5}}\\
\multicolumn{1}{l}{\textbf{sm}V} & 5$-$2 $=$ 3 & \multicolumn{1}{c}{\textbf{Rise (3)}} & 2$-$1 $=$ 1 & \textbf{Rise (1)} & \multicolumn{1}{c}{(4$-$1)$+$(3$-$1) $=$ \textbf{5}}\\
\multicolumn{1}{l}{\textbf{vl}V} & 6$-$4 $=$ 2 & \multicolumn{1}{c}{\textbf{Rise (2)}} & 3$-$1 $=$ 2 & \textbf{Rise (2)} & \multicolumn{1}{c}{(4$-$2)$+$(3$-$2) $=$ \textbf{3}}\\
\multicolumn{1}{l}{\textbf{ml}V} & 6$-$5 $=$ 1 & \multicolumn{1}{c}{\textbf{Rise (1)}} & 3$-$2 $=$ 1 & \textbf{Rise (1)} & \multicolumn{1}{c}{(4$-$3)$+$(3$-$3) $=$ \textbf{1}}\\
\multicolumn{1}{l}{\textbf{sf}V} & 2$-$2 $=$ 0 & \multicolumn{1}{c}{\textbf{Plateau}} & 1$-$1 $=$ 0 & \textbf{Plateau} & \multicolumn{1}{c}{(4$-$1)$+$(1$-$1) $=$ \textbf{3}}\\
\multicolumn{1}{l}{\textbf{zv}V} & 3$-$3 $=$ 0 & \multicolumn{1}{c}{\textbf{Plateau}} & 1$-$1 $=$ 0 & \textbf{Plateau} & \multicolumn{1}{c}{(4$-$2)$+$(2$-$2) $=$ \textbf{2}}\\
\multicolumn{1}{l}{\textbf{nm}V} & 5$-$5 $=$ 0 & \multicolumn{1}{c}{\textbf{Plateau}} & 2$-$2 $=$ 0 & \textbf{Plateau} & \multicolumn{1}{c}{(4$-$3)$+$(3$-$3) $=$ \textbf{1}}\\
\multicolumn{1}{l}{\textbf{sp}V} & 1$-$2 $=$ $-$1 & \multicolumn{1}{c}{\textbf{Fall}} & 1$-$1 $=$ 0 & \textbf{Plateau} & \multicolumn{1}{c}{(4$-$1)$+$(1$-$1) $=$ \textbf{3}}\\
\multicolumn{1}{l}{\textbf{lm}V} & 5$-$6 $=$ $-$1 & \multicolumn{1}{c}{\textbf{Fall}} & 2$-$3 $=$ $-$1 & \textbf{Fall} & \multicolumn{1}{c}{(4$-$3)$+$(3$-$3) $=$ \textbf{1}}\\
\multicolumn{1}{l}{\textbf{mz}V} & 4$-$5 $=$ $-$1 & \multicolumn{1}{c}{\textbf{Fall}} & 1$-$2 $=$ $-$1 & \textbf{Fall} & \multicolumn{1}{c}{(4$-$3)$+$(2$-$3) $=$ \textbf{0}}\\
\multicolumn{1}{l}{\textbf{lv}V} & 4$-$6 $=$ $-$2 & \multicolumn{1}{c}{\textbf{Fall}} & 2$-$4 $=$ $-$2 & \textbf{Fall} & \multicolumn{1}{c}{(4$-$3)$+$(2$-$3) $=$ \textbf{0}}\\
\multicolumn{1}{l}{\textbf{ms}V} & 2$-$5 $=$ $-$3 & \multicolumn{1}{c}{\textbf{Fall}} & 1$-$2 $=$ $-$1 & \textbf{Fall} & \multicolumn{1}{c}{(4$-$3)$+$(1$-$3) $=$ \textbf{$-$1}}\\
\multicolumn{1}{l}{\textbf{np}V} & 1$-$5 $=$ $-$4 & \multicolumn{1}{c}{\textbf{Fall}} & 1$-$2 $=$ $-$1 & \textbf{Fall} & \multicolumn{1}{c}{(4$-$3)$+$(1$-$3) $=$ \textbf{$-$1}}\\
\multicolumn{1}{l}{\textbf{lp}V} & 1$-$6 $=$ $-$5 & \multicolumn{1}{c}{\textbf{Fall}} & 1$-$3 $=$ $-$2 & \textbf{Fall} & \multicolumn{1}{c}{(4$-$3)$+$(1$-$3) $=$ \textbf{$-$1}}\\
\bottomrule
\addlinespace
\end{tabular}
\begin{tablenotes}[para]
\normalsize{\textit{Note.} (ref:ordinalscores-caption2)}
\end{tablenotes}
\end{threeparttable}
\end{center}
\end{table}

Unlike traditional models, the predictions of *NAP~td~* are not grouped into levels that reflect the rough angle of the sonority slope in terms of falls, rises and plateaus. The raw score of the *NAP~td~* formula is taken as reflective of the nucleus competition potential such that higher scores denote weaker competition and are thus better-formed. 
The top-down NAP model allows scores within a range that goes from -$\mathit{3}$ for the most ill-formed syllable up to $\mathit{6}$ for the most well-formed, although a more relevant range to consider, given that glides are excluded from this study, is between -$\mathit{1}$ and $\mathit{5}$. These scores are not immediately comparable to the traditional model scores, but some interesting departures from the traditional models can be observed in Table \@ref(tab:ordinalscores). For example, *NAP~td~* considers the onset rise in the sonorous cluster *mlV* to be equally as ill-formed as the inverse fall, *lmV*. Both of these clusters pattern with nasal plateaus (e.g., *nmV*), where they all receive the same relatively low value of $\mathit{1}$. At the same time, voiceless clusters pattern in with well-formed combinations (scoring $\mathit{3}$) although they may include sonority plateaus (e.g., *sfV*) or sonority falls (e.g., *spV*) in traditional model terms.

###   The bottom-up NAP model {#sec:napbu}

There are various ways to calculate an estimation of the nucleus competition potential within syllables based on the periodic energy in the acoustic signal. The method we present here has the advantage of not relying on segmental landmarks that are categorical abstractions of the type that is not assumed to be available in the bottom-up route.

(ref:com-4examples-1) Smoothed periodic energy measurement (black curve), taken from four recordings of the syllables *lpal*, *nmal*, *vlal*, and *smal* in the AA stimuli set. The red vertical line denotes the center of periodic mass of the entire syllable (*CoM~syllable~*), the blue vertical line denotes the center of periodic mass of the left portion (*CoM~onset~*). Grey dotted vertical lines and annotated text denote segmental intervals by manual segmentation (for exposition purposes only). The distance between the two CoM landmarks is indicative of the energy displacement away from the syllabic center, reflecting the nucleus competition potential within the syllable (see text for details).
```{r com-4examples-1, fig.cap = "(ref:com-4examples-1)", fig.width=7, fig.asp=.6, warning=FALSE, dev="cairo_pdf"}
# CoM_ons <-  expression(CoM[ons])# %>% as_label()
monosyl_examp_1 <- filter(monosyl_info_AA,
                          syl %in% c("smal","vlal","nmal","lpal"))
ordered_syl_abs <- unique(monosyl_examp_1[order(monosyl_examp_1$NAP_bu),]$syl)
monosyl_examp_1$syl <- factor(monosyl_examp_1$syl, levels=ordered_syl_abs)
monosyl_examp_1_plot <-
  ggplot(monosyl_examp_1, aes(x=t)) +
  xlim(0,475) + ylim(-3,19) + #ggtitle("") +
  xlab("time (ms)") + ylab("Periodic energy (normalized scale)") +
  geom_line(aes(y=smog_per), color="black", alpha=1, size=1) +
  #
  geom_segment(aes(x=x_com_ons, xend=x_com_ons, y=-1, yend=13), color="royalblue1", size=1.7, alpha=.7, linetype = "solid", lineend = "round") +
  geom_segment(aes(x=x_com_syl, xend=x_com_syl, y=-1, yend=13),color="red", size=1.7, alpha=.6, linetype = "solid", lineend = "round") +
  # geom_text(aes(x=com_onset, y=15, label=deparse(CoM_ons)), nudge_x = -20, color="royalblue2", alpha=1, size=3, family = "Charis SIL", check_overlap=T, na.rm = T) +
  geom_text(aes(x=com_onset,y=15,label="CoM:onset"), nudge_x = -30, color="royalblue2", alpha=1, size=3, family = "Charis SIL", check_overlap=T, na.rm = T) +
  geom_text(aes(x=x_com_syl,y=15,label="CoM:syllable"), nudge_x = 30, color="red", alpha=.9, size=3, family = "Charis SIL", check_overlap=T, na.rm = T) +
  geom_segment(aes(x=x_com_syl, xend=com_onset, y=-.75, yend=-.75), color="black", size=.5, alpha=1, linetype = "solid", arrow = arrow(length = unit(0.04, "npc"))) +
  geom_text(aes(x=(x_com_syl+com_onset)/2,y=-2.5,label=paste0(round(NAP_bu)," ms")), size=3, family = "Charis SIL", check_overlap=T) + 
  #
  geom_segment(aes(x=pos_end, xend=pos_end, y=0, yend=19), color="grey", size=.5, alpha=.2, linetype = "dotted") +
  geom_text(aes(x=pos_mid,y=18,label=text), size=6, family = "Charis SIL", check_overlap=T) + 
  #
  facet_wrap(~syl, ncol=2) +
  theme(panel.background = element_blank(), axis.title = element_text(size = 12, family="Charis SIL"), axis.text.y = element_blank(), axis.ticks = element_blank(), axis.text.x = element_text(size = 8, family="Charis SIL"), strip.text = element_blank())
print(monosyl_examp_1_plot)
```

The periodic energy data that was extracted from acoustic recordings of speech is viewed in terms of a mass, i.e., the area under the periodic energy curve, integrating duration and power as the two linked dimensions of quantity in sound [see @turk1996processing on interactions between duration and intensity in linguistic perception contexts]. 
Summing is essentially different from averaging, as well as from peak extraction, 
in how much strength is assigned to the dimension of duration in the abstract measurement of quantity (duration is absent from peak extraction, it is normalized in averages, and it is strongly influencing the sum).
Importantly, only summing strategies are capable of uncovering the quantitative difference between two sounds that have similar amplitude envelopes yet differ in duration. 

The contribution of duration to sonority was convincingly illustrated in the seminal work of @price1980sonority. Price showed that disyllabic English words like *polite* /pəlʌɪt/ were perceived when the duration of the sonorant /l/ in the superficially related monosyllabic word *plight* /plʌɪt/ was manipulated. Thus, an increase in the duration of the sonorant essentially leads to the perception of another syllable. 
More supporting evidence on the interaction between duration and syllabic parsing can be found in @dupoux1999epentheticsk, who showed differences in perception between Japanese and French speakers, and in @berent2007we as well as @wilson2014effects, who analyzed patterns of misperception of Russian onset clusters by English speakers.

It is therefore useful to locate the *center of mass* within regions of interest as a measurement that is sensitive to the two axes of periodic energy mass---duration (x-axis) and power (y-axis). The center of mass can be viewed as the point in time in which the area under the curve is split into two equal parts. The location of the center of mass in time (x-axis) is attracted to the peak of the curve (on the y-axis), where it is expected to be found given a perfectly symmetrical shape. However, the center of mass most often diverges from the peak of rise-fall curves so as to reflect asymmetries in the overall distribution of mass. Identification of the center of mass of the periodic energy curve (henceforth CoM) follows a methodology that was introduced with the *Tonal Center of Gravity* [@barnes2012tonal], in calculating a weighted average time point that uses a continuous time series as the weighting term. The equation in \@ref(eq:com) is used to locate the average point in time (*t*), weighted by continuous periodic energy (*per*) at discrete time points:

\begin{equation}
  CoM = \frac{\sum_i per_i\cdot t_i}{\sum_i per_i}  \label{eq:com}
\end{equation}

The location of the center of periodic energy mass of the entire syllable (henceforth *CoM~syllable~*) guides us to the point in time where the periodic mass of all the competing forces within that syllable are split into two equal parts. Once we obtain this reference point we can repeat this process within the resulting left-side portion, i.e., from the beginning of the syllable up to *CoM~syllable~*, to focus on the onset position (henceforth *CoM~onset~*).
We therefore measure the center of mass twice---first for the entire syllable (resulting in *CoM~syllable~*) and then for the left portion of the first measurement (resulting in *CoM~onset~*). 
The distance between *CoM~syllable~* and *CoM~onset~* is indicative of the amount of displacement of energy away from the center of the syllable, which in turn reflects the degree of nucleus competition (see Figure \@ref(fig:com-4examples-1)).

The center of mass is capable of capturing both components of a two-dimensional mass by considering the non-linear shape of the periodic energy curve. 
The leftward displacement of *CoM~onset~* relative to *CoM~syllable~* is affected by the distance, the amplitude, and the amount of discontinuity between the periodic energy at the onset and the center of mass of the entire syllable.
Any increase in the above results in a larger distance between the two centers of mass, as Figure \@ref(fig:com-4examples-1) demonstrates.

#		Experimental Assessments {#sec:experiments}

In order to assess NAP's predictions in situations where both bottom-up and top-down inferences contribute to speech processing, an experimental procedure was designed to collect behavioral responses using a perception task. 
In what follows we present three experiments: 
*Experiment 1* is a short exploratory pilot study with 12 German-speaking subjects; 
*Experiment 2* is a confirmatory study with 51 German-speaking subjects; 
and *Experiment 3* is a confirmatory study with 33 Hebrew-speaking subjects. 

This section starts by describing the rationale of the experimental design (Subsection \@ref(sec:rationale)) before presenting the linguistic and acoustic materials used in the experiments (Subsection \@ref(sec:materials)). 
The predictions of the different models are then summarized in Subsection \@ref(sec:predictions), followed by descriptions of the experimental design (Subsection \@ref(sec:designs)), participants (Subsection \@ref(sec:participants)) and our data analysis strategies (Subsection \@ref(sec:datanlysis)). The results and related discussions follow in Subsection \@ref(sec:results).

##		Rationale {#sec:rationale}

The goal of the experimental procedure is to tap into the cognitive cost of syllabification processes. To that end, we devised a forced-choice task that allows us to systematically compare response times of forced categorical decisions. Response times are linked with cognitive cost, which, in the context of this task, is understood as the result of nucleus competition. The working assumption is that more competition within a structure makes it cognitively harder for this structure to be parsed as a single syllable, which is reflected in slower processing altogether.

This design uses nonce words to test specific consonantal combinations in structures that either feature two vowels and no initial consonantal sequences (typically considered to be disyllabic forms) or one vowel with a word-initial sequence (more likely to be considered as monosyllabic forms).
This experimental design is reminiscent of many experiments on sonority effects that Iris Berent and her colleagues have published, starting with the seminal @berent2007we.[^cf-berents] 
The premise of many of the tasks that Berent et al. test is that an ill-formed sonority onset fall, as in the monosyllable *lbV*, is more likely to be confused with disyllabic *lə.bV* when compared with well-formed monosyllable *blV* and its disyllabic counterpart, *bə.lV* (the schwa /ə/ in these examples denotes a generic epenthetic weak vowel). This misperception and confusion between alternatives is expected to be systematically greater with worse-formed sonority clusters, which leads to a drop in categorical accuracy, accompanied by a scalar increase in response time (we return to Berent's work in the general discussion, in Subsection \@ref(sec:projection)).

Comparable experimental assumptions regarding misperception of consonantal clusters can be found in related works on perception of non-native clusters such as @dupoux1999epentheticsk and @davidson2012sources, including also tasks that utilized the production of such clusters [e.g., @davidson2010phoneticsk; @wilson2014effects].

[^cf-berents]: Examples of further publications by Berent et al. with various experimental settings that test sonority effects in perception with behavioral data include: @berent2008language; @berent2010phonological; @berent2011syllable; @berent2012language; @berent2013phnological; @tamasi2014sensitivity; @zhao2015universalsk; @lennertz2015onthesonority. The following examples also include neurological data: @berent2014languagesk; @gomez2014language; @berent2015role. 

To test the different predictions of the six sonority models (2$\times$NAP, 2$\times$SSP, 2$\times$MSD), we designed a perception task that prompts meta-linguistic syllable count judgement with 29 experimental target items. 
Participants were presented with a collection of speech items that were systematically produced with one or two vowels for each combination of consonants in our set. 
Only the single-vowel productions were considered as targets, and an accurate response to our targets is always the monosyllabic option (note that the term "accuracy" is used here to describe participants' responses with respect to predictions).
By focusing on the response time of "correct" responses to the target words we essentially measure the time it took participants to decide that a given single-vowel stimulus is monosyllabic. 
We can therefore interpret the reaction times of monosyllabic responses to single-vowel targets as reflective of the processing cost of assigning one nucleus to a given target stimulus with one vowel.

We assume that the meta-linguistic task in our experimental design activates top-down inferences which urge subjects to base their reponses on their language experience. Since we want to also tap into subjects' bottom-up inferences, we present these nonce words as real words in another "unknown" language, uttered by a native speaker of that foreign language. For that matter, the German-speaking subjects listened to a stimuli set that featured recordings of a native Hebrew speaker. 
Likewise, the Hebrew-speaking subjects listened to a stimuli set that featured recordings of a native German speaker. Both speakers were trained phoneticians, capable of uttering the full set of stimuli (only the German speaker was naive with respect to the ensuing perception task).

##		Materials {#sec:materials}

The experimental design is focused on onset consonantal clusters with two members. These CC combinations are composed from a set of consonants with one of two major *place of articulation* types: either *coronal* or *labial*. This allows us to avoid articulatory effects that may arise from *homorganic* sequences (i.e., adjacent consonants that share the same place of articulation) while exploiting both directions of each combination---coronal-labial (back-to-front) and labial-coronal (front-to-back). There is also an advantage in the fact that the two places of articulation use different main articulators---the tongue tip reaches the palate in coronals, while the lower lip reaches the upper lip or teeth in labials. This relative articulatory independence helps to reduce co-articulation effects of adjacent gestures in consonantal clusters.
The consonantal classes in this study include stops, fricatives, nasals, and liquids to reflect the main *manner of articulation* classes in traditional sonority hierarchies (excluding glides). 
See Appendix \@ref(appendix:a) for a list of considerations and criteria that were used in constructing the experimental stimulus set, and see the full stimulus set in Table \@ref(tab:targetlist).

(ref:targetlist-caption) (\#tab:targetlist) Experimental stimulus set: Onset cluster types in the experiment
(ref:targetlist-caption2) cor = coronal; lab = labial; * = voicing disagreement between obstruents; ** = no labial liquid; *** = dorsal stop /k/ (see list in Appendix \@ref(appendix:a))
\begin{table}[tbp]
\begin{center}
\begin{threeparttable}
\caption{(ref:targetlist-caption)}
\begin{tabular}{cclcclcclcclcclcclcclcclccl}
\toprule
\multicolumn{1}{r}{\textbf{C1}} & \multicolumn{2}{c}{\textbf{Voiceless}} & \multicolumn{2}{c}{\textbf{Voiced}} & \multicolumn{2}{c}{\textbf{Nasals}} & \multicolumn{2}{c}{\textbf{Liquids}}\\
\multicolumn{1}{l}{\textbf{}} & \multicolumn{2}{c}{\textbf{Fricatives}} & \multicolumn{2}{c}{\textbf{Fricatives}} & & & &\\
\multicolumn{1}{l}{\textbf{C2}} & \multicolumn{1}{c}{cor-lab} & \multicolumn{1}{c}{lab-cor} & \multicolumn{1}{c}{cor-lab} & \multicolumn{1}{c}{lab-cor} & \multicolumn{1}{c}{cor-lab} & \multicolumn{1}{c}{lab-cor} & \multicolumn{1}{c}{cor-lab} & \multicolumn{1}{c}{lab-cor}\\
\midrule
\multicolumn{1}{l}{\textbf{Voiceless}} & \multicolumn{1}{c}{\textbf{sp}, \textbf{ʃp}} & \multicolumn{1}{c}{\textbf{ft}} & \multicolumn{1}{c}{*} & \multicolumn{1}{c}{*} & \multicolumn{1}{c}{\textbf{np}} & \multicolumn{1}{c}{\textbf{mt}} & \multicolumn{1}{c}{\textbf{lp}} & \multicolumn{1}{c}{\textbf{lk}***}\\
\multicolumn{1}{l}{\textbf{Stops}} & & & & & & & &\\
\multicolumn{1}{l}{\textbf{}} & & & & & & & &\\
\multicolumn{1}{l}{\textbf{Voiceless}} & \multicolumn{1}{c}{\textbf{sf}, \textbf{ʃf}} & \multicolumn{1}{c}{\textbf{fs}} & \multicolumn{1}{c}{*} & \multicolumn{1}{c}{*} & \multicolumn{1}{c}{\textbf{nf}} & \multicolumn{1}{c}{\textbf{ms}} & \multicolumn{1}{c}{\textbf{lf}} & \multicolumn{1}{c}{**}\\
\multicolumn{1}{l}{\textbf{Fricatives}} & & & & & & & &\\
\multicolumn{1}{l}{\textbf{}} & & & & & & & &\\
\multicolumn{1}{l}{\textbf{Voiced}} & \multicolumn{1}{c}{*} & \multicolumn{1}{c}{*} & \multicolumn{1}{c}{\textbf{zv}} & \multicolumn{1}{c}{\textbf{vz}} & \multicolumn{1}{c}{\textbf{nv}} & \multicolumn{1}{c}{\textbf{mz}} & \multicolumn{1}{c}{\textbf{lv}} & \multicolumn{1}{c}{**}\\
\multicolumn{1}{l}{\textbf{Fricatives}} & & & & & & & &\\
\multicolumn{1}{l}{\textbf{}} & & & & & & & &\\
\multicolumn{1}{l}{\textbf{Nasals}} & \multicolumn{1}{c}{\textbf{sm}, \textbf{ʃm}} & \multicolumn{1}{c}{\textbf{fn}} & \multicolumn{1}{c}{\textbf{zm}} & \multicolumn{1}{c}{\textbf{vn}} & \multicolumn{1}{c}{\textbf{nm}} & \multicolumn{1}{c}{\textbf{mn}} & \multicolumn{1}{c}{\textbf{lm}} & \multicolumn{1}{c}{**}\\
& & & & & & & &\\
& & & & & & & &\\
\multicolumn{1}{l}{\textbf{Liquids}} & \multicolumn{1}{c}{**} & \multicolumn{1}{c}{\textbf{fl}} & \multicolumn{1}{c}{**} & \multicolumn{1}{c}{\textbf{vl}} & \multicolumn{1}{c}{**} & \multicolumn{1}{c}{\textbf{ml}} & \multicolumn{1}{c}{**} & \multicolumn{1}{c}{**}\\
\bottomrule
\addlinespace
\end{tabular}
\begin{tablenotes}[para]
\normalsize{\textit{Note.} (ref:targetlist-caption2)}
\end{tablenotes}
\end{threeparttable}
\end{center}
\end{table}

Table \@ref(tab:targetlist) presents the 29 CC types in the experimental set, reflecting 16 different combinations of manner classes (16 unique cells in Table \@ref(tab:targetlist), irrespective of differences in place of articulation). Of the 16 cluster types, 7--8 are considered onset falls, 3--4 are considered onset plateaus (11 total), and 5 are considered onset rises.[^cf-plateaufall]

[^cf-plateaufall]: Depending on whether fricatives are considered higher or similar in sonority to stops, clusters of the type *fricative-stop* may be considered as either an onset fall or an onset plateau.

Of the 29 different clusters, only 3 clusters regularly occur in German words (/ʃp, ʃm, fl/), while 6 clusters are attested to some degree in German loanwords [/sp, sf, sm, vl, zv, ml/; see @van2012sonority],
and one cluster (/ʃf/) may be considered as similar to German licit clusters with a voiced obstruent following a voiceless one (i.e., /ʃv/ and /t͡sv/).
Thus, the experimental set contains 19 clusters that are unattested in German words. These unattested clusters appear in 13 of the 16 unique cluster types. 
The other three are rising sonority clusters with a liquid in C~2~, /fl, vl, ml/, that are attested in German complex onsets to some degree, yet only marginally so in the case of /vl/ and /ml/.

More clusters out of the 29 different cluster types in Table \@ref(tab:targetlist) occur regularly in Modern Hebrew [see @asherov2019syllablesk]. These include all of the eight sibilant-initial clusters, /sp, ʃp, sf ʃf, sm, ʃm, zm, zv/, and the liquid-second clusters, /fl, vl/. The voiceless cluster /ft/ and the /m/-initial clusters /ml, mn/ are marginally attested in Modern Hebrew [@asherov2019syllablesk 75, 86].
Thus, the experimental set contains 16 cluster types that are unattested in Hebrew words.
These unattested CC types appear in 12 of the 16 unique combinations in Table \@ref(tab:targetlist), excluding the three rising sonority clusters with a liquid in C~2~ (e.g., /fl, vl, ml/), and the *fricative-stop* clusters (including /ft/), although note that /ft/ and /ml/ are only marginally attested in Hebrew complex onsets.

The different CC sequences were embedded within a /CCal/ word-like frame, with a recurring *-al* rime. These /CCal/ tokens were produced with a single vowel, intended to yield monosyllabic items that resemble typical content words [i.e., prosodically heavier than a single light syllable; see, e.g., @demuth1996prosodic]. Two disyllabic counterparts were prepared for each CC type---one with an epenthetic vowel, /CəCal/, and another with a prothetic vowel, /əCCal/ (a more accurate annotation should be /(ʔ)əCCal/, given that the presence of an initial glottal stop was not controlled for).
Note that the schwa in the stimulus set recorded by speaker AA was produced as a weak (unstressed) /e/ vowel from the 5-vowel inventory of Modern Hebrew, while in the stimulus set recorded by speaker HN it was produced as a typical German schwa.
The entire word set eventually includes 29 single-vowel target types and 58 associated bi-vocalic filler types, adding up to 87 different word-like stimuli.

##		Predictions {#sec:predictions}

(ref:OrdinalTargetPreds-caption) (\#tab:OrdinalTargetPreds) Well-formedness scores for the 29 experimental items using the five ordinal models that are based on symbolic phonemes: SSP~col/exp~, MSD~col/exp~, and NAP~td~
(ref:OrdinalTargetPreds-caption2) Higher values predict better-formed onset clusters in an ordinal scale (i.e., magnitude of differences between values cannot be inferred from these models).
\begin{table}[tbp]
\begin{center}
\begin{threeparttable}
\caption{(ref:OrdinalTargetPreds-caption)}
\begin{tabular}{cclcclcclcclcclccl}
\toprule
\multicolumn{1}{l}{Onset cluster types} & \multicolumn{1}{l}{\textbf{\emph{SSP\textsubscript{col}}}} & \multicolumn{1}{l}{\textbf{\emph{SSP\textsubscript{exp}}}} & \multicolumn{1}{l}{\textbf{\emph{MSD\textsubscript{col}}}} & \multicolumn{1}{l}{\textbf{\emph{MSD\textsubscript{exp}}}} & \multicolumn{1}{c}{\textbf{\emph{NAP\textsubscript{td}}}}\\
\midrule
\multicolumn{1}{l}{\textbf{fl}} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{2 (\emph{rise})} & \multicolumn{1}{l}{4 (\emph{rise})} & \multicolumn{1}{c}{5}\\

\multicolumn{1}{l}{\textbf{sm}, \textbf{ʃm}, \textbf{fn}} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{3 (\emph{rise})} & \multicolumn{1}{c}{5}\\

\multicolumn{1}{l}{\textbf{vl}} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{2 (\emph{rise})} & \multicolumn{1}{l}{2 (\emph{rise})} & \multicolumn{1}{c}{3}\\

\multicolumn{1}{l}{\textbf{zm}, \textbf{vn}} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{c}{3}\\

\multicolumn{1}{l}{\textbf{ml}} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{l}{1 (\emph{rise})} & \multicolumn{1}{c}{1}\\

\multicolumn{1}{l}{\textbf{sf}, \textbf{ʃf}, \textbf{fs}} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{c}{3}\\

\multicolumn{1}{l}{\textbf{zv}, \textbf{vz}} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{c}{2}\\

\multicolumn{1}{l}{\textbf{nm}, \textbf{mn}} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{c}{1}\\

\multicolumn{1}{l}{\textbf{sp}, \textbf{ʃp}, \textbf{ft}} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{l}{0 (\emph{plateau})} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{c}{3}\\

\multicolumn{1}{l}{\textbf{lm}} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{c}{1}\\

\multicolumn{1}{l}{\textbf{mz}, \textbf{nv}, \textbf{lv}} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{c}{0}\\

\multicolumn{1}{l}{\textbf{ms}, \textbf{nf}, \textbf{np}, \textbf{mt}, \textbf{lf}, \textbf{lp}, \textbf{lk}} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{l}{-1 (\emph{fall})} & \multicolumn{1}{c}{-1}\\
\bottomrule
\addlinespace
\end{tabular}
\begin{tablenotes}[para]
\normalsize{\textit{Note.} (ref:OrdinalTargetPreds-caption2)}
\end{tablenotes}
\end{threeparttable}
\end{center}
\end{table}

The full set of predictions for the 29 experimental targets is presented for all the symbol-based ordinal models (2$\times$SSP, 2$\times$MSD, and *NAP~td~*) in Table \@ref(tab:OrdinalTargetPreds), and for the signal-based continuous model (*NAP~bu~*) in Figures \@ref(fig:com-monosyl)--\@ref(fig:com-monosylHeb). Note that the scores of *NAP~bu~* are presented on a continuous ratio scale, with specific predictions for each token and consequential intervals between scores. The scores in *NAP~bu~* are not a generalization (nor are they based on averages). Rather, they were extracted from the specific token recordings, and they are expected to vary to some extent when measuring different tokens.
We present the *NAP~bu~* scores for the two sets of stimuli used in the experiments: a set spoken by a native Hebrew speaker (Figure \@ref(fig:com-monosyl)) and a set spoken by a native German speaker (Figure \@ref(fig:com-monosylHeb)).

(ref:com-monosyl) AA set (Hebrew speaker). Well-formedness scores in the continuous *NAP~bu~* model shown in terms of the distance between the center of mass of the entire syllable, *CoM~syllable~* (red vertical lines), and the center of mass of the left portion, *CoM~onset~* (blue vertical lines). Measured periodic energy of each recorded token is represented by the black curve. Grey dotted vertical lines and annotated text denote segmental intervals by manual segmentation (for exposition purposes only). Items are ordered by score (from worse- to better-formed), going from left-to-right and from top-to-bottom.
```{r com-monosyl, fig.cap = "(ref:com-monosyl)", fig.width=7, fig.asp=.8, warning=FALSE, dev="cairo_pdf"}
# pdf.options(encoding = 'ISOLatin2')
ordered_syl_abs <- unique(monosyl_info_AA[order(monosyl_info_AA$NAP_bu),]$syl)
monosyl_info_AA$syl <- factor(monosyl_info_AA$syl, levels=ordered_syl_abs)
monosyl_plot_abs <- 
  ggplot(monosyl_info_AA, aes(x=t)) + ylim(-5.5,20) +
  xlab("Time (ms)") + ylab("Periodic energy (normalized scale)") +
  geom_line(aes(y=smog_per), color="black", alpha=1, size=1) +
  #
  geom_segment(aes(x=x_com_ons, xend=x_com_ons, y=-2, yend=13), color="royalblue1", size=1.5, alpha=.7, linetype = "solid", lineend = "round") +
  geom_segment(aes(x=x_com_syl, xend=x_com_syl, y=-2, yend=13),color="red", size=1.5, alpha=.6, linetype = "solid", lineend = "round") +
  geom_segment(aes(x=x_com_syl, xend=com_onset, y=-2, yend=-2), color="black", size=.5, alpha=1, linetype = "solid", arrow = arrow(length = unit(0.07, "npc"))) +
  # geom_text(aes(x=(x_com_syl+com_onset)/2, y=-5, label=paste0(round(NAP_bu)," ms")), size=3, check_overlap=T) + 
  geom_text(aes(x=(x_com_syl+com_onset)/2, y=-5, label=paste0(round(NAP_bu)," ms")), size=3, family = "Charis SIL", check_overlap=T) +
  #
  geom_segment(aes(x=pos_end, xend=pos_end, y=0, yend=20), color="grey", size=.5, alpha=.2, linetype = "dotted") +
  # geom_text(aes(x=pos_mid,y=17.5,label=text), size=5, check_overlap=T) +
  geom_text(aes(x=pos_mid,y=17.5,label=text), size=5, family = "Charis SIL", check_overlap=T) + #family = "Charis SIL",
  #
  facet_wrap(~syl, ncol=5) +
  theme(text = element_text(family = "Charis SIL"), panel.background = element_blank(), axis.title = element_text(size = 12), axis.text.y = element_blank(), axis.ticks = element_blank(), axis.text.x = element_text(size = 8), strip.text = element_blank())
  # theme(panel.background = element_blank(), axis.title = element_text(size = 12, family="sans"), axis.text.y = element_blank(), axis.ticks = element_blank(), axis.text.x = element_text(size = 8, family="sans"), strip.text = element_blank())
print(monosyl_plot_abs)

```

(ref:com-monosylHeb) HN set (German speaker). See Figure \@ref(fig:com-monosyl) above for plot details.
```{r com-monosylHeb, fig.cap = "(ref:com-monosylHeb)", fig.width=7, fig.asp=.8, warning=FALSE, dev="cairo_pdf"}
# pdf.options(encoding = 'ISOLatin2')
ordered_syl_abs <- unique(monosyl_info_HN[order(monosyl_info_HN$NAP_bu),]$syl)
monosyl_info_HN$syl <- factor(monosyl_info_HN$syl, levels=ordered_syl_abs)
monosyl_plot_abs <- 
  ggplot(monosyl_info_HN, aes(x=t)) + ylim(-5.5,20) +
  xlab("Time (ms)") + ylab("Periodic energy (normalized scale)") +
  geom_line(aes(y=smog_per), color="black", alpha=1, size=1) +
  #
  geom_segment(aes(x=x_com_ons, xend=x_com_ons, y=-2, yend=13), color="royalblue1", size=1.5, alpha=.7, linetype = "solid", lineend = "round") +
  geom_segment(aes(x=x_com_syl, xend=x_com_syl, y=-2, yend=13),color="red", size=1.5, alpha=.6, linetype = "solid", lineend = "round") +
  geom_segment(aes(x=x_com_syl, xend=com_onset, y=-2, yend=-2), color="black", size=.5, alpha=1, linetype = "solid", arrow = arrow(length = unit(0.07, "npc"))) +
  # geom_text(aes(x=(x_com_syl+com_onset)/2, y=-5, label=paste0(round(NAP_bu)," ms")), size=3, check_overlap=T) + 
  geom_text(aes(x=(x_com_syl+com_onset)/2, y=-5, label=paste0(round(NAP_bu)," ms")), size=3, family = "Charis SIL", check_overlap=T) +
  #
  geom_segment(aes(x=pos_end, xend=pos_end, y=0, yend=20), color="grey", size=.5, alpha=.2, linetype = "dotted") +
  # geom_text(aes(x=pos_mid,y=17.5,label=text), size=5, check_overlap=T) +
  geom_text(aes(x=pos_mid,y=17.5,label=text), size=5, family = "Charis SIL", check_overlap=T) + #family = "Charis SIL",
  #
  facet_wrap(~syl, ncol=5) +
  theme(text = element_text(family = "Charis SIL"), panel.background = element_blank(), axis.title = element_text(size = 12), axis.text.y = element_blank(), axis.ticks = element_blank(), axis.text.x = element_text(size = 8), strip.text = element_blank())
  # theme(panel.background = element_blank(), axis.title = element_text(size = 12, family="sans"), axis.text.y = element_blank(), axis.ticks = element_blank(), axis.text.x = element_text(size = 8, family="sans"), strip.text = element_blank())
print(monosyl_plot_abs)

```

##  Designs {#sec:designs}

The details in the following analyses address three experiments: 
*Experiment 1*, an exploratory pilot experiment with 12 German-speaking subjects responding to stimulus set AA (Hebrew speaker); *Experiment 2*, a confirmatory experiment with 51 German-speaking subjects responding to stimulus set AA; and *Experiment 3*, a confirmatory experiment with 33 Hebrew-speaking subjects responding to stimulus set HN (German speaker).

Given the various novelties in this proposal, the methodologies for data collection, data extraction, and model implementation were first tested on a small body of real data that we collected before finalizing our methodologies (namely, the model implementations in Section \@ref(sec:modelimp) and the various procedural details in Appendix \@ref(appendix:a)).
We used this exploratory study to test our methodologies and to explore the possibilities for properly estimating nucleus competition in each of the NAP models. 
We also used the exploratory pilot study to verify that the number of participants is large enough with respect to the size of expected effects. With 12 participants, we could already observe clear effects (see Subsection \@ref(sec:results)). To be confident that we have enough power to compare the models, we aimed at 50 participants in the confirmatory studies (note that this goal was partially reached in Experiment 3 due to the COVID-19 pandemic).

The exploratory pilot study was conducted in two versions, each with half of the fillers and all of the targets in one block, yielding a total of 58 data points per subject (29 fillers + 29 targets, no repetitions).
The two different versions were evenly split between participants (each version was presented to six participants).

Experiments 2 and 3 are the main confirmatory studies conducted after finalizing our hypotheses and methodologies with the data from Experiment 1. 
The difference between Experiments 2 and 3 concerns the native language of the subjects, and, as a consequence, the stimulus set in use. 
Experiment 2 tested German-speaking subjects on stimulus set AA, featuring a Hebrew speaker, while Experiment 3 tested Hebrew-speaking subjects on stimulus set HN, featuring a German speaker (see explanation in Subsection \@ref(sec:rationale)). 
Each experimental block in Experiments 2-3 consisted of two repetitions of the target words (2 $\times$ 29 $=$ 58) and one trial of each filler word (1 $\times$ 58). The experiment consisted of two blocks with randomized trials, generating altogether four repetitions of the target words (4 $\times$ 29 $=$ 116) and two repetitions of the filler words (2 $\times$ 58 $=$ 116), yielding a total of 232 data points per subject. 

##  Participants {#sec:participants}

```{r participantsPil-tests, fig.show="hide"}

pilot_mean_age <- round(mean(subjects_pilot$subject_age))
pilot_min_age <- round(min(subjects_pilot$subject_age))
pilot_max_age <- round(max(subjects_pilot$subject_age))
pilot_N <- length(subjects_pilot$subject_nr)
pilot_male <- length(which(subjects_pilot$subject_gender=="male"))

```


```{r participantsHeb-tests, fig.show="hide"}

ggplot(subjects_heb, aes(x="", fill=subject_age)) +
  geom_bar(stat = "count") + #color="black",
  geom_text(aes(label = subject_age), size = 3, stat = "count", position = position_stack(vjust = 0.5)) +
  guides(fill = guide_legend(title = "Age")) +
  coord_polar("y", start = 0) +
  scale_fill_brewer(palette = "Blues") +
  ggtitle("Age", subtitle = "Experiment 3 (N = 33)") +
  theme(legend.position="none", panel.background = element_blank(), axis.title = element_blank(), axis.text = element_blank(), panel.border = element_blank(), panel.grid = element_blank(), axis.ticks = element_blank(), plot.title = element_text(size=14, face="bold")
        )

# which(subjects_heb$subject_age == subject_age)

```


### Experiment 1


The exploratory pilot study consisted of 12 subjects (two males and ten females), all native German-speaking students from the Cologne University of Applied Sciences, who volunteered to participate in the study. The experiment was administered in a quiet room at one of the campus buildings of the institution. The mean age of participants in the pilot study was 25 (21--30 range).

### Experiment 2

(ref:participantsGer) Participants in Experiment 2 (n = 51). Education categories refer to academic achievements ("school" = academic degree not yet acquired).
```{r participantsGer, fig.cap = "(ref:participantsGer)", fig.show="hold", fig.align = "default", out.width=c("33%","33%","33%"), fig.width=c(2.3,2.3,2.3)}

ggplot(subjects_main, aes(x="", fill=subject_age)) + 
  geom_bar(stat = "count") + #color="black", 
  geom_text(aes(label = subject_age), size = 4, stat = "count", position = position_stack(vjust = 0.5)) +
  guides(fill = guide_legend(title = "Age")) +
  coord_polar("y", start = 0.4, direction = -1) + 
  scale_fill_brewer(palette = "Blues") +
  ggtitle("Age") +#, subtitle = "Experiment 2 (N = 51)") +
  theme(legend.position="none", panel.background = element_blank(), axis.title = element_blank(), axis.text = element_blank(), panel.border = element_blank(), panel.grid = element_blank(), axis.ticks = element_blank(), plot.title = element_text(size=14, face="bold"), plot.subtitle = element_text(size=10)
        )

ggplot(subjects_main, aes(x="", fill=subject_gender)) +
  geom_bar(stat = "count") + #color="black",
  geom_text(aes(label = subject_gender), size = 4, stat = "count", position = position_stack(vjust = 0.5)) +
  guides(fill = guide_legend(title = "Gender")) +
  coord_polar("y", start = 0) +
  scale_fill_brewer(palette = "Greens") +
  ggtitle("Gender") +#, subtitle = "Experiment 2 (N = 51)") +
  theme(legend.position="none", panel.background = element_blank(), axis.title = element_blank(), axis.text = element_blank(), panel.border = element_blank(), panel.grid = element_blank(), axis.ticks = element_blank(), plot.title = element_text(size=14, face="bold"), plot.subtitle = element_text(size=10)
        )

ggplot(subjects_main, aes(x="", fill=subject_education)) +
  geom_bar(stat = "count") + #color="black",
  geom_text(aes(label = subject_education), size = 4, stat = "count", position = position_stack(vjust = 0.5)) +
  guides(fill = guide_legend(title = "Education")) +
  coord_polar("y", start = 0.2) +
  scale_fill_brewer(palette = "Oranges") +
  ggtitle("Education") +#, subtitle = "Experiment 2 (N = 51)") +
  theme(legend.position="none", panel.background = element_blank(), axis.title = element_blank(), axis.text = element_blank(), panel.border = element_blank(), panel.grid = element_blank(), axis.ticks = element_blank(), plot.title = element_text(size=14, face="bold"), plot.subtitle = element_text(size=10)
        )


```

Fifty-one native German speakers (who did not participate in the exploratory pilot study) participated in Experiment 2, of which `r length(which(subjects_main$subject_german_native=="monolingual"))` were monolingual (the `r length(which(subjects_main$subject_german_native=="multilingual"))` bilingual speakers had either Polish, Low German, or Hebrew as their heritage language). `r length(which(subjects_main$subject_handedness =="right-handed"))` participants were right-handed. See more details on age, gender and education of participants in Figure \@ref(fig:participantsGer).

Of the 51 participants, 34 were students of the University of Cologne who took part in the experiment at the sound-attenuated booth of the phonetics laboratory of the university. The other 17 participants took part in the experiment at three different locations---all small quiet rooms within private apartments. All subjects were paid five Euros for their participation. 

We excluded the responses from one participant who failed in our participant inclusion criterion requiring accuracy of at least 75% with bi-vocalic fillers. The bi-vocalic fillers of the forms /CəCal/ and /əCCal/ link correct responses to the disyllabic choice ($'2'$), and we expect relatively few monosyllabic choices ($'1'$) in response to stimuli with two separate vowels. 
Indeed, the overall average accuracy of all 51 participants, when responding to bi-vocalic filler stimuli, was 96%. The excluded participant achieved a much lower accuracy score for bi-vocalic fillers, almost approaching chance-level with 65%.

### Experiment 3 {#sec:participants3}

(ref:participantsHeb) Participants in Experiment 3 (n = 33). Education categories refer to academic achievements ("school" = academic degree not yet acquired).
```{r participantsHeb, fig.cap = "(ref:participantsHeb)", fig.show="hold", fig.align = "default", out.width=c("33%","33%","33%"), fig.width=c(2.3,2.3,2.3)}

ggplot(subjects_heb, aes(x="", fill=subject_age)) + 
  geom_bar(stat = "count") + #color="black", 
  geom_text(aes(label = subject_age), size = 4, stat = "count", position = position_stack(vjust = 0.5)) +
  guides(fill = guide_legend(title = "Age")) +
  coord_polar("y", start = 0, direction = -1) + 
  scale_fill_brewer(palette = "Blues") +
  ggtitle("Age") + #, subtitle = "Experiment 3 (N = 33)") +
  theme(legend.position="none", panel.background = element_blank(), axis.title = element_blank(), axis.text = element_blank(), panel.border = element_blank(), panel.grid = element_blank(), axis.ticks = element_blank(), plot.title = element_text(size=14, face="bold"), plot.subtitle = element_text(size=10)
        )

ggplot(subjects_heb, aes(x="", fill=subject_gender)) +
  geom_bar(stat = "count") + #color="black",
  geom_text(aes(label = subject_gender), size = 4, stat = "count", position = position_stack(vjust = 0.5)) +
  guides(fill = guide_legend(title = "Gender")) +
  coord_polar("y", start = 0) +
  scale_fill_brewer(palette = "Greens") +
  ggtitle("Gender") + #, subtitle = "Experiment 3 (N = 33)") +
  theme(legend.position="none", panel.background = element_blank(), axis.title = element_blank(), axis.text = element_blank(), panel.border = element_blank(), panel.grid = element_blank(), axis.ticks = element_blank(), plot.title = element_text(size=14, face="bold"), plot.subtitle = element_text(size=10)
        )

ggplot(subjects_heb, aes(x="", fill=subject_education)) +
  geom_bar(stat = "count") + #color="black",
  geom_text(aes(label = subject_education), size = 4, stat = "count", position = position_stack(vjust = 0.5)) +
  guides(fill = guide_legend(title = "Education")) +
  coord_polar("y", start = 0.5) +
  scale_fill_brewer(palette = "Oranges") +
  ggtitle("Education") + #, subtitle = "Experiment 3 (N = 33)") +
  theme(legend.position="none", panel.background = element_blank(), axis.title = element_blank(), axis.text = element_blank(), panel.border = element_blank(), panel.grid = element_blank(), axis.ticks = element_blank(), plot.title = element_text(size=14, face="bold"), plot.subtitle = element_text(size=10)
        )

```

Thirty-three native Hebrew speakers participated in Experiment 3, of which `r length(which(subjects_heb$subject_hebrew_native=="monolingual"))` were monolingual (the five bilinguals were also native speakers of either English, Russian or Spanish). `r length(which(subjects_heb$subject_handedness =="right-handed"))` participants were right-handed. See more details on age, gender and education of participants in Figure \@ref(fig:participantsHeb).

The data collection in Experiment 3 was more diverse, and, perhaps therefore also more "noisy" than in Experiment 2.
The first round of data collection took place in 2019 with student volunteers from Tel Aviv University and The Hebrew University of Jerusalem. The second round of data collection took place during the early phases of the global COVID-19 pandemic, which resulted in fewer overall participants and the use of different ad-hoc and suboptimal locations to administer the experiment.

## Data Analysis {#sec:datanlysis}

We used a Bayesian data analysis approach implemented in the probabilistic programming language  *Stan* [@Stan2018] using the model wrapper package *brms* [@R-brms_a; @R-brms_b] in *R* [@R-base].^[The complete list of *R* packages and versions that we used is: `r cite_r("bibs/r-references.bib")`.] An important motivation for using the Bayesian approach is that it facilitates fitting fully hierarchical models with the so-called "maximal random effect structure", which provide the most conservative estimates of uncertainty [@SchielzethForstmeier2009]. In all our models, we used regularizing priors (detailed below). These priors are minimally informative and have the objective of yielding stable inferences [@chung2013weakly; @gelman2008weakly; @GelmanEtAl2017]. @NicenboimVasishth2016 and @VasishthEtAl2017EDAPS discuss the Bayesian approach in detail in the context of psycholinguistics and phonetics. We fitted the models with four chains and 4000 iterations each, of which 1000 iterations were the warm-up phase. In order to assess convergence, we verified that there were no divergent transitions, that all the $\hat{R}$  (the between- to within-chain variances) were close to one,  that the number of effective sample size was at least 10\% of the number of post-warmup samples, and visually inspected the chains.

For the statistical models, we took into account that the traditional sonority models and the top-down version of NAP (i.e., *SSP~col~* , *SSP~exp~*, *MSD~col~*, *MSD~exp~*, and *NAP~td~*) are ordinal models, while the bottom-up version of NAP (*NAP~bu~*) is a continuous model. The ordinal models predict that certain groups of onset clusters will be better or worse-formed than the other group depending on an ordinal score, but they do not assume that the score will be equidistant with respect to its effect on the response variable, log-transformed response times. For this reason, the discrete scores of these models are assumed to have a monotonic effect on the log-response time in our task, that is, having a monotonically increasing or decreasing relationship with the log-response time, while  the distance between groups is estimated from the data [@burknerModelingMonotonicEffects2018]. 

In contrast, *NAP~bu~* provides scores on a *ratio scale*, in which the distance between scores is also taken to be informative (as opposed to the *ordinal* scales of the other models). *NAP~bu~* is modeled with a continuous predictor which is assumed to have a linear relationship with the log-response times. Finally, as a baseline, we fitted a "null" model which assumes no relationship between the stimuli and the response times. 

All the models included a random intercept and slope by subjects (except for the null model that included only a random intercept) and the following weakly regularizing priors: $Normal(6, 2)$ for the intercept, $Normal(0, 1)$ for the slope, $Normal_+(0,1)$ for the variance components, and $lkj(2)$ for the correlation between by-participant adjustments. The ordinal models also have a Dirichlet prior for the simplex vector that represents the distance between the categories set to one for each of its parameters. 

We evaluated the models in three different ways: (i) estimation; (ii) descriptive adequacy; and (iii) model comparison.  

### (i) Estimation {-}
We report mean estimates and 95\% quantile-based Bayesian credible intervals. A 95\% Bayesian credible interval is interpreted such that it contains the true value with 95% probability given the data and the model [see, e.g., @Jaynes1976; @MoreyEtAl2015].  

### (ii) Descriptive adequacy {-}
We used posterior predictive checks to examine the descriptive adequacy or "fit" of the models [@shiffrinSurveyModelEvaluation2008]. The observed data should look plausible under the posterior predictive distribution of the models. The posterior predictive distribution of each model is composed of simulated datasets generated based on the posterior distributions of its parameters. Given the posterior of the parameters of the model, the posterior predictive distribution shows how similar data may look. Achieving descriptive adequacy means that the current data could have been predicted with the model. It is important to notice that a good fit, that is, passing a test of descriptive adequacy, is not strong evidence in favor of a model. In contrast, a major failure in descriptive adequacy can be interpreted as strong evidence against a model [@shiffrinSurveyModelEvaluation2008]. Thus, we use posterior predictive checks to assess whether the model behavior is reasonable and in which situations it is not [see @gelmanBayesianDataAnalysis2013 for further discussion].  

### (iii) Model comparison {-}
For model comparison, we examine the out-of-sample predictive accuracy of the different models using k-fold (k=15) cross-validation stratified by subjects.^[Pareto smoothed importance sampling approximation to leave-one-out cross-validation [implemented in the package `loo`; @vehtariPracticalBayesianModel2017; @vehtariParetoSmoothedImportance2015] failed to yield stable estimates.] Cross-validation evaluates the different models with respect to their predictive accuracy, that is, how well the models generalize to new data.

##  Results {#sec:results}

###  Estimations

```{r}
results_txt <- function(fitbrms, pred) {
    paste0(
        "$\\hat\\beta = ", signif(fixef(fitbrms)[pred, "Estimate"], 2),
        "\\text{, }95\\% \\text{ CrI } = [", signif(fixef(fitbrms)[pred, "Q2.5"], 2), ",", signif(fixef(fitbrms)[pred, "Q97.5"], 2), "]$"
    )
}

est_SSP_exp_pil <- results_txt(m_pilot$models$SSP, "moSSP")
est_MSD_exp_pil <- results_txt(m_pilot$models$MSD, "moMSD")
est_SSP_col_pil <- results_txt(m_pilot$models$SSP_obs, "moSSP_obs")
est_MSD_col_pil <- results_txt(m_pilot$models$MSD_obs, "moMSD_obs")
est_NAP_td_pil <- results_txt(m_pilot$models$NAP_td, "moNAP_td")
est_NAP_bu_pil <- results_txt(m_pilot$models$NAP_bu, "sNAP_bu")

```

```{r}
scores <- data_pilot %>% filter(!is.na(NAP_bu)) %>% 
  distinct(stimulus, NAP_bu) %>% 
  arrange(NAP_bu)

lpal_nap <- scores %>% filter(stimulus == "lpal") %>% 
  pull(NAP_bu) %>% round(0)
lkal_nap <- scores %>% filter(stimulus == "lkal") %>% 
  pull(NAP_bu)%>% round(0)
spal_nap <- scores %>% filter(stimulus == "spal") %>% 
  pull(NAP_bu)%>% round(0)

```

```{r}
results_txt <- function(fitbrms, pred) {
    paste0(
        "$\\hat\\beta = ", signif(fixef(fitbrms)[pred, "Estimate"], 2),
        "\\text{, }95\\% \\text{ CrI } = [", signif(fixef(fitbrms)[pred, "Q2.5"], 2), ",", signif(fixef(fitbrms)[pred, "Q97.5"], 2), "]$"
    )
}

est_SSP_exp <- results_txt(m_german$models$SSP, "moSSP")
est_MSD_exp <- results_txt(m_german$models$MSD, "moMSD")
est_SSP_col <- results_txt(m_german$models$SSP_obs, "moSSP_obs")
est_MSD_col <- results_txt(m_german$models$MSD_obs, "moMSD_obs")
est_NAP_td <- results_txt(m_german$models$NAP_td, "moNAP_td")
est_NAP_bu <- results_txt(m_german$models$NAP_bu, "sNAP_bu")

```


```{r}
estHeb_SSP_exp <- results_txt(m_hebrew$models$SSP, "moSSP")
estHeb_MSD_exp <- results_txt(m_hebrew$models$MSD, "moMSD")
estHeb_SSP_col <- results_txt(m_hebrew$models$SSP_obs, "moSSP_obs")
estHeb_MSD_col <- results_txt(m_hebrew$models$MSD_obs, "moMSD_obs")
estHeb_NAP_td <- results_txt(m_hebrew$models$NAP_td, "moNAP_td")
estHeb_NAP_bu <- results_txt(m_hebrew$models$NAP_bu, "sNAP_bu")
```

For all the models, the well-formedness score shows a clear effect on response times, with lower scores yielding longer log-transformed response times:

**Experiment 1**

* For *SSP~col~*: `r est_SSP_col_pil`.
* For *SSP~exp~*: `r est_SSP_exp_pil`.
* For *MSD~col~*: `r est_MSD_col_pil`.
* For *MSD~exp~*: `r est_MSD_exp_pil`.
* For *NAP~td~*: `r est_NAP_td_pil`.
* For *NAP~bu~*: `r est_NAP_bu_pil`.

**Experiment 2**

* For *SSP~col~*: `r est_SSP_col`.
* For *SSP~exp~*: `r est_SSP_exp`.
* For *MSD~col~*: `r est_MSD_col`.
* For *MSD~exp~*: `r est_MSD_exp`.
* For *NAP~td~*: `r est_NAP_td`.
* For *NAP~bu~*: `r est_NAP_bu`.

**Experiment 3**

* For *SSP~col~*: `r estHeb_SSP_col`.
* For *SSP~exp~*: `r estHeb_SSP_exp`.
* For *MSD~col~*: `r estHeb_MSD_col`.
* For *MSD~exp~*: `r estHeb_MSD_exp`.
* For *NAP~td~*: `r estHeb_NAP_td`.
* For *NAP~bu~*: `r estHeb_NAP_bu`.

Notice that the posterior of the effect of well-formedness, $\hat\beta$, is not comparable across models. For the ordinal models, it represents  the average increase (or decrease) in the dependent variable associated with two neighboring factor levels, or in other words, $\hat\beta$ multiplied by the number of categories minus one represents the increase in log-scale between the first and the last category. This means that it is highly affected by the number of categories. For the continuous bottom-up model, $NAP_{bu}$, $\beta$ represents, the increase in log-scale for one unit in the well-formedness scale. To give some concrete examples from set AA, there are `r lkal_nap -lpal_nap` units (since their NAP scores are `r lkal_nap` and `r lpal_nap`, respectively) between /lpal/ and /lkal/; and there are `r spal_nap  -lkal_nap` units between /lkal/ and /spal/ (`r spal_nap` and `r lkal_nap`, respectively). However, for all the models, $\hat\beta$ is negative, indicating that well-formedness is associated with faster responses. See Appendix \@ref(appendix:b) for the complete output of the models.

The results shown here reflect the final state of the models in the exploratory stage, which is the same as the state of the models in the confirmatory stage. Importantly, the results of the confirmatory studies, Experiments 2--3, which are statistically much more robust, remain consistent with those of Experiment 1, which had a relatively small number of observations. As such, Experiment 1 was not designed to distinguish between the models and it will not be considered in the further presentation of results. 

###  Descriptive adequacy

The model fits of the different models are shown in Figures \@ref(fig:NullFit)--\@ref(fig:NAPbuFit). The plots in these figures present the dispersion of the average response time results, depicted as red points for related CC clusters, vis-à-vis each models' predictions in the form of distributions, depicted with blue violins. The order of the stimuli, from left to right, follows from the models' scores such that predictions for better-formed clusters appear further to the right. Recall that scores in the *NAP~bu~* model yield slightly different predictions for each stimulus set (AA vs. HN).

#### Null models

The null models are shown in Figure \@ref(fig:NullFit) as baselines in the respective experiments (the order of stimuli along the x-axis follows the *NAP~bu~* scores, but in a forced ordinal scale, with equidistant intervals). The slight differences in predictions for different clusters are due to individual differences in the accuracy. Recall that we subset the response times conditional on the monosyllabic response ($'1'$) to the forced-choice task. This means that when participants give more monosyllabic answers for a specific cluster, their adjusted intercept will have a greater influence on the predictions of the model for that cluster. In addition, clusters with fewer monosyllabic responses show more variability in their predictions (e.g., /lf/ vs. /fl/ in the AA set, on the left side of Figure \@ref(fig:NullFit)).

(ref:NullFit) Null model fit. Observed mean log-transformed response times of accurate responses are depicted with red points, distribution of simulated means based on the null model are depicted with blue violins. 
```{r NullFit,fig.cap = "(ref:NullFit)", warning=FALSE, out.width=c("50%","50%"), fig.width=c(3.5,3.5), fig.asp=c(.5,.5), fig.show="hold", fig.align = "default", dev="cairo_pdf"}
# ```{r NullFit,fig.cap = "(ref:NullFit)", warning=FALSE, fig.width=7, fig.asp=.28, fig.show="hold", dev="cairo_pdf"}
predictions$null %>%
  mutate(NAP= round(NAP_bu,7)) %>%
    # bind_rows(tibble(NAP = seq(-220,-30,1)))  %>%
    bind_rows(tibble(pred = seq(30,220,200)))  %>%
    arrange(NAP) %>%
    ggplot(aes(x = factor(NAP), y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data= data_RT, aes(y = corrRT, x= factor(NAP)), color = "red3", alpha = .5, size =.8) +
    geom_text_repel(data= data_RT, aes(y = corrRT, x=factor(NAP), label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", alpha = 1, size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) + 
    scale_x_discrete("Stimuli (AA set: Experiment 2)", breaks=NULL) +
    # scale_x_discrete(bquote(italic(NAP[bu])~scores~(ordinal)), breaks=NULL) +
    ylab("Response Time (ms)") +
    ylim(600,1200) +
    coord_trans(y="log") +
  theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_text(size=8, family = "Charis SIL"), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_text(size=10, family = "Charis SIL"), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
    # ggtitle("Null model fit")

predictions_heb$null %>%
  mutate(NAP= round(NAP_bu,7)) %>%
    # bind_rows(tibble(NAP = seq(-220,-30,1)))  %>%
    bind_rows(tibble(pred = seq(30,220,200)))  %>%
    arrange(NAP) %>%
    ggplot(aes(x = factor(NAP), y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data= data_RT_heb, aes(y = corrRT, x= factor(NAP)), color = "red3", alpha = .5, size =.8) +
    geom_text_repel(data= data_RT_heb, aes(y = corrRT, x=factor(NAP), label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", alpha = 1, size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) + 
    scale_x_discrete("Stimuli (HN set: Experiment 3)", breaks=NULL) +
    # scale_x_discrete(bquote(italic(NAP[bu])~scores~(ordinal)), breaks=NULL) +
    ylab("RT (Experiment 3)") +
    ylim(600,1200) +
    coord_trans(y="log") +
  theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_blank(), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_blank(), 
              axis.ticks.x = element_blank(), plot.title = element_blank())


```

#### SSP and MSD models {#sec:traditionalModelfit}

We consider a good fit in the case of the ordinal models to be roughly characterized by the following three criteria:
(i) the data is contained within the predictions, i.e., the red points appear within the respective violins;
(ii) the data is consistent within each predicted level, i.e., the vertical dispersion of red points pattern together around the same area within each level (preferably in the middle of the distribution); and 
(iii) the model predictors are not redundant, i.e., the violins of the different model levels show little overlap between them.

(ref:SSPcolFit) *SSP~col~* model fit. Stimuli ordered from left to right according to their score in the model in ascending well-formedness (other details are the same as above).
```{r SSPcolFit,fig.cap = "(ref:SSPcolFit)", warning=FALSE, out.width=c("50%","50%"), fig.width=c(3.5,3.5), fig.asp=c(.5,.5), fig.show="hold", fig.align = "default", dev="cairo_pdf"}

predictions$SSP_obs %>%
  left_join(data_RT) %>%
    ungroup() %>%
  filter(!is.na(SSP_obs)) %>%
    arrange(SSP_obs) %>%
    mutate(SSP_obs = factor(SSP_obs, levels= unique(SSP_obs)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = SSP_obs, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_text_repel(data=distinct(.,corrRT,stimulus,SSP_obs), aes(y = corrRT, label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", alpha = 1, size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) +
    geom_point(data=distinct(.,corrRT,stimulus,SSP_obs), aes(y = corrRT), color = "red3", alpha = .5, size =.8) +
    xlab(bquote(italic(SSP[col])~scores~(Experiment~2))) + ylab("Response Time (ms)") +
        ylim(600,1200) +
        coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_text(size=8, family = "Charis SIL"), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_text(size=10, family = "Charis SIL"), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
        # ggtitle(bquote(italic(SSP[col])~fit))
   }

predictions_heb$SSP_obs %>%
  left_join(data_RT_heb) %>%
    ungroup() %>%
  filter(!is.na(SSP_obs)) %>%
    arrange(SSP_obs) %>%
    mutate(SSP_obs = factor(SSP_obs, levels= unique(SSP_obs)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = SSP_obs, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_text_repel(data=distinct(.,corrRT,stimulus,SSP_obs), aes(y = corrRT, label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", alpha = 1, size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) +
    geom_point(data=distinct(.,corrRT,stimulus,SSP_obs), aes(y = corrRT), color = "red3", alpha = .5, size =.8) +
    xlab(bquote(italic(SSP[col])~scores~(Experiment~3))) + ylab("RT (Experiment 3)") +
        ylim(600,1200) +
        coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_blank(), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_blank(), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
        # ggtitle(bquote(italic(SSP[col])~fit))
   }

```

(ref:SSPexpFit) *SSP~exp~* model fit (plot details are the same as above).
```{r SSPexpFit,fig.cap = "(ref:SSPexpFit)", warning=FALSE, out.width=c("50%","50%"), fig.width=c(3.5,3.5), fig.asp=c(.5,.5), fig.show="hold", fig.align = "default", dev="cairo_pdf"}

predictions$SSP %>%
  left_join(data_RT) %>%
    ungroup() %>%
  filter(!is.na(SSP)) %>%
    arrange(SSP) %>%
    mutate(SSP = factor(SSP, levels= unique(SSP)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = SSP, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_text_repel(data=distinct(.,corrRT,stimulus,SSP), aes(y = corrRT, label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", alpha = 1, size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) +
    geom_point(data=distinct(.,corrRT,stimulus,SSP), aes(y = corrRT), color = "red3", size =.8, alpha = .5) +
    xlab(bquote(italic(SSP[exp])~scores~(Experiment~2))) +
       ylab("Response Time (ms)") +
        ylim(600,1200) +
        coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_text(size=8, family = "Charis SIL"), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_text(size=10, family = "Charis SIL"), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
        # ggtitle(bquote(italic(SSP[exp])~fit))
   }

predictions_heb$SSP %>%
  left_join(data_RT_heb) %>%
    ungroup() %>%
  filter(!is.na(SSP)) %>%
    arrange(SSP) %>%
    mutate(SSP = factor(SSP, levels= unique(SSP)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = SSP, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_text_repel(data=distinct(.,corrRT,stimulus,SSP), aes(y = corrRT, label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", alpha = 1, size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) +
    geom_point(data=distinct(.,corrRT,stimulus,SSP), aes(y = corrRT), color = "red3", size =.8, alpha = .5) +
    xlab(bquote(italic(SSP[exp])~scores~(Experiment~3))) +
       ylab("RT ") +
        ylim(600,1200) +
        coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_blank(), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_blank(), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
        # ggtitle(bquote(italic(SSP[exp])~fit))
   }

```

(ref:MSDcolFit) *MSD~col~* model fit (plot details are the same as above).
```{r MSDcolFit,fig.cap = "(ref:MSDcolFit)", warning=FALSE, out.width=c("50%","50%"), fig.width=c(3.5,3.5), fig.asp=c(.5,.5), fig.show="hold", fig.align = "default", dev="cairo_pdf"}
# ```{r MSDcolFit,fig.cap = "(ref:MSDcolFit)", warning=FALSE, fig.width=8, fig.asp=.28, fig.show="hold", dev="cairo_pdf"}

predictions$MSD_obs %>%
  left_join(data_RT) %>%
    ungroup() %>%
  filter(!is.na(MSD_obs)) %>%
    arrange(MSD_obs) %>%
    mutate(MSD_obs = factor(MSD_obs, levels= unique(MSD_obs)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = MSD_obs, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_text_repel(data=distinct(.,corrRT,stimulus,MSD_obs), aes(y = corrRT, label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", alpha = 1, size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) +
    geom_point(data=distinct(.,corrRT,stimulus,MSD_obs), aes(y = corrRT), color = "red3", size =.8, alpha = .5) +
    xlab(bquote(italic(MSD[col])~scores~(Experiment~2))) + ylab("Response Time (ms)") +
        ylim(600,1200) +
        coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_text(size=8, family = "Charis SIL"), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_text(size=10, family = "Charis SIL"), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
        # ggtitle(bquote(italic(MSD[col])~fit))
   }

predictions_heb$MSD_obs %>%
  left_join(data_RT_heb) %>%
    ungroup() %>%
  filter(!is.na(MSD_obs)) %>%
    arrange(MSD_obs) %>%
    mutate(MSD_obs = factor(MSD_obs, levels= unique(MSD_obs)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = MSD_obs, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_text_repel(data=distinct(.,corrRT,stimulus,MSD_obs), aes(y = corrRT, label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", alpha = 1, size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) +
    geom_point(data=distinct(.,corrRT,stimulus,MSD_obs), aes(y = corrRT), color = "red3", size =.8, alpha = .5) +
    xlab(bquote(italic(MSD[col])~scores~(Experiment~3))) + ylab("RT") +
        ylim(600,1200) +
        coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_blank(), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_blank(), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
        # ggtitle(bquote(italic(MSD[col])~fit))
   }

```


(ref:MSDexpFit) *MSD~exp~* model fit (plot details are the same as above).
```{r MSDexpFit,fig.cap = "(ref:MSDexpFit)", warning=FALSE, out.width=c("50%","50%"), fig.width=c(3.5,3.5), fig.asp=c(.5,.5), fig.show="hold", fig.align = "default", dev="cairo_pdf"}
# ```{r MSDexpFit,fig.cap = "(ref:MSDexpFit)", warning=FALSE, fig.width=8, fig.asp=.28, fig.show="hold", dev="cairo_pdf"}

predictions$MSD %>%
  left_join(data_RT) %>%
    ungroup() %>%
  filter(!is.na(MSD)) %>%
    arrange(MSD) %>%
    mutate(MSD = factor(MSD, levels= unique(MSD)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = MSD, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_text_repel(data=distinct(.,corrRT,stimulus,MSD), aes(y = corrRT, label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", alpha = 1, size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) +
    geom_point(data=distinct(.,corrRT,stimulus,MSD), aes(y = corrRT), color = "red3", size =.8, alpha = .5) +
    xlab(bquote(italic(MSD[exp])~scores~(Experiment~2))) + ylab("Response Time (ms)") +
        ylim(600,1200) +
        coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_text(size=8, family = "Charis SIL"), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_text(size=10, family = "Charis SIL"), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
        # ggtitle(bquote(italic(MSD[exp])~fit))
   }

predictions_heb$MSD %>%
  left_join(data_RT_heb) %>%
    ungroup() %>%
  filter(!is.na(MSD)) %>%
    arrange(MSD) %>%
    mutate(MSD = factor(MSD, levels= unique(MSD)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = MSD, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_text_repel(data=distinct(.,corrRT,stimulus,MSD), aes(y = corrRT, label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", alpha = 1, size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) +
    geom_point(data=distinct(.,corrRT,stimulus,MSD), aes(y = corrRT), color = "red3", size =.8, alpha = .5) +
    xlab(bquote(italic(MSD[exp])~scores~(Experiment~3))) + ylab("RT") +
        ylim(600,1200) +
        coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_blank(), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_blank(), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
        # ggtitle(bquote(italic(MSD[exp])~fit))
   }

```

A quick glance at the four plots for Experiment 2, in the left panels of Figures \@ref(fig:SSPcolFit)--\@ref(fig:MSDexpFit), reveals a common failure of all the traditional sonority models to contain the nasal plateaus (/mn/ and /nm/) within their predicted distribution alongside all the other plateaus ($\mathit{0}$ model score in all figures). Furthermore, the data within the $\mathit{0}$ plateau levels appears to be broadly dispersed for the German-speaking subjects in Experiment 2 (plots on the left side) but quite well centered for the Hebrew-speaking subjects in Experiment 3 (right plots).

A comparison of the left-most violin in Figures \@ref(fig:SSPcolFit)--\@ref(fig:MSDexpFit) highlights some differences between the two sonority hierarchies *H~col~* (*SSP/MSD~col~*) and *H~exp~* (*SSP/MSD~exp~*).
The left-most violins reflect the onset fall levels of the SSP and MSD models.
For the Hebrew-speaking subjects in Experiment 3 (right panels), there was no clear difference between the two sonority hierarchies and a similar, broad distribution appears in all fits of sonority falls. In contrast, the reponse times of German-speaking subjects in Experiment 2 exhibit a bimodal distribution in the falling onsets of sonority models that use the *H~exp~* hierarchy (*SSP/MSD~exp~*), whereby *fricative-stop* clusters /ʃp, sp, ft/ are considered to be highly ill-formed onset falls.

This suggests that the *H~col~* hierarchy (where all obstruents are grouped into one class on the sonority hierarchy such that *fricative-stop* clusters are considered plateaus) is better than the *H~exp~* hierarchy in treating *fricative-stop* clusters.
This can be deduced from the better model fits for onset sonority falls and plateaus when the *H~col~* hierarchy is applied (*SSP/MSD~col~* vs. *SSP/MSD~exp~*). However, the difference between the two sonority hierarchies also plays a role in the grouping of onset rises when the MSD-based models are taken into account.

The violins in the right panel of each plot, reflecting well-formed onset rises with positive model scores, present three types of grouping across the four models. The two SSP models (*SSP~col/exp~*) make identical predictions with respect to onset rises, lumping all rises into one category ($\mathit{1}$ in Figures \@ref(fig:SSPcolFit)--\@ref(fig:SSPexpFit)). 
This, again, results in a broader distribution for the German-speaking subjects in Experiment 2 (left panels) compared to the Hebrew-speaking subjects in Experiment 3 (right panels).

The MSD models present multiple levels of well-formedness for onset rises. *MSD~col~* exhibits two levels of rises ($\mathit{1}$--$\mathit{2}$ in Figure \@ref(fig:MSDcolFit)) while *MSD~exp~* exhibits four levels of rises ($\mathit{1}$--$\mathit{4}$ in Figure \@ref(fig:MSDexpFit)).
This elaboration seems to be beneficial in fitting the scores of the German-speaking subjects to the 4 rise levels of *MSD~exp~*, but less so for *MSD~col~*. Furthermore, the additional levels of the MSD are highly redundant for fitting the scores of the Hebrew-speaking subjects in Experiment 3 (right plots).

To conclude, an observation of the model fits of the four traditional sonority models in the two confirmatory studies reveals a mixed picture. The *H~col~* hierarchy (in models *SSP/MSD~col~*) appears to result in a better fit with onset falls and plateaus, especially for the German-speaking subjects.
The competing *H~exp~* hierarchy appears to be advantageous when fitting the scores of rising onset slopes, but mostly with *MSD~exp~* and only for the German-speaking subjects in Experiment 2, where sonority falls exhibit an undesirable bimodal distribution.

#### NAP models {#sec:NAPtdModelfit}

Although *NAP~td~* is an ordinal model like all the traditional sonority models, it follows a different rationale (see Subsection \@ref(sec:naptdmodel)), whereby the scores of the model estimate nucleus competition to reflect well-formedness. 

(ref:NAPtdFit) *NAP~td~* model fit (plot details are the same as above).
```{r NAPtdFit,fig.cap = "(ref:NAPtdFit)", warning=FALSE, out.width=c("50%","50%"), fig.width=c(3.5,3.5), fig.asp=c(.5,.5), fig.show="hold", fig.align = "default", dev="cairo_pdf"}

predictions$NAP_td %>% 
  left_join(data_RT) %>%
    ungroup() %>%
    arrange(NAP_td) %>%
    mutate(NAP_td = factor(NAP_td, levels= unique(NAP_td)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
  { ggplot(., aes(x = NAP_td, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_text_repel(data=distinct(.,corrRT,stimulus,NAP_td), aes(y = corrRT, label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) +
    geom_point(data=distinct(.,corrRT,stimulus,NAP_td), aes(y = corrRT), color = "red3", alpha=.5, size =.8)+
  xlab(bquote(italic(NAP[td])~scores~(Experiment~2))) + ylab("Response Time (ms)") +
      ylim(600,1200) +
      coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_text(size=8, family = "Charis SIL"), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_text(size=10, family = "Charis SIL"), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
      # ggtitle(bquote(italic(NAP[td])~fit))
  }

predictions_heb$NAP_td %>% 
  left_join(data_RT_heb) %>%
    ungroup() %>%
    arrange(NAP_td) %>%
    mutate(NAP_td = factor(NAP_td, levels= unique(NAP_td)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
  { ggplot(., aes(x = NAP_td, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_text_repel(data=distinct(.,corrRT,stimulus,NAP_td), aes(y = corrRT, label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) +
    geom_point(data=distinct(.,corrRT,stimulus,NAP_td), aes(y = corrRT), color = "red3", alpha=.5, size =.8)+
  xlab(bquote(italic(NAP[td])~scores~(Experiment~3))) + ylab("RT") +
      ylim(600,1200) +
      coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_blank(), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_blank(), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
      # ggtitle(bquote(italic(NAP[td])~fit))
  }

```

Figure \@ref(fig:NAPtdFit) shows that *NAP~td~* succeeds in containing all the data (points) within the respective predictions (blue violins) in both experiments, making *NAP~td~* the only model to achieve such coverage.
*NAP~td~* appears to exhibit some redundancy, as suggested by the relatively large degrees of overlap between some of the predictive distributions of the model. This is apparent from the overlap between violins in the left side (worse-formed) of the model fit with Experiment 2 (left panel), and between violins in the right side (better-formed) of the model fit with Experiment 3 (right panel) in Figure \@ref(fig:NAPtdFit).

(ref:NAPbuFit) *NAP~bu~* model fit (plot details are the same as above).
```{r NAPbuFit,fig.cap = "(ref:NAPbuFit)", warning=FALSE, out.width=c("50%","50%"), fig.width=c(3.5,3.5), fig.asp=c(.5,.5), fig.show="hold", fig.align = "default", dev="cairo_pdf"}

predictions$NAP_bu %>% 
  mutate(NAP= round(NAP_bu,7)) %>%
    bind_rows(tibble(NAP = seq(-220,-30,1)))  %>%
    arrange(NAP) %>%
    ggplot(aes(x = factor(NAP), y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data= data_RT, aes(y = corrRT, x= factor(NAP)), color = "red3", alpha = .5, size =.8) +
    geom_text_repel(data= data_RT, aes(y = corrRT, x=factor(NAP),label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", alpha = 1, size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) +
    scale_x_discrete(bquote(italic(NAP[bu])~scores~(AA~set:~Experiment~2)), breaks=NULL) +
    ylab("Response Time (ms)") +
    ylim(600,1200) +
    coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_text(size=8, family = "Charis SIL"), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_text(size=10, family = "Charis SIL"), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
    # ggtitle(bquote(italic(NAP[bu])~fit))

predictions_heb$NAP_bu %>% 
  mutate(NAP= round(NAP_bu,7)) %>%
    bind_rows(tibble(NAP = seq(-220,-30,1)))  %>%
    arrange(NAP) %>%
    ggplot(aes(x = factor(NAP), y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data= data_RT_heb, aes(y = corrRT, x= factor(NAP)), color = "red3", alpha = .5, size =.8) +
    geom_text_repel(data= data_RT_heb, aes(y = corrRT, x=factor(NAP),label = {gsub("al", "", stimulus)} %>% {gsub("c", "ʃ", .)}), color = "red3", alpha = 1, size =3.5, family="Charis SIL", min.segment.length=0, segment.size = .4, segment.alpha = .6) +
    scale_x_discrete(bquote(italic(NAP[bu])~scores~(HN~set:~Experiment~3)), breaks=NULL) +
    ylab("RT (Experiment 3)") +
    ylim(600,1200) +
    coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.x = element_text(size=9, family = "Charis SIL"), 
              axis.text.y = element_blank(), 
              axis.title.x = element_text(size=11, family = "Charis SIL"), 
              axis.title.y = element_blank(), 
              axis.ticks.x = element_blank(), plot.title = element_blank())
    # ggtitle(bquote(italic(NAP[bu])~fit))

```

*NAP~bu~* is different from all the other models in that it presents scores that are specific to each token in a continuous ratio scale, rather than an ordinal scale (i.e., the distances between scores in the model are also predicted). 
Importantly, the expected correlation between response time and ill-formedness appears to hold for the model fits of *NAP~bu~* in Figure \@ref(fig:NAPbuFit).

Our criteria for goodness of fit based on the plot analyses (see Subsection 
\@ref(sec:traditionalModelfit)) are not all valid when evaluating *NAP~bu~* since we have no classes and no vertical dispersion of data (points) within levels, and since the horizontal overlap of predictions (violins) between levels requires a different interpretation. However, the criterion for inclusion of data points within the violins of the models' predictions naturally also holds for the *NAP~bu~* fit, which fails to include the data for the nasal plateaus /nm/ and /mn/ within the respective predictive distribution in Experiment 2 (a failure that is shared by all the traditional models in Experiment 2; see Subsection \@ref(sec:traditionalModelfit)). Furthermore, in Experiment 2 *NAP~bu~* also fails to include the /z/-initial clusters---/zm/ and /zv/---within their respective predictive distribution.

The failures in the fit of the *NAP~bu~* model with German-speaking subjects in Experiment 2 can be split into two types:
(i) nasal-initial clusters---*nval*, *nmal*, and *mnal*---which received results on a par with the slowest responses in the data, reflecting an overestimation of well-formedness by the model; and
(ii) syllables beginning with a voiced sibilant---*zval* and *zmal*---which received results that pattern with faster responses, reflecting an underestimation of well-formedness by the model

These results may be taken to suggest language-specific top-down effects of German. In German, sibilants are regularly unvoiced/devoiced at edges of clusters, while nasals, on the other hand, can be syllabic. In that sense, German-speaking listeners may be more prone to considering marginal sibilance as a voiceless nucleus repeller and nasality as a potential nucleus attractor.
Compare this with Hebrew (the native language of the subjects in Experiment 3), in which nasals cannot be syllabic and voiced sibilants are common in marginal cluster edges. 

###  Model comparison

While the model fits give us an insight into the behavior of each model with respect to the data, they are not well-suited to a comparison of different models against a consistent criterion. To do this, we ran out-of-sample predictions using cross-validation, thereby testing the ability of each model to predict unseen items.

#### Experiment 2

A bird's eye view of all the six model fits in Experiment 2 is available in Figure \@ref(fig:SonFitAll). The results of the model comparison from Experiment 2 are available in Table \@ref(tab:resultsmodels) and reveal a clear advantage of *NAP~td~* over all other models. 
The main metric in the table is the *elpd* score, which stands for *expected log-predictive density* (higher score indicating better predictive accuracy). The raw values are transformed to more informative values that measure the distance from the best score in terms of *Difference in elpd*. The size of this difference can be compared to the standard error of the difference, *Difference SE*.

The difference of *NAP~td~* from the next three models---*SSP~col~*, *MSD~col~* and *NAP~bu~*---is about 6 standard errors (considering that the difference is around 90 elpd and the corresponding standard error is around 15), reflecting a very robust lead for *NAP~td~*. 
The small differences between the next three models (*SSP~col~*, *MSD~col~* and *NAP~bu~*) make them all indistinguishable in second place.
The two traditional models that are based on the *H~exp~* hierarchy---*SSP/MSD~exp~*---are similar to each other in last place and only marginally better than the null model.

The right-most column in Table \@ref(tab:resultsmodels), *weight*, shows model averaging via stacking of predictive distributions. Stacking maximizes the potential elpd score by pulling the predictions of all the different models together. The values under the weight column represent the relative contribution of each model to this combined optimal model.
*NAP~td~* alone contributes the lion's share with 65% and *NAP~bu~* comes second with 14%. This is notable as both NAP models are essentially based on the same principle, lending support to the idea that the two models are essentially complementary.
The other traditional models contribute 8% (*SSP~col~*) and 3% (*MSD~col~*) to this picture, less than the 9% that the null model manages to contribute.

```{r resultsmodels, results = "asis"}

comparison <- loo::loo_compare(x=kfold_german)
ll_matrix <- map2_dfc(kfold_german, names(kfold_german), ~ data.frame( .x$pointwise) %>% set_names(.y)) %>% as.matrix()
weights <- loo::stacking_weights(ll_matrix)
names(weights) <- names(kfold_german)

comparison%>%
    {bind_cols(model = rownames(.), as_tibble(.))} %>%
    mutate(elpd_kfold = round(elpd_kfold) %>% as.character,
           elpd_diff = as.numeric(elpd_diff), se_diff = as.numeric(se_diff)) %>%
    
    select(model, elpd = elpd_kfold, `Difference in elpd`=elpd_diff ,`Difference SE` = se_diff)%>%
    left_join(weights %>% tibble::enframe() %>%
              setNames(c("model", "weight")) %>%
              arrange(-weight) %>%
              mutate(weight = round(weight,2) %>% {ifelse(.==0,"$\\approx$ 0", as.character(.))})) %>%
        mutate(model = recode(model, "NAP_bu" = "$NAP_{bu}$", "NAP_td"= "$NAP_{td}$", "SSP_obs" = "$SSP_{col}$", "MSD_obs"= "$MSD_{col}$" , "SSP"= "$SSP_{exp}$", "null" = "Null", "MSD"="$MSD_{exp}$")) %>%
 
     apa_table(escape = FALSE, caption ="(\\#tab:modelstacking) All models comparison: Experiment 2",note= "The table is ordered by the expected log-predictive density (elpd) score of the models, with a higher score indicating better predictive accuracy. The highest scored model is used as a baseline for the difference in elpd and the difference standard error (SE). The column weight represents the weights of the individual models that maximize the total elpd score of all the models.")

```

(ref:SonFitAll) Experiment 2: all sonority model fits (unspecified cluster types, see detailed versions above). Observed mean log-transformed response times of accurate responses are depicted with red points; distribution of simulated means based on the model are depicted with blue violins. Stimuli are ordered from left to right according to their score in a given model in ascending well-formedness. 
```{r SonFitAll,fig.cap = "(ref:SonFitAll)", warning=FALSE, out.width=c("50%","50%","50%","50%","50%","50%"), fig.width=c(3.5,3.5,3.5,3.5,3.2,3.2), fig.asp=c(.4,.4,.4,.4,.4,.4), fig.show="hold", fig.align = "default"}

predictions$SSP_obs %>%
  left_join(data_RT) %>%
    ungroup() %>%
  filter(!is.na(SSP_obs)) %>%
    arrange(SSP_obs) %>%
    mutate(SSP_obs = factor(SSP_obs, levels= unique(SSP_obs)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = SSP_obs, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data=distinct(.,corrRT,stimulus,SSP_obs), aes(y = corrRT), color = "red3", alpha = .5, size =.8) +
    xlab(bquote(italic(SSP[col]))) + ylab("Response Time (ms)") +
        ylim(600,1200) +
        coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.y = element_text(size=8, family = "Times"), 
              axis.text.x = element_blank(), 
              axis.title.y = element_text(size=9, family = "Times"), 
              axis.title.x = element_text(size=11, family = "Times"), 
              axis.ticks = element_blank(), plot.title = element_blank())
   }


predictions$MSD_obs %>%
  left_join(data_RT) %>%
    ungroup() %>%
  filter(!is.na(MSD_obs)) %>%
    arrange(MSD_obs) %>%
    mutate(MSD_obs = factor(MSD_obs, levels= unique(MSD_obs)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = MSD_obs, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data=distinct(.,corrRT,stimulus,MSD_obs), aes(y = corrRT), color = "red3", size =.8, alpha = .5) +
    xlab(bquote(italic(MSD[col]))) + ylab("Response Time (ms)") +
        ylim(600,1200) + 
        coord_trans(y="log") +
        theme(panel.background = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title.y = element_blank(), axis.title.x = element_text(size=11, family = "Times"), plot.title = element_blank()) 
     }

predictions$SSP %>%
  left_join(data_RT) %>%
    ungroup() %>%
  filter(!is.na(SSP)) %>%
    arrange(SSP) %>%
    mutate(SSP = factor(SSP, levels= unique(SSP)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = SSP, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data=distinct(.,corrRT,stimulus,SSP), aes(y = corrRT), color = "red3", size =.8, alpha = .5) +
    xlab(bquote(italic(SSP[exp]))) +
       ylab("Response Time (ms)") +
        ylim(600,1200) +
        coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.y = element_text(size=8, family = "Times"), 
              axis.text.x = element_blank(), 
              axis.title.y = element_text(size=9, family = "Times"), 
              axis.title.x = element_text(size=11, family = "Times"), 
              axis.ticks = element_blank(), plot.title = element_blank())
     }

predictions$MSD %>%
  left_join(data_RT) %>%
    ungroup() %>%
  filter(!is.na(MSD)) %>%
    arrange(MSD) %>%
    mutate(MSD = factor(MSD, levels= unique(MSD)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = MSD, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data=distinct(.,corrRT,stimulus,MSD), aes(y = corrRT), color = "red3", size =.8, alpha = .5) +
    xlab(bquote(italic(MSD[exp]))) + ylab("Response Time (ms)") +
        ylim(600,1200) + 
        coord_trans(y="log") +
        theme(panel.background = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title.y = element_blank(), axis.title.x = element_text(size=11, family = "Times"), plot.title = element_blank())
     }

predictions$NAP_td %>% 
  left_join(data_RT) %>%
    ungroup() %>%
    arrange(NAP_td) %>%
    mutate(NAP_td = factor(NAP_td, levels= unique(NAP_td)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
  { ggplot(., aes(x = NAP_td, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data=distinct(.,corrRT,stimulus,NAP_td), aes(y = corrRT), color = "red3", alpha = .5, size =.8)+
  xlab(bquote(italic(NAP[td]))) + ylab("Response Time (ms)") +
      ylim(600,1200) + 
      coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.y = element_text(size=8, family = "Times"), 
              axis.text.x = element_blank(), 
              axis.title.y = element_text(size=9, family = "Times"), 
              axis.title.x = element_text(size=11, family = "Times"), 
              axis.ticks = element_blank(), plot.title = element_blank())
    }

predictions$NAP_bu %>% 
  mutate(NAP= round(NAP_bu,7)) %>%
    bind_rows(tibble(NAP = seq(-220,-30,1)))  %>%
    arrange(NAP) %>%
    ggplot(aes(x = factor(NAP), y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data= data_RT, aes(y = corrRT, x= factor(NAP)), color = "red3", alpha = .7, size =.8) +
    scale_x_discrete(bquote(italic(NAP[bu])), breaks=NULL) +
    ylab("Response Time (ms)") +
    ylim(600,1200) + 
    coord_trans(y="log") +
    theme(panel.background = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title.y = element_blank(), axis.title.x = element_text(size=11, family = "Times"), plot.title = element_blank()) 

```

#### Experiment 3

A bird's eye view of all the six model fits in Experiment 3 is available in Figure \@ref(fig:SonFitAllHeb). 
The results of the model comparison from Experiment 3 (see Table \@ref(tab:resultsmodelsHeb)) reveal a borderline advantage of *NAP~td~* over other models.
The difference in elpd scores from the next two models---*MSD~col~* and *NAP~bu~*---is only about 2 standard errors (considering the difference at around 10 elpd and the corresponding standard error at around 5).

*SSP~col~* is more clearly distinguishable from *NAP~td~*, with a difference that is almost 3 standard errors (about 20:7). *MSD~col~* and *NAP~bu~* are barely distinguishable from *SSP~col~* and *NAP~td~*. The two traditional models that are based on the *H~exp~* hierarchy---*SSP/MSD~exp~*---are, again, very clearly the worst in the comparison.

The weight values of Experiment 3 in Table \@ref(tab:resultsmodelsHeb) show that, again, *NAP~td~* alone provides the biggest relative contribution to a combined optimal model, with 61%. *MSD~col~* covers almost the entire remaining space with 37%, leaving *NAP~bu~* and all the other traditional models with zero additional contribution. 


```{r resultsmodelsHeb, results = "asis"}

comparison <- loo::loo_compare(x=kfold_hebrew)
ll_matrix <- map2_dfc(kfold_hebrew, names(kfold_hebrew), ~ data.frame( .x$pointwise) %>% set_names(.y)) %>% as.matrix()
weights <- loo::stacking_weights(ll_matrix)
names(weights) <- names(kfold_hebrew)

comparison%>%
    {bind_cols(model = rownames(.), as_tibble(.))} %>%
    mutate(elpd_kfold = round(elpd_kfold) %>% as.character,
           elpd_diff = as.numeric(elpd_diff), se_diff = as.numeric(se_diff)) %>%
    
    select(model, elpd = elpd_kfold, `Difference in elpd`=elpd_diff ,`Difference SE` = se_diff)%>%
    left_join(weights %>% tibble::enframe() %>%
              setNames(c("model", "weight")) %>%
              arrange(-weight) %>%
              mutate(weight = round(weight,2) %>% {ifelse(.==0,"$\\approx$ 0", as.character(.))})) %>%
        mutate(model = recode(model, "NAP_bu" = "$NAP_{bu}$", "NAP_td"= "$NAP_{td}$", "SSP_obs" = "$SSP_{col}$", "MSD_obs"= "$MSD_{col}$" , "SSP"= "$SSP_{exp}$", "null" = "Null", "MSD"="$MSD_{exp}$")) %>%
 
     apa_table(escape = FALSE, caption ="(\\#tab:modelstacking) All models comparison: Experiment 3",note= "(details are the same as the previous table).")


```

(ref:SonFitAllHeb) Experiment 3: all sonority model fits (plot details are the same as Figure \@ref(fig:SonFitAll)).
```{r SonFitAllHeb,fig.cap = "(ref:SonFitAllHeb)", warning=FALSE, out.width=c("50%","50%","50%","50%","50%","50%"), fig.width=c(3.5,3.5,3.5,3.5,3.2,3.2), fig.asp=c(.4,.4,.4,.4,.4,.4), fig.show="hold", fig.align = "default"}

predictions_heb$SSP_obs %>%
  left_join(data_RT_heb) %>%
    ungroup() %>%
  filter(!is.na(SSP_obs)) %>%
    arrange(SSP_obs) %>%
    mutate(SSP_obs = factor(SSP_obs, levels= unique(SSP_obs)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = SSP_obs, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data=distinct(.,corrRT,stimulus,SSP_obs), aes(y = corrRT), color = "red3", alpha=.5, size =.8) +
    xlab(bquote(italic(SSP[col]))) + ylab("Response Time (ms)") +
        ylim(600,1000) +
        coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.y = element_text(size=8, family = "Times"), 
              axis.text.x = element_blank(), 
              axis.title.y = element_text(size=9, family = "Times"), 
              axis.title.x = element_text(size=11, family = "Times"), 
              axis.ticks = element_blank(), plot.title = element_blank())
   }


predictions_heb$MSD_obs %>%
  left_join(data_RT_heb) %>%
    ungroup() %>%
  filter(!is.na(MSD_obs)) %>%
    arrange(MSD_obs) %>%
    mutate(MSD_obs = factor(MSD_obs, levels= unique(MSD_obs)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = MSD_obs, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data=distinct(.,corrRT,stimulus,MSD_obs), aes(y = corrRT), color = "red3", size =.8, alpha=.5) +
    xlab(bquote(italic(MSD[col]))) + ylab("Response Time (ms)") +
        ylim(600,1000) + 
        coord_trans(y="log") +
        theme(panel.background = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title.y = element_blank(), axis.title.x = element_text(size=11, family = "Times"), plot.title = element_blank()) 
     }

predictions_heb$SSP %>%
  left_join(data_RT_heb) %>%
    ungroup() %>%
  filter(!is.na(SSP)) %>%
    arrange(SSP) %>%
    mutate(SSP = factor(SSP, levels= unique(SSP)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = SSP, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data=distinct(.,corrRT,stimulus,SSP), aes(y = corrRT), color = "red3", size =.8, alpha=.5) +
    xlab(bquote(italic(SSP[exp]))) +
       ylab("Response Time (ms)") +
        ylim(600,1000) +
        coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.y = element_text(size=8, family = "Times"), 
              axis.text.x = element_blank(), 
              axis.title.y = element_text(size=9, family = "Times"), 
              axis.title.x = element_text(size=11, family = "Times"), 
              axis.ticks = element_blank(), plot.title = element_blank())
     }

predictions_heb$MSD %>%
  left_join(data_RT_heb) %>%
    ungroup() %>%
  filter(!is.na(MSD)) %>%
    arrange(MSD) %>%
    mutate(MSD = factor(MSD, levels= unique(MSD)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
   { ggplot(., aes(x = MSD, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data=distinct(.,corrRT,stimulus,MSD), aes(y = corrRT), color = "red3", size =.8, alpha=.5) +
    xlab(bquote(italic(MSD[exp]))) + ylab("Response Time (ms)") +
        ylim(600,1000) + 
        coord_trans(y="log") +
        theme(panel.background = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title.y = element_blank(), axis.title.x = element_text(size=11, family = "Times"), plot.title = element_blank())
     }

predictions_heb$NAP_td %>% 
  left_join(data_RT_heb) %>%
    ungroup() %>%
    arrange(NAP_td) %>%
    mutate(NAP_td = factor(NAP_td, levels= unique(NAP_td)),
           stimulus = factor(stimulus, levels = unique(stimulus))) %>%
  { ggplot(., aes(x = NAP_td, y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data=distinct(.,corrRT,stimulus,NAP_td), aes(y = corrRT), color = "red3", alpha=.5, size =.8)+
  xlab(bquote(italic(NAP[td]))) + ylab("Response Time (ms)") +
      ylim(600,1000) + 
      coord_trans(y="log") +
        theme(panel.background = element_blank(), 
              axis.text.y = element_text(size=8, family = "Times"), 
              axis.text.x = element_blank(), 
              axis.title.y = element_text(size=9, family = "Times"), 
              axis.title.x = element_text(size=11, family = "Times"), 
              axis.ticks = element_blank(), plot.title = element_blank())
    }

predictions_heb$NAP_bu %>% 
  mutate(NAP= round(NAP_bu,7)) %>%
    bind_rows(tibble(NAP = seq(-220,-30,1)))  %>%
    arrange(NAP) %>%
    ggplot(aes(x = factor(NAP), y= pred) ) +
    geom_violin(draw_quantiles = TRUE, alpha=1, colour = "cornflowerblue", fill = "azure2") +
    geom_point(data= data_RT_heb, aes(y = corrRT, x= factor(NAP)), color = "red3", alpha = .7, size =.8) +
    scale_x_discrete(bquote(italic(NAP[bu])), breaks=NULL) +
    ylab("Response Time (ms)") +
    ylim(600,1000) + 
    coord_trans(y="log") +
    theme(panel.background = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title.y = element_blank(), axis.title.x = element_text(size=11, family = "Times"), plot.title = element_blank()) 

```

### Summary of results  {#sec:discussionResults} 

The results of the confirmatory studies, Experiments 2--3, can be summarized as follows: 
(i) all of the sonority models we tested are capable of explaining the response time data for different consonant clusters to a reasonable extent; 
(ii) the symbolic top-down NAP model, *NAP~td~*, outperforms all the the other models; 
(iii) the dynamic bottom-up NAP model, *NAP~bu~* performed relatively well considering that it is based on a unique architecture that does not require  prior segmentation of the acoustic signal;
and (iv) some interesting differences between the *H~col~* and *H~exp~* sonority hierarchies were observed and the advantages of the minimal *H~col~* sonority hierarchy proved to be more effective.

Experiment 3 exhibits most of the general trends found in Experiment 2, albeit in a less compelling way. One path of explanation for these discrepancies can be found in the differences between the native languages of the subjects. We expect language-specific differences to account for some of the differences between the experiments, as was mentioned in Subsection \@ref(sec:designs). Specifically, the difference between nasals in Hebrew and German was suggested as explanatory in Subsection \@ref(sec:NAPtdModelfit).

On top of that, we suspect that differences between the experiments were also due to the various sources of noise that were introduced in the process. These include the smaller group of participants and the diverse physical locations in which Experiment 3 was administered (see Subsection \@ref(sec:participants3)). 
The results may be taken to support this with a larger standard deviation for the by-subject adjustments to the intercept for the models of Experiment 3 in comparison with Experiment 2. 
A larger standard deviation of the group-level effect indicates larger differences between participants in their response times.
For example, see $\hat\sigma_\alpha = 0.32~[0.25, 0.41]$ in Experiment 3 vs. $\hat\sigma_\alpha = 0.21~[0.17, 0.26]$ in Experiment 2, when comparing the null models (see the full models in Appendix \@ref(appendix:b)).

The success of our NAP models relative to the traditional models in predicting the data can be mainly attributed to the following traits of NAP:
(i) all the voiceless-initial onset clusters, including onset falls and plateaus (e.g., /sp/ and /sf/), are relatively well-formed in NAP, correctly predicting the patterning together of such data with faster response times (at the low-right parts of the plots);
(ii) onset rises (like /ml/), nasal plateaus (/nm/ and /mn/), and onset falls (like /lm/) pattern together as similar and relatively ill-formed in NAP, correctly predicting the data, as sonorant-initial plateaus and rises do not tend to pattern with (better-formed) obstruent-initial plateaus and rises.

A superficial formal generalization that can illustrate these results in symbolic terms may be that the sonority intercept of onset clusters appears to be (at least) as impactful as the sonority slope in determining syllabic well-formedness (i.e., the starting level of the onset cluster is at least as predictive of well-formedness as the angle of the cluster's slope).

#		General Discussion {#sec:genDiscussion}

Our experimental results provide strong support for our synergy of proposals, including our choice of sonority's perceptual basis and acoustic correlate, the incorporation of continuous entities in models of phonology, and the dual-route modeling strategy that accounts for both top-down and bottom-up inferences with separate compatible machinery. The following subsections discuss important implications that are borne out of our interpretation of the results: 
(i) in  Subsection \@ref(sec:division) we discuss a potential phonotactic division of labor, demonstrated with a more holistic account of /s/-stop clusters;
(ii) in Subsection \@ref(sec:lingMod) we discusses the current contribution in the context of the tension between discrete and continous in models of phonology; and
(iii) in Subsection \@ref(sec:projection) we discuss the universality of sonority, with special attention to the body of works by Iris Berent and her colleagues.

##		Phonotactic Division of Labor {#sec:division} 

Sonority has been widely used to explain practically any type of phonotactic phenomenon since there is nothing in the standard theory that commits the formal conception of sonority to any specific effect in perception or articulation of speech. The position taken in this work is very different, drawing explicit links between sonority and auditory perception of pitch. As a result, sonority in this work is a narrower concept with limits that can be more carefully considered. This is important since it is very unlikely that all the different phonotactic phenomena stem from a single driving force. A more well-defined notion of sonority therefore allows us to achieve a better understanding of the phonotactic division of labor between different forces that play a role in the processing of speech.
As we shall see below, */s/-stop* clusters make a good case in point. 

###   /s/-stop clusters in traditional accounts

One rather well-known and well-studied consistent problem in the empirical coverage of all traditional sonority principles concerns sequences that are often termed */s/-stop clusters*, referring to cases where a sibilant fricative---most often /s/---precedes a stop consonant, like in the English words ***st**op*, ***sk**y* and ***sp**ort* [see, e.g., @fudge1969syllables; @yavacs2008sonority; @vaux2009append; @goad2011representation; @goad2016sonority]. 
The sonority slope of */s/-stop* clusters is either an onset fall or an onset plateau, depending on the given sonority hierarchy. When fricatives and stops are assigned to two distinct classes on a given sonority hierarchy, */s/-stop* clusters exhibit onset sonority fall. If, on the other hand, stops and fricatives are grouped together, */s/-stop* clusters are considered plateaus.
Thus, although */s/-stop* clusters are relatively common in languages that tolerate sequences and should therefore be considered as relatively well-formed [@morelli2003relative; @steriade1999alternativessk], such clusters are predicted to be rare, or even extremely rare, due to their ill-formed sonority slopes.

Rather than redefining sonority principles to be able to account for the phenomenon of */s/-stop* clusters, more successful attempts to solve this problem in the phonological literature redefined deviant marginal sibilants as exceptional, keeping the traditional sonority principles unaffected by their consistent failure to predict the attested relative well-formedness of */s/-stop* clusters.
The main type of exception that is used to explain sibilant-initial clusters is based on tweaking symbolic representations by removing the symbol of the marginal sibilant segment outside of the syllable that contains the following consonant such that---in theory---there is no tautosyllabic complex cluster to trigger sonority restrictions in the first place [see, e.g., @steriade1982greek; @kaye1992you; @rialland1994phonologysk; @vaux2009append]. 
A slightly different theoretical solution with similar results is to assert that */s/-stop* clusters are, in fact, a single complex segment [see, e.g., @fudge1969syllables; @weijer1996segmental] such that, again, there simply is no cluster to account for in theoretical accounts [for an overview, see @goad2016sonority].

Those theoretical tweaks are not without merit as they follow a strong intuition that marginal sibilants are somehow "outside" the scope of syllabic processes. This intuition is supported by evidence of some unique behaviors of marginal sibilants, such as the articulatory kinematic data presented in @hermes2013phonologysk, which finds unique coordination patterns in the articulation of sibilant-initial consonantal gestures in Italian (suggesting that the sibilant is outside the syllable). 
These findings are however not inconsistent with NAP-based accounts. Periodic energy has prosodic roles to play in carrying the pitch and the overall prosodic strength at the nucleus of the syllable. 
Marginal voiceless elements can therefore be timed with different considerations in NAP-based accounts. For example, it may be beneficial to prolong duration of marginal voiceless elements to increase their recoverability without the risk of increased nucleus competition. 
This would, indeed, result in some unique timing patterns of marginal sibilants in complex onsets while still fitting comfortably with the rationale of NAP.

Eventually, it is important to remember that the problem with */s/-stop* clusters is not that they are common and, at the same time, unique. The problem with */s/-stop* clusters is that traditional sonority principles fail to account for them in consistent manners, without resorting to exceptions. 
NAP-based accounts present an advantage because they do not need to carve out exceptions in order to theoretically remove sibilants from syllables that are not predicted by the model. Under NAP, those sibilants can remain in the structure as members of a well-formed syllable.

###   Towards a holistic account of /s/-stop clusters

NAP's account of the well-formedness of */s/-stop* clusters does not suffice to explain this phonotactic phenomenon since there is nothing in NAP specific to sibilants or stops that would justify assigning a special status to the particular obstruent combination of a sibilant followed by a stop.
In fact, any voiceless element is practically invisible to NAP as it is only sensitive to portions of the speech signal that contain sufficient periodic energy. Indeed, the predictions of NAP, which were largely corroborated by experimental results (in Section \@ref(sec:results)), expect non-sibilant counterparts of /s/, like /f/ in the cluster *ftV*, to pattern with *spV* and *ʃpV*. Furthermore, *NAP~bu~* successfully predicted that all the voiceless-initial clusters in the experiment---including the */s/-stop* clusters---generally pattern together as well-formed, as far as sonority-based restrictions are concerned.
This may suffice to explain why */s/-stop* clusters are tolerated, but not why they are so often preferred over other obstruent combinations. The complete phonotactic story of */s/-stop* clusters thus requires an integrative explanation, in which sonority only plays a limited role. 

First, there are various reasons to assume that *fricative-stop* clusters are better-formed than *stop-stop* clusters. This generalization is traditionally captured in abstract formal phonological constraints like the *Obligatory Contour Principle* [OCP; going back to @leben1973suprasegmental; and @goldsmith1976autosegmental], which acts as a general dissimilatory requirement banning two successive units of the same type. The OCP in this case may be the reflection of an articulatory disadvantage of the *stop-stop* configuration since it should be harder to coordinate two successive closure and release gestures within the span of a complex onset due to reasons of aerodynamics.

Note that this also leads to a disadvantage of *stop-stop* from a perceptual point of view, since the first stop in a *stop-stop* configuration is released into the closure phase of the following stop [see @surprenant1998perception]. The release of a stop burst into a silent closure phase of another stop, rather than the periodic signal of a vowel, means that many of the acoustic cues to the identity of the first stop consonant are severely attenuated [see @fujimura1978perception]. 

This explanation is essentially based on the concept of perceptual *cue robustness* [@wright2004review], which is less relevant to syllabic organization, but rather based on adjacency between speech sounds and their chances of being recovered given transitions between them. As @kawasaki1997alternativessk [361] concluded, "the degree of salience of modulations created by segmental transitions", rather than sonority and syllabicity, is the determinant factor of many phonotactic constraints.

Wright's [-@wright2004review] *cue robustness* is also critical for the remaining explanation regarding the phonotactic advantage of */s/-stop* clusters over comparable non-sibilant *fricative-stop* clusters, e.g., *spV* vs. *ftV*.
Here, the notion of cue robustness serves to explain why sibilants, with their salient and distinctive high frequency aperiodic energy, stand out more than other fricatives, thus allowing more effective recoverability from relatively weak marginal positions (i.e., distant from the vocalic nucleus). 

The three phonotactic perspectives are complementary, and although they do not represent an exhaustive list of phonotactic pressures, we need at least these three---sonority, articulatory dissimilation, and cue robustness---in order to properly appreciate the phonotactic phenomenon of */s/-stop* clusters. According to this more holistic account, */s/-stop* clusters are relatively well-formed in terms of sonority because they are not competing for the nucleus, they are well-formed in terms of articulatory coordination complexity due to the two dissimilar successive gestures and, finally, they are robust in terms of their acoustic cues: 
stops in C~2~ can be released into a vowel to optimize the effect of the burst in the release phase, while sibilants retain strong cues to their identity thanks to their unique spectral profile.

##     Reshuffling Dichotomies in Linguistic Models {#sec:lingMod}

The study of the sound system of human languages has been one of the longest-standing intersections of symbol-based categorical analyses on the one hand, and signal-based continuous descriptions on the other. These two different types of analysis stand at the core of the distinction many researchers make between phonetics and phonology, where the former addresses continuous and measurable aspects of the speech signal (namely, sensorimotor aspects of articulation, acoustic signals and neurological patterns in perception), while the latter addresses categorical aspects of the speech signal using discrete and symbolic units like consonants, vowels and phonemes [see overviews in @harris2007representation; @ladd2011phonetics].

The incompatibility between the continuous and the discrete types of description did not escape early studies. @menzerath1933koartikulationsk; @wickelgren1969context and @fowler1980coarticulation noted how the reality of co-articulation of segments defies the idealized conception of speech signals as consisting of a sequence of non-overlapping discrete phonemes [see @halle1954strategy]. @warren1982auditory [172-187] also provides an overview of this problem given the limitations of auditory perception.

### Connectionism {#sec:connectionism}

The connectionist program in cognitive psychology [e.g., @rumelhart1986parallel1; @rumelhart1986parallel2; @bates1993counectionism] was set to change that classic view with the introduction of connectionist models to phonology [e.g., @goldsmith1992local; @laks1995connectionistsk; @joanisse2000connectionist; @smolensky2006harmonic; @tupper2012sonoritysk]. These models replaced classic symbolic models with *neuromimetic* models [@laks1995connectionistsk 52] that attempt to improve the cognitive plausibility of language models with architectures that resemble neurobiological systems.

It should be noted, however, that the main focus of connectionist models is not so much on the symbols in the system as on the classic processes of symbol manipulation. Connectionist models present alternatives to the notion of rules of symbol manipulation that directly transform symbols [see @harnad1990symbol]. 
In the context of the present work, connectionist models can be effective in modeling the phonology of speech perception in a top-down model, which represents processes that start and end with discrete symbols. However, we assume here that there is a functional source for linguistic distinctions in perception, which has to be accounted for via the bottom-up route, originating from continuous events in real time.

###    Dynamical systems {#sec:synthesis}

The introduction of dynamical systems to phonology [see @haken1985theoretical; @browman1992articulatorysk; @goldstein2009coupled; @nam2009self] made it possible to model the interactions between continuous aspects of the speech signal itself (e.g., motor gestures in articulation) and the discrete symbolic categories that linguists assume. One way to achieve this is via attractor landscape models, where discrete categorical units can be modeled as attractor basins [see @haken1990synergeticssk]. In this type of models, various continuous events can contribute more or less to the (partial) activation of different, often competing, attractors.
Good examples for application of attractor landscapes in such manners can be found in @tuller1994nonlinear; @case1995evaluation; @raczaszek1999categorization; @gafosbenus2006dynamics; @roessig2019modeling; and @roessig2019dynamics.

Attractor landscape models are therefore an essential component of models concerned with the interaction between continuous and discrete entities in a language system.
However, much like the connectionist models in phonology, attractor landscapes explicate the process by which a discrete alternative can be (partially) activated, but they have little to say about the components of the system otherwise.
For example, attractor models cannot explain or predict the shape and behavior of the attractor landscape itself (e.g., universal and idiosyncratic language categories), they cannot address the limitations on dynamic events that the system can reliably detect (e.g., selecting the relevant effects in auditory perception), and they pose no restrictions on what a valid symbol is in a natural language system [the "symbol grounding problem"; @harnad1990symbol]. 

The phonological theory saw dynamical systems playing an increasingly more important role with the growing body of work coming from the school of *Articulatory Phonology*.
Applying similar concepts to perception has been thus far a less productive avenue in phonology [although see @tuller1994nonlinear; @case1995evaluation; @hock2003dynamical; @tuller2004categorization; @tuller2008dynamical; and @lancia2013interaction], perhaps because it is much less clear what are the relevant continuous entities that we need to model in perception.

###   The complementarity of mind {#sec:compofmind}

The relationship in this work between the discrete and symbolic modes of language on the one hand, and the continuous and dynamic modes on the other hand, is consistent with the works of physicist and theoretical biologist, Howard Pattee, whose insights have been previously extended to include cognition and human language [see @cariani2001symbols; @raczaszek2008reconciling; and see @pattee2012lawssk for a collection of Pattee's classic papers with contemporary commentary].
In Pattee's original studies, the interaction between symbolic and dynamic modes was related to symbolic aspects of the DNA and their relationship with the dynamics of proteins. For Pattee, the symbolic and dynamic modes of biological systems are two critical components with specific roles to play: symbols are the stable forms that *harness* dynamic events. Symbols, according to @pattee1987instabilities [337], cannot exist outside of a dynamic system that they *constrain*.
It is therefore pertinent to understand symbols in language systems with respect to the continuous events that they relate to. In our interpretation, this means that to fully understand sonority we need to address its potential functions in auditory perception and cognition, and their effects in linguistic communication. Although tightly related to top-down symbol-based generalizations, this bottom-up route is a separate process that is based on different driving forces (e.g., laws of physics rather than statistics).

##		The Universality of Sonority {#sec:projection}

A consistent interest within theoretical phonology concerns the universality of sonority-based principles. An impressive volume of publications devoted to this question can be found in the works of Iris Berent and her colleagues, starting with @berent2007we, and followed by many subsequent studies [e.g., @berent2008language; @berent2012language; @berent2014languagesk; @gomez2014language; @berent2017origins].
Berent et al. collected mostly behavioral data from perception tasks, where subjects of various different language backgrounds were found to adhere to the SSP, even when presented with combinations that are not attested in their language. The patterns under Berent's consistent scrutiny are usually limited to a set of initial clusters with an onset rise (e.g., *blif*), an onset plateau (e.g., *bdif*) and an onset fall (e.g., *lbif*). Since /s/-clusters and sonorant plateaus are absent from these studies, Berent's experimental results with SSP-based models are largely compatible with NAP, as the hierarchy *blif* ($\mathit{3}$) $>$ *bdif* ($\mathit{2}$) $>$ *lbif* ($\mathit{0}$) is maintained in the top-down version of NAP (*NAP~td~* model scores in brackets).[^cf-napbucant] 

[^cf-napbucant]: The bottom-up version of NAP (*NAP~bu~*) cannot make such determinations based on symbolic representations, but it should be expected to generally follow the same trends in the vast majority of cases.

Berent and her colleagues interpret these findings as supporting the innateness hypothesis, assuming that all humans share a universal linguistic knowledge, which is genetically encoded (the *Universal Grammar* in generative traditions).
The universality of sonority principles thus implies innate knowledge of ordinal sonority hierarchies that map onto a discrete representation of the speech signal, with mechanisms that compute the sonority slopes within syllables to determine well-formedness.

The interpretation of Berent's findings has been a matter of interest in the literature. Some responses, like @daland2011explaining and @hayes2011interpreting, have argued that the universal phonotactic behaviors that Berent et al. present can be shown to result from speakers' ability to generalize categories and distributions from the attested lexicon, and use analogy and probabilities to predict unattested forms. 
Such models can successfully apply statistical learning methods based on the lexicon, without a requirement for prior formal knowledge of sonority [e.g., @jurafsky2009speech; @coleman1997stochastic; @vitevitch2004webbasedsk; @hayes2008maximum; @hayes2011interpreting; @bailey2001determinants; @albright2009feature; @futrell2017generative; @jarosz2017inputsk; @mayer2019phonotacticsk; and @mirea2019usingsk]. 

While it is relatively clear that statistical learners reflect top-down inferences, it is perhaps less obvious that connectionist models, such as @goldsmith1992local, @laks1995connectionistsk @smolensky2014optimization and @tupper2012sonoritysk, also seem to be quite compatible with what is considered here as top-down phonology.
Connectionist models can be historically related to an opposition to the classic symbol-based models (see Subsection \@ref(sec:connectionism)). However, the inputs and outputs of these models are expressed in discrete symbols and they are designed to capture generalizations in terms of the weights of connections in the system, which may serve as a good mechanistic description of top-down operations. 

In contrast to traditional sonority principles, NAP was designed to be compatible with general cognitive processes and auditory perception, such that no unique assumptions are required for postulating an innate formal knowledge of sonority. 
Sonority-based patterns in NAP arise from the general cognitive process that underlies the parsing of the speech stream into syllables with a pitch-bearing nucleus (i.e., nucleus competition). This requirement for pitch-bearing units may be explained in evolutionary timescales as the inevitable result of the important role of pitch in speech communication [@pike1945intonationsk; @bolinger1978intonation; @house1990tonal; @cutler1997prosody] and the observation that tune-text integration occurs with syllable-sized units [e.g., @goldsmith1976autosegmental; @liberman1975intonationalsk; @pierrehumbert1980phoneticssk; @ladd2008intonational; @roettger2019tune]. 

The NAP approach appears capable of synthesizing the different views on the origins of universal sonority. The bottom-up model of NAP can explain the universality of sonority as the natural development of communication systems that exploit pitch perception as they shape language systems. The top-down model of NAP is, at the same time, very much in line with statistical phonotactic learners, in which the regularities of language can be deduced from the symbolic abstractions that reflect the speakers' knowledge in stable forms. 

To conclude, bottom-up NAP combines the innateness claims for formal sonority universals with a more general explanation that is based on the workings of the perceptual and cognitive systems and the evolution of languages as pitch-bearing communication systems. At the same time, top-down NAP is in line with the rationale of statistical learners and the mechanics of connectionist models. These explanations require symbolic interpretation of the signal that abstract from variable dynamic events into stable forms (e.g., consonants, vowels, phonological features) in order to learn and generalize over their distributions.

# Conclusion  {#sec:conclusions}

This work suggests a synergy of novel theoretical and methodological approaches in an attempt to shed new light on old problems in linguistics. 
The novelties that are proposed here for models of sonority, and more generally, for perception-based models in phonology, will need to amass more supporting evidence from multiple sources in order to be more widely considered, and further developed. 
We hope that in this paper we laid the foundations for such potential long-term contributions. 

We demonstrated how our set of proposals results in a model of sonority that can account for some of the most persistent problems in phonological theory.
Our NAP-based models not only present advantages over traditional sonority models in terms of empirical coverage, they also provide functional explanations as to the source and cause of sonority phenomena, linking sonority to pitch intelligibility at the level of the syllable. 

Furthermore, our dual-route modeling strategy (see Subsection \@ref(sec:complementary)) 
makes the important distinction between (bottom-up) signal-based inferences and (top-down) symbol-based inferences, thus appropriately predicting the complementary contribution of these two essentially different inference routes of the same linguistic phenomena.

The more modest success of the continuous bottom-up model of NAP (*NAP~bu~*) compared to its symbolic top-down derivative (*NAP~td~*), is, in our view, impressive. The bottom-up NAP model, as it was implemented here, deriving well-formedness scores directly from the continuous acoustic signal, is already as good as the best traditional models, and it still has many potential paths for improvement. 
For example, the bottom-up NAP model can directly benefit from independent improvements in extraction of periodic energy in digital signal processing, or from improvements in the conception of competition, and the formulas used to estimate or simulate it. We therefore interpret the results of the bottom-up model as very promising.

Lastly, this study also provides compelling motivations and strong evidence that support our proposal to link the notion of sonority with periodic energy in the acoustic signal. This proposal entails that the role of periodic energy is of great importance to linguistic analysis of speech, beyond the scope of the current proposal. A partial list of relevant topics includes the following:
(i) acoustic descriptions of prosodic weight and prosodic *prominence* in terms of periodic energy mass;
(ii) automatic syllabification procedures based on the smoothed fluctuation rates of the periodic energy curve [useful for a myriad of tasks, including speech rhythmicity studies; see, e.g., @galves2002sonoritysk; @tilsen2013speech; @rasanen2018pre; @lin2020hit]; and
(iii) improved visualization and analysis of pitch contours in intonation research based on the interaction of the periodic energy and the F0 trajectories 
[see @albert2018using; @cangemi2019modellingsk; and @albert2020propersk for continuous visualization and quantification procedures in intonational phonology using periodic energy].

<!-- --- -->

# CRediT authorship contribution statement {-}
**AA**: Conceptualization, Data Curation, Investigation, Methodology, Visualization, Writing - Original Draft, Writing - Review & Editing. **BN**: Conceptualization, Formal Analysis, Methodology, Visualization, Writing - Review & Editing.

# Acknowledgements {-}
This work was supported by the DAAD (German Academic Exchange Service) and SFB 1252 Collaborative Research Center “Prominence in Language" at the University of Cologne. We thank Carol Espy-Wilson and her colleagues for sharing the APP Detector code with us, and we thank Yair Lakretz and Doron Veltzer for their related technical support. We are also very grateful to Martine Grice, Doris Mücke, Timo B. Roettger, Francesco Cangemi, Simon Wehrle, Evan Gary Cohen, Joanna Rączaszek-Leonardi, Mauricio Milchberg, and two anonymous reviewers for their many insightful comments on earlier drafts of this work.

# Conflict of Interest {-}
The authors declare no conflict of interest.

\newpage

# References
```{r create_r-references}
r_refs(file = "bibs/r-references.bib")
```

\begingroup
\setlength{\parindent}{-0.5in}
\setlength{\leftskip}{0.5in}

<div id = "refs"></div>
\endgroup

```{r echo = FALSE, results = 'asis', cache = FALSE}
papaja::render_appendix('./appendix_a.Rmd') 
```

```{r echo = FALSE, results = 'asis', cache = FALSE}
papaja::render_appendix('./appendix_b.Rmd')
```