Correlation between word segmentation on child directed speech and reported infants' word understanding in several languages

bootphon/XLingCorrelation

XLingCorrelation

TODO

  • tests
  • documentation
  • group all preprocessing files into one folder:
    • all_cha
    • select, clean -> ortholines.txt
    • phono -> phono.txt
    • syllabify -> syllabified.txt
    • auto-tags -> auto-tags.txt
    • build grammars

Goal

Provide an easy-to-use package for computing the correlation between the output of word segmentation algorithms and CDI (Communicative Development Inventory) reports. (TODO: describe the project.)
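As a rough illustration of the goal, the sketch below correlates how often each word appears in a segmentation output with the proportion of infants reported to understand it. It is not the package's API: the function names, the choice of Pearson correlation on raw counts, and the toy values are all assumptions made for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def segmentation_cdi_correlation(segmented_counts, cdi_proportions):
    """Correlate segmented-word counts with CDI proportions.

    Only words present in both dictionaries are compared (hypothetical design).
    """
    shared = sorted(set(segmented_counts) & set(cdi_proportions))
    xs = [segmented_counts[word] for word in shared]
    ys = [cdi_proportions[word] for word in shared]
    return pearson(xs, ys)


# Toy values invented for illustration only, not real CDI data.
toy_counts = {"dog": 1, "ball": 2, "milk": 3, "thedog": 5}
toy_cdi = {"dog": 0.2, "ball": 0.4, "milk": 0.6, "cat": 0.9}
r = segmentation_cdi_correlation(toy_counts, toy_cdi)
print(round(r, 3))
```

Words found in only one of the two sources ("thedog" and "cat" here) are ignored; a real analysis would need to decide how to treat them.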

Files

Corpus.py

Handles .cha files. Open questions: should it accept other formats, or no raw input at all and just a clean orthographic file, or just tags.txt? It may or may not phonologize and syllabify (for which languages? None for now; English is planned soon).

  • Get the number of words, phones and syllables in the corpus

  • Get the number of single-word utterances

  • Get statistics on the corpus

  • Store the orthographic and gold versions, along with their statistics
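The counts listed above could be computed along the following lines. This is a minimal sketch, not the actual Corpus.py: it assumes each utterance is one line of space-separated phones with `;esyll` closing each syllable and `;eword` closing each word, and the function name `corpus_stats` is hypothetical.

```python
from collections import Counter


def corpus_stats(lines, syll_sep=";esyll", word_sep=";eword"):
    """Count utterances, words, syllables, phones and single-word utterances.

    Assumes every word ends with `word_sep` and every syllable with `syll_sep`.
    """
    stats = Counter()
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip empty lines
        n_words = tokens.count(word_sep)
        stats["utterances"] += 1
        stats["words"] += n_words
        stats["syllables"] += tokens.count(syll_sep)
        # Every token that is not a boundary marker is a phone.
        stats["phones"] += sum(1 for t in tokens if t not in (syll_sep, word_sep))
        if n_words == 1:
            stats["single_word_utterances"] += 1
    return dict(stats)


# Made-up example utterances in the assumed format.
example = [
    "h e ;esyll l o ;esyll ;eword",
    "l u k ;esyll ;eword a t ;esyll ;eword d i s ;esyll ;eword",
]
print(corpus_stats(example))
```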

Segmented.py

Given the segmented, orthographic and gold versions:

  • Get a dictionary mapping phonological forms to orthographic forms (or build it from the CDI instead?)

  • Number and list of words, syllables and phones

  • Freq_top, freq_words; write these to files

  • True positives and related counts

  • Evaluation (F-score etc.)

  • Correct words

  • Incorrect words

  • POS tagging?
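The true-positive counting and F-score evaluation mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the actual Segmented.py: utterances are strings with words separated by spaces, the segmented and gold utterances are aligned line by line and contain the same symbols, and a word counts as correct only if both of its boundaries match the gold. The names `word_spans` and `token_scores` are hypothetical.

```python
def word_spans(utterance):
    """Return the set of (start, end) spans of the words in an utterance.

    Offsets are counted over the symbols of the utterance, ignoring spaces.
    """
    spans, start = set(), 0
    for word in utterance.split():
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def token_scores(segmented, gold):
    """Token-level precision, recall and F-score of a segmentation."""
    true_pos = n_seg = n_gold = 0
    for seg_utt, gold_utt in zip(segmented, gold):
        seg_spans, gold_spans = word_spans(seg_utt), word_spans(gold_utt)
        true_pos += len(seg_spans & gold_spans)  # words with both boundaries right
        n_seg += len(seg_spans)
        n_gold += len(gold_spans)
    precision = true_pos / n_seg if n_seg else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return precision, recall, fscore


# Made-up example: the model undersegments both utterances.
gold = ["the dog", "look at that"]
segmented = ["thedog", "look atthat"]
precision, recall, fscore = token_scores(segmented, gold)
print(round(precision, 3), round(recall, 3), round(fscore, 3))
```

Here only "look" is segmented correctly, out of 3 proposed words and 5 gold words. Comparing spans rather than word strings avoids crediting a word that appears at the wrong position in the utterance.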

Model.py

translate.py
