OrderlyAdverbs

Code and materials associated with the paper "Let's do it orderly: a proposal for a better taxonomy of adverbs in Universal Dependencies, and beyond", by Flavio Massimiliano Cecchini, published in the Prague Bulletin of Mathematical Linguistics 121 (June 2024).

Explanation of files and codes

The main script is ADVextractor.py. It is meant to be launched from this repository and takes as input the path to a single CoNLL-U file, a folder containing CoNLL-U files, or a mixture of both. It then creates a folder containing several files with statistics about adverbs (ADV) in the data. Outputs are already provided for all the treebanks discussed in the paper, as well as for the new Latin CIRCSE treebank.
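As an illustration of what accepting "a mixture of both" involves, the following is a minimal sketch of how file and folder paths could be expanded into a list of CoNLL-U files. It is not taken from ADVextractor.py: the function name collect_conllu_files, the reliance on the .conllu extension and the recursive search of folders are all assumptions made for this example.

```python
from pathlib import Path


def collect_conllu_files(paths):
    """Expand a mixture of file and folder paths into a sorted list of files.

    Assumptions (not confirmed by ADVextractor.py): treebank files inside
    folders carry the conventional `.conllu` extension, and folders are
    searched recursively.
    """
    found = set()
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Folders: pick up every CoNLL-U file below them.
            found.update(path.rglob("*.conllu"))
        elif path.is_file():
            # Files named explicitly are accepted as given.
            found.add(path)
        # Paths that do not exist are silently skipped in this sketch.
    return sorted(found)
```

Calling collect_conllu_files(["some_treebank_folder", "some_file.conllu"]) would then return every matching file once, whichever way it was specified.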

The script and the tables are admittedly somewhat rough. Note that, to read CoNLL-U files and extract data, the script relies on a custom Python "module", part of a suite that the author has been developing since 2018 and that has not been published yet (but hopefully will be at some point). Any suggestions for better integrating the code with existing tools such as Udapi are welcome.
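For readers who want to reproduce this kind of extraction without the unpublished module, the sketch below reads CoNLL-U text with plain Python and counts the parts of speech, and the lemmas per part of speech, of words attached as advmod (the information stored in ADV_advmod.tsv). It is an approximation under stated assumptions, not the author's implementation: the function name advmod_distribution is hypothetical, and treating subtypes such as advmod:emph as advmod is a choice made only for this example. The ten tab-separated columns follow the CoNLL-U format.

```python
from collections import Counter, defaultdict


def advmod_distribution(conllu_text):
    """Count UPOS tags, and lemmas per UPOS tag, of words attached as advmod."""
    upos_counts = Counter()
    lemma_counts = defaultdict(Counter)
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        token_id, lemma, upos, deprel = cols[0], cols[2], cols[3], cols[7]
        if "-" in token_id or "." in token_id:
            continue  # multiword token ranges and empty nodes are not syntactic words
        # Assumption: subtyped relations (e.g. advmod:emph) count as advmod.
        if deprel.split(":")[0] == "advmod":
            upos_counts[upos] += 1
            lemma_counts[upos][lemma] += 1
    return upos_counts, lemma_counts
```

The function takes the content of a CoNLL-U file as a string, for instance the result of Path(...).read_text(encoding="utf-8"), and returns two counters that can be written out as a table.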

ADV_advmod.tsv: the distribution over parts of speech of all syntactic words receiving the advmod relation in the data, and, for every part of speech, the distribution over lemmas.
ADV_coinc.tsv: a list of ADV form types coinciding with forms of other parts of speech in the data, with the given part of speech, the lemma and morphological features of the coinciding form.
ADV_coord.tsv: list of groups of ADV form types in which one form appears coordinated with at least one other. Since such occurrences are quite rare, the file might be empty. "Nominal-like" ADVs are written in upper case.
ADV_difflemma.tsv: list of ADVs for which the lemma (third column) differs from the form (second column); the first column shows the transformation that takes the lemma to the form, in terms of deletion and addition of initial and final characters ("prefixes" and "suffixes").
- For example, for Latin UDante, the transformation for the pair fecunde/fecundius is 0||1|ius. It means that 0 characters are deleted from the beginning of the lemma fecunde and the empty string is then added (so nothing changes at the left margin), while the last character (e) is deleted and the string ius is appended. This transformation is quite common, and can be interpreted linguistically as forming the comparative for a given inflection class.
ADV_distr.tsv: table showing the dependency patterns of an ADV. Besides form type, lemma and absolute frequency in the data, the distribution over the head category is shown. The categories are basically the parts of speech of the heads, with two macrocategories and a special category:
- ROOT: the ADV is itself the head of a clause (unexpected, non-metapredicating behaviour in non-elliptical clauses)
- PRED: the head is a predicate, which includes both verbal and nominal ones (i.e. copulae), synthetic or periphrastic constructions, and also any kind of modifier (ADJ, DET, NUM, ADV)
- NOM: any nominal element (NOUN/PROPN and PRON) which is not part of a predicate
ADV_morpho.tsv: all individual pairs of morphological features and values that can be associated with ADVs in the data.
ADV_nominals.tsv: all ADVs which receive a nominal dependency relation, shown distributed per form according to each such dependency relation.
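
The transformation notation used in ADV_difflemma.tsv can be made concrete with a few lines of Python. The sketch below applies a transformation such as 0||1|ius to a lemma in order to obtain the form. The function name apply_transformation is hypothetical, and the code assumes that the four fields are separated by | and that the added strings never contain | themselves.

```python
def apply_transformation(lemma, transformation):
    """Turn a lemma into a form using a `del_start|add_start|del_end|add_end` code.

    Assumption: the added strings contain no `|` character, so the code
    always splits into exactly four fields.
    """
    del_start, add_start, del_end, add_end = transformation.split("|")
    # Remove the requested number of characters from both edges of the lemma.
    core = lemma[int(del_start):len(lemma) - int(del_end)]
    # Attach the new "prefix" and "suffix" around what is left.
    return add_start + core + add_end


print(apply_transformation("fecunde", "0||1|ius"))  # fecundius
```

Reading the code field by field mirrors the description above: nothing is deleted or added at the left margin, one character (e) is deleted at the right margin, and ius is appended.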

Latin adverbs

The subfolder Latin contains a single file, ADV_omnia.tsv, in which each ADV lemma in the Latin treebanks is assigned the actual part of speech of the base it is derived from, or of the word in place of which it has been mistagged (see §4.4.2 for details). The tag REL, which is not part of UD but is discussed in the paper (cf. §5.1.5), is also used. Please note that this survey does not take into account the Latin CIRCSE treebank, which appeared after the paper was written.

Unfortunately, the morphological features of each derivation were originally part of this overview, but are now absent due to data loss. They might be (re)added as future work.

References

For any questions, do not hesitate to contact the author, whose contact details are given in the paper!

