Annotating graph-like structures (word families) #22

LinguList · 2017-07-07T13:56:59Z

We can annotate the motivation of compound structures now, but we can't annotate word families yet. I am thinking of things like "walk" vs. "walker", etc., which are by default best handled as directed graphs, where a given word form is annotated by adding a source, a relation, and a target. Source and target serve as identifiers for the nodes in the network.

ID	Segments	Cognate Set	Source	Target	Relation
1	w a l k	1		walk
2	w a l k + e r	1	walk	walker	er-nominalization
3	j u m p	2		jump
4	j u m p + e r	2	jump	jumper	er-nominalization
5	j u m p + e r + s	2	jumper	jumpers	plural

This may seem similar to the compound motivation we use, for which I gave examples in #21, but it is essentially different insofar as the relation between source and target form may not be linear (think of Umlaut, Ablaut, ellipsis, etc.). So the source form defines the origin of the derivation, which is then rendered as a node ID in the directed network. We could drop the target and just use the Word_ID, but I think that user-defined language-internal IDs are easier to annotate and to read.
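To make the idea concrete, here is a minimal sketch of how the rows in the table above could be read into a directed graph. The tuple layout and the function names `build_graph` and `family` are my own assumptions for illustration, not part of any CLDF or CALC API.

```python
from collections import defaultdict

# (ID, Segments, Source, Target, Relation), copied from the table above.
ROWS = [
    (1, "w a l k", "", "walk", ""),
    (2, "w a l k + e r", "walk", "walker", "er-nominalization"),
    (3, "j u m p", "", "jump", ""),
    (4, "j u m p + e r", "jump", "jumper", "er-nominalization"),
    (5, "j u m p + e r + s", "jumper", "jumpers", "plural"),
]


def build_graph(rows):
    """Return (nodes, edges).

    nodes maps each Target label to its word ID;
    edges maps each Source label to a list of (Target, Relation) pairs.
    """
    nodes = {target: word_id for word_id, _, _, target, _ in rows}
    edges = defaultdict(list)
    for _, _, source, target, relation in rows:
        if source:  # forms without a source are roots of a word family
            if source not in nodes:
                raise ValueError(f"unknown source node: {source!r}")
            edges[source].append((target, relation))
    return nodes, dict(edges)


def family(edges, root):
    """Collect all forms reachable from root, i.e. its word family."""
    seen, stack = [], [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        stack.extend(target for target, _ in edges.get(node, []))
    return seen


nodes, edges = build_graph(ROWS)
print(family(edges, "jump"))  # ['jump', 'jumper', 'jumpers']
```

Because the Target column doubles as the node label, a missing or misspelled Source shows up immediately as an unknown node, which is one practical argument for keeping readable labels next to the Word_ID.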

As a rule, validation of these relations would require derivation rules, which are usually pretty language-specific and cannot really be handled cross-linguistically. But ideally, an application would have code to derive potential targets.
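A rough sketch of what such code could look like for the English examples above. The rule table `RULES` and the helpers `derive` and `check` are hypothetical, and the rules are purely concatenative, so non-linear relations like Umlaut or Ablaut would need a different kind of rule.

```python
# Hypothetical, English-only rules: relation name -> function on a list of morphemes.
RULES = {
    "er-nominalization": lambda morphemes: morphemes + ["e r"],
    "plural": lambda morphemes: morphemes + ["s"],
}


def derive(segments, relation):
    """Apply one rule to a Segments string such as 'w a l k + e r'."""
    morphemes = [m.strip() for m in segments.split("+")]
    return " + ".join(RULES[relation](morphemes))


def check(source_segments, relation, target_segments):
    """True if the annotated target matches what the rule predicts."""
    return derive(source_segments, relation) == target_segments


print(derive("j u m p + e r", "plural"))  # j u m p + e r + s
```

An application could run `check` over every annotated row and flag mismatches for review instead of rejecting them outright, since the rules will never cover every case.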

Working examples for this will be produced soon in CALC, but this is probably nothing we need to consider for the first publication of the CLDF specs.

LinguList · 2017-07-07T14:05:01Z

Here's a graph illustrating this relation. It is hierarchical, and it is a monopartite graph, unlike the bipartite morpheme graphs used for partial colexification.

LinguList · 2017-07-07T14:32:08Z

Just figured out why derivation is conceptually different: derivation is always implicitly hierarchical, while just listing the parts of a compound is not. So if bark is tree + skin, and dry material for lighting a fire (Zunder in German) is dried + tree + skin (as they use in Scandinavia), the hierarchy is "(dried (tree skin))". So derivation is really different and needs its own spec.
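The difference can be shown with nested tuples, as a small sketch: a flat list of parts is the same for two different bracketings, while the nested form keeps them apart. The names `bracket` and `flatten` are only illustrative.

```python
# A derivation as nested tuples: each tuple groups the parts combined at one step.
bark = ("tree", "skin")
tinder = ("dried", bark)           # (dried (tree skin))
other = (("dried", "tree"), "skin")  # a different hierarchy over the same parts


def bracket(node):
    """Render the hierarchy in bracket notation."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(bracket(child) for child in node) + ")"


def flatten(node):
    """Drop the hierarchy and return only the linear list of parts."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in flatten(child)]


print(bracket(tinder))              # (dried (tree skin))
print(" + ".join(flatten(tinder)))  # dried + tree + skin
print(flatten(tinder) == flatten(other))  # True
```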

xrotwang added this to the CLDF 2.0 milestone Sep 20, 2017

xrotwang added the data modeling label Mar 7, 2022
