Annotating graph-like structures (word families) #22

Open
LinguList opened this issue Jul 7, 2017 · 2 comments

Comments

@LinguList
Contributor

We can now annotate the motivation of compound structures, but we can't annotate word families yet. I am thinking of things like "walk" vs. "walker", etc., which are by default best handled as directed graphs: a given word form is annotated by adding a source, a relation, and a target. Source and target serve as identifiers for the nodes in the network.

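| ID | Segments | Cognate Set | Source | Target | Relation |
|----|----------|-------------|--------|--------|----------|
| 1 | w a l k | 1 | | walk | |
| 2 | w a l k + e r | 1 | walk | walker | er-nominalization |
| 3 | j u m p | 2 | | jump | |
| 4 | j u m p + e r | | jump | jumper | er-nominalization |
| 5 | j u m p + e r + s | | jumper | jumpers | plural |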

This may seem similar to the compound motivation annotation that we use, and for which I gave examples in #21, but it is essentially different insofar as the relation between source and target form may not be linear (think of Umlaut, Ablaut, ellipsis, etc.). So the source form defines the origin of the derivation, which is then rendered as a node ID in the directed network. We could drop the target column if we just used the Word_ID, but I think that user-defined, language-internal IDs are easier for annotation, also in terms of readability.
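To make the source/target/relation idea concrete, here is a minimal Python sketch that turns the example rows into a directed graph and collects a word family by following the edges. The names `ROWS`, `build_graph`, and `family` are hypothetical, and the sketch assumes that underived base forms carry only their own identifier (no source and no relation); nothing here is part of CLDF or CALC.

```python
# Example rows as (source, target, relation).
# Assumption: base forms have no source and no relation.
ROWS = [
    (None, "walk", None),
    ("walk", "walker", "er-nominalization"),
    (None, "jump", None),
    ("jump", "jumper", "er-nominalization"),
    ("jumper", "jumpers", "plural"),
]


def build_graph(rows):
    """Return an adjacency mapping: node ID -> list of (target, relation)."""
    graph = {}
    for source, target, relation in rows:
        graph.setdefault(target, [])  # every form is a node, even without edges
        if source is not None:
            graph.setdefault(source, []).append((target, relation))
    return graph


def family(graph, root):
    """Collect all forms reachable from root, i.e. its word family."""
    seen, stack = [], [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(target for target, _ in graph.get(node, []))
    return seen


graph = build_graph(ROWS)
print(family(graph, "jump"))  # ['jump', 'jumper', 'jumpers']
```

Because the edges are keyed by the user-defined IDs rather than by Word_ID, the annotation stays readable in the table, and the graph can be rebuilt from the table alone.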

As a rule, validation of these relations would require derivation rules, which are usually pretty language-specific and cannot really be handled cross-linguistically. But ideally, an application would have code to derive potential targets.
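As a rough illustration of what "code to derive potential targets" could look like, the following sketch applies toy rules to a segmented source form. The rule table `RULES` and the functions `potential_targets` and `is_valid` are hypothetical and cover only the purely concatenative English examples from this issue; non-linear processes such as Umlaut or Ablaut would need rules that rewrite the stem itself.

```python
# Hypothetical, language-specific rules: relation name -> function on a
# list of segments. These only append a suffix after a "+" boundary.
RULES = {
    "er-nominalization": lambda segs: segs + ["+", "e", "r"],
    "plural": lambda segs: segs + ["+", "s"],
}


def potential_targets(source_segments):
    """Apply every rule to a space-separated source form."""
    source = source_segments.split()
    return {rel: " ".join(rule(source)) for rel, rule in RULES.items()}


def is_valid(source_segments, target_segments, relation):
    """Check that the annotated target is what the named rule derives."""
    rule = RULES.get(relation)
    if rule is None:
        return False  # unknown relation: cannot be validated
    return " ".join(rule(source_segments.split())) == target_segments


print(potential_targets("w a l k"))
print(is_valid("j u m p + e r", "j u m p + e r + s", "plural"))
```

Keeping the rules in a per-language table reflects the point above: the validation logic can be shared, while the rules themselves cannot.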

Working examples for this will be produced soon in CALC, but this is probably nothing we need to consider for the first publication of the CLDF specs.

@LinguList
Contributor Author

Here's a graph illustrating this relation. It is hierarchical, and it is a monopartite graph rather than the bipartite graph used for morpheme graphs of partial colexification.

@LinguList
Contributor Author

Just figured out why derivation is conceptually different: derivation is always implicitly hierarchical, while just listing the parts of a compound is not. So if bark is tree + skin, and dry material for lighting a fire (Zunder in German) is dried + tree + skin (as they use in Scandinavia), the hierarchy is "(dried (tree skin))". So derivation is really different and needs its own spec.
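The difference between the two views can be sketched in Python with nested tuples for the hierarchy and a plain list for the compound parts. The names `zunder`, `bracket`, and `flatten` are hypothetical and chosen only for this illustration.

```python
def bracket(tree):
    """Render a nested tuple as a bracketed string."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(bracket(child) for child in tree) + ")"


def flatten(tree):
    """Drop the hierarchy and return the parts in linear order."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in flatten(child)]


# Hierarchical (derivational) view: "dried" applies to "tree skin" as a unit.
zunder = ("dried", ("tree", "skin"))

print(bracket(zunder))  # (dried (tree skin))
print(flatten(zunder))  # ['dried', 'tree', 'skin']
```

The flat list can always be computed from the tree, but not the other way round, which is one way to see why the compound annotation cannot stand in for the derivation annotation.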

@xrotwang xrotwang added this to the CLDF 2.0 milestone Sep 20, 2017

2 participants