The ISWOC Treebank

As of April 2023, releases of the ISWOC Treebank have moved to https://github.com/syntacticus/syntacticus-treebank-data.

The ISWOC Treebank is a dependency treebank with morphosyntactic and information-structure annotation. It includes texts in several older Indo-European languages and is freely available under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.

Please cite as

Bech, Kristin and Kristine Eide. 2014. The ISWOC corpus. Department of Literature, Area Studies and European Languages, University of Oslo. http://iswoc.github.com.

Releases of the ISWOC Treebank are hosted on GitHub.

Text	Language	Filename	Size
Ælfric's Lives of Saints	Old English	æls	3137 tokens
Apollonius of Tyre	Old English	apt	5541 tokens
Anglo-Saxon Chronicles	Old English	chrona	5939 tokens
Orosius	Old English	or	1728 tokens
West-Saxon Gospels	Old English	wscp	13061 tokens
La Vie Saint Eustace	Old French	eustace	2340 tokens
Crónica Geral de Espanha 2-12	Portuguese	cge1	12074 tokens
Crónica Geral de Espanha 155-167	Portuguese	cge2	10547 tokens
Décadas Livro 5, VIII, 9-14	Portuguese	coutdec-v-8	13794 tokens
Crónica de Alfonso XI	Spanish	alfonso-xi	7942 tokens
Crónica de España	Spanish	ce	4627 tokens
El Conde Lucanor	Spanish	cdeluc	17551 tokens
Estoria de Espanna I	Spanish	ee1	9488 tokens
General Estoria parte IV Daniel	Spanish	ge4	9233 tokens
Libro delos claros varones	Spanish	varones	5820 tokens

(The 'size' column in the table above shows the number of annotated tokens in a text. The number of tokens will be slightly larger than the number of words in the original printed edition as some words have been split into multiple tokens and some tokens have been inserted during annotation.)

Please see the XML files for detailed metadata and a full list of contributors.

Data formats

The texts are available in two formats:

PROIEL XML: These files are the authoritative source files and the only ones that contain all available annotation. They contain the complete morphological, syntactic and information-structure annotation, as well as the complete text, including punctuation, section headers etc. The schema is defined in proiel.xsd.
CoNLL-X format
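
As an illustration of how the CoNLL-X files might be consumed, here is a minimal Python sketch. It assumes the files follow the standard CoNLL-X layout: one token per line, ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and a blank line between sentences. The names `Token` and `read_conllx` are hypothetical helpers introduced for this example and are not part of the treebank or of any tool shipped with it.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    """One CoNLL-X token line; all fields are kept as raw strings."""
    id: str
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: str
    deprel: str
    phead: str
    pdeprel: str


def read_conllx(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences (lists of Token) from an iterable of CoNLL-X lines.

    Assumption: exactly ten tab-separated columns per token line and
    blank lines as sentence separators, as in the standard CoNLL-X format.
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        if len(columns) != 10:
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        sentence.append(Token(*columns))
    # The last sentence may not be followed by a blank line.
    if sentence:
        yield sentence
```

To read one of the texts, pass an open file handle, for example `with open("apt.conll", encoding="utf-8") as f: sentences = list(read_conllx(f))`. Since only the PROIEL XML files contain all available annotation, a reader like this is suitable only for work that needs no more than what the CoNLL-X export carries.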
