Releases · dkpro/dkpro-core

18 Jul 12:00

reckart

de.tudarmstadt.ukp.dkpro.core-gpl-1.6.0

b493142

DKPro Core 1.6.0 (GPL)

We are pleased to announce the release of

DKPro Core, version 1.6.0 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

DKPro Core 1.6.0 requires Java 7.

New components

opennlp - OpenNLP chunker component added

New I/O modules/components

io.bliki - Reader for Wikipedia
io.tiger - Writer for the Tiger XML format (also supports semantic frame annotations)

Further highlights in this release include:

Upgrade to ClearNLP 2.0.2
Upgrade to Stanford CoreNLP 3.3.1 including the new CVG models
Upgrade to TT4J 1.1.2
Upgrade to LanguageTool 2.5

Also, this release supports many additional models for various
components and brings the usual set of bug fixes and minor
improvements.

A more detailed overview of the changes in this release can be found here.

This release was first planned as a bugfix release for DKPro Core 1.5.0,
but we decided to call it 1.6.0 because it depends on Java 7 now instead
of Java 6.

As Google Code has recently disabled downloads, we currently do not
provide a ZIP file with all DKPro Core JARs for non-Maven users.

When upgrading, please mind that you should not mix different versions
of DKPro Core components in your projects - they may not be compatible
with each other.
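
The warning above about mixing versions can be checked mechanically. The following is a minimal sketch, assuming the project's dependencies are available as `groupId:artifactId:version` strings. The class `DkproVersionCheck`, its methods, and the artifact names in `main` are hypothetical and not part of DKPro Core; the group ID prefix is an assumption derived from the release tag names on this page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class DkproVersionCheck {
    // Assumed group ID prefix, derived from the release tag names.
    static final String GROUP_PREFIX = "de.tudarmstadt.ukp.dkpro.core";

    /** Collects the distinct versions of all DKPro Core artifacts in the list. */
    static Set<String> dkproVersions(List<String> coordinates) {
        Set<String> versions = new TreeSet<String>();
        for (String coordinate : coordinates) {
            String[] parts = coordinate.split(":");
            // Only look at well-formed coordinates that belong to DKPro Core.
            if (parts.length == 3 && parts[0].startsWith(GROUP_PREFIX)) {
                versions.add(parts[2]);
            }
        }
        return versions;
    }

    /** True if at most one DKPro Core version is in use. */
    static boolean isConsistent(List<String> coordinates) {
        return dkproVersions(coordinates).size() <= 1;
    }

    public static void main(String[] args) {
        // Illustrative artifact names only.
        List<String> deps = Arrays.asList(
                "de.tudarmstadt.ukp.dkpro.core:example-a:1.6.0",
                "de.tudarmstadt.ukp.dkpro.core:example-b:1.5.0");
        System.out.println("Versions in use: " + dkproVersions(deps));
        System.out.println("Consistent: " + isConsistent(deps));
    }
}
```

In a Maven project the same effect is usually achieved by declaring the version once and reusing it for every DKPro Core dependency.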


18 Jul 11:59

reckart

de.tudarmstadt.ukp.dkpro.core-gpl-1.6.1

5e10b1c

DKPro Core 1.6.1 (GPL)

We are pleased to announce the release of

DKPro Core, version 1.6.1 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

Changed requirements:

UIMAJ SDK 2.6.0
uimaFIT 2.1.0

Major improvements:

Many writers can now write to ZIP files
Better support for reading/writing binary CAS formats
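
To illustrate what writing several documents into one ZIP file involves, here is a minimal sketch that uses only `java.util.zip` from the JDK. It is not the DKPro Core writer code; the class `ZipWriteSketch`, its methods, and the entry names are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipWriteSketch {
    /** Writes each (entry name, text) pair as one entry of an in-memory ZIP archive. */
    static byte[] writeDocuments(Map<String, String> documents) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, String> doc : documents.entrySet()) {
                zip.putNextEntry(new ZipEntry(doc.getKey()));
                zip.write(doc.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new IllegalStateException("Could not write ZIP archive", e);
        }
        return buffer.toByteArray();
    }

    /** Lists the entry names of a ZIP archive in the order they were written. */
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<String>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new IllegalStateException("Could not read ZIP archive", e);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<String, String>();
        docs.put("doc1.txt", "First document");
        docs.put("doc2.txt", "Second document");
        System.out.println(entryNames(writeDocuments(docs)));
    }
}
```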

Major bug fixes:

treetagger - NPE when explicitly specifying a model
stanfordnlp - StanfordPosTagger not applying PTB3 escaping
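
PTB3 escaping refers to the Penn Treebank convention of replacing certain characters, most visibly brackets, with symbolic tokens such as `-LRB-` before tagging. The sketch below shows only the bracket part of that convention. The class `PtbEscapeSketch` is hypothetical, and this is neither the Stanford CoreNLP nor the DKPro Core implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class PtbEscapeSketch {
    private static final Map<String, String> BRACKETS = new HashMap<String, String>();
    static {
        BRACKETS.put("(", "-LRB-");
        BRACKETS.put(")", "-RRB-");
        BRACKETS.put("[", "-LSB-");
        BRACKETS.put("]", "-RSB-");
        BRACKETS.put("{", "-LCB-");
        BRACKETS.put("}", "-RCB-");
    }

    /** Returns the escaped form of a bracket token, or the token unchanged. */
    static String escape(String token) {
        String replacement = BRACKETS.get(token);
        return replacement != null ? replacement : token;
    }

    public static void main(String[] args) {
        for (String token : new String[] { "He", "(", "probably", ")", "left" }) {
            System.out.print(escape(token) + " ");
        }
        System.out.println();
    }
}
```

A tagger model trained on escaped text expects the escaped tokens as input, so skipping this step can lead to wrong tags for bracket characters.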

A more detailed overview of the changes in this release can be found here.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.


18 Jul 11:58

reckart

de.tudarmstadt.ukp.dkpro.core-asl-1.6.1

3349054

DKPro Core 1.6.1 (ASL)

We are pleased to announce the release of

DKPro Core, version 1.6.1 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

Changed requirements:

UIMAJ SDK 2.6.0
uimaFIT 2.1.0

Major improvements:

Many writers can now write to ZIP files
Better support for reading/writing binary CAS formats

Major bug fixes:

treetagger - NPE when explicitly specifying a model
stanfordnlp - StanfordPosTagger not applying PTB3 escaping

A more detailed overview of the changes in this release can be found here.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.


18 Jul 12:01

reckart

de.tudarmstadt.ukp.dkpro.core-asl-1.6.0

b9c6579

DKPro Core 1.6.0 (ASL)

We are pleased to announce the release of

DKPro Core, version 1.6.0 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

DKPro Core 1.6.0 requires Java 7.

New components

opennlp - OpenNLP chunker component added

New I/O modules/components

io.bliki - Reader for Wikipedia
io.tiger - Writer for the Tiger XML format (also supports semantic frame annotations)

Further highlights in this release include:

Upgrade to ClearNLP 2.0.2
Upgrade to Stanford CoreNLP 3.3.1 including the new CVG models
Upgrade to TT4J 1.1.2
Upgrade to LanguageTool 2.5

Also, this release supports many additional models for various
components and brings the usual set of bug fixes and minor
improvements.

A more detailed overview of the changes in this release can be found here.

This release was first planned as a bugfix release for DKPro Core 1.5.0,
but we decided to call it 1.6.0 because it depends on Java 7 now instead
of Java 6.

As Google Code has recently disabled downloads, we currently do not
provide a ZIP file with all DKPro Core JARs for non-Maven users.

When upgrading, please mind that you should not mix different versions
of DKPro Core components in your projects - they may not be compatible
with each other.


18 Jul 12:03

reckart

de.tudarmstadt.ukp.dkpro.core-gpl-1.5.0

052e061

DKPro Core 1.5.0 (GPL)

We are pleased to announce the release of

DKPro Core, version 1.5.0 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

The release brings new modules to DKPro Core:

New API modules

api.phonetics - Annotation types for the phonetic level
api.semantics - Annotation types for semantic information (semantic fields and semantic role labelling)

New I/O modules

io.conll - Reader and writer for the CONLL 2006 format
io.tcf - Reader and writer for the CLARIN TCF format
io.tgrep - Writer for TGrep2 corpus files
io.tiger - Reader for the Tiger XML format

New analysis modules

commonscodec - Phonetic transcription based on the Apache Commons Codec library
decompounding - Flexible set of components for decompounding, based on different splitting and ranking algorithms
mate-tools - Wrapper for the mate-tools suite
morpha - Wrapper for the morpha stemmer/lemmatizer
mstparser - Wrapper for the mstparser
sfst - New module for SFST-based morphological analyzers
umlautnormalizer - Normalizer for umlauts in German texts (ASL)

Further highlights in this release include:

Added support for resolving models from remote repositories at runtime
Added @TypeCapabilities annotations to components, declaring which annotation types they consume and produce
Added auto-generated XML descriptors for UIMA components (via uimafit-maven-plugin)
Added support for ClearNLP Semantic Role Labelling
Added support for GATE Hepple POS tagger
Added support for OpenNLP parser and name finder
Upgrade to Apache uimaFIT 2.0.0
Upgrade to Apache UIMA 2.4.2
Upgrade to ArkTweet-NLP 0.3.2
Upgrade to ClearNLP 1.3.1
Upgrade to CoreNLP 3.2.0
Upgrade to GATE 7.1
Upgrade to jweb1t 1.3.0
Upgrade to LanguageTool 2.2
Upgrade to Maltparser 1.7.2
Upgrade to Mate-Tools anna 3.5
Upgrade to OpenNLP 1.5.3

Some modules are no longer maintained and were not considered of use for the general public, e.g. the io.mmax2 module and the io.wsdl module. They have been retired and are not included in this release.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.


18 Jul 12:03

reckart

de.tudarmstadt.ukp.dkpro.core-asl-1.5.0

6fd1633

DKPro Core 1.5.0 (ASL)

We are pleased to announce the release of

DKPro Core, version 1.5.0 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

The release brings new modules to DKPro Core:

New API modules

api.phonetics - Annotation types for the phonetic level
api.semantics - Annotation types for semantic information (semantic fields and semantic role labelling)

New I/O modules

io.conll - Reader and writer for the CONLL 2006 format
io.tcf - Reader and writer for the CLARIN TCF format
io.tgrep - Writer for TGrep2 corpus files
io.tiger - Reader for the Tiger XML format

New analysis modules

commonscodec - Phonetic transcription based on the Apache Commons Codec library
decompounding - Flexible set of components for decompounding, based on different splitting and ranking algorithms
mate-tools - Wrapper for the mate-tools suite
morpha - Wrapper for the morpha stemmer/lemmatizer
mstparser - Wrapper for the mstparser
sfst - New module for SFST-based morphological analyzers
umlautnormalizer - Normalizer for umlauts in German texts (ASL)

Further highlights in this release include:

Added support for resolving models from remote repositories at runtime
Added @TypeCapabilities annotations to components, declaring which annotation types they consume and produce
Added auto-generated XML descriptors for UIMA components (via uimafit-maven-plugin)
Added support for ClearNLP Semantic Role Labelling
Added support for GATE Hepple POS tagger
Added support for OpenNLP parser and name finder
Upgrade to Apache uimaFIT 2.0.0
Upgrade to Apache UIMA 2.4.2
Upgrade to ArkTweet-NLP 0.3.2
Upgrade to ClearNLP 1.3.1
Upgrade to CoreNLP 3.2.0
Upgrade to GATE 7.1
Upgrade to jweb1t 1.3.0
Upgrade to LanguageTool 2.2
Upgrade to Maltparser 1.7.2
Upgrade to Mate-Tools anna 3.5
Upgrade to OpenNLP 1.5.3

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.


18 Jul 12:13

reckart

de.tudarmstadt.ukp.dkpro.core-gpl-1.4.0

7a9135d

DKPro Core 1.4.0 (GPL)

DKPro Core ASL 1.4.0 is a major release. It is not expected to be binary backwards compatible with previous versions. This page lists some of the more important changes and caveats regarding this version. For more detailed information on the changes in this version, please refer to the issue tracker or to the Git history.

General

First release on Maven Central.
New infrastructure and parameters for loading models and configuring type mapping.
New versioning scheme for models and standardized model artifact names.
Changed various parameters in all components to follow a common naming scheme.
Added support to print tagset information when a model is loaded in most components
Added various new and updated mappings for POS tags (Chinese, English, Estonian, French, German, ...).
Added modules to support unit tests and to measure performance
Fixed problems with paths containing spaces and paths on Windows systems
Added support for POS mapping in various readers
Changed types for compound words

Analysis modules

Added modules
- Mecab - part-of-speech tagger for Japanese
- MaltParser - dependency parser

CAS transformation module

Fixed several bugs
Improved performance (slightly)

JWordSplitter integration module

Changed to current JWordSplitter version

LanguageTool integration module

Changed to LanguageTool 1.9
Added LanguageToolSegmenter - supports many languages

OpenNLP integration module

Added OpenNlpParser
Added OpenNlpPosTagger
Added OpenNlpSegmenter

Tokenizer module

Added TokenMerger component to merge multi-word named entities into a single token
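
The idea of merging the tokens of a multi-word named entity can be sketched on plain character offsets. This is an illustration only, assuming tokens are sorted by start offset and each is given as a `{begin, end}` pair. The class `TokenMergeSketch` is hypothetical and does not use the UIMA CAS or the actual TokenMerger code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TokenMergeSketch {
    /**
     * Merges all tokens covered by the same entity span into one token.
     * Tokens and entities are {begin, end} offset pairs; tokens are sorted by begin.
     */
    static List<int[]> merge(List<int[]> tokens, List<int[]> entities) {
        List<int[]> result = new ArrayList<int[]>();
        int i = 0;
        while (i < tokens.size()) {
            int[] token = tokens.get(i);
            // Find an entity that fully covers the current token, if any.
            int[] covering = null;
            for (int[] entity : entities) {
                if (entity[0] <= token[0] && token[1] <= entity[1]) {
                    covering = entity;
                    break;
                }
            }
            if (covering == null) {
                result.add(token);
                i++;
                continue;
            }
            // Extend the merged token over all following tokens inside the entity.
            int begin = token[0];
            int end = token[1];
            i++;
            while (i < tokens.size() && tokens.get(i)[1] <= covering[1]) {
                end = tokens.get(i)[1];
                i++;
            }
            result.add(new int[] { begin, end });
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "I visited New York City yesterday";
        List<int[]> tokens = Arrays.asList(new int[] { 0, 1 }, new int[] { 2, 9 },
                new int[] { 10, 13 }, new int[] { 14, 18 }, new int[] { 19, 23 },
                new int[] { 24, 33 });
        List<int[]> entities = Collections.singletonList(new int[] { 10, 23 });
        for (int[] t : merge(tokens, entities)) {
            System.out.println(text.substring(t[0], t[1]));
        }
    }
}
```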

I/O modules

Added modules
- bincas - binary serialization for CAS, much faster than XMI
- JdbcReader - generic reader for SQL databases
Added parameter to escape characters when document ID is used as a file name.
Added support for more compression methods in most writers
Added support for custom Spring resource loaders, e.g. to support reading from HDFS using Spring Hadoop
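
Document IDs such as URLs contain characters that are not safe in file names. One plausible way to escape them is URL-style percent-encoding, sketched below. Whether DKPro Core uses exactly this encoding is an assumption, and `FileNameEscapeSketch` is a hypothetical helper.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class FileNameEscapeSketch {
    /** Percent-encodes a document ID so that it contains no path separators. */
    static String escape(String documentId) {
        try {
            return URLEncoder.encode(documentId, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // prints "http%3A%2F%2Fexample.org%2Fa+b"
        System.out.println(escape("http://example.org/a b"));
    }
}
```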

IMS Open Corpus Workbench Support

Added support for compressing using cwb-huffcode and cwb-compress-rdx
Added support to write coarse part-of-speech tags (DKPro type names)
Added support to better handle WaCky corpora, e.g. to generate document IDs

PDF support

Changed to pdfbox 1.7.0
Changed to extend ResourceCollectionReaderBase
Added parameters to control start and end page
Added support for progress information

Text support

Added parameter to control extension of written text files

TEI support

Added support for the Digitale Bibliothek of TextGrid
Added support for TEI documents with multiple texts inside.

Web1t support

Added option to generate lowercase ngrams (off by default)
Added option to generate jweb1t indexes (on by default)
Added possibility to write all ngrams to one file by setting threshold to 0 or negative
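
The effect of the lowercase option can be seen in a small n-gram counting sketch. This is an illustration under simple assumptions (whitespace-joined n-grams, counts kept in memory) and not the Web1T writer itself; the class `NgramCountSketch` and its parameters are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class NgramCountSketch {
    /** Counts all n-grams of length n, optionally lowercasing every token first. */
    static Map<String, Integer> count(List<String> tokens, int n, boolean lowercase) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            StringBuilder ngram = new StringBuilder();
            for (int j = i; j < i + n; j++) {
                if (j > i) {
                    ngram.append(' ');
                }
                String token = tokens.get(j);
                ngram.append(lowercase ? token.toLowerCase(Locale.ROOT) : token);
            }
            String key = ngram.toString();
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "cat", "the", "cat");
        System.out.println(count(tokens, 2, false));
        System.out.println(count(tokens, 2, true));
    }
}
```

With lowercasing enabled, "The cat" and "the cat" are counted as the same bigram, which gives a smaller vocabulary at the cost of case information.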

Wikipedia support

Added support for reading a predefined set of revisions with the WikipediaRevisionReader.
Improved template cleaning


18 Jul 12:13

reckart

de.tudarmstadt.ukp.dkpro.core-asl-1.4.0

c7d91c0

DKPro Core 1.4.0 (ASL)

General

First release on Maven Central.
New infrastructure and parameters for loading models and configuring type mapping.
New versioning scheme for models and standardized model artifact names.
Changed various parameters in all components to follow a common naming scheme.
Added support to print tagset information when a model is loaded in most components
Added various new and updated mappings for POS tags (Chinese, English, Estonian, French, German, ...).
Added modules to support unit tests and to measure performance
Fixed problems with paths containing spaces and paths on Windows systems
Added support for POS mapping in various readers
Changed types for compound words

Analysis modules

Added modules
- Mecab - part-of-speech tagger for Japanese
- MaltParser - dependency parser

CAS transformation module

Fixed several bugs
Improved performance (slightly)

JWordSplitter integration module

Changed to current JWordSplitter version

LanguageTool integration module

Changed to LanguageTool 1.9
Added LanguageToolSegmenter - supports many languages

OpenNLP integration module

Added OpenNlpParser
Added OpenNlpPosTagger
Added OpenNlpSegmenter

Tokenizer module

Added TokenMerger component to merge multi-word named entities into a single token

I/O modules

Added modules
- bincas - binary serialization for CAS, much faster than XMI
- JdbcReader - generic reader for SQL databases
Added parameter to escape characters when document ID is used as a file name.
Added support for more compression methods in most writers
Added support for custom Spring resource loaders, e.g. to support reading from HDFS using Spring Hadoop

IMS Open Corpus Workbench Support

Added support for compressing using cwb-huffcode and cwb-compress-rdx
Added support to write coarse part-of-speech tags (DKPro type names)
Added support to better handle WaCky corpora, e.g. to generate document IDs

PDF support

Changed to pdfbox 1.7.0
Changed to extend ResourceCollectionReaderBase
Added parameters to control start and end page
Added support for progress information

Text support

Added parameter to control extension of written text files

TEI support

Added support for the Digitale Bibliothek of TextGrid
Added support for TEI documents with multiple texts inside.

Web1t support

Added option to generate lowercase ngrams (off by default)
Added option to generate jweb1t indexes (on by default)
Added possibility to write all ngrams to one file by setting threshold to 0 or negative

Wikipedia support

Added support for reading a predefined set of revisions with the WikipediaRevisionReader.
Improved template cleaning


18 Jul 12:15

reckart

de.tudarmstadt.ukp.dkpro.core-gpl-1.2.0

13df0b0

DKPro Core 1.2.0 (GPL)

DKPro Core ASL 1.3.0 is a major release. It is not expected to be binary backwards compatible with previous versions. This page lists some of the more important changes and caveats regarding this version. For more detailed information on the changes in this version, please refer to the issue tracker or to the Git history.

General

Fixed several issues with DocumentMetaData.
Changed features of some DKPro Types so that they start with a lower-case letter. Breaks XMI file compatibility with older DKPro versions.
Added new base class JCasFileWriter_ImplBase

Analysis modules

TreeTagger integration

Upgraded to TT4J 1.0.16 to support the Chinese model.
Works with any model now, even if no mapping is provided.

I/O modules

British National Corpus Support

Added BncReader.

IMS Open Corpus Workbench Support

Added ImsCwbReader and ImsCwbWriter.
ImsCwbWriter can use a local CWB installation to directly write the index format.

NeGra Export Format support

Can now read version 3 files (TIGER Corpus).
Fixed several issues.

TEI support

Added TeiReader, for now mainly to read text from the TEI version of the Brown Corpus.

Web1t support

New Web1TFormatWriter which uses an external sort mechanism to support larger n-gram models.
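
An external sort handles data that does not fit into memory by sorting small chunks, spilling each sorted chunk to disk, and then merging the chunks. The following is a minimal sketch of that general technique, not the Web1TFormatWriter code. The class `ExternalSortSketch` is hypothetical, lines are assumed to contain no line breaks, and the merged result is returned as a list only to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class ExternalSortSketch {
    /** Current smallest unread line of one sorted run, plus the reader it came from. */
    private static final class Head implements Comparable<Head> {
        final String line;
        final BufferedReader reader;
        Head(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }
        @Override
        public int compareTo(Head other) {
            return line.compareTo(other.line);
        }
    }

    static List<String> sort(List<String> lines, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        try {
            // Phase 1: sort each chunk in memory and spill it to a temporary file.
            List<Path> runs = new ArrayList<Path>();
            for (int i = 0; i < lines.size(); i += chunkSize) {
                int end = Math.min(i + chunkSize, lines.size());
                List<String> chunk = new ArrayList<String>(lines.subList(i, end));
                Collections.sort(chunk);
                Path run = Files.createTempFile("run", ".txt");
                Files.write(run, chunk, StandardCharsets.UTF_8);
                runs.add(run);
            }
            // Phase 2: k-way merge, always emitting the smallest head line.
            PriorityQueue<Head> heap = new PriorityQueue<Head>();
            List<BufferedReader> readers = new ArrayList<BufferedReader>();
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    heap.add(new Head(first, reader));
                }
            }
            List<String> sorted = new ArrayList<String>();
            while (!heap.isEmpty()) {
                Head head = heap.poll();
                sorted.add(head.line);
                String next = head.reader.readLine();
                if (next != null) {
                    heap.add(new Head(next, head.reader));
                }
            }
            // Cleanup on the success path; error-path cleanup is omitted for brevity.
            for (BufferedReader reader : readers) {
                reader.close();
            }
            for (Path run : runs) {
                Files.delete(run);
            }
            return sorted;
        } catch (IOException e) {
            throw new IllegalStateException("External sort failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sort(Arrays.asList("the cat", "a dog", "zebra", "a cat"), 2));
    }
}
```

Only one chunk is held in memory during the first phase, and only one line per run during the merge, which is what allows larger n-gram models to be processed.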

Wikipedia support

Upgraded to JWPL 0.9.0.
Added WikipediaPageReader, which reads articles and discussion pages.


18 Jul 12:15

reckart

de.tudarmstadt.ukp.dkpro.core-asl-1.2.0

ecef416

DKPro Core 1.2.0 (ASL)

General

Fixed several issues with DocumentMetaData.
Changed features of some DKPro Types so that they start with a lower-case letter. Breaks XMI file compatibility with older DKPro versions.
Added new base class JCasFileWriter_ImplBase

Analysis modules

TreeTagger integration

Upgraded to TT4J 1.0.16 to support the Chinese model.
Works with any model now, even if no mapping is provided.

I/O modules

British National Corpus Support

Added BncReader.

IMS Open Corpus Workbench Support

Added ImsCwbReader and ImsCwbWriter.
ImsCwbWriter can use a local CWB installation to directly write the index format.

NeGra Export Format support

Can now read version 3 files (TIGER Corpus).
Fixed several issues.

TEI support

Added TeiReader, for now mainly to read text from the TEI version of the Brown Corpus.

Web1t support

New Web1TFormatWriter which uses an external sort mechanism to support larger n-gram models.

Wikipedia support

Upgraded to JWPL 0.9.0.
Added WikipediaPageReader, which reads articles and discussion pages.
