Releases: dkpro/dkpro-core

DKPro Core 1.6.0 (GPL)

18 Jul 12:00

We are pleased to announce the release of

DKPro Core, version 1.6.0 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

DKPro Core 1.6.0 requires Java 7.

New components

  • opennlp - OpenNLP chunker component added

New I/O modules/components

  • io.bliki - Reader for Wikipedia
  • io.tiger - Writer for the Tiger XML format (also supports semantic frame annotations)

Further highlights in this release include:

  • Upgrade to ClearNLP 2.0.2
  • Upgrade to Stanford CoreNLP 3.3.1 including the new CVG models
  • Upgrade to TT4J 1.1.2
  • Upgrade to LanguageTool 2.5

Also, this release supports many additional models for various
components and brings the usual set of bug fixes and minor
improvements.

A more detailed overview of the changes in this release can be found here.

This release was first planned as a bugfix release for DKPro Core 1.5.0,
but we decided to call it 1.6.0 because it depends on Java 7 now instead
of Java 6.

As Google Code has recently disabled downloads, we currently do not
provide a ZIP file with all DKPro Core JARs for non-Maven users.

When upgrading, please mind that you should not mix different versions
of DKPro Core components in your projects - they may not be compatible
with each other.
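
One way to guard against mixed versions is to scan a project's dependency list and confirm that all DKPro Core artifacts resolve to a single version. The following is only a sketch: the `group:artifact:version` string format, the group ID prefix, and the artifact names are assumptions made for illustration, and the class is not part of DKPro Core.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class VersionConsistencyCheck {
    // Assumed group ID prefix for DKPro Core artifacts; adjust to your own coordinates.
    static final String GROUP_PREFIX = "de.tudarmstadt.ukp.dkpro.core";

    /** Returns the distinct versions of all coordinates whose group ID starts with GROUP_PREFIX. */
    static Set<String> dkproVersions(List<String> coordinates) {
        Set<String> versions = new TreeSet<>();
        for (String coordinate : coordinates) {
            String[] parts = coordinate.split(":"); // expected form: group:artifact:version
            if (parts.length == 3 && parts[0].startsWith(GROUP_PREFIX)) {
                versions.add(parts[2]);
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        // Hypothetical dependency list with one outdated DKPro Core module.
        List<String> deps = Arrays.asList(
                "de.tudarmstadt.ukp.dkpro.core:module-a:1.6.0",
                "de.tudarmstadt.ukp.dkpro.core:module-b:1.5.0",
                "org.example:unrelated:2.0");
        Set<String> versions = dkproVersions(deps);
        if (versions.size() > 1) {
            System.out.println("Mixed DKPro Core versions found: " + versions);
        }
    }
}
```

In a Maven build, the same goal is usually reached by declaring the version once (for example in a property or a dependency management section) rather than checking after the fact.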

DKPro Core 1.6.1 (GPL)

18 Jul 11:59

We are pleased to announce the release of

DKPro Core, version 1.6.1 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

Changed requirements:

  • UIMAJ SDK 2.6.0
  • uimaFIT 2.1.0

Major improvements:

  • Many writers can now write to ZIP files
  • Better support for reading/writing binary CAS formats
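
To illustrate what writing to a ZIP file means in practice, the sketch below packs several documents as entries of one archive using only the JDK's `java.util.zip` classes. It does not use the DKPro Core writer API; the class name, method name, and the name-to-text map are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipWriterSketch {
    /** Packs each document (entry name -> text) as one entry of a single ZIP archive. */
    static byte[] writeZip(Map<String, String> documents) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, String> doc : documents.entrySet()) {
                zip.putNextEntry(new ZipEntry(doc.getKey()));
                zip.write(doc.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1.txt", "First document.");
        docs.put("doc2.txt", "Second document.");
        System.out.println("Archive size in bytes: " + writeZip(docs).length);
    }
}
```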

Major bug fixes:

  • treetagger - NPE when explicitly specifying a model
  • stanfordnlp - StanfordPosTagger not applying PTB3 escaping
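
For readers unfamiliar with PTB3 escaping: Penn Treebank style tools expect bracket tokens to be replaced by symbolic names before tagging. The sketch below shows only the bracket part of that convention at token level. It is not the Stanford or DKPro Core implementation, and the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class Ptb3EscapeSketch {
    // Bracket replacements used by the Penn Treebank convention.
    private static final Map<String, String> BRACKETS = new HashMap<>();
    static {
        BRACKETS.put("(", "-LRB-");
        BRACKETS.put(")", "-RRB-");
        BRACKETS.put("[", "-LSB-");
        BRACKETS.put("]", "-RSB-");
        BRACKETS.put("{", "-LCB-");
        BRACKETS.put("}", "-RCB-");
    }

    /** Returns the escaped form of a bracket token, or the token unchanged otherwise. */
    static String escape(String token) {
        String escaped = BRACKETS.get(token);
        return escaped != null ? escaped : token;
    }

    public static void main(String[] args) {
        for (String token : new String[] {"(", "see", "above", ")"}) {
            System.out.print(escape(token) + " ");
        }
        System.out.println();
    }
}
```

A tagger model trained on escaped text will typically mis-tag raw brackets, which is why a missing escaping step shows up as a tagging bug.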

A more detailed overview of the changes in this release can be found here.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.

DKPro Core 1.6.1 (ASL)

18 Jul 11:58

We are pleased to announce the release of

DKPro Core, version 1.6.1 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

Changed requirements:

  • UIMAJ SDK 2.6.0
  • uimaFIT 2.1.0

Major improvements:

  • Many writers can now write to ZIP files
  • Better support for reading/writing binary CAS formats

Major bug fixes:

  • treetagger - NPE when explicitly specifying a model
  • stanfordnlp - StanfordPosTagger not applying PTB3 escaping

A more detailed overview of the changes in this release can be found here.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.

DKPro Core 1.6.0 (ASL)

18 Jul 12:01

We are pleased to announce the release of

DKPro Core, version 1.6.0 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

DKPro Core 1.6.0 requires Java 7.

New components

  • opennlp - OpenNLP chunker component added

New I/O modules/components

  • io.bliki - Reader for Wikipedia
  • io.tiger - Writer for the Tiger XML format (also supports semantic frame annotations)

Further highlights in this release include:

  • Upgrade to ClearNLP 2.0.2
  • Upgrade to Stanford CoreNLP 3.3.1 including the new CVG models
  • Upgrade to TT4J 1.1.2
  • Upgrade to LanguageTool 2.5

Also, this release supports many additional models for various
components and brings the usual set of bug fixes and minor
improvements.

A more detailed overview of the changes in this release can be found here.

This release was first planned as a bugfix release for DKPro Core 1.5.0,
but we decided to call it 1.6.0 because it depends on Java 7 now instead
of Java 6.

As Google Code has recently disabled downloads, we currently do not
provide a ZIP file with all DKPro Core JARs for non-Maven users.

When upgrading, please mind that you should not mix different versions
of DKPro Core components in your projects - they may not be compatible
with each other.

DKPro Core 1.5.0 (GPL)

18 Jul 12:03

We are pleased to announce the release of

DKPro Core, version 1.5.0 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

The release brings new modules to DKPro Core:

New API modules

  • api.phonetics - Annotation types for the phonetic level
  • api.semantics - Annotation types for semantic information (semantic fields and semantic role labelling)

New I/O modules

  • io.conll - Reader and writer for the CoNLL 2006 format
  • io.tcf - Reader and writer for the CLARIN TCF format
  • io.tgrep - Writer for TGrep2 corpus files
  • io.tiger - Reader for the Tiger XML format

New analysis modules

  • commonscodec - Phonetic transcription based on the Apache Commons Codec library
  • decompounding - Flexible set of components for decompounding, based on different splitting and ranking algorithms
  • mate-tools - Wrapper for the mate-tools suite
  • morpha - Wrapper for the morpha stemmer/lemmatizer
  • mstparser - Wrapper for the mstparser
  • sfst - New module for SFST-based morphological analyzers
  • umlautnormalizer - Normalizer for umlauts in German texts (ASL)

Further highlights in this release include:

  • Added support for resolving models from remote repositories at runtime
  • Added @TypeCapabilities annotations to components, declaring which annotation types each component consumes and produces
  • Added auto-generated XML descriptors for UIMA components (via uimafit-maven-plugin)
  • Added support for ClearNLP Semantic Role Labelling
  • Added support for GATE Hepple POS tagger
  • Added support for OpenNLP parser and name finder
  • Upgrade to Apache uimaFIT 2.0.0
  • Upgrade to Apache UIMA 2.4.2
  • Upgrade to ArkTweet-NLP 0.3.2
  • Upgrade to ClearNLP 1.3.1
  • Upgrade to CoreNLP 3.2.0
  • Upgrade to GATE 7.1
  • Upgrade to jweb1t 1.3.0
  • Upgrade to LanguageTool 2.2
  • Upgrade to Maltparser 1.7.2
  • Upgrade to Mate-Tools anna 3.5
  • Upgrade to OpenNLP 1.5.3
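
The idea behind capability annotations in the list above can be sketched with plain Java: a component class carries an annotation naming the types it consumes and produces, and tooling reads that information via reflection. The `Capabilities` annotation and `DemoPosTagger` class below are hypothetical stand-ins defined only for this example; they are not the annotation or components shipped with uimaFIT or DKPro Core.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

public class TypeCapabilitySketch {
    /** Hypothetical stand-in for a capability annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Capabilities {
        String[] inputs() default {};
        String[] outputs() default {};
    }

    /** Hypothetical component declaring what it consumes and produces. */
    @Capabilities(inputs = {"Token", "Sentence"}, outputs = {"POS"})
    static class DemoPosTagger {
    }

    /** Reads the declared capabilities of a component class via reflection. */
    static String describe(Class<?> component) {
        Capabilities caps = component.getAnnotation(Capabilities.class);
        if (caps == null) {
            return component.getSimpleName() + ": no capabilities declared";
        }
        return component.getSimpleName() + ": consumes " + Arrays.toString(caps.inputs())
                + ", produces " + Arrays.toString(caps.outputs());
    }

    public static void main(String[] args) {
        System.out.println(describe(DemoPosTagger.class));
    }
}
```

Declaring this information on the class lets a pipeline builder check, before running anything, that each component's inputs are produced by an earlier step.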

Some modules are no longer maintained and were not considered of use for the general public, e.g. the io.mmax2 module and the io.wsdl module. They have been retired and are not included in this release.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.

DKPro Core 1.5.0 (ASL)

18 Jul 12:03

We are pleased to announce the release of

DKPro Core, version 1.5.0 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

The release brings new modules to DKPro Core:

New API modules

  • api.phonetics - Annotation types for the phonetic level
  • api.semantics - Annotation types for semantic information (semantic fields and semantic role labelling)

New I/O modules

  • io.conll - Reader and writer for the CoNLL 2006 format
  • io.tcf - Reader and writer for the CLARIN TCF format
  • io.tgrep - Writer for TGrep2 corpus files
  • io.tiger - Reader for the Tiger XML format

New analysis modules

  • commonscodec - Phonetic transcription based on the Apache Commons Codec library
  • decompounding - Flexible set of components for decompounding, based on different splitting and ranking algorithms
  • mate-tools - Wrapper for the mate-tools suite
  • morpha - Wrapper for the morpha stemmer/lemmatizer
  • mstparser - Wrapper for the mstparser
  • sfst - New module for SFST-based morphological analyzers
  • umlautnormalizer - Normalizer for umlauts in German texts (ASL)

Further highlights in this release include:

  • Added support for resolving models from remote repositories at runtime
  • Added @TypeCapabilities annotations to components, declaring which annotation types each component consumes and produces
  • Added auto-generated XML descriptors for UIMA components (via uimafit-maven-plugin)
  • Added support for ClearNLP Semantic Role Labelling
  • Added support for GATE Hepple POS tagger
  • Added support for OpenNLP parser and name finder
  • Upgrade to Apache uimaFIT 2.0.0
  • Upgrade to Apache UIMA 2.4.2
  • Upgrade to ArkTweet-NLP 0.3.2
  • Upgrade to ClearNLP 1.3.1
  • Upgrade to CoreNLP 3.2.0
  • Upgrade to GATE 7.1
  • Upgrade to jweb1t 1.3.0
  • Upgrade to LanguageTool 2.2
  • Upgrade to Maltparser 1.7.2
  • Upgrade to Mate-Tools anna 3.5
  • Upgrade to OpenNLP 1.5.3

Some modules are no longer maintained and were not considered of use for the general public, e.g. the io.mmax2 module and the io.wsdl module. They have been retired and are not included in this release.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.

DKPro Core 1.4.0 (GPL)

18 Jul 12:13

DKPro Core ASL 1.4.0 is a major release. It is not expected to be binary backwards compatible with previous versions. This page lists some of the more important changes and caveats regarding this version. For more detailed information on the changes in this version, please refer to the issue tracker or to the Git history.

General

  • First release on Maven Central.
  • New infrastructure and parameters for loading models and configuring type mapping.
  • New versioning scheme for models and standardized model artifact names.
  • Changed various parameters in all components to follow a common naming scheme.
  • Added support to print tagset information when a model is loaded in most components
  • Added various new and updated mappings for POS tags (Chinese, English, Estonian, French, German, ...).
  • Added modules to support unit tests and to measure performance
  • Fixed problems with paths containing spaces and paths on Windows systems
  • Added support for POS mapping in various readers
  • Changed types for compound words

Analysis modules

  • Added modules
    • Mecab - part-of-speech tagger for Japanese
    • MaltParser - dependency parser
CAS transformation module
  • Fixed several bugs
  • Improved performance (slightly)
JWordSplitter integration module
  • Changed to current JWordSplitter version
LanguageTool integration module
  • Changed to LanguageTool 1.9
  • Added LanguageToolSegmenter - supports many languages
OpenNLP integration module
  • Added OpenNlpParser
  • Added OpenNlpPosTagger
  • Added OpenNlpSegmenter
Tokenizer module
  • Added TokenMerger component to merge multi-word named entities into a single token
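
The token merging idea mentioned for the tokenizer module can be sketched on plain character offsets: every run of tokens covered by the same named entity span is replaced by one token spanning the whole run. This is an illustration only, not the component's implementation; it assumes tokens are sorted by offset and entity spans do not overlap, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenMergeSketch {
    /** Merges tokens covered by the same entity span; spans are {begin, end} offset pairs. */
    static List<int[]> merge(List<int[]> tokens, List<int[]> entities) {
        List<int[]> result = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int[] token = tokens.get(i);
            int[] entity = covering(token, entities);
            i++;
            if (entity == null) {
                result.add(token); // token outside any entity stays as it is
                continue;
            }
            int end = token[1];
            // Extend the merged token while the following tokens lie in the same entity.
            while (i < tokens.size() && covering(tokens.get(i), entities) == entity) {
                end = tokens.get(i)[1];
                i++;
            }
            result.add(new int[] {token[0], end});
        }
        return result;
    }

    /** Returns the entity span that fully contains the token, or null if there is none. */
    private static int[] covering(int[] token, List<int[]> entities) {
        for (int[] entity : entities) {
            if (entity[0] <= token[0] && token[1] <= entity[1]) {
                return entity;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Offsets for the text "Barack Obama visited New York ."
        List<int[]> tokens = Arrays.asList(new int[] {0, 6}, new int[] {7, 12},
                new int[] {13, 20}, new int[] {21, 24}, new int[] {25, 29}, new int[] {30, 31});
        List<int[]> entities = Arrays.asList(new int[] {0, 12}, new int[] {21, 29});
        for (int[] t : merge(tokens, entities)) {
            System.out.println(Arrays.toString(t));
        }
    }
}
```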

I/O modules

  • Added modules
    • bincas - binary serialization for CAS, much faster than XMI
    • JdbcReader - generic reader for SQL databases
  • Added parameter to escape characters when document ID is used as a file name.
  • Added support for more compression methods in most writers
  • Added support for custom Spring resource loaders, e.g. to support reading from HDFS using Spring Hadoop
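
Escaping matters when a document ID is a URI or contains path separators, because such characters are not valid in file names. A minimal sketch of one possible scheme follows, using URL encoding from the JDK. Whether the DKPro Core writers use this exact scheme is not stated in these notes, so treat the method and its name as an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DocumentIdEscapeSketch {
    /** Turns a document ID into a file name by URL-encoding it and appending an extension. */
    static String toFileName(String documentId, String extension) {
        try {
            return URLEncoder.encode(documentId, "UTF-8") + extension;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toFileName("http://example.org/a b", ".txt"));
    }
}
```

URL encoding leaves a few characters such as `*` untouched, so a production scheme may need additional replacements on some file systems.
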
IMS Open Corpus Workbench Support
  • Added support for compressing using cwb-huffcode and cwb-compress-rdx
  • Added support to write coarse part-of-speech tags (DKPro type names)
  • Added support to better handle WaCky corpora, e.g. to generate document IDs
PDF support
  • Changed to pdfbox 1.7.0
  • Changed to extend ResourceCollectionReaderBase
  • Added parameters to control start and end page
  • Added support for progress information
Text support
  • Added parameter to control extension of written text files
TEI support
  • Added support for Digitale Bibliothek of TextGrid
  • Added support for TEI documents with multiple texts inside.
Web1t support
  • Added option to generate lowercase ngrams (off by default)
  • Added option to generate jweb1t indexes (on by default)
  • Added possibility to write all ngrams to one file by setting threshold to 0 or negative
Wikipedia support
  • Added support for reading a predefined set of revisions with the WikipediaRevisionReader.
  • Improved template cleaning
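
The effect of the lowercase n-gram option listed under Web1t support can be seen in a small counting sketch: with lowercasing enabled, n-grams that differ only in case are counted together. This is a simplified illustration written for Java 8 or later; it is not the Web1t writer itself, and the class, method, and parameter names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class NgramCountSketch {
    /** Counts all n-grams of length n over the tokens, optionally lowercasing them first. */
    static Map<String, Integer> count(List<String> tokens, int n, boolean lowercase) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String ngram = String.join(" ", tokens.subList(i, i + n));
            if (lowercase) {
                ngram = ngram.toLowerCase(Locale.ROOT);
            }
            counts.merge(ngram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "cat", "the", "cat");
        System.out.println("case-sensitive: " + count(tokens, 2, false));
        System.out.println("lowercased:     " + count(tokens, 2, true));
    }
}
```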

DKPro Core 1.4.0 (ASL)

18 Jul 12:13

DKPro Core ASL 1.4.0 is a major release. It is not expected to be binary backwards compatible with previous versions. This page lists some of the more important changes and caveats regarding this version. For more detailed information on the changes in this version, please refer to the issue tracker or to the Git history.

General

  • First release on Maven Central.
  • New infrastructure and parameters for loading models and configuring type mapping.
  • New versioning scheme for models and standardized model artifact names.
  • Changed various parameters in all components to follow a common naming scheme.
  • Added support to print tagset information when a model is loaded in most components
  • Added various new and updated mappings for POS tags (Chinese, English, Estonian, French, German, ...).
  • Added modules to support unit tests and to measure performance
  • Fixed problems with paths containing spaces and paths on Windows systems
  • Added support for POS mapping in various readers
  • Changed types for compound words

Analysis modules

  • Added modules
    • Mecab - part-of-speech tagger for Japanese
    • MaltParser - dependency parser
CAS transformation module
  • Fixed several bugs
  • Improved performance (slightly)
JWordSplitter integration module
  • Changed to current JWordSplitter version
LanguageTool integration module
  • Changed to LanguageTool 1.9
  • Added LanguageToolSegmenter - supports many languages
OpenNLP integration module
  • Added OpenNlpParser
  • Added OpenNlpPosTagger
  • Added OpenNlpSegmenter
Tokenizer module
  • Added TokenMerger component to merge multi-word named entities into a single token

I/O modules

  • Added modules
    • bincas - binary serialization for CAS, much faster than XMI
    • JdbcReader - generic reader for SQL databases
  • Added parameter to escape characters when document ID is used as a file name.
  • Added support for more compression methods in most writers
  • Added support for custom Spring resource loaders, e.g. to support reading from HDFS using Spring Hadoop
IMS Open Corpus Workbench Support
  • Added support for compressing using cwb-huffcode and cwb-compress-rdx
  • Added support to write coarse part-of-speech tags (DKPro type names)
  • Added support to better handle WaCky corpora, e.g. to generate document IDs
PDF support
  • Changed to pdfbox 1.7.0
  • Changed to extend ResourceCollectionReaderBase
  • Added parameters to control start and end page
  • Added support for progress information
Text support
  • Added parameter to control extension of written text files
TEI support
  • Added support for Digitale Bibliothek of TextGrid
  • Added support for TEI documents with multiple texts inside.
Web1t support
  • Added option to generate lowercase ngrams (off by default)
  • Added option to generate jweb1t indexes (on by default)
  • Added possibility to write all ngrams to one file by setting threshold to 0 or negative
Wikipedia support
  • Added support for reading a predefined set of revisions with the WikipediaRevisionReader.
  • Improved template cleaning

DKPro Core 1.2.0 (GPL)

18 Jul 12:15

DKPro Core ASL 1.3.0 is a major release. It is not expected to be binary backwards compatible with previous versions. This page lists some of the more important changes and caveats regarding this version. For more detailed information on the changes in this version, please refer to the issue tracker or to the Git history.

General

  • Fixed several issues with DocumentMetaData.
  • Changed features of some DKPro Types so that they start with a lower-case letter. Breaks XMI file compatibility with older DKPro versions.
  • Added new base class JCasFileWriter_ImplBase

Analysis modules

TreeTagger integration
  • Upgraded to TT4J 1.0.16 to support the Chinese model.
  • Works with any model now, even if no mapping is provided.

I/O modules

British National Corpus Support
  • Added BncReader.
IMS Open Corpus Workbench Support
  • Added ImsCwbReader and ImsCwbWriter.
  • ImsCwbWriter can use a local CWB installation to directly write the index format.
NeGra Export Format support
  • Can now read version 3 files (TIGER Corpus).
  • Fixed several issues.
TEI support
  • Added TeiReader, for now mainly to read text from the TEI version of the Brown Corpus.
Web1t support
  • New Web1TFormatWriter which uses an external sort mechanism to support larger n-gram models.
Wikipedia support
  • Upgraded to JWPL 0.9.0.
  • Added WikipediaPageReader, which reads articles and discussion pages.
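
The external sort mentioned under Web1t support is a standard technique for data that does not fit in memory: sort fixed-size chunks, write each as a sorted run to disk, then merge the runs. The sketch below shows the technique in general form and is not the Web1TFormatWriter code. For brevity it returns the merged lines as a list, whereas a real implementation would stream them to an output file; the class name and `chunkSize` parameter are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSortSketch {
    /** One sorted run on disk together with the line currently at its head. */
    private static final class Run implements Comparable<Run> {
        final BufferedReader reader;
        String head;

        Run(Path file) throws IOException {
            reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
            head = reader.readLine();
        }

        boolean advance() throws IOException {
            head = reader.readLine();
            return head != null;
        }

        @Override
        public int compareTo(Run other) {
            return head.compareTo(other.head);
        }
    }

    /** Sorts lines while holding at most chunkSize input lines in memory at once. */
    static List<String> sort(Iterator<String> lines, int chunkSize) {
        try {
            // Phase 1: write sorted chunks ("runs") to temporary files.
            List<Path> runFiles = new ArrayList<>();
            List<String> chunk = new ArrayList<>();
            while (lines.hasNext()) {
                chunk.add(lines.next());
                if (chunk.size() == chunkSize || !lines.hasNext()) {
                    Collections.sort(chunk);
                    Path file = Files.createTempFile("run", ".txt");
                    Files.write(file, chunk, StandardCharsets.UTF_8);
                    runFiles.add(file);
                    chunk.clear();
                }
            }
            // Phase 2: k-way merge, always emitting the smallest head line.
            PriorityQueue<Run> queue = new PriorityQueue<>();
            for (Path file : runFiles) {
                Run run = new Run(file);
                if (run.head != null) {
                    queue.add(run);
                } else {
                    run.reader.close();
                }
            }
            List<String> merged = new ArrayList<>();
            while (!queue.isEmpty()) {
                Run run = queue.poll();
                merged.add(run.head);
                if (run.advance()) {
                    queue.add(run);
                } else {
                    run.reader.close();
                }
            }
            for (Path file : runFiles) {
                Files.delete(file);
            }
            return merged;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> ngrams = Arrays.asList("b c", "a b", "c d", "a a");
        System.out.println(sort(ngrams.iterator(), 2));
    }
}
```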

DKPro Core 1.2.0 (ASL)

18 Jul 12:15

DKPro Core ASL 1.3.0 is a major release. It is not expected to be binary backwards compatible with previous versions. This page lists some of the more important changes and caveats regarding this version. For more detailed information on the changes in this version, please refer to the issue tracker or to the Git history.

General

  • Fixed several issues with DocumentMetaData.
  • Changed features of some DKPro Types so that they start with a lower-case letter. Breaks XMI file compatibility with older DKPro versions.
  • Added new base class JCasFileWriter_ImplBase

Analysis modules

TreeTagger integration
  • Upgraded to TT4J 1.0.16 to support the Chinese model.
  • Works with any model now, even if no mapping is provided.

I/O modules

British National Corpus Support
  • Added BncReader.
IMS Open Corpus Workbench Support
  • Added ImsCwbReader and ImsCwbWriter.
  • ImsCwbWriter can use a local CWB installation to directly write the index format.
NeGra Export Format support
  • Can now read version 3 files (TIGER Corpus).
  • Fixed several issues.
TEI support
  • Added TeiReader, for now mainly to read text from the TEI version of the Brown Corpus.
Web1t support
  • New Web1TFormatWriter which uses an external sort mechanism to support larger n-gram models.
Wikipedia support
  • Upgraded to JWPL 0.9.0.
  • Added WikipediaPageReader, which reads articles and discussion pages.