Releases: dkpro/dkpro-core
DKPro Core 1.6.0 (GPL)
We are pleased to announce the release of
DKPro Core, version 1.6.0 (ASL & GPL),
a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.
DKPro Core 1.6.0 requires Java 7.
New components
- opennlp - OpenNLP chunker component added
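A chunker assigns each token a tag, and chunker models such as OpenNLP's English model typically use B-/I-/O tags (e.g. B-NP, I-NP, O). As an illustration only, the following Java sketch shows how such a tag sequence can be grouped into chunk spans. It does not use the DKPro Core or OpenNLP APIs, and the class names BioDecoder and Chunk are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** A decoded chunk: its label plus the token span [begin, end), end exclusive. */
final class Chunk {
    final String label;
    final int begin;
    final int end;

    Chunk(String label, int begin, int end) {
        this.label = label;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return label + "[" + begin + "," + end + ")";
    }
}

final class BioDecoder {
    /** Groups per-token tags such as B-NP, I-NP and O into chunk spans (hypothetical helper). */
    static List<Chunk> decode(String[] tags) {
        List<Chunk> chunks = new ArrayList<>();
        String label = null; // label of the chunk that is currently open, or null
        int start = -1;
        for (int i = 0; i < tags.length; i++) {
            String tag = tags[i];
            boolean continues = label != null && tag.equals("I-" + label);
            if (!continues) {
                // Close the open chunk before starting a new one or leaving a chunk.
                if (label != null) {
                    chunks.add(new Chunk(label, start, i));
                    label = null;
                }
                if (tag.startsWith("B-") || tag.startsWith("I-")) {
                    // An I- tag without a matching open chunk is treated like B-.
                    label = tag.substring(2);
                    start = i;
                }
            }
        }
        if (label != null) {
            chunks.add(new Chunk(label, start, tags.length));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tags = {"B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O"};
        System.out.println(decode(tags)); // [NP[0,2), VP[2,3), NP[3,5)]
    }
}
```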
New I/O modules/components
- io.bliki - Reader for Wikipedia
- io.tiger - Writer for the Tiger XML format (also supports semantic frame annotations)
Further highlights in this release include:
- Upgrade to ClearNLP 2.0.2
- Upgrade to Stanford CoreNLP 3.3.1 including the new CVG models
- Upgrade to TT4J 1.1.2
- Upgrade to LanguageTool 2.5
Also, this release supports many additional models for various
components and brings the usual set of bug fixes and minor
improvements.
A more detailed overview of the changes in this release can be found here.
This release was first planned as a bugfix release for DKPro Core 1.5.0,
but we decided to call it 1.6.0 because it now depends on Java 7 instead
of Java 6.
As Google Code has recently disabled downloads, we currently do not
provide a ZIP file with all DKPro Core JARs for non-Maven users.
When upgrading, please note that you should not mix different versions
of DKPro Core components in your projects; they may not be compatible
with each other.
DKPro Core 1.6.1 (GPL)
We are pleased to announce the release of
DKPro Core, version 1.6.1 (ASL & GPL),
a collection of interoperable software components for natural language processing
(NLP) based on the Apache UIMA framework.
Changed requirements:
- UIMAJ SDK 2.6.0
- uimaFIT 2.1.0
Major improvements:
- Many writers can now write to ZIP files
- Better support for reading/writing binary CAS formats
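The release notes do not describe how ZIP output is configured for the writers. As a generic illustration of the underlying idea only, the following sketch writes several documents as separate entries into one archive using java.util.zip. The class ZipDocumentWriter and its writeAll method are hypothetical and are not the DKPro Core writer API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ZipDocumentWriter {
    /** Writes each document (entry name -> text) as a separate UTF-8 entry of one ZIP archive. */
    static void writeAll(Path zipFile, Map<String, String> documents) throws IOException {
        try (OutputStream os = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(os)) {
            for (Map.Entry<String, String> doc : documents.entrySet()) {
                zip.putNextEntry(new ZipEntry(doc.getKey()));
                zip.write(doc.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1.txt", "First document.");
        docs.put("doc2.txt", "Second document.");
        Path target = Files.createTempFile("corpus", ".zip");
        writeAll(target, docs);
        try (ZipFile zf = new ZipFile(target.toFile())) {
            System.out.println(zf.size()); // 2 entries
        }
        Files.delete(target);
    }
}
```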
Major bug fixes:
- treetagger - NPE when explicitly specifying a model
- stanfordnlp - StanfordPosTagger not applying PTB3 escaping
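PTB3 escaping replaces characters that have a special role in Penn Treebank notation, such as brackets, with placeholder tokens (for example "(" becomes "-LRB-"). Models trained on PTB-style data expect these forms, so unescaped input may be tagged less accurately. The following is a hypothetical, minimal sketch covering only brackets (PTB3 escaping also rewrites quotes, which are omitted here); it is not the Stanford CoreNLP or DKPro Core implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class Ptb3Escaper {
    // Penn Treebank bracket escapes; quote handling (`` and '') is deliberately omitted.
    private static final Map<String, String> ESCAPES = new HashMap<>();

    static {
        ESCAPES.put("(", "-LRB-");
        ESCAPES.put(")", "-RRB-");
        ESCAPES.put("[", "-LSB-");
        ESCAPES.put("]", "-RSB-");
        ESCAPES.put("{", "-LCB-");
        ESCAPES.put("}", "-RCB-");
    }

    /** Returns escaped copies of the tokens; tokens without an escape are passed through unchanged. */
    static String[] escape(String[] tokens) {
        String[] out = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = ESCAPES.getOrDefault(tokens[i], tokens[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {"(", "see", "Fig.", "1", ")"};
        System.out.println(String.join(" ", escape(tokens))); // -LRB- see Fig. 1 -RRB-
    }
}
```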
A more detailed overview of the changes in this release can be found here.
When upgrading, please note that you should not mix different versions of DKPro Core components in your projects; they may not be compatible with each other.
DKPro Core 1.6.1 (ASL)
We are pleased to announce the release of
DKPro Core, version 1.6.1 (ASL & GPL),
a collection of interoperable software components for natural language processing
(NLP) based on the Apache UIMA framework.
Changed requirements:
- UIMAJ SDK 2.6.0
- uimaFIT 2.1.0
Major improvements:
- Many writers can now write to ZIP files
- Better support for reading/writing binary CAS formats
Major bug fixes:
- treetagger - NPE when explicitly specifying a model
- stanfordnlp - StanfordPosTagger not applying PTB3 escaping
A more detailed overview of the changes in this release can be found here.
When upgrading, please note that you should not mix different versions of DKPro Core components in your projects; they may not be compatible with each other.
DKPro Core 1.6.0 (ASL)
We are pleased to announce the release of
DKPro Core, version 1.6.0 (ASL & GPL),
a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.
DKPro Core 1.6.0 requires Java 7.
New components
- opennlp - OpenNLP chunker component added
New I/O modules/components
- io.bliki - Reader for Wikipedia
- io.tiger - Writer for the Tiger XML format (also supports semantic frame annotations)
Further highlights in this release include:
- Upgrade to ClearNLP 2.0.2
- Upgrade to Stanford CoreNLP 3.3.1 including the new CVG models
- Upgrade to TT4J 1.1.2
- Upgrade to LanguageTool 2.5
Also, this release supports many additional models for various
components and brings the usual set of bug fixes and minor
improvements.
A more detailed overview of the changes in this release can be found here.
This release was first planned as a bugfix release for DKPro Core 1.5.0,
but we decided to call it 1.6.0 because it now depends on Java 7 instead
of Java 6.
As Google Code has recently disabled downloads, we currently do not
provide a ZIP file with all DKPro Core JARs for non-Maven users.
When upgrading, please note that you should not mix different versions
of DKPro Core components in your projects; they may not be compatible
with each other.
DKPro Core 1.5.0 (GPL)
We are pleased to announce the release of
DKPro Core, version 1.5.0 (ASL & GPL),
a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.
The release brings new modules to DKPro Core:
New API modules
- api.phonetics - Annotation types for the phonetic level
- api.semantics - Annotation types for semantic information (semantic fields and semantic role labelling)
New I/O modules
- io.conll - Reader and writer for the CoNLL 2006 format
- io.tcf - Reader and writer for the CLARIN TCF format
- io.tgrep - Writer for TGrep2 corpus files
- io.tiger - Reader for the Tiger XML format
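The CoNLL 2006 (CoNLL-X) format stores one token per line in ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), uses "_" for empty values, and separates sentences with a blank line. A minimal reading sketch under those assumptions follows; the class Conll2006Parser is hypothetical and is not the io.conll reader.

```java
import java.util.ArrayList;
import java.util.List;

final class Conll2006Parser {
    /** One token line; FEATS, PHEAD and PDEPREL are ignored in this sketch. */
    static final class Row {
        final int id;
        final String form;
        final String lemma;
        final String cpostag;
        final String postag;
        final int head;       // -1 if the HEAD column is "_"
        final String deprel;

        Row(String[] cols) {
            id = Integer.parseInt(cols[0]);
            form = cols[1];
            lemma = cols[2];
            cpostag = cols[3];
            postag = cols[4];
            head = "_".equals(cols[6]) ? -1 : Integer.parseInt(cols[6]);
            deprel = cols[7];
        }
    }

    /** Splits CoNLL 2006 text into sentences; blank lines separate sentences. */
    static List<List<Row>> parse(String text) {
        List<List<Row>> sentences = new ArrayList<>();
        List<Row> current = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            if (line.trim().isEmpty()) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            String[] cols = line.split("\t");
            if (cols.length != 10) {
                throw new IllegalArgumentException("Expected 10 columns: " + line);
            }
            current.add(new Row(cols));
        }
        if (!current.isEmpty()) {
            sentences.add(current); // last sentence may lack a trailing blank line
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "1\tThe\tthe\tDT\tDT\t_\t2\tdet\t_\t_\n"
                + "2\tdog\tdog\tNN\tNN\t_\t3\tnsubj\t_\t_\n"
                + "3\tbarks\tbark\tVB\tVBZ\t_\t0\troot\t_\t_\n";
        for (List<Row> sentence : parse(text)) {
            for (Row r : sentence) {
                System.out.println(r.id + " " + r.form + " -> " + r.head + " (" + r.deprel + ")");
            }
        }
    }
}
```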
New analysis modules
- commonscodec - Phonetic transcription based on the Apache Commons Codec library
- decompounding - Flexible set of components for decompounding, based on different splitting and ranking algorithms
- mate-tools - Wrapper for the mate-tools suite
- morpha - Wrapper for the morpha stemmer/lemmatizer
- mstparser - Wrapper for the mstparser
- sfst - New module for SFST-based morphological analyzers
- umlautnormalizer - Normalizer for umlauts in German texts (ASL)
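The release notes do not name the splitting and ranking algorithms. As one well-known example of the general approach, the sketch below enumerates all splits of a word into dictionary entries and ranks them by the geometric mean of the parts' corpus frequencies (the frequency-based method of Koehn and Knight, 2003). The class Decompounder, the toy frequencies, and the omission of linking morphemes (such as the German "s" in "Arbeitsamt") are assumptions of this sketch, not the decompounding module's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Decompounder {
    /** Enumerates all ways to split word into dictionary entries of at least minLen characters. */
    static List<List<String>> splits(String word, Map<String, Long> freq, int minLen) {
        List<List<String>> result = new ArrayList<>();
        if (word.isEmpty()) {
            result.add(new ArrayList<>()); // one way to split the empty remainder
            return result;
        }
        for (int i = minLen; i <= word.length(); i++) {
            String head = word.substring(0, i);
            if (!freq.containsKey(head)) continue;
            for (List<String> rest : splits(word.substring(i), freq, minLen)) {
                List<String> split = new ArrayList<>();
                split.add(head);
                split.addAll(rest);
                result.add(split);
            }
        }
        return result; // exponential in the worst case; fine for single words
    }

    /** Geometric mean of the part frequencies (all frequencies assumed positive). */
    static double score(List<String> parts, Map<String, Long> freq) {
        double logSum = 0;
        for (String p : parts) logSum += Math.log(freq.get(p));
        return Math.exp(logSum / parts.size());
    }

    /** Returns the best-scoring split; the word itself is returned if no split scores higher. */
    static List<String> best(String word, Map<String, Long> freq, int minLen) {
        List<String> best = Collections.singletonList(word);
        double bestScore = freq.containsKey(word) ? freq.get(word) : 0;
        for (List<String> candidate : splits(word, freq, minLen)) {
            double s = score(candidate, freq);
            if (s > bestScore) {
                best = candidate;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Long> freq = new HashMap<>();
        freq.put("fuss", 200L);
        freq.put("ball", 800L);
        freq.put("platz", 600L);
        freq.put("fussball", 400L);
        // Candidates: [fuss, ball, platz] scores about 458, [fussball, platz] about 490.
        System.out.println(best("fussballplatz", freq, 3)); // [fussball, platz]
    }
}
```

Ranking by the geometric mean tends to favour splits into fewer, frequent parts, which is why [fussball, platz] wins over [fuss, ball, platz] in this toy example.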
Further highlights in this release include:
- Added support for resolving models from remote repositories at runtime
- Added @TypeCapabilities annotations to components, declaring which annotation types they consume and produce
- Added auto-generated XML descriptors for UIMA components (via uimafit-maven-plugin)
- Added support for ClearNLP Semantic Role Labelling
- Added support for GATE Hepple POS tagger
- Added support for OpenNLP parser and name finder
- Upgrade to Apache uimaFIT 2.0.0
- Upgrade to Apache UIMA 2.4.2
- Upgrade to ArkTweet-NLP 0.3.2
- Upgrade to ClearNLP 1.3.1
- Upgrade to CoreNLP 3.2.0
- Upgrade to GATE 7.1
- Upgrade to jweb1t 1.3.0
- Upgrade to LanguageTool 2.2
- Upgrade to Maltparser 1.7.2
- Upgrade to Mate-Tools anna 3.5
- Upgrade to OpenNLP 1.5.3
Some modules are no longer maintained and were not considered useful to the general public, e.g. the io.mmax2 module and the io.wsdl module. They have been retired and are not included in this release.
When upgrading, please note that you should not mix different versions of DKPro Core components in your projects; they may not be compatible with each other.
DKPro Core 1.5.0 (ASL)
We are pleased to announce the release of
DKPro Core, version 1.5.0 (ASL & GPL),
a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.
The release brings new modules to DKPro Core:
New API modules
- api.phonetics - Annotation types for the phonetic level
- api.semantics - Annotation types for semantic information (semantic fields and semantic role labelling)
New I/O modules
- io.conll - Reader and writer for the CoNLL 2006 format
- io.tcf - Reader and writer for the CLARIN TCF format
- io.tgrep - Writer for TGrep2 corpus files
- io.tiger - Reader for the Tiger XML format
New analysis modules
- commonscodec - Phonetic transcription based on the Apache Commons Codec library
- decompounding - Flexible set of components for decompounding, based on different splitting and ranking algorithms
- mate-tools - Wrapper for the mate-tools suite
- morpha - Wrapper for the morpha stemmer/lemmatizer
- mstparser - Wrapper for the mstparser
- sfst - New module for SFST-based morphological analyzers
- umlautnormalizer - Normalizer for umlauts in German texts (ASL)
Further highlights in this release include:
- Added support for resolving models from remote repositories at runtime
- Added @TypeCapabilities annotations to components, declaring which annotation types they consume and produce
- Added auto-generated XML descriptors for UIMA components (via uimafit-maven-plugin)
- Added support for ClearNLP Semantic Role Labelling
- Added support for GATE Hepple POS tagger
- Added support for OpenNLP parser and name finder
- Upgrade to Apache uimaFIT 2.0.0
- Upgrade to Apache UIMA 2.4.2
- Upgrade to ArkTweet-NLP 0.3.2
- Upgrade to ClearNLP 1.3.1
- Upgrade to CoreNLP 3.2.0
- Upgrade to GATE 7.1
- Upgrade to jweb1t 1.3.0
- Upgrade to LanguageTool 2.2
- Upgrade to Maltparser 1.7.2
- Upgrade to Mate-Tools anna 3.5
- Upgrade to OpenNLP 1.5.3
Some modules are no longer maintained and were not considered useful to the general public, e.g. the io.mmax2 module and the io.wsdl module. They have been retired and are not included in this release.
When upgrading, please note that you should not mix different versions of DKPro Core components in your projects; they may not be compatible with each other.
DKPro Core 1.4.0 (GPL)
DKPro Core ASL 1.4.0 is a major release. It is not expected to be binary backwards compatible with previous versions. This page lists some of the more important changes and caveats regarding this version. For more detailed information on the changes in this version, please refer to the issue tracker or to the Git history.
General
- First release on Maven Central.
- New infrastructure and parameters for loading models and configuring type mapping.
- New versioning scheme for models and standardized model artifact names.
- Changed various parameters in all components to follow a common naming scheme.
- Added support for printing tagset information when a model is loaded (in most components)
- Added various new and updated mappings for POS tags (Chinese, English, Estonian, French, German, ...).
- Added modules to support unit tests and to measure performance
- Fixed problems with paths containing spaces and paths on Windows systems
- Added support for POS mapping in various readers
- Changed types for compound words
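The notes do not say how the path problems were fixed. A common source of such bugs in Java is converting a file URL with URL.getPath(), which keeps percent-escapes such as "%20" and yields "/C:/..." on Windows. The following sketch, with the hypothetical class UrlPaths, contrasts that with converting through a URI, which decodes the escapes and uses the platform's path syntax.

```java
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class UrlPaths {
    /** Naive conversion: keeps percent-escapes such as %20 and, on Windows, yields "/C:/...". */
    static String naivePath(URL url) {
        return url.getPath();
    }

    /** Safer conversion: go through URI so escapes are decoded and the platform path syntax is used. */
    static Path toPath(URL url) throws URISyntaxException {
        return Paths.get(url.toURI());
    }

    public static void main(String[] args) throws IOException, URISyntaxException {
        Path dir = Files.createTempDirectory("dir with spaces");
        Path file = Files.createFile(dir.resolve("my model.bin"));
        URL url = file.toUri().toURL();
        System.out.println(naivePath(url));            // path ends with my%20model.bin
        System.out.println(Files.exists(toPath(url))); // true
        Files.delete(file);
        Files.delete(dir);
    }
}
```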
Analysis modules
- Added modules:
  - MeCab - part-of-speech tagger for Japanese
  - MaltParser - dependency parser
CAS transformation module
- Fixed several bugs
- Improved performance (slightly)
JWordSplitter integration module
- Changed to current JWordSplitter version
LanguageTool integration module
- Changed to LanguageTool 1.9
- Added LanguageToolSegmenter - supports many languages
OpenNLP integration module
- Added OpenNlpParser
- Added OpenNlpPosTagger
- Added OpenNlpSegmenter
Tokenizer module
- Added TokenMerger component to merge multi-word named entities into a single token
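A sketch of the merging idea under simplifying assumptions: tokens and named entities are given as character offsets, both lists are sorted, and entities do not overlap. Every token lying entirely inside an entity is replaced by one token covering the entity. The class TokenMergerSketch and its whitespace tokenizer are hypothetical and do not reflect the TokenMerger component's parameters.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenMergerSketch {
    /** A character span [begin, end) over the document text. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Replaces the tokens inside each named entity with one token covering the entity. */
    static List<Span> merge(List<Span> tokens, List<Span> entities) {
        List<Span> result = new ArrayList<>();
        int e = 0;           // index of the next entity that may still cover a token
        int lastMerged = -1; // index of the entity already emitted as a merged token
        for (Span t : tokens) {
            while (e < entities.size() && entities.get(e).end <= t.begin) {
                e++; // this entity ends before the current token starts
            }
            boolean inside = e < entities.size()
                    && entities.get(e).begin <= t.begin && t.end <= entities.get(e).end;
            if (!inside) {
                result.add(t);
            } else if (lastMerged != e) {
                // First token of this entity: emit a single token spanning the whole entity.
                result.add(new Span(entities.get(e).begin, entities.get(e).end));
                lastMerged = e;
            }
            // Further tokens inside the same entity are dropped.
        }
        return result;
    }

    /** Whitespace tokenizer, used only to build the demo input. */
    static List<Span> tokenize(String text) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && text.charAt(i) == ' ') i++;
            int start = i;
            while (i < text.length() && text.charAt(i) != ' ') i++;
            if (i > start) spans.add(new Span(start, i));
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "She visited New York today";
        List<Span> entities = new ArrayList<>();
        entities.add(new Span(12, 20)); // "New York"
        for (Span s : merge(tokenize(text), entities)) {
            System.out.println(text.substring(s.begin, s.end)); // She, visited, New York, today
        }
    }
}
```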
I/O modules
- Added modules:
  - bincas - binary serialization for CAS, much faster than XMI
  - JdbcReader - generic reader for SQL databases
- Added parameter to escape characters when the document ID is used as a file name.
- Added support for more compression methods in most writers
- Added support for custom Spring resource loaders, e.g. to support reading from HDFS using Spring Hadoop
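Document IDs such as URLs may contain characters like "/", ":" or "?" that are invalid in file names or would create subdirectories. The following is a minimal sketch of one possible escaping scheme, percent-encoding every UTF-8 byte outside a small safe set. The class FileNameEscaper and the choice of safe characters are assumptions, not the actual behavior of the new parameter.

```java
import java.nio.charset.StandardCharsets;

final class FileNameEscaper {
    /** Characters kept as-is (assumed safe set); everything else is percent-encoded. */
    private static boolean isSafe(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '.' || c == '-' || c == '_';
    }

    /**
     * Percent-encodes the UTF-8 bytes of every unsafe character in a document ID.
     * IDs consisting only of dots (such as "..") are not handled specially in this sketch.
     */
    static String escape(String documentId) {
        StringBuilder sb = new StringBuilder();
        for (byte b : documentId.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            if (b >= 0 && isSafe(c)) {
                sb.append(c); // ASCII byte from the safe set
            } else {
                sb.append('%').append(String.format("%02X", b & 0xFF));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("http://example.org/doc?id=1"));
        // http%3A%2F%2Fexample.org%2Fdoc%3Fid%3D1
    }
}
```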
IMS Open Corpus Workbench Support
- Added support for compressing using cwb-huffcode and cwb-compress-rdx
- Added support for writing coarse part-of-speech tags (DKPro type names)
- Added better support for WaCky corpora, e.g. to generate document IDs
PDF support
- Changed to pdfbox 1.7.0
- Changed to extend ResourceCollectionReaderBase
- Added parameters to control start and end page
- Added support for progress information
Text support
- Added parameter to control extension of written text files
TEI support
- Added support for the Digitale Bibliothek of TextGrid
- Added support for TEI documents containing multiple texts.
Web1t support
- Added option to generate lowercase ngrams (off by default)
- Added option to generate jweb1t indexes (on by default)
- Added possibility to write all ngrams to one file by setting the threshold to 0 or a negative value
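Web1T-style n-gram files list one n-gram per line, followed by a tab and its count. The following sketch, with the hypothetical class NGramCounter, counts n-grams of a single order and shows the effect of optional lowercasing. It does not build jweb1t indexes or split output across files, and the lexicographic order is chosen here for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class NGramCounter {
    /** Counts n-grams of order n and renders them as "w1 w2 ...<TAB>count" lines. */
    static List<String> countLines(List<String> tokens, int n, boolean lowercase) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps n-grams sorted
        for (int i = 0; i + n <= tokens.size(); i++) {
            String ngram = String.join(" ", tokens.subList(i, i + n));
            if (lowercase) {
                ngram = ngram.toLowerCase(Locale.ROOT); // "The cat" and "the cat" are merged
            }
            counts.merge(ngram, 1, Integer::sum);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "cat", "saw", "the", "cat");
        countLines(tokens, 2, true).forEach(System.out::println);  // 3 bigram lines
        countLines(tokens, 2, false).forEach(System.out::println); // 4 bigram lines
    }
}
```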
Wikipedia support
- Added support for reading a predefined set of revisions with the WikipediaRevisionReader.
- Improved template cleaning
DKPro Core 1.4.0 (ASL)
DKPro Core ASL 1.4.0 is a major release. It is not expected to be binary backwards compatible with previous versions. This page lists some of the more important changes and caveats regarding this version. For more detailed information on the changes in this version, please refer to the issue tracker or to the Git history.
General
- First release on Maven Central.
- New infrastructure and parameters for loading models and configuring type mapping.
- New versioning scheme for models and standardized model artifact names.
- Changed various parameters in all components to follow a common naming scheme.
- Added support for printing tagset information when a model is loaded (in most components)
- Added various new and updated mappings for POS tags (Chinese, English, Estonian, French, German, ...).
- Added modules to support unit tests and to measure performance
- Fixed problems with paths containing spaces and paths on Windows systems
- Added support for POS mapping in various readers
- Changed types for compound words
Analysis modules
- Added modules:
  - MeCab - part-of-speech tagger for Japanese
  - MaltParser - dependency parser
CAS transformation module
- Fixed several bugs
- Improved performance (slightly)
JWordSplitter integration module
- Changed to current JWordSplitter version
LanguageTool integration module
- Changed to LanguageTool 1.9
- Added LanguageToolSegmenter - supports many languages
OpenNLP integration module
- Added OpenNlpParser
- Added OpenNlpPosTagger
- Added OpenNlpSegmenter
Tokenizer module
- Added TokenMerger component to merge multi-word named entities into a single token
I/O modules
- Added modules:
  - bincas - binary serialization for CAS, much faster than XMI
  - JdbcReader - generic reader for SQL databases
- Added parameter to escape characters when the document ID is used as a file name.
- Added support for more compression methods in most writers
- Added support for custom Spring resource loaders, e.g. to support reading from HDFS using Spring Hadoop
IMS Open Corpus Workbench Support
- Added support for compressing using cwb-huffcode and cwb-compress-rdx
- Added support for writing coarse part-of-speech tags (DKPro type names)
- Added better support for WaCky corpora, e.g. to generate document IDs
PDF support
- Changed to pdfbox 1.7.0
- Changed to extend ResourceCollectionReaderBase
- Added parameters to control start and end page
- Added support for progress information
Text support
- Added parameter to control extension of written text files
TEI support
- Added support for the Digitale Bibliothek of TextGrid
- Added support for TEI documents containing multiple texts.
Web1t support
- Added option to generate lowercase ngrams (off by default)
- Added option to generate jweb1t indexes (on by default)
- Added possibility to write all ngrams to one file by setting the threshold to 0 or a negative value
Wikipedia support
- Added support for reading a predefined set of revisions with the WikipediaRevisionReader.
- Improved template cleaning
DKPro Core 1.2.0 (GPL)
DKPro Core ASL 1.3.0 is a major release. It is not expected to be binary backwards compatible with previous versions. This page lists some of the more important changes and caveats regarding this version. For more detailed information on the changes in this version, please refer to the issue tracker or to the Git history.
General
- Fixed several issues with DocumentMetaData.
- Changed features of some DKPro types so that their names start with a lower-case letter. This breaks XMI file compatibility with older DKPro versions.
- Added new base class JCasFileWriter_ImplBase.
Analysis modules
TreeTagger integration
- Upgraded to TT4J 1.0.16 to support the Chinese model.
- Works with any model now, even if no mapping is provided.
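One way to read "works with any model even if no mapping is provided" is that tags without a mapping fall back to a generic part-of-speech category instead of causing an error, while the original tag is kept. A hypothetical sketch of such a fallback follows; the class PosMapper, the category names and the STTS example entries are illustrative assumptions, not DKPro Core's mapping files.

```java
import java.util.HashMap;
import java.util.Map;

final class PosMapper {
    /** Generic category used when a tag has no mapping (hypothetical name). */
    static final String FALLBACK = "POS";

    private final Map<String, String> mapping;

    /** The mapping may be empty, e.g. for a model that ships without a mapping file. */
    PosMapper(Map<String, String> mapping) {
        this.mapping = new HashMap<>(mapping);
    }

    /** Maps a model-specific tag to a coarse category, falling back to the generic one. */
    String coarse(String tag) {
        return mapping.getOrDefault(tag, FALLBACK);
    }

    public static void main(String[] args) {
        Map<String, String> stts = new HashMap<>();
        stts.put("NN", "N");
        stts.put("VVFIN", "V");
        PosMapper mapped = new PosMapper(stts);
        PosMapper unmapped = new PosMapper(new HashMap<>());
        System.out.println(mapped.coarse("NN"));   // N
        System.out.println(mapped.coarse("XY"));   // POS
        System.out.println(unmapped.coarse("NN")); // POS
    }
}
```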
I/O modules
British National Corpus Support
- Added BncReader.
IMS Open Corpus Workbench Support
- Added ImsCwbReader and ImsCwbWriter. ImsCwbWriter can use a local CWB installation to directly write the index format.
NeGra Export Format support
- Can now read version 3 files (TIGER Corpus).
- Fixed several issues.
TEI support
- Added TeiReader, mainly to be able to read text from the TEI version of the Brown Corpus for now.
Web1t support
- New Web1TFormatWriter, which uses an external sort mechanism to support larger n-gram models.
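External sorting keeps memory use bounded: the input is cut into chunks that fit in memory, each chunk is sorted and written to a temporary run file, and the runs are then merged with a priority queue. The sketch below illustrates this general technique with the hypothetical class ExternalSort; it is not the Web1TFormatWriter code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class ExternalSort {
    /** Sorts the input lines into target, holding at most maxLinesInMemory lines per chunk. */
    static void sort(Iterator<String> input, int maxLinesInMemory, Path target) throws IOException {
        // Phase 1: cut the input into chunks, sort each chunk and spill it to a temporary run file.
        List<Path> runs = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        while (input.hasNext()) {
            buffer.add(input.next());
            if (buffer.size() == maxLinesInMemory || !input.hasNext()) {
                Collections.sort(buffer);
                Path run = Files.createTempFile("run", ".txt");
                Files.write(run, buffer, StandardCharsets.UTF_8);
                runs.add(run);
                buffer.clear();
            }
        }
        // Phase 2: k-way merge; the heap holds the smallest unread line of every run.
        List<BufferedReader> readers = new ArrayList<>();
        PriorityQueue<Head> heap = new PriorityQueue<>();
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) heap.add(new Head(first, reader));
            }
            while (!heap.isEmpty()) {
                Head smallest = heap.poll();
                out.write(smallest.line);
                out.newLine();
                String next = smallest.reader.readLine();
                if (next != null) heap.add(new Head(next, smallest.reader));
            }
        } finally {
            for (BufferedReader reader : readers) reader.close();
            for (Path run : runs) Files.deleteIfExists(run);
        }
    }

    /** Current line of one run together with the reader it came from. */
    private static final class Head implements Comparable<Head> {
        final String line;
        final BufferedReader reader;

        Head(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }

        @Override
        public int compareTo(Head other) {
            return line.compareTo(other.line);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Arrays.asList("the cat\t2", "a dog\t1", "cat saw\t1", "saw the\t1", "a cat\t3");
        Path target = Files.createTempFile("sorted", ".txt");
        sort(lines.iterator(), 2, target); // produces three runs of at most two lines each
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
        Files.delete(target);
    }
}
```

During the merge only one line per run sits in the heap, so memory use in that phase depends on the number of runs rather than on the size of the input.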
Wikipedia support
- Upgraded to JWPL 0.9.0.
- Added WikipediaPageReader, which reads articles and discussion pages.
DKPro Core 1.2.0 (ASL)
DKPro Core ASL 1.3.0 is a major release. It is not expected to be binary backwards compatible with previous versions. This page lists some of the more important changes and caveats regarding this version. For more detailed information on the changes in this version, please refer to the issue tracker or to the Git history.
General
- Fixed several issues with DocumentMetaData.
- Changed features of some DKPro types so that their names start with a lower-case letter. This breaks XMI file compatibility with older DKPro versions.
- Added new base class JCasFileWriter_ImplBase.
Analysis modules
TreeTagger integration
- Upgraded to TT4J 1.0.16 to support the Chinese model.
- Works with any model now, even if no mapping is provided.
I/O modules
British National Corpus Support
- Added BncReader.
IMS Open Corpus Workbench Support
- Added ImsCwbReader and ImsCwbWriter. ImsCwbWriter can use a local CWB installation to directly write the index format.
NeGra Export Format support
- Can now read version 3 files (TIGER Corpus).
- Fixed several issues.
TEI support
- Added TeiReader, mainly to be able to read text from the TEI version of the Brown Corpus for now.
Web1t support
- New Web1TFormatWriter, which uses an external sort mechanism to support larger n-gram models.
Wikipedia support
- Upgraded to JWPL 0.9.0.
- Added WikipediaPageReader, which reads articles and discussion pages.