
Coreferee v1.2.0

@richardpaulhudson richardpaulhudson released this 06 May 08:04

Removed the dependencies on TensorFlow and Keras, switching to Thinc as the neural network platform. The serialized Thinc models are around 30% of the size of the old models, and the switch also lifts the old limitation that nlp.pipe() could not be called with n_process > 1 when using forked processes.
Implemented a softmax layer that selects the best potential referent for each anaphor, instead of calculating an independent score for each anaphor-referent pair.
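To illustrate the difference, here is a minimal sketch of softmax-based selection in plain Python. It is not Coreferee's implementation: the function names, the candidate strings and the raw scores are all hypothetical, and the real model computes its scores inside a Thinc network.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_referent(candidates, scores):
    """Hypothetical helper: choose the most probable referent for one anaphor."""
    probs = softmax(scores)
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best], probs


# Illustrative candidates and scores, not real model output.
referent, probabilities = pick_referent(
    ["Peter", "the dog", "the house"], [2.0, 0.5, -1.0]
)
```

Because the probabilities are normalized across all candidates for one anaphor, raising the score of one candidate lowers the probability of the others, which independent per-pair scores would not do.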
Added matrix tests to support a variety of Python and spaCy versions, including spaCy 3.2 and spaCy 3.3.
Implemented a stable-random split into training and test corpora, instead of using the last 20% of loaded documents as the test corpus.
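One common way to obtain a split that looks random but is reproducible across runs is to hash each document and assign it by the hash value. The sketch below assumes this hashing approach purely for illustration; the release notes do not say how Coreferee implements its split, and `is_test_document` and `split_corpus` are hypothetical names.

```python
import hashlib


def is_test_document(doc_text, test_fraction=0.2):
    """Map a document to a stable value in [0, 1) and compare it to the fraction."""
    digest = hashlib.sha256(doc_text.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < test_fraction


def split_corpus(docs, test_fraction=0.2):
    """Split documents into (train, test) independently of their loading order."""
    train, test = [], []
    for doc in docs:
        if is_test_document(doc, test_fraction):
            test.append(doc)
        else:
            train.append(doc)
    return train, test


docs = [f"Document number {i}." for i in range(50)]
train_docs, test_docs = split_corpus(docs)
```

With a scheme like this, a given document lands in the same corpus on every run, regardless of the order in which documents are loaded. The test share is only approximately 20% for a small corpus.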
Improved the training script so that it remembers the model state at each epoch and chooses the best-performing state from the training history as the model to save.
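The idea of keeping a training history and saving the best state can be sketched as follows. This is a generic illustration rather than the actual training script: `train_with_best_state`, the callbacks and the toy model state are hypothetical stand-ins.

```python
import copy


def train_with_best_state(model_state, train_one_epoch, evaluate, epochs):
    """Train for a fixed number of epochs and return the best-scoring snapshot."""
    history = []
    for epoch in range(epochs):
        train_one_epoch(model_state)  # updates model_state in place
        score = evaluate(model_state)
        # Store a deep copy so that later epochs cannot alter the snapshot.
        history.append((score, epoch, copy.deepcopy(model_state)))
    best_score, best_epoch, best_state = max(history, key=lambda item: item[0])
    return best_state, best_epoch, best_score


# Toy stand-ins: the "model" is one weight, and the score peaks when it equals 3.
def toy_train(state):
    state["w"] += 1


def toy_evaluate(state):
    return -((state["w"] - 3) ** 2)


best_state, best_epoch, best_score = train_with_best_state(
    {"w": 0}, toy_train, toy_evaluate, epochs=5
)
```

In the toy run the score rises and then falls as training continues, so the snapshot that gets returned comes from a middle epoch instead of the final one.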
Added the coreferee check command to enable performance measurement for an existing Coreferee model with a new spaCy model.