Commit

Merge pull request #10 from richardpaulhudson/support/3.4.1
Support for v3.4.1 English spaCy models
richardpaulhudson authored Nov 9, 2022
2 parents 133cd91 + 2c107e6 commit e239393
Showing 6 changed files with 26 additions and 10 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/test-coreferee.yml
@@ -12,6 +12,10 @@ jobs:
spacy_version: ['3.3.0']
click_version: ['8.0.1']
include:
+ - os: 'ubuntu-latest'
+ python-version: '3.9'
+ spacy_version: '3.4.2'
+ click_version: '8.0.1'
- os: 'ubuntu-latest'
python-version: '3.9'
spacy_version: '3.2.0'
11 changes: 9 additions & 2 deletions README.md
@@ -31,6 +31,7 @@ Author: <a href="mailto:[email protected]">Richard Paul Hudson, Explosion AI<
- [6.6 Version 1.1.3](#version-113)
- [6.7 Version 1.2.0](#version-120)
- [6.8 Version 1.3.0](#version-130)
+ - [6.9 Version 1.3.1](#version-131)
- [7. Open issues/requests for assistance](#open-issues)

<a id="introduction"></a>
@@ -43,7 +44,7 @@ Author: <a href="mailto:[email protected]">Richard Paul Hudson, Explosion AI<

Coreferences are situations where two or more words within a text refer to the same entity, e.g. _**John** went home because **he** was tired_. Resolving coreferences is an important general task within the natural language processing field.

- Coreferee is a Python 3 library (tested with versions 3.6—3.10) that is used together with [spaCy](https://spacy.io/) (tested with versions 3.0.0—3.4.0) to resolve coreferences within English, French, German and Polish texts. It is designed so that it is easy to add support for new languages. It uses a mixture of neural networks and programmed rules.
+ Coreferee is a Python 3 library (tested with versions 3.6—3.10) that is used together with [spaCy](https://spacy.io/) (tested with versions 3.0.0—3.4.2) to resolve coreferences within English, French, German and Polish texts. It is designed so that it is easy to add support for new languages. It uses a mixture of neural networks and programmed rules.

The library was originally developed at [msg systems](https://www.msg.group/en), but is now being maintained at [Explosion AI](https://explosion.ai).

@@ -273,7 +274,7 @@ Coreferee started life to assist the [Holmes](https://github.com/explosion/holme
<table style="text-align:center; vertical-align:middle">
<tr><td rowspan="2">ISO 639-1</td><td rowspan="2">Language</td><td rowspan="2">Training corpora</td><td rowspan="2">Total words in training corpora</td><td colspan="2"><code>*_trf</code> models</td><td colspan="2"><code>*_lg</code> models</td><td colspan="2"><code>*_md</code> models</td><td colspan="2"><code>*_sm</code> models</td></tr>
<tr><td align="center">Anaphors in 20%</td><td align="center">Accuracy (%)</td><td align="center">Anaphors in 20%</td><td align="center">Accuracy (%)</td><td align="center">Anaphors in 20%</td><td align="center">Accuracy (%)</td><td align="center">Anaphors in 20%</td><td align="center">Accuracy (%)</td></tr>
- <tr><td align="center">en</td><td align="center">English</td><td align="center"><a href="https://opus.nlpl.eu/ParCor/">ParCor</a>/<a href="https://github.com/dbamman/litbank"> LitBank</a></td><td align="center">393564</td><td align="center"><b>2500—2580</b></td><td align="center"><b>80—83</b><td align="center"><b>2480—2520</b></td><td align="center"><b>81—82</b></td></td><td align="center">2480—2510</td><td align="center">81</td><td align="center">2540—2560</td><td align="center">81—82</td></tr>
+ <tr><td align="center">en</td><td align="center">English</td><td align="center"><a href="https://opus.nlpl.eu/ParCor/">ParCor</a>/<a href="https://github.com/dbamman/litbank"> LitBank</a></td><td align="center">393564</td><td align="center"><b>2500—2580</b></td><td align="center"><b>80—83</b><td align="center"><b>2480—2520</b></td><td align="center"><b>81—82</b></td></td><td align="center">2480—2510</td><td align="center">81-83</td><td align="center">2510—2560</td><td align="center">81—82</td></tr>
<tr><td align="center">de</td><td align="center">German</td><td align="center"><a href="https://opus.nlpl.eu/ParCor/">ParCor</a></td><td align="center">164300</td><td align="center">-</td><td align="center">-</td><td align="center"><b>530—570</b></td><td align="center"><b>79—80</b></td><td align="center">520—550</td><td align="center">79—80</td><td align="center">530—550</td><td align="center">76—79</td></tr>
<tr><td align="center">fr</td><td align="center">French</td><td align="center"><a href="https://www.ortolang.fr/market/corpora/democrat/v1.1">DEMOCRAT</a></td><td align="center">323754</td><td align="center">-</td><td align="center">-</td><td align="center"><b>1270—1280</b></td><td align="center"><b>71—72</b></td><td align="center">1280—1300</td><td align="center">68—70</td><td align="center">1130—1140</td><td align="center">63—64</td></tr>
<tr><td align="center">pl</td><td align="center">Polish</td><td align="center"><a href="http://zil.ipipan.waw.pl/PolishCoreferenceCorpus">PCC</a></td><td align="center">548268</td><td align="center">-</td><td align="center">-</td><td align="center"><b>1730—1790</b></td><td align="center"><b>72—76</b></td><td align="center">1740—1790</td><td align="center">70—75</td><td align="center">-</td><td align="center">-</td></tr>
@@ -628,6 +629,12 @@ The initial open-source version.

- Added support for spaCy v3.4 for English, German and Polish.

+ <a id="version-131"></a>
+
+ ##### 6.9 Version 1.3.1
+
+ - Added support for the v3.4.1 English models.
+
<a id="open-issues"></a>

### 7. Open issues / requests for assistance
2 changes: 1 addition & 1 deletion SHORTREADME.md
@@ -1,6 +1,6 @@
Coreferences are situations where two or more words within a text refer to the same entity, e.g. _**John** went home because **he** was tired_. Resolving coreferences is an important general task within the natural language processing field.

- Coreferee is a Python 3 library (tested with versions 3.6—3.10) that is used together with [spaCy](https://spacy.io/) (tested with versions 3.0.0—3.4.0) to resolve coreferences within English, French, German and Polish texts. It is designed so that it is easy to add support for new languages. It uses a mixture of neural networks and programmed rules.
+ Coreferee is a Python 3 library (tested with versions 3.6—3.10) that is used together with [spaCy](https://spacy.io/) (tested with versions 3.0.0—3.4.2) to resolve coreferences within English, French, German and Polish texts. It is designed so that it is easy to add support for new languages. It uses a mixture of neural networks and programmed rules.

The library was originally developed at [msg systems](https://www.msg.group/en), but is now being maintained at [Explosion AI](https://explosion.ai).

8 changes: 4 additions & 4 deletions coreferee/lang/en/config.cfg
@@ -26,25 +26,25 @@ vectors_model: core_web_lg
[lg_3_4_0]
model: core_web_lg
from_version: 3.4.0
- to_version: 3.4.0
+ to_version: 3.4.1
train_version: 3.4.0

[md_3_4_0]
model: core_web_md
from_version: 3.4.0
- to_version: 3.4.0
+ to_version: 3.4.1
train_version: 3.4.0

[sm_3_4_0]
model: core_web_sm
from_version: 3.4.0
- to_version: 3.4.0
+ to_version: 3.4.1
train_version: 3.4.0

[trf_3_4_0]
model: core_web_trf
from_version: 3.4.0
- to_version: 3.4.0
+ to_version: 3.4.1
train_version: 3.4.0
vectors_model: core_web_lg
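
The loading code that consumes these sections is not part of this diff. Assuming that `from_version` and `to_version` bound an inclusive range of model versions that a trained section accepts, raising `to_version` from 3.4.0 to 3.4.1 is what lets the 3.4.0-trained English models serve v3.4.1 models as well. The sketch below illustrates that reading on an excerpt of the config; `parse_version` and `find_section` are hypothetical helpers, not Coreferee functions.

```python
import configparser
from typing import Optional, Tuple

# Excerpt of coreferee/lang/en/config.cfg as changed by this commit,
# trimmed to two sections (md, trf and vectors_model are omitted).
CONFIG_TEXT = """
[lg_3_4_0]
model: core_web_lg
from_version: 3.4.0
to_version: 3.4.1
train_version: 3.4.0

[sm_3_4_0]
model: core_web_sm
from_version: 3.4.0
to_version: 3.4.1
train_version: 3.4.0
"""


def parse_version(text: str) -> Tuple[int, ...]:
    """Hypothetical helper: turn '3.4.1' into (3, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_section(
    config: configparser.ConfigParser, model: str, model_version: str
) -> Optional[str]:
    """Hypothetical lookup: return the first section for `model` whose
    inclusive [from_version, to_version] range contains `model_version`."""
    wanted = parse_version(model_version)
    for name in config.sections():
        section = config[name]
        if section["model"] != model:
            continue
        low = parse_version(section["from_version"])
        high = parse_version(section["to_version"])
        if low <= wanted <= high:
            return name
    return None


# configparser accepts both "key: value" and "key = value" lines by default.
config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
```

Under this sketch, `find_section(config, "core_web_sm", "3.4.1")` now resolves to `sm_3_4_0`, while a hypothetical 3.4.2 model version would still match no section. Comparing integer tuples rather than raw strings avoids treating "3.10.0" as smaller than "3.4.1".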

2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = coreferee
- version = 1.3.0
+ version = 1.3.1
description = Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages
long_description = file: SHORTREADME.md
long_description_content_type = text/markdown
9 changes: 7 additions & 2 deletions tests/en/test_rules_en.py
@@ -435,8 +435,12 @@ def test_potential_pair_he_she_antecedent_person_noun(self):
     @unittest.skipIf(train_version_mismatch, train_version_mismatch_message)
     def test_potential_pair_he_she_antecedent_non_person_proper_noun(self):
         self.compare_potential_pair(
-            "I worked for Skateboards plc. She was there", 4, False, 6, 1,
-            excluded_nlps=["core_web_sm"]
+            "I worked for Skateboards plc. She was there",
+            4,
+            False,
+            6,
+            1,
+            excluded_nlps=["core_web_sm"],
         )

     def test_potential_pair_it_exclusively_person_antecedent(self):
@@ -491,6 +495,7 @@ def test_potential_pair_she_exclusively_male_name_compound_antecedent_control(se
     def test_potential_pair_person_word_non_capitalized(self):
         self.compare_potential_pair("I saw a job. He was there", 3, False, 5, 0)

+    @unittest.skipIf(train_version_mismatch, train_version_mismatch_message)
     def test_potential_pair_person_word_capitalized(self):
         self.compare_potential_pair("I saw Job. He was there", 2, False, 4, 1)
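
This commit widens `to_version` to 3.4.1 while keeping `train_version` at 3.4.0, and it adds `@unittest.skipIf(train_version_mismatch, train_version_mismatch_message)` to `test_potential_pair_person_word_capitalized`. A plausible reading is that tests whose expected results depend on one specific model release are skipped when the installed model differs from the version Coreferee was trained with. The sketch below shows that pattern in isolation; the version constants, how the flag is computed and the test names are all hypothetical, since the real definitions are not in this diff.

```python
import unittest

# Hypothetical stand-ins: the real suite defines train_version_mismatch and
# train_version_mismatch_message elsewhere; their exact logic is not shown here.
TRAIN_VERSION = "3.4.0"            # train_version in config.cfg
INSTALLED_MODEL_VERSION = "3.4.1"  # assumed version of the loaded English model

train_version_mismatch = INSTALLED_MODEL_VERSION != TRAIN_VERSION
train_version_mismatch_message = "Model version {} differs from training version {}".format(
    INSTALLED_MODEL_VERSION, TRAIN_VERSION
)


class ExampleRulesTest(unittest.TestCase):
    @unittest.skipIf(train_version_mismatch, train_version_mismatch_message)
    def test_sensitive_to_model_output(self):
        # Placeholder for an assertion that relies on how one particular
        # model release tags or parses a sentence.
        self.assertTrue(True)

    def test_independent_of_model_version(self):
        # Runs whichever model version is installed.
        self.assertEqual(len("Job"), 3)
```

Running this class reports one skipped test and one pass. Skipped tests are not counted as failures, so the suite still succeeds against the newer model while the version-sensitive assertion is set aside.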
