A spaCy custom component that extracts and normalizes dates and other temporal expressions.
- 💥 Extract dates and durations in several languages
- 💥 Normalize dates to timestamps, or normalize dates and durations to the TimeML TIMEX3 standard

Currently supported languages:
- 🇩🇪 German
- 🇬🇧 English
- 🇫🇷 French
pip install timexy
After installation, add the timexy component to any spaCy pipeline to extract and normalize dates and other temporal expressions:
import spacy
from timexy import Timexy
nlp = spacy.load("en_core_web_sm")
# Optionally add config if varying from default values
config = {
    "kb_id_type": "timex3",  # possible values: 'timex3' (default), 'timestamp'
    "label": "timexy",       # default: 'timexy'
    "overwrite": False       # default: False
}
nlp.add_pipe("timexy", config=config, before="ner")
doc = nlp("Today is the 10.10.2010. I was in Paris for six years.")
for e in doc.ents:
    print(f"{e.text}\t{e.label_}\t{e.kb_id_}")
>>> 10.10.2010 timexy TIMEX3 type="DATE" value="2010-10-10T00:00:00"
>>> six years timexy TIMEX3 type="DURATION" value="P6Y"
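The kb_id_ strings in the output above carry the normalized type and value as quoted attributes. As a minimal sketch, assuming the strings always follow the shape shown in that output, the attributes can be pulled into a dict with a regular expression. The helper name parse_timex3 is hypothetical and not part of Timexy.

```python
import re

# Matches attribute pairs such as type="DATE" or value="P6Y".
# Assumption: kb_id_ strings look like the two output lines shown above.
_ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')


def parse_timex3(kb_id: str) -> dict:
    """Return the attributes of a TIMEX3 string as a dict (hypothetical helper)."""
    if not kb_id.startswith("TIMEX3"):
        raise ValueError(f"not a TIMEX3 string: {kb_id!r}")
    return dict(_ATTR_PATTERN.findall(kb_id))


date_attrs = parse_timex3('TIMEX3 type="DATE" value="2010-10-10T00:00:00"')
duration_attrs = parse_timex3('TIMEX3 type="DURATION" value="P6Y"')

print(date_attrs["type"], date_attrs["value"])
print(duration_attrs["type"], duration_attrs["value"])
```

In a real pipeline the input would be e.kb_id_ for each entity in doc.ents; the literal strings here only stand in for that so the snippet runs without spaCy installed.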
Timexy can normalize temporal expressions to either of two formats:
- TimeML TIMEX3 standard
- timestamp
The format is selected with the kb_id_type config parameter:
config = {
    "kb_id_type": "timex3",  # possible values: 'timex3' (default), 'timestamp'
    "label": "timexy",       # default: 'timexy'
    "overwrite": False       # default: False
}
nlp.add_pipe("timexy", config=config, before="ner")
NOTE: Temporal expressions that are not concrete dates (durations, for example) cannot be meaningfully normalized to a timestamp. All non-date temporal expressions are therefore always normalized to timex3, regardless of the kb_id_type config.
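The note can be illustrated with a small sketch that works on TIMEX3 strings directly: a DATE value names a single point in time and so maps to a timestamp, while a DURATION such as P6Y has no single point in time to map to. The function timex3_to_timestamp is a hypothetical helper, not Timexy's own implementation, and treating the date as UTC is an assumption made here to keep the result deterministic.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Matches attribute pairs such as type="DATE" or value="P6Y".
_ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')


def timex3_to_timestamp(kb_id: str) -> Optional[float]:
    """Convert a TIMEX3 DATE string to a POSIX timestamp, else return None."""
    attrs = dict(_ATTR_PATTERN.findall(kb_id))
    value = attrs.get("value")
    if attrs.get("type") != "DATE" or value is None:
        # Durations and other non-date expressions have no single timestamp.
        return None
    moment = datetime.fromisoformat(value)
    # Assumption: the value carries no time zone, so it is read as UTC.
    return moment.replace(tzinfo=timezone.utc).timestamp()


date_ts = timex3_to_timestamp('TIMEX3 type="DATE" value="2010-10-10T00:00:00"')
duration_ts = timex3_to_timestamp('TIMEX3 type="DURATION" value="P6Y"')

print(date_ts)      # a float number of seconds since the epoch
print(duration_ts)  # None
```

Returning None for non-dates mirrors the behavior described in the note: the caller keeps the TIMEX3 form whenever no timestamp exists.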
To contribute, please refer to the contributing guidelines.