In this document we build an application for the slot filling task of the TAC KBP competition. Slot filling involves extracting information about entities in text: the goal is to use a seed knowledge base to create an augmented knowledge base of (entity, relation, entity) tuples, where the second entity is referred to as the "slot". The relations are defined by the competition guidelines.
This example uses a sample of the data for the 2010 task. The data provided with this example application is only 0.2% of the original corpus, so the recall (and thus the F1 score) will be low. However, using 100% of the 2010 corpus, this example system achieves an F1 score of XX on the KBP task, which beats the top result of 29 from the 2010 competition.
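To see why low recall drags the F1 score down even when precision is unchanged, here is a minimal sketch of the standard F1 formula (the harmonic mean of precision and recall). The precision and recall values are hypothetical and chosen only for illustration; they are not results from this system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical numbers, chosen only to illustrate the effect of low recall.
high_recall = f1_score(precision=0.5, recall=0.5)
low_recall = f1_score(precision=0.5, recall=0.01)

print(f"F1 with recall 0.50: {high_recall:.3f}")
print(f"F1 with recall 0.01: {low_recall:.3f}")
```

Because F1 is a harmonic mean, it stays close to the smaller of the two values, so a system that sees only a small fraction of the corpus cannot score well no matter how precise it is.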
Note that in order to run the system on the full data set, you need to replace 2 tables with their full versions (but with the same exact schema): `sentence` and `freebase`.
The application is an extension of the mention-level extraction system, so please make sure you have gone through that part of the tutorial and have an understanding of basic relation extraction using DeepDive. The main difference here is that we are now concerned with entity-level relationships, not mention-level. In other words, given the following input:
- a set of sentences with NLP features
- a set of Freebase entities
- an entity-level training set of the form (entity1, relation, entity2),
instead of producing a set of (mention1, relation, mention2) tuples as the final output, we want to produce tuples of the form (entity1, relation, entity2).
Note that in order to obtain the entity-level result we need to perform entity linking, which associates mentions in text with Freebase entities (for example, the mentions "Barack Hussein Obama" and "President Barack Obama" both refer to the entity Barack Obama).
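The following sketch illustrates how entity linking turns mention-level tuples into entity-level tuples. It is not the DeepDive implementation: the lookup table, the function name `to_entity_level`, and the extra mention "Michelle" are assumptions made for illustration, and a real linker would resolve mentions against Freebase rather than a hard-coded dictionary.

```python
# Hypothetical linking table: mention text -> entity name.
# A real system would derive this from Freebase, not hard-code it.
MENTION_TO_ENTITY = {
    "Barack Hussein Obama": "Barack Obama",
    "President Barack Obama": "Barack Obama",
    "Michelle Obama": "Michelle Obama",
    "Michelle": "Michelle Obama",
}


def to_entity_level(mention_tuples, linking):
    """Map (mention1, relation, mention2) tuples to a set of
    (entity1, relation, entity2) tuples.

    Tuples in which either mention cannot be linked are dropped.
    """
    entity_tuples = set()
    for mention1, relation, mention2 in mention_tuples:
        entity1 = linking.get(mention1)
        entity2 = linking.get(mention2)
        if entity1 is not None and entity2 is not None:
            entity_tuples.add((entity1, relation, entity2))
    return entity_tuples


mention_level = [
    ("Barack Hussein Obama", "spouse", "Michelle Obama"),
    ("President Barack Obama", "spouse", "Michelle"),
    ("President Barack Obama", "spouse", "Unknown Person"),  # unlinked mention
]

entity_level = to_entity_level(mention_level, MENTION_TO_ENTITY)
print(entity_level)
```

The first two mention-level tuples collapse into one entity-level tuple, and the third is dropped because one of its mentions has no linked entity.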
This tutorial will walk you through building a full DeepDive application that extracts relationships between entities in raw text. We use news articles and blogs as our input data and want to extract all pairs of entities that participate in the KBP relations (e.g. Barack Obama and Michelle Obama for the `spouse` relation).
The application performs the following high-level steps:
- Load data from provided database dump
- Extract features. This includes steps to:
  - Extract entity mentions from sentences
  - Extract lexical and syntactic features from mention-level relation candidates (entity mention pairs in the same sentence)
  - Link Freebase entities to mentions in text (entity linking)
  - Generate positive and negative training examples for relation candidates
  - Extract the non-example mention-level relation candidates
  - Extract the entity-level relation candidates by combining the mention-level candidates with entity linking
- Generate a factor graph using inference rules
- Perform inference and learning
- Generate results
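As a rough illustration of two of the steps above, the sketch below generates mention-level relation candidates from the mentions in one sentence and marks a candidate as a positive training example when its linked entity pair appears in the entity-level training set. All names here (`mention_pair_candidates`, `label_candidate`, the sample data) are hypothetical, and the sketch does not model how the application chooses negative examples: anything that is not a known positive is simply left unlabeled.

```python
from itertools import permutations


def mention_pair_candidates(sentence_mentions):
    """Return every ordered pair of distinct mentions from one sentence."""
    return list(permutations(sentence_mentions, 2))


def label_candidate(pair, linking, training_set, relation):
    """Return True if the linked entity pair is a known example of
    `relation` in the training set, otherwise None (unlabeled)."""
    entity1 = linking.get(pair[0])
    entity2 = linking.get(pair[1])
    if (entity1, relation, entity2) in training_set:
        return True
    return None


# Hypothetical sample data for a single sentence.
mentions = ["President Barack Obama", "Michelle Obama", "Chicago"]
linking = {
    "President Barack Obama": "Barack Obama",
    "Michelle Obama": "Michelle Obama",
    "Chicago": "Chicago",
}
training_set = {("Barack Obama", "spouse", "Michelle Obama")}

candidates = mention_pair_candidates(mentions)
labels = {
    pair: label_candidate(pair, linking, training_set, "spouse")
    for pair in candidates
}

for pair, label in labels.items():
    print(pair, label)
```

Ordered pairs are used so that a relation and its reverse direction remain separate candidates, which matters for relations that are not symmetric.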
Let us now go through the steps to get the example KBP system up and running.
- Installing DeepDive
- Setting up KBP application
- Running KBP application
- Evaluating the results
- Writing extractors
- Debugging extractors
- Writing inference rules
This tutorial assumes a working installation of DeepDive. Please go through the example application walkthrough before proceeding.
After following the walkthrough, your `deepdive` directory should contain a folder called `app`, which should contain a folder called `spouse`.
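If you want to confirm the layout before continuing, a small check like the following may help. It is only a sketch: the function name `missing_directories` is made up for this example, and the relative path `deepdive` is an assumption about where your installation lives, so adjust it to your setup.

```python
from pathlib import Path


def missing_directories(deepdive_root):
    """Return the expected directories that do not exist under deepdive_root."""
    root = Path(deepdive_root)
    expected = [root / "app", root / "app" / "spouse"]
    return [path for path in expected if not path.is_dir()]


if __name__ == "__main__":
    # "deepdive" is an assumed path; replace it with your installation directory.
    missing = missing_directories("deepdive")
    if missing:
        for path in missing:
            print(f"missing: {path}")
    else:
        print("directory layout looks as expected")
```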
Let's now proceed to a tutorial for setting up the KBP application, which will help you get started.