fferegrino/zeldaKG

         _     _       _   _______ 
        | |   | |     | | / /  __ \
 _______| | __| | __ _| |/ /| |  \/
|_  / _ \ |/ _` |/ _` |    \| | __ 
 / /  __/ | (_| | (_| | |\  \ |_\ \
/___\___|_|\__,_|\__,_\_| \_/\____/

A TLOZ-inspired knowledge graph.
  • Step 0: Gather a lot of wiki pages (a tool like HTTrack can help here); in this case, I downloaded a copy of the whole Zeldapedia and Zelda Wiki.
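As an alternative to a dedicated mirroring tool, the page URLs can be built with a small script. A minimal sketch in Python; the base URL and the helper name are illustrative assumptions, not part of this repo:

```python
from urllib.parse import quote

def page_url(base, title):
    # Hypothetical helper: turn an article title into the URL of its wiki page.
    # Wikis typically replace spaces with underscores in page names.
    return f"{base}/wiki/{quote(title.replace(' ', '_'))}"

# The resulting URLs could then be fetched with urllib.request, or fed to a crawler.
print(page_url("https://zelda.fandom.com", "Princess Zelda"))
```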

  • Step 1: If your crawlers/copiers were not configured correctly, the previous step may have left you with many useless pages, such as user pages, templates, or even forum pages. The purpose of this step is to reduce the number of files to process by filtering out documents whose names start with "User_", "Category_Zeldapedians_", "Message_Wall_", and similar. In this cleaning stage, the real content of each page (on Wikia, that is the `article` tag) is extracted and the templated site chrome is discarded.
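The filename filter described above can be sketched like this; the exact prefix list is an assumption extrapolated from the examples given:

```python
# Prefixes of non-content pages to discard ("Forum_" added as a guess,
# since forum pages are mentioned as unwanted).
UNWANTED_PREFIXES = ("User_", "Category_Zeldapedians_", "Message_Wall_", "Forum_")

def is_content_page(filename):
    # Keep only files that do not start with any unwanted prefix.
    return not filename.startswith(UNWANTED_PREFIXES)

files = ["Link.html", "User_Foo.html", "Message_Wall_Bar.html", "Forum_Baz.html"]
kept = [f for f in files if is_content_page(f)]
print(kept)  # only the real article survives
```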

  • Step 2: Information Extraction

    • Title-Link relationship: extract the relationship between each file and the title of the article it represents into two dataframes. See the Title-Link relationship notebook.
    • Infobox extraction: extract raw relationships between entities from the infobox of each page. The relationships are generated as JSON objects that are interpreted in the next step; there is one notebook for Gamepedia and one for Wikia.
    • Merge infobox sources: in this step we extract information from the infoboxes, such as Gender, Race, Appearances, and many more. See the Merge sources notebook.
    • Text extraction using spaCy: the text of each article is analysed with the spaCy package to extract raw relationships between a Resource and names (in the text_extraction notebook), which are then processed again to ground them to relationships between Resources that exist in our graph; this happens in text_extraction_processing.
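To illustrate the kind of output Step 2 produces, here is a stdlib-only sketch: pulling an article title out of an HTML file and encoding one raw relationship as a JSON object. The field names (`subject`, `predicate`, `object`) are hypothetical, not the repo's actual schema:

```python
import json
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    # Simplified sketch: capture the text of the first <h1> as the article title.
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.title is None:
            self.in_h1 = True

    def handle_data(self, data):
        if self.in_h1 and self.title is None:
            self.title = data.strip()

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

parser = TitleParser()
parser.feed("<html><h1>Link</h1><p>Hero of Hyrule</p></html>")

# A raw relationship, as it might look before being grounded to graph Resources.
relation = {"subject": parser.title, "predicate": "race", "object": "Hylian"}
print(json.dumps(relation))
```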
  • Step 3: Insertion into Neo4j
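A minimal sketch of what the insertion step might involve: building a Cypher MERGE statement for one extracted entity. This is illustrative only; real code should use the official Neo4j Python driver with parameterized queries rather than string formatting:

```python
def merge_entity(name, label="Resource"):
    # Build a Cypher MERGE so repeated inserts of the same entity are idempotent.
    # "Resource" is an assumed node label, not necessarily the one this repo uses.
    safe = name.replace("'", "\\'")  # naive escaping; a real driver parameterizes this
    return f"MERGE (:{label} {{name: '{safe}'}})"

stmt = merge_entity("Princess Zelda")
print(stmt)
```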

The graph is neither accurate nor complete. I'm just playing around, since I want to learn a bit more about Neo4j while performing information extraction and building a knowledge graph.