As I work through understanding "node2vec: Scalable Feature Learning for Networks" [1], I'm experimenting with reproducing some of Grover & Leskovec's findings.
G&L use the Les Misérables character co-appearance data (see my json version) to demonstrate both homophily and structural equivalence. The dataset has 77 Character nodes and 254 APPEARED_WITH relationships connecting them.
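For orientation: G&L's demonstration clusters the learned node embeddings with k-means and colors each node by its cluster. Below is a minimal pure-Python sketch of that clustering step (Lloyd's algorithm). It is not the code in kmeans.py; the function name and the toy 2-D points are made up for illustration, and real inputs would be d-dimensional embedding vectors.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per input point."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input points.
    centroids = [list(c) for c in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centroids[c])),
            )
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


# Toy data (an assumption, not real embeddings): two well-separated blobs.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels = kmeans(toy_points, k=2)
```

The cluster indices themselves are arbitrary; only which points share an index matters.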
See my Python code in kmeans.py. To use it:
- Load the sample data using APOC's `apoc.import.json` procedure. It should just load.
- Create a Python virtualenv and install deps: `pip install -r requirements.txt`
- Run the code:
$ python kmeans.py -A bolt://localhost:7687 -U neo4j -P password -d 16 -p 1 -q 0.6
G&L claim they set d = 16 and p = 1, q = 0.5.
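As a reminder of what p and q control: in the paper, each candidate next step x of a walk is weighted by a search bias of 1/p, 1, or 1/q, depending on whether x is at shortest-path distance 0, 1, or 2 from the node the walk just came from. Here is a minimal sketch of that rule, assuming an unweighted graph (all edge weights equal to 1). The toy graph and function names are hypothetical and not taken from kmeans.py.

```python
def search_bias(prev, candidate, neighbors, p, q):
    """Unnormalized node2vec search bias for stepping to `candidate`,
    given that the walk arrived at the current node from `prev`."""
    if candidate == prev:             # distance 0: step back to the previous node
        return 1.0 / p
    if candidate in neighbors[prev]:  # distance 1: stays close to the previous node
        return 1.0
    return 1.0 / q                    # distance 2: moves away from the previous node


def transition_probs(prev, current, neighbors, p, q):
    """Normalized next-step probabilities over the neighbors of `current`."""
    weights = {x: search_bias(prev, x, neighbors, p, q) for x in neighbors[current]}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


# Toy undirected graph (an assumption for illustration): a-b, a-c, b-c, c-d.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

# The walk stepped a -> c; where does it go next with p = 1, q = 0.5?
probs = transition_probs("a", "c", toy_graph, p=1, q=0.5)
# probs == {"a": 0.25, "b": 0.25, "d": 0.5}
```

With q below 1, the outward step to d gets twice the weight of the others, so walks tend to explore away from where they came from.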
I'm struggling to reproduce this...
...more details coming.