Simple topological embedding for nodes (STEN).

Deterministic graph algorithm for node embedding generation based on their closeness to each other

This unsupervised learning technique is based on the idea of "signal" transfer from one node to the rest of the graph and is also inspired by Google's PageRank algorithm. Main use-case of the algorithm so far is social networks clustering and community detection. Please refer to the jupyter notebook in this repository for the exampples.

Installation

conda install -c monomonedula sten

The idea

Here's presented the basic idea for directed/undirected unweighted graph.

Every node is assigned a vector. For node i the vector's j-th element is a number representing its closeness to node j (you can think of it as signal strength).
The closeness of the node i to the node j (the "central" node the signal flows from) is defined as a sum of the signal strengths of the neighboring nodes multiplied by the damping factor parameter. (The dumping factor works like a kind of distance penalty)
Signal a node "emits" to other connected nodes equal to the given signal's strength in the node itself divided by the number of edges it is able to emit to.
Initially, the node j is assigned a closeness (to itself) equal to 1.

The implementation

The implementation presented in this repository gets things done through solving systems of equations. For every node j its signal strength in all other nodes is computed through a single system of linear equations with N equations and N unknown variables, where N is the number of nodes in the graph. Those equations are assumed to be sparse, so the systems are stored as a CSR matrices since it drastically reduces the space complexity. This implementation relies on pypardiso anaconda package for solving sparse linear equations.

This implementation provides classes to work with Networkx's directed and undirected graphs. Please note, in this implementation node i can receive a signal from node j only if there's an edge i-j, in other words, direction of the signal flow is opposite to the direction of the edge, resembling Instagram's "follow" relation.

Transposing resulting embeddings can sometimes yield better results. Transposing in this case basically means that every individual dimension j of node i embedding is carrying information of how signal is transmitted from the node i to the node j, rather than how string signal of node j is when it reaches node i.

Example

Here's an example of application of the algorithm to the famous Zachary's karate club dataset.

Zachary's karate club split:

Zachary's karate club split predicted with K-Means clustering on the generated embeddings with damping factor of 0.7:

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
conda-recipe		conda-recipe
demos		demos
sten		sten
test		test
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
README.md		README.md
setup.py		setup.py
zachary_computed.png		zachary_computed.png
zachary_expected.png		zachary_expected.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

conda-recipe

conda-recipe

demos

demos

sten

sten

test

test

.gitignore

.gitignore

LICENSE.txt

LICENSE.txt

README.md

README.md

setup.py

setup.py

zachary_computed.png

zachary_computed.png

zachary_expected.png

zachary_expected.png

Repository files navigation

Simple topological embedding for nodes (STEN).

Installation

The idea

The implementation

Example

About

Releases 1

Packages

Languages

License

monomonedula/sten

Folders and files

Latest commit

History

Repository files navigation

Simple topological embedding for nodes (STEN).

Installation

The idea

The implementation

Example

About

Topics

Resources

License

Stars

Watchers

Forks

Languages