Implementation of "Determining whether two datasets cluster similarly without determining the clusters" [Van Eeghem & De Lathauwer] [2020]

MaxenceGiraud/datasets-similarity-tensorization


Datasets similarity via Tensorization

This repository implements algorithms based on the preprint "Determining whether two datasets cluster similarly without determining the clusters" by Van Eeghem and De Lathauwer [1].

This work was an initial part of a research project by Maxence Giraud on "higher order clustering", supervised by Remy Boyer.

Usage

import dataset_similarity_tensor as dst

# Load two datasets V and W (not shown here)

## Option 1: tensorize using the Kronecker product
VV = dst.tensorize_kr(V)
WW = dst.tensorize_kr(W)

## Option 2 (alternative to option 1): tensorize using the third-order moment
# Reshape because the principal angles are computed on an unfolded tensor (i.e. a matrix)
VV = dst.tensorize_thirdordermoment(V).reshape(V.shape[1], -1)
WW = dst.tensorize_thirdordermoment(W).reshape(W.shape[1], -1)

## Compute the principal angle
angle = dst.principal_angles_tensors(VV, WW)
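
The reshape in the third-order-moment variant turns a third-order tensor into a matrix. The sketch below illustrates that step with plain NumPy and does not use the package. `third_order_moment` is a hypothetical helper; it assumes that the rows of the data matrix are samples and that no centering is applied, which may differ from what `dst.tensorize_thirdordermoment` actually does.

```python
import numpy as np

def third_order_moment(X):
    """Empirical (uncentered) third-order moment of X.

    Assumption: X has shape (n_samples, n_features).
    Returns a tensor of shape (d, d, d) with d = n_features.
    """
    n = X.shape[0]
    # Average of the outer products x (x) x (x) x over all samples
    return np.einsum("ni,nj,nk->ijk", X, X, X) / n

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))   # 50 samples, 4 features (toy data)

T = third_order_moment(X)          # shape (4, 4, 4)
T1 = T.reshape(X.shape[1], -1)     # unfolded into a matrix of shape (4, 16)
```

Entry `T1[i, j * 4 + k]` equals `T[i, j, k]`, because NumPy flattens the last two axes in row-major order. Other unfolding conventions only permute the columns, which leaves the column space of the matrix unchanged.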

The function computes a principal angle, so `angle` lies between 0 and π/2; the closer it is to 0, the more similar the two datasets are.
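
For readers unfamiliar with principal angles, here is a minimal sketch of one standard way to compute them between the column spaces of two matrices (orthonormalize with QR, then take the SVD of the product of the bases). It is not necessarily how `dst.principal_angles_tensors` is implemented; the function name `principal_angles` is hypothetical, and both inputs are assumed to have full column rank.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles in radians, in ascending order, between the
    column spaces of A and B.

    Assumption: A and B have the same number of rows and full column rank.
    """
    # Orthonormal bases for the two column spaces
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against rounding slightly outside [0, 1]
    return np.arccos(np.clip(cosines, 0.0, 1.0))

# Toy example: the xy-plane and the xz-plane in R^3
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
angles = principal_angles(A, B)
```

The two planes share the x-axis, so the smallest angle is 0 and the largest is π/2, up to rounding. SciPy offers a comparable routine, `scipy.linalg.subspace_angles`, which is more accurate for very small angles than the arccos-based formula used here.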

References

[1] Van Eeghem F., De Lathauwer L. (2020). Determining whether two datasets cluster similarly without determining the clusters.
