This repository has been archived by the owner on Dec 19, 2020. It is now read-only.

sparksoniq-dataframe-perf-v1

The confusion data was obtained from: http://lars.yencken.org/datasets/languagegame/

Two smaller sample sets were created locally from the full dataset with:

head -n 500000 confusion-2014-03-02.json > confusion500k.json  
head -n 5000000 confusion-2014-03-02.json > confusion5m.json  
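
As a rough sketch, the same sampling step can be wrapped in a small shell function that also checks the resulting line count. The function name `make_sample` and its argument order are hypothetical and are not part of this repository; the only assumption about the data is that the source file holds one record per line, which is what `head -n` relies on.

```shell
# make_sample SRC N OUT (hypothetical helper, not part of this repository)
# Writes the first N lines of SRC to OUT, then verifies that OUT really
# contains N newline-terminated lines (it will not if SRC is shorter than N).
make_sample() {
  src=$1
  n=$2
  out=$3

  # Same command as above, with the file names and count parameterised.
  head -n "$n" "$src" > "$out" || return 1

  # wc -l pads its output on some systems, so strip the spaces.
  actual=$(wc -l < "$out" | tr -d ' ')

  if [ "$actual" -ne "$n" ]; then
    echo "warning: $out has $actual lines, expected $n" >&2
    return 1
  fi
  echo "$out: $actual lines"
}

# Example calls, assuming the full dataset is in the current directory:
#   make_sample confusion-2014-03-02.json 500000 confusion500k.json
#   make_sample confusion-2014-03-02.json 5000000 confusion5m.json
```

The count check is there only to catch a truncated or incomplete source file before any timing runs are made on the samples.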

Compilation and run details are listed in test-setup.sh.

Files whose names match *-df-* belong to the dataframe implementation.

*-log.txt files contain execution times.