Define a "standard" process to follow when solving simple data science problems #26

NickSeagull · 2018-01-18T18:44:58Z

No description provided.

Drezil · 2018-01-19T13:15:51Z

Some short things this should talk about:

Data preparation
- cleaning (removal of incomplete data, lookup of further data, get everything in one matrix/table/format)
- whitening (normalize, center & de-correlate - Warning: throws away correlation information!)
- dimension-reduction (PCA, ICA, ... - Warning: throws away information!)
Algorithm selection
- supervised? unsupervised?
- typical solutions for typical problems (classification, correlation, non-metric solutions (e.g. NLP with suffix trees, edit distance, etc.))
- for each algorithm
  - when and when NOT to use
  - further reading
Ways to present/interpret results
- statistical significance?
- typical tests/metrics (AUC, F_1 score, sensitivity/specificity, etc.)
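
To make the metrics item concrete, here is a minimal sketch of sensitivity, specificity and the F_1 score computed from binary labels. It uses only the Python standard library. The names `ConfusionCounts`, `confusion_counts` and `_ratio` are made up for illustration, and returning 0.0 when a denominator is zero is an assumed convention, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    tn: int  # predicted negative, actually negative
    fn: int  # predicted negative, actually positive


def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels, where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif truth == 1:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Assumed convention: report 0.0 instead of raising on an empty class.
    return numerator / denominator if denominator else 0.0


def sensitivity(c):
    """True positive rate: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c):
    """True negative rate: TN / (TN + FP)."""
    return _ratio(c.tn, c.tn + c.fp)


def f1_score(c):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    return _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)


if __name__ == "__main__":
    counts = confusion_counts([1, 1, 1, 0, 0, 0, 0, 1],
                              [1, 0, 1, 0, 1, 0, 0, 1])
    print(counts, sensitivity(counts), specificity(counts), f1_score(counts))
```

AUC is left out on purpose: it needs scores rather than hard labels, so it does not fit this counting scheme.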

ocramz · 2018-01-25T16:11:50Z

@ixxie wants to write about reproducibility with Jupyter and Nix. I've added him to the DH members' list, so he should see this soon as well.

ixxie · 2018-04-28T18:52:26Z

Hmmm, I am not sure that is quite relevant to this issue; my goal is more to create easily reproducible infrastructure as code, i.e. to allow anybody to deploy a data science platform relatively easily. Reproducibility of individual computations is also of great interest, and Nix can help with it, but I don't know much about that atm (would be willing to look into it some time!).

FWIW, it seems a bit far-fetched to be able to specify a simple decision tree recipe for doing data science; the way I would approach this is to think of it as a bipartite graph: list some problems (e.g. tokenization, classification, clustering, etc.) and some algorithms (CRFs, RNNs, HDBSCAN) and draw links between them.
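
A rough sketch of that bipartite-graph idea, using only the Python standard library. The edge list is purely illustrative and is not a recommendation of which algorithm fits which problem; `EDGES` and `build_indexes` are made-up names.

```python
from collections import defaultdict

# Hypothetical (problem, algorithm) links; a real list would be curated.
EDGES = [
    ("tokenization", "CRF"),
    ("tokenization", "RNN"),
    ("classification", "RNN"),
    ("clustering", "HDBSCAN"),
]


def build_indexes(edges):
    """Index a bipartite edge list from both sides.

    Returns two plain dicts: problem -> set of algorithms,
    and algorithm -> set of problems.
    """
    by_problem = defaultdict(set)
    by_algorithm = defaultdict(set)
    for problem, algorithm in edges:
        by_problem[problem].add(algorithm)
        by_algorithm[algorithm].add(problem)
    return dict(by_problem), dict(by_algorithm)


if __name__ == "__main__":
    by_problem, by_algorithm = build_indexes(EDGES)
    # Which algorithms are linked to a given problem?
    print(sorted(by_problem.get("tokenization", set())))
    # Which problems is a given algorithm linked to?
    print(sorted(by_algorithm.get("RNN", set())))
```

Keeping the links as a flat edge list means a new problem or algorithm is one extra line, and both lookup directions are derived from the same data.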

NickSeagull changed the title from "[documentation] - Define a "standard" process to follow when solving simple data science problems" to "Define a "standard" process to follow when solving simple data science problems" on Jan 18, 2018

ocramz added the "help wanted" and "documentation" labels on Jan 18, 2018
