Added model persistence section to docs
andrewdalpino committed Oct 24, 2019
1 parent 3e885d2 commit bfa9bc9
Showing 7 changed files with 87 additions and 29 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -1,4 +1,4 @@
- Unreleased
- 0.0.16-beta
- Radius Neighbors allows user-definable anomaly class
- Added KNN Imputer
- Added Random Hot Deck Imputer
6 changes: 3 additions & 3 deletions CONTRIBUTING.md
@@ -1,8 +1,8 @@
# Contributing Guidelines
Thank you for considering a code contribution to Rubix ML. We strongly believe that our contributors play the most important role in bringing powerful machine learning tools to the PHP language. Please read over the following guidelines so that we can continue to provide a high quality product that our users love.
Thank you for considering a code contribution to Rubix ML. We strongly believe that our contributors play the most important role in bringing powerful machine learning tools to the PHP language. Please read over the following guidelines so that we can continue to provide high quality machine learning tools that our users love.

### Pull Request Checklist
Here are some things to check off before sending in a pull request
Here are a few things to check off before sending in a pull request ...

- The change provides high value to Rubix ML engineers
- The change does not introduce unnecessary complexity
@@ -55,7 +55,7 @@ Rubix ML uses a unique end-to-end testing schema for all learners that involves
Typically bugs indicate an area of the code that has not been properly tested yet. When submitting a bug fix, please include a passing test that would have reproduced the bug prior to your changes.

### Mutability Policy
Objects implemented in Rubix have a mutability policy of *generally* immutable which means properties are kept protected and state cannot be modified without creating a new object. Certain objects such as Learners have model parameters that are mutated during training. In such cases, mutability must be controlled through public interfaces. In general, any stateful object that requires mutation must only be updated through a well-defined public method. In some special cases, such as for performance reasons, object properties may be allowed to be mutated directly.
Objects implemented in Rubix ML follow a mutability policy of *generally* immutable, which means properties are kept protected and state cannot be modified without creating a new object. Certain objects such as Learners have model parameters that are mutated during training. In such cases, mutability must be controlled through public interfaces. In general, any stateful object that requires mutation must only be updated through a well-defined public method. In some special cases, such as for performance reasons, object properties may be allowed to be mutated directly.
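The following sketch illustrates the policy with two hypothetical classes that are not part of Rubix ML, assuming PHP 7.1 or later: `Ratio` is a value object whose `withValue()` returns a new instance instead of changing its own state, while `MeanLearner` keeps its model parameter protected and only changes it through the public `train()` method.

```php
<?php

/**
 * Hypothetical immutable value object: "changing" it returns a new instance.
 */
class Ratio
{
    /** @var float */
    protected $value;

    public function __construct(float $value)
    {
        if ($value <= 0.0 || $value >= 1.0) {
            throw new InvalidArgumentException('Ratio must be between 0 and 1.');
        }

        $this->value = $value;
    }

    public function value() : float
    {
        return $this->value;
    }

    // Returns a modified copy; the original object is left untouched.
    public function withValue(float $value) : self
    {
        return new self($value);
    }
}

/**
 * Hypothetical learner whose model parameter is only mutated by train().
 */
class MeanLearner
{
    /** @var float|null */
    protected $mean;

    public function train(array $targets) : void
    {
        $this->mean = array_sum($targets) / count($targets);
    }

    public function predict() : float
    {
        if ($this->mean === null) {
            throw new RuntimeException('Learner has not been trained.');
        }

        return $this->mean;
    }
}

$ratio = new Ratio(0.2);
$other = $ratio->withValue(0.3); // $ratio still holds 0.2

$learner = new MeanLearner();
$learner->train([1.0, 2.0, 3.0]);
```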

### Anti Plagiarism Policy
Our community takes a strong stance against plagiarism, or the copying of another author's code without attribution. Since the spirit of open source is to make code freely available, it is up to the community to enforce policies that deter plagiarism. As such, we do not allow contributions from those who violate this policy.
16 changes: 14 additions & 2 deletions composer.json
@@ -16,6 +16,14 @@
"email": "[email protected]",
"homepage": "https://andrewdalpino.com",
"role": "Lead Engineer"
},
{
"name": "Core Team",
"homepage": "https://github.com/orgs/RubixML/teams/core"
},
{
"name": "Contributors",
"homepage": "https://github.com/RubixML/RubixML/graphs/contributors"
}
],
"require": {
@@ -72,6 +80,10 @@
"preferred-install": "dist",
"sort-packages": true
},
"minimum-stability": "dev",
"prefer-stable": true
"support": {
"email": "[email protected]",
"issues": "https://github.com/RubixML/RubixML/issues",
"source": "https://github.com/RubixML/RubixML",
"docs": "https://docs.rubixml.com/en/latest"
}
}
10 changes: 4 additions & 6 deletions docs/basic-introduction.md
@@ -117,9 +117,9 @@ array(5) {
```

# Model Evaluation
To test that the estimator can correctly generalize what it has learned during training to the real world we use a process called *cross validation*. The goal of cross validation is to train and test the learner on different subsets of the dataset as to produce a validation score. For the purposes of this introduction, we will use the [Hold Out](cross-validation/hold-out.md) validator which takes a portion of the dataset for testing and leaves the rest for training. The reason we do not use *all* of the data for training is because we want to test the estimator on samples that it has never seen before.
To test that the estimator can correctly generalize what it has learned during training to the real world, we use a process called *cross validation*. The goal of cross validation is to train and test the learner on different subsets of the dataset in order to produce a validation score. For the purposes of the introduction, we will use the [Hold Out](cross-validation/hold-out.md) validator, which takes a portion of the dataset for testing and leaves the rest for training. We do not use *all* of the data for training because we want to test the estimator on samples that it has never seen before.

The Hold Out validator requires you to set the ratio of testing to training samples as a constructor parameter. Let's choose to use a factor of 0.2 (20%) of the dataset for testing leaving the rest (80%) for training.
The Hold Out validator requires the user to set the fraction of the dataset to hold out for testing as a constructor parameter. Let's use a ratio of 0.2, holding out 20% of the dataset for testing and leaving the remaining 80% for training.

> **Note:** Typically, 0.2 is a good default choice; however, your mileage may vary. The important thing to note here is the trade-off between having more data for training and having more data for testing, which produces a more reliable validation score.
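Rubix ML's Hold Out validator performs this split internally. As a rough sketch of the idea only, the hypothetical `holdOutSplit()` function below shuffles 150 stand-in samples with a fixed seed and holds out 20% of them, which leaves 30 samples for testing and 120 for training. It assumes PHP 7.1 or later and is not Rubix ML's implementation.

```php
<?php

/**
 * Hypothetical hold-out split (not Rubix ML's implementation): shuffle the
 * sample indices, then reserve a $ratio fraction of them for testing.
 *
 * @return array [training samples, testing samples]
 */
function holdOutSplit(array $samples, float $ratio, int $seed = 0) : array
{
    mt_srand($seed); // fixed seed so the split is reproducible

    $indices = range(0, count($samples) - 1);
    shuffle($indices);

    // Number of samples held out for testing, e.g. 0.2 * 150 = 30.
    $numTesting = (int) round($ratio * count($samples));

    $training = [];
    $testing = [];

    foreach ($indices as $position => $index) {
        if ($position < $numTesting) {
            $testing[] = $samples[$index];
        } else {
            $training[] = $samples[$index];
        }
    }

    return [$training, $testing];
}

$samples = range(1, 150); // stand-in for 150 labeled samples

[$training, $testing] = holdOutSplit($samples, 0.2);
```

Every sample ends up in exactly one of the two sets, so the estimator is never scored on a sample it was trained on.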
@@ -140,7 +140,5 @@ var_dump($score);
float(0.945)
```

Congratulations! You're done with the basic introduction to machine learning in Rubix ML.

# Next Steps
For a more in-depth tutorial using the K Nearest Neighbors classifier, check out the [Iris Flower](https://github.com/RubixML/Iris) example project. We highly recommend browsing the rest of the documentation and the other [example projects](https://github.com/RubixML) which range from beginner to advanced skill level.
### Next Steps
Congratulations! You've completed the basic introduction to machine learning in PHP with Rubix ML. For a more in-depth tutorial using the K Nearest Neighbors classifier, check out the [Iris Flower](https://github.com/RubixML/Iris) example project. We highly recommend browsing the rest of the documentation and the other [example projects](https://github.com/RubixML) which range from beginner to advanced skill level. Have fun and stay curious!
38 changes: 38 additions & 0 deletions docs/model-persistence.md
@@ -0,0 +1,38 @@
# Model Persistence
Model persistence refers to the capability of an estimator to be trained and then used to make predictions in processes other than the one in which it was trained. Imagine that you trained a classifier to categorize comment posts and now you want to deploy it to a server to perform real-time inference on your website. Or, say you just finished training a model that took the whole day and you want to save it for later. Rubix ML allows you to handle both of these scenarios using [Persisters](./persisters/api.md) and [Persistable](persistable.md) objects.

### Persisters
Persisters are objects whose responsibility is to save and load model data to and from storage. For example, the [Filesystem](./persisters/filesystem.md) persister serializes a persistable model and saves it to a location on a filesystem, such as a local hard disk or network-attached storage, and reconstitutes the model from that location when loading.

**Example**

```php
use Rubix\ML\Persisters\Filesystem;

$persister = new Filesystem('example.model');

$estimator = $persister->load();

// Do something

$persister->save($estimator);
```
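Under the hood, saving and loading with a persister comes down to serializing the model and writing it to storage, then reading and unserializing it later. The minimal sketch below shows that round trip with PHP's native `serialize()` and `unserialize()`. `FileStore` and `ToyModel` are hypothetical stand-ins, not the Filesystem persister or a Rubix ML estimator, and the sketch assumes PHP 7.2 or later.

```php
<?php

/**
 * Hypothetical file-backed persister; NOT Rubix ML's Filesystem class.
 */
class FileStore
{
    /** @var string */
    protected $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    // Serialize the object and write it to the file, replacing any previous copy.
    public function save(object $model) : void
    {
        if (file_put_contents($this->path, serialize($model), LOCK_EX) === false) {
            throw new RuntimeException("Could not write to {$this->path}.");
        }
    }

    // Read the file back and reconstitute the original object.
    public function load() : object
    {
        $data = file_get_contents($this->path);

        if ($data === false) {
            throw new RuntimeException("Could not read from {$this->path}.");
        }

        // Only load files from a trusted source: unserialize() can instantiate arbitrary classes.
        $model = unserialize($data);

        if (!is_object($model)) {
            throw new RuntimeException('File does not contain a serialized object.');
        }

        return $model;
    }
}

// Hypothetical "trained" model with some learned parameters.
class ToyModel
{
    /** @var float[] */
    public $weights = [];
}

$model = new ToyModel();
$model->weights = [0.5, -1.25];

$store = new FileStore(sys_get_temp_dir() . '/example.model');
$store->save($model);

$restored = $store->load(); // a new object with the same learned weights
```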

### Serialization
Very often a model will need to be serialized, or packaged into a discrete chunk of data, before it can be persisted. Loading a model requires the reverse process, deserialization. Rubix ML is compatible with a number of portable serialization formats, including the native PHP format and the Igbinary format. As long as the saving and loading systems use the same format, you can easily transport models between them.
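To compare the two formats mentioned above, the sketch below round-trips a made-up array of model parameters through the native format and, only if the optional igbinary extension is loaded, through `igbinary_serialize()` and `igbinary_unserialize()`. It prints the encoded size in bytes for each available format rather than assuming which one is smaller for a given model, and it assumes PHP 7.1 or later.

```php
<?php

// Made-up model parameters standing in for a trained estimator.
$params = [
    'weights' => [0.5, -1.25, 3.0],
    'bias' => 0.1,
    'classes' => ['spam', 'ham'],
];

// Native PHP format: always available.
$native = serialize($params);
$fromNative = unserialize($native);

$sizes = ['native' => strlen($native)];

// Igbinary format: a binary encoding, used only if the extension is loaded.
if (function_exists('igbinary_serialize')) {
    $binary = igbinary_serialize($params);
    $fromBinary = igbinary_unserialize($binary);

    $sizes['igbinary'] = strlen($binary);
}

print_r($sizes); // encoded size in bytes per available format
```

Whichever format is used, the loading system has to decode with the same format, so a model saved with igbinary cannot be loaded on a server that lacks the extension.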

### The Persistent Model Meta-estimator
The [Persistent Model](persistent-model.md) meta-estimator is a model wrapper that uses the persistence subsystem under the hood. It provides `save()` and `load()` methods for the persistable learner that it wraps.

**Example**

```php
use Rubix\ML\PersistentModel;
use Rubix\ML\Persisters\Filesystem;

$estimator = PersistentModel::load(new Filesystem('example.model'));

// Do something

$estimator->save();
```
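The wrapper pattern behind the Persistent Model meta-estimator can be sketched without the library. In the example below, `MemoryStore`, `MajorityModel`, and `PersistentWrapper` are all hypothetical: the wrapper delegates training and prediction to the model it wraps, forwards `save()` to its store, and rebuilds itself through a static `load()` in the same style as `PersistentModel::load()` above. An in-memory store keeps the sketch self-contained (PHP 7.2 or later); a real persister would write to durable storage.

```php
<?php

/** Hypothetical in-memory persister holding one serialized model. */
class MemoryStore
{
    /** @var string|null */
    protected $data;

    public function save(object $model) : void
    {
        $this->data = serialize($model);
    }

    public function load() : object
    {
        if ($this->data === null) {
            throw new RuntimeException('Nothing has been saved yet.');
        }

        return unserialize($this->data);
    }
}

/** Hypothetical learner: predicts the most frequent label seen in training. */
class MajorityModel
{
    /** @var string|null */
    protected $label;

    public function train(array $labels) : void
    {
        $counts = array_count_values($labels);

        // First label with the highest count (cast back in case it was numeric).
        $this->label = (string) array_search(max($counts), $counts, true);
    }

    public function predict() : ?string
    {
        return $this->label;
    }
}

/** Hypothetical wrapper: delegates to the inner model and adds save()/load(). */
class PersistentWrapper
{
    /** @var MajorityModel */
    protected $model;

    /** @var MemoryStore */
    protected $store;

    public function __construct(MajorityModel $model, MemoryStore $store)
    {
        $this->model = $model;
        $this->store = $store;
    }

    public static function load(MemoryStore $store) : self
    {
        return new self($store->load(), $store);
    }

    public function save() : void
    {
        $this->store->save($this->model);
    }

    public function train(array $labels) : void
    {
        $this->model->train($labels);
    }

    public function predict() : ?string
    {
        return $this->model->predict();
    }
}

$store = new MemoryStore();

$estimator = new PersistentWrapper(new MajorityModel(), $store);
$estimator->train(['spam', 'ham', 'spam']);
$estimator->save();

// Later: rebuild the wrapper from whatever the store saved.
$reloaded = PersistentWrapper::load($store);
```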
10 changes: 8 additions & 2 deletions docs/system-architecture.md
@@ -1,4 +1,10 @@
# System Architecture
The Rubix architecture is defined by a few key abstractions and their corresponding types and interfaces.
The high-level API is designed around a few key abstractions and their corresponding types and interfaces. In addition, Rubix ML employs various mid- and low-level subsystems that power many of the learners. This layered architecture allows for power and flexibility while keeping the public interface simple and straightforward.

![Rubix ML System Architecture](https://raw.githubusercontent.com/RubixML/RubixML/master/docs/img/rubix-ml-system-architecture.svg?sanitize=true)
### General Architecture
From the perspective of data flowing in and out of a machine learning system, there are a number of components that the user *may* interact with. These include [Dataset](./datasets/api.md) objects, [Transformers](./transformers/api.md), [Estimators](estimator.md), and Meta-estimators. Starting from the top, the illustration below shows the path of data from input features to prediction within Rubix ML.

![Rubix ML System Architecture](https://raw.githubusercontent.com/RubixML/RubixML/master/docs/img/rubix-ml-system-architecture.svg?sanitize=true)
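As a rough, library-free sketch of that path, the example below fits a hypothetical min-max scaling transformer on the training samples, applies it to both the training samples and new unlabeled samples, and passes the results to a hypothetical 1-nearest-neighbor estimator. Neither class is the Rubix ML implementation, and the data is made up.

```php
<?php

/** Hypothetical transformer: rescales each feature column to [0, 1]. */
class MinMaxScaler
{
    protected $mins = [];
    protected $maxs = [];

    public function fit(array $samples) : void
    {
        foreach (array_keys($samples[0]) as $column) {
            $values = array_column($samples, $column);

            $this->mins[$column] = min($values);
            $this->maxs[$column] = max($values);
        }
    }

    public function transform(array $samples) : array
    {
        return array_map(function (array $sample) : array {
            $scaled = [];

            foreach ($sample as $column => $value) {
                $range = $this->maxs[$column] - $this->mins[$column];

                $scaled[$column] = $range > 0 ? ($value - $this->mins[$column]) / $range : 0.0;
            }

            return $scaled;
        }, $samples);
    }
}

/** Hypothetical estimator: 1-nearest-neighbor classifier (squared Euclidean distance). */
class NearestNeighbor
{
    protected $samples = [];
    protected $labels = [];

    public function train(array $samples, array $labels) : void
    {
        $this->samples = $samples;
        $this->labels = $labels;
    }

    public function predict(array $samples) : array
    {
        $predictions = [];

        foreach ($samples as $sample) {
            $bestDistance = INF;
            $bestLabel = null;

            foreach ($this->samples as $i => $neighbor) {
                $distance = 0.0;

                foreach ($sample as $column => $value) {
                    $distance += ($value - $neighbor[$column]) ** 2;
                }

                if ($distance < $bestDistance) {
                    $bestDistance = $distance;
                    $bestLabel = $this->labels[$i];
                }
            }

            $predictions[] = $bestLabel;
        }

        return $predictions;
    }
}

// Input features -> transformer (fitted on training data) -> estimator -> predictions.
$training = [[150.0, 0.2], [180.0, 0.9], [160.0, 0.1], [190.0, 0.8]];
$labels = ['small', 'large', 'small', 'large'];

$scaler = new MinMaxScaler();
$scaler->fit($training);

$estimator = new NearestNeighbor();
$estimator->train($scaler->transform($training), $labels);

$predictions = $estimator->predict($scaler->transform([[155.0, 0.15], [185.0, 0.85]]));
```

The transformer is fitted on the training set only and then reused on new samples, so both are scaled with the same statistics before they reach the estimator.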

### Subsystems
Under the hood, Rubix ML utilizes a number of modular subsystems that are highly optimized for their purpose such as the graph, neural net, SVM, and tensor processing subsystems. Some mid and low level subsystems run as optional PHP extensions.
34 changes: 19 additions & 15 deletions mkdocs.yml
@@ -9,8 +9,9 @@ site_description: 'A high-level machine learning and deep learning library for t
nav:
- Home: index.md
- Basic Introduction: basic-introduction.md
- System Architecture: system-architecture.md
- Representing Your Data: representing-your-data.md
- Model Persistence: model-persistence.md
- System Architecture: system-architecture.md
- FAQ: faq.md
- Dataset Objects:
- API Reference: datasets/api.md
@@ -207,20 +208,23 @@ nav:
- Monte Carlo: cross-validation/monte-carlo.md
- Metrics:
- API Reference: cross-validation/metrics/api.md
- Accuracy: cross-validation/metrics/accuracy.md
- Completeness: cross-validation/metrics/completeness.md
- F Beta: cross-validation/metrics/f-beta.md
- Homogeneity: cross-validation/metrics/homogeneity.md
- Informedness: cross-validation/metrics/informedness.md
- MCC: cross-validation/metrics/mcc.md
- Mean Absolute Error: cross-validation/metrics/mean-absolute-error.md
- Mean Squared Error: cross-validation/metrics/mean-squared-error.md
- Median Absolute Error: cross-validation/metrics/median-absolute-error.md
- Rand Index: cross-validation/metrics/rand-index.md
- RMSE: cross-validation/metrics/rmse.md
- R Squared: cross-validation/metrics/r-squared.md
- SMAPE: cross-validation/metrics/smape.md
- V Measure: cross-validation/metrics/v-measure.md
- Classification and Anomaly Detection:
- Accuracy: cross-validation/metrics/accuracy.md
- F Beta: cross-validation/metrics/f-beta.md
- Informedness: cross-validation/metrics/informedness.md
- MCC: cross-validation/metrics/mcc.md
- Clustering:
- Completeness: cross-validation/metrics/completeness.md
- Homogeneity: cross-validation/metrics/homogeneity.md
- Rand Index: cross-validation/metrics/rand-index.md
- V Measure: cross-validation/metrics/v-measure.md
- Regression:
- Mean Absolute Error: cross-validation/metrics/mean-absolute-error.md
- Mean Squared Error: cross-validation/metrics/mean-squared-error.md
- Median Absolute Error: cross-validation/metrics/median-absolute-error.md
- RMSE: cross-validation/metrics/rmse.md
- R Squared: cross-validation/metrics/r-squared.md
- SMAPE: cross-validation/metrics/smape.md
- Reports:
- API Reference: cross-validation/reports/api.md
- Aggregate Report: cross-validation/reports/aggregate-report.md