Added model persistence section to docs

RubixML · Oct 24, 2019 · bfa9bc9
1 parent 3e885d2
commit bfa9bc9
Showing 7 changed files with 87 additions and 29 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,4 +1,4 @@
-- Unreleased
+- 0.0.16-beta
     - Radius Neighbors allows user-definable anomaly class
     - Added KNN Imputer
     - Added Random Hot Deck Imputer

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,8 +1,8 @@
 # Contributing Guidelines
-Thank you for considering a code contribution to Rubix ML. We strongly believe that our contributors play the most important role in bringing powerful machine learning tools to the PHP language. Please read over the following guidelines so that we can continue to provide a high quality product that our users love.
+Thank you for considering a code contribution to Rubix ML. We strongly believe that our contributors play the most important role in bringing powerful machine learning tools to the PHP language. Please read over the following guidelines so that we can continue to provide high quality machine learning tools that our users love.
 
 ### Pull Request Checklist
-Here are some things to check off before sending in a pull request
+Here are a few things to check off before sending in a pull request ...
 
 - The change provides high value to Rubix ML engineers
 - The change does not introduce unnecessary complexity
@@ -55,7 +55,7 @@ Rubix ML uses a unique end-to-end testing schema for all learners that involves
 Typically bugs indicate an area of the code that has not been properly tested yet. When submitting a bug fix, please include a passing test that would have reproduced the bug prior to your changes.
 
 ### Mutability Policy
-Objects implemented in Rubix have a mutability policy of *generally* immutable which means properties are kept protected and state cannot be modified without creating a new object. Certain objects such as Learners have model parameters that are mutated during training. In such cases, mutability must be controlled through public interfaces. In general, any stateful object that requires mutation must only be updated through a well-defined public method. In some special cases, such as for performance reasons, object properties may be allowed to be mutated directly.
+Objects implemented in Rubix ML have a mutability policy of *generally* immutable which means properties are kept protected and state cannot be modified without creating a new object. Certain objects such as Learners have model parameters that are mutated during training. In such cases, mutability must be controlled through public interfaces. In general, any stateful object that requires mutation must only be updated through a well-defined public method. In some special cases, such as for performance reasons, object properties may be allowed to be mutated directly.
 
 ### Anti Plagiarism Policy
 Our community takes a strong stance against plagiarism, or the copying of another author's code without attribution. Since the spirit of open source is to make code freely available, it is up to the community to enforce policies that deter plagiarism. As such, we do not allow contributions from those who violate this policy.
diff --git a/composer.json b/composer.json
@@ -16,6 +16,14 @@
             "email": "[email protected]",
             "homepage": "https://andrewdalpino.com",
             "role": "Lead Engineer"
+        },
+        {
+            "name": "Core Team",
+            "homepage": "https://github.com/orgs/RubixML/teams/core"
+        },
+        {
+            "name": "Contributors",
+            "homepage": "https://github.com/RubixML/RubixML/graphs/contributors"
         }
     ],
     "require": {
@@ -72,6 +80,10 @@
         "preferred-install": "dist",
         "sort-packages": true
     },
-    "minimum-stability": "dev",
-    "prefer-stable": true
+    "support": {
+        "email": "[email protected]",
+        "issues": "https://github.com/RubixML/RubixML/issues",
+        "source": "https://github.com/RubixML/RubixML",
+        "docs": "https://docs.rubixml.com/en/latest"
+    }
 }
diff --git a/docs/basic-introduction.md b/docs/basic-introduction.md
@@ -117,9 +117,9 @@ array(5) {
 ```
 
 # Model Evaluation
-To test that the estimator can correctly generalize what it has learned during training to the real world we use a process called *cross validation*. The goal of cross validation is to train and test the learner on different subsets of the dataset as to produce a validation score. For the purposes of this introduction, we will use the [Hold Out](cross-validation/hold-out.md) validator which takes a portion of the dataset for testing and leaves the rest for training. The reason we do not use *all* of the data for training is because we want to test the estimator on samples that it has never seen before.
+To test that the estimator can correctly generalize what it has learned during training to the real world we use a process called *cross validation*. The goal of cross validation is to train and test the learner on different subsets of the dataset in order to produce a validation score. For the purposes of the introduction, we will use the [Hold Out](cross-validation/hold-out.md) validator which takes a portion of the dataset for testing and leaves the rest for training. The reason we do not use *all* of the data for training is because we want to test the estimator on samples that it has never seen before.
 
-The Hold Out validator requires you to set the ratio of testing to training samples as a constructor parameter. Let's choose to use a factor of 0.2 (20%) of the dataset for testing leaving the rest (80%) for training.
+The Hold Out validator requires the user to set the ratio of testing to training samples as a constructor parameter. Let's choose to use a factor of 0.2 (20%) of the dataset for testing leaving the rest (80%) for training.
 
 > **Note:** Typically, 0.2 is a good default choice however your mileage may vary. The important thing to note here is the trade off between more data for training and more data to produce better testing results.
 
@@ -140,7 +140,5 @@ var_dump($score);
 float(0.945)
 ```
 
-Congratulations! You're done with the basic introduction to machine learning in Rubix ML.
-
-# Next Steps
-For a more in-depth tutorial using the K Nearest Neighbors classifier, check out the [Iris Flower](https://github.com/RubixML/Iris) example project. We highly recommend browsing the rest of the documentation and the other [example projects](https://github.com/RubixML) which range from beginner to advanced skill level.
+### Next Steps
+Congratulations! You've completed the basic introduction to machine learning in PHP with Rubix ML. For a more in-depth tutorial using the K Nearest Neighbors classifier, check out the [Iris Flower](https://github.com/RubixML/Iris) example project. We highly recommend browsing the rest of the documentation and the other [example projects](https://github.com/RubixML) which range from beginner to advanced skill level. Have fun and stay curious!
diff --git a/docs/model-persistence.md b/docs/model-persistence.md
@@ -0,0 +1,38 @@
+# Model Persistence
+Model persistence refers to the capability of an estimator to be trained and used to make predictions in processes other than the currently running process. Imagine that you trained a classifier to categorize comment posts and now you want to deploy it to a server to perform real-time inference on your website. Or, say you just finished training a model that took the whole day and you want to save it for later. Rubix ML allows you to handle both of these scenarios using [Persisters](./persisters/api.md) and [Persistable](persistable.md) objects.
+
+### Persisters
+Persisters are objects whose responsibility is to save and load model data to and from storage. For example, the [Filesystem](./persisters/filesystem.md) persister serializes and reconstitutes a persistable model from a location on a filesystem such as a local hard disk or network attached storage.
+
+**Example**
+
+```php
+use Rubix\ML\Persisters\Filesystem;
+
+$persister = new Filesystem('example.model');
+
+$estimator = $persister->load();
+
+// Do something
+
+$persister->save($estimator);
+```
+
+### Serialization
+Very often a model will need to be serialized, or packaged into a discrete chunk of data, before it can be persisted. Loading a model is the same process in reverse. Rubix ML is compatible with a number of portable serialization formats including the Native PHP format as well as the Igbinary format. By knowing the format, you can easily transport models between systems.
+
+### The Persistent Model Meta-estimator
+The [Persistent Model](persistent-model.md) meta-estimator is a model wrapper that uses the persistence subsystem under the hood. It provides `save()` and `load()` methods for the persistable learner that it wraps.
+
+**Example**
+
+```php
+use Rubix\ML\PersistentModel;
+use Rubix\ML\Persisters\Filesystem;
+
+$estimator = PersistentModel::load(new Filesystem('example.model'));
+
+// Do something
+
+$estimator->save();
+```
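
Note on the Serialization section added in `docs/model-persistence.md` above: as a minimal sketch of what serializing a model to PHP's native format and reconstituting it looks like, the plain-PHP example below uses the built-in `serialize()` and `unserialize()` functions. It does not use or reproduce Rubix ML's Filesystem persister. The `ThresholdClassifier` class, the `saveModel()` and `loadModel()` helpers, and the temporary file location are all hypothetical names introduced for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a trained model; not part of Rubix ML.
final class ThresholdClassifier
{
    /** @var float */
    private $threshold;

    public function __construct(float $threshold)
    {
        $this->threshold = $threshold;
    }

    public function predict(float $value): string
    {
        return $value >= $this->threshold ? 'positive' : 'negative';
    }
}

// Hypothetical helper: serialize the object with PHP's native format
// and write the resulting string to disk.
function saveModel(object $model, string $path): void
{
    if (file_put_contents($path, serialize($model), LOCK_EX) === false) {
        throw new RuntimeException("Could not write model to {$path}");
    }
}

// Hypothetical helper: read the file back and reconstitute the object.
function loadModel(string $path): object
{
    $data = file_get_contents($path);

    if ($data === false) {
        throw new RuntimeException("Could not read model from {$path}");
    }

    $model = unserialize($data);

    if (!is_object($model)) {
        throw new RuntimeException('File does not contain a serialized object');
    }

    return $model;
}

$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'example.model';

saveModel(new ThresholdClassifier(0.5), $path);

// This could just as well happen in a different process later on.
$restored = loadModel($path);

echo $restored->predict(0.7), PHP_EOL; // prints "positive"
```

Because PHP's `unserialize()` can instantiate arbitrary classes, a file loaded this way should come from a trusted source.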
diff --git a/docs/system-architecture.md b/docs/system-architecture.md
@@ -1,4 +1,10 @@
 # System Architecture
-The Rubix architecture is defined by a few key abstractions and their corresponding types and interfaces.
+The high level API is designed around a few key abstractions and their corresponding types and interfaces. In addition, Rubix ML employs various mid and low level subsystems that power many of the learners. This layered architecture allows for power and flexibility while keeping the public interface simple and straightforward.
 
-![Rubix ML System Architecture](https://raw.githubusercontent.com/RubixML/RubixML/master/docs/img/rubix-ml-system-architecture.svg?sanitize=true)
+### General Architecture
+From the perspective of data flowing in and out of a machine learning system, there are a number of components that the user *may* interact with. These include [Dataset](./datasets/api.md) objects, [Transformers](./transformers/api.md), [Estimators](estimator.md), and Meta-estimators. Starting from the top, the illustration below shows the path of data from input features to prediction within Rubix ML.
+
+![Rubix ML System Architecture](https://raw.githubusercontent.com/RubixML/RubixML/master/docs/img/rubix-ml-system-architecture.svg?sanitize=true)
+
+### Subsystems
+Under the hood, Rubix ML utilizes a number of modular subsystems that are highly optimized for their purpose such as the graph, neural net, SVM, and tensor processing subsystems. Some mid and low level subsystems run as optional PHP extensions.
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -9,8 +9,9 @@ site_description: 'A high-level machine learning and deep learning library for t
 nav:
 - Home: index.md
 - Basic Introduction: basic-introduction.md
-- System Architecture: system-architecture.md
 - Representing Your Data: representing-your-data.md
+- Model Persistence: model-persistence.md
+- System Architecture: system-architecture.md
 - FAQ: faq.md
 - Dataset Objects:
   - API Reference: datasets/api.md
@@ -207,20 +208,23 @@ nav:
   - Monte Carlo: cross-validation/monte-carlo.md
   - Metrics:
     - API Reference: cross-validation/metrics/api.md
-    - Accuracy: cross-validation/metrics/accuracy.md
-    - Completeness: cross-validation/metrics/completeness.md
-    - F Beta: cross-validation/metrics/f-beta.md
-    - Homogeneity: cross-validation/metrics/homogeneity.md
-    - Informedness: cross-validation/metrics/informedness.md
-    - MCC: cross-validation/metrics/mcc.md
-    - Mean Absolute Error: cross-validation/metrics/mean-absolute-error.md
-    - Mean Squared Error: cross-validation/metrics/mean-squared-error.md
-    - Median Absolute Error: cross-validation/metrics/median-absolute-error.md
-    - Rand Index: cross-validation/metrics/rand-index.md
-    - RMSE: cross-validation/metrics/rmse.md
-    - R Squared: cross-validation/metrics/r-squared.md
-    - SMAPE: cross-validation/metrics/smape.md
-    - V Measure: cross-validation/metrics/v-measure.md
+    - Classification and Anomaly Detection:
+      - Accuracy: cross-validation/metrics/accuracy.md
+      - F Beta: cross-validation/metrics/f-beta.md
+      - Informedness: cross-validation/metrics/informedness.md
+      - MCC: cross-validation/metrics/mcc.md
+    - Clustering:
+      - Completeness: cross-validation/metrics/completeness.md
+      - Homogeneity: cross-validation/metrics/homogeneity.md
+      - Rand Index: cross-validation/metrics/rand-index.md
+      - V Measure: cross-validation/metrics/v-measure.md
+    - Regression:
+      - Mean Absolute Error: cross-validation/metrics/mean-absolute-error.md
+      - Mean Squared Error: cross-validation/metrics/mean-squared-error.md
+      - Median Absolute Error: cross-validation/metrics/median-absolute-error.md
+      - RMSE: cross-validation/metrics/rmse.md
+      - R Squared: cross-validation/metrics/r-squared.md
+      - SMAPE: cross-validation/metrics/smape.md
   - Reports:
     - API Reference: cross-validation/reports/api.md
     - Aggregate Report: cross-validation/reports/aggregate-report.md