Added exceptions and layer tests
andrewdalpino committed Sep 21, 2018
1 parent db962a8 commit 68c7e40
Showing 43 changed files with 714 additions and 131 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,4 +1,6 @@
- Unreleased

- 0.0.5-alpha
- Added Gaussian Mixture clusterer
- Added Batch Norm hidden layer
- Added PReLU hidden layer
@@ -20,6 +22,7 @@
- Removed parameters from Dropout and Alpha Dropout layers
- Added option to remove biases in Dense and Placeholder layers
- Optimized Dataset objects
- Optimized matrix and vector operations
- Added grid params to Param helper
- Added Gaussian RBF activation function
- Renamed Quadratic cost function to Least Squares
34 changes: 16 additions & 18 deletions README.md
@@ -30,9 +30,6 @@ MIT
- [Evaluation](#evaluating-model-performance)
- [Visualization](#visualization)
- [Next Steps](#next-steps)
- [Environments](#environments)
- [Command Line](#command-line)
- [Web Server](#web-server)
- [API Reference](#api-reference)
- [Datasets](#datasets)
- [Dataset Objects](#dataset-objects)
@@ -202,6 +199,10 @@ MIT
- [Tokenizers](#tokenizers)
- [Whitespace](#whitespace)
- [Word](#word-tokenizer)
- [FAQ](#faq)
- [What environment should I run Rubix in?](#what-environment-should-i-run-rubix-in)
- [Testing](#testing)
- [Contributing](#contributing)

---
### Basic Introduction
@@ -347,21 +348,6 @@ If you are looking for a place to start, we highly recommend [D3.js](https://d3j
### Next Steps
After you've gone through this basic introduction to machine learning in Rubix, we highly recommend reading over the [API Reference](#api-reference) to get an idea of what the library can do. The API Reference is the place you'll go to get detailed information and examples about the classes that make up the library. If you have a question or need help, feel free to post on our GitHub page.

---
### Environments
Typically, there are two different types of *environments* that a PHP program can run in: on the command line in a terminal window or on a web server such as Nginx via the FPM module. Most of the time you will only be working with the command line in Rubix unless you are building a system to work live in production. For more information about the environments in which PHP can run, refer to the [general installation considerations](http://php.net/manual/en/install.general.php) on the PHP website.

### Command Line
The most common use cases for Rubix only require the PHP command line interface (CLI) to run since we don't need to handle any web requests. The CLI runs directly in a terminal and does not have a maximum execution time set by default. Note that you may need to adjust your memory limit in php.ini to a suitable value (or -1 for no limit).

To run a program on the command line, make sure the PHP binary is in your default PATH and enter:
```sh
$ php Model.php
```

### Web Server
It is possible to run a model trained with Rubix in a live system on a web server either during a request or in the background in a queue but many considerations need to be taken into account to ensure a smooth system. The primary consideration is one of resource allocation as machine learning models tend to be highly resource (CPU and memory) intensive. It is generally discouraged to run an ML model within a web request cycle, but if you must, you will need to consider the execution time of the script as it can be used as a denial of service (DOS) attack if not handled properly.

---
### API Reference

@@ -4152,6 +4138,18 @@ use Rubix\ML\Extractors\Tokenizers\Word;
$tokenizer = new Word();
```

---
## FAQ
Here you can find answers to the most frequently asked questions.

### What environment should I run Rubix in?
Typically, there are two different types of *environments* that a PHP program can run in: on the command line in a terminal window or on a web server such as Nginx via the FPM module. Most of the time you will only be working with the command line in Rubix unless you are building a system to work live in production. Even then, it is advised to run your models as background services and serve requests from a cache. For more information about the environments in which PHP can run, refer to the [general installation considerations](http://php.net/manual/en/install.general.php) on the PHP website.

To run a program on the command line, make sure the PHP binary is in your default PATH and enter:
```sh
$ php program.php
```

---
## Testing
Rubix utilizes a combination of static analysis and unit tests to reduce errors in code. Rubix provides two Composer scripts that can be run from the root directory that automate the testing process.
7 changes: 3 additions & 4 deletions composer.json
@@ -1,11 +1,11 @@
{
"name": "rubix/ml",
"type": "library",
"description": "Rubix ML is a library that lets you build intelligent programs that learn from data in PHP.",
"homepage": "https://github.com/andrewdalpino/Rubix-ML",
"description": "Rubix ML is a machine learning library that lets you build programs that learn from data in PHP.",
"homepage": "https://github.com/RubixML/RubixML",
"license": "MIT",
"keywords": [
"machine", "learning", "data", "science", "mining", "predictive", "modeling", "ai", "classification", "regression", "clustering", "anomaly", "detection", "neural", "network"
"machine", "learning", "data", "science", "mining", "predictive", "modeling", "classification", "regression", "clustering", "anomaly", "detection", "neural", "network"
],
"authors": [
{
@@ -16,7 +16,6 @@
"require": {
"php": ">=7.1.3",
"intervention/image": "^2.4",
"league/csv": "^9.1.4",
"markrogoyski/math-php": "^0.43.0"
},
"require-dev": {
6 changes: 6 additions & 0 deletions src/AnomalyDetectors/LocalOutlierFactor.php
@@ -164,11 +164,17 @@ public function predict(Dataset $dataset) : array
* median density of the local region.
*
* @param \Rubix\ML\Datasets\Dataset $dataset
* @throws \InvalidArgumentException
* @throws \RuntimeException
* @return array
*/
public function proba(Dataset $dataset) : array
{
if (in_array(DataFrame::CATEGORICAL, $dataset->types())) {
throw new InvalidArgumentException('This estimator only works with'
. ' continuous features.');
}

if (empty($this->samples)) {
throw new RuntimeException('Estimator has not been trained.');
}
10 changes: 8 additions & 2 deletions src/AnomalyDetectors/RobustZScore.php
@@ -125,13 +125,13 @@ public function mads() : array
*/
public function train(Dataset $dataset) : void
{
$this->medians = $this->mads = [];

if (in_array(DataFrame::CATEGORICAL, $dataset->types())) {
throw new InvalidArgumentException('This estimator only works with'
. ' continuous features.');
}

$this->medians = $this->mads = [];

foreach ($dataset->rotate() as $column => $values) {
list($median, $mad) = Stats::medMad($values);

@@ -145,11 +145,17 @@ public function train(Dataset $dataset) : void
* to a tolerance and threshold respectively.
*
* @param \Rubix\ML\Datasets\Dataset $dataset
* @throws \InvalidArgumentException
* @throws \RuntimeException
* @return array
*/
public function predict(Dataset $dataset) : array
{
if (in_array(DataFrame::CATEGORICAL, $dataset->types())) {
throw new InvalidArgumentException('This estimator only works with'
. ' continuous features.');
}

if (empty($this->medians) or empty($this->mads)) {
throw new RuntimeException('Estimator has not been trained.');
}
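
The `train()` hunk above moves the reset of `$this->medians` and `$this->mads` so that it comes after the type check. One plausible reading is that a rejected dataset should not wipe out an earlier fit. Here is a minimal sketch of that ordering, using a hypothetical `TinyDetector` class and plain arrays in place of Rubix's `Dataset` (both are assumptions for illustration, not library code):

```php
<?php

// Hypothetical stand-in for an estimator; not part of Rubix.
class TinyDetector
{
    /** @var float[] per-column medians from the last successful fit */
    protected $medians = [];

    public function medians() : array
    {
        return $this->medians;
    }

    public function train(array $samples) : void
    {
        if (empty($samples)) {
            throw new InvalidArgumentException('Cannot train on an empty dataset.');
        }

        // Validate first so that a rejected dataset leaves the old fit intact.
        foreach ($samples as $sample) {
            foreach ($sample as $value) {
                if (!is_int($value) and !is_float($value)) {
                    throw new InvalidArgumentException('This estimator only works with'
                        . ' continuous features.');
                }
            }
        }

        // Only now discard the previous state.
        $this->medians = [];

        foreach (array_keys($samples[0]) as $column) {
            $values = array_column($samples, $column);

            sort($values);

            $n = count($values);
            $mid = intdiv($n, 2);

            $this->medians[$column] = $n % 2 === 1
                ? (float) $values[$mid]
                : ($values[$mid - 1] + $values[$mid]) / 2.0;
        }
    }
}

$detector = new TinyDetector();

$detector->train([[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]]);

try {
    $detector->train([[1.0, 'red']]);
} catch (InvalidArgumentException $e) {
    // The medians from the first call are still available here.
}
```

Had the reset come before the check, the second call would have left the detector with empty medians even though it threw.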
53 changes: 35 additions & 18 deletions src/Classifiers/GaussianNB.php
@@ -13,6 +13,7 @@
use Rubix\ML\Other\Functions\LogSumExp;
use Rubix\ML\Other\Structures\DataFrame;
use InvalidArgumentException;
use RuntimeException;

/**
* Gaussian Naive Bayes
@@ -62,29 +63,23 @@ class GaussianNB implements Estimator, Online, Probabilistic, Persistable
/**
* The precomputed prior log probabilities of each label given by their weight.
*
* @var array
* @var array|null
*/
protected $_priors = [
//
];
protected $_priors;

/**
* The precomputed means of each feature column of the training set.
*
* @var array
* @var array|null
*/
protected $means = [
//
];
protected $means;

/**
* The precomputed variances of each feature column of the training set.
*
* @var array
* @var array|null
*/
protected $variances = [
//
];
protected $variances;

/**
* The possible class outcomes.
@@ -126,29 +121,29 @@ public function type() : int
* Return the class prior log probabilities based on their weight over all
* training samples.
*
* @return array
* @return array|null
*/
public function priors() : array
public function priors() : ?array
{
return $this->_priors;
}

/**
* Return the running mean of each feature column of the training data.
*
* @return array
* @return array|null
*/
public function means() : array
public function means() : ?array
{
return $this->means;
}

/**
* Return the running variances of each feature column of the training data.
*
* @return array
* @return array|null
*/
public function variances() : array
public function variances() : ?array
{
return $this->variances;
}
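
The hunks above replace empty-array defaults with uninitialized properties and widen the accessors to `?array` (nullable types are available from PHP 7.1, which matches the `composer.json` requirement). Below is a minimal sketch of how a `null` property can serve as the "not trained yet" signal. `RunningMean`, `fit()` and `score()` are hypothetical names for illustration and are not taken from the GaussianNB source:

```php
<?php

// Hypothetical class for illustration; not the GaussianNB implementation.
class RunningMean
{
    /** @var float[]|null null until fit() has been called */
    protected $means;

    public function means() : ?array
    {
        return $this->means;
    }

    public function fit(array $samples) : void
    {
        if (empty($samples)) {
            throw new InvalidArgumentException('Cannot fit an empty dataset.');
        }

        $this->means = [];

        foreach (array_keys($samples[0]) as $column) {
            $values = array_column($samples, $column);

            $this->means[$column] = array_sum($values) / count($values);
        }
    }

    public function score(array $sample) : float
    {
        // null distinguishes "never trained" from "trained on zero columns".
        if (is_null($this->means)) {
            throw new RuntimeException('Estimator has not been trained.');
        }

        $distance = 0.0;

        foreach ($sample as $column => $value) {
            $distance += ($value - $this->means[$column]) ** 2;
        }

        return $distance;
    }
}

$model = new RunningMean();

$before = $model->means(); // null, since nothing has been fitted yet

$model->fit([[1.0, 4.0], [3.0, 8.0]]);

$after = $model->means();
```

Callers of a `?array` accessor need to handle the `null` case themselves, which is the trade-off against always returning an array.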
@@ -246,10 +241,21 @@ public function partial(Dataset $dataset) : void
* choose the class with the highest likelihood as the prediction.
*
* @param \Rubix\ML\Datasets\Dataset $dataset
* @throws \InvalidArgumentException
* @throws \RuntimeException
* @return array
*/
public function predict(Dataset $dataset) : array
{
if (in_array(DataFrame::CATEGORICAL, $dataset->types())) {
throw new InvalidArgumentException('This estimator only works with'
. ' continuous features.');
}

if (is_null($this->means) or is_null($this->variances)) {
throw new RuntimeException('Estimator has not been trained.');
}

$predictions = [];

foreach ($dataset as $sample) {
@@ -266,10 +272,21 @@ public function predict(Dataset $dataset) : array
* of a sample.
*
* @param \Rubix\ML\Datasets\Dataset $dataset
* @throws \InvalidArgumentException
* @throws \RuntimeException
* @return array
*/
public function proba(Dataset $dataset) : array
{
if (in_array(DataFrame::CATEGORICAL, $dataset->types())) {
throw new InvalidArgumentException('This estimator only works with'
. ' continuous features.');
}

if (is_null($this->means) or is_null($this->variances)) {
throw new RuntimeException('Estimator has not been trained.');
}

$probabilities = [];

foreach ($dataset as $i => $sample) {
6 changes: 6 additions & 0 deletions src/Classifiers/KDNeighbors.php
@@ -141,11 +141,17 @@ public function predict(Dataset $dataset) : array
* Output a vector of class probabilities per sample.
*
* @param \Rubix\ML\Datasets\Dataset $dataset
* @throws \InvalidArgumentException
* @throws \RuntimeException
* @return array
*/
public function proba(Dataset $dataset) : array
{
if (in_array(DataFrame::CATEGORICAL, $dataset->types())) {
throw new InvalidArgumentException('This estimator only works with'
. ' continuous features.');
}

if ($this->bare() === true) {
throw new RuntimeException('Estimator has not been trained.');
}
6 changes: 6 additions & 0 deletions src/Classifiers/KNearestNeighbors.php
@@ -159,11 +159,17 @@ public function predict(Dataset $dataset) : array
* Output a vector of class probabilities per sample.
*
* @param \Rubix\ML\Datasets\Dataset $dataset
* @throws \InvalidArgumentException
* @throws \RuntimeException
* @return array
*/
public function proba(Dataset $dataset) : array
{
if (in_array(DataFrame::CATEGORICAL, $dataset->types())) {
throw new InvalidArgumentException('This estimator only works with'
. ' continuous features.');
}

if (empty($this->samples) or empty($this->labels)) {
throw new RuntimeException('Estimator has not been trained.');
}
15 changes: 12 additions & 3 deletions src/Classifiers/LogisticRegression.php
@@ -10,6 +10,7 @@
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\NeuralNet\FeedForward;
use Rubix\ML\Other\Functions\Argmax;
use Rubix\ML\Other\Structures\Matrix;
use Rubix\ML\NeuralNet\Layers\Binary;
use Rubix\ML\NeuralNet\Optimizers\Adam;
use Rubix\ML\Other\Structures\DataFrame;
@@ -278,21 +279,29 @@ public function predict(Dataset $dataset) : array
* Output a vector of class probabilities per sample.
*
* @param \Rubix\ML\Datasets\Dataset $dataset
* @throws \InvalidArgumentException
* @throws \RuntimeException
* @return array
*/
public function proba(Dataset $dataset) : array
{
if (in_array(DataFrame::CATEGORICAL, $dataset->types())) {
throw new InvalidArgumentException('This estimator only works with'
. ' continuous features.');
}

if (is_null($this->network)) {
throw new RuntimeException('Estimator has not been trained.');
}

$samples = Matrix::build($dataset->samples(), false)->transpose();

$probabilities = [];

foreach ($this->network->infer($dataset) as $activation) {
foreach ($this->network->infer($samples)->row(0) as $activation) {
$probabilities[] = [
$this->classes[0] => 1. - $activation[0],
$this->classes[1] => $activation[0],
$this->classes[0] => 1. - $activation,
$this->classes[1] => $activation,
];
}

11 changes: 10 additions & 1 deletion src/Classifiers/MultiLayerPerceptron.php
@@ -11,6 +11,7 @@
use Rubix\ML\NeuralNet\Snapshot;
use Rubix\ML\NeuralNet\FeedForward;
use Rubix\ML\Other\Functions\Argmax;
use Rubix\ML\Other\Structures\Matrix;
use Rubix\ML\NeuralNet\Layers\Hidden;
use Rubix\ML\NeuralNet\Optimizers\Adam;
use Rubix\ML\Other\Structures\DataFrame;
@@ -393,18 +394,26 @@ public function predict(Dataset $dataset) : array
* Output a vector of class probabilities per sample.
*
* @param \Rubix\ML\Datasets\Dataset $dataset
* @throws \InvalidArgumentException
* @throws \RuntimeException
* @return array
*/
public function proba(Dataset $dataset) : array
{
if (in_array(DataFrame::CATEGORICAL, $dataset->types())) {
throw new InvalidArgumentException('This estimator only works with'
. ' continuous features.');
}

if (is_null($this->network)) {
throw new RuntimeException('Estimator has not been trained.');
}

$samples = Matrix::build($dataset->samples(), false)->transpose();

$probabilities = [];

foreach ($this->network->infer($dataset) as $activations) {
foreach ($this->network->infer($samples)->transpose() as $activations) {
$probabilities[] = array_combine($this->classes, $activations);
}
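
The new `proba()` body transposes the network output before pairing each set of activations with the class labels. The sketch below shows that pairing step with plain arrays, assuming (as the diff suggests, but does not state) that the output holds one row per class and one column per sample. The `transpose()` helper and the sample values are hypothetical and stand in for the library's `Matrix` class:

```php
<?php

/**
 * Transpose a rectangular 2-d array so that rows become columns.
 */
function transpose(array $matrix) : array
{
    $out = [];

    foreach ($matrix as $i => $row) {
        foreach ($row as $j => $value) {
            $out[$j][$i] = $value;
        }
    }

    return $out;
}

$classes = ['cat', 'dog'];

// Hypothetical network output: one row per class, one column per sample.
$activations = [
    [0.9, 0.2, 0.5],
    [0.1, 0.8, 0.5],
];

$probabilities = [];

// After transposing, each row holds the activations for a single sample.
foreach (transpose($activations) as $sample) {
    $probabilities[] = array_combine($classes, $sample);
}
```

Each entry of `$probabilities` is then an associative array keyed by class label, in the same order as the samples.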
