Nicer Dataset object validation error messages
andrewdalpino committed Aug 8, 2020
1 parent e2c79d3 commit 228780b
Showing 4 changed files with 15 additions and 10 deletions.
CHANGELOG.md: 7 changes (4 additions, 3 deletions)

```diff
@@ -1,9 +1,10 @@
-- Unreleased
+- 0.1.1
 - Fixed Image Resizer placeholder image
 - Fixed Filesystem no write permissions on instantiation
-- Nicer object string representations
-- Do not terminate empty spatial leaf nodes
+- Nicer Stringable object string representations
+- Do not terminate empty Spatial tree leaf nodes
 - Additional Filesystem persister checks
+- Nicer Dataset object validation error messages
 
 - 0.1.0
 - CV Report Generators now return Report objects
```
docs/datasets/labeled.md: 2 changes (1 addition, 1 deletion)

```diff
@@ -3,7 +3,7 @@
 # Labeled
 A Labeled dataset is used to train supervised learners and for testing a model by providing the ground-truth. In addition to the standard dataset API, a labeled dataset can perform operations such as stratification and sorting the dataset using the label column.
 
-**Note:** Since PHP silently converts integer strings (ex. `'1'`) to integers in some circumstances, you should not use integer strings as class labels. Instead, use an appropriate non-integer string class label such as `first` or `class 1`.
+**Note:** Since PHP silently converts integer strings (ex. `'1'`) to integers in some circumstances, you should not use integer strings as class labels. Instead, use an appropriate non-integer string class name such as `'class 1'`, `'#1'`, or `'first'`.
 
 ## Parameters
 | # | Param | Default | Type | Description |
```
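One circumstance the note refers to is array keys: PHP casts an integer-string key such as `'1'` to the integer `1`, while a key like `'class 1'` stays a string. The sketch below assumes labels end up as array keys, for example when they are counted; `array_count_values()` is used here only to illustrate the casting and is not necessarily what Rubix ML does internally.

```php
<?php

declare(strict_types=1);

// Class labels as they might appear in a dataset: an integer string and a
// non-integer string of the kind the note recommends.
$labels = ['1', '1', 'class 1'];

// Counting labels turns each distinct label into an array key. PHP casts
// the key '1' to the integer 1, while 'class 1' remains a string.
$counts = array_count_values($labels);

foreach (array_keys($counts) as $key) {
    echo var_export($key, true), ' => ', gettype($key), PHP_EOL;
}

// The recovered key no longer compares strictly equal to the original label.
var_dump(array_keys($counts)[0] === $labels[0]);
```

This prints `1 => integer`, then `'class 1' => string`, then `bool(false)`, which is the kind of silent type change that makes integer strings unreliable as class labels.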
src/Datasets/Dataset.php: 4 changes (2 additions, 2 deletions)

```diff
@@ -90,15 +90,15 @@ public function __construct(array $samples = [], bool $validate = true)
            if (count($sample) !== $n) {
                throw new InvalidArgumentException('Number of columns'
                    . " must be equal for all samples, $n expected but"
-                   . ' ' . count($sample) . ' given.');
+                   . ' ' . count($sample) . " given at row $row.");
            }
 
            foreach ($sample as $column => $value) {
                if (DataType::detect($value) != $types[$column]) {
                    throw new InvalidArgumentException("Column $column must"
                        . ' contain values of the same data type,'
                        . " $types[$column] expected but "
-                       . DataType::detect($value) . ' given.');
+                       . DataType::detect($value) . " given at row $row.");
                }
            }
        }
```
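The new `Dataset.php` messages name the offending row. As a rough, self-contained sketch of the same two checks, the hypothetical `validateSamples()` function below takes the expected column count and column types from the first sample and reports the row of the first mismatch. It uses `gettype()` as a simplified stand-in for Rubix ML's `DataType::detect()`, so its type names (`double`, `string`, and so on) differ from the library's.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the Dataset constructor checks: every sample
 * must have as many columns as the first sample, and each column must keep
 * the type seen in the first sample. gettype() replaces DataType::detect().
 */
function validateSamples(array $samples): void
{
    if (empty($samples)) {
        return;
    }

    $first = reset($samples);
    $n = count($first);
    $types = array_map('gettype', $first);

    // The loop key $row is what lets the messages point at the bad sample.
    foreach ($samples as $row => $sample) {
        if (count($sample) !== $n) {
            throw new InvalidArgumentException('Number of columns'
                . " must be equal for all samples, $n expected but"
                . ' ' . count($sample) . " given at row $row.");
        }

        foreach ($sample as $column => $value) {
            if (gettype($value) !== $types[$column]) {
                throw new InvalidArgumentException("Column $column must"
                    . ' contain values of the same data type,'
                    . " $types[$column] expected but "
                    . gettype($value) . " given at row $row.");
            }
        }
    }
}

try {
    validateSamples([[1.5, 'red'], [2.0, 'blue'], ['oops', 'green']]);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Running it prints `Column 0 must contain values of the same data type, double expected but string given at row 2.`, so the caller can go straight to the third sample instead of scanning the whole dataset.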
src/Datasets/Labeled.php: 12 changes (8 additions, 4 deletions)

```diff
@@ -18,6 +18,10 @@
 use function get_class;
 use function gettype;
 use function array_slice;
+use function is_string;
+use function is_numeric;
+use function is_float;
+use function is_nan;
 
 use const Rubix\ML\PHI;
 use const Rubix\ML\EPSILON;
@@ -148,16 +152,16 @@ public function __construct(array $samples = [], array $labels = [], bool $valid
                    . " categorical or continuous, $type given.");
            }
 
-           foreach ($labels as $label) {
+           foreach ($labels as $offset => $label) {
                if (DataType::detect($label) != $type) {
-                   throw new InvalidArgumentException('Labels must be'
-                       . " the same data type, $type expected but "
+                   throw new InvalidArgumentException('Invalid label type'
+                       . " found at offset $offset, $type expected but "
                        . DataType::detect($label) . ' given.');
                }
 
                if (is_float($label) and is_nan($label)) {
                    throw new InvalidArgumentException('Labels must not'
-                       . ' contain NaN values.');
+                       . " contain NaN values, NaN found at offset $offset.");
                }
            }
        }
```
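The `Labeled.php` change keys the label loop by `$offset` so that both error messages can name the offending position. The hypothetical `validateLabels()` function below sketches that loop in isolation, again with `gettype()` standing in for `DataType::detect()` and the expected type taken from the first label. Note that `gettype()` is stricter than a categorical/continuous split: it treats `integer` and `double` as different types.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of the Labeled constructor's label checks: all labels
 * share the first label's type, and float labels may not be NaN.
 * gettype() replaces DataType::detect() for this illustration.
 */
function validateLabels(array $labels): void
{
    if (empty($labels)) {
        return;
    }

    $type = gettype(reset($labels));

    // Iterating with $offset => $label is what makes the position available.
    foreach ($labels as $offset => $label) {
        if (gettype($label) !== $type) {
            throw new InvalidArgumentException('Invalid label type'
                . " found at offset $offset, $type expected but "
                . gettype($label) . ' given.');
        }

        // NaN is a float, so it passes the type check and needs its own test.
        if (is_float($label) and is_nan($label)) {
            throw new InvalidArgumentException('Labels must not'
                . " contain NaN values, NaN found at offset $offset.");
        }
    }
}

foreach ([[0.5, 1.25, NAN], ['cat', 'dog', 3]] as $labels) {
    try {
        validateLabels($labels);
    } catch (InvalidArgumentException $e) {
        echo $e->getMessage(), PHP_EOL;
    }
}
```

With these two inputs it prints `Labels must not contain NaN values, NaN found at offset 2.` followed by `Invalid label type found at offset 2, string expected but integer given.`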
