All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Add CLI option to use treeReduce on Spark
0.40.0 - 2024-08-22
- Add CLI option to disable finding enums
- Fix circular dependency with property sets
- Properly respect disjoint object detection in CLI
- Disable assertions by default
- Expand Spark compatibility by shading assembly
0.30.1 - 2024-07-01
- Update dependencies
0.30.0 - 2024-06-27
- Support patternProperties when converting to a dynamic object
- Enable SBOM in Docker build
- Fix some anomaly checking with patternProperties
- Fix some anomaly checking with additionalProperties
- Update to Scala 2.13.14
- Update to Java 11
- Change base image for Docker container
- Set build timestamp for reproducible builds
0.20.1 - 2024-02-28
- Support for using treeReduce with Apache Spark
0.20.0 - 2024-02-20
- Fix array uniqueness subset check
- Combine numeric schemas in ProductSchema
- Use JSON Pointer objects instead of strings
0.19.0 - 2023-11-26
- Fix package coordinates
0.18.0 - 2023-11-20
- Testing validation via Bowtie
- Collect anomalies for patternProperties
- Throw error when parsing with unevaluatedProperties
- Set items to false in tuple schemas
- Declare lack of support for unevaluatedItems and if/then
- Declare partial support for additionalProperties
- Track Boolean values which are constant
0.17.0 - 2023-08-08
- Add calculation of schema entropy
- Switch to draft 2020-12
- Switch back to prefixItems
0.16.0 - 2023-08-02
- Add dynamic object transformer
- Add disjoint key transformer
- Allow selecting a subset of properties by classes
- Allow removing subsets of properties
- Detect numeric strings and treat as numeric
- Added extended format checkers
- Apply EnumTransformer to constant Booleans
- Add test time assertions
- Add expansion to ZeroSchema
- Add some basic property-based testing
- Increase length penalty for primary key detection
- Update to Scala 2.13
- Move property set configuration to params object
- Only generate tuple schemas on multiple observations
- Move transformers to separate package
- Allow top-level schemas that aren't objects
- Ignore invalid JSON input
- Allow schema replacement to replace the entire schema
- Correct histogram anomaly checking
- Avoid errors with very small multipleOf values
- Avoid errors with multipleOf and very different scales
- Don't anchor string matches in StaticPatternProperty
- Fix compatibility check with ProductSchema
- Don't crash with extremely large integer values
- Fix crash when merging numbers of different scales
- More complete merging with ProductSchema
- Better error handling when deserializing schemas
- Fix broken isSubsetOf for empty tuple schemas
- Fix broken isSubsetOf for empty StringNumericProperty
- Fix isSubsetOf in DependenciesProperty
- AnySchema should be a subset of itself
- Large BigDecimal values should be treated as extreme
- Add JLong to valid types for IntegerSchema
- Fix extreme value anomaly checks in histograms
- Fix tuple schema deserialization from JSON Schema
0.15.0 - 2023-04-11
- Add namespace to AnomalyLevel
- Unknown properties are not anomalous if additionalProperties is true
- Make histogram bounds violation Info level
- Don't consider non-null values as anomalous with simple NullSchema
- Properly handle zero values in histogram anomaly checking
- Correct anomaly detection in ProductSchema
- More accurate anomaly detection for PatternProperty
0.14.0 - 2023-04-04
- Move to new package name
- Fix JSONoid version in generated schema
0.13.0 - 2023-02-28
- Allow disabling expansion entirely
- Add possible debug output to CLI
- Add some compatibility checking for StaticDependenciesProperty
- Refactor reading of JSON Schema from file
- Avoid crash during expansion if a property is missing
- Fix regex compatibility checking for StaticPatternProperty
- Properly report compatibility for PatternProperty
- Add missing NumMultipleOfProperty when converting IntegerSchema
- Make expansion actually work with large numbers
- Fix multipleOf compatibility checks with 0
- Correctly deal with negative multiples
- Exit with non-zero status for invalid arguments
- Allow oblivious expansion with split discovery
0.12.2 - 2023-02-24
- Allow specifying random number seed for reproducible discovery
0.12.1 - 2023-02-23
- Add a property to track the percentage of true Boolean values
- Allow for oblivious expansion without another schema
- Add additionalProperties during oblivious expansion
- Allow reset of min/max length for strings with format
- Fix missing bash in Docker image
0.12.0 - 2023-02-21
- Add equivalence relation which checks types
- Allow additional ER choices in CLI
- Allow the maximum number of examples to be configured
- Allow configuration of additionalProperties
- Allow checking of schema compatibility
- Expand schema properties to cover another schema where possible
- Use ranking to improve possible primary key suggestions
- Use a configurable threshold for format detection, defaulting to 1
- Refactor reference replacement to allow replacing with any schema
- Don't output a format if most string values have no format
- Correctly check UUID format
- Correctly check email format
0.11.0 - 2023-01-13
- Make definition transformation optional when running via Spark
- Use DDSketch for histograms
- Build a Docker image on each push
- Include sbt wrapper script
- Add option to write schema directly to file
- Generate website with schema previews each push
- Don't output items in ArraySchema if empty
- Keep definitions on schema copy
- Don't fail in DefinitionTransfer on single keys
- Refactor Bloom filters to reduce code duplication
- Added Scalafix along with some minor rewrites
- Pretty print final JSON schema
0.10.0 - 2022-04-15
- Add intersecting label equivalence relation
- Include serialized HLL in generated schema
- Allow limiting discovered properties from CLI
- Require minimum examples for format property
- Include array item uniqueness and examples in simple property set
- Avoid discovering properties not required in CLI
0.9.7 - 2022-03-14
- Detect anomalies resulting from dependency violations
0.9.6 - 2022-03-12
- Fix nested property transformation
0.9.5 - 2022-03-12
- Fix property restriction for complex types
0.9.4 - 2022-03-11
- Allow restricting to a subset of properties
0.9.3 - 2022-03-09
- Fix Bloom filter deserialization
0.9.2 - 2022-03-09
- Fix dependentRequired during conversion
0.9.1 - 2022-03-09
- Support dependentRequired during conversion
0.9.0 - 2022-03-09
- Correct type anomaly detection
0.8.3 - 2022-03-08
- Avoid unnecessary anomaly errors on patternProperties
0.8.2 - 2022-03-08
- Serialize/deserialize Bloom filters
- Support const when converting schemas
0.8.1 - 2022-03-08
- Allow transformers to work on top-level objects
- Separate transformer to merge schemas containing allOf
- Anomaly detection for different ProductSchema types
- Parse examples when converting schemas
- Add separate types for ProductSchema
- Properly detect type anomalies with nested schemas
0.8.0 - 2022-02-22
- Allow merging schemas by intersection instead of union
- Use a base schema in ProductSchema
- Restore CLI functionality
- Ensure special schemas are merged correctly
0.7.3 - 2022-02-16
- Don't show reference in ReferenceObjectProperty#toString to fix circular references
- Support circular references during reference resolution
0.7.2 - 2022-02-15
- Added missing patternProperties support to SchemaWalker
0.7.1 - 2022-02-15
- Support patternProperties during object conversion
0.7.0 - 2022-02-09
- Support exclusive min/max during conversion
- Support allOf during object conversion
- Track multipleOf for NumberSchema
- Add more explicit object conversion errors
- MultipleOfProperty renamed to IntMultipleOfProperty
0.6.3 - 2022-02-08
- Perform heuristic type detection during object conversion
- Support additionalItems during object conversion
- Use AnySchema during conversion when no type detected
- Allow items to be an array during object conversion
- Allow arrays to be converted without item type
0.6.2 - 2022-02-04
- Perform definition conversion for all types
0.6.1 - 2022-02-04
- Don't require root to be ObjectSchema when resolving references
0.6.0 - 2022-02-04
- Show error message when converting with patternProperties
- Include definitions when converting to JSONoid objects
- Store definitions directly on the JsonSchema object, not as a property
0.5.5 - 2022-02-03
- Allow a reference object to be stored for schema references
- Correctly parse $ref during object conversion
- New ReferenceResolver which will add a property to ReferenceSchema with the path of the schema object
- Enable Wartremover only for compilation (not tests)
0.5.4 - 2022-02-01
- Support cases where type is an array in JSONoid object conversion
- Support enums in JSONoid object conversion
- Add format support in JSONoid object conversion
- Add a separate StaticPatternProperty to statically specify regexes
- Handle allOf with a single element in JSONoid conversion
- Handle true and false in JSONoid object conversion
- Make JSONoid conversion helper methods private
- Don't construct ProductSchema with a single element during object conversion
0.5.3 - 2022-01-31
- Throw a more readable exception if $ref or allOf are found during conversion
- Assume something is object type during conversion if it has properties
- Allow objects with no defined properties when converting to JSONoid
0.5.2 - 2022-01-31
- NonEquivalenceRelation which never merges
- Also convert anyOf and oneOf to ProductSchema during conversion
0.5.1 - 2022-01-29
- New getOrNone method for SchemaProperties which returns an Option
- Moved BuildInfo class to inside discovery package
- Update assembly JAR configuration to change name
0.5.0 - 2022-01-28
- Histograms are tracked for array and string lengths
- JSON values can be checked for anomalies against a schema
- JSON schemas can be converted to JSONoid objects
- Better names for generated definitions
- Definition replacement now works for ProductSchema
0.4.2 - 2021-10-15
- Don't include ProductSchema counts in generated schema
0.4.1 - 2021-10-15
- Allow ProductSchema as a top-level result
- Better handling of clustering failures
0.4.0 - 2021-10-14
- Allow configurable equivalence relations when merging
- Add option to select whether definitions should be found
- Allow conversion of any type to a full JSON Schema
- Record count of different alternatives in ProductSchema
- Use oneOf instead of anyOf in ProductSchema
0.3.0 - 2021-10-05
- Use clustering of related objects to automatically create definitions
- Prevented duplicate values from occurring in enum in some circumstances
- Change dependencies property to dependentRequired to match latest schema version
- Change prefixItems back to items for now since we are using draft 2019
- Produce 2019 draft version schemas
- Remove statistics from NumberSchema simple properties
- Rename stats property to statistics
- Separate elements of tuple schemas when generating value tables
- Add preliminary support for the definition and use of references
0.2.4 - 2021-08-27
- Approach to multiline strings for description generation was broken on some JREs
0.2.3 - 2021-08-27
- Add option for property sets to main CLI
- Warning message in generated schema description
- Include version number in CLI output
0.2.2 - 2021-08-24
- Removed $vocabulary from the generated schema
0.2.1 - 2021-08-24
- Add class to directly run JSONoid on Spark
- Add different sets of properties which can be discovered with Spark
- Version number (incorrect) removed from CLI output
0.2.0 - 2021-06-15
- PrimaryKeyFinder now searches recursively and returns JSON Paths
- Generate schema using 2020-12 spec
- Tuple schemas are now produced using prefixItems
- Arrays now correctly use minItems and maxItems instead of minLength and maxLength
- multipleOf is not included if the multiple is zero (only happens for constants)
0.1.3 - 2021-06-09
- Subschemas can now be extracted using a JSON pointer string
- Reservoir sampling for example collection now actually uses new examples
0.1.2 - 2021-06-08
- Also detect suffix patterns in strings
0.1.1 - 2021-06-01
- Method for conversion to simple JObject JSON Schema
0.1.0 - 2021-05-31
- Initial release