Add missing phone_assertions.js file
JavaScript and Markdown formatting
Upgrade dataform cli version in package.json to 1.20.0
ddeleo committed Oct 12, 2021
1 parent 3f2d0fb commit 7095eab
Showing 13 changed files with 299 additions and 210 deletions.
55 changes: 43 additions & 12 deletions dataform/examples/dataform_assertion_unit_test/README.md
@@ -1,12 +1,28 @@
# Dataform Custom Assertions
Dataform is a platform for managing data in BigQuery and other data warehouses. It performs the **T (Transformation)** step of an **ELT** pipeline. Before any transformation runs, the input data should be validated. Dataform has built-in data assertions such as uniqueKey and nonNull, and you can also write custom assertions to meet your own needs. This directory provides a set of custom assertions that you can use to test data quality in your project.
## Requirement
To test and run the custom Dataform assertions, you need:

Dataform is a platform for managing data in BigQuery and other data warehouses.
It performs the **T (Transformation)** step of an **ELT** pipeline. Before any
transformation runs, the input data should be validated. Dataform has built-in
data assertions such as uniqueKey and nonNull, and you can also write custom
assertions to meet your own needs. This directory provides a set of custom
assertions that you can use to test data quality in your project.

## Requirement

To test and run the custom Dataform assertions, you need:

- A dataform project
- Credentials granting access to the BigQuery warehouse (.df-credentials)

## Usage
The custom assertions live in JavaScript files in the includes folder. To use one, reference the function in your transformation query. For example, to use the ```test_telephone_number_digits(colName)``` assertion from the ```phone_assertions.js``` file, reference it in the config section of the transformation query like this:

The custom assertions live in JavaScript files in the includes folder. To use
one, reference the function in your transformation query. For example, to use
the ```test_telephone_number_digits(colName)``` assertion from
the ```phone_assertions.js``` file, reference it in the config section of the
transformation query like this:

```
type: ....,
@@ -20,13 +36,23 @@ The custom assertions are in javascript files in the includes folder. To test th
]
}
```

## Unit Testing Custom Assertions
Unit testing your custom assertions helps safeguard your ELT pipeline. This project demonstrates an easy way to unit test them. The workflow is:

* Create a ```test_[NAME_YOUR_TEST]_assertions.js``` file in the ```definitions/tests/``` folder if your custom row assertions are not included in the existing template.
* In ```test_[NAME_YOUR_TEST]_assertions.js```, update the import line ```const {[YOUR_CUSTOM_ASSERTION]} = [CUSTOMER_ASSERTION_FILE_NAME];``` and set the test name in ```const test_name = "[YOUR_TEST_NAME]";```.
* Add the test data to the ```test_cases``` block in the format ```"[INPUT]" : "[EXPECTED_OUTPUT]"```.
* Finally, pass your custom function to the ```generate_test(...)``` function.
Unit testing your custom assertions helps safeguard your ELT pipeline. This
project demonstrates an easy way to unit test them. The workflow is:

* Create a ```test_[NAME_YOUR_TEST]_assertions.js``` file in
the ```definitions/tests/``` folder if your custom row assertions are not
included in the existing template.
* In ```test_[NAME_YOUR_TEST]_assertions.js```, update the import
line ```const {[YOUR_CUSTOM_ASSERTION]} = [CUSTOMER_ASSERTION_FILE_NAME];```
and set the test name in ```const test_name = "[YOUR_TEST_NAME]";```.
* Add the test data to the ```test_cases``` block in the
format ```"[INPUT]" : "[EXPECTED_OUTPUT]"```.
* Finally, pass your custom function to the ```generate_test(...)```
function.
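
To make the ```"[INPUT]" : "[EXPECTED_OUTPUT]"``` format concrete, here is a rough sketch of how test cases could be turned into SQL. The helper ```build_test_rows``` and the assertion ```test_gender_sketch``` are hypothetical names invented for this illustration; the repository's own ```generate_test``` is not shown here and may work differently.

```javascript
// Hypothetical helper for illustration only; this is NOT the repository's
// generate_test implementation, whose source is not shown here.
function build_test_rows(test_cases, assertion) {
  return Object.entries(test_cases)
    .map(([input, expected]) => {
      // Pass each input to the assertion as a SQL string literal.
      const actual = assertion(`"${input}"`);
      // A case passes when the assertion result equals the expected outcome.
      return `SELECT "${input}" AS input, (${actual}) = ${expected} AS passed`;
    })
    .join("\nUNION ALL\n");
}

// Hypothetical assertion: TRUE when the value is one of two known labels.
function test_gender_sketch(colName) {
  return `${colName} IN ("Female", "Male")`;
}

const test_cases = {
  "Female": "TRUE",
  "one": "FALSE"
};

const sql = build_test_rows(test_cases, test_gender_sketch);
console.log(sql);
```

Each test case becomes one row whose ```passed``` column is TRUE only when the assertion agrees with the expected outcome, so a failing rule shows up as a FALSE row.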

Below is an example of the ```test_[NAME_YOUR_TEST]_assertions.js``` file:

@@ -54,7 +80,12 @@ generate_test(test_name,
[YOUR_CUSTOM_ASSERTIONS]);
```

* Afterwards, run the unit tests with the ```dataform test``` command.

* Afterwards, run the unit tests with the ```dataform test``` command.

## License
All solutions within this repository are provided under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) license. Please see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) file for more detailed terms and conditions.

All solutions within this repository are provided under
the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) license. Please
see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) file for more
detailed terms and conditions.
@@ -3,5 +3,5 @@
"defaultSchema": "dataform",
"assertionSchema": "dataform_assertions",
"defaultDatabase": "YOUR_PROJECT_ID",
"useRunCache": false
"concurrentQueryLimit": 100
}
@@ -16,25 +16,23 @@ const {generate_test} = unit_test_utils;
const {test_date} = date_assertions;
const test_name = "test_date_assertions";
const test_cases = {
/*
Provide your own testing data following the structure
<INPUT_TESTING_DATA> : "<EXPECTED OUTCOME>"
For example, if a testing data has the <EXPECTED OUTCOME> to be TRUE,
then the program will expect the custom data quality rules to also produce TRUE.
Otherwise it will show that the custom data quality rules failed.
*/
"1997/11/03" : "TRUE",
"2008/08/08" : "TRUE",
"1996/11/03" : "TRUE",
"2005/04/13" : "TRUE",
"1998/11/03" : "TRUE",
"2006/07/29" : "TRUE",
"2025/03/24" : "FALSE",
"1769/03/24" : "FALSE"
/*
Provide your own testing data following the structure
<INPUT_TESTING_DATA> : "<EXPECTED OUTCOME>"
For example, if a testing data has the <EXPECTED OUTCOME> to be TRUE,
then the program will expect the custom data quality rules to also produce TRUE.
Otherwise it will show that the custom data quality rules failed.
*/

"1997/11/03": "TRUE",
"2008/08/08": "TRUE",
"1996/11/03": "TRUE",
"2005/04/13": "TRUE",
"1998/11/03": "TRUE",
"2006/07/29": "TRUE",
"2025/03/24": "FALSE",
"1769/03/24": "FALSE"
};
// The function below will generate the necessary SQL to run unit tests.
generate_test(test_name,
test_cases,
test_date);
generate_test(test_name, test_cases, test_date);

@@ -16,20 +16,18 @@ const {generate_test} = unit_test_utils;
const {test_email_validity} = personal_info_assertions;
const test_name = "test_email_assertion_test";
const test_cases = {
/*
Provide your own testing data following the structure
<INPUT_TESTING_DATA> : "<EXPECTED OUTCOME>"
For example, if a testing data has the <EXPECTED OUTCOME> to be TRUE,
then the program will expect the custom data quality rules to also produce TRUE.
Otherwise it will show that the custom data quality rules failed.
*/
"ruinanliu@google.com" : "TRUE",
"[email protected]" : "TRUE",
"1736#$%.com" : "FALSE"
/*
Provide your own testing data following the structure
<INPUT_TESTING_DATA> : "<EXPECTED OUTCOME>"
For example, if a testing data has the <EXPECTED OUTCOME> to be TRUE,
then the program will expect the custom data quality rules to also produce TRUE.
Otherwise it will show that the custom data quality rules failed.
*/

"someone@google.com": "TRUE",
"[email protected]": "TRUE",
"1736#$%.com": "FALSE"
};
// The function below will generate the necessary SQL to run unit tests.
generate_test(test_name,
test_cases,
test_email_validity);
generate_test(test_name, test_cases, test_email_validity);

@@ -16,17 +16,17 @@ const {generate_test} = unit_test_utils;
const {test_gender_status} = personal_info_assertions;
const test_name = "test_gender_assertions";
const test_cases = {
/*
Provide your own testing data following the structure
<INPUT_TESTING_DATA> : "<EXPECTED OUTCOME>"
For example, if a testing data has the <EXPECTED OUTCOME> to be TRUE,
then the program will expect the custom data quality rules to also produce TRUE.
Otherwise it will show that the custom data quality rules failed.
*/
"Female" : "TRUE",
"Male" : "TRUE",
"one" : "FALSE"
/*
Provide your own testing data following the structure
<INPUT_TESTING_DATA> : "<EXPECTED OUTCOME>"
For example, if a testing data has the <EXPECTED OUTCOME> to be TRUE,
then the program will expect the custom data quality rules to also produce TRUE.
Otherwise it will show that the custom data quality rules failed.
*/

"Female": "TRUE",
"Male": "TRUE",
"one": "FALSE"
};
// The function below will generate the necessary SQL to run unit tests.
generate_test(test_name,
@@ -16,17 +16,17 @@ const {generate_test} = unit_test_utils;
const {test_marital_status} = personal_info_assertions;
const test_name = "test_marital_status_assertions";
const test_cases = {
/*
Provide your own testing data following the structure
<INPUT_TESTING_DATA> : "<EXPECTED OUTCOME>"
For example, if a testing data has the <EXPECTED OUTCOME> to be TRUE,
then the program will expect the custom data quality rules to also produce TRUE.
Otherwise it will show that the custom data quality rules failed.
*/
"Married" : "TRUE",
"Divorced" : "TRUE",
"Happy" : "FALSE"
/*
Provide your own testing data following the structure
<INPUT_TESTING_DATA> : "<EXPECTED OUTCOME>"
For example, if a testing data has the <EXPECTED OUTCOME> to be TRUE,
then the program will expect the custom data quality rules to also produce TRUE.
Otherwise it will show that the custom data quality rules failed.
*/

"Married": "TRUE",
"Divorced": "TRUE",
"Happy": "FALSE"
};
// The function below will generate the necessary SQL to run unit tests.
generate_test(test_name,
@@ -16,20 +16,20 @@ const {generate_test} = unit_test_utils;
const {test_name} = personal_info_assertions;
const test_file_name = "test_personal_info_assertions";
const test_cases = {
/*
Provide your own testing data following the structure
<INPUT_TESTING_DATA> : "<EXPECTED OUTCOME>"
For example, if a testing data has the <EXPECTED OUTCOME> to be TRUE,
then the program will expect the custom data quality rules to also produce TRUE.
Otherwise it will show that the custom data quality rules failed.
*/
"Alan" : "TRUE",
"Bob" : "TRUE",
"Jack" : "TRUE",
"John" : "TRUE",
"y*(*&^^%$" : "FALSE",
"Alannnn" : "FALSE"
/*
Provide your own testing data following the structure
<INPUT_TESTING_DATA> : "<EXPECTED OUTCOME>"
For example, if a testing data has the <EXPECTED OUTCOME> to be TRUE,
then the program will expect the custom data quality rules to also produce TRUE.
Otherwise it will show that the custom data quality rules failed.
*/

"Alan": "TRUE",
"Bob": "TRUE",
"Jack": "TRUE",
"John": "TRUE",
"y*(*&^^%$": "FALSE",
"Alannnn": "FALSE"
};
// The function below will generate the necessary SQL to run unit tests.
generate_test(test_file_name,
@@ -16,26 +16,26 @@ const {generate_test} = unit_test_utils;
const {test_phone_number} = phone_assertions;
const test_name = "test_telephone_number_assertions";
const test_cases = {
/*
Provide your own testing data following the structure
<INPUT_TESTING_DATA> : "<EXPECTED OUTCOME>"
For example, if a testing data has the <EXPECTED OUTCOME> to be TRUE,
then the program will expect the custom data quality rules to also produce TRUE.
Otherwise it will show that the custom data quality rules failed.
*/
"8123456789" : "TRUE",
"1234567899" : "TRUE",
"5123456789" : "TRUE",
"4576839485" : "TRUE",
"2938475638" : "TRUE",
"7928374657" : "TRUE",
"7847563738" : "TRUE",
"6768907654" : "TRUE",
"1234567" : "FALSE",
"0123456789" : "FALSE",
"1111111111" : "FALSE",
"374657389a" : "FALSE"
/*
Provide your own testing data following the structure
<INPUT_TESTING_DATA> : "<EXPECTED OUTCOME>"
For example, if a testing data has the <EXPECTED OUTCOME> to be TRUE,
then the program will expect the custom data quality rules to also produce TRUE.
Otherwise it will show that the custom data quality rules failed.
*/

"8123456789": "TRUE",
"1234567899": "TRUE",
"5123456789": "TRUE",
"4576839485": "TRUE",
"2938475638": "TRUE",
"7928374657": "TRUE",
"7847563738": "TRUE",
"6768907654": "TRUE",
"1234567": "FALSE",
"0123456789": "FALSE",
"1111111111": "FALSE",
"374657389a": "FALSE"
};
// The function below will generate the necessary SQL to run unit tests.
generate_test(test_name,
@@ -15,53 +15,53 @@
/*
This assertion checks that the input date is not in the future
*/
function test_future_date(colName){
var result_query = `PARSE_DATE('%Y/%m/%d', ${colName}) < CURRENT_DATE()`
return result_query
function test_future_date(colName) {
var result_query = `PARSE_DATE('%Y/%m/%d', ${colName}) < CURRENT_DATE()`
return result_query
}

/*
This assertion checks whether the input birthdate is less than 100 yrs old
*/
function test_valid_years(colName){
var result_query = `DATE_DIFF(CURRENT_DATE(), PARSE_DATE('%Y/%m/%d', ${colName}), YEAR) < 100`
return result_query
function test_valid_years(colName) {
var result_query = `DATE_DIFF(CURRENT_DATE(), PARSE_DATE('%Y/%m/%d', ${colName}), YEAR) < 100`
return result_query
}

/*
This function checks whether the format of the date is correct
*/
function test_date_format(colName, date_format){
if(date_format == "yyyy/mm/dd"){
var result_query = `REGEXP_CONTAINS(${colName}, r'^[0-9]{4}[/][0-9]{2}[/][0-9]{2}$')`
return result_query
} else if (date_format == "yyyymmdd"){
var result_query = `REGEXP_CONTAINS(${colName}, r'^[0-9]{4}[0-9]{2}[0-9]{2}$')`
return result_query
}else{
return `FALSE`
}
function test_date_format(colName, date_format) {
if (date_format == "yyyy/mm/dd") {
var result_query = `REGEXP_CONTAINS(${colName}, r'^[0-9]{4}[/][0-9]{2}[/][0-9]{2}$')`
return result_query
} else if (date_format == "yyyymmdd") {
var result_query = `REGEXP_CONTAINS(${colName}, r'^[0-9]{4}[0-9]{2}[0-9]{2}$')`
return result_query
} else {
return `FALSE`
}
}

/*
This assertion combines the custom assertions for date format, future date, and valid years
*/

function test_date(colName){
var result_query =
`IF(${colName} IS NOT NULL AND ${colName} <> "",` +
`IF(${test_date_format(colName, "yyyy/mm/dd")}, ` +
`IF(${test_future_date(colName)}, ` +
`${test_valid_years(colName, 100)}` +
`, FALSE),` +
`IF(${test_date_format(colName, "yyyymmdd")}, ` +
`TRUE, FALSE)), FALSE)`
return result_query
function test_date(colName) {
var result_query =
`IF(${colName} IS NOT NULL AND ${colName} <> "",` +
`IF(${test_date_format(colName, "yyyy/mm/dd")}, ` +
`IF(${test_future_date(colName)}, ` +
`${test_valid_years(colName, 100)}` +
`, FALSE),` +
`IF(${test_date_format(colName, "yyyymmdd")}, ` +
`TRUE, FALSE)), FALSE)`
return result_query
}

module.exports = {
test_future_date,
test_valid_years,
test_date_format,
test_date
test_future_date,
test_valid_years,
test_date_format,
test_date
}