Merge pull request #401 from percona/ps-9457
PS-9457 Update data masking for 8.0
patrickbirch authored Oct 16, 2024
2 parents 5644103 + fac8ae3 commit 48f0de1
Showing 3 changed files with 43 additions and 12 deletions.
23 changes: 13 additions & 10 deletions docs/data-masking-comparison.md
@@ -2,16 +2,19 @@

The Data Masking component feature is in [tech preview](glossary.md#tech-preview).

Percona Server for MySQL 8.0.34 adds the data masking component. Either the component or the plugin extends the server's functionality, but the architecture of the plugin is different than the architecture of the component.
Percona Server for MySQL 8.0.34 introduces the data masking component. Like the plugin, the component extends the server’s functionality, but its architecture is different. The main differences between the component and the plugin are the following:

The main differences between the component and plugin are the following:
| Feature | Difference |
|------------------------|----------------------------------------------------------------------------------------------------------------|
| Character set support | The component allows multi-byte character sets for general-purpose masking functions, while the plugin does not.|
| Masking capabilities | The component can mask PAN, SSN, IBAN, UUID, Canada SIN, and UK NIN. In contrast, the plugin only handles PAN and SSN. |
| Data generation | The component generates random email, US phone, PAN, SSN, IBAN, UUID, Canada SIN, and UK NIN data, while the plugin generates fewer types: email, US phone, PAN, and SSN. |
| Dictionary storage | The component stores substitution dictionaries in the database, as opposed to the plugin, which keeps these dictionaries in a file. |
| Privilege management | The component uses the `MASKING_DICTIONARIES_ADMIN` privilege for dictionary management, while the plugin requires the `FILE` privilege. |
| Function handling | The component automatically registers or unregisters loadable functions during installation or uninstallation, while the plugin does not offer this automatic process. |

| Component | Plugin |
|--- | --- |
| Allows multi-byte character sets for the general-purpose masking functions | Does not allow multi-byte character sets |
| Supports masking PAN, SSN, IBAN, UUID, Canada SIN, and UK NIN | Supports masking PAN and SSN |
| Generates random email, US phone, PAN, SSN, IBAN, UUID, Canada SIN, and UK NIN data | Generates random email, US phone, PAN, and SSN |
| Supports persisting substitution dictionaries in the database | Supports persisting substitution dictionaries in a file |
| Supports a dedicated privilege MASKING_DICTIONARIES_ADMIN to manage the dictionaries | Requires FILE privilege |
| Automates the loadable-function registration or unregistration during either component installation or uninstallation | Does not automate the loadable-function registration or unregistration during either the plugin installation or uninstallation |
## Additional resources

[Install the data masking component](install-data-masking-component.md)

[Data masking component functions](data-masking-function-list.md)
2 changes: 1 addition & 1 deletion docs/data-masking-function-list.md
@@ -30,7 +30,7 @@ The feature is in [tech preview](glossary.md#tech-preview).

## gen_blocklist(str, from_dictionary_name, to_dictionary_name)

Replaces one term in a dictionary with a term, selected at random, in another dictionary.
Replaces a term from one dictionary with a randomly selected term in another dictionary.
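
The following Python sketch illustrates the replacement logic described above. It is not the server implementation: the `DICTIONARIES` registry and the `gen_blocklist_sketch` name are hypothetical, and the handling of a term that is missing from the source dictionary (returned unchanged) and of an unknown dictionary name (returns `None`) is an assumption that this description does not confirm.

```python
import random
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for the stored substitution dictionaries.
DICTIONARIES: Dict[str, List[str]] = {
    "us_cities": ["Houston", "Phoenix", "Chicago"],
    "eu_cities": ["Lisbon", "Oslo", "Prague"],
}


def gen_blocklist_sketch(
    term: str, from_dictionary_name: str, to_dictionary_name: str
) -> Optional[str]:
    """Return a random term from the target dictionary if `term` is in the source one."""
    source = DICTIONARIES.get(from_dictionary_name)
    target = DICTIONARIES.get(to_dictionary_name)
    if source is None or target is None:
        # Assumption: an unknown dictionary name yields no result.
        return None
    if term not in source:
        # Assumption: a term outside the source dictionary is left as-is.
        return term
    return random.choice(target)


replaced = gen_blocklist_sketch("Chicago", "us_cities", "eu_cities")
untouched = gen_blocklist_sketch("Berlin", "us_cities", "eu_cities")
```

Here `replaced` is one of the `eu_cities` terms, and `untouched` stays `"Berlin"` because that term is not in `us_cities`.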

### Parameters

30 changes: 29 additions & 1 deletion docs/data-masking-overview.md
@@ -1,5 +1,19 @@
# Data masking overview

Data masking protects sensitive information by blocking unauthorized users from accessing the real data. This process creates altered versions of data for specific uses, like presentations, sales demonstrations, or software testing. The masked data keeps the same format as the original but contains changed values that cannot be reversed to reveal the true information. By making the data worthless to outsiders, masking helps organizations reduce their risk of data breaches or misuse. Companies can safely use masked data in various scenarios without exposing confidential details to unauthorized parties.

Data masking in Percona Server for MySQL is an essential tool for protecting sensitive information in various scenarios:

| Scenario | Description |
|---|----|
| Protecting data in development and testing | Developers and testers require realistic data to validate applications. Masking sensitive details, such as credit card numbers, Social Security numbers, and addresses, keeps real user information protected in non-production environments. |
| Compliance with data privacy regulations | Stringent laws like GDPR, HIPAA, and CCPA mandate the protection of personal data. Data masking enables the anonymization of personal information, facilitating its use for analysis and reporting while ensuring compliance with regulations. |
| Securing data when collaborating with external entities | Sharing data with third-party vendors requires masking sensitive information so that the vendors cannot access actual personal details. |
| Supporting customer service and training | Customer support teams and trainers often require access to customer data. Through data masking, they can utilize realistic information without compromising actual customer details. |
| Facilitating data analysis and reporting | Analysts rely on access to data for generating reports and uncovering insights. By employing data masking techniques, they can work with realistic data sets without compromising privacy. |

These examples underscore how data masking serves as a crucial safeguard for sensitive information, allowing organizations to leverage their data effectively across diverse functions.

Data masking helps to limit the exposure of sensitive data by preventing access to non-authorized users. Masking provides a way to create a version of the data in situations, such as a presentation, sales demo, or software testing, when the real data should not be used. Data masking changes the data values while using the same format and cannot be reverse engineered. Masking reduces an organization's risk by making the data useless to an outside party.

## Data masking techniques
@@ -9,4 +23,18 @@ The common data masking techniques are the following:
| Technique | Description |
| --- | --- |
| Custom string | Replaces sensitive data with a specific string, such as a phone number with XXX-XXX-XXXX |
| Data substitution | Replaces sensitive data with realistic alternative values, such as city name with another name from a dictionary |
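
As a rough illustration of the two techniques in the table, the following Python sketch masks a phone number with a custom string and substitutes a city name from a dictionary. The function names and the `CITIES` list are hypothetical and are not part of the data masking component or plugin.

```python
import random
from typing import Sequence


def mask_custom_string(value: str, mask_char: str = "X") -> str:
    """Custom string: replace every digit and keep separators so the format survives."""
    return "".join(mask_char if ch.isdigit() else ch for ch in value)


def substitute_from_dictionary(value: str, dictionary: Sequence[str]) -> str:
    """Data substitution: swap the value for a different, realistic dictionary entry."""
    candidates = [term for term in dictionary if term != value]
    # Fall back to the original value only if the dictionary offers no alternative.
    return random.choice(candidates) if candidates else value


# Hypothetical substitution dictionary of city names.
CITIES = ["Lisbon", "Oslo", "Prague", "Vienna"]

masked_phone = mask_custom_string("555-867-5309")
masked_city = substitute_from_dictionary("Oslo", CITIES)
```

The masked phone number keeps its `NNN-NNN-NNNN` layout, and the substituted city is always a dictionary entry other than the original.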

## Additional resources

Component:

[Install the data masking component](install-data-masking-component.md)

[Data masking component functions](data-masking-function-list.md)

Plugin:

[Install data masking plugin](install-data-masking-plugin.md)

[Data masking plugin functions](data-masking-plugin-functions.md)
