Configure Data Quality Policy
To create a data quality policy, do the following:
Click Discover from the side menu bar. The Discover page is displayed.
Search for an asset by its name in the search bar.
On finding the asset, click the icon that opens its drop-down list, then click Add Data Quality. The Data Quality Policy Configuration page is displayed.
Specify a name for the data quality policy.
Specify a description for the data quality policy.
Click Show Sample Data to view the columns in the asset. Select the columns to which you would like to add a rule definition.
note
Only the selected columns appear when you add a rule definition. If you leave all the columns in the asset unselected, all columns are available when you create a rule definition.
Click any type of rule definition and specify its values. The following describes the different types of rule definitions you can add to the asset.
- Null Values: Checks for null values.
- Schema Match: Checks whether the column values match the selected data type.
- Pattern Match: Checks whether the column values match the pattern you provide in the input box.
- Enumerations: Checks whether the selected column values are present in the list provided.
- Tags Match: Checks whether the selected column values are present in the tag provided.
- Range Match: Checks whether a value falls within the selected range.
- Duplicate Check: Checks that the column values are distinct.
- Row Check: Checks the number of rows.
- Business Rules: Checks the data against a set of configured rules.
- Custom: Defines a custom condition involving one or more columns, for example C1 + C2 > C3.
To check the conditions incrementally, click the toggle button, select one of the following incremental strategies, and specify the required values:
- Auto Increment ID based: Each time one or more rows are added to the database, they are assigned auto-incrementing numeric IDs. For instance, if you add 1,000 rows, they receive IDs 1 to 1000, and the first execution of the policy processes those 1,000 rows. If you then add another 1,000 rows, the auto increment ID based strategy resumes from the last processed ID, so the rows with IDs 1001 to 2000 are selected and a re-execution of the policy processes only this new set of rows.
- Partition based: The incremental profile uses a date-based partition column to determine the bounds for selecting data from the data source. This strategy is only useful if the data source supports partitioning.
- Incremental date based: The incremental profile uses a monotonically increasing date column to determine the bounds for selecting data from the data source.
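The rule types described above can be pictured as simple per-value checks. The following minimal Python sketch illustrates several of them; it is not the product's implementation. The function names are hypothetical, and it assumes that `None` represents a null value, that Pattern Match requires the whole value to match the regular expression, and that Range Match bounds are inclusive. Each function returns the indices of the values (or rows) that fail the rule, so an empty list means the rule passes. Row Check, Tags Match, and Business Rules are not modeled.

```python
import re
from collections.abc import Callable, Hashable, Sequence
from typing import Any

# Hypothetical sketch: each check returns the indices of values (or rows)
# that FAIL the rule; an empty list means the rule passes.


def null_values(values: Sequence[Any]) -> list[int]:
    # Null Values: None stands in for a null cell (assumption).
    return [i for i, v in enumerate(values) if v is None]


def schema_match(values: Sequence[Any], expected_type: type) -> list[int]:
    # Schema Match: every value must be an instance of the selected type.
    return [i for i, v in enumerate(values) if not isinstance(v, expected_type)]


def pattern_match(values: Sequence[Any], pattern: str) -> list[int]:
    # Pattern Match: assumes the whole value must match the regex.
    compiled = re.compile(pattern)
    return [
        i for i, v in enumerate(values)
        if not isinstance(v, str) or compiled.fullmatch(v) is None
    ]


def enumerations(values: Sequence[Hashable], allowed: set) -> list[int]:
    # Enumerations: each value must appear in the provided list.
    return [i for i, v in enumerate(values) if v not in allowed]


def range_match(values: Sequence[Any], low: float, high: float) -> list[int]:
    # Range Match: inclusive bounds (assumption); nulls count as failures.
    return [i for i, v in enumerate(values) if v is None or not low <= v <= high]


def duplicate_check(values: Sequence[Hashable]) -> list[int]:
    # Duplicate Check: flag every value that already appeared earlier.
    seen: set = set()
    failing: list[int] = []
    for i, v in enumerate(values):
        if v in seen:
            failing.append(i)
        else:
            seen.add(v)
    return failing


def custom(rows: Sequence[dict], condition: Callable[[dict], bool]) -> list[int]:
    # Custom: evaluate a user-defined condition across the columns of each row.
    return [i for i, row in enumerate(rows) if not condition(row)]


if __name__ == "__main__":
    ages = [34, None, 17, 34]
    print(null_values(ages))          # [1]
    print(range_match(ages, 18, 65))  # [1, 2]
    print(duplicate_check(ages))      # [3]
    print(pattern_match(["AB-12", "ab-3"], r"[A-Z]{2}-\d+"))  # [1]
    rows = [{"C1": 2, "C2": 3, "C3": 4}, {"C1": 1, "C2": 1, "C3": 5}]
    print(custom(rows, lambda r: r["C1"] + r["C2"] > r["C3"]))  # [1]
```

Returning failing indices rather than a single pass/fail flag makes it easy to report which records broke a rule; whether the product reports results per row, as a percentage, or against a threshold is not stated here.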
To execute a policy on a database with the incremental date based strategy, provide values for the following properties:
- Date Column: Select the column that stores the dates or timestamps.
- Date Format: Provide the format of the dates or timestamps stored in that column, for example YYYY-MM-DD.
- Advanced Fields:
  - Timezone: If you are in a different timezone, select a timezone from the drop-down list.
  - Minute Offset: If the selected timezone is offset by a number of hours or minutes, enter the offset in minutes in the field provided.
- Round End Date: When Round End Date is checked, the last executed date value is rounded up by the frequency selected in the Frequency drop-down list before the next execution of the policy. For instance, suppose the last data row was processed at 12:20, Round End Date is checked, and Hourly is selected as the frequency. The next time the policy is executed, it processes only the data created at 13:20 and thereafter.
Click Next.
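Both the auto increment ID based and incremental date based strategies select data using bounds derived from where the previous execution stopped. The following minimal Python sketch illustrates that idea under several assumptions: rows are plain dictionaries, the ID column is named `id`, the Date Format is written as a Python `strptime` pattern (so YYYY-MM-DD becomes `%Y-%m-%d`), and the Timezone and Minute Offset fields are reduced to a single fixed UTC offset in minutes. The function names are hypothetical and do not describe the product's internals.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

Row = dict[str, Any]


def select_by_auto_increment_id(rows: list[Row], last_id: int) -> tuple[list[Row], int]:
    """Return rows whose ID is above the last processed ID, plus the new watermark."""
    new_rows = [r for r in rows if r["id"] > last_id]
    # If nothing new arrived, keep the previous watermark.
    new_last_id = max((r["id"] for r in new_rows), default=last_id)
    return new_rows, new_last_id


def select_by_date(
    rows: list[Row],
    date_column: str,
    date_format: str,
    last_end: datetime,
    run_end: datetime,
    offset_minutes: int = 0,
) -> tuple[list[Row], datetime]:
    """Return rows with last_end < timestamp <= run_end, plus the new watermark.

    Stored strings are parsed with a strptime pattern and interpreted at a fixed
    UTC offset (a simplification of the Timezone and Minute Offset fields).
    last_end and run_end must be timezone-aware datetimes.
    """
    tz = timezone(timedelta(minutes=offset_minutes))
    selected = []
    for row in rows:
        ts = datetime.strptime(row[date_column], date_format).replace(tzinfo=tz)
        if last_end < ts <= run_end:
            selected.append(row)
    # The end of this run becomes the lower bound of the next run.
    return selected, run_end


if __name__ == "__main__":
    table = [{"id": i} for i in range(1, 1001)]
    batch, last_id = select_by_auto_increment_id(table, last_id=0)
    print(len(batch), last_id)  # 1000 1000

    table += [{"id": i} for i in range(1001, 2001)]
    batch, last_id = select_by_auto_increment_id(table, last_id)
    print(batch[0]["id"], batch[-1]["id"], last_id)  # 1001 2000 2000

    events = [{"created": "2024-01-01 10:00"}, {"created": "2024-01-02 09:30"}]
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 3, 0, 0, tzinfo=timezone.utc)
    selected, watermark = select_by_date(events, "created", "%Y-%m-%d %H:%M", start, end)
    print(selected)  # [{'created': '2024-01-02 09:30'}]
```

The ID example mirrors the 1,000-row scenario described for the auto increment ID based strategy: the second run picks up only IDs 1001 to 2000.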
On the Data Quality Policy Definition page, fill in the following properties:
- Define Scheduler: Based on the schedule you select, fill in the time properties, and then enable the Start Schedule Runs toggle.
- Select an Alerting Channel: Select one or more of the following channels to receive alerts when the data quality policy has succeeded or when an error has occurred:
note
Click the Notify on drop-down button to choose whether to receive notifications on success only, on failure only, or on both success and failure of the rule execution.
Email: Email notifications are sent to your default email address. You can add more recipients to also receive alerts.
Slack: Slack notifications are sent to your default Slack channel. You can add more channels to also receive alerts.
Webhook: Webhook notifications are sent every time a rule execution fails.
- Click the Enable toggle button to start receiving alerts.
Click Save Data Quality Policy.