Executing a Reconciliation Policy

Workflow procedure

Getting the system ready for data cataloging

  1. Create an Oracle connection.
  2. Create a Hive connection.
  3. Create an Oracle data source.
  4. Create a Hive data source.
  5. Crawl the Oracle data source.
  6. Crawl the Hive data source.

Executing a reconciliation policy

  1. Create a new reconciliation policy.
  2. Check whether the execution passed or failed.

Creating a data catalog

The following example walks through the workflow for checking the quality of data in your data catalog.

As a data steward, you need to reconcile an Oracle database with a Hive database. To do this, you need a data catalog that contains both Oracle and Hive metadata. To create this data catalog, do the following:

  1. Create a connection to the Oracle database.

    1. Click the Connections tab from the Data Source page. The Connections page is displayed.

    2. Click the Create Connection button. The Create Connections wizard is displayed.

    3. Click Oracle from the displayed connection types.

    4. Specify the following details required to create the Oracle connection:

      Connection Property | Description | Example
      Connection Name | Specify a name for the connection. | Oracle-Connection
      Connection Description | Specify a description for the connection. |
      JDBC URL | Specify the Java Database Connectivity (JDBC) URL used to locate the database schema. The format is jdbc:oracle:<drivertype>:@<database>. An example of a driver type is 'thin'. | ("jdbc:oracle:thin:@myhost:1521:orcl", "jack", "tiger")
      JDBC Username | Specify the username to connect to the Oracle database. | In the JDBC URL example, the username is "jack".
      JDBC Password | Specify the password to connect to the Oracle database. | In the JDBC URL example, the password is "tiger".
    5. Select an analytics service to crawl information and also check the quality of data in the source system.

  2. Create a connection to the Hive database.

    1. Click the Connections tab from the Data Source page. The Connections page is displayed.

    2. Click the Create Connection button. The Create Connections wizard is displayed.

    3. Click Hive from the displayed connection types.

    4. Specify the following details required to create the Hive connection:

      Connection Property | Description | Example
      Connection Name | Specify a name for the connection. |
      Connection Description | Specify a description for the connection. |
      JDBC URL | Specify the Java Database Connectivity (JDBC) URL used to locate the database schema. | jdbc:hive2://<hostname>:<port>/<database name>
      JDBC Username | Specify the username to connect to the Hive database. |
      JDBC Password | Specify the password to connect to the Hive database. |
    5. Select an analytics service to crawl information and also check the quality of data in the source system.

  3. Create an Oracle data source.

    1. Click the Data Sources tab. The Data Source page is displayed.
    2. Click Create Data Source. The Create Data Source wizard is displayed.
    3. From the Select Connection drop-down list, select the Oracle connection you just created (for example, Oracle-Connection). Based on your selection, the data source type Oracle is automatically selected.
    4. Specify a name for your data source.
    5. Specify a description for the data source.
    6. Select the Define Schedule for Crawler checkbox to specify when the crawlers crawl information from the Oracle data source. Upon completion, the crawler creates or updates one or more tables in your Data Catalog.
  4. Create a Hive data source.

    1. Click the Data Sources tab. The Data Source page is displayed.
    2. Click Create Data Source. The Create Data Source wizard is displayed.
    3. From the Select Connection drop-down list, select the Hive connection you just created (for example, Hive-Connection). Based on your selection, the data source type Hive is automatically selected.
    4. Specify a name for your data source.
    5. Specify a description for the data source.
    6. Select the Define Schedule for Crawler checkbox to specify when the crawlers crawl information from the Hive data source. Upon completion, the crawler creates or updates one or more tables in your Data Catalog.
  5. Crawl Oracle metadata from the Oracle data store into the Acceldata Data Catalog.

  6. Crawl Hive metadata from the Hive data store into the Acceldata Data Catalog.
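
The JDBC URL formats used in steps 1 and 2 can be sketched in Python as follows. This only illustrates the string formats shown in the connection properties. The helper names oracle_thin_url and hive2_url are hypothetical and are not part of any product API, and the Hive host, port, and database values are placeholder assumptions.

```python
def oracle_thin_url(host: str, port: int, sid: str) -> str:
    """Build a URL of the form jdbc:oracle:thin:@<host>:<port>:<sid>."""
    return f"jdbc:oracle:thin:@{host}:{port}:{sid}"


def hive2_url(host: str, port: int, database: str) -> str:
    """Build a URL of the form jdbc:hive2://<hostname>:<port>/<database name>."""
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    # Values taken from the Oracle example in the connection properties.
    print(oracle_thin_url("myhost", 1521, "orcl"))
    # Placeholder values (assumptions) for a Hive connection.
    print(hive2_url("hivehost", 10000, "default"))
```

Building the URL from its parts makes it easier to spot a wrong separator, such as a slash where the Oracle thin format expects a colon before the database identifier.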

Configuring and executing a reconciliation policy

To create a reconciliation policy between Oracle and Hive, do the following:

  1. In the left navigation, click Data Quality and select Policies. The Quality Rules page is displayed.
  2. Click the Create Reconciliation Policy button. The Create Reconciliation Policy page is displayed.
  3. In the Left Hand asset, specify a table from the Oracle data source.
  4. In the Right Hand asset, specify a table from the Hive data source.
  5. Select one of the following reconciliation matches: Data Equality, Profile Equality Match, or Hashed Data Equality.
  6. In the Rule Definition panel, select the column from the Left Hand asset, the Operator, and a column from the Right Hand asset.
  7. Click Next. The Reconciliation Policy Definition panel is displayed.
  8. Specify a name and description for the policy.
  9. Select the Define Scheduler checkbox and schedule a time. Enable Start Schedule Runs. The reconciliation policy runs each time the scheduled time is reached. For example, Every Day : 2nd hour : 30 minute.
  10. Select one or both of the following notification platforms to receive alerts: Jira and Email.
  11. Click Save to save the policy. The policy can be viewed in the policy panel list.
  12. To manually execute the policy, click Execute from the Actions column.
  13. In the left navigation bar, click Executions to check whether the execution passed or failed. Click the name of the execution to view its details.
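
To give a rough idea of what two of the reconciliation matches in step 5 compare, the following Python sketch contrasts a direct row comparison with a hash-based comparison on two small in-memory tables. It is a conceptual illustration only. The function names, the sample rows, and the choice of SHA-256 are assumptions, and the sketch does not describe how the product implements Data Equality or Hashed Data Equality. Profile Equality Match is not covered.

```python
import hashlib
from collections import Counter


def data_equality(left, right):
    """Compare two tables as multisets of rows, ignoring row order."""
    return Counter(left) == Counter(right)


def row_hash(row):
    """Hash one row. Values are converted to strings and joined with a
    unit-separator character, so 1 and "1" hash identically (a
    simplification made for this sketch)."""
    joined = "\x1f".join(str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def hashed_data_equality(left, right):
    """Compare two tables by the multisets of their row hashes."""
    left_hashes = Counter(row_hash(row) for row in left)
    right_hashes = Counter(row_hash(row) for row in right)
    return left_hashes == right_hashes


# Hypothetical sample rows standing in for an Oracle table and a Hive table.
oracle_rows = [(1, "alice"), (2, "bob")]
hive_rows = [(2, "bob"), (1, "alice")]            # same rows, different order
hive_rows_drifted = [(1, "alice"), (2, "bobby")]  # one value differs

if __name__ == "__main__":
    print(data_equality(oracle_rows, hive_rows))
    print(hashed_data_equality(oracle_rows, hive_rows_drifted))
```

Comparing hashes instead of full rows means only a fixed-size digest per row has to be exchanged between the two systems, at the cost of not showing which column differs when a mismatch is found.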