Asset Profiling

Workflow Procedure

Creating a Hive connection

Click the Connections tab from the Data Source page. The Connections page is displayed.
Click the Create Connection button. The Create Connection wizard is displayed.
Select Hive from the displayed connection types.

Specify the following details required to create the Hive connection:

Connection Property	Description	Example
Connection Name	Specify the name for the connection.	Host2_Hive
Description	Describe the purpose of the connection.	Hive connection to test workflow
JDBC URL	Specify the Java Database Connectivity (JDBC) URL used to locate the database schema. The URL uses the following format: `jdbc:hive2://<hostname>:<port>/<database name>`	`jdbc:hive2://host2:3600/hive`
JDBC Username	Specify the username to connect to the Hive database.	Hive
JDBC Password	Specify the password to connect to the Hive database.	Hive

Click Next to attach the analytics pipeline for executing the Spark jobs associated with the connection.
Select the Analytics Service.
Click Save.

The connection is now available for use on the Torch Data Source page.
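The JDBC URL format listed in the connection properties can be checked before it is typed into the wizard. The following is a minimal Python sketch, not part of Torch: the function names `build_hive_jdbc_url` and `parse_hive_jdbc_url` are hypothetical, and the host, port, and database values are the sample values from the table.

```python
import re

# Matches the documented format: jdbc:hive2://<hostname>:<port>/<database name>
# Anything after ';' or '?' (optional connection parameters) is tolerated
# but ignored. This pattern is an illustration, not Torch's own validation.
HIVE_JDBC_PATTERN = re.compile(
    r"^jdbc:hive2://(?P<host>[^:/]+):(?P<port>\d+)/(?P<database>[^;?/]+)(?:[;?].*)?$"
)


def build_hive_jdbc_url(host: str, port: int, database: str) -> str:
    """Assemble a URL in the documented jdbc:hive2 format."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def parse_hive_jdbc_url(url: str) -> dict:
    """Split a Hive JDBC URL into host, port, and database.

    Raises ValueError if the URL does not use the jdbc:hive2 scheme
    or is missing one of the three parts.
    """
    match = HIVE_JDBC_PATTERN.match(url)
    if match is None:
        raise ValueError(f"Not a Hive JDBC URL: {url!r}")
    parts = match.groupdict()
    parts["port"] = int(parts["port"])
    return parts


if __name__ == "__main__":
    url = build_hive_jdbc_url("host2", 3600, "hive")
    print(url)
    print(parse_hive_jdbc_url(url))
```

A URL with a different scheme, such as one starting with `jdbc:mysql://`, is rejected by `parse_hive_jdbc_url`, which makes a mismatch between the connection type and the URL easy to catch early.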

Creating a data source

Click the Data Source tab from the Connections page. The Data Source page is displayed.
Click the Create Data Source button. The Create Data Source wizard is displayed.
Select the connection that you just created (in this example, the Host2_Hive connection).
Specify a name for the data source.
Specify a description for the data source.
Select the "Define Schedule for Crawler" checkbox to run the crawler at a regular interval, and configure the time for it. Leave the checkbox cleared if you want to run the crawler manually.
Click the Create Data Source button.

The newly created data source appears on the Data Source page. At this point the crawler state is inactive, and the data source does not yet contain any assets such as databases, tables, or columns.

To start the crawler, do the following:

Click the overflow menu icon on the tile of the data source that you want to crawl.
Click Start Crawler.

Once the crawler starts, its status in the data source turns green, and the number of databases, tables, and columns appears in the data source tile.

When the crawler has finished crawling all the assets, its status changes back to inactive.
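The crawler lifecycle described above (inactive, then running while asset counts appear, then inactive again) can be pictured as a small state model. This is only an illustrative Python sketch: `CrawlerState`, `DataSourceTile`, and their methods are hypothetical names and do not correspond to any Torch API.

```python
from dataclasses import dataclass
from enum import Enum


class CrawlerState(Enum):
    INACTIVE = "inactive"  # state of a new data source, and after a crawl ends
    RUNNING = "running"    # shown in green on the data source tile


@dataclass
class DataSourceTile:
    """Hypothetical model of what the data source tile displays."""
    name: str
    crawler_state: CrawlerState = CrawlerState.INACTIVE
    databases: int = 0
    tables: int = 0
    columns: int = 0

    def start_crawler(self) -> None:
        # Corresponds to choosing Start Crawler from the overflow menu.
        self.crawler_state = CrawlerState.RUNNING

    def record_assets(self, databases: int, tables: int, columns: int) -> None:
        # Asset counts only change while a crawl is in progress.
        if self.crawler_state is not CrawlerState.RUNNING:
            raise RuntimeError("crawler is not running")
        self.databases, self.tables, self.columns = databases, tables, columns

    def finish_crawl(self) -> None:
        # The counts stay on the tile; only the state changes back.
        self.crawler_state = CrawlerState.INACTIVE


if __name__ == "__main__":
    tile = DataSourceTile("example data source")
    tile.start_crawler()
    tile.record_assets(databases=2, tables=15, columns=120)  # made-up counts
    tile.finish_crawl()
    print(tile)
```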

Discovering and profiling an asset in a data source

Click the Data Source tile to explore its assets in detail. The Discover page is displayed.

You can filter the assets to view only databases, only tables, or only columns.

To view the assets in detail, click on an asset. The asset details page for the selected asset is displayed.

The Details tab provides you with the metadata that was captured for the selected asset.
The Child Assets tab lists all the child assets of the selected asset. For example, if the selected asset is a database, the Child Assets tab lists all the tables in it.

Click a child asset to view more details about it. The Asset Details page for the child asset is displayed with the tabs described in the following table.

Tab	Description
Profile	If the asset has not been profiled yet, a prompt is displayed saying that the asset has not been profiled. Click the Profile Asset button.
Sample Data	Displays a table with the first 100 rows of the selected asset as sample data.
Quality	The properties associated with the selected asset.
Details	Displays the metadata captured by the system for the selected asset.
Child Assets	Displays the child assets of the selected asset. Click a child asset to view its details.

To profile the child asset, do the following:

From the Profile tab, click the Profile Asset button.
Click Profiling from the left navigation menu bar. The Profiling page is displayed where you can view the status of the profiling job.

You can also run the profile job at a scheduled interval by doing the following:

Click the Scheduler Configuration button. The Profile Scheduler wizard is displayed.
Select the type of profile you want to run and provide a time interval.
Turn on the enable toggle to keep the scheduler active.
Click the Save button.

To view the profiled information, do the following:

Navigate to the Asset Details page of the asset you just profiled.
Click the Profile tab. The information retrieved by the profile job is displayed as charts, such as bar charts or pie charts.

Types of profiling

You can profile an asset as many times as you want. Two types of profile can be run on an asset:

Sample Profile: Runs the profile job on the first 1000 rows of the asset.
Full Profile: Runs the profile job on all rows of the asset.
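
The difference between the two profile types comes down to how many rows the job reads. The Python sketch below illustrates that idea only: the function names are hypothetical, and the statistics it computes (row count, null count, and distinct count per column) are assumptions chosen for the example, not the metrics Torch actually reports.

```python
from collections import Counter
from itertools import islice
from typing import Any, Iterable, Mapping, Optional

SAMPLE_ROW_LIMIT = 1000  # a sample profile reads only the first 1000 rows


def profile_rows(rows: Iterable[Mapping[str, Any]],
                 limit: Optional[int] = None) -> dict:
    """Compute simple per-column statistics over at most `limit` rows.

    Values must be hashable, because distinct values are kept in sets.
    """
    selected = rows if limit is None else islice(rows, limit)
    row_count = 0
    null_counts: Counter = Counter()
    distinct: dict = {}
    for row in selected:
        row_count += 1
        for column, value in row.items():
            if value is None:
                null_counts[column] += 1
            else:
                distinct.setdefault(column, set()).add(value)
    columns = sorted(set(null_counts) | set(distinct))
    return {
        "row_count": row_count,
        "columns": {
            c: {"nulls": null_counts[c], "distinct": len(distinct.get(c, ()))}
            for c in columns
        },
    }


def sample_profile(rows: Iterable[Mapping[str, Any]]) -> dict:
    """Profile only the first SAMPLE_ROW_LIMIT rows."""
    return profile_rows(rows, limit=SAMPLE_ROW_LIMIT)


def full_profile(rows: Iterable[Mapping[str, Any]]) -> dict:
    """Profile every row."""
    return profile_rows(rows)


if __name__ == "__main__":
    # Made-up table with 2500 rows; every odd row has a null "tag".
    table = [{"id": i, "tag": None if i % 2 else "x"} for i in range(2500)]
    print(sample_profile(table)["row_count"])
    print(full_profile(table)["row_count"])
```

A sample profile is cheaper but can miss values that only occur later in the data, so the two profiles of the same asset may report different statistics.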
