SmartDQ
Welcome to the documentation for SmartDQ, our Data Quality and Governance app. This resource is designed to provide you with comprehensive information on how to effectively use the app to ensure the integrity and reliability of your data. Here you will find guidelines, best practices, and troubleshooting tips to help you navigate the app's features and functionality. Our goal is to empower you to maintain high standards of data quality within your organization.

A video demo of the app is posted below.

Download the SmartDQ app from the Snowflake Marketplace, then follow the steps below to complete the installation.

1. Create a network rule that allows SmartDQ to communicate with the domain of your Azure OpenAI resource.

CREATE OR REPLACE NETWORK RULE SMARTDQ_APP.CFG.SMARTDQ_NETWORK_RULE
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('<DOMAIN OF AZURE OPENAI>');
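For example, if your Azure OpenAI resource were reachable at a hypothetical domain such as my-resource.openai.azure.com, the filled-in rule would look like this (the domain below is illustrative only; substitute your own):

-- Illustrative only: replace my-resource.openai.azure.com with your own domain.
CREATE OR REPLACE NETWORK RULE SMARTDQ_APP.CFG.SMARTDQ_NETWORK_RULE
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('my-resource.openai.azure.com');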
2. Create secrets to store the access details of your Azure OpenAI resource. Use the exact names shown below.

CREATE OR REPLACE SECRET SMARTDQ_APP.CFG.SMARTDQ_OPENAI_API_KEY
  TYPE = GENERIC_STRING
  SECRET_STRING = '<AZURE OPENAI API KEY>';

CREATE OR REPLACE SECRET SMARTDQ_APP.CFG.SMARTDQ_OPENAI_API_ENDPOINT
  TYPE = GENERIC_STRING
  SECRET_STRING = '<AZURE OPENAI URL OF YOUR RESOURCE>';

CREATE OR REPLACE SECRET SMARTDQ_APP.CFG.SMARTDQ_OPENAI_API_VERSION
  TYPE = GENERIC_STRING
  SECRET_STRING = '<OPENAI API VERSION>';

CREATE OR REPLACE SECRET SMARTDQ_APP.CFG.SMARTDQ_OPENAI_API_DEPLOYMENT
  TYPE = GENERIC_STRING
  SECRET_STRING = '<YOUR DEPLOYMENT NAME>';
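As an optional sanity check, you can list the secrets to confirm they were created under the expected names (SHOW SECRETS is standard Snowflake syntax):

-- Optional check: list the secrets created above.
SHOW SECRETS IN SCHEMA SMARTDQ_APP.CFG;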
3. Create integration - Execute the code below to create an external access integration that allows SmartDQ to access your Azure OpenAI resource. Do not change the name of the integration or the names of the secrets.

CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION SMARTDQ_OPENAI_INTEGRATION
  ALLOWED_NETWORK_RULES = (SMARTDQ_APP.CFG.SMARTDQ_NETWORK_RULE)
  ALLOWED_AUTHENTICATION_SECRETS = (
    SMARTDQ_APP.CFG.SMARTDQ_OPENAI_API_KEY,
    SMARTDQ_APP.CFG.SMARTDQ_OPENAI_API_ENDPOINT,
    SMARTDQ_APP.CFG.SMARTDQ_OPENAI_API_VERSION,
    SMARTDQ_APP.CFG.SMARTDQ_OPENAI_API_DEPLOYMENT
  )
  ENABLED = TRUE;
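As an optional check, you can confirm the integration exists and is enabled:

-- Optional check: confirm the integration is enabled.
SHOW EXTERNAL ACCESS INTEGRATIONS LIKE 'SMARTDQ_OPENAI_INTEGRATION';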
4. Grant SmartDQ access to the integration created in the previous step.

GRANT USAGE ON INTEGRATION SMARTDQ_OPENAI_INTEGRATION TO APPLICATION SMARTDQ_APP;
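To verify the grant took effect, you can list the privileges held by the application:

-- Optional check: list privileges granted to the SmartDQ application.
SHOW GRANTS TO APPLICATION SMARTDQ_APP;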
5. Grant SmartDQ access to your database, schema, and tables.

GRANT USAGE ON DATABASE <your database name> TO APPLICATION SMARTDQ_APP;
GRANT USAGE ON ALL SCHEMAS IN DATABASE <your database name> TO APPLICATION SMARTDQ_APP;
GRANT SELECT ON ALL TABLES IN SCHEMA <your database name>.<your schema name> TO APPLICATION SMARTDQ_APP;
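As a concrete illustration, here are the same grants filled in with the demo database and schema used in the ETL example later on this page:

-- Illustrative grants for the demo objects used in the ETL example below.
GRANT USAGE ON DATABASE DEMODB TO APPLICATION SMARTDQ_APP;
GRANT USAGE ON ALL SCHEMAS IN DATABASE DEMODB TO APPLICATION SMARTDQ_APP;
GRANT SELECT ON ALL TABLES IN SCHEMA DEMODB.STG TO APPLICATION SMARTDQ_APP;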
Given below is an example of how you can integrate SmartDQ with your ETL pipeline.
--Step 1 - the staging data we have.
select top 100 * from demodb.stg.demo_data_retail;

--Step 2 - go to the app and create DQ rules for this table.
--Step 3 - incorporate the DQ checks into your daily pipelines.
--Step 3a - stored procedure which pushes the data from staging to target for demo_retail.
create or replace procedure DEMODB.FNL.DEMO_DQ_PYTHON_SP(entity_name VARCHAR)
returns TABLE()
language python
runtime_version = '3.8'
packages = ('snowflake-snowpark-python')
handler = 'main'
comment = 'DQ Process'
EXECUTE AS OWNER
as
$$
# The Snowpark package is required for Python worksheets.
# You can add more packages by selecting them in the Packages control and then importing them.
# Library declarations - trim or extend depending on your requirements.
import snowflake.snowpark as snowpark
from datetime import datetime

# main function (or any handler name of your choice).
def main(session: snowpark.Session, entity_name):
    start_datetime = datetime.now()
    # Ideally you would read the staging table directly:
    #   select * from DEMODB.STG.DEMO_DATA_RETAIL
    # Instead, call the DQ procedure to surface data quality issues on the staging data.
    sql_query = f"call smartelloutilities.cfg.SMARTELLO_DQ_PROCESSOR('{entity_name}')"
    target_table = "FNL.TRG_DEMODB_RETAIL"
    dq_results_df = session.sql(sql_query)
    dq_results_rows = dq_results_df.collect()
    dq_first_row = dq_results_rows[0]
    dq_col_list = list(dq_first_row.as_dict().keys())
    dq_final_df = session.create_dataframe(dq_results_rows, schema=dq_col_list)

    # Write the data after the DQ check to the target table.
    dq_final_df.write.mode("overwrite").save_as_table(target_table)

    # The return value will appear in the Results tab.
    return dq_final_df
$$;

-- Run this SP.
call DEMODB.FNL.DEMO_DQ_PYTHON_SP('DEMODB.STG.DEMO_DATA_RETAIL');

-- Show the results.
select top 100 * from demodb.fnl.TRG_DEMODB_RETAIL;
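To incorporate this into a daily pipeline, one option is to wrap the call in a Snowflake task. The sketch below assumes a warehouse named DEMO_WH (hypothetical; substitute your own) and a 06:00 UTC daily schedule:

-- A minimal sketch: schedule the DQ procedure daily at 06:00 UTC.
-- DEMO_WH is an assumed warehouse name; replace it with your own.
CREATE OR REPLACE TASK DEMODB.FNL.DEMO_DQ_DAILY_TASK
  WAREHOUSE = DEMO_WH
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
  CALL DEMODB.FNL.DEMO_DQ_PYTHON_SP('DEMODB.STG.DEMO_DATA_RETAIL');

-- Tasks are created suspended; resume the task to start the schedule.
ALTER TASK DEMODB.FNL.DEMO_DQ_DAILY_TASK RESUME;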
My DQ Dashboard -
This is where you can see the DQ rules processed and the number of records with errors and warnings, shown both graphically and as a table of runs. The dashboard gives an overall perspective of the health of your data, and filters let you narrow the view to the particular data you want to check.
Define DQ Rules tab -
Here you can define rules using simple English.
Let's say you choose the database demodb, the schema stg, and then the table demo_data_retail, which has a few retail records. Next, choose a column name.
You can see whether any DQ rules are already defined for this column, since a single column can have multiple rules.
You are then shown the list of all tests that can be performed.
You can write a sentence directly, or make it easier by clicking on a test, which provides a template sentence for you to configure.
For example, choosing nullability gives you a ready-made sentence.
There is also an on-fail configuration, which determines whether a failed rule is flagged as an error or a warning. Click on Process Instruction and the rule is configured for you. It is straightforward: all you need to use this is business knowledge and a sense for the data.
SmartDocs -
The other aspect of the governance journey is documentation. SmartDQ provides GenAI-based documentation that reads your view definition and documents it, providing business-friendly definitions, technical definitions for IT teams, optimization suggestions, and column-level definitions. You will soon have the ability to generate documentation for stored procedures and tables as well.
This is just the start; we will add the ability to do the following using natural language:
1. Generate data models based on functional specifications.
2. Generate test cases.
3. Check code quality.
4. Generate ETL to pull data from various source systems without typing a single line of code.
5. Provide lineage from Snowflake to Power BI so that your organisation has end-to-end visibility.
