Client
How do you use the Dataquality Client from the dataquality Python package in a Databricks notebook?
Running a measurement with the Dataquality Client produces two dataframes: one with row-level checks and one with aggregate checks. Additionally, a measurement object is returned that contains the details of the measure run, including the criteria and the mapping used to execute it. The criteria for a measurement are provided in YAML format.
Import the Dataquality Client from the dataquality package
import adq.client
Run the Dataquality Client to start a measurement and retrieve the measurement results.
%%measure
table: samples.tpch.nation
columns:
  - name: n_nationkey
    datatype: integer
    checks:
      - type: in_range
        min: 0
        max: 10
      - type: custom
        expression: ${{ column }} > 0
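Retrieve the results from the previous cell by running the following command. In IPython-based notebooks, _ holds the output of the most recently executed cell, so run this directly after the measure cell.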
measurement = _
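To make the criteria in the measure cell concrete, the following plain-Python sketch evaluates the same two checks on a few hypothetical values. It is an illustration only, not how the dataquality package executes checks. It assumes that in_range treats both bounds as inclusive and that ${{ column }} is replaced by the column being checked; neither assumption is confirmed by the package documentation shown here.

```python
# Hypothetical sample values for n_nationkey (not read from samples.tpch.nation).
sample_keys = [0, 5, 10, 24]


def in_range(value, minimum=0, maximum=10):
    """Assumed semantics of the in_range check: both bounds inclusive."""
    return minimum <= value <= maximum


def custom_gt_zero(value):
    """Mirrors the custom expression `${{ column }} > 0` for a single value."""
    return value > 0


# One result record per value, with one pass/fail flag per check.
results = [
    {"n_nationkey": v, "in_range": in_range(v), "custom": custom_gt_zero(v)}
    for v in sample_keys
]

for record in results:
    print(record)
```

Under these assumptions, a key of 0 passes the range check but fails the custom expression, while a key of 24 fails the range check and passes the custom expression.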
Retrieve and query the measure dataframes
In the row-check dataframe, the “__cast_xxx” column refers to the datatype check in the YAML (in this case integer), and the “__check_xxx” columns refer to the checks configured under checks in the YAML.
%sql
select * from adq_row;
select * from adq_agg;
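A common follow-up to querying the row-level results is counting how many rows failed each check. The sketch below does this in plain Python on hypothetical rows. The column names __check_1 and __check_2, and the assumption that check columns hold booleans where False means a failed check, are illustrative and may differ from what the package actually produces.

```python
from collections import Counter

# Hypothetical row-level results; real check columns follow the "__check_xxx" pattern.
rows = [
    {"n_nationkey": 0, "__check_1": True, "__check_2": False},
    {"n_nationkey": 5, "__check_1": True, "__check_2": True},
    {"n_nationkey": 24, "__check_1": False, "__check_2": True},
]


def count_failures(rows, prefix="__check_"):
    """Count, per check column, the rows whose value is False (assumed to mean 'failed')."""
    failures = Counter()
    for row in rows:
        for column, passed in row.items():
            if column.startswith(prefix) and passed is False:
                failures[column] += 1
    return dict(failures)


failure_counts = count_failures(rows)
print(failure_counts)
```

In a notebook, the input rows could be collected with [r.asDict() for r in spark.table("adq_row").collect()], assuming adq_row is visible to the Spark session as it is to the %sql cell. For large tables, an aggregation in SQL is the better choice.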
Retrieve information from the measurement object
In the measurement object, you can find which “__check_xxx” column maps to which executed check in the measurement.
For example:
measurement.status.latest
measurement.result.column_mappings
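The mapping can be used to make row-level results easier to read. The sketch below assumes, purely for illustration, that the column mappings behave like a dictionary from generated column name to a description of the check; the actual structure of measurement.result.column_mappings is not documented here and may differ. The helper describe_columns and all sample values are hypothetical.

```python
# ASSUMPTION: a dict from generated column name to a check description.
# The real structure of measurement.result.column_mappings may be different.
column_mappings = {
    "__cast_1": "n_nationkey: datatype integer",
    "__check_1": "n_nationkey: in_range (min=0, max=10)",
    "__check_2": "n_nationkey: custom",
}


def describe_columns(row, mappings):
    """Replace generated column names with their descriptions; keep other columns as-is."""
    return {mappings.get(column, column): value for column, value in row.items()}


# Hypothetical row from the row-check dataframe.
row = {"n_nationkey": 24, "__cast_1": True, "__check_1": False, "__check_2": True}

readable = describe_columns(row, column_mappings)
for name, value in readable.items():
    print(f"{name}: {value}")
```

Columns without an entry in the mapping, such as the source column itself, keep their original name.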