Client

How do you use the Dataquality Client from the dataquality Python package in a Databricks notebook?

Running a measurement with the Dataquality Client produces two dataframes: one with row-level checks and one with aggregate checks. Additionally, a measurement object is returned that contains the details of your measure run, including the criteria and the mapping used to execute the measure. Provide the criteria for your measurement in YAML format, then follow these steps:

  1. Import the Dataquality Client from the dataquality package.

import adq.client
  1. Run the Dataquality Client to start a measurement and retrieve the measurement results.

%%measure
table: samples.tpch.nation 
columns:
- name: n_nationkey
  datatype: integer
  checks:
  - type: in_range
    min: 0
    max: 10
  - type: custom
    expression: ${{ column }} > 0
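  1. Retrieve the results from the previous cell by running the following command directly after the measure cell. In IPython-based notebooks, the underscore variable (_) holds the output of the most recently executed cell.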

measurement = _
  1. Retrieve and query the measure dataframes.

In the row-check dataframe, the “__cast_xxx” column refers to the datatype check in the YAML (in this case integer), and the “__check_xxx” columns refer to the checks configured under checks in the YAML.

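%sql
-- Row-level check results
select * from adq_row;
-- Aggregate check results (only the last statement's result is displayed; use separate cells to see both)
select * from adq_agg;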
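
To make the difference between the two dataframes concrete, here is a minimal plain-Python sketch of what the two checks from the YAML above mean per row and in aggregate. It is an illustration only and not how the Dataquality Client evaluates checks: the names in_range_0_10 and custom_gt_0, the inclusive bounds, and the passed/failed summary are assumptions made for this example. The sample values 0 through 24 match the n_nationkey column of the TPC-H nation table.

```python
# Illustrative only: plain Python, not the Dataquality Client implementation.
# n_nationkey values 0..24, as in the TPC-H nation table.
rows = [{"n_nationkey": key} for key in range(25)]


def in_range(value, minimum, maximum):
    """True when value lies within the bounds (inclusive bounds are an assumption)."""
    return minimum <= value <= maximum


def evaluate_rows(rows):
    """Row-level view: one boolean per configured check for every row."""
    return [
        {
            "n_nationkey": row["n_nationkey"],
            # Hypothetical column names; the client generates "__check_xxx" names.
            "in_range_0_10": in_range(row["n_nationkey"], 0, 10),
            "custom_gt_0": row["n_nationkey"] > 0,
        }
        for row in rows
    ]


def aggregate(row_results, check_names):
    """Aggregate view: passed and failed row counts per check."""
    summary = {}
    for name in check_names:
        passed = sum(1 for result in row_results if result[name])
        summary[name] = {"passed": passed, "failed": len(row_results) - passed}
    return summary


row_results = evaluate_rows(rows)
agg_results = aggregate(row_results, ["in_range_0_10", "custom_gt_0"])
print(agg_results)
```

With these sample values, 11 of the 25 rows fall inside the 0 to 10 range, and only the row with key 0 fails the custom expression.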
  1. Retrieve information from the measurement object.

The measurement object shows which “__check_xxx” column corresponds to which check executed in the measurement.

For example:

measurement.status.latest
measurement.result.column_mappings
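
Once you know the mapping, you may want to relabel the generated check columns so results are easier to read. The sketch below shows one way to do that in plain Python. Everything about the data shape is an assumption: the source does not document the structure of measurement.result.column_mappings, so example_mappings is a hypothetical dictionary from generated column name to a readable label, and the names __check_001 and __check_002 are placeholders. Adapt it to the structure you actually see in your notebook.

```python
# Hypothetical shape: generated column name -> readable label for the check.
# The real structure of measurement.result.column_mappings may differ.
example_mappings = {
    "__check_001": "n_nationkey: in_range(0, 10)",
    "__check_002": "n_nationkey: custom",
}


def label_check_columns(row, mappings):
    """Return a copy of row with mapped check columns renamed; other columns are kept."""
    return {mappings.get(column, column): value for column, value in row.items()}


# One made-up row in the style of the row-check dataframe.
example_row = {"n_nationkey": 12, "__check_001": False, "__check_002": True}
labelled = label_check_columns(example_row, example_mappings)
print(labelled)
```

Columns without an entry in the mapping, such as n_nationkey itself, pass through unchanged.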