add 'dqx' as test engine for DQT tests within datacontract#1069
gkoenig wants to merge 3 commits
Commit 0c53e35 also adds the DQX test_engine for the CLI mode of datacontract-cli.
The data contract I used for this test can be found here:
We need to make a strategic decision on whether we want to support datasource-specific engines.
The performance of native data quality engines should primarily be supported and optimized by the data platform providers themselves. From my personal experience, I have encountered limitations and issues with Snowflake DMFs. Having the ability to choose alternative engines is therefore important, not only from a technical standpoint but also on principle, as it aligns with antitrust and vendor-neutrality considerations.
Thanks for your feedback, Jochen.
@gkoenig Although I'm not involved with the team maintaining the CLI, I find your PR quite useful, depending of course on what @jochenchrist decides. However, I think there is a small problem in the code or in the approach you took. You probably need to exclude specific dataset-level checks that a user might configure in the contract, for example a foreign key check, as it depends on data from another DataFrame. The same potentially applies to any validation check that is not part of the standard set of checks; for example, how would you deal with a custom check that was registered with DQX? Alternatively, the limitations should be described in readme.md. Overall, this might be super useful for people who use DQX.
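To make the concern above concrete, here is a minimal sketch of how checks could be split into ones the engine runs and ones it skips and reports. It assumes each check is a dict with a nested `check.function` name; `DATASET_LEVEL_FUNCTIONS`, `split_checks` and the `known_functions` argument are hypothetical names for illustration and are not part of this PR or of the DQX API.

```python
from typing import Any

# Assumption: names of checks that need data beyond the single DataFrame under test.
DATASET_LEVEL_FUNCTIONS = {"foreign_key"}


def split_checks(
    checks: list[dict[str, Any]], known_functions: set[str]
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split checks into (supported, skipped).

    A check is skipped when it is dataset-level or when its function name
    is not in the caller-supplied set of known (standard) check functions.
    """
    supported: list[dict[str, Any]] = []
    skipped: list[dict[str, Any]] = []
    for check in checks:
        name = check.get("check", {}).get("function")
        if name in known_functions and name not in DATASET_LEVEL_FUNCTIONS:
            supported.append(check)
        else:
            skipped.append(check)
    return supported, skipped


# Illustrative input only; the dict shape is an assumption.
example_checks = [
    {"criticality": "error", "check": {"function": "is_not_null", "arguments": {"column": "id"}}},
    {"criticality": "error", "check": {"function": "foreign_key", "arguments": {"columns": ["customer_id"]}}},
    {"criticality": "warn", "check": {"function": "my_custom_check", "arguments": {}}},
]
supported, skipped = split_checks(example_checks, {"is_not_null", "foreign_key"})
```

Returning the skipped checks instead of silently dropping them would let the test report show which contract rules were not evaluated.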
We are considering a dqx sync command to convert and sync ODCS to a dqx project, so that dqx would run these tests directly. Of course, this should include quality checks with custom dqx tests. What do you think?
Hi, can you elaborate a little bit more on how the sync command would work?
Thanks for your feedback, @pocelka. You are absolutely right, there will be some fixes and extensions to work on, as the current state of the PR was more of a "kickstart" to figure out whether it will make it to main ;)
Thanks for your reply, @jochenchrist. What do you mean by "sync ODCS to a dqx project"? Creating a small, isolated Python project that takes the ODCS contract and uses the dqx library to run the tests?
Yes, the idea is to convert ODCS to dqx tests and let dqx execute the tests. It should also work with existing Databricks projects. What does your Databricks data product code structure look like?
ok, got it, thanks @jochenchrist |
@gkoenig perhaps have a look at the dbt integration, with the new dbt sync command (which will be improved along the way as well). The idea could be to simply have "$ datacontract dqx sync", which would make sure that all tests defined within the data contract are available as dqx tests. There can also be additional dqx tests (they will be merged together). The command would then run all the dqx tests, convert the results into the test result format of the data contract CLI, and possibly report them back to a system that understands that format.
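A rough sketch of the merge and result-conversion steps described above, to help discuss the design. Everything here is hypothetical: the `name` key, the rule that the contract wins on a name collision, and the `CheckResult` shape are assumptions, not the actual result format of the data contract CLI or anything DQX provides.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CheckResult:
    # Hypothetical, simplified stand-in for a test result entry.
    name: str
    passed: bool
    source: str  # "contract" or "project"


def merge_checks(
    contract_checks: list[dict[str, Any]], project_checks: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Merge checks by name; on a name collision the contract definition wins (assumed rule)."""
    merged = {c["name"]: {**c, "source": "project"} for c in project_checks}
    for c in contract_checks:
        merged[c["name"]] = {**c, "source": "contract"}
    return list(merged.values())


def to_results(merged: list[dict[str, Any]], failed_names: set[str]) -> list[CheckResult]:
    """Convert a set of failed check names into one result per merged check."""
    return [CheckResult(c["name"], c["name"] not in failed_names, c["source"]) for c in merged]


# Illustrative data only.
project = [{"name": "id_not_null"}, {"name": "amount_positive"}]
contract = [{"name": "amount_positive"}, {"name": "email_format"}]
merged = merge_checks(contract, project)
results = to_results(merged, failed_names={"email_format"})
```

Keeping the `source` on each result would make it visible in the report whether a failing test came from the contract or was an additional project-level dqx test.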

Add an alternative test engine to Soda by introducing Databricks' DQX framework, which obviously requires Databricks as server_type. Since DQX is able to read data contracts in ODCS format natively, no conversion is required.
The initial extension offers DQX as test_engine in programmatic mode.