As well as being a good buzzword to mention at your next data event, data contracts are a good way of testing data values at boundary points. Ideally,

Search code, repositories, users, issues, pull requests...

submited by
Style Pass
2024-11-16 08:00:03

As well as being a good buzzword to mention at your next data event, data contracts are a good way of testing data values at boundary points. Ideally, all data would be usable when you recieve it, but you probably already have figured that's not always the case.

A data contract is an expression of what should be true of some data - we might want to check that the only columns that exist are first_name, last_name and rating, or we might want to check that rating is a number less than 10.

Wimsey then can execute tests for you in a couple of ways, validate - which will throw an error if tests fail, and otherwise pass back your dataframe - and test, which will give you a detailed run down of individual test success and fails.

Wimsey is very new! There's a lot more to come soon in the form of additional available data tests, better test coverage, performance improvements and friendly error messages. Once the fundamentals are polished, next up is developing a handy API for "data profiling" (generate minimal tests from a sample of data).

Wimsey is ready to mingle! If you have ideas or feedback, including additional tests you'd want to see, please feel free to raise an issue or submit a pull request.

Leave a Comment