
Do Not Use a Chainsaw to Slice a Mango

submitted by
Style Pass
2024-02-10 20:30:08

Data cleaning can be a tiresome, difficult and even boring business. Transforming data from its raw form into something that can be analysed is one of the most time-consuming tasks data scientists will ever undertake. Even well-known data cleaning tools like Alteryx, Data Ladder and WinPure, despite their best efforts, struggle to improve this situation. What’s worse is that these tools, great as they are, tend to offer much more than the everyday user actually needs.

As is the case with most enterprise solutions, data cleaning tools tend to come with a lot of features bundled in, but most of us will probably end up using less than 10% of them. It is almost as if these features are only included to cover all bases… and that is a bad way to do business. Sure, a few people reading this post might have a use for ALL those features but, in my experience, this is rarely the case.

This FOMO-like behaviour by software developers, of course, leads to high software procurement costs and budget-breaking after-sales training sessions that most people would happily do without. Besides, most users tend to have simple data cleaning needs, e.g. removing duplicates from email lists, so being forced to buy these tools (with all their unnecessary extra features) is simply overkill.
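To give a sense of how small a task like this can be, here is a minimal Python sketch of removing duplicates from an email list. It is only an illustration, not the approach of any tool named above: the function name `dedupe_emails` is made up for this example, and treating addresses as equal regardless of letter case or surrounding whitespace is an assumption that may not suit every list.

```python
def dedupe_emails(emails):
    """Return the emails with duplicates removed, keeping first-seen order.

    Assumption: two addresses are considered the same if they match after
    trimming whitespace and ignoring letter case.
    """
    seen = set()
    unique = []
    for raw in emails:
        email = raw.strip()
        key = email.casefold()  # case-insensitive comparison key
        if not email or key in seen:
            continue  # skip blank entries and repeats
        seen.add(key)
        unique.append(email)
    return unique


if __name__ == "__main__":
    sample = ["ann@example.com", "Ann@Example.com ", "bob@example.com", ""]
    print(dedupe_emails(sample))
```

The first spelling of each address is the one that is kept, so the output preserves the order of the original list.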
