I’ve been hearing the term “authentic data” more and more these days. At the GeoBuiz summit last week, it came up often enough to reach critical mass in my head and make me realize how new the term is.
*Authentic data*: data that is collected from real-world events, interactions, or observations. Authentic data is *not* artificially generated or manipulated by LLMs or other automated mechanisms.
The rise of “authentic data” illustrates a new concern: the emergence of “generated data”, data created not from observations but from machine learning and AI models. It’s a necessary distinction – one we didn’t have to make until recently – though its importance will vary by use case.
*Generated data*: data that is artificially created, often by AI models, and used to augment authentic datasets or simulate real-world scenarios.
There is a growing wariness of “generated data” among data analysts and enterprises. Just as there are fears that AI slop will poison the internet, rendering it difficult to use for both humans and machines, there is anxiety that generated data will undermine analyses and lead to poor decisions.