How to be uncertain with dates

submited by
Style Pass
2021-07-20 18:30:17

ISO 8601 is "an internationally accepted way to represent dates and times using numbers". The ISO 8601 date format YYYY-MM-DD is simple and has the blessing of (among other international bodies) the World Wide Web Consortium (W3C). ISO 8601 dates can be built and parsed in most programming languages and can be easily validated as regex or with AWK.

ISO 8601 dates have a limited amount of flexibility. If you only know the year and month, you can write YYYY-MM, and if you only know the year, YYYY. The three forms can all be used in interval dates, too, as start-date/end-date. For example, 2019-08-16/2019-10 means from 16 August 2019 to sometime in October 2019.

In real-world databases, ISO 8601 corruptions can be found when year, month or day aren't known for certain, as with YYYY-MM-00, YYYY-00-00 and 0000-00-00. These appear where the software user (or developer) wants the date to fit the YYYY-MM-DD pattern, even though there is no day "00", no month "00" and no year "0000" in the Gregorian calendar on which ISO 8601 is based.

I also see date strings like 1998-05? and ?1998-05 in my data auditing. These are usually examples of domain schizophrenia; the field is being used to contain more kinds of information than it is supposed to contain. It would be better to have 1998-05 in the date field and a separate field for date notes, with a comment like The year is uncertain or The month is uncertain or The month and year are uncertain.

Leave a Comment