Devs and Time Clock Rollovers

Let’s party like it’s 1999!

Photo by Markus Spiske on Unsplash

You have probably written programs with a bug related to the calculation of time (I certainly have!). For example, you need to work out the number of days between two dates, such as between 29 November and 20 December, and calculate:

Days = Date (20,12) - Date (29,11)

and we get 21 days. And you remember that we will have a problem if we calculate over the New Year:

Days = Date (4,1) - Date (29,11)

So we add in the year, and we can determine the right number of days. But many developers just look at the last two digits of the year when performing their calculation, so that the number of days between 1/1/16 and 1/1/19 is taken as three years’ worth of days. But what happens when we roll over from 2019 to 2020?
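As a minimal sketch of the calculations above, here is how they might look in Python using the standard datetime module. The function names days_between and naive_year_gap are hypothetical and only for illustration, and the years 2019 and 2020 are assumed for the November to January examples.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Number of days from start to end, using full four-digit years."""
    return (end - start).days


def naive_year_gap(yy_start: int, yy_end: int) -> int:
    """Hypothetical shortcut: subtract two-digit years directly."""
    return yy_end - yy_start


# Within one year, and across the New Year, full dates just work.
print(days_between(date(2019, 11, 29), date(2019, 12, 20)))  # 21
print(days_between(date(2019, 11, 29), date(2020, 1, 4)))    # 36

# 2016 is a leap year, so three years is not simply 3 * 365 days.
print(days_between(date(2016, 1, 1), date(2019, 1, 1)))      # 1096

# The two-digit shortcut works inside a century and breaks at the rollover.
print(naive_year_gap(16, 19))  # 3
print(naive_year_gap(99, 0))   # -99
```

Letting the date type carry the full year means that leap years and century rollovers are handled by the library rather than by hand.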

Let’s party like it’s 1999!

So, at the start of the 21st Century, we had a problem in that developers had often used the last two digits of the year to perform their time calculations. This caused the Millennium Bug, and many worried that control systems would crash when the 21st Century arrived. Luckily, the bugs had mostly been identified, and it passed without many problems. Developers were told that they should always use system time calls which included the full year in their calculations.

And so, this week, I received this:

It basically said that Splunk had a little bug around the rollover of the date from 2019 to 2020. It looks like the code had just looked at the last two digits of the year, as with the Millennium Bug, and that search functions could give the wrong result (and data could even be deleted).

It relates to the processing of the datetime.xml file (see below), which uses regular expressions to parse incoming data. Unfortunately, its two-digit year patterns only accept years starting with 9, 0 or 1, and so will not work correctly when the date rolls over from 2019 to 2020. It is thought that it may either create an exception on the processing of data, or apply the wrong timestamp. A related problem with Unix epoch timestamps is expected to start on 13 September 2020 at 12:26 (UTC), when the epoch value reaches 1,600,000,000. There is an update to datetime.xml (which is stored in the etc folder) here:

http://download.splunk.com/products/ingest2020/datetime.zip

The bug is contained in here:

<!--   Version 4.0 -->
<!-- datetime.xml -->
<!-- This file contains the general formulas for parsing date/time formats. -->
<datetime>
<define name="_year" extract="year">
<text><![CDATA[(20\d\d|19\d\d|[901]\d(?!\d))]]></text>
</define>
..
</datetime>

and the updated version is:

<define extract="year" name="_year">
<text>
<![CDATA[(20\d\d|19\d\d|[9012]\d(?!\d))]]>
</text>
</define>
..
</datetime>
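As a quick check of the two _year patterns above, here is a minimal Python sketch. It assumes that Python’s re module treats these patterns in the same way as Splunk’s regex engine, and it tests each candidate year in isolation with fullmatch, whereas Splunk embeds _year inside larger timestamp patterns. The names OLD_YEAR, NEW_YEAR and accepts are only for illustration.

```python
import re

# The _year patterns copied from the old and updated datetime.xml.
OLD_YEAR = re.compile(r"(20\d\d|19\d\d|[901]\d(?!\d))")
NEW_YEAR = re.compile(r"(20\d\d|19\d\d|[9012]\d(?!\d))")


def accepts(pattern: re.Pattern, text: str) -> bool:
    """True if the whole string is recognised as a year by the pattern."""
    return pattern.fullmatch(text) is not None


# Print each candidate with the old result and then the new result.
for year in ("99", "19", "20", "2020"):
    print(year, accepts(OLD_YEAR, year), accepts(NEW_YEAR, year))
```

In this sketch, only the two-digit year "20" is rejected by the old pattern and accepted by the new one. The four-digit year "2020" is accepted by both.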

The “[9012]” in the updated version allows a two-digit year to start with a “2” (20 to 29), which the previous version did not. If you are interested, here are the updates to the regular expressions:

Comparing files datetime.xml and D:\DATETIME.XML
***** datetime.xml
<define name="_year" extract="year">
<text><![CDATA[(20\d\d|19\d\d|[901]\d(?!\d))]]></text>
</define>
***** D:\DATETIME.XML
<define name="_year" extract="year">
<text><![CDATA[(20\d\d|19\d\d|[9012]\d(?!\d))]]></text>
</define>
*****
***** datetime.xml
<define name="_masheddate" extract="year, month, day">
<text><![CDATA[(?:^|source::).*?(?<!\d|\d\.|-)(?:20)?([901]\d)(0\d|1[012])([012]\d|3[01])(?!\d|-| {2,})]]></text>
</define>
***** D:\DATETIME.XML
<define name="_masheddate" extract="year, month, day">
<text><![CDATA[(?:^|source::).*?(?<!\d|\d\.|-)(?:20)?([9012]\d)(0\d|1[012])([012]\d|3[01])(?!\d|-| {2,})]]></text>
</define>
*****
***** datetime.xml
<define name="_masheddate2" extract="month, day, year">
<text><![CDATA[(?:^|source::).*?(?<!\d|\d\.)(0\d|1[012])([012]\d|3[01])(?:20)?([901]\d)(?!\d| {2,})]]></text>
</define>
***** D:\DATETIME.XML
<define name="_masheddate2" extract="month, day, year">
<text><![CDATA[(?:^|source::).*?(?<!\d|\d\.)(0\d|1[012])([012]\d|3[01])(?:20)?([9012]\d)(?!\d| {2,})]]></text>
</define>
*****
***** datetime.xml
<define name="_utcepoch" extract="utcepoch, subsecond">
<!-- update regex before '2017' -->
<text><![CDATA[((?<=^|[\s#,"=\(\[\|\{])(?:1[012345]|9)\d{8}|^@[\da-fA-F]{16,24})(?:\.?(\d{1,6}))?(?![\d\(])]]></text>
</define>
***** D:\DATETIME.XML
<define name="_utcepoch" extract="utcepoch, subsecond">
<!-- update regex before '2023' -->
<text><![CDATA[((?<=^|[\s#,"=\(\[\|\{])(?:1[0123456]|9)\d{8}|^@[\da-fA-F]{16,24})(?:\.?(\d{1,6}))?(?![\d\(])]]></text>
</define>
*****
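The _utcepoch change in the comparison above can be explored in the same way. The sketch below is a simplification: it keeps only the digit-prefix part of the pattern, (?:1[012345]|9)\d{8}, and drops the surrounding lookbehind, hex and subsecond handling, so it is not the full Splunk expression. The names OLD_EPOCH, NEW_EPOCH and as_utc are only for illustration.

```python
import re
from datetime import datetime, timezone

# Only the leading-digits part of the old and updated _utcepoch patterns.
OLD_EPOCH = re.compile(r"(?:1[012345]|9)\d{8}")
NEW_EPOCH = re.compile(r"(?:1[0123456]|9)\d{8}")


def as_utc(epoch: int) -> str:
    """Render a Unix epoch value as an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


# Print each epoch, its UTC time, then whether old and new patterns match.
for epoch in (1599999999, 1600000000, 1699999999, 1700000000):
    text = str(epoch)
    print(
        epoch,
        as_utc(epoch),
        OLD_EPOCH.fullmatch(text) is not None,
        NEW_EPOCH.fullmatch(text) is not None,
    )
```

The last value the old prefix accepts is 1599999999, which is 12:26:39 (UTC) on 13 September 2020. The updated prefix extends this to 1699999999, in November 2023, which fits with the “update regex before '2023'” comment in the file.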

Conclusions

Luckily, Splunk has a strong bug fix programme, and hopefully things will all be patched in good time. Otherwise, planes could fall out of the sky, as Splunk is used in so many industries now. BTW, we love using Splunk for Big Data analysis, and here’s a tutorial:

If you want a login, just ask.