The companies that I work for are operating in the Europe/Amsterdam timezone. During summer time, the Dutch clocks use +0100 GMT during summer and +0200 during winter. This was not always the case.
The Europe/Amsterdam timezone was defined in the year 1937 and changed its definition 3 times (1940: added day time saving; 1946: removed day time saving, 1977: added day time saving). Therefore, the timestamp 1900-01-01 00:00:00 simply is not defined. However, database vendors like Oracle and Snowflake support working with these timestamps. Be aware that the timezone-offset might be different than you might expect; 1900-01-01 00:00:00 +0019 in case of the Europe/Amsterdam timezone.
Data experts often assume that timezones are constant. Because of this false assumption, mistakes are easily made.
My name is Paul, I’m cofounder of Acheron. For the few years, I’m working on Snowflake to store all kinds of data. In this blog I will take you on my adventure how to store timely data on Snowflake.
- When are timezones important to store?
- How do end-users like to visualize time?
- How like data engineers to deal with time?
- How are the timestamp-formats different from each other? (TS_TZ vs. TS_NTZ)
- When should TS_LTZ be favoured over TS_TZ and visa versa
- In what timezone is Snowflake running?
- Making up the balance between TS_LTZ vs. TS_TZ
When are timezones important to store?
Preferences of end-users
Preferences of data engineers
How are the timestamp-formats different from each other?
In order to illustrate the difference, I reuse the response from David Gardner this thread. David takes the timestamp ‘2010-11-18 08:21:54.260’ without timezone and shows what happens when timezones switch between the time they are written to disk (INSERT_TIMEZONE) and the time they are queried (QUERY_TIMEZONE). Notice the following:
- All values in the column TS_NTZ are the same. This illustrates that TS_NTZ is ignorant for any timezone set.
- The rows annotated with (==) show that TS_LTZ and TS_TZ are the same if QUERY_TIMEZONE and INSERT_TIMEZONE are the same.
- Notice that the order of colors match in columns QUERY_TIMEZONE and TS_LTZ. This illustrates that the UTC-timestamp internally stored for TS_LTZ is computed on query time to the current timezone.
- otice that the order of colors match in columns INSERT_TIMEZONE and TS_TZ. When a timestamp is written to disk, the offset at writing-time is written to disk as well. Hence, this offset is returned on query-time.
When should TS_LTZ be favoured over TS_TZ, and visa versa
- Because TM_TZ only takes the offset into account on insert-time. This makes that TM_TZ is unaware of the daylight saving time. Hence, when a day is added to a timestamp of TM_TZ-type, 24 hours are added.
- TM_LTZ takes a different approach. I assumed it would add 24 hours on top of the UTC time it internally stores. However, as we can see in the picture, Snowflake is apparently aware of the daylight saving time, and adds one hour less. Since TS_LTZ doesn’t store the timestamp at INSERT_TIME, I’m quite curious how this mechanism works.
Query example illustrating shifted offsets
Risk of corrupted LTZ Timestamps
Randy: “The issue with using TM_LTZ in your tables is that it can result in non deterministic code. So if my local timezone is different from the service account I use in prod or different from my teammate’s, using TM_LTZ can quietly result in SQL that works but is quietly shifting timestamps around.“
Paul: “If I translate it in my own words, you say that it is an easy mistake to use another timezone somewhere in your snowflake landscape (hence the non-deterministic nature). Your first example is of multiple accounts (yes, you need to make sure that the timezones are set the same). I can also imagine that a timezone is overwritten accidentally on session level. If this would happen on read, I think it wouldn’t matter (you would see the correct time, but in another timezone). However, on write, you would store internally an incorrect UTC time.“
Randy: “Yeah, exactly“
- Corrupted LTZ Timestamps only impact the production account
- The only way that LTZ Timestamps can get corrupted is when the session timezone accidentally overwrites the account timezone. There are three scenario’s that I can think of that can cause this to happen.
- A Snowflake-developer can change the sessions timezone. However, they should not be privileged to load data into snowflake tables. Therefore, there is no risk that corrupted LTZ timestamps are loaded.
- The Snowflake ACCOUNTADMIN has the rights to change the timezone on the production Snowflake account. In case the timezone changes due to new insights, the ACCOUNTADMIN needs to be aware of that changing the timezone can cause new TS_LTZ timestamps to get corrupted. The workaround I recommend is to set the timezone explicitly in the session where data is loaded.
- An ELT-tool should be used for production loads. For each load, a new session is used. There is a risk that a timezone is set on session level. In order to make sure that this setting doesn’t have destructive effects, I strongly recommend to store the timezone that was used in the session.
- A JDBC/ODBC bug could set a timezone. This risk can be mitigated with proper testing.
In what timezone is Snowflake running?
Summary - TM_TZ vs. TM_LTZ
Final thoughts
‘1800-01-01 00:00:00’ (UTC) -> 1799-12-31 23:40:28.000 +0019 (LTZ) 1799-12-31 23:39:56.000 +0019 (TZ) FALSE
‘1800-07-01 00:00:00’ (UTC) -> 1800-06-30 23:40:28.000 +0019 (LTZ) 1800-06-30 23:39:56.000 +0019 (TZ) FALSE
‘1900-01-01 00:00:00’ (UTC) -> 1899-12-31 23:40:28.000 +0019 (LTZ) 1899-12-31 23:39:56.000 +0019 (TZ) FALSE
‘1900-01-01 00:59:59’ (UTC) -> 1900-01-01 00:40:27.000 +0019 (LTZ) 1900-01-01 00:39:55.000 +0019 (TZ) FALSE
‘1900-01-01 01:00:00’ (UTC) -> 1900-01-01 00:40:28.000 +0019 (LTZ) 1900-01-01 00:39:56.000 +0019 (TZ) FALSE
‘1900-01-01 01:59:59’ (UTC) -> 1900-01-01 01:40:27.000 +0019 (LTZ) 1900-01-01 01:39:55.000 +0019 (TZ) FALSE
‘1900-01-01 02:00:00’ (UTC) -> 1900-01-01 01:40:28.000 +0019 (LTZ) 1900-01-01 01:39:56.000 +0019 (TZ) FALSE
‘1915-01-01 00:00:00’ (UTC) -> 1914-12-31 23:40:28.000 +0019 (LTZ) 1914-12-31 23:39:56.000 +0019 (TZ) FALSE
‘1920-01-01 00:00:00’ (UTC) -> 1919-12-31 23:40:28.000 +0019 (LTZ) 1919-12-31 23:39:56.000 +0019 (TZ) FALSE
‘1925-01-01 00:00:00’ (UTC) -> 1924-12-31 23:40:28.000 +0019 (LTZ) 1924-12-31 23:39:56.000 +0019 (TZ) FALSE
‘1930-01-01 00:00:00’ (UTC) -> 1929-12-31 23:40:28.000 +0019 (LTZ) 1929-12-31 23:39:56.000 +0019 (TZ) FALSE
‘1935-01-01 00:00:00’ (UTC) -> 1934-12-31 23:40:28.000 +0019 (LTZ) 1934-12-31 23:39:56.000 +0019 (TZ) FALSE
‘1937-07-01 01:20:28’ (UTC) -> 1937-07-01 00:00:28.000 +0120 (LTZ) 1937-07-01 00:00:28.000 +0120 (TZ) TRUE
‘1940-07-01 00:00:00’ (UTC) -> 1940-06-30 22:00:00.000 +0200 (LTZ) 1940-06-30 22:00:00.000 +0200 (TZ) TRUE
‘1946-07-01 00:00:00’ (UTC) -> 1946-06-30 23:00:00.000 +0100 (LTZ) 1946-06-30 23:00:00.000 +0100 (TZ) TRUE
‘1975-07-01 00:00:00’ (UTC) -> 1975-06-30 23:00:00.000 +0100 (LTZ) 1975-06-30 23:00:00.000 +0100 (TZ) TRUE
‘1977-07-01 00:00:00’ (UTC) -> 1977-06-30 22:00:00.000 +0200 (LTZ) 1977-06-30 22:00:00.000 +0200 (TZ) TRUE
‘1990-01-01 00:00:00’ (UTC) -> 1989-12-31 23:00:00.000 +0100 (LTZ) 1989-12-31 23:00:00.000 +0100 (TZ) TRUE