Timestamp to string conversion (Redshift vs Snowflake) issue
Everyone faces some strange behavior in software products from time to time. Any software has bugs, so it does not surprise us much. However, in some case you facing not a clear but some kind of strange implementation that surprise you. I have such one from our Redshift to Snowflake migration:
select
cast(TIMESTAMP '2021-03-19 17:07:10.123321' as varchar),
cast(TIMESTAMP '2021-03-19 17:07:10.123000' as varchar),
cast(TIMESTAMP '2021-03-19 17:07:10.100000' as varchar),
cast(TIMESTAMP '2021-03-19 17:07:10.000000' as varchar);
Output (Redshift):
2021-03-19 17:07:10.123321,
2021-03-19 17:07:10.123,
2021-03-19 17:07:10.10,
2021-03-19 17:07:10
Output (Snowflake):
2021-03-19 17:07:10.123321,
2021-03-19 17:07:10.123000,
2021-03-19 17:07:10.100000,
2021-03-19 17:07:10.000000
At first glance, Redshift just cut trailing 0
. Do not hurry! Pay attention to the 3rd date 2021-03-19 17:07:10.100000
. Redshift will convert it really strange form 2021-03-19 17:07:10.10
with trailing zero. I could not explain why it happens in such a way. But it is what it is - Redshift converts data to string in a quite unexpected way.
You can say: Well, actually it is not a big deal. You could read all of the values in both cases, so there is no harm.
Correct, in terms of just string output, it is totally ok. But we were not so lucky: our data model heavily uses hash keys over several fields in a table. So to combine all of them to build hash code (actually we are using DBT function that do it for us) we are converting every field to string... I bet you already have an idea of what's going wrong with our hashes. Due to different algorithms for string conversion, we were getting unequal hashes over the same columns and the same values in them. Old Redshift hashes should be somehow converted to Snowflake base. We have created such a function to run it over Snowflake and convert all broken references build on top of hashes:
alter account set TIMESTAMP_NTZ_OUTPUT_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF6';
alter table public.some_table add column unique_id_redshift varchar;
update public.some_table c set uniq_id_redshift = md5(cast(
coalesce(cast(c.some_regular_column1 as varchar), '') || '-'
|| coalesce(
case
when LEN(RTRIM(cast(c.date_started as varchar), '0')) - POSITION('.', RTRIM(cast(c.date_started as varchar), '0')) > 1 then RTRIM(cast(c.date_started as varchar), '0')
when LEN(RTRIM(cast(c.date_started as varchar), '0')) - POSITION('.', RTRIM(cast(c.date_started as varchar), '0')) = 0 then coalesce(substr(RTRIM(cast(c.date_started as varchar), '0'), 0, LEN(RTRIM(cast(c.date_started as varchar), '0'))-1), '')
else RTRIM(cast(c.date_started as varchar), '0') || '0'
end
, '') || '-'
|| coalesce(cast(c.some_regular_column2 as varchar), '') || '-'
|| coalesce(cast(c.some_regular_column2 as varchar), '') as varchar));
update table_with_invalid_key_reference f set f.some_table_unique_id = c.snowflake_calculated_hash
from some_table c
where f.some_table_unique_id = c.unique_id_redshift;
...
alter table public.some_table drop column unique_id_redshift;