How to shorten large union queries in Snowflake
Recently I was trying to reduce the size of a large union query in Snowflake where the only thing that changed between the queries was the schema name. If you work with SQL professionally, you’ve probably encountered something like this:
select
'COMPANY_NAME' as Company
, GL.ACTNUM as Account_Number
, ACT.DESCRIPTION as Account_Name
from COMPANY_NAME1.General_Ledger_Table GL
inner join COMPANY_NAME1.Account_Name_Table ACT
on ACT.ID = GL.ID
union all
...
select
'COMPANY_NAME' as Company
, GL.ACTNUM as Account_Number
, ACT.DESCRIPTION as Account_Name
from COMPANY_NAME17.General_Ledger_Table GL
inner join COMPANY_NAME17.Account_Name_Table ACT
on ACT.ID = GL.ID
The query that inspired this post was originally a SQL Server query, but with Fivetran, we’ve now pushed all of our data up to Snowflake. This felt like the perfect time for a redesign - turns out there is a much better way to do this in Snowflake!
Here is a basic example that accomplishes the same as above. Snowflake supports for loops, and by using the resultset
keyword in a for loop, you can return a table similar to a select
statement.
The code below loops over a list of company names (the results of the ‘organization’ cursor in the below example), unions the results together, and then returns the final results using a resultset
:
set ro = 'sysadmin';
set wh = 'my_wh';
set db = 'dev';
set sch = 'my_schema';
use role identifier($ro);
use warehouse identifier($wh);
use database identifier($db);
use schema identifier($sch);
declare
sql varchar;
final_sql varchar;
/* The results of the organization cursor
should return the database names you want to
loop over */
organization cursor for (select COMPANY_NAME from MY_TABLE);
my_results resultset;
begin
final_sql := '';
for company in organization do
sql := $$select 'COMPANY_NAME' as Company
, GL.ACTNUM as Account_Number
, ACT.DESCRIPTION as Account_Name
from COMPANY_NAME.General_Ledger_Table GL
inner join COMPANY_NAME.Account_Name_Table ACT
on ACT.ID = GL.ID
$$;
sql := replace(sql, 'COMPANY_NAME', company.COMPANY_NAME);
if(final_sql != '')then
final_sql := final_sql || ' union all ';
end if;
final_sql := final_sql || sql;
end for;
my_results := (execute immediate :final_sql);
return table(my_results);
end;
Note that this is a simplified example for the purposes of this blog. Using similar logic I was able to reduce a 300 line production query to ~40 lines of code!