Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong result qualify in Tasty Bytes - Zero to Snowflake - Geospatial #121

Open
arcuellar88 opened this issue Sep 24, 2024 · 0 comments
Open

Comments

@arcuellar88
Copy link

I am following the tutorial Tasty Bytes - Zero to Snowflake - Geospatial with my own data and I am not getting the right answer in Step 4. I am missing one comparison. I have 5 places and I am only getting 9 distances. I should get 10.

https://github.com/Snowflake-Labs/sf-samples/blame/1ca85b477fa4beb3131c41cc1cfb6286df8a1c20/samples/tasty_bytes/FY25_Zero_To_Snowflake/tb_geospatial.sql#L157

Qualify window function filter doesn't work for the first element as Lag is null (something <> null is not true)

WITH _top_10_locations AS
(
SELECT TOP 10
location_id,
ST_MAKEPOINT(longitude, latitude) AS geo_point,
SUM(price) AS total_sales_usd
FROM analytics.orders_v
WHERE primary_city = 'Paris'
GROUP BY location_id, latitude, longitude
ORDER BY total_sales_usd DESC
)
SELECT
a.location_id,
b.location_id,
ROUND(ST_DISTANCE(a.geo_point, b.geo_point)/1609,2) AS geography_distance_miles,
ROUND(ST_DISTANCE(a.geo_point, b.geo_point)/1000,2) AS geography_distance_kilometers
FROM _top_10_locations a
JOIN _top_10_locations b
ON a.location_id <> b.location_id -- avoid calculating the distance between the point itself
QUALIFY a.location_id <> LAG(b.location_id) OVER (ORDER BY geography_distance_miles) -- avoid duplicate: a to b, b to a distances
ORDER BY geography_distance_miles;

My suggestion:

WITH _top_10_locations AS
(
SELECT TOP 10
location_id,
ST_MAKEPOINT(longitude, latitude) AS geo_point,
SUM(price) AS total_sales_usd
FROM analytics.orders_v
WHERE primary_city = 'Paris'
GROUP BY location_id, latitude, longitude
ORDER BY total_sales_usd DESC
)
SELECT
a.location_id,
b.location_id,
ROUND(ST_DISTANCE(a.geo_point, b.geo_point)/1609,2) AS geography_distance_miles,
ROUND(ST_DISTANCE(a.geo_point, b.geo_point)/1000,2) AS geography_distance_kilometers
FROM _top_10_locations a
JOIN _top_10_locations b
ON a.location_id < b.location_id -- avoid calculating the distance between the point itself. Also avoid duplicate: a to b, b to a
ORDER BY geography_distance_miles;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant