From 9160f85cccee627b573b08dfe0e62dc255583627 Mon Sep 17 00:00:00 2001
From: Olivier Leroy
Date: Wed, 8 May 2024 15:03:42 -0400
Subject: [PATCH] reference repo and clean few typo

---
 ms-eda.qmd | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/ms-eda.qmd b/ms-eda.qmd
index 2753618..f54da34 100644
--- a/ms-eda.qmd
+++ b/ms-eda.qmd
@@ -11,13 +11,13 @@ Why are we doing it?
 
 > Premises and Premise Counts
 
-Currently we have FCC total count locations at census block level.
+Currently we have FCC total count of locations (BSL) at the census block level.
 
 # MS Building Foot print
 
 ## Overview
 
-MS used satellite data (from multiple campains/vintage) to get the footprint of buildings.
+Microsoft (MS) used satellite data (from multiple campains/vintage) to get the footprint of buildings.
 
 They are classifying pixels that are supposed to be part of a buildings (segmentation using a neural network) and then convert pixels to a shape.
 
@@ -25,9 +25,9 @@ It exists [worldwide](https://github.com/microsoft/GlobalMLBuildingFootprints) a
 
 - We have processed the 51 states
 
-- Additional works will be required for PR and US territories (parts of workflow from the US states can be reused).
+- Additional works will be required for Puero Rico and the US territories (parts of workflow from the US states can be reused).
 
-The precision of the model vary depending on the region: the Carribean is at 92,2% and the US at 98.5%. The rate of false positive is 1% for the US and 1.8% for the Carribean. (Oceania was not provided)
+The precision of their model vary depending on the region: the Carribean is at 92,2% and the US at 98.5%. The rate of false positive is 1% for the US and 1.8% for the Carribean. (Oceania was not provided)
 
 The license is [ODbL](https://github.com/Microsoft/USBuildingFootprints?tab=License-1-ov-file).
 
@@ -35,7 +35,11 @@ The license is [ODbL](https://github.com/Microsoft/USBuildingFootprints?tab=Lice
 Buildings are shapes, BSL are points. 
 We converted the buildings to single point (arbitrary: the first vertex of the shape) to lower the amout of data. Hence, now we have "buildings" summarized to points (lat/long).
 
-We do not have access to lat/long of BSL (fabrics). Our assumptions is if a count at block match they are describing the same reality (we can't do the "on the ground verification").
+:::{.column-margin}
+Our pipeline can be find [here](https://github.com/ruralinnovation/data-MS-buildings) (private repo)
+:::
+
+We do not have access to lat/long of BSL (this is only part of the fabric data). Our assumptions is if a count at block match they are describing the same reality (we can't do the "on the ground verification").
 
 The number of buildings reported for 51 states is: 130 099 920
 
@@ -53,11 +57,11 @@ While is the number of BSL is: 114 074 438
 
 **What could be the use cases?**
 
-Integrating those informations will have cost: information "overload", documentation about is needed, and depending
+Integrating those informations will have cost: information "overload", documentation about it is needed
 
 We can:
 
-- Provide the count per census block of MS building footprint and the user
+- Provide the count per census block of MS building footprint and let the user decide on the quality of the data around their specific geography
 
 - Add the dot to the map
 
@@ -69,7 +73,7 @@ How this data helps BEAD applicant?
 
 ## MS building footprint in VT
 
-We can count those points per block and compare to the number of location than FCC is descriving.
+We can count those points per block and compare to the number of location than FCC is describing.
 
 After that we build a small model that will provide either an estimate of BSL given MS footprint and how confidant the model is. 
 
@@ -116,10 +120,8 @@ lines(pot_val, conf_interval[, "upr"], col =" blue", lty = 2)
 
 :::
 
-- my model (just a linear one) is probably bad (log should correct that)
-
-- still strong relation
+- Strong relation
 
-- the model is "overconfidant", and the reality is more "spread"
+- the model is "overconfidant", and the reality is more "spread" (what could possibly be the cause of it?)
 
 - VT MS has also **more** locations (285333, versus 352618)
\ No newline at end of file