Built site for gh-pages
Quarto GHA Workflow Runner committed Dec 15, 2023
1 parent 0cc2618 commit 689e9c5
Showing 5 changed files with 128 additions and 39 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
215b599b
c98a0529
141 changes: 115 additions & 26 deletions isp_eda.html

Large diffs are not rendered by default.
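The search.json text for isp_eda.html (further down in this commit) defines three counts per (brand_name, provider_id, state_abbr, technology) set: cnt_services, cnt_total_locations (a location where the provider declares several speeds is counted once) and cnt_block_presence. The query that produced them is not part of this commit, so the Python sketch below is only an assumed illustration of how the three counts relate. It assumes one input row per service offer and uses hypothetical column names (location_id, block_geoid, max_down) that are not taken from the FCC BDC schema or the project's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple

# Grouping set named in the search.json text for the three counts.
GROUP_KEYS = ("brand_name", "provider_id", "state_abbr", "technology")


def summarize_services(
    rows: Iterable[Mapping[str, str]],
) -> Dict[tuple, Tuple[int, int, int]]:
    """Return {group: (cnt_services, cnt_total_locations, cnt_block_presence)}.

    Assumption: one row per service offer, with HYPOTHETICAL columns
    `location_id` and `block_geoid` identifying the location and its block.
    """
    services: Dict[tuple, int] = defaultdict(int)
    locations: Dict[tuple, Set[str]] = defaultdict(set)
    blocks: Dict[tuple, Set[str]] = defaultdict(set)
    for row in rows:
        key = tuple(row[k] for k in GROUP_KEYS)
        services[key] += 1                      # every offer counts as one service
        locations[key].add(row["location_id"])  # several speeds at one location count once
        blocks[key].add(row["block_geoid"])     # distinct blocks where the set appears
    return {k: (services[k], len(locations[k]), len(blocks[k])) for k in services}


# Toy input (not FCC data): loc-1 is offered at two speeds, loc-2 shares its block.
base = {"brand_name": "ExampleNet", "provider_id": "999999",
        "state_abbr": "XX", "technology": "50"}
toy_rows = [
    dict(base, max_down="100", location_id="loc-1", block_geoid="blk-1"),
    dict(base, max_down="1000", location_id="loc-1", block_geoid="blk-1"),
    dict(base, max_down="100", location_id="loc-2", block_geoid="blk-1"),
]
summary = summarize_services(toy_rows)
print(summary)  # {('ExampleNet', '999999', 'XX', '50'): (3, 2, 1)}
```

Sets are used for the location and block counts so that repeated offers at the same place collapse, which is the distinction the page draws between cnt_services and cnt_total_locations.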

4 changes: 2 additions & 2 deletions search.json
@@ -32,7 +32,7 @@
"href": "isp_eda.html",
"title": "EDA on ISP",
"section": "",
"text": "Code\ntable_with_options <- function(x){DT::datatable(x, rownames = FALSE,\n extensions = 'Buttons', \n options = list(\n dom = 'Blfrtip',\n buttons = list('copy', 'print', list(\n extend = 'collection',\n buttons = c('csv', 'excel'),\n text = 'Download')\n )\n )\n )}\nWe are starting a first exploratory data analysis around ISPs in the FCC BDC data set. It should be kept in mind that an ISP can be multiple time in the same location (offering multiple service).\nThe query that generated this first pass at it is here:\nThe name of the column match FCC description. We are just adding cnt_services which is a count of services. In one locations you can have multiple services with different providers, technology and speeds provides (sometimes one providers can have multiple technology and/or multiple speeds).\nIt can be explored here:\nCode\nisp <- read.csv(\"data/isp_v2.csv\")\ncolnames(isp) <- c(\"brand_name\", \"state_abbr\", \"technology\",\n\"provider_id\", \"cnt_services\", \"cnt_total_locations\", \"cnt_block_presence\")\ntable_with_options(isp)"
"text": "Code\ntable_with_options <- function(x){DT::datatable(x, rownames = FALSE,\n extensions = 'Buttons', \n options = list(\n dom = 'Blfrtip',\n buttons = list('copy', 'print', list(\n extend = 'collection',\n buttons = c('csv', 'excel'),\n text = 'Download')\n )\n )\n )}\nWe are starting a first exploratory data analysis around ISPs in the FCC BDC data set. It should be kept in mind that an ISP can be multiple time in the same location (offering multiple service).\nThe query that generated this first pass at it is here:\nThe name of the column match FCC description.\nWe are adding:\n- cnt_services: count of services, in one location you can have multiple services with different providers, technology and speeds provides (sometimes one providers can have multiple technology and/or multiple speeds) - cnt_total_locations: count of locations covered by this specific set of brand_name, provider_id, state_abbr and technology (here if a provider declare providing different speed in that location it will not be counted) - cnt_block_presence: count of block were we meet the same set (brand_name, provider_id, state_abbr, technology)\nIt can be explored here:\nCode\nisp <- read.csv(\"data/isp_v2.csv\")\ncolnames(isp) <- c(\"brand_name\", \"state_abbr\", \"technology\",\n\"provider_id\", \"cnt_services\", \"cnt_total_locations\", \"cnt_block_presence\")\ntable_with_options(isp)"
},
{
"objectID": "isp_eda.html#numbers-for-context",
@@ -46,7 +46,7 @@
"href": "isp_eda.html#organize-a-bit-brand_name-and-provider_id",
"title": "EDA on ISP",
"section": "Organize a bit brand_name and provider_id",
"text": "Organize a bit brand_name and provider_id\n::: {.cell}\n\nCode\nisp_slim$brand_name <- tolower(isp_slim$brand_name)\n\n:::\nIt seems that we have:\n- brand name with and without capital letter (VERIZON, Verizon): if we tolower brand name we get 2414 unique brand name.\n\n\nCode\nisp_agg <- aggregate(isp_slim[\"cnt_services\"], isp_slim[\"brand_name\"], sum)\ntable_with_options(isp_agg[order(isp_agg$cnt_services, decreasing = TRUE), ])\n\n\n\n\n\n\n\nI have done a smaller .csv just with brand_name provider_id and cnt_services just to inspect what is the relation between them (1 to 1 / 1 to many). Outside of typos we should not have many to many relation.\nSELECT\n brand_name,\n provider_id,\n count(*) cnt_services\nFROM staging.june23\nGROUP BY brand_name, provider_id\nORDER BY cnt_services desc;\n\n\nCode\nisp_list <- read.csv(\"data/isp_prov.csv\")\nisp_list$ct <- 1 \nisp_list$name_id <- ave(isp_list$ct, isp_list$provider_id, FUN = sum)\n#View(isp_list[!is.na(isp_list$new_name),])\n\n\n\nTCT\n\n\nCode\ntable_with_options(isp_list[grepl(\"^TCT \", isp_list$brand_name) ,])\n\n\n\n\n\n\n\nCode\nisp_list[isp_list$provider_id == 410172,]\n\n\n brand_name provider_id cnt_services ct\n1584 The Tri-County Telephone Association, Inc. 410172 4260 1\n1957 Council Grove Telephone Company 410172 1992 1\n2025 TCT 410172 1681 1\n name_id\n1584 3\n1957 3\n2025 3\n\n\nTCT has some non-conventional names but nearly all of them has the same provider_id. An other “TCT” exist but with a different provider_id (410172) shared with two other brand name. I will assume that all of this TCT XXX are the same and provide them with a temporary name TCT_131366.\n\n\nCode\nisp_list$new_name[grepl(\"^TCT \", isp_list$brand_name)] <- \"TCT_131366\"\n\n\n\n\nWindstream\nWindtream present a similar case but the position of Windstream is not always the first word (Georgia Windstream, LLC). 
I went with the solution than TCT: Windstream_131413\n\n\nCode\ntable_with_options(isp_list[grepl(\"Windstream\", isp_list$brand_name) ,]) \n\n\n\n\n\n\n\nCode\nisp_list$new_name[grepl(\"Windstream\", isp_list$brand_name)] <- \"Windstream_131413\" \n\n\n\n\nAcentek/Acentek\nIt exists in both forms (tolower() will correct it) but it is also sharing it’s provider_id with some non-conventional “name”:\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 130008,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 130008] <- \"acentek\"\n\n\nFor now I will go with attributing them to Acentek but an other option will be to just remove them.\n\n\nMediacom - Bolt\n\n\nCode\ntable_with_options(isp_list[grepl(\"Mediacom|Bolt\", isp_list$brand_name) ,]) \n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 130804] <- \"mediacom_bolt\"\n\n\nIt appears that Bolt and Mediacom share the same provider_id and are together in some brand_name. I think we should regroup them but this definietly more domain knowledge than the one I have!\n\n\nComporium\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 131125,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 131125] <- \"comporium\"\n\n\nFor this one I am for renaming them “comporium”\n\n\nArmstrong\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 130071,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 130071] <- \"armstrong\"\n\n\nIdem label to “armstrong”?\n\n\nTEC\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 131311,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 131311] <- \"TEC\"\n\n\n\n\nPUD\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 290075,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 290075] <- \"PUD\"\n\n\nLabel to “PUD” ?\n\n\nGoNetspeed?\nI am unsure about that one. 
We can regroup the two GoNetspeed but we are lacking information for the rest.\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 131378,])\n\n\n\n\n\n\n\n\n\nMultiple brand_name with same provider_id\n\nprovider_id: 131413, same provider id than windstream but probably a different entity?\nprovider_id 130485: look very similar (SHLB in the name 8/10)\nlist of provider_id associated with multiple brand name:\n\n130074\n190233\n160127\n150277\n150266\n130183\n300192\n131362\n130877\n130757 (regroup Long lines?)"
"text": "Organize a bit brand_name and provider_id\n::: {.cell}\n\nCode\nisp_slim$brand_name <- tolower(isp_slim$brand_name)\n\n:::\nIt seems that we have:\n- brand name with and without capital letter (VERIZON, Verizon): if we tolower brand name we get 2414 unique brand name.\n\n\nCode\nisp_agg <- aggregate(isp_slim[\"cnt_services\"], isp_slim[\"brand_name\"], sum)\ntable_with_options(isp_agg[order(isp_agg$cnt_services, decreasing = TRUE), ])\n\n\n\n\n\n\n\nI have done a smaller .csv just with brand_name provider_id and cnt_services just to inspect what is the relation between them (1 to 1 / 1 to many). Outside of typos we should not have many to many relation.\nSELECT\n brand_name,\n provider_id,\n count(*) cnt_services\nFROM staging.june23\nGROUP BY brand_name, provider_id\nORDER BY cnt_services desc;\n\n\nCode\nisp_list <- read.csv(\"data/isp_prov.csv\")\nisp_list$ct <- 1 \nisp_list$name_id <- ave(isp_list$ct, isp_list$provider_id, FUN = sum)\n#View(isp_list[!is.na(isp_list$new_name),])\n\n\n\nTCT\n\n\nCode\ntable_with_options(isp_list[grepl(\"^TCT \", isp_list$brand_name) ,])\n\n\n\n\n\n\n\nCode\nisp_list[isp_list$provider_id == 410172,]\n\n\n brand_name provider_id cnt_services ct\n1584 The Tri-County Telephone Association, Inc. 410172 4260 1\n1957 Council Grove Telephone Company 410172 1992 1\n2025 TCT 410172 1681 1\n name_id\n1584 3\n1957 3\n2025 3\n\n\nTCT has some non-conventional names but nearly all of them has the same provider_id. An other “TCT” exist but with a different provider_id (410172) shared with two other brand name. I will assume that all of this TCT XXX are the same and provide them with a temporary name TCT_131366.\n\n\nCode\nisp_list$new_name[grepl(\"^TCT \", isp_list$brand_name)] <- \"TCT_131366\"\n\n\n\n\nWindstream\nWindtream present a similar case but the position of Windstream is not always the first word (Georgia Windstream, LLC). 
I went with the solution than TCT: Windstream_131413\n\n\nCode\ntable_with_options(isp_list[grepl(\"Windstream\", isp_list$brand_name) ,]) \n\n\n\n\n\n\n\nCode\nisp_list$new_name[grepl(\"Windstream\", isp_list$brand_name)] <- \"Windstream_131413\" \n\n\n\n\nAcentek/Acentek\nIt exists in both forms (tolower() will correct it) but it is also sharing it’s provider_id with some non-conventional “name”:\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 130008,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 130008] <- \"acentek\"\n\n\nFor now I will go with attributing them to “acentek”` but an other option will be to just remove them.\n\n\nMediacom - Bolt\n\n\nCode\ntable_with_options(isp_list[grepl(\"Mediacom|Bolt\", isp_list$brand_name) ,]) \n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 130804] <- \"mediacom_bolt\"\n\n\nIt appears that Bolt and Mediacom share the same provider_id and are together in some brand_name. I think we should regroup them but this definietly more domain knowledge than the one I have!\n\n\nComporium\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 131125,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 131125] <- \"comporium\"\n\n\nFor this one I am for renaming them “comporium”\n\n\nArmstrong\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 130071,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 130071] <- \"armstrong\"\n\n\nIdem label to “armstrong”?\n\n\nTEC\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 131311,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 131311] <- \"TEC\"\n\n\n\n\nPUD\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 290075,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 290075] <- \"PUD\"\n\n\nLabel to “PUD” ?\n\n\nGoNetspeed?\nI am unsure about that one. 
We can regroup the two GoNetspeed but we are lacking information for the rest.\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 131378,])\n\n\n\n\n\n\n\n\n\nMHTC\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 130862,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 130862] <- \"MHTC\"\n\n\nLabel MHTC?\n\n\nHardy\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 130588,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 130588] <- \"Hardy\"\n\n\nlabel Hardy?\n\n\nOmniTel\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 130484,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 130484] <- \"OmniTel\"\n\n\n\n\nHamilton\n\n\nCode\ntable_with_options(isp_list[isp_list$provider_id == 130887,])\n\n\n\n\n\n\n\nCode\nisp_list$new_name[isp_list$provider_id == 130887] <- \"Hamilton\"\n\n\nlabel Hamilton?\n\n\nMultiple brand_name with same provider_id\n\nprovider_id: 131413, same provider id than windstream but probably a different entity?\nprovider_id 130485: look very similar (SHLB in the name 8/10)\nlist of provider_id associated with multiple brand name:\n\n130074\n131378\n190233\n160127\n150277\n150266\n130183\n300192\n131362\n130877\n130757 (regroup Long lines?)\n130778 (Manti 5/6)\n330025\n130254 (2/5 altafiber)\n150334\n130906\n130425 (2/5 Lavaca)\n130206\n130453 (3/5 EFIBER)\n140092 (3/5 Twin Valley)\n140030\n130142"
},
{
"objectID": "isp_eda.html#todo-list",
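The updated isp_eda.html text checks whether brand_name and provider_id relate one-to-one or one-to-many (using ave(..., FUN = sum) in R) and then assigns a new_name per provider_id. A rough Python equivalent of that check is sketched below as an illustration, not as the project's code. It looks at both directions because the page notes a plain "TCT" listed under provider_id 410172, separate from the TCT entries it labels TCT_131366. Unlike the R ave() count, which counts rows per provider_id, this version lowercases names first so case variants such as VERIZON/Verizon collapse into one. The helper names and the toy CSV are hypothetical.

```python
import csv
import io
from collections import defaultdict


def names_per_provider(rows):
    """provider_id -> set of brand names, lowercased so 'VERIZON' and 'Verizon' merge."""
    out = defaultdict(set)
    for r in rows:
        out[r["provider_id"]].add(r["brand_name"].strip().lower())
    return out


def providers_per_name(rows):
    """Reverse direction: lowercased brand name -> set of provider_ids."""
    out = defaultdict(set)
    for r in rows:
        out[r["brand_name"].strip().lower()].add(r["provider_id"])
    return out


def relabel(rows, label_by_provider):
    """Attach new_name from a provider_id -> label mapping (None when unmapped)."""
    return [dict(r, new_name=label_by_provider.get(r["provider_id"])) for r in rows]


# Hypothetical toy rows shaped like the brand_name / provider_id / cnt_services query output.
TOY_CSV = """brand_name,provider_id,cnt_services
ExampleNet,100001,10
EXAMPLENET,100001,5
Example Fiber LLC,100001,3
OtherISP,100002,7
ExampleNet,100003,2
"""
rows = list(csv.DictReader(io.StringIO(TOY_CSV)))

# One provider_id carrying several names (the TCT / Windstream pattern).
multi_name = {p: n for p, n in names_per_provider(rows).items() if len(n) > 1}
# One name spread over several provider_ids (the plain "TCT" under 410172 pattern).
multi_id = {n: p for n, p in providers_per_name(rows).items() if len(p) > 1}
# Label everything under one provider_id, like new_name[provider_id == ...] <- "...".
labelled = relabel(rows, {"100001": "examplenet"})
print(sorted(multi_name), sorted(multi_id))
```

Keeping both mappings makes a many-to-many case visible directly, which is the situation the page expects only from typos.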
8 changes: 4 additions & 4 deletions sitemap.xml
@@ -2,18 +2,18 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://github.com/ruralinnovation/proj-fcc-report/zero_dl_up.html</loc>
<lastmod>2023-12-15T01:20:25.375Z</lastmod>
<lastmod>2023-12-15T02:34:47.399Z</lastmod>
</url>
<url>
<loc>https://github.com/ruralinnovation/proj-fcc-report/isp_eda.html</loc>
<lastmod>2023-12-15T01:20:23.247Z</lastmod>
<lastmod>2023-12-15T02:34:45.191Z</lastmod>
</url>
<url>
<loc>https://github.com/ruralinnovation/proj-fcc-report/about.html</loc>
<lastmod>2023-12-15T01:20:20.795Z</lastmod>
<lastmod>2023-12-15T02:34:42.767Z</lastmod>
</url>
<url>
<loc>https://github.com/ruralinnovation/proj-fcc-report/index.html</loc>
<lastmod>2023-12-15T01:20:23.483Z</lastmod>
<lastmod>2023-12-15T02:34:45.443Z</lastmod>
</url>
</urlset>
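In sitemap.xml this rebuild only moves the lastmod timestamp of each of the four URLs. As an illustration (not part of the Quarto GHA workflow), the standard-library sketch below parses lastmod values per loc and computes how far each one moved; for the index.html entry in this diff the shift is about 1 h 14 min. The sitemap() helper that wraps a single URL is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def lastmods(xml_text):
    """Map each <loc> to its <lastmod>, parsed as a timezone-aware datetime."""
    root = ET.fromstring(xml_text)
    out = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        stamp = url.findtext("sm:lastmod", namespaces=NS)
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        out[loc] = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    return out


def sitemap(loc, lastmod):
    """Hypothetical helper: build a one-URL sitemap string."""
    return (f'<urlset xmlns="{NS["sm"]}"><url><loc>{loc}</loc>'
            f"<lastmod>{lastmod}</lastmod></url></urlset>")


LOC = "https://github.com/ruralinnovation/proj-fcc-report/index.html"
old = lastmods(sitemap(LOC, "2023-12-15T01:20:23.483Z"))
new = lastmods(sitemap(LOC, "2023-12-15T02:34:45.443Z"))
deltas = {loc: new[loc] - old[loc] for loc in old if loc in new}
moved_forward = all(d > timedelta(0) for d in deltas.values())
print(deltas[LOC])  # 1:14:21.960000
```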