-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathFCC_provider_list.qmd
174 lines (115 loc) · 5.17 KB
/
FCC_provider_list.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
---
title: "FCC providers list"
date: last-modified
format:
html:
code-fold: true
engine: knitr
---
```{r}
#| label: utility functions
source("R/table_with_options.R")
```
We have multiple sources from FCC to define a provider.
- Two are coming from the Broadband Data Collection:
* The first one from [FCC](https://us-fcc.app.box.com/v/bdcprovideridtable), accessed the 26-04-2024
* The second is comming from our pipelines of NBM (June 2023 release)
- One is comming from the Emergency Broadband Benefit[^ebb] and can be downloaded [here](https://www.fcc.gov/sites/default/files/ebb.provider.list_.xlsx) (accessed the 26-04-2024)
[^ebb]: The Emergency Broadband Benefit is the precursor of the Affordable Connectivity Program, source: [https://www.fcc.gov/broadbandbenefit](https://www.fcc.gov/broadbandbenefit)
::: {.callout-tip}
- [cori.data.fcc](https://github.com/ruralinnovation/fcc.utils) is a small R package that contains providers data and is used here
- Data is downlable using the download button
:::
# What does this data look like?
## Broadband Data Collection versions:
```{r}
#|label: read csv
library("cori.data.fcc")
isp <- cori.data.fcc::fcc_provider
```
`FCC provider` has 4 341 rows and 5 columns.
Those columns are:
- `Provider.Name`: Same than Brand Name?
- `Affiliation`: Same number than `Provider.ID`
```{r}
#| column: margin
#| label: Operation type
op_type <-as.data.frame(table(isp$operation_type, dnn = "Type"), responseName = "Nb." )
knitr::kable(op_type)
```
- `Operation.Type`: Only two options "ILEC" or "Non-ILEC"
- `FRN`: **F**CC **R**egistration **N**umber; "number of the entity that submited the data". It is supposed to be a string of 10 characters (with padding 0). Slighly more number than `Provider.Name` and seems to be the primary key.
- `Provider.ID`: An ID for `Affiliations`
```{r}
#|label: display ISP from BDC
table_with_options(isp)
```
# EDA / Analysis
## How many unique values do we have per column:
```{r}
#| label: count me some unique
#| column: margin
#| tbl-cap: Count of unique values / columns
sum_table <- apply(isp, 2, function(x)length(unique(x))) |>
as.data.frame()
names(sum_table) <- c("Count of unique values")
knitr::kable(sum_table)
```
We can confirm that:
- `FRN` here is unique for every row in this data set (our primary key)
- We have a bit less `Provider.Name` (4321 / 4341) than `FRN`
- We have the same number of `Affiliations` and `Provider.ID`
- The number of `Provider.ID`/`Affiliations` is smaller than FRN.
A quick check indicate that all `Provider.ID` are 6 characters.
`FRN` is also always 10 characters.
:::{.aside}
Hence the one with 7 characters in FCC NBM is probably an error.
:::
### What are the Provider Name that are sharing multiple `FRN`:
We probably have cases where companies have same name ("Farmers Mutual Telephone company") but we have probably company that have multiple FRN ("Grand Mound Cooperative Telephone Association"). Granted the low numbers I think we are fair to assume it does not matter to much.
```{r}
#| label: case where same name have different FRN
provider.name_by_FRN <- sapply(split(isp$frn, isp$provider_name), function(x) unique(x))
multiple.frn <- provider.name_by_FRN[lengths(provider.name_by_FRN)> 1]
provider.name_by_FRN.dat <- data.frame(provider_name = names(multiple.frn),
FRN = sapply(multiple.frn, toString)
)
table_with_options(provider.name_by_FRN.dat)
```
### How FRN are split between `Affiliations`/`Provider.ID`?
As expected most of of the relations FRN / provider are one to one (3135) while 387 have more than one FRN.
```{r}
#| label: spliting number of affiliation by frn
#| column: margin
#| tbl-cap: Affiliation per FRN
get_me_FRN_affiliations <- function(isp) {
FRN_by_affiliations <- sapply(split(isp[["frn"]], isp[["affiliation"]]),
function(x) length(unique(x)))
FRN_by_affiliations.dat <- data.frame(Affiliations = names(FRN_by_affiliations),
count_frn = FRN_by_affiliations)
return(FRN_by_affiliations.dat)
}
FRN_affiliations.dat <- get_me_FRN_affiliations(isp)
cnt_affiliation <- as.data.frame(table(FRN_affiliations.dat$count_frn),
responseName = "")
cnt_affiliation[["Num. of Affiliations / FRN"]] <-
ifelse(as.numeric(cnt_affiliation[[1]]) < 10, cnt_affiliation[[1]], "10+")
cnt_affiliation <- aggregate(cnt_affiliation[[2]],
list(cnt_affiliation[["Num. of Affiliations / FRN"]]),
sum)
names(cnt_affiliation) <- c("Number of Affiliations / FRN", "Count")
knitr::kable(cnt_affiliation[c(1, 3:9, 2 ),], row.names = FALSE)
```
You can explore those 387 affiliations here:
```{r}
#| label: 387 cases wth one affiliation and multiple FR
dat <- FRN_affiliations.dat[FRN_affiliations.dat[["count_frn"]] > 1 ,]
table_with_options(dat[order(dat[["count_frn"]], decreasing = TRUE), ])
```
# Data set with mail address and phone numbers
This data set is one that is coming from the ACP program.
```{r}
#| label: address and phone
isp_contact <- read.csv("data/bb-provider_list.csv")
table_with_options(isp_contact)
```