-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make tribal overlap set score N #2004
Changes from 4 commits
69b7072
a4083f3
bfd7b08
7d47cdb
4a5d1ef
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -48,6 +48,7 @@ class TribalOverlapETL(ExtractTransformLoad): | |
ANNETTE_ISLAND_TRIBAL_NAME = "Annette Island LAR" | ||
|
||
CRS_INTEGER = 3857 | ||
TRIBAL_OVERLAP_CUTOFF = 0.995 # Percentage of overlap that rounds to 100% | ||
|
||
# Define these for easy code completion | ||
def __init__(self): | ||
|
@@ -58,6 +59,7 @@ def __init__(self): | |
field_names.PERCENT_OF_TRIBAL_AREA_IN_TRACT, | ||
field_names.NAMES_OF_TRIBAL_AREAS_IN_TRACT, | ||
field_names.PERCENT_OF_TRIBAL_AREA_IN_TRACT_DISPLAY, | ||
field_names.IS_TRIBAL_DAC, | ||
] | ||
|
||
self.OVERALL_TRIBAL_COUNT = "OVERALL_TRIBAL_COUNT" | ||
|
@@ -72,16 +74,17 @@ def _create_string_from_list(series: pd.Series) -> str: | |
str_list = sorted(str_list) | ||
return ", ".join(str_list) | ||
|
||
@staticmethod | ||
@classmethod | ||
def _adjust_percentage_for_frontend( | ||
cls, | ||
percentage_float: float, | ||
) -> Optional[float]: | ||
"""Round numbers very close to 0 to 0 and very close to 1 to 1 for display""" | ||
if percentage_float is None: | ||
return None | ||
if percentage_float < 0.01: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should we have the same cutoff type here? like 0.005 in the vein of 0.995? There's a little bit of asymmetry? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The decisions were made at different times, so good catch. @BethMattern thoughts on changing the cutoff for "very close to 0%" smaller (.5% instead of 1%)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we change this, it'll have both a backend and frontend change, and it's not obvious to me we should, so I've pulled it into its own ticket to explicitly make the decision: #2006 |
||
return 0.0 | ||
if percentage_float > 0.9995: | ||
if percentage_float > cls.TRIBAL_OVERLAP_CUTOFF: | ||
return 1.0 | ||
|
||
return percentage_float | ||
|
@@ -246,6 +249,12 @@ def transform(self) -> None: | |
field_names.COUNT_OF_TRIBAL_AREAS_IN_TRACT_CONUS | ||
] = None | ||
|
||
merged_output_df[field_names.IS_TRIBAL_DAC] = np.where( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it might be a bit cleaner to do this with a fillna potentially?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oh that is simpler, good call. |
||
merged_output_df[field_names.PERCENT_OF_TRIBAL_AREA_IN_TRACT] | ||
> self.TRIBAL_OVERLAP_CUTOFF, | ||
True, | ||
False, | ||
) | ||
# The very final thing we want to do is produce a string for the front end to show | ||
# We do this here so that all of the logic is included | ||
merged_output_df[ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -997,6 +997,33 @@ def _mark_donut_hole_tracts(self) -> pd.DataFrame: | |
] | ||
) | ||
|
||
def _mark_tribal_dacs(self) -> None: | ||
"""Per the October 7th compromise (#1988), | ||
tracts that are approx 100% tribal are Score N communities. | ||
""" | ||
self.df[field_names.SCORE_N_COMMUNITIES] = np.where( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. might also be easier / make a bit more sense to just take the MAX of those two columns? and why overwrite SCORE_N_COMMUNITIES? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's funny, doing I wanted to overwrite SCORE_N_COMMUNITIES because i wanted the tribes to be taken into account in calculating donut holes slightly later. BUT, if that's a logical mistake lemme know and I'll just be glad you caught it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no not a logical mistake at all! this makes sense to me! i am willy nilly with types |
||
self.df[field_names.IS_TRIBAL_DAC], | ||
True, | ||
self.df[field_names.SCORE_N_COMMUNITIES], | ||
) | ||
|
||
def _get_percent_of_tract_that_is_dac(self) -> float: | ||
"""Per the October 7th compromise (#1988), | ||
tracts can be partially DACs if some portion of the tract is tribal land. | ||
|
||
Rules are as follows: | ||
If a tract is a SCORE_N_COMMUNITY, it is 100% a DAC | ||
If a tract is not, but contains tribal land, the percent that is tribal land is a DAC. | ||
""" | ||
tribal_percent = self.df[ | ||
field_names.PERCENT_OF_TRIBAL_AREA_IN_TRACT | ||
].fillna(0.0) | ||
return np.where( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not just...
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above --- where is more obvious to me than max here, since in my mind a bool in not a number. But I do like that line. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. totally valid! |
||
self.df[field_names.FINAL_SCORE_N_BOOLEAN], | ||
1.0, | ||
tribal_percent, | ||
) | ||
|
||
def add_columns(self) -> pd.DataFrame: | ||
logger.info("Adding Score Narhwal") | ||
self.df[field_names.THRESHOLD_COUNT] = 0 | ||
|
@@ -1031,10 +1058,15 @@ def add_columns(self) -> pd.DataFrame: | |
] | ||
self.df[field_names.CATEGORY_COUNT] = self.df[factors].sum(axis=1) | ||
self.df[field_names.SCORE_N_COMMUNITIES] = self.df[factors].any(axis=1) | ||
self._mark_tribal_dacs() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is nice and clean |
||
self.df[ | ||
field_names.SCORE_N_COMMUNITIES | ||
+ field_names.PERCENTILE_FIELD_SUFFIX | ||
] = self.df[field_names.SCORE_N_COMMUNITIES].astype(int) | ||
|
||
self._mark_donut_hole_tracts() | ||
self.df[ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. a quick style comment -- why not modify df in place here as per the mark functions? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is a question NOT a suggestion to be clear There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think because I noticed a pattern of marking tracts by mutating in place (e.g., with the donut hole calculation) but getting scores and things like that and assigning them (e.g., There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. that makes sense to me! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not gonna lie, it's a pretty weak signal, so it could go either way reasonably. |
||
field_names.PERCENT_OF_TRACT_IS_DAC | ||
] = self._get_percent_of_tract_that_is_dac() | ||
|
||
return self.df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make clear this is area-based?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emma-nechamkin I asked Beth in the ticket; it's good by me, but I leave the actual words to Beth and others