@afiodorov thanks for taking a look at the code and for your feedback.
You are right: with the constant-zero and median strategies, the loop is redundant. I plan to separate the direct perturbation code from the overall method so that each run is independent. I am testing a local branch that handles this at the moment and will push a fix up later this week.
On the purpose of direct perturbation: this is also a good question, and you are right that it is not explained in the thesis. We are working on documentation that fully explains the overall method.
For now, here is the justification for including direct perturbation. Suppose you have a function f(x_1, x_2, x_3). At a high level, what fairml does is give you the dependence of f on each of the x_i, and that dependence is calculated as direct influence + indirect influence. For the direct influence, we transform the data using one of the direct perturbation strategies and then look at the impact of that perturbation on the black-box function's output. For the indirect influence, we generate the transformations using an orthogonal transformation instead.
Certainly, we could just use the orthogonal transformation on all variables, including the subject column, but we wanted to give people the flexibility to pick whatever function they are interested in using for this task. Hope this helps explain the direct perturbation strategy requirement.
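To make this concrete, here is a minimal numpy sketch of what a direct perturbation strategy does (illustrative only; the function name, the `model` callable, and the strategy labels are assumptions for exposition, not fairml's actual API):

```python
import numpy as np

def direct_influence(model, X, col, strategy="median"):
    """Apply a direct perturbation strategy to column `col` and measure
    how much the black-box output moves on average."""
    X_ptb = X.copy()
    if strategy == "constant-zero":
        X_ptb[:, col] = 0.0                    # zero the column out entirely
    elif strategy == "median":
        X_ptb[:, col] = np.median(X[:, col])   # flatten the column to its median
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return np.mean(np.abs(model(X) - model(X_ptb)))

# toy usage with a linear "black box"
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
model = lambda X: X @ np.array([2.0, -1.0, 0.5])
print(direct_influence(model, X, col=0))
```

The indirect influence is computed analogously, except that the transformation applied to the data is an orthogonal projection of the other columns rather than a direct replacement of the subject column.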
Nice method.
I am examining the code and the thesis more closely, as it appears to be very useful. I don't fully understand the point of the perturbation strategy, and it's not fully expanded on in the thesis.
I started reading the code and I spotted some bugs.
Firstly
https://github.com/adebayoj/fairml/blob/master/fairml/orthogonal_projection.py#L120
takes the strategy but ignores it, see:
https://github.com/adebayoj/fairml/blob/master/fairml/orthogonal_projection.py#L217
Also, I think that with the `constant_zero` and `median` perturbation strategies this loop is redundant:
https://github.com/adebayoj/fairml/blob/master/fairml/orthogonal_projection.py#L205
Since each run ignores `random_sample_selected` anyway, every run should produce the same `output_difference_col` and `total_difference` (because `data_col_ptb` and `total_ptb_data` are identical on each run).
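To illustrate the point (a toy sketch with made-up data, not your actual code): with a deterministic strategy such as the median, the perturbed column never changes across runs, so the loop adds nothing:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
col = 0

results = []
for run in range(5):
    # a random sample is selected here in the real loop, but never used
    _random_sample_selected = rng.choice(X.shape[0], size=50, replace=False)
    # deterministic median strategy: the perturbed column is the same every run
    data_col_ptb = np.full(X.shape[0], np.median(X[:, col]))
    results.append(data_col_ptb)

# all five runs produced an identical perturbed column,
# so averaging differences across runs changes nothing
assert all(np.array_equal(results[0], r) for r in results)
```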
Finally, it would be great if the documentation explained the purpose of `direct_input_pertubation_strategy` in more detail. Is it necessary at all to "zero out" a column? Why? It appears to me that just by orthogonalising the other columns you already take away the effect of the subject column (see the sketch below for the projection I have in mind), so it is not clear to me why zeroing out is required on top of that. Is it to be certain the effect of the column is not present?
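For concreteness, this is the kind of orthogonalisation I have in mind (a toy sketch, not your actual code; the function name is mine):

```python
import numpy as np

def orthogonalise_others(X, col):
    """Subtract from every other column its projection onto column `col`,
    leaving those columns with no linear trace of the subject column."""
    v = X[:, col]
    X_out = X.astype(float)
    for j in range(X.shape[1]):
        if j != col:
            X_out[:, j] -= (X[:, j] @ v) / (v @ v) * v
    return X_out
```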
Many thanks for the code by the way!