Adds test for collinearity to the econometrics menu #5018

Merged
merged 16 commits into from
May 25, 2023

northern-64bit
Contributor

Description

Adds the vif command, which tests for collinearity by calculating the variance inflation factor. It also rearranges the commands in the menu by adding an Assumption Testing section.

How has this been tested?

By using it in the SDK and in the terminal.

Testing it on data with known collinearity. Use for instance the following test case:

df = pd.DataFrame(
    {'a': [1, 1, 2, 3, 4],
     'b': [2, 2, 3, 2, 1],
     'c': [4, 6, 7, 8, 9],
     'd': [4, 3, 4, 5, 4]}
)

The expected result, which matches the actual output, is:

   Values
a   22.95
b    3.00
c   12.95
d    3.00
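
For reference, these values can be reproduced independently of the new command. The following is a minimal sketch, not the PR's implementation. It assumes each auxiliary regression includes an intercept, in which case the VIF of a column equals the corresponding diagonal entry of the inverse correlation matrix. The helper name `vif_table` is hypothetical.

```python
import numpy as np
import pandas as pd


def vif_table(data: pd.DataFrame) -> pd.Series:
    """Return the VIF of each column (hypothetical helper, not the PR's code).

    Assumes an intercept in each auxiliary regression, so that
    VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal entry of the
    inverse of the correlation matrix.
    """
    corr = np.corrcoef(data.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=data.columns, name="Values")


df = pd.DataFrame(
    {"a": [1, 1, 2, 3, 4],
     "b": [2, 2, 3, 2, 1],
     "c": [4, 6, 7, 8, 9],
     "d": [4, 3, 4, 5, 4]}
)

vif = vif_table(df)
print(vif.round(2))
```

Inverting the correlation matrix fails when columns are perfectly collinear, so a real implementation would likely need to handle that case separately.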

  • Make sure affected commands still run in terminal
  • Ensure the SDK still works
  • Check any related reports

Checklist:

Others

  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.

@reviewpad reviewpad bot added the feat S Small T-Shirt size Feature label May 13, 2023
@JerBouma
Contributor

Calling vif without any arguments returns an error.

(🦋) /econometrics/ $ vif

Error: 'NoneType' object is not iterable


(🦋) /econometrics/ $

Autocomplete is missing for the columns; it shouldn't only be offered when I type vif.

(🦋) /econometrics/ $ vif --columns

usage: vif [-c COLUMNS] [-h] [--export EXPORT] [--sheet-name SHEET_NAME [SHEET_NAME ...]]
vif: error: argument -c/--columns: expected one argument

(🦋) /econometrics/ $ vif --columns

More importantly, I think the default should be to show all columns. The purpose of VIF is to see whether the other variables can estimate the value of a given variable. If you include only a subset, the results will differ, and in most cases you would want to use your entire dataset. With --columns you can then specify which columns you do and don't want, if you wish to change this.
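
The suggested default can be sketched as follows. This is an illustration only, not the code in this PR: `resolve_columns` is a hypothetical helper, and it assumes the parsed `--columns` value is `None` when the flag is omitted, which would also explain the `'NoneType' object is not iterable` error above.

```python
from typing import Iterable, List, Optional


def resolve_columns(
    available: Iterable[str], requested: Optional[Iterable[str]] = None
) -> List[str]:
    """Pick the columns to test (hypothetical helper, not the PR's code)."""
    available = list(available)
    if requested is None:
        # No --columns given: fall back to every column instead of iterating None.
        return available
    requested = list(requested)
    unknown = [col for col in requested if col not in available]
    if unknown:
        raise ValueError(f"Unknown columns: {unknown}")
    return requested


all_cols = resolve_columns(["a", "b", "c", "d"])
subset = resolve_columns(["a", "b", "c", "d"], ["a", "c"])
print(all_cols, subset)
```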

image

This result (from load --file wage_panel -a wp) indicates that most variables are not highly collinear (VIF close to 1). However, experience and experience squared do seem to be somewhat explained by other variables. Looking into the dataset, this makes perfect sense: it is panel data, meaning it tracks multiple individuals over time, and it has a year column. When year goes up by 1, experience goes up by 1 too, so experience could have been inferred from the year column.
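
The effect described above can be illustrated with a small synthetic panel. This is a sketch on made-up data, not the wage_panel dataset: the starting years are hypothetical, and the VIF is computed from the inverse correlation matrix, which assumes an intercept in each auxiliary regression.

```python
import numpy as np
import pandas as pd

years = np.arange(1980, 1988)      # eight yearly observations per individual
start_years = [1975, 1977, 1979]   # hypothetical year each individual started working

# Experience rises by exactly 1 whenever year rises by 1, within each individual.
panel = pd.DataFrame(
    [(year, year - start) for start in start_years for year in years],
    columns=["year", "experience"],
)

corr = np.corrcoef(panel.to_numpy(dtype=float), rowvar=False)
vif = pd.Series(np.diag(np.linalg.inv(corr)), index=panel.columns)
print(vif.round(2))
```

With only two columns, both get the same VIF, 1 / (1 - r^2), where r is their correlation. It is well above 1 here (about 2.97) because year explains part of experience, but not all of it since individuals start in different years.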

@jmaslek
Collaborator

jmaslek commented May 23, 2023

There is an issue with error handling when only one dataset is given as input:

2023 May 23, 18:42 (🐛) /econometrics/ $ vif k0.high

['k0.high']

@jmaslek
Collaborator

jmaslek commented May 23, 2023

Looks like a rogue print:

2023 May 23, 18:44 (🐛) /econometrics/ $ vif -d k0.high,k0.low

['k0.high', 'k0.low']

Collaborator

@jmaslek jmaslek left a comment

Nice! Thanks for making all the changes!

@jmaslek jmaslek added this pull request to the merge queue May 25, 2023
Merged via the queue into OpenBB-finance:develop with commit 8e21b34 May 25, 2023
@northern-64bit northern-64bit deleted the feature/collinearity branch May 29, 2023 18:40