dtypes for the scoring function #174

Open
ryanma9629 opened this issue Jun 26, 2023 · 2 comments
@ryanma9629

When generating the Python scoring function in MM, the default dtypes are set to 'object', as below:
input_array = pd.DataFrame([[LOAN, MORTDUE, VALUE, REASON, JOB, YOJ, DEROG, DELINQ, CLAGE, NINQ, CLNO, DEBTINC]], columns=["LOAN", "MORTDUE", "VALUE", "REASON", "JOB", "YOJ", "DEROG", "DELINQ", "CLAGE", "NINQ", "CLNO", "DEBTINC"], dtype=object)
However, classifiers such as LightGBM don't accept object dtypes, so scoring a LightGBM model in MM can fail with:
ValueError: DataFrame.dtypes for data must be int, float or bool. Did not expect the data types in the following fields: LOAN, MORTDUE, VALUE, REASON, JOB, YOJ, DEROG, DELINQ, CLAGE, NINQ, CLNO, DEBTINC
I don't know whether it is safe to set all dtypes to float or None when generating the scoring function.
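To make the difference concrete, here is a minimal sketch comparing the generated `dtype=object` frame with one where pandas infers the types. The column subset and the values are made up for illustration, and this is not the code MM generates.

```python
import pandas as pd

# Hypothetical subset of the inputs from the issue; the values are invented.
columns = ["LOAN", "MORTDUE", "REASON", "YOJ"]
row = [[1100, 25860.0, "HomeImp", 10.5]]

# What the generated scoring code does today: every column becomes object.
as_object = pd.DataFrame(row, columns=columns, dtype=object)

# Omitting dtype lets pandas infer a type per column.
inferred = pd.DataFrame(row, columns=columns)

object_dtypes = as_object.dtypes.astype(str).to_dict()
inferred_dtypes = inferred.dtypes.astype(str).to_dict()
print(object_dtypes)
print(inferred_dtypes)
```

With inference, the numeric columns come out as integer or float dtypes. String columns such as REASON and JOB remain non-numeric either way, so they would probably still need encoding before a model that only accepts int, float, or bool columns can score them.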

@ryanma9629 ryanma9629 added the bug Something isn't working label Jun 26, 2023
@smlindauer smlindauer self-assigned this Jul 3, 2023
@smlindauer
Collaborator

@ryanma9629:
I was running into a deprecation error with pandas when setting all the values to float, but it may be better to just let pandas dictate the type. My worry was that MM can't accept numpy values, but we can check for that from the output of the prediction function.

I will run through some other model types to see how they handle not setting the dtype in the input_array.
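One way the check on the prediction output could look, as a rough sketch: the helper name `to_native` and the stand-in prediction array are hypothetical, and whether MM actually needs this conversion is the open question above.

```python
import numpy as np


def to_native(value):
    """Convert a NumPy scalar to the matching built-in Python type.

    Anything that is not a NumPy scalar is returned unchanged.
    """
    if isinstance(value, np.generic):
        return value.item()
    return value


# Stand-in for a model's predict output (hypothetical value).
prediction = np.array([0.87])
score = to_native(prediction[0])
print(type(score), score)
```

`np.generic` is the base class of all NumPy scalar types, so the same check covers integer, float, and boolean outputs.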

@ryanma9629
Author

Is that due to the deprecation of np.int and np.float since NumPy version 1.20?
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

