Fix SWIG methods that return char** #2850
@AlbertoEAF Sorry for the inconvenience, but can you please rebase to the latest
Hello @StrikerRUS, I'd very much like to, but I'm waiting for approval from my company to sign it and
@StrikerRUS done :) Regarding the code, please don't merge directly; give me feedback first on the comments that say "@Reviewer". Also,
Hello, I changed the API to be simpler to use and took the choice of string allocation size away from the user, so that an unrecoverable segmentation fault can no longer happen if the user allocates strings that are too small when asking for the feature names. I changed the 2 wrappers to be closer to what @imatiach-msft had initially for The new usage is much simpler now than in the first submission (nullptr is returned in case of any error). Example:
Could someone tell me how to include the swig/StringArray.hpp file and remove its copy-pasted code from swig/StringArray_API_extensions.i? Also, can someone review? All inputs are appreciated :) Thanks!
"Could someone tell me how to include the swig/StringArray.hpp file and remove its copy-pasted code from swig/StringArray_API_extensions.i?" I would think that in swig/lightgbmlib.i you would be able to include StringArray.hpp using just:
similar to how the export.h file is included in swig/lightgbmlib.i:
Please let me know if I misunderstood something!
Hello @guolinke, @imatiach-msft and all to whom this may concern,
Changelog
C API changes (safety)
New API usage
Since these functions are not in hot-code regions, in the Java SWIG wrappers I chose simply to pass a null pointer and 0 for both the number of strings and their size, only to learn the required sizes, then allocate memory accordingly and repeat the call. I could instead have allocated memory with an assumed size, checked it, and repeated the call with enough storage space only if needed.
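For readers calling a name-returning C function directly, here is a minimal C++ sketch of that two-pass pattern. GetNames and FetchAllNames are hypothetical stand-ins, not LightGBM code: GetNames takes a std::vector instead of a BoosterHandle, and while the len, buffer_len and out_strs roles follow the review discussion in this PR, the out_len and out_buffer_len outputs are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a name-returning C API call (not LightGBM code).
// Reports the total number of names in *out_len and the buffer size needed by
// the longest one (including '\0') in *out_buffer_len; copies at most `len`
// names, each truncated to fit in `buffer_len` bytes. Returns 0 on success.
int GetNames(const std::vector<std::string>& names, int len, int* out_len,
             std::size_t buffer_len, std::size_t* out_buffer_len,
             char** out_strs) {
  *out_len = static_cast<int>(names.size());
  *out_buffer_len = 0;
  for (int idx = 0; idx < *out_len; ++idx) {
    const std::string& name = names[idx];
    *out_buffer_len = std::max(*out_buffer_len, name.size() + 1);
    if (idx < len && buffer_len > 0) {
      const std::size_t cur_len = std::min(name.size(), buffer_len - 1);
      std::memcpy(out_strs[idx], name.c_str(), cur_len);
      out_strs[idx][cur_len] = '\0';
    }
  }
  return 0;
}

// Two-pass caller: first ask for the sizes (null pointer, zero sizes), then
// allocate exactly that much storage and repeat the call to fetch the names.
std::vector<std::string> FetchAllNames(const std::vector<std::string>& source) {
  int num_names = 0;
  std::size_t max_len = 0;
  if (GetNames(source, 0, &num_names, 0, &max_len, nullptr) != 0) return {};

  std::vector<std::vector<char>> storage(num_names, std::vector<char>(max_len));
  std::vector<char*> ptrs(num_names);
  for (int i = 0; i < num_names; ++i) ptrs[i] = storage[i].data();

  int out_len = 0;
  std::size_t out_buffer_len = 0;
  if (GetNames(source, num_names, &out_len, max_len, &out_buffer_len,
               ptrs.data()) != 0) {
    return {};
  }
  assert(out_len == num_names);  // same source, so the count must not change

  return std::vector<std::string>(ptrs.begin(), ptrs.end());
}
```

The first call costs an extra pass over the names, which is acceptable here precisely because these functions are not on a hot path.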
R and Python wrappers changes
If you prefer, a warning could be raised instead of an error, or the content could be sized properly without assumptions, as in the SWIG wrappers.
The R-specific changes look OK to me! I looked through the rest of the PR and didn't have any other comments, but I don't know enough about c_api or the SWIG stuff for my approval to count towards a merge on this.
Thanks for the contribution!
LGTM for Python part!
Hello @guolinke and @StrikerRUS :), do you know if anyone else needs to approve? I'm not sure how the procedure goes now. Thanks!
@AlbertoEAF
Please consider fixing some style and linting issues below:
Co-Authored-By: Nikita Titov <[email protected]>
OK @StrikerRUS :) Anything left? Thanks!
@AlbertoEAF Thank you very much! We really appreciate all your efforts!
Only one new cpplint issue has been raised since my previous review.
I'm approving this PR right now so it can be merged ASAP once that is resolved and CI is green.
@AlbertoEAF I tried to update to the latest code today, but I am getting a segfault when calling the new LGBM_BoosterGetEvalNamesSWIG API. Any idea what might be going wrong? I will take a deeper look this week.
This is the new Scala code I am using, but it doesn't seem to get past the LGBM_BoosterGetEvalNamesSWIG call:
Hello @imatiach-msft, just to be sure, what you did in #2958 fixes it, right? It makes sense to me that it would.
@AlbertoEAF yes, thanks!
- std::memcpy(out_strs[idx], name.c_str(), name.size() + 1);
+ if (idx < len) {
+   std::memcpy(out_strs[idx], name.c_str(), std::min(name.size() + 1, buffer_len));
+   out_strs[idx][buffer_len - 1] = '\0';
This can sometimes cause memory problems here; maybe it is better to write:
  auto cur_len = std::min(name.size(), buffer_len - 1);
  std::memcpy(out_strs[idx], name.c_str(), cur_len);
  out_strs[idx][cur_len] = '\0';
BTW, is the idx < len check needed?
ping @AlbertoEAF @StrikerRUS
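A self-contained sketch of the suggested copy, wrapped in a hypothetical helper CopyTruncated so it can be exercised on its own. The early return for buffer_len == 0 is an extra assumption of this sketch, not part of the suggestion: with size_t arithmetic, buffer_len - 1 would otherwise wrap around to a huge value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copies `name` into `dst` (a buffer of `buffer_len` bytes) as suggested in the
// review: copy at most buffer_len - 1 characters, then write one '\0' right
// after them, so the result is always terminated inside the buffer.
void CopyTruncated(char* dst, std::size_t buffer_len, const std::string& name) {
  assert(buffer_len == 0 || dst != nullptr);
  if (buffer_len == 0) return;  // assumption: no room even for '\0'
  const std::size_t cur_len = std::min(name.size(), buffer_len - 1);
  std::memcpy(dst, name.c_str(), cur_len);
  dst[cur_len] = '\0';
}
```

With a 5-byte buffer, "feature_name" is stored as "feat", while a short name such as "ab" is copied whole and terminated right after its last character.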
Hmm, maybe it is also a cause of #3398?..
I agree with the change, @guolinke: it does the same thing but is easier to read, and it sets the null byte only once when the string reaches the size limit.
Yes, the idx < len check ensures that if you have allocated a char** with space for only len string pointers, we don't write outside it, even if internally we have more strings than that. We stop copying before writing outside the allocated memory, to avoid segmentation faults.
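A minimal sketch of how the idx < len guard and the truncating copy fit together in one loop. CopyNames and its out_len parameter are hypothetical; the point it illustrates is that the loop may see more names than the caller has room for, yet only ever writes through the first len pointers, while still reporting the full count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical copy loop: *out_len always receives the total number of names,
// but only the first `len` entries of out_strs are written to, and each
// string is truncated to fit `buffer_len` bytes including '\0'.
void CopyNames(const std::vector<std::string>& names, int len, int* out_len,
               std::size_t buffer_len, char** out_strs) {
  *out_len = static_cast<int>(names.size());
  for (int idx = 0; idx < *out_len; ++idx) {
    if (idx < len && buffer_len > 0) {  // never index past the caller's array
      const std::string& name = names[idx];
      const std::size_t cur_len = std::min(name.size(), buffer_len - 1);
      std::memcpy(out_strs[idx], name.c_str(), cur_len);
      out_strs[idx][cur_len] = '\0';
    }
  }
}
```

Because *out_len reports the full count, a caller that passed too small a len can detect the shortfall and call again with a larger array.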
P.S.: In other words, segfaults/memory write violations can only occur if the function receives wrong len and buffer_len arguments from the caller, i.e., if the caller actually pre-allocated fewer strings in char **out_strs than the len they passed, or shorter strings than the buffer_len they passed.
The main motivation for this pull request is outlined in #2814.
After discussing with @imatiach-msft and the SWIG developers (swig/swig#1740), the solution we settled on, aiming at simplicity and correctness, was to add a StringArray to wrap any char** parameters so their memory can be managed. This relies on the already existing various.i extensions to return a String[] through a single call.
@imatiach-msft also requested that LGBM_BoosterGetEvalNamesSWIG be refactored to use the same type of API. This meant changing that method's signature, and I did so by following the LGBM_* convention of returning the error code as an int (0 on success, -1 on failure).
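A rough C++ sketch of what a StringArray-style helper could look like: it owns a fixed number of fixed-length, zero-initialized buffers and exposes them as the char** that C functions expect. The class below and its members are illustrative assumptions, not the class added in this PR; here data() returns the raw char**, and the SWIG/Java layer that turns the result into a String[] is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative StringArray-like helper (an assumption, not the PR's class).
// Owns `num_elements` zero-filled buffers of `string_size` bytes each and
// exposes them as a char** that a C function can fill in place.
class StringArray {
 public:
  StringArray(std::size_t num_elements, std::size_t string_size)
      : string_size_(string_size),
        buffers_(num_elements, std::vector<char>(string_size, '\0')),
        ptrs_(num_elements, nullptr) {
    assert(string_size > 0);  // each slot needs room for at least '\0'
    for (std::size_t i = 0; i < num_elements; ++i) {
      ptrs_[i] = buffers_[i].data();
    }
  }

  // ptrs_ points into buffers_, so a copy would alias the source's memory.
  StringArray(const StringArray&) = delete;
  StringArray& operator=(const StringArray&) = delete;
  StringArray(StringArray&&) = default;
  StringArray& operator=(StringArray&&) = default;

  char** data() { return ptrs_.data(); }  // raw view for C APIs
  std::size_t num_elements() const { return ptrs_.size(); }
  std::size_t string_size() const { return string_size_; }

  // Copies every slot (assumed null-terminated) into an owned std::string.
  std::vector<std::string> to_vector() const {
    std::vector<std::string> out;
    out.reserve(ptrs_.size());
    for (const char* p : ptrs_) out.emplace_back(p);
    return out;
  }

 private:
  std::size_t string_size_;
  std::vector<std::vector<char>> buffers_;
  std::vector<char*> ptrs_;
};
```

Keeping ownership inside one object means the wrapper can allocate, hand data() to the C call, and free everything in one place, instead of asking the caller to manage each char* separately.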
I'm very interested in getting feedback from the team, especially @imatiach-msft, with whom I've been discussing this topic. If possible, I'd like to know your thoughts on these questions first: #2814.
I also believe this should be improved upon before merging. I left some "@Reviewer" notes throughout the code. I'd appreciate it if you could read those and discuss them with me, as I believe those points should be addressed to bring the code style more in line with the rest of the repo and to streamline the code.
Also, can we find a way to add the docstrings in the interface files to the documentation?
You can now get hold of a String[] by just calling swigStringArrayHandle.data(). Working example:
Thank you!
By the way, here is the changelog:
- Add StringArray class to manage and manipulate arrays of fixed-length strings. This class is now used to wrap any char** parameters, manage memory and manipulate the strings.
- Fix SWIG LGBM_BoosterGetFeatureNames call that would result in a segfault:
  - Added wrapper LGBM_BoosterGetFeatureNamesSWIG (the method it wraps didn't work).
  - For consistency, LGBM_BoosterGetEvalNames was wrapped as well.
- [breaking API change] Refactored wrapper LGBM_BoosterGetEvalNames to follow the LGBM_* convention of returning 0/-1 on success/failure.
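A small sketch of the 0/-1 convention named in the last changelog entry: the wrapper catches any C++ exception, stores its message, and returns -1, so that no exception crosses the C/SWIG boundary. GetEvalNamesWrapper, LastError and the global error string are hypothetical names for illustration only, not LightGBM's actual error-handling code.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical storage for the last error message (an assumption of this sketch).
static std::string g_last_error;

const char* LastError() { return g_last_error.c_str(); }

// Returns 0 on success and -1 on failure, never letting an exception escape.
int GetEvalNamesWrapper(const std::vector<std::string>* eval_names,
                        std::vector<std::string>* out) {
  try {
    if (eval_names == nullptr || out == nullptr) {
      throw std::invalid_argument("null argument");
    }
    *out = *eval_names;  // a real wrapper would query the booster here
    return 0;
  } catch (const std::exception& ex) {
    g_last_error = ex.what();
    return -1;
  }
}
```

Returning a plain int keeps the wrapper callable from any SWIG target language, and the caller can fetch the message only when the return code is -1.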