Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] remove HDFS support? #6436

Closed
Tracked by #6439
jameslamb opened this issue May 1, 2024 · 3 comments · Fixed by #6534
Closed
Tracked by #6439

[RFC] remove HDFS support? #6436

jameslamb opened this issue May 1, 2024 · 3 comments · Fixed by #6534
Labels

Comments

@jameslamb
Copy link
Collaborator

jameslamb commented May 1, 2024

Proposal

I'm requesting comment on the following proposal:

  • remove CMake option USE_HDFS
  • remove all docs and code related to HDFS

And doing all of these only after issuing build-time and run-time deprecation warnings for 1 release.

Summary

6+ years ago, #1243 was merged, adding "experimental" support for Hadoop Filesystem ("HDFS") to LightGBM.

Specifically, that meant adding the ability to provide an HDFS URI to Dataset creation methods that accept files, and have LightGBM read the file from HDFS.

Since then:

I don't believe SynapseML depends on the HDFS build of LightGBM (GitHub search). Hopefully @imatiach-msft or @mhamilton723 can confirm.

Searching the rest of GitHub (GitHub search), I just see other forks of LightGBM and two recipes from repackages:

Motivation

Simplifies the project's public interface (build system options, installation docs, etc.).

Simplifies file-reading C++ code, making it easier to understand and refactor in the future.

Here in 2024, users have other and better options for using LightGBM distributed training on data in HDFS, like:

@barracuda156
Copy link
Contributor

@jameslamb Thank you for referring. For Macports: Since we do not enable this option presently, removing it won’t break anything. So we are good.

@borchero
Copy link
Collaborator

borchero commented May 1, 2024

I'm all for this change, simplification is always good and if we neither have tests nor indication that nobody has used this (extensively), that's even more of an argument 👍

@nviets
Copy link

nviets commented May 3, 2024

Thanks for the heads up. will remove the arg from nixpkgs when updating for 4.4.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants