
SOLR-17540: Remove Hadoop Auth Module #2835

Merged · 47 commits · Dec 9, 2024
Conversation

@epugh (Contributor) commented Nov 2, 2024

https://issues.apache.org/jira/browse/SOLR-17540

Description

Remove Hadoop Auth

Solution

no more Hadoop Auth

Tests

Just removing things

Tasks

  • Look at solr-tests.policy
  • Do we still need the useShortName feature? It may only have been supported by hadoop-auth.
  • Remove licenses
  • Update versions.lock
  • Look at javax.security.auth.kerberos in the package-list file in the docs render dir
  • Is the Kerberos support in the Solr clients part of hadoop-auth, or is it there to work with other setups?

@github-actions bot added the documentation and scripts labels Nov 2, 2024
@github-actions bot added the dependencies label Nov 2, 2024
@epugh (Contributor, Author) commented Nov 2, 2024

Kerb stuff appears to still work! All tests ran.

@janhoy (Contributor) left a review comment


Fantastic to see all those files go, and all those external deps removed! Just a few comments...

Review threads: dev-tools/scripts/refguide/htaccess.txt (outdated, resolved); solr/webapp/web/partials/login.html (resolved)
@risdenk (Contributor) commented Nov 5, 2024

There is some more hadoop auth cleanup in the security.policy:

* https://github.com/apache/solr/blob/main/solr/server/etc/security.policy#L134

* https://github.com/apache/solr/blob/main/gradle/testing/randomization/policies/solr-tests.policy#L103

@epugh (Contributor, Author) commented Nov 5, 2024

> There is some more hadoop auth cleanup in the security.policy:
>
> * https://github.com/apache/solr/blob/main/solr/server/etc/security.policy#L134
> * https://github.com/apache/solr/blob/main/gradle/testing/randomization/policies/solr-tests.policy#L103

Thanks for that! I hope I got them all out...

@epugh (Contributor, Author) commented Nov 6, 2024

@janhoy I tried running that script, but couldn't quite grok it. Could you give an example of how that script should be run? And let's add an example to the README or to the script itself!

@janhoy (Contributor) commented Nov 6, 2024

> @janhoy I tried running that script, but couldn't quite grok it. Could you give an example of how that script should be run? And let's add an example to the README or to the script itself!

It was some time ago. The script was made to make sure we had redirects for the old refguide structure, and we also added in page removals. The gist is to maintain the CSV file with metadata of all the changes, and edit the Python script to output the correct htaccess.

We likely need to make another CSV section for pages removed in the 10.0 guide, and then generate the correct redirects.

I don't remember where or how the generated htaccess file is checked in, though. And I know Antora has some built-in support for generating htaccess as well; we should look into it.

To be pragmatic and unblock this, I'd make a Blocker JIRA for 10.0 and maintain a list of the removed pages there, so someone can update htaccess in the proper way in due time.
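
For illustration, here is a minimal sketch of the CSV-to-htaccess generation being described, written in Java rather than the actual Python dev-tools script; the file name, CSV layout, and /guide/ prefix are assumptions for the example, not the project's real conventions:

  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;

  public class HtaccessSketch {
    public static void main(String[] args) throws IOException {
      // Hypothetical CSV: one "old-page,new-page" row per moved page,
      // with an empty second column for a removed page.
      for (String row : Files.readAllLines(Path.of("refguide-changes.csv"))) {
        String[] cols = row.split(",", 2);
        if (cols.length == 2 && !cols[1].isBlank()) {
          // Moved page: emit a permanent redirect to the new location.
          System.out.printf("Redirect 301 /guide/%s /guide/%s%n", cols[0].trim(), cols[1].trim());
        } else {
          // Removed page: answer 410 Gone so clients and crawlers drop it.
          System.out.printf("Redirect gone /guide/%s%n", cols[0].trim());
        }
      }
    }
  }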

@gus-asf (Contributor) commented Nov 22, 2024

Would someone be willing to look at what is going on in SolrDispatchFilter? Around the lengthy comment there, it appears that MAYBE there is an opportunity for some refactoring to simplify the flow... Especially since we specifically mention Hadoop Auth as the reason for the extra complexity. I do not understand this bit and would love another set of eyes... I could also see a path to updating the very lengthy comments to say "This has complexity due partially to hadoop auth, and now that it is gone there may be an opportunity for improvement"...

I wrote that comment to memorialize several hours of digging I did back when I moved startup to a context listener. One of the things I found perplexing about SolrDispatchFilter when I first tried to understand it for that task was the lack of a call to doFilter(req,resp,filterchain) ... note that our custom version with the boolean retry doesn't count, because it doesn't make the normal call to the method specified by javax.servlet.Filter. Normally filter implementations look like:

  public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
      throws IOException, ServletException {

    // Do stuff that seems important here.

    if (importantStuffSeemsHappy) {
      chain.doFilter(request, response); // FilterChain.doFilter takes two args
    } else {
      // do unhappy error type stuff. (maybe/maybe not doFilter anyway)
    }
    // add try/finally if there is mandatory cleanup.
  }

So it was very weird not to find a call to doFilter in the doFilter method, nor in our custom version of it. EVENTUALLY I figured out that that call is made either in the dispatch method, OR in our auth filter (I haven't tried to prove it can't get called twice, but with just SolrDispatchFilter in play that is not currently going to cause a problem, since chain.doFilter is a no-op for the final filter). One of the long-term goals I have is to start pulling stuff that we are doing in this monster filter out into a series of filters, which will make the individual missions easier to understand and put the cleanup code near the instantiation code where, again, it would be much easier to understand (and nesting can be easily seen to be correct).
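
To make the hidden call concrete, here is a hedged sketch of the shape being described (simplified, with illustrative method names, not the actual SolrDispatchFilter code):

  // Inside the filter's doFilter(...): no direct chain.doFilter(...) call appears here.
  // Either the auth plugin invokes the chain itself, or dispatch() does it later.
  if (authenticateRequest(request, response, chain)) { // may call chain.doFilter(...) internally
    dispatch(request, response, chain);                // otherwise the call happens in here
  }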

My impulse (not yet informed by actual attempts) is to rework our auth plugins to be auth filters. The other thing I'm pointing out in that comment is that the HadoopAuthFilter is what seems to stand in the way of writing an if block such as:

  if (authPlugin.authenticate(request, response)) { // <<< note the lack of a filterchain arg

    // do searchy stuff here

    chain.doFilter(request, response);
  } else {
    // do unhappy 401 error type stuff.
  }
  // add try/finally for mandatory cleanups.

That is of course the first step to breaking auth out into its own filter, where it becomes:

  public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
      throws IOException, ServletException {
    if (authPlugin.authenticate(request, response)) {
      chain.doFilter(request, response); // <<< dispatch filter is later in the chain.
    } else {
      // do unhappy 401 error type stuff.
    }
    // add try/finally for mandatory cleanups.
  }

The particular issue with the hadoop auth plugin that complicates the transition is that chain.doFilter() comes before a switch statement and other code:

  authFilter.doFilter(request, response, filterChain);

At least at the time of that comment, it seemed that all the other plugins called chain.doFilter() at the end (or possibly in a shortcut followed by an immediate return statement). Only Hadoop auth seemed to have mandatory actions AFTER doFilter(). If it disappears, we can possibly remove the filterchain argument and make simpler use of the return value from authenticate().
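
A hedged sketch of that complication (illustrative only, not the removed HadoopAuthFilter source): the mandatory work sits after the chain call, which a boolean-returning authenticate() cannot express, since the caller would have nowhere to run the trailing step:

  public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
      throws IOException, ServletException {
    chain.doFilter(request, response); // the rest of the chain runs FIRST...
    postChainBookkeeping(request);     // ...and mandatory work (the switch statement
                                       // and other code) happens AFTER it.
  }

  private void postChainBookkeeping(ServletRequest request) {
    // hypothetical stand-in for the post-chain switch statement and cleanup
  }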

@epugh (Contributor, Author) commented Nov 23, 2024

I am not going to touch SolrDispatchFilter, as I don't have a good game plan to move forward with it! Everything else is green.

For CHANGES.txt: "Remove Kerberos authentication support from Solr. This in turn removes the Hadoop Auth module." <-- @dsmiley ???

@dsmiley (Contributor) commented Nov 23, 2024

Cause and effect are inverted. I suggest:

Removed the Hadoop Auth module, and thus Kerberos authentication and other exotic options.

@risdenk (Contributor) commented Dec 1, 2024

I think there might be a few more places to clean up, based on running the following on your branch:

  git grep -nFi kerber | grep -Fv -e 'solr/modules/hdfs' -e 'solr-on-hdfs.adoc' -e 'solr/CHANGES.txt' -e 'solr/benchmark/src/resources/words.txt'

Specifically these findings:

  solr/bin/solr.in.sh:296:# Solr internally doesn't use cookies other than for modules such as Kerberos/Hadoop Auth. If you don't need any of those
  solr/core/src/java/org/apache/solr/cli/AuthTool.java:124:        + "  bin/solr auth enable --type kerberos --config \\\"<kerberos configs>\\\" [--update-include-file-only <true|false>] [-v]\n"
  solr/core/src/java/org/apache/solr/core/CoreContainer.java:602:      // this caused plugins like KerberosPlugin to register its intercepts, but this intercept
  solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java:328:        // obviously don't care. Kerberos plugins seem to mostly use it to satisfy the api of a
  solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java:351:    // if it should short circuit, e.g. the Kerberos Authentication Filter will send an error and
  solr/webapp/web/js/angular/controllers/index.js:27:        // Needed for Kerberos, since this is the only place from where
  solr/webapp/web/js/angular/controllers/index.js:28:        // Kerberos username can be obtained.

It would be awesome to be able to clean up the security policy files, but I know there is some overlap with the Hadoop HDFS tests too.

@risdenk (Contributor) commented Dec 1, 2024

Some added context about delegation tokens: these were a Hadoop construct at one point, and they were expanded elsewhere to avoid hitting the KDC (the Kerberos server) too much. The delegation token was used in place of the KDC after the initial authentication happened; basically, it was a secure token passed around instead of doing the whole roundtrip to the KDC for each call. There are some other things the delegation token can do as well (impersonation, if needed).

As David said, the Hadoop authentication framework is not just Kerberos; it is a whole framework for authentication. It's similar to how Hadoop filesystem support isn't just HDFS but also S3 and some other backends.

Jetty does have Kerberos/SPNEGO support if we want to go down that route later. The Hadoop implementation for Kerberos support was better than most other Java support out there, since historically there have not been many Kerberos implementations for Java, and there were lots of bugs across implementations (Active Directory vs Kerby vs others).

I do think it's time to remove this module and make it fully opt-in (via a plugin or separately supported module). I haven't had time to keep up with the Hadoop side of this development and don't use it anymore.

As Gus pointed out, there are some interesting hooks to make the Hadoop auth client stuff work, so cleaning all of that up is worth it, as is removing a module that isn't used that widely.

@epugh (Contributor, Author) commented Dec 1, 2024

@risdenk I see in solr.in.sh the reference to -Dsolr.http.disableCookies=true, and that it highlights the use of cookies by either Hadoop auth or maybe a load balancer. Do you think I should remove that capability, or just remove the text referencing Hadoop auth? I worry that I'm going to pull on a thread and break the HTTP clients... I could just edit it. If you have a sense of what the cookie change should be, please do push to the PR or add a patch and I'll add it. I'm thinking of just changing the text and leaving the rest of the cookie stuff, and maybe adding another follow-on JIRA...

@epugh (Contributor, Author) commented Dec 2, 2024

Okay, I've responded (I think!) to @risdenk's comments. I think this is ready for merging????

@iamsanjay mentioned this pull request Dec 9, 2024
@epugh merged commit cf68a7f into apache:main Dec 9, 2024
5 of 6 checks passed