Added support for per-request authentication to Jsoup.connect (#2046)
Added support for per-request authentication

Uses multi-release JAR support so that on Java versions that support it (9+), an authenticator is set per connection via `java.net.HttpURLConnection.setAuthenticator()`.

On Java 8, we set the system-wide default authenticator, and use ThreadLocals to enable per-request authenticators.

Also adds tests for HTTP and HTTPS server and proxy basic authentication.
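
As a rough illustration of the Java 9+ path described above, the sketch below attaches an `Authenticator` to a single `HttpURLConnection` instead of installing a process-wide default. The class name `PerRequestAuthSketch`, the `open` helper, the URL, and the credentials are hypothetical and are not part of this commit; the only platform call taken from the commit message is `HttpURLConnection.setAuthenticator(Authenticator)`, which requires Java 9 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.PasswordAuthentication;
import java.net.URI;

// Hypothetical sketch, not the jsoup implementation.
final class PerRequestAuthSketch {
    /** Opens (without connecting) a connection that carries its own authenticator. */
    static HttpURLConnection open(String url, String user, char[] password) {
        try {
            HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Scoped to this connection only; no system-wide default is touched.
            con.setAuthenticator(new Authenticator() {
                @Override protected PasswordAuthentication getPasswordAuthentication() {
                    return new PasswordAuthentication(user, password);
                }
            });
            return con;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection con = open("http://example.com/", "username", "password".toCharArray());
        System.out.println("Authenticator attached to " + con.getURL());
    }
}
```

Because the authenticator lives on the connection object, two concurrent requests with different credentials do not interfere with each other, which is the property the Java 8 fallback has to emulate with ThreadLocals.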
jhy authored Nov 9, 2023
1 parent 7d46675 commit 1123dd2
Showing 14 changed files with 708 additions and 13 deletions.
3 changes: 3 additions & 0 deletions CHANGES
@@ -1,6 +1,9 @@
jsoup changelog

Release 1.17.1 [PENDING]
* Improvement: in Jsoup.connect(), added support for request-level authentication, supporting authentication to
proxies and to servers.

* Improvement: in the Elements list, added direct support for `#set(index, element)`, `#remove(index)`,
`#remove(object)`, `#clear()`, `#removeAll(collection)`, `#retainAll(collection)`, `#removeIf(filter)`,
`#replaceAll(operator)`. These methods update the original DOM, as well as the Elements list.
1 change: 1 addition & 0 deletions pom.xml
@@ -94,6 +94,7 @@
<ignore>java.io.UncheckedIOException</ignore>
<ignore>java.util.function.Predicate</ignore>
<ignore>java.util.function.UnaryOperator</ignore>
<ignore>java.net.HttpURLConnection</ignore><!-- .setAuthenticator(java.net.Authenticator) in Java 9; only used in multirelease 9+ version -->
</ignores>
<!-- ^ Provided by https://developer.android.com/studio/write/java8-support#library-desugaring
Possibly OK to remove androidscents; keep for now to validate other additions are supported. -->
103 changes: 103 additions & 0 deletions src/main/java/org/jsoup/Connection.java
@@ -1,5 +1,6 @@
package org.jsoup;

import org.jsoup.helper.RequestAuthenticator;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

@@ -9,6 +10,7 @@
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.Authenticator;
import java.net.CookieStore;
import java.net.Proxy;
import java.net.URL;
@@ -69,6 +71,28 @@ public final boolean hasBody() {
*/
Connection newRequest();

/**
Creates a new request, using this Connection as the session-state and to initialize the connection settings (which
may then be independently changed on the returned {@link Connection.Request} object).
@return a new Connection object, with a shared Cookie Store and initialized settings from this Connection and Request
@param url URL for the new request
@since 1.17.1
*/
default Connection newRequest(String url) {
return newRequest().url(url);
}

/**
Creates a new request, using this Connection as the session-state and to initialize the connection settings (which
may then be independently changed on the returned {@link Connection.Request} object).
@return a new Connection object, with a shared Cookie Store and initialized settings from this Connection and Request
@param url URL for the new request
@since 1.17.1
*/
default Connection newRequest(URL url) {
return newRequest().url(url);
}

/**
* Set the request URL to fetch. The protocol must be HTTP or HTTPS.
* @param url URL to connect to
@@ -322,6 +346,64 @@ <p>For GET requests, data parameters will be sent on the request query string. F
*/
Connection postDataCharset(String charset);

/**
Set the authenticator to use for this connection, enabling requests to URLs, and via proxies, that require
authentication credentials.
<p>The authentication scheme used is automatically detected during the request execution.
Supported schemes (subject to the platform) are {@code basic}, {@code digest}, {@code NTLM},
and {@code Kerberos}.</p>
<p>To use, supply a {@link RequestAuthenticator} function that:
<ol>
<li>validates the URL that is requesting authentication, and</li>
<li>returns the appropriate credentials (username and password)</li>
</ol>
</p>
<p>For example, to authenticate both to a proxy and a downstream web server:
<code><pre>
Connection session = Jsoup.newSession()
.proxy("proxy.example.com", 8080)
.auth(auth -> {
if (auth.isServer()) { // provide credentials for the request url
Validate.isTrue(auth.url().getHost().equals("example.com"));
// check that we're sending credentials where we expect, and not redirected out
return auth.credentials("username", "password");
} else { // auth.isProxy()
return auth.credentials("proxy-user", "proxy-password");
}
});
Connection.Response response = session.newRequest("https://example.com/adminzone/").execute();
</pre></code>
</p>
<p>The system may cache the authentication and use it for subsequent requests to the same resource.</p>
<p><b>Implementation notes</b></p>
<p>For compatibility, on a Java 8 platform, authentication is set up via the system-wide default
{@link java.net.Authenticator#setDefault(Authenticator)} method via a ThreadLocal delegator. Whilst the
authenticator used is request specific and thread-safe, if you have other calls to {@code setDefault}, they will be
incompatible with this implementation.</p>
<p>On Java 9 and above, the preceding note does not apply; authenticators are directly set on the request. </p>
<p>If you are attempting to authenticate to a proxy that uses the {@code basic} scheme and will be fetching HTTPS
URLs, you need to configure your Java platform to enable that, by setting the
{@code jdk.http.auth.tunneling.disabledSchemes} system property to {@code ""}.
This must be executed prior to any authorization attempts. E.g.:
<code><pre>
static {
System.setProperty("jdk.http.auth.tunneling.disabledSchemes", "");
// removes Basic, which is otherwise excluded from auth for CONNECT tunnels
}</pre></code>
</p>
* @param authenticator the authenticator to use in this connection
* @return this Connection, for chaining
* @since 1.17.1
*/
default Connection auth(@Nullable RequestAuthenticator authenticator) {
throw new UnsupportedOperationException();
}

/**
* Execute the request as a GET, and parse the result.
* @return parsed Document
@@ -699,6 +781,27 @@ interface Request extends Base<Request> {
*/
String postDataCharset();

/**
Set the authenticator to use for this request.
See {@link Connection#auth(RequestAuthenticator) Connection.auth(authenticator)} for examples and
implementation notes.
* @param authenticator the authenticator
* @return this Request, for chaining.
* @since 1.17.1
*/
default Request auth(@Nullable RequestAuthenticator authenticator) {
throw new UnsupportedOperationException();
}

/**
Get the RequestAuthenticator, if any, that will be used on this request.
* @return the RequestAuthenticator, or {@code null} if not set
* @since 1.17.1
*/
@Nullable
default RequestAuthenticator auth() {
throw new UnsupportedOperationException();
}
}

/**
90 changes: 90 additions & 0 deletions src/main/java/org/jsoup/helper/AuthenticationHandler.java
@@ -0,0 +1,90 @@
package org.jsoup.helper;

import javax.annotation.Nullable;
import java.lang.reflect.Constructor;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.PasswordAuthentication;

/**
Handles per-request Authenticator-based authentication. Loads the class `org.jsoup.helper.RequestAuthHandler` if
per-request Authenticators are supported (Java 9+), or installs a system-wide Authenticator that delegates to a request
ThreadLocal.
*/
class AuthenticationHandler extends Authenticator {
static final int MaxAttempts = 5; // max authentication attempts per request. Allows for multiple auths (e.g. proxy and server) in one request, but avoids the 20 requests that would otherwise be made if the credentials are incorrect.
static AuthShim handler;

static {
try {
//noinspection unchecked
Class<AuthShim> perRequestClass = (Class<AuthShim>) Class.forName("org.jsoup.helper.RequestAuthHandler");
Constructor<AuthShim> constructor = perRequestClass.getConstructor();
handler = constructor.newInstance();
} catch (ClassNotFoundException e) {
handler = new GlobalHandler();
} catch (Exception e) {
throw new IllegalStateException(e);
}
}

@Nullable RequestAuthenticator auth;
int attemptCount = 0;

AuthenticationHandler() {}

AuthenticationHandler(RequestAuthenticator auth) {
this.auth = auth;
}

/**
Authentication callback, called by HttpURLConnection - either as system-wide default (Java 8) or per HttpURLConnection (Java 9+)
* @return credentials, or null if not attempting to auth.
*/
@Nullable @Override public final PasswordAuthentication getPasswordAuthentication() {
AuthenticationHandler delegate = handler.get(this);
if (delegate == null) return null; // this request has no auth handler
delegate.attemptCount++;
// if the password returned fails, Java will repeatedly retry the request with a new password auth hit (because
// it may be an interactive prompt, and the user could eventually get it right). But in Jsoup's context, the
// auth will either be correct or not, so just abandon
if (delegate.attemptCount > MaxAttempts)
return null;
if (delegate.auth == null)
return null; // detached - would have been the Global Authenticator (not a delegate)

RequestAuthenticator.Context ctx = new RequestAuthenticator.Context(
this.getRequestingURL(), this.getRequestorType(), this.getRequestingPrompt());
return delegate.auth.authenticate(ctx);
}

interface AuthShim {
void enable(RequestAuthenticator auth, HttpURLConnection con);

void remove();

@Nullable AuthenticationHandler get(AuthenticationHandler helper);
}

/**
On Java 8 we install a system-wide Authenticator, which pulls the delegating Auth from a ThreadLocal pool.
*/
static class GlobalHandler implements AuthShim {
static ThreadLocal<AuthenticationHandler> authenticators = new ThreadLocal<>();
static {
Authenticator.setDefault(new AuthenticationHandler());
}

@Override public void enable(RequestAuthenticator auth, HttpURLConnection con) {
authenticators.set(new AuthenticationHandler(auth));
}

@Override public void remove() {
authenticators.remove();
}

@Override public AuthenticationHandler get(AuthenticationHandler helper) {
return authenticators.get();
}
}
}
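
The attempt cap in `getPasswordAuthentication()` above can be shown in isolation. The following is a minimal sketch, assuming a fixed username and password; the class `CappedAuthSketch` and its `simulateChallenge` test hook are hypothetical names introduced only for illustration and do not appear in this commit.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical sketch of the attempt-capping idea, not the jsoup class.
final class CappedAuthSketch extends Authenticator {
    static final int MAX_ATTEMPTS = 5; // mirrors MaxAttempts in the diff above

    private final PasswordAuthentication credentials;
    private int attemptCount = 0;

    CappedAuthSketch(String user, String password) {
        this.credentials = new PasswordAuthentication(user, password.toCharArray());
    }

    @Override protected PasswordAuthentication getPasswordAuthentication() {
        attemptCount++;
        // Returning null tells the HTTP stack that no (more) credentials are available,
        // so a rejected password is not resent indefinitely.
        if (attemptCount > MAX_ATTEMPTS) return null;
        return credentials;
    }

    /** Test hook: stands in for one authentication callback from the HTTP stack. */
    PasswordAuthentication simulateChallenge() {
        return getPasswordAuthentication();
    }
}
```

The cap is per authenticator instance, so in this sketch a fresh instance would be created for each request, in the same way the diff creates a new `AuthenticationHandler` in `enable`.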
21 changes: 21 additions & 0 deletions src/main/java/org/jsoup/helper/HttpConnection.java
@@ -377,6 +377,10 @@ public Connection postDataCharset(String charset) {
return this;
}

@Override public Connection auth(RequestAuthenticator authenticator) {
req.auth(authenticator);
return this;
}

@SuppressWarnings("unchecked")
private static abstract class Base<T extends Connection.Base<T>> implements Connection.Base<T> {
@@ -596,6 +600,7 @@ public static class Request extends HttpConnection.Base<Connection.Request> impl
private String postDataCharset = DataUtil.defaultCharsetName;
private @Nullable SSLSocketFactory sslSocketFactory;
private CookieManager cookieManager;
private @Nullable RequestAuthenticator authenticator;
private volatile boolean executing = false;

Request() {
@@ -626,6 +631,7 @@ public static class Request extends HttpConnection.Base<Connection.Request> impl
parserDefined = copy.parserDefined;
sslSocketFactory = copy.sslSocketFactory; // these are all synchronized so safe to share
cookieManager = copy.cookieManager;
authenticator = copy.authenticator;
executing = false;
}

@@ -764,6 +770,15 @@ public String postDataCharset() {
CookieManager cookieManager() {
return cookieManager;
}

@Override public Connection.Request auth(@Nullable RequestAuthenticator authenticator) {
this.authenticator = authenticator;
return this;
}

@Override @Nullable public RequestAuthenticator auth() {
return authenticator;
}
}

public static class Response extends HttpConnection.Base<Connection.Response> implements Connection.Response {
@@ -898,6 +913,10 @@ else if (methodHasBody)
throw e;
} finally {
req.executing = false;

// detach any thread local auth delegate
if (req.authenticator != null)
AuthenticationHandler.handler.remove();
}

res.executed = true;
@@ -1008,6 +1027,8 @@ private static HttpURLConnection createConnection(HttpConnection.Request req) th

if (req.sslSocketFactory() != null && conn instanceof HttpsURLConnection)
((HttpsURLConnection) conn).setSSLSocketFactory(req.sslSocketFactory());
if (req.authenticator != null)
AuthenticationHandler.handler.enable(req.authenticator, conn); // removed in finally
if (req.method().hasBody())
conn.setDoOutput(true);
CookieUtil.applyCookiesToRequest(req, conn); // from the Request key/val cookies and the Cookie Store
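
The pairing of `handler.enable(...)` in `createConnection` with `handler.remove()` in the `finally` block follows the usual set-then-clear pattern for thread-local state. A minimal sketch of that pattern is below, assuming a plain `String` stands in for the authenticator; `ThreadLocalCleanupSketch`, `CURRENT_AUTH`, and `withAuth` are hypothetical names, not jsoup API.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the enable / remove-in-finally pattern.
final class ThreadLocalCleanupSketch {
    // Stands in for the per-thread authenticator slot used on Java 8.
    static final ThreadLocal<String> CURRENT_AUTH = new ThreadLocal<>();

    /** Runs the request with the auth value bound to the current thread, then always clears it. */
    static <T> T withAuth(String auth, Supplier<T> request) {
        CURRENT_AUTH.set(auth);
        try {
            return request.get();
        } finally {
            // Clearing in finally matters when threads are pooled: without it, a later
            // unauthenticated request on the same thread could see stale credentials.
            CURRENT_AUTH.remove();
        }
    }
}
```

The value is visible only inside the supplied request and only on the calling thread; it is cleared whether the request returns normally or throws.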