-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTP response caching, with TTL and LRU logic #11342
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall approach looks great to me.
A few changes though please.
I think we should also put the LRU into its own class and make thread safe sooner rather than later (as next step will parallel download!)
std-bits/base/src/main/java/org/enso/base/enso_cloud/TransientHTTPResponseCache.java
Outdated
Show resolved
Hide resolved
std-bits/base/src/main/java/org/enso/base/enso_cloud/TransientHTTPResponseCache.java
Outdated
Show resolved
Hide resolved
std-bits/base/src/main/java/org/enso/base/enso_cloud/TransientHTTPResponseCache.java
Show resolved
Hide resolved
std-bits/base/src/main/java/org/enso/base/enso_cloud/TransientHTTPResponseCache.java
Outdated
Show resolved
Hide resolved
MessageDigest messageDigest = MessageDigest.getInstance("SHA-256"); | ||
messageDigest.update(resolvedURI.toString().getBytes()); | ||
|
||
var sortedHeaders = resolvedHeaders.stream().sorted(headerNameComparator).toList(); | ||
for (Pair<String, String> resolvedHeader : sortedHeaders) { | ||
messageDigest.update(resolvedHeader.getLeft().getBytes()); | ||
messageDigest.update(resolvedHeader.getRight().getBytes()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use just use Java's string hash?
Any reason to use SHA-256 over faster MD5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean concatenate the uri and headers and use String.hashCode()? I used this for better security, but can definitely switch to MD5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SHA256 is heavier to compute than a simple MD5 - not sure makes too much difference on security as reversing an MD5 is pretty hard anyway, but as this is on headers and uri its not a real issue.
The Java string hash was just couldn't we just make a big string and hash it - but I prefer yours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used this for better security
Security of what? I mean:
- there is no check whether the first download of the HTTP resource is trustworthy, right?
- e.g. the system downloads everything and just puts a stamp on it
- if someone can get access to a computer (user account) then there is no security anyway
- so: what kind of attack this hash is shielding the user against?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synthesizing a URL that would hash to an existing resource in the user's cache that they hadn't intended to load. It's an extremely unlikely case, but I thought of it just as the routine use of non-trivial hashes for resource naming, and the cost is once per HTTP request. I'm fine using hashCode
for this.
// 1 year. | ||
private static final int DEFAULT_TTL_SECONDS = 31536000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this some standardized value? If not, isn't it too large for the purposes of this cache? It feels like something around 24h would be more fitting, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was suggested by @jdunkerley .
Comments resolved except for nio change. |
distribution/lib/Standard/Base/0.0.0-dev/src/Errors/Common.enso
Outdated
Show resolved
Hide resolved
std-bits/base/src/main/java/org/enso/base/enso_cloud/TransientHTTPResponseCache.java
Outdated
Show resolved
Hide resolved
std-bits/base/src/main/java/org/enso/base/enso_cloud/TransientHTTPResponseCache.java
Outdated
Show resolved
Hide resolved
std-bits/base/src/main/java/org/enso/base/enso_cloud/TransientHTTPResponseCache.java
Outdated
Show resolved
Hide resolved
std-bits/base/src/main/java/org/enso/base/enso_cloud/TransientHTTPResponseCache.java
Outdated
Show resolved
Hide resolved
std-bits/base/src/main/java/org/enso/base/enso_cloud/TransientHTTPResponseCache.java
Outdated
Show resolved
Hide resolved
std-bits/base/src/main/java/org/enso/base/enso_cloud/TransientHTTPResponseCache.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks very good and I'm very happy about how readable the code is, even though quite a lot is happening, it is really easy to follow it and understand it. It's really appreciated! ❤️
I have some reservations about 'security'. While our current design of the secrets system cannot be 100% secure, because the secrets are in the JVM and can always be recovered e.g. using reflection, we still want to make the 'attack surface' as small as possible.
Adding the, relatively complex, logic of TransientHTTPResponseCache
into the enso_cloud
package that is dealing with secrets is making the 'sensitive surface' quite large - one has to look at all the methods of the TransientHTTPResponseCache
class and make sure that secrets aren't leaked anywhere. And in fact I did find a leak there.
I would suggest to decrease the 'sensitive' surface smaller by adding a bit more encapsulation. We could move TransientHTTPResponseCache
to a separate package and make the 'secure' EnsoSecretHelper
communicate with it through a much smaller API that can be more easily analyzed for security.
I think the RequestMaker
interface is a good starting point. What if we extend it a bit?
I'd suggest
interface RequestMaker {
/* Executes the HTTP request and returns the response. All secret handling should be encapsulated inside of the `run` function, so that secret values are not exposed to the outside world. */
EnsoHttpResponse run() throws IOException, InterruptedException;
/* Returns a hash key that can be used to uniquely identify this request (by hashing its URI and headers). This will be used to decide if the `run` method should be executed, or if a cached response will be returned.
The hash key should not be reversible - any secrets that were present inside of the URI or headers should not be recoverable from the hash key. */
String hashKey();
/* When a cached response is returned, instead of executing `run`, this method is used to construct the response. */
EnsoHttpResponse reconstructResponseFromCachedStream(InputStream cachedResponseData, ...);
}
By encapsulating all the logic for running the actual query and constructing the response from cache in the RequestMaker
, the TransientHTTPResponseCache
can be 'clueless' about the details and security of secret handling - the TransientHTTPResponseCache
itself is not given any secret information at all (in fact it does not need to know even the non-secret rendered URI now), so we don't have to worry about its security anymore. We only need to be careful about the RequestMaker
implementation to avoid leaking secrets there - but this is much smaller piece of code than the whole TransientHTTPResponseCache
. So it becomes much easier.
The
...
in the code snippet above indicate that we still need to know some more info to be able to reconstruct the request. I think we could parametrize theRequestMaker
byResponseMetadata
type - and add aResponseMetadata dumpMetadataForCache(EnsoHttpResponse)
method to theRequestMaker
interface. Then the cache would call this when storing the cache entry and then provide an instance ofResponseMetadata
toreconstructResponseFromCachedStream
. This way even the stored metadata will stay opaque for the cache.
Alternatively, and perhaps much simpler, it could be to keep storing the metadata (response headers and status code) like now, and add them as parameters to reconstruct the response - these metadata are not secret anyway, so we can store them in the cache directly. I'd just avoid storing the URI and instead let theRequestMaker
reconstruct it.
This implements
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
TypeScript,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
or the Snowflake database integration, a run of the Extra Tests has been scheduled.