-
-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retain resource URL in java-parser results, improve caching #139
Retain resource URL in java-parser results, improve caching #139
Conversation
The java parser/legacy-parser returns a resource's `:file`, actually a relative path as a string, and `:path`, the relative path converted to absolute. It does this conversion through an `io/resource` lookup to find the file system location of the artifact. There are two issues with this: 1. The return value of `io/resource` is discarded, even though it is needed later on in `orchard.java`, causing unnecessary resource lookups. 2. The path is obtained through `(.getPath (io/resource path))`, which yields nonsensical results when used on resources that are inside a JAR. This change keeps the `:file` returned from java-parser (relative path string), but removes the `:path` value in favor of a `:resource-url`, i.e. the return value of `(io/resource path)`. It then provides utility functions to work with this resource URL directly, namely mapping it to a filesystem artifact, or retrieving its modification time. java.parser/parser already does a `io/resource` lookup for the resource it is parsing, meaning it has a full URL (jar: or file:). By including this URL in the map it returns callers can do further checks or operations on the resource without having to re-scan the classpath. This in turn allows us to simplify and optimize `orchard.java/class-info`. The old version would call `io/resource` on each invocation, even when the result was already cached, in order to find the artifact's modification time. This can noticably speed up cider-nrepl's `"stacktrace"` op, which calls `orchard.java/class-info` on each stack frame, in extreme cases taking multiple seconds to analyze all stack frames and return a result. This leads to rather jarring UX, with the stacktrace buffering popping up in Emacs seconds after the evaluation and exception happened. This is exarcerbated when using the nREPL sideloader, which adds overhead to each resource lookup. This also makes the return value of java-parser more correct. As mentioned the previous version would always call `(.getPath (io/resource path))`, but this only makes sense for `file:` resources, not for `jar:` resources. For `file:` URLs `.getPath` returns the path of the file. For `jar:` URLs it returns the nested url+the jar entry, so a `file:` URL but with a dangling `!/jar/entry`. Illustration of why calling `getPath` on `jar:` URLs is not useful: ```clojure (io/resource "lambdaisland/witchcraft.clj") ;;=> #java.net.URL "file:/srv/mc/witchcraft/src/lambdaisland/witchcraft.clj" (.getPath (io/resource "lambdaisland/witchcraft.clj")) ;;=> "/srv/mc/witchcraft/src/lambdaisland/witchcraft.clj" (io/resource "clojure/lang/RT.class") ;;=> #java.net.URL "jar:file:/root/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/lang/RT.class" (.getPath (io/resource "clojure/lang/RT.class")) ;;=> "file:/root/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/lang/RT.class" ```
:file path | ||
:path (. (io/resource path) getPath)))) | ||
;; Legacy key. Please do not remove - we don't do breaking changes! | ||
:path (-> path io/resource .getPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm now that I added :path again in the second commit, there are two io/resource calls, which go against this PR's intent.
Can extract to a common let
later today (if you don't beat me to it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a longer comment, but I really think this should simply go away or become an alias for :resource-url
. Again I'll say that we shouldn't go overboard for backwards compatibility concerns that likely don't have any real-world implications. There are like 5 projects using cider-nrepl
and all of them model their API usage on CIDER, as I never documented the API particularly well. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's overboard
here? The code is pretty simple, I'm not contorting it to keep compat (see Don't compute io/resource twice for :path
commit).
In terms of computations, no extra cost is incurred. It's Just Data 🙂
Honestly I don't understand why the default should be dropping instead of keeping. In this case, dropping is being actually costlier as several people have to be queried/coordinated to make sure it's safe.
And we can't never be really sure. Clojure/Emacs are hackers' tools they can be consumed in unsuspected ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me this PR had two goals:
- eliminate a performance problem
- make the API simpler
If we still compute a redundant key, that's one redundant computation, even after the optimization you did. Not a big deal perhaps. More importantly, however, it's one key with unclear purpose in the result that was added somewhat accidentally. I mean - even I couldn't tell you what this key was about, before looking into the code. Now we're back to square one - the key is here, and people will wonder what exactly is the difference between all the "path" keys. Anyways, I don't want to argue too much on this topic and I'll merge the PR in its current form for now. I might end up simplifying this API myself down the road.
Why add |
Thanks for checking with me, @bbatsov! Not sure how to figure out if Calva relies on We will of course make any changes necessary to not rely on this. So please go ahead removing it. To me it is more a matter of knowing which changes I need to make. |
I think this will impact directly only the |
Cool. We're using |
The cider-nrepl ;; ======= clojure var
(-> {:ns 'clojure.core :sym 'str}
info-reply
(select-keys ["path" "file" "resource"]))
;;=>
{"file" "jar:file:/home/arne/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/core.clj"
"resource" "clojure/core.clj"}
;; ========== java method
(-> {:ns 'clojure.core :sym 'clojure.lang.Compiler/eval}
info-reply
(select-keys ["path" "file" "resource"]))
;;=>
{}
;; ========== java class
(-> {:ns 'clojure.core :sym 'java.util.List}
info-reply
(select-keys ["path" "file" "resource"]))
;;=>
{"path" "file:/home/arne/.sdkman/candidates/java/16.0.1.hs-adpt/lib/src.zip!/java.base/java/util/List.java"
"file" "jar:file:/home/arne/.sdkman/candidates/java/16.0.1.hs-adpt/lib/src.zip!/java.base/java/util/List.java"
"resource" "java.base/java/util/List.java"} Note that only in the case of a java class symbol there is currently a The This PR is just a first step, the thing is that cider-nrepl is doing some Note that in the {:file resource-url
:resource file
:path (.getPath (URL. resource-url))} |
Sounds prudent. In general I'm way more concerned about the API at the nREPL level, as I'm not even sure if someone is using Orchard directly these days. It's a pity Clojars doesn't have a reverse deps functionality.
Yeah, that's a bit confusing indeed. I think my reasoning for the op was file paths were supposed to be absolute and resource paths were supposed to be relative, as resources are a Java concept and files are universal. Perhaps this was misguided. We can always introduce keys with more consistent names and gradually adopt them over time, if we deem this necessary. |
It seems we keep ending up in similar discussions about how much breakage we will accept. I have a pretty strong aversion to (potentially) breaking existing consumers. "Don't break userspace" etc. But that does mean you need to own all your mistakes forever, and there are a lot of mistakes in nrepl/cider/orchard. Stuff has been built in a rather ad hoc way, with lots of inconsistencies and things that don't quite work as well as they should. And this is turn has led to accidental breakage in the past. So you have two options here, you try to evolve while minimizing breakage, or you freeze what is there and tell people that if they want something better it'll have to be in a new project or fork. Something that stands out here in particular is that we have an nrepl op (info) which can returns vastly different things. Are these things documented? Can we have a protocol spec for these things? There are multiple nrepl server implementations out there, it would be nice if clients or servers could be considered compliant with the protocol in general, instead of tying themselves to implementation details. |
I agree. Admittedly, I've done plenty of mistakes mostly because I had to fit too much work into too little time (which meant that often I didn't consider carefully the design of some things). And I hadn't envisioned cider-nrepl as a reusable API early on. In the long run dealing with those mistakes became another time sink, that's why I'm in favor of tackling them more aggressively when feasible.
I'm obviously in favor of the first option. :-)
They are not documented yet, but also the op is marked as experimental so it's API is definitely not set in stone. Simply speaking - I meant for it to return data in the same format as
Indeed. |
@bbatsov Sorry for late reply |
This PR replaces #124, see there for previous discussion.
The java parser/legacy-parser returns a resource's
:file
, actually a relativepath as a string, and
:path
, the relative path converted to absolute. It doesthis conversion through an
io/resource
lookup to find the file system locationof the artifact. There are two issues with this:
The return value of
io/resource
is discarded, even though it is neededlater on in
orchard.java
, causing unnecessary resource lookups.The path is obtained through
(.getPath (io/resource path))
, which yieldsnonsensical results when used on resources that are inside a JAR.
This change keeps the
:file
returned from java-parser (relative path string),but removes the
:path
value in favor of a:resource-url
, i.e. the returnvalue of
(io/resource path)
. It then provides utility functions to work withthis resource URL directly, namely mapping it to a filesystem artifact, or
retrieving its modification time.
java.parser/parser already does a
io/resource
lookup for the resource it isparsing, meaning it has a full URL (jar: or file:). By including this URL in the
map it returns callers can do further checks or operations on the resource
without having to re-scan the classpath.
This in turn allows us to simplify and optimize
orchard.java/class-info
. Theold version would call
io/resource
on each invocation, even when the resultwas already cached, in order to find the artifact's modification time.
This can noticably speed up cider-nrepl's
"stacktrace"
op, which callsorchard.java/class-info
on each stack frame, in extreme cases taking multipleseconds to analyze all stack frames and return a result. This leads to rather
jarring UX, with the stacktrace buffering popping up in Emacs seconds after the
evaluation and exception happened. This is exarcerbated when using the nREPL
sideloader, which adds overhead to each resource lookup.
This also makes the return value of java-parser more correct. As mentioned the
previous version would always call
(.getPath (io/resource path))
, but thisonly makes sense for
file:
resources, not forjar:
resources. Forfile:
URLs
.getPath
returns the path of the file. Forjar:
URLs it returns thenested url+the jar entry, so a
file:
URL but with a dangling!/jar/entry
.Illustration of why calling
getPath
onjar:
URLs is not useful:Before submitting a PR make sure the following things have been done:
Thanks!