Retain resource URL in java-parser results, improve caching #139

plexus · 2021-11-11T06:31:29Z

This PR replaces #124, see there for previous discussion.

The java parser/legacy-parser returns a resource's :file, actually a relative
path as a string, and :path, the relative path converted to absolute. It does
this conversion through an io/resource lookup to find the file system location
of the artifact. There are two issues with this:

The return value of io/resource is discarded, even though it is needed
later on in orchard.java, causing unnecessary resource lookups.
The path is obtained through (.getPath (io/resource path)), which yields
nonsensical results when used on resources that are inside a JAR.

This change keeps the :file returned from java-parser (relative path string),
but removes the :path value in favor of a :resource-url, i.e. the return
value of (io/resource path). It then provides utility functions to work with
this resource URL directly, namely mapping it to a filesystem artifact, or
retrieving its modification time.

java.parser/parser already does a io/resource lookup for the resource it is
parsing, meaning it has a full URL (jar: or file:). By including this URL in the
map it returns callers can do further checks or operations on the resource
without having to re-scan the classpath.

This in turn allows us to simplify and optimize orchard.java/class-info. The
old version would call io/resource on each invocation, even when the result
was already cached, in order to find the artifact's modification time.

This can noticably speed up cider-nrepl's "stacktrace" op, which calls
orchard.java/class-info on each stack frame, in extreme cases taking multiple
seconds to analyze all stack frames and return a result. This leads to rather
jarring UX, with the stacktrace buffering popping up in Emacs seconds after the
evaluation and exception happened. This is exarcerbated when using the nREPL
sideloader, which adds overhead to each resource lookup.

This also makes the return value of java-parser more correct. As mentioned the
previous version would always call (.getPath (io/resource path)), but this
only makes sense for file: resources, not for jar: resources. For file:
URLs .getPath returns the path of the file. For jar: URLs it returns the
nested url+the jar entry, so a file: URL but with a dangling !/jar/entry.

Illustration of why calling getPath on jar: URLs is not useful:

(io/resource "lambdaisland/witchcraft.clj")
;;=> #java.net.URL "file:/srv/mc/witchcraft/src/lambdaisland/witchcraft.clj"

(.getPath (io/resource "lambdaisland/witchcraft.clj"))
;;=> "/srv/mc/witchcraft/src/lambdaisland/witchcraft.clj"

(io/resource "clojure/lang/RT.class")
;;=> #java.net.URL "jar:file:/root/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/lang/RT.class"

(.getPath (io/resource "clojure/lang/RT.class"))
;;=> "file:/root/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/lang/RT.class"

Before submitting a PR make sure the following things have been done:

The commits are consistent with our contribution guidelines
You've added tests to cover your change(s)
All tests are passing
The new code is not generating reflection warnings
You've updated the changelog (if adding/changing user-visible functionality)

Thanks!

The java parser/legacy-parser returns a resource's `:file`, actually a relative path as a string, and `:path`, the relative path converted to absolute. It does this conversion through an `io/resource` lookup to find the file system location of the artifact. There are two issues with this: 1. The return value of `io/resource` is discarded, even though it is needed later on in `orchard.java`, causing unnecessary resource lookups. 2. The path is obtained through `(.getPath (io/resource path))`, which yields nonsensical results when used on resources that are inside a JAR. This change keeps the `:file` returned from java-parser (relative path string), but removes the `:path` value in favor of a `:resource-url`, i.e. the return value of `(io/resource path)`. It then provides utility functions to work with this resource URL directly, namely mapping it to a filesystem artifact, or retrieving its modification time. java.parser/parser already does a `io/resource` lookup for the resource it is parsing, meaning it has a full URL (jar: or file:). By including this URL in the map it returns callers can do further checks or operations on the resource without having to re-scan the classpath. This in turn allows us to simplify and optimize `orchard.java/class-info`. The old version would call `io/resource` on each invocation, even when the result was already cached, in order to find the artifact's modification time. This can noticably speed up cider-nrepl's `"stacktrace"` op, which calls `orchard.java/class-info` on each stack frame, in extreme cases taking multiple seconds to analyze all stack frames and return a result. This leads to rather jarring UX, with the stacktrace buffering popping up in Emacs seconds after the evaluation and exception happened. This is exarcerbated when using the nREPL sideloader, which adds overhead to each resource lookup. This also makes the return value of java-parser more correct. As mentioned the previous version would always call `(.getPath (io/resource path))`, but this only makes sense for `file:` resources, not for `jar:` resources. For `file:` URLs `.getPath` returns the path of the file. For `jar:` URLs it returns the nested url+the jar entry, so a `file:` URL but with a dangling `!/jar/entry`. Illustration of why calling `getPath` on `jar:` URLs is not useful: ```clojure (io/resource "lambdaisland/witchcraft.clj") ;;=> #java.net.URL "file:/srv/mc/witchcraft/src/lambdaisland/witchcraft.clj" (.getPath (io/resource "lambdaisland/witchcraft.clj")) ;;=> "/srv/mc/witchcraft/src/lambdaisland/witchcraft.clj" (io/resource "clojure/lang/RT.class") ;;=> #java.net.URL "jar:file:/root/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/lang/RT.class" (.getPath (io/resource "clojure/lang/RT.class")) ;;=> "file:/root/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/lang/RT.class" ```

vemv · 2021-11-11T07:53:03Z

src-jdk8/orchard/java/legacy_parser.clj

               :file path
-               :path (. (io/resource path) getPath))))
+               ;; Legacy key. Please do not remove - we don't do breaking changes!
+               :path (-> path io/resource .getPath)


mmm now that I added :path again in the second commit, there are two io/resource calls, which go against this PR's intent.

Can extract to a common let later today (if you don't beat me to it)

I've added a longer comment, but I really think this should simply go away or become an alias for :resource-url. Again I'll say that we shouldn't go overboard for backwards compatibility concerns that likely don't have any real-world implications. There are like 5 projects using cider-nrepl and all of them model their API usage on CIDER, as I never documented the API particularly well. :-)

What's overboard here? The code is pretty simple, I'm not contorting it to keep compat (see Don't compute io/resource twice for :path commit).

In terms of computations, no extra cost is incurred. It's Just Data 🙂

Honestly I don't understand why the default should be dropping instead of keeping. In this case, dropping is being actually costlier as several people have to be queried/coordinated to make sure it's safe.

And we can't never be really sure. Clojure/Emacs are hackers' tools they can be consumed in unsuspected ways.

For me this PR had two goals:

eliminate a performance problem

make the API simpler

If we still compute a redundant key, that's one redundant computation, even after the optimization you did. Not a big deal perhaps. More importantly, however, it's one key with unclear purpose in the result that was added somewhat accidentally. I mean - even I couldn't tell you what this key was about, before looking into the code. Now we're back to square one - the key is here, and people will wonder what exactly is the difference between all the "path" keys. Anyways, I don't want to argue too much on this topic and I'll merge the PR in its current form for now. I might end up simplifying this API myself down the road.

bbatsov · 2021-11-11T08:16:53Z

Why add :path back if our aim was to reduce the confusion with it and likely it's not used anywhere? As there's no way to mark something in the responses as deprecated (in a way where the clients will understand they are using a deprecated API) I think that keeping it around will likely increase the chance of someone using it, rather than solve a potential breakage that the PR might introduce. If we keep I'd say to just make it an alias for :resource-url, even if this changes what it was.

bbatsov · 2021-11-11T08:20:54Z

@PEZ @liquidz Any concerns about dropping :path?

PEZ · 2021-11-11T09:44:39Z

Thanks for checking with me, @bbatsov!

Not sure how to figure out if Calva relies on :path... Which ops would be affected? I've looked at Calva's usage of the info op and we are never accessing any :path entry in the result. We do use the ns-path op and pick up :path from its results.

We will of course make any changes necessary to not rely on this. So please go ahead removing it. To me it is more a matter of knowing which changes I need to make.

bbatsov · 2021-11-11T10:01:58Z

I think this will impact directly only the info op. Perhaps the stacktrace op as well. @plexus Please, correct me if I'm wrong.

PEZ · 2021-11-11T10:46:50Z

Cool. We're using file-url in that op, so we are probably good. 😄

plexus · 2021-11-11T11:05:18Z

The cider-nrepl info is affected in a limited way, this is the current behavior:

;; ======= clojure var
(-> {:ns 'clojure.core :sym 'str}
    info-reply
    (select-keys ["path" "file" "resource"]))
;;=>
{"file" "jar:file:/home/arne/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/core.clj"
 "resource" "clojure/core.clj"}

;; ========== java method
(-> {:ns 'clojure.core :sym 'clojure.lang.Compiler/eval}
    info-reply
    (select-keys ["path" "file" "resource"]))
;;=>
{}

;; ========== java class
(-> {:ns 'clojure.core :sym 'java.util.List}
    info-reply
    (select-keys ["path" "file" "resource"]))
;;=>
{"path" "file:/home/arne/.sdkman/candidates/java/16.0.1.hs-adpt/lib/src.zip!/java.base/java/util/List.java"
 "file" "jar:file:/home/arne/.sdkman/candidates/java/16.0.1.hs-adpt/lib/src.zip!/java.base/java/util/List.java"
 "resource" "java.base/java/util/List.java"}

Note that only in the case of a java class symbol there is currently a "path", and that will go way. If you are relying on that then you should use "file" or "resource" instead.

The stacktrace op is not affected, for java stackframes it calls java/resolve-symbol, which does call java/class-info, but the value of :path is ignored.

This PR is just a first step, the thing is that cider-nrepl is doing some io/resource lookups of its own, which could be done (and cached) in Orchard, so the Orchard API becomes consistent (returns file and resource-url if it has them), and cider-nrepl's API can become consistent, while dropping extra io/resource lookups. I think we should also hide :resource-url from the result, since it will be the same as "file", and we can still bring back "path" at this level if we want to for backwards compatibility.

Note that in the info op the meaning is basically flipped, file is a full URL, resource is a relative path.

{:file resource-url
  :resource file
  :path (.getPath (URL. resource-url))}

bbatsov · 2021-11-11T17:51:22Z

I think we should also hide :resource-url from the result, since it will be the same as "file", and we can still bring back "path" at this level if we want to for backwards compatibility.

Sounds prudent. In general I'm way more concerned about the API at the nREPL level, as I'm not even sure if someone is using Orchard directly these days. It's a pity Clojars doesn't have a reverse deps functionality.

Note that in the info op the meaning is basically flipped, file is a full URL, resource is a relative path.

Yeah, that's a bit confusing indeed. I think my reasoning for the op was file paths were supposed to be absolute and resource paths were supposed to be relative, as resources are a Java concept and files are universal. Perhaps this was misguided. We can always introduce keys with more consistent names and gradually adopt them over time, if we deem this necessary.

plexus · 2021-11-11T17:58:43Z

It seems we keep ending up in similar discussions about how much breakage we will accept. I have a pretty strong aversion to (potentially) breaking existing consumers. "Don't break userspace" etc. But that does mean you need to own all your mistakes forever, and there are a lot of mistakes in nrepl/cider/orchard. Stuff has been built in a rather ad hoc way, with lots of inconsistencies and things that don't quite work as well as they should. And this is turn has led to accidental breakage in the past.

So you have two options here, you try to evolve while minimizing breakage, or you freeze what is there and tell people that if they want something better it'll have to be in a new project or fork.

Something that stands out here in particular is that we have an nrepl op (info) which can returns vastly different things. Are these things documented? Can we have a protocol spec for these things? There are multiple nrepl server implementations out there, it would be nice if clients or servers could be considered compliant with the protocol in general, instead of tying themselves to implementation details.

bbatsov · 2021-11-11T18:33:01Z

But that does mean you need to own all your mistakes forever, and there are a lot of mistakes in nrepl/cider/orchard.

I agree. Admittedly, I've done plenty of mistakes mostly because I had to fit too much work into too little time (which meant that often I didn't consider carefully the design of some things). And I hadn't envisioned cider-nrepl as a reusable API early on. In the long run dealing with those mistakes became another time sink, that's why I'm in favor of tackling them more aggressively when feasible.

So you have two options here, you try to evolve while minimizing breakage, or you freeze what is there and tell people that if they want something better it'll have to be in a new project or fork.

I'm obviously in favor of the first option. :-)

Something that stands out here in particular is that we have an nrepl op (info) which can returns vastly different things. Are these things documented?Something that stands out here in particular is that we have an nrepl op (info) which can returns vastly different things. Are these things documented?

They are not documented yet, but also the op is marked as experimental so it's API is definitely not set in stone. Simply speaking - I meant for it to return data in the same format as cider-nrepl's info, so it'd be easier for existing clients to leverage it, without the need for conditional logic. Same with completions.

There are multiple nrepl server implementations out there, it would be nice if clients or servers could be considered compliant with the protocol in general, instead of tying themselves to implementation details.

Indeed.

liquidz · 2021-11-11T20:49:44Z

@bbatsov Sorry for late reply
No problem since we are using :file .

plexus and others added 4 commits November 5, 2021 11:22

Undo breaking changes

0e3ce38

Make jar/file processing more homegeneous

f9161a6

clj-kondo refinements

ce13519

plexus requested a review from vemv November 11, 2021 06:31

Adapt a test

9e53caf

vemv reviewed Nov 11, 2021

View reviewed changes

Don't compute io/resource twice for :path

ec4622a

bbatsov merged commit 646af3e into clojure-emacs:master Nov 11, 2021

bbatsov mentioned this pull request Nov 14, 2021

Document better the lookup middleware return values nrepl/nrepl#257

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retain resource URL in java-parser results, improve caching #139

Retain resource URL in java-parser results, improve caching #139

plexus commented Nov 11, 2021

vemv Nov 11, 2021

bbatsov Nov 11, 2021

vemv Nov 11, 2021

bbatsov Nov 11, 2021

bbatsov commented Nov 11, 2021

bbatsov commented Nov 11, 2021

PEZ commented Nov 11, 2021

bbatsov commented Nov 11, 2021

PEZ commented Nov 11, 2021

plexus commented Nov 11, 2021

bbatsov commented Nov 11, 2021

plexus commented Nov 11, 2021

bbatsov commented Nov 11, 2021

liquidz commented Nov 11, 2021

Retain resource URL in java-parser results, improve caching #139

Retain resource URL in java-parser results, improve caching #139

Conversation

plexus commented Nov 11, 2021

vemv Nov 11, 2021

Choose a reason for hiding this comment

bbatsov Nov 11, 2021

Choose a reason for hiding this comment

vemv Nov 11, 2021

Choose a reason for hiding this comment

bbatsov Nov 11, 2021

Choose a reason for hiding this comment

bbatsov commented Nov 11, 2021

bbatsov commented Nov 11, 2021

PEZ commented Nov 11, 2021

bbatsov commented Nov 11, 2021

PEZ commented Nov 11, 2021

plexus commented Nov 11, 2021

bbatsov commented Nov 11, 2021

plexus commented Nov 11, 2021

bbatsov commented Nov 11, 2021

liquidz commented Nov 11, 2021