Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retain resource URL in java-parser results, improve caching #139

Merged

Conversation

plexus
Copy link
Contributor

@plexus plexus commented Nov 11, 2021

This PR replaces #124, see there for previous discussion.

The java parser/legacy-parser returns a resource's :file, actually a relative
path as a string, and :path, the relative path converted to absolute. It does
this conversion through an io/resource lookup to find the file system location
of the artifact. There are two issues with this:

  1. The return value of io/resource is discarded, even though it is needed
    later on in orchard.java, causing unnecessary resource lookups.

  2. The path is obtained through (.getPath (io/resource path)), which yields
    nonsensical results when used on resources that are inside a JAR.

This change keeps the :file returned from java-parser (relative path string),
but removes the :path value in favor of a :resource-url, i.e. the return
value of (io/resource path). It then provides utility functions to work with
this resource URL directly, namely mapping it to a filesystem artifact, or
retrieving its modification time.

java.parser/parser already does a io/resource lookup for the resource it is
parsing, meaning it has a full URL (jar: or file:). By including this URL in the
map it returns callers can do further checks or operations on the resource
without having to re-scan the classpath.

This in turn allows us to simplify and optimize orchard.java/class-info. The
old version would call io/resource on each invocation, even when the result
was already cached, in order to find the artifact's modification time.

This can noticably speed up cider-nrepl's "stacktrace" op, which calls
orchard.java/class-info on each stack frame, in extreme cases taking multiple
seconds to analyze all stack frames and return a result. This leads to rather
jarring UX, with the stacktrace buffering popping up in Emacs seconds after the
evaluation and exception happened. This is exarcerbated when using the nREPL
sideloader, which adds overhead to each resource lookup.

This also makes the return value of java-parser more correct. As mentioned the
previous version would always call (.getPath (io/resource path)), but this
only makes sense for file: resources, not for jar: resources. For file:
URLs .getPath returns the path of the file. For jar: URLs it returns the
nested url+the jar entry, so a file: URL but with a dangling !/jar/entry.

Illustration of why calling getPath on jar: URLs is not useful:

(io/resource "lambdaisland/witchcraft.clj")
;;=> #java.net.URL "file:/srv/mc/witchcraft/src/lambdaisland/witchcraft.clj"

(.getPath (io/resource "lambdaisland/witchcraft.clj"))
;;=> "/srv/mc/witchcraft/src/lambdaisland/witchcraft.clj"

(io/resource "clojure/lang/RT.class")
;;=> #java.net.URL "jar:file:/root/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/lang/RT.class"

(.getPath (io/resource "clojure/lang/RT.class"))
;;=> "file:/root/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/lang/RT.class"

Before submitting a PR make sure the following things have been done:

  • The commits are consistent with our contribution guidelines
  • You've added tests to cover your change(s)
  • All tests are passing
  • The new code is not generating reflection warnings
  • You've updated the changelog (if adding/changing user-visible functionality)

Thanks!

plexus and others added 4 commits November 5, 2021 11:22
The java parser/legacy-parser returns a resource's `:file`, actually a relative
path as a string, and `:path`, the relative path converted to absolute. It does
this conversion through an `io/resource` lookup to find the file system location
of the artifact. There are two issues with this:

1. The return value of `io/resource` is discarded, even though it is needed
later on in `orchard.java`, causing unnecessary resource lookups.

2. The path is obtained through `(.getPath (io/resource path))`, which yields
nonsensical results when used on resources that are inside a JAR.

This change keeps the `:file` returned from java-parser (relative path string),
but removes the `:path` value in favor of a `:resource-url`, i.e. the return
value of `(io/resource path)`. It then provides utility functions to work with
this resource URL directly, namely mapping it to a filesystem artifact, or
retrieving its modification time.

java.parser/parser already does a `io/resource` lookup for the resource it is
parsing, meaning it has a full URL (jar: or file:). By including this URL in the
map it returns callers can do further checks or operations on the resource
without having to re-scan the classpath.

This in turn allows us to simplify and optimize `orchard.java/class-info`. The
old version would call `io/resource` on each invocation, even when the result
was already cached, in order to find the artifact's modification time.

This can noticably speed up cider-nrepl's `"stacktrace"` op, which calls
`orchard.java/class-info` on each stack frame, in extreme cases taking multiple
seconds to analyze all stack frames and return a result. This leads to rather
jarring UX, with the stacktrace buffering popping up in Emacs seconds after the
evaluation and exception happened. This is exarcerbated when using the nREPL
sideloader, which adds overhead to each resource lookup.

This also makes the return value of java-parser more correct. As mentioned the
previous version would always call `(.getPath (io/resource path))`, but this
only makes sense for `file:` resources, not for `jar:` resources. For `file:`
URLs `.getPath` returns the path of the file. For `jar:` URLs it returns the
nested url+the jar entry, so a `file:` URL but with a dangling `!/jar/entry`.

Illustration of why calling `getPath` on `jar:` URLs is not useful:

```clojure
(io/resource "lambdaisland/witchcraft.clj")
;;=> #java.net.URL "file:/srv/mc/witchcraft/src/lambdaisland/witchcraft.clj"

(.getPath (io/resource "lambdaisland/witchcraft.clj"))
;;=> "/srv/mc/witchcraft/src/lambdaisland/witchcraft.clj"

(io/resource "clojure/lang/RT.class")
;;=> #java.net.URL "jar:file:/root/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/lang/RT.class"

(.getPath (io/resource "clojure/lang/RT.class"))
;;=> "file:/root/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/lang/RT.class"
```
@plexus plexus requested a review from vemv November 11, 2021 06:31
:file path
:path (. (io/resource path) getPath))))
;; Legacy key. Please do not remove - we don't do breaking changes!
:path (-> path io/resource .getPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm now that I added :path again in the second commit, there are two io/resource calls, which go against this PR's intent.

Can extract to a common let later today (if you don't beat me to it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a longer comment, but I really think this should simply go away or become an alias for :resource-url. Again I'll say that we shouldn't go overboard for backwards compatibility concerns that likely don't have any real-world implications. There are like 5 projects using cider-nrepl and all of them model their API usage on CIDER, as I never documented the API particularly well. :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's overboard here? The code is pretty simple, I'm not contorting it to keep compat (see Don't compute io/resource twice for :path commit).

In terms of computations, no extra cost is incurred. It's Just Data 🙂

Honestly I don't understand why the default should be dropping instead of keeping. In this case, dropping is being actually costlier as several people have to be queried/coordinated to make sure it's safe.

And we can't never be really sure. Clojure/Emacs are hackers' tools they can be consumed in unsuspected ways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me this PR had two goals:

  • eliminate a performance problem
  • make the API simpler

If we still compute a redundant key, that's one redundant computation, even after the optimization you did. Not a big deal perhaps. More importantly, however, it's one key with unclear purpose in the result that was added somewhat accidentally. I mean - even I couldn't tell you what this key was about, before looking into the code. Now we're back to square one - the key is here, and people will wonder what exactly is the difference between all the "path" keys. Anyways, I don't want to argue too much on this topic and I'll merge the PR in its current form for now. I might end up simplifying this API myself down the road.

@bbatsov
Copy link
Member

bbatsov commented Nov 11, 2021

Why add :path back if our aim was to reduce the confusion with it and likely it's not used anywhere? As there's no way to mark something in the responses as deprecated (in a way where the clients will understand they are using a deprecated API) I think that keeping it around will likely increase the chance of someone using it, rather than solve a potential breakage that the PR might introduce. If we keep I'd say to just make it an alias for :resource-url, even if this changes what it was.

@bbatsov
Copy link
Member

bbatsov commented Nov 11, 2021

@PEZ @liquidz Any concerns about dropping :path?

@PEZ
Copy link
Contributor

PEZ commented Nov 11, 2021

Thanks for checking with me, @bbatsov!

Not sure how to figure out if Calva relies on :path... Which ops would be affected? I've looked at Calva's usage of the info op and we are never accessing any :path entry in the result. We do use the ns-path op and pick up :path from its results.

We will of course make any changes necessary to not rely on this. So please go ahead removing it. To me it is more a matter of knowing which changes I need to make.

@bbatsov
Copy link
Member

bbatsov commented Nov 11, 2021

I think this will impact directly only the info op. Perhaps the stacktrace op as well. @plexus Please, correct me if I'm wrong.

@PEZ
Copy link
Contributor

PEZ commented Nov 11, 2021

Cool. We're using file-url in that op, so we are probably good. 😄

@plexus
Copy link
Contributor Author

plexus commented Nov 11, 2021

The cider-nrepl info is affected in a limited way, this is the current behavior:

;; ======= clojure var
(-> {:ns 'clojure.core :sym 'str}
    info-reply
    (select-keys ["path" "file" "resource"]))
;;=>
{"file" "jar:file:/home/arne/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/core.clj"
 "resource" "clojure/core.clj"}

;; ========== java method
(-> {:ns 'clojure.core :sym 'clojure.lang.Compiler/eval}
    info-reply
    (select-keys ["path" "file" "resource"]))
;;=>
{}

;; ========== java class
(-> {:ns 'clojure.core :sym 'java.util.List}
    info-reply
    (select-keys ["path" "file" "resource"]))
;;=>
{"path" "file:/home/arne/.sdkman/candidates/java/16.0.1.hs-adpt/lib/src.zip!/java.base/java/util/List.java"
 "file" "jar:file:/home/arne/.sdkman/candidates/java/16.0.1.hs-adpt/lib/src.zip!/java.base/java/util/List.java"
 "resource" "java.base/java/util/List.java"}

Note that only in the case of a java class symbol there is currently a "path", and that will go way. If you are relying on that then you should use "file" or "resource" instead.

The stacktrace op is not affected, for java stackframes it calls java/resolve-symbol, which does call java/class-info, but the value of :path is ignored.

This PR is just a first step, the thing is that cider-nrepl is doing some io/resource lookups of its own, which could be done (and cached) in Orchard, so the Orchard API becomes consistent (returns file and resource-url if it has them), and cider-nrepl's API can become consistent, while dropping extra io/resource lookups. I think we should also hide :resource-url from the result, since it will be the same as "file", and we can still bring back "path" at this level if we want to for backwards compatibility.

Note that in the info op the meaning is basically flipped, file is a full URL, resource is a relative path.

{:file resource-url
  :resource file
  :path (.getPath (URL. resource-url))}

@bbatsov
Copy link
Member

bbatsov commented Nov 11, 2021

I think we should also hide :resource-url from the result, since it will be the same as "file", and we can still bring back "path" at this level if we want to for backwards compatibility.

Sounds prudent. In general I'm way more concerned about the API at the nREPL level, as I'm not even sure if someone is using Orchard directly these days. It's a pity Clojars doesn't have a reverse deps functionality.

Note that in the info op the meaning is basically flipped, file is a full URL, resource is a relative path.

Yeah, that's a bit confusing indeed. I think my reasoning for the op was file paths were supposed to be absolute and resource paths were supposed to be relative, as resources are a Java concept and files are universal. Perhaps this was misguided. We can always introduce keys with more consistent names and gradually adopt them over time, if we deem this necessary.

@bbatsov bbatsov merged commit 646af3e into clojure-emacs:master Nov 11, 2021
@plexus
Copy link
Contributor Author

plexus commented Nov 11, 2021

It seems we keep ending up in similar discussions about how much breakage we will accept. I have a pretty strong aversion to (potentially) breaking existing consumers. "Don't break userspace" etc. But that does mean you need to own all your mistakes forever, and there are a lot of mistakes in nrepl/cider/orchard. Stuff has been built in a rather ad hoc way, with lots of inconsistencies and things that don't quite work as well as they should. And this is turn has led to accidental breakage in the past.

So you have two options here, you try to evolve while minimizing breakage, or you freeze what is there and tell people that if they want something better it'll have to be in a new project or fork.

Something that stands out here in particular is that we have an nrepl op (info) which can returns vastly different things. Are these things documented? Can we have a protocol spec for these things? There are multiple nrepl server implementations out there, it would be nice if clients or servers could be considered compliant with the protocol in general, instead of tying themselves to implementation details.

@bbatsov
Copy link
Member

bbatsov commented Nov 11, 2021

But that does mean you need to own all your mistakes forever, and there are a lot of mistakes in nrepl/cider/orchard.

I agree. Admittedly, I've done plenty of mistakes mostly because I had to fit too much work into too little time (which meant that often I didn't consider carefully the design of some things). And I hadn't envisioned cider-nrepl as a reusable API early on. In the long run dealing with those mistakes became another time sink, that's why I'm in favor of tackling them more aggressively when feasible.

So you have two options here, you try to evolve while minimizing breakage, or you freeze what is there and tell people that if they want something better it'll have to be in a new project or fork.

I'm obviously in favor of the first option. :-)

Something that stands out here in particular is that we have an nrepl op (info) which can returns vastly different things. Are these things documented?Something that stands out here in particular is that we have an nrepl op (info) which can returns vastly different things. Are these things documented?

They are not documented yet, but also the op is marked as experimental so it's API is definitely not set in stone. Simply speaking - I meant for it to return data in the same format as cider-nrepl's info, so it'd be easier for existing clients to leverage it, without the need for conditional logic. Same with completions.

There are multiple nrepl server implementations out there, it would be nice if clients or servers could be considered compliant with the protocol in general, instead of tying themselves to implementation details.

Indeed.

@liquidz
Copy link
Member

liquidz commented Nov 11, 2021

@bbatsov Sorry for late reply
No problem since we are using :file .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants