Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] Add support for handling remote file URIs. #629

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Carpe-Wang
Copy link

Summary

This PR adds support for handling remote file URIs within the project. It addresses the issue where the system could not correctly process URIs that point to remote files.

Changes

  • Implemented functionality to download remote files specified by file: URIs.
  • Added unit tests to verify the correctness of this new feature.

Issue Link

Fixes #616

@Carpe-Wang Carpe-Wang changed the title Add support for handling remote file URIs. [fix] Add support for handling remote file URIs. Aug 18, 2024
@bioball
Copy link
Contributor

bioball commented Aug 19, 2024

Hey Carpe, thanks for the PR!

I don't know if this is the right solution to the problem. As far as I can tell, there is no agreed upon protocol when opening files with a remote hostname.

  • Windows machines will use SMB to open the file
  • Safari on macOS will strip the hostname entirely and try to open the file from the local machine
  • Java's URL#openConnection() will use FTP
  • I can't find any examples of using HTTP to fetch the file

Without a protocol here, I think some valid choices are:

  1. Explicitly don't support, and instead throw an error
  2. Support on Windows only
  3. Follow Java's lead and use FTP

I'm kind of leaning towards option 1 because it's the least amount of commitment and allows us to address this if the need arises later.

Curious what @holzensp and @stackoverflow think

@Carpe-Wang
Copy link
Author

@bioball Hi bioball, I got your idea, If both of them also suggest modifying it to directly throw an exception, I will submit a new commit to implement this.

@stackoverflow
Copy link
Contributor

Agree that throwing an error seems like the best solution, as there's no consensus on how to handle host names in files.

@holzensp
Copy link
Contributor

I would have expected NFS on Linux/Solaris/FreeBSD/etc. Agreed; randomly guessing protocols to start hammering seems risky, at best. Exception would be the way to go.

@Carpe-Wang
Copy link
Author

@bioball hi bioball, I change to throw the exception directly. Can you help me to ensure satisfying the requirement?

Copy link
Contributor

@bioball bioball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Carpe-Wang: can you add this check to org.pkl.core.module.ModuleKeys#file?

That method should throw a URISyntaxException if there is a hostname set. And we should remove the check from IoUtils#toUrl; that's not quite the right place for it.

Thanks for the help here!

@Carpe-Wang
Copy link
Author

Hi @bioball , I want move check URL logic to ModuleKeys inner ProjectPackage.class's ResolvedModuleKeymethod like below:

    @Override
    public ResolvedModuleKey resolve(SecurityManager securityManager)
        throws IOException, SecurityManagerException {
      var uri = packageAssetUri.getUri();
      try {
        if (uri.getHost() != null) {
          throw new URISyntaxException(uri.toString(), "Host not allowed");
        }
      }catch (URISyntaxException e) {
        throw new IOException(e.getMessage());
      }
      securityManager.checkResolveModule(uri);
      var dependency =
          getProjectDependenciesManager().getResolvedDependency(packageAssetUri.getPackageUri());
      var local = getLocalUri(dependency);
      if (local != null) {
        var resolved =
            VmContext.get(null).getModuleResolver().resolve(local).resolve(securityManager);
        return ResolvedModuleKeys.delegated(resolved, this);
      }
      var dep = (Dependency.RemoteDependency) dependency;
      assert dep.getChecksums() != null;
      var bytes = getPackageResolver().getBytes(packageAssetUri, false, dep.getChecksums());
      return ResolvedModuleKeys.virtual(this, uri, new String(bytes, StandardCharsets.UTF_8), true);
    }

Do you think is okay? If I directly throw a URISyntaxException without catching it, I would need to modify the method signature to public ResolvedModuleKey resolve(SecurityManager securityManager) throws IOException, SecurityManagerException, URISyntaxException. Additionally, the interface and corresponding Override methods would also need to include URISyntaxException. Does this match your expectations?

@bioball
Copy link
Contributor

bioball commented Oct 25, 2024

I think we should add the check to the org.pkl.core.module.ModuleKeys#file factory method; this is the entrypoint for all file-based modules.

This is a pattern that we already have! Take a look at org.pkl.core.module.ModuleKeys#pkg for another example of a module key that throws a URISyntaxException upon initialization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PklBugException when given a file: URI with a host
4 participants