This repository has been archived by the owner on Sep 2, 2023. It is now read-only.

Phase 2-1: Module Maps integrating with in Memory Resources #231

Closed
bmeck opened this issue Nov 26, 2018 · 24 comments

Comments

@bmeck
Member

bmeck commented Nov 26, 2018

@jkrems and I spent an hour gathering what we could on why we opposed each other on Module Maps per a previous meeting on the topic of 2-1. The result was largely trying to talk about what a "Module Map" is, and it moved off the topic of 2-1 towards other constraints and APIs that could be related to but are not part of 2-1. We have been busy with holidays and work for a couple of weeks, so we were a bit slow in getting our write-up reviewed by each other to set a baseline of what we mean when talking about Module Maps and Synthetic Resources.

This document is available at https://docs.google.com/document/d/12-dba4YVUGQwdbscceVXUqeaqi-P7tgKrbniQxpUeOw/edit .

The document largely was gathering information from other parts of the ecosystem and reviewing what a "Module Map" is in other situations similar to what we were discussing and trying to sort out in our 1 hour sync.

Questions and concerns about what relates to module maps that caused our confusion and difference in viewpoints are listed in the document and can be summarized:

  • Are Locations only relevant for Modules?
  • Are Locations able to collide at a global level?
  • Locations can be determined before format or body of the Module Record is known.
  • Imports can be resolved prior to body of a Module Record being fully parsed.
  • What can be used to represent different Module Records in a non-polymorphic way?
  • Location does not guarantee access to the body it points to.

I think in order to talk about module maps we should figure out these points as a group so that we can move forward without getting into repeated questions about what module maps are/should become to support modules that come from in-memory locations.

@bmeck changed the title from "2-1 Module Maps integrating with in Memory Resources" to "Phase 2-1: Module Maps integrating with in Memory Resources" on Nov 26, 2018
@devsnek
Member

devsnek commented Nov 26, 2018

my quick take on these

  • Are Locations only relevant for Modules?

maybe i'm missing the definition of "location" in this context, but anything we resolve via fs or module or esm has a location in digital space of where it resides, whether that be file:///a.mjs or vm:module(0) or /Users/snek/transactions.csv

  • Are Locations able to collide at a global level?

given the above, i should hope not.

  • Locations can be determined before format or body of the Module Record is known.

Like the issue before about using a cjs: scheme, i will argue that the location of something does not determine what it is, and therefore must always be known before running checks to tell what it is.

  • Imports can be resolved prior to body of a Module Record being fully parsed.

you have to at least perform a lexer pass of the entire body, and at that point building a node tree isn't much more expensive (at least for native engines, i dunno about stuff like acorn or babylon)

  • Location does not guarantee access to the body it points to.

given how every existing filesystem and resolution system works, where you can know the location of something without having permission to read it, i wouldn't expect this to be any different.

@bmeck
Member Author

bmeck commented Nov 26, 2018

maybe i'm missing the definition of "location" in this context, but anything we resolve via fs or module or esm has a location in digital space of where it resides, whether that be file:///a.mjs or vm:module(0) or /Users/snek/transactions.csv

Yes, but things like /Users/snek/transactions.csv might never be loaded using import even if they occupy a location in that digital space. Additionally, that digital space might not be forgeable (i.e. it cannot be reproduced without having a path to the original address).

given the above, i should hope not.

this gets a bit odd since in the document when talking about things, browser inline script tags do in fact share location for example.

you have to at least perform a lexer pass of the entire body, and at that point building a node tree isn't much more expensive (at least for native engines, i dunno about stuff like acorn or babylon)

you can perform these passes as you stream text; this is how the preload parser in browsers does things for HTML, and it can be applied to ESM whenever the streaming module compiler is finished for V8.

@jkrems
Contributor

jkrems commented Nov 26, 2018

Are Locations able to collide at a global level?

Depends on what "collide" means. E.g. if we allow per-context hooks into loading, then the same location string might refer to different logical locations. So at a "per machine" level, I think there will be collisions. At a per process level, there may be collisions. On a per context/realm level, there shouldn't be collisions. That'd be my answer.
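One way to picture the per-context answer is a sketch in which each realm owns its own map, so the same location string can name different records in different realms without colliding inside any one of them. The `Realm` class and its `register`/`lookup` methods are hypothetical names for illustration, not Node.js APIs.

```javascript
// Hypothetical sketch: one module map per realm/context.
class Realm {
  constructor(name) {
    this.name = name;
    // Keyed by location string; keys are unique only within this realm.
    this.moduleMap = new Map();
  }

  register(location, record) {
    if (this.moduleMap.has(location)) {
      throw new Error(`collision in realm ${this.name}: ${location}`);
    }
    this.moduleMap.set(location, record);
    return record;
  }

  lookup(location) {
    return this.moduleMap.get(location);
  }
}

const realmA = new Realm('A');
const realmB = new Realm('B');

// Same location string, different logical modules per realm.
realmA.register('file:///a.mjs', { source: 'export default 1;' });
realmB.register('file:///a.mjs', { source: 'export default 2;' });

const sameRecord =
  realmA.lookup('file:///a.mjs') === realmB.lookup('file:///a.mjs');
console.log(sameRecord); // false
```

Within a single realm a second `register` call for the same string throws, which is the "shouldn't be collisions" case.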

[...] i will argue that the location of something does not determine what it is, and therefore must always be known before running checks to tell what it is.

I think this misinterprets the question. This is not about "should the location string imply additional meta data". It's about "should it be possible to first determine the location string and then use it to look up meta data". This allows separation of concerns when implementing things like "load from archive" or "resolve from in-memory resolution map".

Imports can be resolved prior to body of a Module Record being fully parsed.

Looks like this is confusing and can be misunderstood (afaict). Maybe "body of all of the imported Module Records being fully parsed"?

@bmeck
Member Author

bmeck commented Nov 26, 2018

Looks like this is confusing and can be misunderstood (afaict). Maybe "body of all of the imported Module Records being fully parsed"?

Well it is about the importing module not being finished. eg.

import 'foo';
// streaming of this file waits 10s due to network or w/e
console.log('hi');

You can parse out import 'foo'; and begin to fetch it eagerly before waiting for the rest of the text.

so, I think it is more about body of the importing Module Record being fully parsed rather than imported Module Record.
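A rough sketch of that idea: scan text as it arrives and report import specifiers as soon as their line is complete, without waiting for the rest of the file. The regular expression below only recognizes imports that sit on a single line, so it is an illustrative stand-in for a real streaming lexer, and `createImportScanner` is a hypothetical name.

```javascript
// Hypothetical sketch: find import specifiers in text that arrives in chunks.
// A regex is NOT a real lexer (it ignores block comments, multi-line imports,
// and dynamic import()).
function createImportScanner(onImport) {
  let pending = ''; // text after the last newline seen so far
  const importLine =
    /^\s*import\s+(?:[^'"]*?\sfrom\s+)?(['"])([^'"]+)\1\s*;?\s*$/;

  return function push(chunk) {
    const lines = (pending + chunk).split('\n');
    pending = lines.pop(); // possibly incomplete line; keep it for the next chunk
    for (const line of lines) {
      const match = importLine.exec(line);
      if (match) onImport(match[2]); // a fetch could start here
    }
  };
}

const found = [];
const push = createImportScanner((specifier) => found.push(specifier));

push("import 'fo");
push("o';\n// streaming of this file waits");
const foundBeforeEnd = found.slice(); // snapshot before the rest arrives
push(" 10s due to network\nconsole.log('hi');\n");

console.log(foundBeforeEnd); // [ 'foo' ]
```

The specifier is already known after the second chunk, while the importing module's body is still incomplete.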

@devsnek
Member

devsnek commented Nov 26, 2018

@bmeck

You can parse out import 'foo'; and begin to fetch it eagerly before waiting for the rest of the text.

that seems totally reasonable

browser inline script tags

i wouldn't consider that multiple things colliding, since there are well defined semantics on how a document parses/evaluates and there is no way to access a single script/module in that context.

@jkrems

should it be possible to first determine the location string and then use it to look up meta data

i don't really understand this sentence. you need to know the location of something before you interact with it because otherwise you don't have the thing to interact with. what is "meta data" in this case?

@jkrems
Contributor

jkrems commented Nov 26, 2018

@devsnek "meta data" is things like content type, a more general "it can be read", and any other things that aren't the actual bytes of content. In this case it is what motivates the split between locations, resources, and modules. A system may determine a location without necessarily worrying about how to retrieve a resource from that location. If locations, resources, and modules are tied into one, that kind of isolation is no longer possible.
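As a minimal sketch of that split, the three steps can be kept as separate functions: resolution produces only a location string, a metadata lookup answers questions such as content type and readability, and retrieval is the only step that touches the bytes. The in-memory `store` and the function names `resolve`, `describe`, and `read` are assumptions made for illustration.

```javascript
// Hypothetical sketch: locations, metadata, and content as separate steps.
const store = new Map([
  ['file:///app/a.mjs',
    { contentType: 'text/javascript', readable: true, body: 'export default 1;' }],
  ['file:///app/secret.mjs',
    { contentType: 'text/javascript', readable: false, body: 'export default 2;' }],
]);

// Step 1: resolution yields only a location string; it never touches content.
function resolve(specifier, base) {
  return new URL(specifier, base).href;
}

// Step 2: metadata lookup by location; still no bytes are handed out.
function describe(location) {
  const entry = store.get(location);
  if (!entry) return null;
  return { contentType: entry.contentType, readable: entry.readable };
}

// Step 3: retrieval, which may fail even though the location is known.
function read(location) {
  const entry = store.get(location);
  if (!entry || !entry.readable) {
    throw new Error(`cannot read ${location}`);
  }
  return entry.body;
}

const secretLocation = resolve('./secret.mjs', 'file:///app/main.mjs');
console.log(secretLocation);                    // file:///app/secret.mjs
console.log(describe(secretLocation).readable); // false
```

Because `resolve` never consults `store`, a system that only needs locations (for example an in-memory resolution map) can be swapped in without knowing how resources are retrieved.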

@devsnek
Member

devsnek commented Nov 26, 2018

A system may determine a location without necessarily worrying about how to retrieve a resource from that location.

yes. this is the sane way to set this up. (see: *nix and windows nt methods for dealing with files, where first resolution occurs, and then you can access stuff like file type and permissions and whatnot, and if allowed, read/write/exec/etc)

@bmeck
Member Author

bmeck commented Dec 3, 2018

We should set up a meeting to discuss these things since we have had a bit of time to think about and review existing work on it.

How would people feel about setting up some time later this week to do so? To gauge availability and interest, we can try and schedule this via a doodle: https://doodle.com/poll/zs7mbgz3hp7tax75

I think the main thing we need to discuss is what exactly we are trying to specify as needed to accomplish 2-1. For example:

  • reach some overall agreement on what a Module Map is per discussing 2-1 (this might be doable purely in this thread)
  • identify how resources that are not loaded via URL can be placed into a Module Map that previously has been keyed off URLs
  • identify which parts of 2-1 are not solely tied to modules and can/should be moved to a different discussion group
  • figure out any restrictions we want to place in order to guarantee some behavior and/or reserve something for a future proposal

@jkrems
Contributor

jkrems commented Dec 4, 2018

Added myself to the doodle. Thanks for setting this up!

@bmeck
Member Author

bmeck commented Dec 4, 2018

Let's aim for Wednesday 4-5PM CT via hangouts like other phases have used.

In your local time:

Hangout Link

@SMotaal

SMotaal commented Dec 5, 2018

@bmeck I'm not able to add myself to the doodle but would like to participate if possible.

@bmeck
Member Author

bmeck commented Dec 5, 2018

@SMotaal feel free to join via the hangout link above

@SMotaal

SMotaal commented Dec 6, 2018

Very interesting discussion yesterday. But there was one possible outcome I was trying to work out based on the clear advantages for each approach:

Suppose there is a top loader, ie it controls the key-to-record operations, and using such a loader effectively determines and controls which chainable loaders can be chained, enforcing at the very least the key type of their choice.

  1. How much overhead (if it is possible) would it take to make such a design work for both types of top loaders?

  2. Can (likely not) this be merely a loader API consumed in ESM-based loaders? Any thoughts on how we can mitigate the downsides of bindings from ever affecting end-users (where historically packages with such build steps were cause for unexpected issues by the end-users)?

  3. If the above are hinting at something reasonable, can we say that in some cases, where handles are introduced, that such handles can be remapped internally as unique strings if necessary, where such strings can effectively be partitioned out at the top loader level if they use handles?

@SMotaal

SMotaal commented Dec 19, 2018

@weswigham @jkrems @bmeck

Would really appreciate your views on the direction suggested.

@bmeck
Member Author

bmeck commented Dec 19, 2018

@SMotaal I don't understand the proposal. Which keys and which records are being talked about? Also, note that loaders should not be able to mutate behaviors of other loaders by changing how import works within other loaders.

If we are talking about requested import specifier tuples then controlling them to point to records is the idea that we talked about at the last meeting. I don't understand, however, how that is different from chainable loaders. What are the distinguishing features of these 2 workflows? I only see a single workflow of loaders calling out to other ones: first loader -> second loader -> ... -> node's default loader.

@weswigham
Contributor

The only thing that I found core was that the "string associated with a module" was not necessarily unique under many schemes and that module identity should be distinct from that to handle a bunch of use cases (asset references, dynamically generated modules) while keeping the API hard to use insecurely (so original security context information could travel alongside the identity handle).
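A small sketch of what a distinct identity could look like, assuming identity is an opaque object handle that carries both the associated string and its originating security context. `createIdentity` and the `securityContext` values are hypothetical; nothing here is an agreed-upon design from the meeting.

```javascript
// Hypothetical sketch: module identity as an opaque handle, not a string.
function createIdentity(url, securityContext) {
  // Frozen so the originating context cannot be rewritten after creation.
  return Object.freeze({ url, securityContext });
}

const moduleMap = new Map(); // keyed by handle identity, not by url text

// Two inline modules that share one associated string.
const inlineA = createIdentity('https://example.com/page.html', 'nonce-a');
const inlineB = createIdentity('https://example.com/page.html', 'nonce-b');

moduleMap.set(inlineA, { source: 'export default "a";' });
moduleMap.set(inlineB, { source: 'export default "b";' });

console.log(inlineA.url === inlineB.url); // true
console.log(moduleMap.size);              // 2

// A forged handle with the same fields does not match any entry.
const forged = createIdentity('https://example.com/page.html', 'nonce-a');
console.log(moduleMap.has(forged));       // false
```

Keying on the handle rather than the string is what keeps the two inline modules apart and makes a lookalike handle useless.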

@SMotaal

SMotaal commented Dec 23, 2018

In more detail, the approach I was considering separates the key from the actual URL of the module (i.e. import.meta.url) in the loader layer, where we can potentially still use URLs as keys and rely on non-standard schemes to securely marshal module requests between loaders. The internal loader uses those schemes to determine which loader will be used to resolve links, etc.

In special cases where extra security is needed, a custom loader is paired with a randomized scheme and obfuscates the requested urls with hashes kept in a private hash-to-request map passed to the custom loader during initialization. This map can potentially be prepopulated (ie running threads) not to dwell on the details, but working with multiple instances of the same custom loader across threads where each has its own randomized scheme can still be coordinated. With this approach, the private map is where we offset the more complex aspects of supporting things like handles.

So even if the strings were mishandled somewhere, they cannot be used to spin up a hacked spawn that will request the private mapped string because the unique and randomly generated scheme which the custom loader receives (and can potentially validate before any bootstrapping) provides the additional compartmentalization to reasonably lock down such access.

I am not certain enough about how handles are handled across threads and welcome any insights that could lend to alternatives in our discussions.


Side Note: I created a project board per last meeting and would appreciate if we can collectively populate the Module Maps column.

@bmeck
Member Author

bmeck commented Dec 30, 2018

@SMotaal I'm not following this explanation; there are multiple loaders, schemes, and maps being mentioned but it isn't exactly clear how they interact.

@bmeck
Member Author

bmeck commented Jan 7, 2019

Since people are likely back from vacations this week, can we schedule another meeting on how we want to have APIs consume locations/contents of resources? I have set up a doodle: https://doodle.com/poll/au28mhxyeg43eenn

In particular we have a few APIs of note that I can think of:

Loader Realm:

  • Loaders referencing non-synthetic locations.
  • Loaders referencing synthetic locations.

These operations are likely to use the same data structures that unwrap to specific kinds either by inheritance or reflective operations. We should figure out the operations we need here, in particular we have the general hooks from the spec that are:

HostResolveImportedModule: (specifier: string, referrer: Identity) => To<ModuleRecord>

Differentiating synthetic from non-synthetic locations needs to avoid accidental coercions to the opposite kind ideally. Service workers using cache APIs are a nice way to avoid this, by placing all locations into a Response based identity that can be consumed only once and stops holding onto referenced memory afterwards.

I think the actual resolution of locations can be left out of this discussion since it applies to general resource operations and we should focus on what To<ModuleRecord> needs as fields (internal or reflective) for the host to consume.
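To make the shape of that hook concrete, here is a minimal sketch. It assumes the referrer `Identity` is an object with a `url` field and adds a `load` callback parameter that stands in for whatever produces the record; both are illustrative choices and not part of the spec signature. The caching reflects the spec's requirement that the same (referrer, specifier) pair keeps resolving to the same Module Record.

```javascript
// Hypothetical sketch of a host hook shaped like
// (specifier: string, referrer: Identity) => ModuleRecord
const cache = new WeakMap(); // referrer identity -> Map(specifier -> record)

function hostResolveImportedModule(specifier, referrer, load) {
  let perReferrer = cache.get(referrer);
  if (!perReferrer) {
    perReferrer = new Map();
    cache.set(referrer, perReferrer);
  }
  if (!perReferrer.has(specifier)) {
    // ASSUMPTION: identities expose a url that relative specifiers resolve against.
    const location = new URL(specifier, referrer.url).href;
    perReferrer.set(specifier, load(location));
  }
  // Repeated calls with the same pair return the same record.
  return perReferrer.get(specifier);
}

const referrer = { url: 'file:///app/main.mjs' };
let loads = 0;
const load = (location) => ({ location, id: ++loads });

const first = hostResolveImportedModule('./dep.mjs', referrer, load);
const second = hostResolveImportedModule('./dep.mjs', referrer, load);
console.log(first === second, loads); // true 1
console.log(first.location);          // file:///app/dep.mjs
```

Keying the outer cache on the referrer object rather than on its string means a synthetic referrer never gets confused with a non-synthetic one that happens to share a URL.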

Main+Loader Realm:

  • Ability to create synthetic location.
  • Ability to put content into a synthetic location.

These are likely to be 2 distinct operations even if they have 1 API. Per discussions in w3c/FileAPI#97 there is a need to generate locations prior to having content for circular dependencies. This can be achieved with multiple API forms, such as but not limited to something like:

getNewLocation((assignContent) => {assignContent(stream, {...metadata})});

There needs to be consensus on what assignContent does when called multiple times and what metadata is required (such as status and/or format) and what is dropped.
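A minimal sketch of one possible behavior, to help frame that discussion. It assumes the location is a generated string, that the body is a plain string rather than a stream, and that a second `assignContent` call throws; none of these are decided in this thread, and the `synthetic:` prefix and `contents` map are made up for illustration.

```javascript
// Hypothetical sketch of getNewLocation; the thread leaves its semantics open.
let counter = 0;
const contents = new Map(); // location -> { body, metadata }

function getNewLocation(init) {
  const location = `synthetic:${counter++}`;
  let assigned = false;
  const assignContent = (body, metadata = {}) => {
    // ASSUMPTION: a second call is an error; this has not been agreed on.
    if (assigned) throw new Error(`${location} already has content`);
    assigned = true;
    contents.set(location, { body, metadata });
  };
  init(assignContent);
  return location;
}

// Reserve both locations first so each body can mention the other.
let fillA, fillB;
const locA = getNewLocation((assign) => { fillA = assign; });
const locB = getNewLocation((assign) => { fillB = assign; });

fillA(`import '${locB}';`, { format: 'module' });
fillB(`import '${locA}';`, { format: 'module' });

console.log(locA, locB);              // synthetic:0 synthetic:1
console.log(contents.get(locA).body); // import 'synthetic:1';
```

Creating the location and assigning its content as two steps is what lets the circular pair be expressed at all.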

These are generic operations but they must cooperate with To<ModuleRecord> in the topics above. In particular a list of fields that form a primary key to uniquely identify a location should be decided upon, and with the expectation that this list of fields may grow due to things like cross Realm imports and/or Compartments isolating modules within a single Realm.

  • Referencing contents from locations.

There is a general need to be able to consume location contents. In general this likely just means that we need a generic way to retrieve the data passed to assignContent. Node.js' JS streams are quite slow and lead to a lot of GC. In addition, we should try to allow for duplicated uses of resources while eagerly freeing resources when they are unlikely to be used again. Looking at how Response has .clone might give us insight here.
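As a sketch of the consume-once-plus-clone idea, loosely modeled on how a fetch Response refuses to be read or cloned after its body has been used: `SingleUseBody` is a hypothetical class, and it is synchronous and string-based for brevity, whereas a real Response body is a stream read asynchronously.

```javascript
// Hypothetical sketch: a body that can be read once, with an explicit clone().
class SingleUseBody {
  constructor(text) {
    this._text = text;
    this.used = false;
  }

  // Must be called before the body is consumed, like Response's clone().
  clone() {
    if (this.used) throw new TypeError('body already used');
    return new SingleUseBody(this._text);
  }

  text() {
    if (this.used) throw new TypeError('body already used');
    this.used = true;
    const value = this._text;
    this._text = null; // drop the reference so the memory can be reclaimed
    return value;
  }
}

const body = new SingleUseBody('export default 1;');
const copy = body.clone(); // opt in to a second use up front

console.log(body.text()); // export default 1;
console.log(copy.text()); // export default 1;

let error = null;
try { body.text(); } catch (e) { error = e; }
console.log(error instanceof TypeError); // true
```

Duplicated use is possible but has to be requested before consumption, so the default path frees the content eagerly.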

@bmeck
Member Author

bmeck commented Jan 9, 2019

Friday 10AM PT seems to work, I have made a calendar event which will use hangouts.

@bmeck
Member Author

bmeck commented Jan 11, 2019

We had our meeting and had a few conclusions:

Using a stripped down Response for To<ModuleRecord> seems fine. In particular we want the following:

  1. Avoid unnecessary API surface such as status codes, leaving us with the following:

    1. response.ok - for if the ModuleRecord is supposed to be an error
    2. response.redirected - for if the ModuleRecord is actually a redirect like package.json#main
    3. response.headers.get('content-type') - for the type of ModuleRecord. This must not require string parsing. A helper must be exposed such as a MIME datatype or other.
  2. We need another API besides new Response in order to form the Identity of a ModuleRecord prior to its body.

We did not agree on how to represent Identity, having separate thoughts on using Strings vs AssetReferences. We will schedule another talk on that specific issue.
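The stripped-down shape from the list above could be sketched roughly as follows. The `createModuleResponse` and `parseMime` helpers, and the choice to expose the parsed type as a `mime` field, are assumptions for illustration; the meeting only concluded that consumers should not have to parse the content-type string themselves.

```javascript
// Hypothetical sketch of a stripped-down, Response-like value for a module.
function parseMime(value) {
  // Minimal "type/subtype; key=value" split; not a full MIME parser.
  const [essence, ...params] = value.split(';').map((part) => part.trim());
  const [type, subtype] = essence.toLowerCase().split('/');
  const parameters = new Map(
    params.filter(Boolean).map((param) => {
      const [key, val = ''] = param.split('=');
      return [key.trim().toLowerCase(), val.trim()];
    })
  );
  return Object.freeze({ type, subtype, essence: `${type}/${subtype}`, parameters });
}

function createModuleResponse({ ok = true, redirected = false, contentType, body }) {
  return Object.freeze({
    ok,         // false when the record is supposed to be an error
    redirected, // true when the record is a redirect, e.g. package.json#main
    // Exposed pre-parsed so consumers never parse the header string themselves.
    mime: parseMime(contentType),
    body,
  });
}

const response = createModuleResponse({
  contentType: 'text/javascript; charset=utf-8',
  body: 'export default 1;',
});

console.log(response.ok, response.redirected);        // true false
console.log(response.mime.essence);                   // text/javascript
console.log(response.mime.parameters.get('charset')); // utf-8
```

Note that this sketch still bundles the body with the rest of the fields; forming an Identity before the body exists would need a separate API, as point 2 says.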

@bmeck
Member Author

bmeck commented Feb 8, 2019

@jkrems are we still wanting to have a meeting on strings vs reference types?

@MylesBorins
Contributor

Can this be closed?

@MylesBorins
Contributor

Closing. Please reopen if needed.
