[Feature] Toggle contenthash for all output filenames #518
Thanks for logging this. Hashing of entry points is indeed on the agenda. One of the hold-ups is that I haven't yet investigated what other bundlers do as a point of comparison. I actually like Rollup's solution to this since it solves multiple things in an elegant way (e.g. sub-directories too). Thanks for linking their docs. The full solution would be something like #553, but it'd be nice to have something built in. The other reason is that my current approach of generating the file name by hashing the file doesn't work in this case. Entry points caused by dynamic …
Sounds good! Worth mentioning that Rollup's hashing isn't deterministic* and is/was up for reconsideration. I have personally run into this issue a few times.

*While I don't understand it fully, I think their hashing is based on import order and reference counting. In that sense it's deterministic, but the order in which Rollup follows imports may differ: the output contents may be identical, yet live under different output names. IMO the hash should be 100% content-based.
I think the reason Rollup's hashes aren't 100% content-based is that Rollup doesn't know the content at the hashing stage. E.g. if file A imports file B, then Rollup writes something like …
It may or may not be useful, but I had this need while switching over to Snowpack, and I wrote a library to help do that: https://github.com/TylorS/typed-content-hash
Hashing based on content is awesome for caching; I was able to do that in my old Webpack build, unless I misunderstood what they meant by …

The level of control that Webpack offers in regard to output file names and locations (both JS and assets) is really good. I was able to replicate that for assets using plugins in esbuild, but that's not easily feasible for JS files because any edit to the output files will likely break the sourcemaps…
@Ventajou FWIW, the library I mentioned above will remap all your sourcemaps with the hash changes.
Hey @evanw, I don't know much Go, but I would love to help out here if I can. I'd be willing to contribute code if pointed in the right direction, but in the meantime I can offer some unsolicited advice on the algorithm in the library I linked above. I used Rollup as the basis for my algorithm since this thread mentioned it, so I'll try to break it down. Once you have the otherwise-final output (including banners/footers/etc.):

1.) Calculate strongly connected components

Since you're tackling code-splitting and care a lot about perf, you might have already encountered the need to calculate strongly connected components using something like Tarjan's algorithm. This lets you convert a cyclic graph into an acyclic one, usually represented as a list of lists. If a nested list contains a single item there is no cycle; if it contains multiple items, they are strongly connected, i.e. form a cycle:

`const output = [ ['a'], ['b', 'c'], ['d'] ] // b + c would be a cycle`

Conveniently, Tarjan's algorithm will already have produced a topological sort, so you shouldn't need to do any additional sorting.

2.) Sequentially rewrite imports and compute hashes

As you traverse the sorted list, when you encounter a component with no cycle you can 1) rewrite its import/export specifiers using the previously calculated hashes of its dependencies, then 2) calculate the hash for this file and cache it for the next items in the list.

When you encounter components that represent cycles, you'll quickly notice it's impossible to follow the same pattern. Instead, potentially in parallel, you create a hash for each item in the cycle up front, before rewriting the imports. To do so, you concatenate the document's contents with the contents of all of its dependencies recursively, excluding anything that has already been concatenated. This keeps the hash deterministically content-based while still ensuring dependents get new hashes as their dependencies change. Again, cache the computed hashes for later iterations. Repeat until you've made it through all the lists.

I hope that made some sense written out; I'd be happy to clarify any points and help figure out specifics for esbuild. I have some TypeScript code samples if those would be helpful as well. I could also see going about it differently, calculating all content hashes in parallel by using the cycle algorithm for everything.

I'm not sure if Go is better equipped to handle this than JS, but it could be worth a try. I was worried it would scale poorly with large/complex dependency graphs.
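The two steps above (a Tarjan SCC pass, then a dependency-first hashing sweep) can be sketched roughly as follows. This is a toy illustration, not code from typed-content-hash: the module names, contents, and 8-character hash length are made up, and real import rewriting is reduced to feeding dependency hashes into the digest.

```typescript
import { createHash } from "node:crypto";

type Mod = { content: string; deps: string[] };

// Tarjan's algorithm: returns strongly connected components with
// dependencies emitted before their dependents, so no extra sort is needed.
function tarjan(graph: Record<string, Mod>): string[][] {
  const index = new Map<string, number>();
  const low = new Map<string, number>();
  const onStack = new Set<string>();
  const stack: string[] = [];
  const sccs: string[][] = [];
  let counter = 0;

  function strongConnect(v: string): void {
    index.set(v, counter);
    low.set(v, counter);
    counter++;
    stack.push(v);
    onStack.add(v);
    for (const w of graph[v].deps) {
      if (!index.has(w)) {
        strongConnect(w);
        low.set(v, Math.min(low.get(v)!, low.get(w)!));
      } else if (onStack.has(w)) {
        low.set(v, Math.min(low.get(v)!, index.get(w)!));
      }
    }
    if (low.get(v) === index.get(v)) {
      const scc: string[] = [];
      let w: string;
      do {
        w = stack.pop()!;
        onStack.delete(w);
        scc.push(w);
      } while (w !== v);
      sccs.push(scc);
    }
  }

  for (const v of Object.keys(graph)) if (!index.has(v)) strongConnect(v);
  return sccs;
}

// Dependency-first hashing: singleton components hash their own content plus
// their dependencies' cached hashes; cyclic components hash the concatenated
// contents of every member, so any change inside the cycle renames them all.
function hashAll(graph: Record<string, Mod>): Map<string, string> {
  const hashes = new Map<string, string>();
  for (const scc of tarjan(graph)) {
    const cycle = scc.length > 1;
    const shared = cycle ? scc.map((m) => graph[m].content).join("\n") : "";
    for (const m of scc) {
      const h = createHash("sha256");
      h.update(cycle ? shared : graph[m].content);
      for (const d of graph[m].deps) {
        const dep = hashes.get(d);
        if (dep !== undefined) h.update(dep);
      }
      hashes.set(m, h.digest("hex").slice(0, 8));
    }
  }
  return hashes;
}
```

The output is deterministic because Tarjan's traversal order is fixed by the (deterministic) iteration order of the module table, which is the content-based property argued for above.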
Yes, I'm already thinking along similar lines. Not sure if this is exactly what you said or just very similar, but I can describe the algorithm I am currently in the middle of implementing. It's done in three phases:
I think this should naturally handle cycles without having to split them up into connected components first. It also lets the whole graph go in parallel instead of getting bottlenecked waiting for dependencies to finish first.
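Since the phase list itself didn't survive above, here is a minimal sketch of how a placeholder-based, three-phase pipeline along these lines could look. The placeholder format (`__HASH_<chunk>__`), the 8-character hashes, and the exact phase boundaries are my assumptions, not necessarily esbuild's actual implementation.

```typescript
import { createHash } from "node:crypto";

type Chunk = { text: string; deps: string[] }; // text contains placeholders

const sha = (s: string) =>
  createHash("sha256").update(s).digest("hex").slice(0, 8);

// Phase 1 (parallelizable per chunk): hash each chunk's own content with the
// placeholder ids still in place, so no dependency hash is needed yet.
function phase1(chunks: Record<string, Chunk>): Map<string, string> {
  const own = new Map<string, string>();
  for (const [name, c] of Object.entries(chunks)) own.set(name, sha(c.text));
  return own;
}

// Phase 2: a chunk's final hash combines its name with the own-hashes of
// everything it can transitively reach (including itself). Collecting the
// reachable set first handles import cycles without any special casing.
function phase2(
  chunks: Record<string, Chunk>,
  own: Map<string, string>
): Map<string, string> {
  const final = new Map<string, string>();
  for (const name of Object.keys(chunks)) {
    const reachable = new Set<string>([name]);
    const stack = [name];
    while (stack.length > 0) {
      for (const d of chunks[stack.pop()!].deps) {
        if (!reachable.has(d)) {
          reachable.add(d);
          stack.push(d);
        }
      }
    }
    const combined = [...reachable].sort().map((n) => own.get(n)!).join("");
    final.set(name, sha(name + combined));
  }
  return final;
}

// Phase 3 (parallelizable per chunk): substitute final hashes for the
// placeholders; nothing here depends on any other chunk's phase-3 work.
function phase3(
  chunks: Record<string, Chunk>,
  final: Map<string, string>
): Map<string, string> {
  const out = new Map<string, string>();
  for (const [name, c] of Object.entries(chunks)) {
    let text = c.text;
    for (const d of c.deps) text = text.split(`__HASH_${d}__`).join(final.get(d)!);
    out.set(name, text);
  }
  return out;
}
```

Note that phases 1 and 3 touch each chunk independently, which is where the parallelism claim comes from; only phase 2 looks at the graph as a whole.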
Can you explain this part in a little more detail? The "random id" part worries me on its own: will this mean that hashing isn't deterministic? Between builds, output filenames should only get a new hash if their contents changed, and not be subject to a randomized base for hashing on each pass.
It's just a placeholder that gets completely removed at the end and that is excluded from the content hash, so all builds would still be completely deterministic. The point of the long random id is just so that it's 100% unlikely to end up being confused with the actual source code. I was thinking that if you used something like …

The only way it wouldn't be deterministic is if you have a plugin that does something non-deterministic, like sorting imports based on their name, I guess. I could also use a deterministic seed for the random number generator, but then you get back into the same problem where the id might end up in the source code: output from one build might somehow make it into another build? Maybe that's not something to worry about, though. Hope that makes sense. Any thoughts after reading that?
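A toy demonstration of that determinism claim: the placeholder is stripped before hashing and fully replaced at the end, so two builds with different random ids still produce byte-identical output and the same hash. The dependency's final name here is made up for illustration.

```typescript
import { createHash, randomBytes } from "node:crypto";

// One "build": the placeholder marks where the dependency's final path goes.
// It is stripped before hashing (so the hash ignores it) and replaced with
// the real path at the end (so the output never contains it).
function build(placeholder: string): { hash: string; output: string } {
  const depFinalName = "dep.abc123.js"; // pretend this was resolved already
  const content = `import "./${placeholder}";\nconsole.log("app");`;
  const hash = createHash("sha256")
    .update(content.split(placeholder).join(""))
    .digest("hex")
    .slice(0, 8);
  return { hash, output: content.split(placeholder).join(depFinalName) };
}
```

Running `build` twice with two freshly generated random ids yields identical hashes and identical output, which is the sense in which the randomness never leaks into the result.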
Ah, that sounds great, thanks! No immediate thoughts, except for a possibly useless suggestion: you may want to hash the source files' absolute file paths instead of entrusting a rand() for the random id (which is going to be fine basically all the time).
Hey @evanw, thanks for getting back. This does indeed sound very similar, with the addition of the temporary random ids and using paths as part of the hash. The random ids make sense to me for replacements; I just kept track of start/end offsets when parsing for dependencies, but I didn't have bundling or plugins to worry about. Adding the output path to the template doesn't quite make as much sense to me; could you elaborate on the intent? My understanding would be that if module …
To avoid bottlenecks, all substitutions of the final paths into the final output files (i.e. phase 3) happen in parallel. This means that when determining the final path of …

Specifically: …

If you don't include the output path template then you have this: …

In which case …
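Since the comparison code above didn't survive, here is my best guess at the point being made, as a sketch: if the final path is a pure function of the template and the hash, and the template itself is folded into the hash, then any chunk can compute any other chunk's final path during parallel substitution, and changing the template changes the name even when the content is unchanged. The `[name]`/`[hash]` tokens and 8-character hash are assumptions of this sketch.

```typescript
import { createHash } from "node:crypto";

// Assumed scheme: the hash covers the output path template *and* the content,
// and the final path is derived from (template, name, hash) alone. That makes
// the phase-3 substitution embarrassingly parallel: no chunk has to wait for
// another chunk to be written before knowing its path.
function finalPath(template: string, name: string, content: string): string {
  const hash = createHash("sha256")
    .update(template)
    .update(content)
    .digest("hex")
    .slice(0, 8);
  return template.split("[name]").join(name).split("[hash]").join(hash);
}
```

Under this scheme, moving chunks into a subdirectory (a different template) renames them even though their bytes are unchanged, which is exactly the coupling being discussed.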
That all appears sound to me now that you broke it down for me, thanks! I'm really excited for this feature! Also many thanks for this project ❤️, I really appreciate the goal of keeping the scope limited and focused.

Good to hear. Thanks for sanity-checking it!
Code-splitting already produces chunks that are hashed (`chunk.[contenthash].js`), which is awesome, but the entry files themselves are still written with filenames that match the inputs, unless `outfile` is given something specific. Ideally, one would have the ability to include a content hash as part of the entry (and/or all) file outputs too.

Rollup does this via `output.entryFileNames`, but all the template patterns aren't necessary IMO, nor is a string template. Perhaps there could be a global `build()` option called `contenthash: boolean`? (Not suggesting an `out`-prefix since it'd apply generally, much like `minify` and `sourcemap`.)

When disabled, filenames are untouched. When enabled, the content hash is calculated and injected into all file names (`bundle.js` -> `bundle.[hash].js`).

This may already be on the agenda via the #268 (comment) discussion, but after searching a bit, I couldn't find anything that addressed non-chunk hashing specifically.
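The requested toggle could behave roughly like this sketch, where `contenthash` is the proposed option name (not an existing esbuild flag) and the 8-character hash length is arbitrary:

```typescript
import { createHash } from "node:crypto";

// Proposed behavior: with contenthash disabled the name passes through
// untouched; with it enabled, a hash of the contents is injected before the
// extension (bundle.js -> bundle.<hash>.js).
function outputName(name: string, contents: string, contenthash: boolean): string {
  if (!contenthash) return name;
  const hash = createHash("sha256").update(contents).digest("hex").slice(0, 8);
  const dot = name.lastIndexOf(".");
  return dot === -1
    ? `${name}.${hash}`
    : `${name.slice(0, dot)}.${hash}${name.slice(dot)}`;
}
```

The boolean mirrors flags like `minify` and `sourcemap`: no template syntax to learn, and unchanged contents always map to an unchanged filename.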