Don't replace "@" when sanitizing paths #515
Was there a particular reason we were replacing '@OpenZeppelin' with '_openzeppelin' when saving to our repo, such as the filesystem not allowing names with '@'? Also discussed in sourcifyeth/org#63 |
No, I guess we just did not consider |
Hello, guys! So, if you do need this sanitization for any reason, can you also make a translation map file, as was proposed here? |
@tom2drum Hi, thanks for bringing this up. Yes, we will add a translation map if we make any changes to the metadata, as this would break the metadata hash. We will handle this and the related issues in the next milestone: https://github.com/ethereum/sourcify/milestone/8 |
Hi, @kuzdogan! I would like to note that replacing special characters may cause problems even if a translation map is provided. You can see it in the following example: https://repo.sourcify.dev/contracts/full_match/5/0x027f1fe8BbC2a7E9fE97868E82c6Ec6939086c52/. Here, the two files have the same name ( As an example where both contracts are available, see: |
Hi Rim. Thanks for bringing this up. This also breaks the IPFS hash of the metadata, even though the contract is marked as a We were waiting for a couple more things before doing a full review of the repository, but we should prioritize this @marcocastignoli |
I thought we were modifying the metadata.json file, which would change the file's hash, but apparently we are not doing that; we are only changing the folder names in the filesystem. For some reason the examples provided can't be fetched from IPFS, for which I created another issue. However, as others said, the sanitization breaks the mapping "path in metadata.json" --> "actual path", and without a translation file there's no way to tell how the files were renamed or moved.
An alternative @marcocastignoli mentioned is URL-encoding the path, but I believe that is a little too restrictive, as URL encoding does not allow many characters such as But let's not wait for fixing |
@rimrakhimov Curious, was this an example you came up with or something you observed in practice? |
That was an artificial example I came up with when researching ways to handle such a mapping ourselves by looking for the valid file path inside the metadata, as described in blockscout/blockscout#7648 I just got the idea that it may not always be a one-to-one correspondence and came up with the example |
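To make the collision concern concrete, here is a minimal sketch. The `sanitize` function is a hypothetical stand-in (it is not Sourcify's actual sanitizer), and `buildTranslationMap` and the file names are made up for illustration. It shows that a translation map can only be built when the sanitizer is one-to-one for the given set of names; otherwise two source unit names end up on the same stored path.

```typescript
// Hypothetical sanitizer, NOT Sourcify's actual implementation:
// every character outside A-Z, a-z, 0-9, ".", "_", "/", "-" becomes "_".
function sanitize(sourceUnitName: string): string {
  return sourceUnitName.replace(/[^A-Za-z0-9._\/-]/g, "_");
}

// Builds a translation map from the stored (sanitized) path back to the
// original source unit name. Throws when two distinct names collapse
// to the same stored path, because the map would then be ambiguous.
function buildTranslationMap(sourceUnitNames: string[]): Map<string, string> {
  const storedToOriginal = new Map<string, string>();
  for (const name of sourceUnitNames) {
    const stored = sanitize(name);
    const existing = storedToOriginal.get(stored);
    if (existing !== undefined && existing !== name) {
      throw new Error(
        `Collision: "${existing}" and "${name}" both map to "${stored}"`
      );
    }
    storedToOriginal.set(stored, name);
  }
  return storedToOriginal;
}

// No collision: the map lets a consumer recover the original name.
const translation = buildTranslationMap([
  "@openzeppelin/contracts/Ownable.sol",
  "contracts/Token.sol",
]);
console.log(translation.get("_openzeppelin/contracts/Ownable.sol"));

// Collision: "@lib/A.sol" and "_lib/A.sol" are different source units,
// but both would be stored as "_lib/A.sol".
let collisionDetected = false;
try {
  buildTranslationMap(["@lib/A.sol", "_lib/A.sol"]);
} catch {
  collisionDetected = true;
}
console.log(collisionDetected);
```

In the colliding case a translation file alone does not help, since only one of the two files can exist at the stored path.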
by url encoding the path I meant:
|
URL encoding is fine as an example but not a solution. We should avoid changing the file names as much as possible. Another implication of this, I realized, is that we currently don't allow non-ASCII filenames; they are also "sanitized". Unix should be fine with any characters, so there should be no need to "sanitize" special characters. Windows is another issue. I just wonder if the compiler allows relative paths in the metadata, something like |
this is how solidity handles |
Here are some weird "source unit names" (as the Solidity docs call them) I found with a simple repo scan:
These are full matches so these are the exact outputs of the compiler. Apparently, the compiler does resolve the paths when files are supplied over the CLI but the source unit names stay as they are with the standard JSON. So we need to assume it can be arbitrary strings. This again shows another downside of us having the contract repo directly as a filesystem. If we had a DB we could easily give source unit names as identifiers to the files. But yeah, we need to look into how to provide files over IPFS with a DB. I guess we can:
The problem with the normalization is it does not ultimately remove the leading
outputs
so I guess we need to do both |
It's called like this because what user sees as a "path" is not really a path in a general case. It's a unique identifier assigned to a particular source unit by the compiler. It comes from
Yes. Source unit names can be arbitrary byte sequences. Normally it's going to be something more or less resembling a path, but you can't rely on that. It can have special characters, Unicode characters, invalid encodings of Unicode characters, invisible control characters. It can even be an empty string. It can be a path that is fine on some system but not another. When Unicode is used, it can be in any encoding, not necessarily UTF-8. It can be a URL (if the tool supports it, like Remix) or have a protocol-like prefix (e.g. Truffle prefixes files stored under
If you use Standard JSON, you can directly specify source unit names. They can be anything you want and are not normalized or modified in any way. These names do not have to have any relation to the filesystem whatsoever. The compiler does not force you to make them look like paths, but if you do that, you can then rely on the compiler resolving relative imports containing
The complications start when you need to feed the compiler contracts already stored in a filesystem, and that happens in several situations. One of them is giving paths on the CLI - then the compiler tries to assign source unit names for you based on actual paths, in a way that will give you the same source unit name for the same physical file on disk, and will stay the same across different platforms. That source unit name also has to match the way a user would refer to that file in an import. To do that the compiler does transform the paths given on the CLI. It tries to make them relative to your working dir (or base path) and not include anything system specific, like drive letters on Windows, names of UNC shares or the name of your home directory. This is annoyingly complicated and never foolproof - there are situations where you simply cannot do that - e.g. if you give it files stored on two different Windows drives. It's also only done since 0.8.8, where we made that normalization more regular, actually compatible with
Now, contracts compiled by frameworks and tools are a different matter. They pretty much always store the contract on an actual filesystem (Truffle, Hardhat, Foundry) or in a database in a way that still uses paths to identify files (Remix). The problem is that they have their own rules of how to put that into Standard JSON. In that case translating paths to source unit names is up to the tool, and the rules may be completely different from the compiler's.
For example, some frameworks may use absolute paths (which is discouraged), others relative paths, yet others may use a prefix like Truffle. Some may sanitize them, some may not. Some may resolve symlinks, some may not. Some may only allow things from the filesystem, others may allow importing from URLs, IPFS hashes, or even source files generated on the fly (e.g. interfaces automatically generated from ABI JSON). It does not create any issues as long as it's kept inside Standard JSON, but if you want to translate it back to how the files were stored originally, that's tough, differs between tools, and in some cases may even be impossible or ambiguous.
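As an illustration of the point that Standard JSON source unit names are just keys, here is a small sketch of a compiler input whose `sources` keys are deliberately not all path-like. The `StandardJsonInput` interface covers only the fields used here, and all file names, the URL, and the `myprefix:/` prefix are made-up placeholders rather than names taken from any real tool.

```typescript
// Minimal shape of the Solidity Standard JSON input, restricted to the
// fields this sketch uses (the real format has more optional fields).
interface StandardJsonInput {
  language: "Solidity";
  sources: Record<string, { content: string }>;
  settings: { outputSelection: Record<string, Record<string, string[]>> };
}

const placeholderSource =
  "// SPDX-License-Identifier: MIT\npragma solidity ^0.8.0;\ncontract C {}\n";

const input: StandardJsonInput = {
  language: "Solidity",
  sources: {
    // Looks like a relative path.
    "contracts/Token.sol": { content: placeholderSource },
    // Package-style name with "@", the case this issue started from.
    "@openzeppelin/contracts/access/Ownable.sol": { content: placeholderSource },
    // URL-like name (hypothetical URL).
    "https://example.com/contracts/Lib.sol": { content: placeholderSource },
    // Hypothetical tool-specific prefix, plus whitespace and non-ASCII characters.
    "myprefix:/contracts/My Tökén.sol": { content: placeholderSource },
  },
  settings: { outputSelection: { "*": { "*": ["metadata"] } } },
};

// The source unit names are simply the keys of `sources`.
const sourceUnitNames = Object.keys(input.sources);
console.log(sourceUnitNames);

// The input survives JSON serialization unchanged, whatever the keys look like.
const roundTripped = JSON.parse(JSON.stringify(input)) as StandardJsonInput;
console.log(Object.keys(roundTripped.sources).length === sourceUnitNames.length);
```

None of these keys needs to exist on disk, which is why mapping them back to a filesystem layout is the hard part.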
I think that if you want to handle this in full generality, without risking that you won't be able to compile something due to weird paths, the only sane solution is to have some abstraction layer between names used in Standard JSON and names under which you store things in the filesystem. URL-encoding would be one way to do this, the main downside being that in your URLs the source unit names will stop looking like paths. You might want to come up with some slight variation to get more sensible URLs in simple cases. E.g. it would be more readable if slashes were not encoded. |
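The "URL-encoding with unencoded slashes" variation could look roughly like the sketch below. The helper names `encodeSourceUnitName` and `decodeSourceUnitName` are hypothetical, and this is not what Sourcify implements. It assumes the source unit name is a well-formed JavaScript string, so names with invalid encodings are out of scope (`encodeURIComponent` throws a `URIError` on lone surrogates). It also leaves `.`, `..`, and empty segments untouched, so those would still need separate handling before being used as filesystem paths.

```typescript
// Hypothetical helper: percent-encode each "/"-separated segment but keep
// the slashes themselves, so simple names still look like paths.
function encodeSourceUnitName(name: string): string {
  return name
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
}

// Inverse of encodeSourceUnitName: decode each segment and re-join.
// Encoded segments never contain a raw "/", so splitting is unambiguous.
function decodeSourceUnitName(stored: string): string {
  return stored
    .split("/")
    .map((segment) => decodeURIComponent(segment))
    .join("/");
}

const examples = [
  "@openzeppelin/contracts/token/ERC20/ERC20.sol",
  "contracts/My Token?.sol",
  "contracts/Tökén.sol",
  "already%2Fencoded/Name.sol",
];

for (const name of examples) {
  const stored = encodeSourceUnitName(name);
  // The mapping is reversible, so no separate translation file is needed.
  console.log(stored, decodeSourceUnitName(stored) === name);
}
```

Because the encoding is reversible, two different source unit names cannot end up at the same stored path, at the cost of less readable names whenever special characters are present.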
This was a difficult tradeoff but the final PR #1132 does the following:
The tradeoff of not URI-encoding is that files whose source unit names contain non-URI-compatible characters (whitespace, non-ASCII, etc.) or reserved URI characters ( I've opted for this because:
The other small downside is that file paths are not Windows compatible. Ultimately, we've learned that we can't assume source unit names to be paths, and a full solution would be an abstraction layer. This also suggests that we need to move toward having a DB and maybe serving the files on IPFS separately. |
sourcify/services/verification/src/services/Injector.ts
Line 492 in 462e238