-
Notifications
You must be signed in to change notification settings - Fork 696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove the concept of "file extension" from data-files/extra-source-files wildcards #5883
Comments
Also from the docs:
This rationale makes sense to motivat not adding super sophisticated globbing features, there's no point artificially restricting trivial use cases. |
Agreed that this restriction isn't pulling its weight. The implementation would actually have been simpler if this restriction had never existed, but we don't live in that world. The only snare is that we'll want to have the less-restricted globs only if the |
This is a big pain for Idris in getting it working with |
Iirc the extension is there to prevent sdisting tmp files, e.g. .dataFile.txt.swp or dataFile.txt~ or whatever hidden there.
And we don’t want maintain the list of all possible tmp file variants either.
… On 13 Apr 2019, at 17.51, Niklas Larsson ***@***.***> wrote:
This is a big pain for Idris in getting it working with new-install. Every test for the compiler has files named run and expected, so we can't use wildcards as they are.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
How about at least letting '.' be considered a filename character? Cabal is currently telling me that |
@phadej Then let's try to find another solution to exclude those, such as allowing the user to specify patterns to exclude. Enabling a convenience feature (avoiding sdisting tempfiles) by making a normal use case completely impossible (globbing normal extensioneless files) is a severe restriction and makes cabal inconvenient to use. The rest of the unix world won't add extensions to their files so that cabal can glob them. |
@nh2, please tell when you find one. "Allow user to specify patterns to exclude" is waiting for mistakes to happen: Including unnecessary files in sdist tarballs. And a mistake which is difficult to catch on CI (missing file is obvious). So I'm simply 👎 on this, before some good and concrete plan forward is written down. @chreekat use
|
@phadej one solution would be to do what hpack does: ignore hidden files (starting with '.') and otherwise just use Unix globs. I believe this would cover both cases you mentioned: .dataFile.txt.swp would be ignored, and dataFile.txt~ would not match It's true that there is no perfect solution to this, but from a Linux user's perspective, some things are more surprising than others. Cabal's behavior is pretty far out there. |
Oh I just saw you pinged me as well. Thanks for the tip! |
This sounds like a good approach to me: Ignoring dotfiles is a common behaviour, and for the case that one wants them, one could give |
@phadej Why can't I use the form |
+1 for removing the restrictions from globs when specifying data-files. I have a directory So far I was using Stack and I had following config:
That was great! This is what hpack generated, in .cabal file:
However, I am migrating to Cabal now, and the restrictions above pose following problems:
For example, I now converted the above list of data files to: data-files:
Generator/templates/Dockerfile
Generator/templates/dockerignore
Generator/templates/react-app/gitignore
Generator/templates/server/gitignore
Generator/templates/server/npmrc
Generator/templates/server/nvmrc
Generator/templates/**/*.prisma
Generator/templates/**/*.toml
Generator/templates/**/*.json
Generator/templates/**/*.ico
Generator/templates/**/*.html
Generator/templates/**/*.md
Generator/templates/**/*.js
Generator/templates/**/*.css
Generator/templates/**/*.png
Cli/bash-completion
Cli/templates/**/*.css
Cli/templates/**/*.js
Cli/templates/**/*.png But I just know, that at some point, some team member will add a file to I would advocate for more relaxed rules, where |
Please tell how to avoid |
Or we could bless git and have an option for including a .gitignore file. |
That won't work. Package author/maintainer is not the only person who sdists. (e.g. OTOH, if people want to be able to shoot themselves into the foot, i guess there is no way to convince them it's a bad idea. IMHO, globs were a bad idea, but then there weren't |
Can you explain how it's a bad idea? I guess I have no concept of the impacted workflow you are using as a counterpoint. |
I'm very much in agreement on this. Tools like |
So should we update the docs? ".cabal format is read-only. Use a different tool [which?] to generate it." |
My view is I believe we should allow specific extensions to glob expressivity which do not make it easy for users to accidentally add temp files. We should not add extensions which have a high chance of picking up temp and "junk" files. Even in the example just given, I could very well imagine a user editing the templates in some editor they don't usually use, leaving around a tilde file, and that tilde file getting picked up in the next sdist. The case of having your directories of files you actually edit littered with all manner of temp files is not uncommon. So while this request may look appealing at first, I expect it will cause problems for even the people who think "No, I'm too smart and careful, I wouldn't do that!" Sure, but can you say that for everyone else working on your project? Can you say that for yourself doing a last-minute sdist at 2am? Etc. That said, I'm also very much open to ergonomic features that can improve the UX of working with this. Perhaps there could be some way to indicate to sdist that it should report all files in a directory that are not picked up by the regexps, and it can warn on check to force you to ensure that your patterns are comprehensive. |
Tests in CI. That's why e.g. |
I was thinking e.g. |
I have to admit, in the use case I described above, which is an everyday use case for me and my team, I have never thought of the possibility of edit/tmp/swp files finding their way in there and littering the list of data files. It just never occurred to me. So you are right that some people are not aware of it, and that it could happen if one is not careful. In practice, this could be solved by people configuring their editors to not produce such files all around the disk but in a central place, however there will be situations where that is not so. On the other hand, not being able to specify relaxed enough globs, which results in likeliness of new files not being included when added by a team member that is not aware of this, is also not good. It can be prevented by having thorough testing that will catch the fact that file was not included in the runtime, but in practice many projects will not test that well enough, especially with IO testing not being a simple thing in Haskell, so there will be situations where this will result with runtime errors possibly. So this is obviously a trade-off with no clear win -> one solution may result with errors in runtime, another might result with redundant files being packed with the project, both avoidable if people stick with best practices, but in practice they will happen. I would personally prefer redundant files over runtime errors, but that is subjective. If decision here was easy, we wouldn't have so many people in this thread holding different sides to this argument. What I personally like the best as a possible approach, based on what others wrote above: Btw keep in mind that I am mostly coming from maintainer of a Haskell app, which is not getting published to Hackage and will never use sdist. I am also an author of Haskell package that is published on Hackage, so I do have some of that perspective, but there I never had to use data-files, on the other hand for this Haskell app we are using it intensively. It seems to me like a lot of arguments here are coming from the perspective of Hackage package maintainer, but that is not the only use case. p.s. What is the impact of accidentally having swp/tmp/redundant files included in package? EDIT: The more I think about it, the more I realize that runtime errors are not so directly correlated with this. They could also happen if somebody deletes the file while globs are too relaxed since cabal will not complain. And if somebody just added file, even if they don't have tests, they will very likely at least manually test it, so it is unlikely they will not realize that it is not being included. So it mostly comes down to ergonomics, and this kind of negates my argument from before to extent. With this in mind, I am actually even leaning more on the side of restricted globs (as they are now)! I didn't expect this is how my piece here will end, but there it is. |
The worst case I can imagine is leaking sensitive data to the public, but runtime errors are possible too, for example if the program reads all data files assuming they are a certain format. |
Idris2 doesn't use Haskell so we don't have this problem anymore, but I still would like the ability to use a glob that specifies a full file name at the end like |
I can't see a good reason those were blocked before. Added a PR that changes this (though of course extensions to glob syntax will require people use a sufficiently new cabal file version.) |
Currently on https://www.haskell.org/cabal/users-guide/developing-packages.html#pkg-field-data-files
On Unix, there is no concept of "file extension". That is a Windows-only concept.
I have a directory of files I'd like to include in my cabal package:
These files have no extension.
They are part of a submodule of another project, so I can't easily change them.
I can't even use the well-fitting
syscalls-table/tables/syscalls-*
, also not allowed.I think we should remove above restriction from Cabal, allowing
data/*
.Users can still do
data/*.txt
and so on where applicable, but this restriction right now doens't help anybody and increases that things are packaged incorrectly due to extra code duplication to include all relevant files manually.The text was updated successfully, but these errors were encountered: