Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Licensed under "Custom: https://github.com..." #63

Open
fredrikaverpil opened this issue Jan 31, 2023 · 14 comments
Open

Licensed under "Custom: https://github.com..." #63

fredrikaverpil opened this issue Jan 31, 2023 · 14 comments

Comments

@fredrikaverpil
Copy link

fredrikaverpil commented Jan 31, 2023

Hi,

Have you ever seen something like this after running the checker in GitHub Actions?

Package "[email protected]" is licensed under "Custom: https://github.com/MY-ORG/MY-REPO/actions/workflows/test.yml/badge.svg" which is not permitted by the --onlyAllow flag. Exiting.
Error: Process completed with exit code 1.

I can provide more info but figured I'd start like this. My project is using npm workspaces which might be related to this...

@fredrikaverpil
Copy link
Author

fredrikaverpil commented Jan 31, 2023

Okay, so I managed to solve it. My root package.json file looked like this:

{
  "name": "mypkg",
  "version": "1.0.0",
  "description": "Root npm project, only meant to control workspaces.",
  "workspaces": [
    "packages/mypkg"
  ],
  "scripts": {
    "blah blah": "random command here"
  },
  "devDependencies": {
    "openapi-typescript-codegen": "^0.23.0"
  }
}

And all I had to do was insert the license in the json to get rid of that weird error:

{
  "name": "mypkg",
  "version": "1.0.0",
  "description": "Root npm project, only meant to control workspaces.",
+  "license": "MYLICENSE-HERE",
  "workspaces": [
    "packages/mypkg"
  ],
  "scripts": {
    "blah blah": "random command here"
  },
  "devDependencies": {
    "openapi-typescript-codegen": "^0.23.0"
  }
}

@fredrikaverpil
Copy link
Author

However, I'm now noticing that the checker is unable to check for dependencies underneath this npm workspace.

@RSeidelsohn
Copy link
Owner

Hi @fredrikaverpil , thanks for informing me about this. I've never heard of npm workspaces before and I will have a more detailed look into the documentation - thank you for linking it in your comment.

Indeed, the license-checker-rseidelsohn (oh my, I really dislike this clumsy name, but I guess it's best to not change it again. Some of my colleagues suggested using a name space: rseidelsohn@license-checker, but this wouldn't make it more handy, I think) can not handle workspaces - yet it looks like a nice and useful feature to add.

I do not promise anything here, but I plan on dealing with it the next days. Also, of course, feel free to create a pull request on your own if you feel like it. I always appreciate any kind of input - be it a pull request or just a question or note.

Cheers, Roman.

@fredrikaverpil
Copy link
Author

@RSeidelsohn there's an open issue about this here: #36

@Flydiverny
Copy link
Contributor

Hi,

We are also seeing this occur an increasing amount of times in various of our repos as we've rolled out license checking in more of our JS repos.

Another example

Reproduction:

  1. Create a directory somewhere
  2. Add a package.json
    package.json
{
  "dependencies": {
    "cycle": "^1.0.3"
  }
}
  1. npm install
  2. npx license-checker-rseidelsohn

Output

└─ cycle@@1.0.3
   ├─ licenses: Custom: https://github.com/douglascrockford/JSON-js
   ├─ repository: https://github.com/dscape/cycle
   ├─ path: /Code/tmp/license/node_modules/cycle
   └─ licenseFile: /Code/tmp/license/node_modules/cycle/README.md

In the published version of cycle there's actually no license specified in package.json or README.md.
Its a bit unclear how it finds the first random URL to match instead of saying its unknown or something.

We've seen similar errors where some other package which has a proprietary license suddenly get parsed as having a https://badge.fury.io/js/ because the first URL like thing in the readme is markdown for a badge icon. In that packages case there is actually a license field set in the package.json, to an URL.

@Flydiverny
Copy link
Contributor

@RSeidelsohn After some debugging and digging in the code when there is a custom license specified, for example a URL in package.json
It will later fall into this code block:

if (
!moduleInfo.licenses ||
moduleInfo.licenses.indexOf(UNKNOWN) > -1 ||
moduleInfo.licenses.indexOf('Custom:') === 0
) {
//Only re-check the license if we didn't get it from elsewhere
content = fs.readFileSync(licenseFile, { encoding: 'utf8' });
moduleInfo.licenses = getLicenseTitle(content);
}

Which will then override the already correctly detected license from the license field.

@Flydiverny
Copy link
Contributor

#71 shows a failing test for this

@Flydiverny
Copy link
Contributor

I think there are two parts to this.
Not extracting URLs from the license files and README.md which is checked in different places
Like this code block and the previously linked one in comment above

} else if (getLicenseTitle(json.readme)) {
moduleInfo.licenses = getLicenseTitle(json.readme);
}

Then somehow deciding when to check these additional files or not if there already is a custom URL detected?
Is there any case where the custom license actually does need to be overriden? Like currently Custom: URL and Custom: FILE_REF are the same output. Maybe custom URL shouldn't look in additional files but FILE_REF should?

@RSeidelsohn
Copy link
Owner

Sorry guys, this is a flaw I have to fix. I'll dig into this code next - unfortunately, this code was neither created by me, nor did I get a hand-over. So I have to try to understand what's going on here and why so and only then I'll be able to fix it. Should anyone of you or someone else feel the urge to provide a fix: PR's are much appreciated!

@RSeidelsohn
Copy link
Owner

Hey @Flydiverny and @fredrikaverpil ,

I am finally on it! I had to refactor quite some of the still old logic, which helped me understand more and even find and fix another bug. I'm now quite near, but still, please bear with me. Holidays are approaching and I guess I won't do much during that time.
Still, I've already fixed the failing test you, @Flydiverny, thankfully brought in and now I have to fix one old test that is now failing. But enough for today. Just wanted to give you an update.
Your code is merged, but no new release available yet.

Cheers, Roman.

@Flydiverny
Copy link
Contributor

Thanks a lot for having a look at it @RSeidelsohn! Enjoy the holidays :) 🐣

@RSeidelsohn
Copy link
Owner

Hi @Flydiverny and @fredrikaverpil ,

a short update: I have worked a lot on the code during my vacation and just released a bugfix version 4.2.1, which at least pleases the new test you added, @Flydiverny , but still does not respect npm workspaces (I guess - I did not test this yet).
I refactored a huge part of the code which is still quite monolithic in terms of way too big functions. This is a heritage from the original license-checker, which I once took over. Refactoring this code by introducing speaking and descriptive function and variable names and splitting up the way too large functions into smaller chunks will still need a lot of time, but now it's again way better than before and hopefully enables me to do bug fixes and create new functionality faster and easier in the future.

@RSeidelsohn
Copy link
Owner

RSeidelsohn commented Apr 16, 2023

Hi,

We are also seeing this occur an increasing amount of times in various of our repos as we've rolled out license checking in more of our JS repos.

Another example

Reproduction:

  1. Create a directory somewhere
  2. Add a package.json
    package.json
{
  "dependencies": {
    "cycle": "^1.0.3"
  }
}
  1. npm install
  2. npx license-checker-rseidelsohn

Output

└─ cycle@@1.0.3
   ├─ licenses: Custom: https://github.com/douglascrockford/JSON-js
   ├─ repository: https://github.com/dscape/cycle
   ├─ path: /Code/tmp/license/node_modules/cycle
   └─ licenseFile: /Code/tmp/license/node_modules/cycle/README.md

In the published version of cycle there's actually no license specified in package.json or README.md. Its a bit unclear how it finds the first random URL to match instead of saying its unknown or something.

We've seen similar errors where some other package which has a proprietary license suddenly get parsed as having a https://badge.fury.io/js/ because the first URL like thing in the readme is markdown for a badge icon. In that packages case there is actually a license field set in the package.json, to an URL.

Ok, the whole functionality for automatically finding a license in a README file is pretty doomed. Currently, the license-checker just looks for the first URL in the README, which leads to the situation you mention. This doesn't make any sense and I have to think of a way to improve on this.

Update

I released a kind of a fix for it in version 4.2.4 - URLs in READMEs will now only be taken as a last resort if the READMEs at least include the term "license". This fixes already a lot of false hits.

@zola-25
Copy link

zola-25 commented Aug 13, 2023

I don't know if this helps in the long run, but I didn't like how the --files output [email protected] documents were sometimes including the full README markdown from dependency package that just happened to include a license header. I was also unable to get any urls included in my output.json document, even though the property was specified in my customFormatExample.json, but I may have been doing something wrong there.

So I wrote two regexes to extract everything under a license markdown header - one for

  1. ## header syntax
  2. ---- or ==== underline syntax

e.g.

## license
MIT / whatever

or

license
-------
etc

and

license
======
etc

They only extract the text until the next header in the README, or the end of file.

They are in .NET flavour but probably work for other regex flavours, just strip out the ?<license> and ?<header> tags, which is .NET specific syntax.

  1. (?smi)(?<header>^#+\s+licen[sc]es?\s*)(?<license>.+?(?=#+|\z))
  2. (?smi)(?<header>^\slicen[sc]es?.\n[=-]+\n)(?<license>.+?(?==+|-{2,}|\z))

An example of 1 at regex101 and 2 at regex101

  • In my project, the these [email protected] license files only had 'MIT' underneath the license header anyway, so not much was gained. And there were only a few packages that were outputting the full README markdown. But they didn't match against any 'proper' license texts, which was important. But they have not been tested against enough examples to be fully trusted, they just worked for my project with 100+ dependencies of varying kinds of license output.

  • I don't know much about the license resolution implementation in license-checker-rseidelsohn, but I don't think examining README files should be a fallback option. If the license type is known but the license text is not, I think it would be better to just grab the raw examples from the spdx repository and output those as the license text instead.

  • Disclaimer, this open-source licensing stuff is all very new to me, so please excuse any newbie knowledge I haven't understood.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants