Filter EnvironmentParser output by requirements file #435

Closed
a1lu opened this issue Nov 5, 2022 · 15 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed), source: environment

Comments

@a1lu
Contributor

a1lu commented Nov 5, 2022

If the EnvironmentParser is used to generate an SBOM for a project, the output also includes the dependencies of cyclonedx-python and the default packages of virtual environments (pip, pkg_resources, setuptools).
It would be nice to have a mechanism to filter the output, e.g. by passing a requirements.txt file as input to the client. I guess the existing parsers for requirements, conda, or poetry could be used for this.

@thbeu
Contributor

thbeu commented Nov 9, 2022

See also #370 (reply in thread) where I raised the same topic again.

I have a patch almost ready and can provide it here soon.

@thbeu
Contributor

thbeu commented Nov 9, 2022

Here are my changes. You can now call cyclonedx-bom with both CLI args --environment and --requirements. I tested in my setup with Py 3.9 and 3.10.

 cyclonedx_py/client.py              | 28 ++++++++++++++++++----------
 cyclonedx_py/parser/environment.py  |  6 +++++-
 cyclonedx_py/parser/requirements.py |  4 +++-
 3 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/cyclonedx_py/client.py b/cyclonedx_py/client.py
index c0ac089..df69f09 100644
--- a/cyclonedx_py/client.py
+++ b/cyclonedx_py/client.py
@@ -23,7 +23,7 @@ import enum
 import os
 import sys
 from datetime import datetime
-from typing import Optional
+from typing import List, Optional
 
 from cyclonedx.model import Tool
 from cyclonedx.model.bom import Bom
@@ -116,7 +116,10 @@ class CycloneDxCmd:
         else:
             from importlib.metadata import version as _md_version
         _this_tool_name = 'cyclonedx-bom'
-        _this_tool_version: Optional[str] = _md_version(_this_tool_name)
+        try:
+            _this_tool_version: Optional[str] = _md_version(_this_tool_name)
+        except Exception:
+            _this_tool_version = None
         bom.metadata.tools.add(Tool(
             vendor='CycloneDX',
             name=_this_tool_name,
@@ -161,7 +164,7 @@ class CycloneDxCmd:
     def get_arg_parser(*, prog: Optional[str] = None) -> argparse.ArgumentParser:
         arg_parser = argparse.ArgumentParser(prog=prog, description='CycloneDX SBOM Generator')
 
-        input_group = arg_parser.add_mutually_exclusive_group(required=True)
+        input_group = arg_parser.add_mutually_exclusive_group(required=False)
         input_group.add_argument(
             '-c', '--conda', action='store_true',
             help='Build a SBOM based on the output from `conda list --explicit` or `conda list --explicit --md5`',
@@ -172,7 +175,7 @@ class CycloneDxCmd:
             help='Build a SBOM based on the output from `conda list --json`',
             dest='input_from_conda_json'
         )
-        input_group.add_argument(
+        arg_parser.add_argument(
             '-e', '--e', '--environment', action='store_true',
             help='Build a SBOM based on the packages installed in your current Python environment (default)',
             dest='input_from_environment'
@@ -250,8 +253,18 @@ class CycloneDxCmd:
         exit(exit_code)
 
     def _get_input_parser(self) -> BaseParser:
+        if self._arguments.input_source:
+            input_data_fh = self._arguments.input_source
+            with input_data_fh:
+                input_data = input_data_fh.read()
+                input_data_fh.close()
+
         if self._arguments.input_from_environment:
-            return EnvironmentParser()
+            req_names: Optional[List[str]] = None
+            if self._arguments.input_source and self._arguments.input_from_requirements:
+                req_names = RequirementsParser(requirements_content=input_data).req_names
+
+            return EnvironmentParser(pkg_filter=req_names)
 
         # All other Parsers will require some input - grab it now!
         if not self._arguments.input_source:
@@ -277,11 +290,6 @@ class CycloneDxCmd:
                     f'No input file was supplied and no input was provided on STDIN:\n{str(e)}'
                 )
 
-        input_data_fh = self._arguments.input_source
-        with input_data_fh:
-            input_data = input_data_fh.read()
-            input_data_fh.close()
-
         if self._arguments.input_from_conda_explicit:
             return CondaListExplicitParser(conda_data=input_data,
                                            use_purl_bom_ref=self._arguments.use_purl_bom_ref)
diff --git a/cyclonedx_py/parser/environment.py b/cyclonedx_py/parser/environment.py
index c44a0ad..bd53618 100644
--- a/cyclonedx_py/parser/environment.py
+++ b/cyclonedx_py/parser/environment.py
@@ -29,6 +29,7 @@ The Environment Parsers support population of the following data about Component
 """
 
 import sys
+from typing import List, Optional
 
 # See https://github.com/package-url/packageurl-python/issues/65
 from packageurl import PackageURL  # type: ignore
@@ -51,17 +52,20 @@ from cyclonedx.parser import BaseParser
 class EnvironmentParser(BaseParser):
     """
     This will look at the current Python environment and list out all installed packages.
+    An optional filter list of package names can be considered.
 
     Best used when you have virtual Python environments per project.
     """
 
-    def __init__(self, use_purl_bom_ref: bool = False) -> None:
+    def __init__(self, use_purl_bom_ref: bool = False, pkg_filter: Optional[List[str]] = None) -> None:
         super().__init__()
 
         import pkg_resources
 
         i: DistInfoDistribution
         for i in iter(pkg_resources.working_set):
+            if bool(pkg_filter) and i.project_name not in pkg_filter:
+                continue
             purl = PackageURL(type='pypi', name=i.project_name, version=i.version)
             bom_ref = purl.to_string() if use_purl_bom_ref else None
             c = Component(name=i.project_name, bom_ref=bom_ref, version=i.version, purl=purl)
diff --git a/cyclonedx_py/parser/requirements.py b/cyclonedx_py/parser/requirements.py
index 0d334af..c71e8bd 100644
--- a/cyclonedx_py/parser/requirements.py
+++ b/cyclonedx_py/parser/requirements.py
@@ -20,7 +20,7 @@
 import os
 import os.path
 from tempfile import NamedTemporaryFile, _TemporaryFileWrapper  # Weak error
-from typing import Any, Optional
+from typing import Any, List, Optional
 
 from cyclonedx.model import HashType
 from cyclonedx.model.component import Component
@@ -76,6 +76,8 @@ class RequirementsParser(BaseParser):
         if requirements_file:
             os.unlink(requirements_file.name)
 
+        self.req_names: List[str] = [req.name for req in parsed_rf.requirements] if parsed_rf is not None else []
+
 
 class RequirementsFileParser(RequirementsParser):
 

@a1lu
Contributor Author

a1lu commented Nov 19, 2022

@thbeu Could you please create a pull request with these changes?

@thbeu
Contributor

thbeu commented Nov 20, 2022

> @thbeu Could you please create a pull request with these changes?

Will do.

@jkowalleck jkowalleck added the enhancement, source: environment, and help wanted labels Dec 11, 2022
@jkowalleck
Member

jkowalleck commented Dec 22, 2022

I gave the issue some deeper thought.

From an architectural and product point of view, I am against a filter feature (see the reasons below).
Instead, I understand the need for a new feature: in addition to auto-detecting the environment, it should be possible to set it manually.

Regarding the CLI: there should be a new option --env which would be an optional path to a python env. It would default to None meaning auto-detect.

Regarding the implementation: maybe use pkg_resources.Environment() with a non-standard search_path; or pkg_resources.WorkingSet() with a non-standard entries.

Besides an implementation, a test would be required that ships a venv or something similar and checks that the expected findings are detected and that there are no additional findings.
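
As a rough illustration of the idea (a sketch, not code from the project), the example below lists the distributions found under an explicitly given directory and falls back to auto-detection when no path is given. It swaps in `importlib.metadata.distributions(path=...)` from the standard library as a stand-in for the `pkg_resources` calls mentioned above. The names `packages_in` and `make_fake_site_packages` are hypothetical, and the fixture writes a bare `METADATA` file rather than shipping a real venv.

```python
import tempfile
from importlib.metadata import distributions
from pathlib import Path
from typing import Dict, Optional


def packages_in(site_packages: Optional[str] = None) -> Dict[str, str]:
    """Return {name: version} for distributions found under `site_packages`,
    or for the current environment when no path is given (auto-detect)."""
    found = distributions(path=[site_packages]) if site_packages else distributions()
    result: Dict[str, str] = {}
    for dist in found:
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            result[name] = dist.version
    return result


def make_fake_site_packages(root: Path, name: str, version: str) -> None:
    """Test fixture: write a minimal `*.dist-info/METADATA` file so that
    `root` looks like a site-packages directory containing one package."""
    dist_info = root / f"{name}-{version}.dist-info"
    dist_info.mkdir(parents=True)
    (dist_info / "METADATA").write_text(
        f"Metadata-Version: 2.1\nName: {name}\nVersion: {version}\n",
        encoding="utf-8",
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_fake_site_packages(Path(tmp), "demo_pkg", "1.2.3")
        # Expected finding present, and nothing from the running environment leaks in.
        assert packages_in(tmp) == {"demo_pkg": "1.2.3"}
```

Comparing against the exact expected mapping covers both halves of the test described above: the expected package is detected, and nothing else is.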

⏭️ this caused/continues in #475


Reasons against a file-based filter

If you could create a blacklist file, you could also create a whitelist (aka requirements.txt) - and use the existing parser.

Reasons against a location-based filter

The packages in your venv exist for a good reason, regardless of whether they are symlinked from the system or installed independently. They are there to be used. If you need to filter them, you should not have them installed in the first place.
Filtering them is not the job of this tool.

General reasons against a filter

If you need to strip parts of the SBOM result, you could use other existing tools for that.
This very tool's purpose is to generate complete and true results based on available information.

@jkowalleck
Member

jkowalleck commented Dec 22, 2022

Regarding the original feature request: you could install cyclonedx-py as a global tool, via pipx or something similar.
Then there would be no need to install it at the project level (conda, poetry, requirements, pipenv), so you would not see the tool and its dependencies in your SBOM results.

If you are looking for a general feature that omits dev-dependencies, then #474 is for you.

@jkowalleck jkowalleck closed this as not planned Dec 22, 2022
@a1lu
Contributor Author

a1lu commented Dec 22, 2022

> Regarding the original feature request: you could install cyclonedx-py as a global tool, via pipx or something. Then there would be no need to have it installed on a project-level (conda, poetry, requirements, pipenv), so that you would not see the tool and its dependencies in your SBOM results.

But then the SBOM of the global environment would be generated, at least this is my experience.

@jkowalleck
Member

jkowalleck commented Dec 22, 2022

re: #435 (comment)

> But then the SBOM of the global environment would be generated, at least this is my experience.

TRUE for cyclonedx-py -e -- until #475 was done. :D

@a1lu
Contributor Author

a1lu commented Dec 22, 2022

Exactly. Unfortunately the requirements parser does not / is not able to analyze the dependency versions.

@KramNamez

The python environment contains a bunch of packages like setuptools and its dependencies that aren't actually dependencies of the software I'm trying to create an SBOM for, though.

It may not be ideal to not have pinned versions in requirements.txt but it's still pretty common, and right now, that leads to an empty SBOM, which is clearly the wrong outcome. This feature would be one way to work around that, by taking the more accurate and useful parts of the environment output, but restricting it to what's actually relevant instead of just including everything and the kitchen sink.

@jkowalleck
Member

jkowalleck commented Dec 28, 2022

re: #435 (comment)
@KramNamez

Instead of pip freeze, you can use other ways to create the requirements file.
It is not the domain of the CycloneDX tool to help you create this file. The docs suggest a way you COULD do it.

I suggest using pip-compile from pip-tools.

@KramNamez

No, it's not the job of CycloneDX to create a pinned requirements file. Its job is to describe the dependencies as they are, and right now, for projects that do not have pinned versions in requirements.txt, it fails to do so. I cannot in all cases just demand that a project add tooling and create a pinned dependency file, nor would everyone just agree even if I could.

CycloneDX can get versions from the environment, but that includes irrelevant data that isn't actually a dependency.

It can get accurate information from other sources, but those aren't available in this case.

Hence this proposed feature that merges the useful parts of the environment info with the listed actual dependencies of requirements.txt.

In my personal projects, I can of course (and do in fact) pin dependencies, and pip-tools is useful for that, thanks. But that's not what I'm really concerned about.
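
The merge described above can be sketched as follows: take the package names from an unpinned requirements.txt and attach the versions actually installed in the environment. This is a simplified illustration and not a feature of cyclonedx-python. The function name `pin_from_environment` is hypothetical, the line parsing only handles plain `name` or `name<specifier>` entries (option lines such as `-r other.txt` are skipped), and transitive dependencies are not resolved.

```python
import re
from typing import Dict, List

# Leading project name of a requirement line, e.g. "requests" in "requests>=2.0".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pin_from_environment(requirements_text: str, installed: Dict[str, str]) -> List[str]:
    """Return `name==version` lines for each listed requirement that is
    installed, using the installed version. `installed` maps name -> version.
    Requirements that are not installed are silently skipped."""
    by_name = {normalize(n): (n, v) for n, v in installed.items()}
    pinned: List[str] = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # blank lines and pip options
            continue
        match = _NAME_RE.match(line)
        if match is None:
            continue
        hit = by_name.get(normalize(match.group(1)))
        if hit is not None:
            pinned.append(f"{hit[0]}=={hit[1]}")
    return pinned
```

The resulting lines could be written to a temporary file and passed to the existing requirements parser, which keeps default packages such as setuptools out of the result unless they are listed explicitly.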

@moshekaplan

> It is not the domain of the CycloneDX tool to help you create this file. The docs suggest a way you COULD do it.

Is there a better approach that could be suggested in the docs? Currently, I am trying to generate an SBOM for a Python project, and following the instructions in the documentation directly (installing CycloneDX via pip, running pip freeze, and then running cyclonedx_py on my requirements.txt) leads to my SBOM being filled with unwanted CycloneDX dependencies.

@jkowalleck
Member

please continue this discussion in the Q&A section

5 participants