From 0e4975ef02c14f00427ad64f0e1d48580b135184 Mon Sep 17 00:00:00 2001 From: swastik Date: Fri, 23 Aug 2024 16:33:30 +0530 Subject: [PATCH 1/6] Add 2024 GSoC report Compute Summary for all detected packages Signed-off-by: swastik --- docs/source/archive/gsoc-toc.rst | 11 +++- .../reports/2024/scancode_toolkit_swastkk.rst | 64 +++++++++++++++++++ 2 files changed, 74 insertions(+), 1 deletion(-) create mode 100644 docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst diff --git a/docs/source/archive/gsoc-toc.rst b/docs/source/archive/gsoc-toc.rst index dd49183..fac7e3f 100644 --- a/docs/source/archive/gsoc-toc.rst +++ b/docs/source/archive/gsoc-toc.rst @@ -6,7 +6,16 @@ GSoC -- Google Summer of Code open source software development. GSoC is completely online designed to encourage university student participation in open source software development. It was started by Google in 2005. -More about GSoc - _ +More about GSoC - ``_ + +GSoC 2024 +--------- + +.. toctree:: + :maxdepth: 2 + + gsoc/reports/2024/scancode_toolkit_swastkk + GSoC 2022 --------- diff --git a/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst b/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst new file mode 100644 index 0000000..b818056 --- /dev/null +++ b/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst @@ -0,0 +1,64 @@ +======================================================================== +Compute summary for all detected packages. +======================================================================== + + +| **Organization:** `AboutCode `_ +| **Project:** `Scancode Toolkit `_ +| **Mentee:** `Swastik Sharma (swastkk) `_ +| **Mentors:** Philippe Ombredanne, AyanSinhaMahapatra, AvishrantSh, Jonathan Yang, Jay Kumar + +Overview +-------- + +Previously we were computing the summary at the codebase level which involves `license_clarity_score`, +`declared_holder`, `other_license_expressions` and many more. 
This project aims to improve scanning accuracy +by computing summary and license clarity scores for each package and its files, rather than for the entire scan. +This involves enhancing package models, and ensuring proper attribute collection for all package ecosystems. + +Implementation +-------------- + +All the work I did is contained in `this single PR `_. +I added a new command line option called ``--package-summary`` that someone can use +to get the package level summary within a single codebase. The package level summary involves the +``license_clarity_score`` calculation and population of package attributes like ``copyright``, +``holder``, ``other_license_expression``, ``notice_text``. This option must be called with ``--classify`` +option that helps ScanCode further classify scanned files/directories, to determine whether +they fall in these categories `legal`, `readme`, `top-level`, `manifest` & ``--package`` or ``-p`` option +detects various package manifests, lockfiles and package-like data and then assembles codebase level packages +and dependencies from these package data detected at files. Also tags files if they are part of the packages. + +This change allows users to get the more refined summary for each individual package that is present in a codebase. +Also this feature improves the package assembly for various package ecosystems like npm, python-whl, rust, rubygems etc. + + +Finally, all these changes are tested through multiple unit tests validating both correct +behavior and error handling as needed. + +Post GSoC +--------- + +I would like to merge this PR into Scancode Toolkit, hopefully allowing users to leverage +this feature to expand their package/codebase scanning capabilities. 
+ +Links +----- + +`Project idea `_ + +`Official GSoC project page `_ + +`GSoC Proposal `_ + +Acknowledgements +---------------- + +I would like to thank my mentors +- `@pombredanne `_ +- `@AyanSinhaMahapatra `_ +- `@AvishrantSh `_ +- `@35C4n0r `_ + +Weekly calls were greatly helpful and those special 1:1 call with `@AyanSinhaMahapatra` and `@pombredanne` +were so amazing. Thank you for your time and your patience! From 5893d74d3c05f3fedd8f8c08a270ef6c2a19fea0 Mon Sep 17 00:00:00 2001 From: swastik Date: Fri, 23 Aug 2024 16:47:59 +0530 Subject: [PATCH 2/6] Minor nits like bullet points were fixed Signed-off-by: swastik --- .../reports/2024/scancode_toolkit_swastkk.rst | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst b/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst index b818056..fc96a70 100644 --- a/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst +++ b/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst @@ -11,17 +11,18 @@ Compute summary for all detected packages. Overview -------- -Previously we were computing the summary at the codebase level which involves `license_clarity_score`, -`declared_holder`, `other_license_expressions` and many more. This project aims to improve scanning accuracy -by computing summary and license clarity scores for each package and its files, rather than for the entire scan. -This involves enhancing package models, and ensuring proper attribute collection for all package ecosystems. +Previously, we computed the summary at the codebase level, which included elements like the +`license_clarity_score`, `declared_holder`, `other_license_expressions`, and more. +This project aims to improve scanning accuracy by computing summaries and license clarity scores for +each package and its files, rather than for the entire scan. 
This involves enhancing package models +and ensuring accurate attribute collection across all package ecosystems. Implementation -------------- All the work I did is contained in `this single PR `_. -I added a new command line option called ``--package-summary`` that someone can use -to get the package level summary within a single codebase. The package level summary involves the +I added a new command-line option called ``--package-summary`` that users can employ to obtain +a package-level summary within a single codebase. The package level summary involves the ``license_clarity_score`` calculation and population of package attributes like ``copyright``, ``holder``, ``other_license_expression``, ``notice_text``. This option must be called with ``--classify`` option that helps ScanCode further classify scanned files/directories, to determine whether @@ -55,10 +56,12 @@ Acknowledgements ---------------- I would like to thank my mentors + - `@pombredanne `_ - `@AyanSinhaMahapatra `_ - `@AvishrantSh `_ - `@35C4n0r `_ +- `@jono-yang `_ Weekly calls were greatly helpful and those special 1:1 call with `@AyanSinhaMahapatra` and `@pombredanne` were so amazing. Thank you for your time and your patience! 
From cfe10915737dcdc73a87c032accceb88d9012b24 Mon Sep 17 00:00:00 2001 From: swastik Date: Fri, 23 Aug 2024 16:57:44 +0530 Subject: [PATCH 3/6] Fix failing test with scripts Signed-off-by: swastik --- docs/source/archive/gsoc-toc.rst | 1 - .../reports/2024/scancode_toolkit_swastkk.rst | 24 ++++++++++--------- 2 files changed, 13 insertions(+), 12 deletions(-) diff --git a/docs/source/archive/gsoc-toc.rst b/docs/source/archive/gsoc-toc.rst index fac7e3f..db06e51 100644 --- a/docs/source/archive/gsoc-toc.rst +++ b/docs/source/archive/gsoc-toc.rst @@ -15,7 +15,6 @@ GSoC 2024 :maxdepth: 2 gsoc/reports/2024/scancode_toolkit_swastkk - GSoC 2022 --------- diff --git a/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst b/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst index fc96a70..3fd2c77 100644 --- a/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst +++ b/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst @@ -11,7 +11,7 @@ Compute summary for all detected packages. Overview -------- -Previously, we computed the summary at the codebase level, which included elements like the +Previously, we computed the summary at the codebase level, which included elements like the `license_clarity_score`, `declared_holder`, `other_license_expressions`, and more. This project aims to improve scanning accuracy by computing summaries and license clarity scores for each package and its files, rather than for the entire scan. This involves enhancing package models @@ -22,16 +22,18 @@ Implementation All the work I did is contained in `this single PR `_. I added a new command-line option called ``--package-summary`` that users can employ to obtain -a package-level summary within a single codebase. The package level summary involves the +a package-level summary within a single codebase. 
The package level summary involves the ``license_clarity_score`` calculation and population of package attributes like ``copyright``, -``holder``, ``other_license_expression``, ``notice_text``. This option must be called with ``--classify`` -option that helps ScanCode further classify scanned files/directories, to determine whether -they fall in these categories `legal`, `readme`, `top-level`, `manifest` & ``--package`` or ``-p`` option -detects various package manifests, lockfiles and package-like data and then assembles codebase level packages -and dependencies from these package data detected at files. Also tags files if they are part of the packages. +``holder``, ``other_license_expression``, ``notice_text``. This option must be called +with ``--classify`` option that helps ScanCode further classify scanned files/directories, +to determine whether they fall in these categories `legal`, `readme`, `top-level`, `manifest` +& ``--package`` or ``-p`` option detects various package manifests, lockfiles and +package-like data and then assembles codebase level packages and dependencies from +these package data detected at files. Also tags files if they are part of the packages. -This change allows users to get the more refined summary for each individual package that is present in a codebase. -Also this feature improves the package assembly for various package ecosystems like npm, python-whl, rust, rubygems etc. +This change allows users to get the more refined summary for each individual package +that is present in a codebase. Also this feature improves the package assembly for +various package ecosystems like npm, python-whl, rust, rubygems etc. Finally, all these changes are tested through multiple unit tests validating both correct @@ -63,5 +65,5 @@ I would like to thank my mentors - `@35C4n0r `_ - `@jono-yang `_ -Weekly calls were greatly helpful and those special 1:1 call with `@AyanSinhaMahapatra` and `@pombredanne` -were so amazing. 
Thank you for your time and your patience! +Weekly calls were greatly helpful and those special 1:1 call with +`@AyanSinhaMahapatra` and `@pombredanne` were so amazing. Thank you for your time and your patience! From df13e316ba178193c8a2b36a0fb87ca1e68d2725 Mon Sep 17 00:00:00 2001 From: swastik Date: Sat, 24 Aug 2024 00:34:21 +0530 Subject: [PATCH 4/6] Made changes according to the reviews Signed-off-by: swastik --- .../reports/2024/scancode_toolkit_swastkk.rst | 132 ++++++++++++++---- 1 file changed, 104 insertions(+), 28 deletions(-) diff --git a/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst b/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst index 3fd2c77..12d90bd 100644 --- a/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst +++ b/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst @@ -6,7 +6,7 @@ Compute summary for all detected packages. | **Organization:** `AboutCode `_ | **Project:** `Scancode Toolkit `_ | **Mentee:** `Swastik Sharma (swastkk) `_ -| **Mentors:** Philippe Ombredanne, AyanSinhaMahapatra, AvishrantSh, Jonathan Yang, Jay Kumar +| **Mentors:** `Philippe Ombredanne`_, `Ayan Sinha Mahapatra`_, `Avishrant Sharma`_, `Jonathan Yang`_, `Jay Kumar`_ Overview -------- @@ -20,24 +20,89 @@ and ensuring accurate attribute collection across all package ecosystems. Implementation -------------- -All the work I did is contained in `this single PR `_. -I added a new command-line option called ``--package-summary`` that users can employ to obtain -a package-level summary within a single codebase. The package level summary involves the -``license_clarity_score`` calculation and population of package attributes like ``copyright``, -``holder``, ``other_license_expression``, ``notice_text``. 
This option must be called -with ``--classify`` option that helps ScanCode further classify scanned files/directories, -to determine whether they fall in these categories `legal`, `readme`, `top-level`, `manifest` -& ``--package`` or ``-p`` option detects various package manifests, lockfiles and -package-like data and then assembles codebase level packages and dependencies from -these package data detected at files. Also tags files if they are part of the packages. - -This change allows users to get the more refined summary for each individual package -that is present in a codebase. Also this feature improves the package assembly for -various package ecosystems like npm, python-whl, rust, rubygems etc. +- **Added a new command-line option called** ``--package-summary``: + + - Provides a package-level summary within a single codebase. + - Involves the ``license_clarity_score`` calculation. + - Populates package attributes like ``copyright``, ``holder``, + ``other_license_expression``, ``notice_text``. + +- **The** ``--package-summary`` **option must be used with:** + + - ``--classify``: Helps ScanCode further classify scanned files/directories into + categories like ``legal``, ``readme``, ``top-level``, ``manifest``. + - ``--package`` or ``-p``: Detects various package manifests, lockfiles, and + package-like data, assembles codebase-level packages and dependencies, and tags + files as part of the packages. + +- **Benefits of the change:** + + - Allows users to obtain a more refined summary for each individual package in + a codebase. + - Improves package assembly for various package ecosystems like npm, python-whl, + rust, rubygems, etc. Since the package-level summary heavily depends on the package assembly, + there were several scenarios where key files for top-level packages were not properly tagged. + To address this, a method called ``get_top_level_resources`` was implemented. 
This method retrieves + the resources for top-level packages, which helps in correctly tagging the key files. + +- **Testing:** + + - All changes are tested through multiple full scan tests. + - Validated both correct behavior and error handling. + + +Linked Pull Requests +-------------------- + +.. list-table:: + :widths: 10 60 30 + :header-rows: 1 + + * - Sr. no + - Link + - Status + * - 1 + - https://github.com/aboutcode-org/scancode-toolkit/pull/3792 + - Open + +Related Issues +-------------- +.. list-table:: + :widths: 10 60 30 + :header-rows: 1 + + * - Sr. no + - Name + - Link + * - 1 + - Improve Package models to have license_clarity_score + - `#3817 `_ + * - 2 + - Post Scan option --package-summary + - `#3802 `_ + * - 3 + - Look in package-ecosystem specific key-files for referenced licenses + - `#3707 `_ + * - 4 + - Compute summary and clarity for EACH package in a codebase + - `#3287 `_ + * - 5 + - Provide data values in scan results to correspond with license_clarity_score elements + - `#1395 `_ + * - 6 + - Populate package instance attributes from it's files + - `#3862 `_ + * - 7 + - Improve Ruby Package Ecosystem/Datafile Handler to tag key_files properly + - `#3881 `_ + * - 8 + - Rust Members files are not detected properly + - `#3895 `_ + * - 9 + - Add Tests for Package Level Summary computation + - `#3889 `_ -Finally, all these changes are tested through multiple unit tests validating both correct -behavior and error handling as needed. Post GSoC --------- @@ -48,22 +113,33 @@ this feature to expand their package/codebase scanning capabilities. 
Links ----- -`Project idea `_ +* `Project Idea `_ + +* `Official GSoC project page `_ -`Official GSoC project page `_ +* `GSoC Proposal `_ -`GSoC Proposal `_ +* `Project Board `_ + +* `Reference Issue `_ Acknowledgements ---------------- -I would like to thank my mentors +I would like to thank my mentors: + +- `Philippe Ombredanne`_ +- `Ayan Sinha Mahapatra`_ +- `Avishrant Sharma`_ +- `Jay Kumar`_ +- `Jonathan Yang`_ + +Weekly Status calls were greatly helpful and those special 1:1 calls with +`Ayan Sinha Mahapatra`_ and `Philippe Ombredanne`_ were so amazing. Thank you for your time and your patience! -- `@pombredanne `_ -- `@AyanSinhaMahapatra `_ -- `@AvishrantSh `_ -- `@35C4n0r `_ -- `@jono-yang `_ -Weekly calls were greatly helpful and those special 1:1 call with -`@AyanSinhaMahapatra` and `@pombredanne` were so amazing. Thank you for your time and your patience! +.. _Philippe Ombredanne: https://github.com/pombredanne +.. _Ayan Sinha Mahapatra: https://github.com/AyanSinhaMahapatra +.. _Avishrant Sharma: https://github.com/AvishrantSsh +.. _Jay Kumar: https://github.com/35C4n0r +.. _Jonathan Yang: https://github.com/JonoYang From 62ac18f3260005427dabe8bab35eb9e6ff402f96 Mon Sep 17 00:00:00 2001 From: swastik Date: Sat, 24 Aug 2024 00:45:00 +0530 Subject: [PATCH 5/6] fix up test with doc8_style Signed-off-by: swastik --- .../reports/2024/scancode_toolkit_swastkk.rst | 34 +++++++++---------- docs/source/contributing.rst | 10 ++++-- 2 files changed, 23 insertions(+), 21 deletions(-) diff --git a/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst b/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst index 12d90bd..9e49fb2 100644 --- a/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst +++ b/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst @@ -6,7 +6,8 @@ Compute summary for all detected packages. 
| **Organization:** `AboutCode `_ | **Project:** `Scancode Toolkit `_ | **Mentee:** `Swastik Sharma (swastkk) `_ -| **Mentors:** `Philippe Ombredanne`_, `Ayan Sinha Mahapatra`_, `Avishrant Sharma`_, `Jonathan Yang`_, `Jay Kumar`_ +| **Mentors:** `Philippe Ombredanne`_, `Ayan Sinha Mahapatra`_, `Avishrant Sharma`_, + `Jonathan Yang`_, `Jay Kumar`_ Overview -------- @@ -21,36 +22,32 @@ Implementation -------------- - **Added a new command-line option called** ``--package-summary``: - - Provides a package-level summary within a single codebase. - Involves the ``license_clarity_score`` calculation. - - Populates package attributes like ``copyright``, ``holder``, + - Populates package attributes like ``copyright``, ``holder``, ``other_license_expression``, ``notice_text``. - **The** ``--package-summary`` **option must be used with:** - - - ``--classify``: Helps ScanCode further classify scanned files/directories into + - ``--classify``: Helps ScanCode further classify scanned files/directories into categories like ``legal``, ``readme``, ``top-level``, ``manifest``. - - ``--package`` or ``-p``: Detects various package manifests, lockfiles, and - package-like data, assembles codebase-level packages and dependencies, and tags + - ``--package`` or ``-p``: Detects various package manifests, lockfiles, and + package-like data, assembles codebase-level packages and dependencies, and tags files as part of the packages. - **Benefits of the change:** - - - Allows users to obtain a more refined summary for each individual package in + - Allows users to obtain a more refined summary for each individual package in a codebase. - - Improves package assembly for various package ecosystems like npm, python-whl, - rust, rubygems, etc. Since the package-level summary heavily depends on the package assembly, - there were several scenarios where key files for top-level packages were not properly tagged. - To address this, a method called ``get_top_level_resources`` was implemented. 
This method retrieves - the resources for top-level packages, which helps in correctly tagging the key files. + - Improves package assembly for various package ecosystems like npm, python-whl, + rust, rubygems, etc. Since the package-level summary heavily depends on the + package assembly, there were several scenarios where key files for top-level + packages were not properly tagged. To address this, a method called + ``get_top_level_resources`` was implemented. This method retrieves the resources + for top-level packages, which helps in correctly tagging the key files. - **Testing:** - - All changes are tested through multiple full scan tests. - Validated both correct behavior and error handling. - Linked Pull Requests -------------------- @@ -68,7 +65,7 @@ Linked Pull Requests Related Issues -------------- -.. list-table:: +.. list-table:: :widths: 10 60 30 :header-rows: 1 @@ -135,7 +132,8 @@ I would like to thank my mentors: - `Jonathan Yang`_ Weekly Status calls were greatly helpful and those special 1:1 calls with -`Ayan Sinha Mahapatra`_ and `Philippe Ombredanne`_ were so amazing. Thank you for your time and your patience! +`Ayan Sinha Mahapatra`_ and `Philippe Ombredanne`_ were so amazing. +Thank you for your time and your patience! .. _Philippe Ombredanne: https://github.com/pombredanne diff --git a/docs/source/contributing.rst b/docs/source/contributing.rst index 598bbce..faf8ced 100644 --- a/docs/source/contributing.rst +++ b/docs/source/contributing.rst @@ -2,13 +2,17 @@ Contributing to AboutCode ######################### -We welcome you and your interest in contributing to open source software! AboutCode is always looking for enthusiatic contributors and we are happy to help with any questions or comments. Here a few resources to get started: +We welcome you and your interest in contributing to open source software! AboutCode +is always looking for enthusiatic contributors and we are happy to help with any questions +or comments. 
Here a few resources to get started: 1) Take a look through our public repos here: https://github.com/aboutcode-org/ * Find one you are interested in and check out its open **Issues** -2) If you have specific questions browse through our documentation here: https://aboutcode.readthedocs.io/en/latest/ +2) If you have specific questions browse through our documentation here: + https://aboutcode.readthedocs.io/en/latest/ * Depending on the project, there may be a separate ReadTheDocs website - * Not finding what you were looking for or still have questions? Open an issue on the relevant repository or ask directly via Gitter or Slack + * Not finding what you were looking for or still have questions? + Open an issue on the relevant repository or ask directly via Gitter or Slack You can always interact with the AboutCode community on Gitter_ and Slack_. From ee253d63313fa2c33e3ae61cf5429bd414296450 Mon Sep 17 00:00:00 2001 From: swastik Date: Sat, 24 Aug 2024 00:52:16 +0530 Subject: [PATCH 6/6] Fix indentation error Signed-off-by: swastik --- .../archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst b/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst index 9e49fb2..4f89384 100644 --- a/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst +++ b/docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst @@ -22,12 +22,14 @@ Implementation -------------- - **Added a new command-line option called** ``--package-summary``: + - Provides a package-level summary within a single codebase. - Involves the ``license_clarity_score`` calculation. - Populates package attributes like ``copyright``, ``holder``, ``other_license_expression``, ``notice_text``. 
- **The** ``--package-summary`` **option must be used with:** + - ``--classify``: Helps ScanCode further classify scanned files/directories into categories like ``legal``, ``readme``, ``top-level``, ``manifest``. - ``--package`` or ``-p``: Detects various package manifests, lockfiles, and @@ -35,6 +37,7 @@ Implementation files as part of the packages. - **Benefits of the change:** + - Allows users to obtain a more refined summary for each individual package in a codebase. - Improves package assembly for various package ecosystems like npm, python-whl, @@ -45,6 +48,7 @@ Implementation for top-level packages, which helps in correctly tagging the key files. - **Testing:** + - All changes are tested through multiple full scan tests. - Validated both correct behavior and error handling.
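
The option pairing described in the report's Implementation section (``--package-summary`` used together with ``--classify`` and ``--package`` or ``-p``) can be sketched as a small command builder. This is a minimal sketch under stated assumptions, not ScanCode code: ``build_scan_command``, the codebase path, and the output file name are hypothetical, ``--package-summary`` comes from the still-open pull request named in the report and may not exist in a released ScanCode, and ``--json-pp`` is assumed here as the output option.

```python
import shlex

# Options the report says must accompany --package-summary.
REQUIRED_WITH_PACKAGE_SUMMARY = ("--package", "--classify")


def build_scan_command(codebase, output_file, options):
    """Return a scancode argument list, checking the option pairing
    described in the report (hypothetical helper, not part of ScanCode)."""
    options = list(options)
    # Treat the short form -p as equivalent to --package.
    normalized = {"--package" if opt == "-p" else opt for opt in options}
    if "--package-summary" in normalized:
        missing = [
            opt for opt in REQUIRED_WITH_PACKAGE_SUMMARY if opt not in normalized
        ]
        if missing:
            raise ValueError("--package-summary also needs: " + ", ".join(missing))
    # Assumption: pretty-printed JSON output via --json-pp.
    return ["scancode", *options, "--json-pp", output_file, codebase]


command = build_scan_command(
    codebase="path/to/codebase",          # hypothetical input path
    output_file="scan-results.json",      # hypothetical output file
    options=["--package", "--classify", "--package-summary"],
)
print(shlex.join(command))
```

Checking the pairing before launching the scan only illustrates the requirement stated in the report; the pull request may enforce it differently inside ScanCode itself.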