Added depth limit #318

Open

keijokapp

This PR adds a depth option (and a --depth CLI argument). It can be used to remove all modules from the output that are more than depth steps away from the closest input module ("deep dependencies"). depth = 0 makes it include only the input modules.

Note that the whole tree still gets traversed because deep dependencies could reference back to non-deep modules (i.e. modules that are <=depth steps away from any input module). In that case, the intermediate deep dependency is removed and the non-deep dependencies are connected directly. This can cause self-dependent modules.

Example:

depth: 1
input: [A, D]

full graph:
A => B => C => D
output graph:
A => B => D
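
A minimal sketch of the behaviour described above, not the PR's actual implementation. The function name `limitDepth` and the plain-object graph format are assumptions made for illustration. It measures each module's distance to the closest input, keeps the modules within `depth`, and reconnects kept modules across any removed ("deep") modules.

```javascript
// Hypothetical helper for illustration only; not code from this PR.
// `graph` maps a module id to the list of module ids it depends on.
function limitDepth(graph, inputs, depth) {
  // Breadth-first search from all inputs at once, so each module gets
  // its distance to the closest input module.
  const dist = new Map(inputs.map((id) => [id, 0]));
  const queue = [...inputs];
  while (queue.length > 0) {
    const id = queue.shift();
    for (const dep of graph[id] || []) {
      if (!dist.has(dep)) {
        dist.set(dep, dist.get(id) + 1);
        queue.push(dep);
      }
    }
  }

  const isShallow = (id) => dist.has(id) && dist.get(id) <= depth;

  // For a kept module, collect the kept modules it reaches either directly
  // or through a chain made up only of removed (deep) modules.
  function shallowDeps(id) {
    const result = new Set();
    const seenDeep = new Set();
    const stack = [...(graph[id] || [])];
    while (stack.length > 0) {
      const dep = stack.pop();
      if (isShallow(dep)) {
        result.add(dep);
      } else if (!seenDeep.has(dep)) {
        seenDeep.add(dep);
        stack.push(...(graph[dep] || []));
      }
    }
    return [...result].sort();
  }

  const output = {};
  for (const id of Object.keys(graph)) {
    if (isShallow(id)) output[id] = shallowDeps(id);
  }
  return output;
}

const full = { A: ['B'], B: ['C'], C: ['D'], D: [] };
const limited = limitDepth(full, ['A', 'D'], 1);
console.log(limited); // { A: [ 'B' ], B: [ 'D' ], D: [] }
```

Here C is two steps from A and is not reachable from D, so it is the only deep module. It is dropped and B is connected directly to D.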

I haven't written tests yet.

@keijokapp keijokapp force-pushed the depth-limit branch 2 times, most recently from 104e0d1 to 77ddee5 Compare June 24, 2022 17:55
@PabloLION
Collaborator

PabloLION commented Jan 30, 2023

Another thing is the naming: --depth is slightly more confusing than --max-depth, considering that we might add more depth-related functionality. For now it's fine, but I still recommend using --max-depth.

@keijokapp
Author

Thanks for the review!

  1. I've rebased the feature branch on top of master
  2. I've changed the if ('depth' in config) checks to if (config.depth), which is more consistent with the other checks and eliminates the confusion with Number('') === 0.
  3. I would advise against using parseInt or parseFloat. Their semantics are even more bizarre than JS's already messed up type conversion, especially since they succeed if only a prefix is parseable and ignore non-numeric suffixes. Number (or unary +) is IMO nearly always a better option.
  4. Regarding changing the name --depth: I don't think max-depth is a good alternative name, but it's your decision in the end.
    1. While max-depth technically avoids potential name conflicts in the future by making the name more verbose, it still reserves the word depth to mean the same thing. So using the word depth for other things would be confusing. An alternative would be making the option name [semantically] more specific, like --output-depth (because the option only affects the output and not, for example, traversing depth).
    2. My experience with CLI tools is that --depth and --level are conventional option names for this kind of behaviour. I think that having a prefix there would be slightly awkward. I can come up with one realistic feature that could clash with the name depth: traversing depth (for very large code bases where filtering with --exclude wouldn't be easy). But this would be a much more specific use case and would warrant a more specifically named option (--traverse-depth).
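
To make point 3 concrete, the snippet below compares the three conversions on strings a user might pass on the command line. The `parseDepth` helper is hypothetical and only shows one way a stricter check could look; it is not part of this PR.

```javascript
// Strings a user might pass as a numeric CLI argument.
const samples = ['3', '3px', '1e2', '2.5', '', 'abc'];

for (const raw of samples) {
  console.log(JSON.stringify(raw), {
    parseInt: parseInt(raw, 10), // stops at the first non-digit: '3px' -> 3, '1e2' -> 1
    parseFloat: parseFloat(raw), // also accepts a numeric prefix: '3px' -> 3
    Number: Number(raw), // whole string must be numeric: '3px' -> NaN, but '' -> 0
  });
}

// Hypothetical stricter parser: rejects the empty string explicitly and
// accepts only non-negative integers.
function parseDepth(raw) {
  const value = Number(raw);
  if (raw.trim() === '' || !Number.isInteger(value) || value < 0) {
    throw new TypeError(`Invalid depth: ${JSON.stringify(raw)}`);
  }
  return value;
}

console.log(parseDepth('2')); // 2
```

The empty-string case is the one mentioned in point 2: Number('') is 0, so an explicit check is needed wherever an empty argument should not be read as depth 0.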

@PabloLION
Collaborator

This needs test cases. I'll try to add them later.

@miluoshi

Any update on this? 👀

@PabloLION
Collaborator

I added some test cases with Copilot in keijokapp#1. I'll try to add the same commit here next week if there is no response.

@keijokapp keijokapp force-pushed the depth-limit branch 3 times, most recently from 17b1ac3 to dc4f0b0 Compare May 29, 2024 16:20

@keijokapp
Author

I added tests to the PR.

'e.js': ['f.js'],
'f.js': ['b.js', 'c.js']
});
});
Collaborator


I apologize for my recent lack of activity. I'm currently sorting out a few details, as I've noticed that the implementation of C, E, F, C with depth=3 in the test file seems to induce a self-dependency, which isn't accounted for in the final test case (the highlighted one). I need a bit more time to fully understand this, as I initially thought this PR primarily involved output formatting.

Could you @keijokapp please provide a specific use case or example that illustrates the necessity of this functionality? It would greatly help in evaluating its impact and relevance.

Thank you for your understanding and support.

Author


I might not fully understand the question.

which isn't accounted for in the final test case (the highlighted one)

The output of depth=3 does show the c -> e -> f -> c dependency cycle (?) So do the tests with depth=1 and depth=2. In the context of this feature, there isn't anything special about circular dependencies. They are handled just like any other "back-references" (ie references from deep dependencies to non-deep dependencies). If there are circular dependencies in the output, it means there were circular dependencies in the original graph. These tests make sure that they are handled (short-circuited) correctly.

please provide a specific use case or example that illustrates the necessity of this functionality

You mean "circular dependencies"? I don't think I'm the right person to explain the use cases for circular dependencies because I personally have a strong distaste for them. But they are nevertheless common and in rare cases not easily avoidable. This tool specifically handles them eg. by coloring them red on the dot graph. The feature doesn't have anything specifically to do with circular dependencies.


When it comes to "limiting the depth", there are two ways to go about it. One is to stop traversing at depth and ignore all references originating from that point onward. This is not usually what the user wants. Not showing those "back-references" gives a wrong impression about the dependency structure. For example, at depth=1, the user might see b.js and c.js as fully independent branches while, in fact, b.js is an indirect dependency of c.js.

Instead, as noted in the original submission, this feature traverses through the whole tree and does not hide those indirect relationships. It only removes the intermediate deep nodes.
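
The first approach can be sketched as follows, using the dependency graph quoted elsewhere in this thread. The helper names `truncateTraversal` and `reaches` are made up for this illustration and are not part of the tool. The sketch shows what the truncated output looks like at depth=1 and confirms that the relationship it hides really exists in the full graph.

```javascript
const graph = {
  'a.js': ['b.js', 'c.js'],
  'b.js': [],
  'c.js': ['d.js', 'e.js'],
  'd.js': [],
  'e.js': ['f.js'],
  'f.js': ['g.js', 'c.js'],
  'g.js': ['b.js'],
};

// Approach 1 (hypothetical): stop traversing at `depth`. Modules at the
// limit are listed, but none of their references are followed or shown.
function truncateTraversal(graph, inputs, depth) {
  const out = {};
  let frontier = [...inputs];
  for (let level = 0; level <= depth; level++) {
    const next = [];
    for (const id of frontier) {
      if (id in out) continue; // already recorded at a shallower level
      out[id] = level < depth ? [...(graph[id] || [])] : [];
      next.push(...out[id]);
    }
    frontier = next;
  }
  return out;
}

// Does `from` depend on `to`, directly or indirectly, in the full graph?
function reaches(graph, from, to) {
  const seen = new Set();
  const stack = [...(graph[from] || [])];
  while (stack.length > 0) {
    const id = stack.pop();
    if (id === to) return true;
    if (seen.has(id)) continue;
    seen.add(id);
    stack.push(...(graph[id] || []));
  }
  return false;
}

const truncated = truncateTraversal(graph, ['a.js'], 1);
console.log(truncated); // { 'a.js': [ 'b.js', 'c.js' ], 'b.js': [], 'c.js': [] }
console.log(reaches(graph, 'c.js', 'b.js')); // true
```

In the truncated output b.js and c.js look independent, even though c.js reaches b.js through e.js, f.js and g.js. As I read the description, the approach taken in this PR would instead keep a direct c.js -> b.js edge (and a c.js self-dependency from the cycle) after removing the intermediate deep modules.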

I initially thought this PR primarily involved output formatting.

It really is just about the output. It is to remove the "noise" while keeping the useful relationship information. It allows the user to focus around the close proximity of specific modules. It doesn't change anything else about the tool (eg. module resolution, parsing, traversing).

Collaborator

@PabloLION PabloLION May 30, 2024


Thank you for your detailed response, and I appreciate your patience as I navigate through this PR. After multiple reviews, I'm still grappling with some aspects of the implementation and would like to discuss these further.

  1. File Structure and Depth Concerns
    My apologies for any confusion caused by the C-D-E-C example I initially suggested—it may not have been the best illustration of the concept. Here’s a refresher on the file structure from our tests:

    dependencies = {
        'a.js': ['b.js', 'c.js'],
        'b.js': [],
        'c.js': ['d.js', 'e.js'],
        'd.js': [],
        'e.js': ['f.js'],
        'f.js': ['g.js', 'c.js'],
        'g.js': ['b.js']
    }
    
    a.js
    ├─ b.js
    └─ c.js
       ├─ d.js
       └─ e.js
          └─ f.js
             ├─ g.js
             │  └─ b.js
             └─ c.js (circular dependency back to c.js)
    

    This tree structure helps illustrate potential dependency chains: A-C-E-F, for example, is like the A => B => C => D chain mentioned in the PR description. When discussing depth=1, I find that the outputs might not align with what one would expect, like A-C-D in the example in the description, which would correspond to A-E-F in our test case. This should help us better understand the effects of limiting depth. Could we consider revisiting this part to ensure it aligns more clearly with our objectives?

  2. Circular Dependencies
    I generally strive to avoid circular dependencies, too. I found this repository while trying to include checks for circular dependencies in a CI/CD process. Kudos to the contributor for implementing such a valuable feature!

  3. Clarification on the 'depth' Functionality
    Could you possibly provide more concrete examples or use cases for the depth functionality? While your explanation was insightful, additional examples would help solidify my understanding of its practical impact.

  4. Depth Traversal Options
    Concerning your point on depth traversal:

    When it comes to "limiting the depth", there are two ways to go about it. One is to stop traversing at depth and ignore all references originating from that point onward. This is not usually what the user wants.

    I slightly disagree here. In some scenarios, particularly during refactoring, it could be beneficial to understand the breadth of a module's direct influence, which would support limiting traversal as a feature.

  5. Confusion Over Terminology
    These two terms in the description seem to mean the same thing, leading to some confusion. Could you clarify these definitions?

  • the closest input module ("deep dependencies")

  • non-deep modules (i.e. modules that are <=depth steps away from any input module)

  6. Implementation Details
    I've noticed significant changes in the convertTree(depTree, tree, pathCache) method, particularly the removal of pathCache. This alteration could significantly increase execution time. I believe reintroducing pathCache might improve performance. I'll need some more time to evaluate this change thoroughly.

Thank you once again for your support and understanding as we refine this feature.

Collaborator


I added a few nodes to the tree so that we can be on the same page about "deep dependencies".

dependencies = {
    'a.js': ['b.js', 'c.js'],
    'b.js': ['h.js'],
    'c.js': ['d.js', 'e.js'],
    'd.js': [],
    'e.js': ['f.js'],
    'f.js': ['g.js', 'c.js'],
    'g.js': ['b.js', 'h.js'],
    'h.js': ['i.js'],
    'i.js': ['j.js'],
    'j.js': ['k.js'],
    'k.js': ['l.js'],
    'l.js': []
}

a.js
├─ b.js
│  └─ h.js
│     └─ i.js
│        └─ j.js
│           └─ k.js
│              └─ l.js
└─ c.js
   ├─ d.js
   └─ e.js
      └─ f.js
         ├─ g.js
         │  ├─ b.js
         │  └─ h.js
         │     └─ i.js
         │        └─ j.js
         │           └─ k.js
         │              └─ l.js
         └─ c.js (circular dependency back to c.js)

For my understanding, with depth=1, we are excluding file b, c. And with depth=2 we are also excluding h, d and e in addition. But here h would appear again in the chain ACEFGH, which makes the output unclear.
(This also shows that removing the pathCache might be a necessity to implement this feature, depending on what we are trying to achieve.)

@keijokapp
Author

keijokapp commented May 30, 2024

@PabloLION I'm answering in a separate comment because the discussion is not specific to that piece of code.

3. Clarification on the 'depth' Functionality
Could you possibly provide more concrete examples or use cases for the depth functionality? While your explanation was insightful, additional examples would help solidify my understanding of its practical impact.

Absolutely. For programmatic usage, the feature might not be all that useful because it can be achieved in userland. But for CLI usage (eg generating an image), it's crucial. For any code base of non-trivial size (in my case, it was the React Reconciler and React DOM packages, but this applies even to much smaller code bases), it's highly impractical to see all modules on the graph. In the case of React, there are hundreds, if not thousands, of them. The tool can't understand the semantic relationships between these modules, so modules that are closely related to each other often end up all over the place. This feature is for those use cases. It enables the user to focus on the proximity of specific modules.

4. Depth Traversal Options
In some scenarios, particularly during refactoring, it could be beneficial to understand the breadth of a module's direct influence, which would support limiting traversal as a feature.

Ok. I haven't encountered a use case where I only want to see a direct influence. The IDE tooling has been usually enough if I'm only interested in direct dependencies.

If there ever is such a use case, I'd expect it to be more specific than this one. So it warrants a new (more specific) option. --traverse-depth would be good, I think.

I guess the important question is whether seeing those indirect dependencies is a deal breaker for those use cases. For my use cases, not seeing those dependencies is absolutely a non-solution.

Feedback from a larger audience would be welcome.

5. Confusion Over Terminology
These two terms in the description seem to mean the same thing, leading to some confusion. Could you clarify these definitions?

Deep dependencies:

all modules from the output that are more than depth steps away from the closest input module

or in other words: modules that are >depth steps away from any input module

Non-deep dependencies:

modules that are <=depth steps away from any input module

6. Implementation Details

It's been a long time since I implemented this, so I might not remember all the considerations that went into that change. The performance implications were definitely one of them. The change is that the path cache is no longer shared between convertTree and the npmPaths loop, but it's still used within each of them. Looking at the diff now, I think the performance gain from sharing the cache would not be worth the complexity of an additional parameter and state sharing between generateTree and convertTree.

For purity, I could add the cache sharing back. To be honest, the whole state management is quite messed up due to the pseudo-OOP approach. There's no need for classes here and it's highly confusing. The constructor is called as a regular function (without the benefits of a regular function like async/await) and returns a Promise (not the class instance). The class instance is never exposed outside this single run and is only minimally used within the run.
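
The pattern criticised above can be illustrated generically. This is not the tool's actual code; `TreeBuilder` and `buildTree` are hypothetical names. In JavaScript, a constructor that returns an object replaces the instance, so `new` on such a class evaluates to whatever was returned, here a Promise.

```javascript
// Hypothetical class showing the "constructor returns a Promise" pattern.
class TreeBuilder {
  constructor(inputs) {
    this.inputs = inputs;
    // Returning an object from a constructor overrides the instance,
    // so callers never receive the TreeBuilder itself.
    return this.build();
  }

  build() {
    const tree = Object.fromEntries(this.inputs.map((id) => [id, []]));
    return Promise.resolve(tree);
  }
}

// The same behaviour as a plain async function, with no hidden instance.
async function buildTree(inputs) {
  return Object.fromEntries(inputs.map((id) => [id, []]));
}

const viaClass = new TreeBuilder(['a.js']);
console.log(viaClass instanceof TreeBuilder); // false
console.log(viaClass instanceof Promise); // true

viaClass.then((tree) => console.log(tree)); // { 'a.js': [] }
buildTree(['a.js']).then((tree) => console.log(tree)); // { 'a.js': [] }
```

Both calls resolve to the same value. The class version only uses the instance as scratch state during the run, which matches the "pseudo-OOP" description.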


For my understanding, with depth=1, we are excluding file b, c. And with depth=2 we are also excluding h, d and e in addition.

You probably mean "including" those files, not "excluding" them. The lower the depth, the more files are excluded and less files included.
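
The definitions above can be checked against the extended graph from the review thread. The sketch below is an illustration with made-up helper names (`distancesFrom`, `includedAt`), not code from this PR. It computes each module's distance to the closest input and lists what would be included at a given depth.

```javascript
const graph = {
  'a.js': ['b.js', 'c.js'],
  'b.js': ['h.js'],
  'c.js': ['d.js', 'e.js'],
  'd.js': [],
  'e.js': ['f.js'],
  'f.js': ['g.js', 'c.js'],
  'g.js': ['b.js', 'h.js'],
  'h.js': ['i.js'],
  'i.js': ['j.js'],
  'j.js': ['k.js'],
  'k.js': ['l.js'],
  'l.js': [],
};

// Breadth-first search: distance from each module to the closest input.
function distancesFrom(graph, inputs) {
  const dist = new Map(inputs.map((id) => [id, 0]));
  const queue = [...inputs];
  while (queue.length > 0) {
    const id = queue.shift();
    for (const dep of graph[id] || []) {
      if (!dist.has(dep)) {
        dist.set(dep, dist.get(id) + 1);
        queue.push(dep);
      }
    }
  }
  return dist;
}

// Non-deep modules: those at most `depth` steps away from any input.
function includedAt(graph, inputs, depth) {
  const dist = distancesFrom(graph, inputs);
  return [...dist.keys()].filter((id) => dist.get(id) <= depth).sort();
}

console.log(includedAt(graph, ['a.js'], 1)); // [ 'a.js', 'b.js', 'c.js' ]
console.log(includedAt(graph, ['a.js'], 2));
// [ 'a.js', 'b.js', 'c.js', 'd.js', 'e.js', 'h.js' ]
console.log(distancesFrom(graph, ['a.js']).get('h.js')); // 2
```

Because the distance is measured to the closest input along the shortest path, h.js counts as two steps away (a.js -> b.js -> h.js) even though it also appears at the end of the longer chain through g.js.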

@PabloLION
Collaborator

Thank you @keijokapp for your continued dedication and the insightful new information.

On Functionality

Absolutely. While this feature may not be essential for programmatic applications, where similar outcomes can be achieved at the user level, it becomes critical for CLI usage, such as image generation. In larger codebases, like the React Reconciler and React DOM packages, it becomes highly impractical to display all modules on the graph. In environments like React, where there are potentially hundreds or even thousands of modules, the tool struggles to recognize semantic relationships, often scattering closely related modules. This feature addresses such scenarios by allowing users to focus on the proximity of specific modules.

The explanation is very comprehensive. I now understand that for large repositories, the functionality you're introducing is incredibly useful. Here’s my takeaway: instead of displaying the entire dependency tree from a selected entry point (or input file path), this feature modifies the output to omit the first n levels and only shows the final m layers. Here, m is given by the value of --depth. This insight has led me to consider possibly introducing a control for n in the future, though that would pertain to a different feature.
Also, could you possibly provide a few more examples using the graph below? From your previous examples involving inputs A and D, it seems these files are your primary focus. Perhaps additional examples with one entry file and another with three entry files could help clarify the concept further.

dependencies = {
    'a.js': ['b.js', 'c.js'],
    'b.js': ['h.js'],
    'c.js': ['d.js', 'e.js'],
    'd.js': [],
    'e.js': ['f.js'],
    'f.js': ['g.js', 'c.js'],
    'g.js': ['b.js', 'h.js'],
    'h.js': ['i.js'],
    'i.js': ['j.js'],
    'j.js': ['k.js'],
    'k.js': ['l.js'],
    'l.js': []
}

a.js
├─ b.js
│  └─ h.js
│     └─ i.js
│        └─ j.js
│           └─ k.js
│              └─ l.js
└─ c.js
   ├─ d.js
   └─ e.js
      └─ f.js
         ├─ g.js
         │  ├─ b.js
         │  └─ h.js
         │     └─ i.js
         │        └─ j.js
         │           └─ k.js
         │              └─ l.js
         └─ c.js (circular dependency back to c.js)

On Terminology

I now realize my initial misunderstanding about your use of "deep dependencies," which are:

outputs that are more than depth steps away from the closest input module.

On Implementation

I now realize that the implementation is not as simple as I initially anticipated, because it calls several other methods from the same class whose behaviour I need to recall. I will examine these interactions more thoroughly in the coming days.

Requesting another opinion

@kamiazya @pahen, could you assist in selecting a more suitable name for this flag (option)? Given your extensive experience with this project and expertise in naming conventions, your input would be invaluable in refining this PR.

  • If there is ever such a use case, I'd expect it to require a more specific option than currently available. --traverse-depth seems appropriate.

  • The key question is whether viewing those indirect dependencies is crucial for certain use cases. For my purposes, not being able to see those dependencies is certainly not viable.
    Feedback from a broader audience would be appreciated.

Additionally, if you have any other insights or suggestions to contribute regarding this feature, I would greatly appreciate hearing them.
