scanners: Add data flow based scanning for multi language support #1698

Closed
wants to merge 2 commits into from


johnandersen777 commented Jun 10, 2022

  • Add dffml for data flow based scanning as an optional extra within extras_require
  • Provide the existing scanner functionality via data flows
  • Support Python via inclusion of the shouldi flows
  • Add operations/binsec for binary analysis. Will work with Ygor to incorporate his scanning techniques.

This is for multi-language support. Python is currently supported via inclusion of the shouldi
flows.
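
As a rough illustration of the first bullet, an optional extra could be declared and guarded along these lines. This is only a sketch, not the code in this PR: the extra name `dataflow` and the helper `dataflow_scanning_available` are hypothetical, and only the `dffml` package name comes from the description above.

```python
import importlib.util

# Hypothetical fragment for setup.py, passed as setup(extras_require=EXTRAS_REQUIRE).
# The extra name "dataflow" is an assumption, not taken from this PR.
EXTRAS_REQUIRE = {
    "dataflow": ["dffml"],
}


def dataflow_scanning_available() -> bool:
    """Return True only when the optional dffml dependency is importable."""
    return importlib.util.find_spec("dffml") is not None


if __name__ == "__main__":
    if dataflow_scanning_available():
        print("dffml found, data flow based scanning can be enabled")
    else:
        print("dffml missing, falling back to the existing scanners")
```

Under that assumption the optional dependency would be pulled in with something like `pip install "cve-bin-tool[dataflow]"`, and the existing scanners keep working when it is absent.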

Below is the command used for testing:

.. code-block:: console

    $ nodemon -e py --exec 'clear; rm /tmp/tmp.ahAs38AoDJ/THREATS.md; cve-bin-tool --update never --format md --output-file /tmp/tmp.ahAs38AoDJ/THREATS.md /tmp/tmp.ahAs38AoDJ; test 1'

Signed-off-by: John Andersen <[email protected]>

johnandersen777 commented Jun 10, 2022

Example THREATS.md:


Threat Model

graph TD
f577c71443f6b04596b3fe0511326c40[check_if_valid_git_repository_URL]
155b8fdb5524f6bfd5adbae4940ad8d5[clone_git_repo]
70c47962ba601f0df1890f4c72ae1b54[count_authors]
90b953c5527ed3a579912eea8b02b1be[git_commits]
7155c0a875a889898d6d6e0c7959649b[dffml_feature_git.feature.operations:git_grep]
0afa2b3dbc72afa67170525d1d7532d7[git_repo_author_lines_for_dates]
02de40331374616f64ba4a92fbb33edd[git_repo_checkout]
7bbb97768b34f207c34c1f4721708675[git_repo_commit_from_date]
546062a96122df465d2631f31df4e9e3[git_repo_default_branch]
f01273bde2638114cff25a747963223e[git_repo_release]
ef6d613ca7855a13865933156c79ddea[lines_of_code_by_language]
b6e1f853d077365deddea22b2fdb890d[lines_of_code_to_comments]
7f20bd2c94ecbd47ab6bd88673c7174f[make_quarters]
9dc9f9feff38d8f5dd9388d3a60e74c0[quarters_back_to_date]
67e92c8765a9bc7fb2d335c459de9eb5[work]
8f334b03992d62741e1aa4fa13630ebc[cve_bin_tool.scanners.dataflow:repo_to_directory]
c6ab827d854e73b6bbf6f862c85356e0[cve_bin_tool.scanners.dataflow:scan_directory]
7ec43cbbf66e6d893180645d5e929bb4(seed<br>URL)
style 7ec43cbbf66e6d893180645d5e929bb4 fill:#f6dbf9,stroke:#a178ca
7ec43cbbf66e6d893180645d5e929bb4 --> f577c71443f6b04596b3fe0511326c40
7ec43cbbf66e6d893180645d5e929bb4(seed<br>URL)
style 7ec43cbbf66e6d893180645d5e929bb4 fill:#f6dbf9,stroke:#a178ca
7ec43cbbf66e6d893180645d5e929bb4 --> 155b8fdb5524f6bfd5adbae4940ad8d5
a6ed501edbf561fda49a0a0a3ca310f0(seed<br>git_repo_ssh_key)
style a6ed501edbf561fda49a0a0a3ca310f0 fill:#f6dbf9,stroke:#a178ca
a6ed501edbf561fda49a0a0a3ca310f0 --> 155b8fdb5524f6bfd5adbae4940ad8d5
f577c71443f6b04596b3fe0511326c40 --> 155b8fdb5524f6bfd5adbae4940ad8d5
0afa2b3dbc72afa67170525d1d7532d7 --> 70c47962ba601f0df1890f4c72ae1b54
155b8fdb5524f6bfd5adbae4940ad8d5 --> 90b953c5527ed3a579912eea8b02b1be
546062a96122df465d2631f31df4e9e3 --> 90b953c5527ed3a579912eea8b02b1be
9dc9f9feff38d8f5dd9388d3a60e74c0 --> 90b953c5527ed3a579912eea8b02b1be
155b8fdb5524f6bfd5adbae4940ad8d5 --> 7155c0a875a889898d6d6e0c7959649b
0690fdb25283b1e0a09016a28aa08c08(seed<br>git_grep_search)
style 0690fdb25283b1e0a09016a28aa08c08 fill:#f6dbf9,stroke:#a178ca
0690fdb25283b1e0a09016a28aa08c08 --> 7155c0a875a889898d6d6e0c7959649b
155b8fdb5524f6bfd5adbae4940ad8d5 --> 0afa2b3dbc72afa67170525d1d7532d7
546062a96122df465d2631f31df4e9e3 --> 0afa2b3dbc72afa67170525d1d7532d7
9dc9f9feff38d8f5dd9388d3a60e74c0 --> 0afa2b3dbc72afa67170525d1d7532d7
155b8fdb5524f6bfd5adbae4940ad8d5 --> 02de40331374616f64ba4a92fbb33edd
7bbb97768b34f207c34c1f4721708675 --> 02de40331374616f64ba4a92fbb33edd
155b8fdb5524f6bfd5adbae4940ad8d5 --> 7bbb97768b34f207c34c1f4721708675
546062a96122df465d2631f31df4e9e3 --> 7bbb97768b34f207c34c1f4721708675
9dc9f9feff38d8f5dd9388d3a60e74c0 --> 7bbb97768b34f207c34c1f4721708675
155b8fdb5524f6bfd5adbae4940ad8d5 --> 546062a96122df465d2631f31df4e9e3
155b8fdb5524f6bfd5adbae4940ad8d5 --> f01273bde2638114cff25a747963223e
546062a96122df465d2631f31df4e9e3 --> f01273bde2638114cff25a747963223e
9dc9f9feff38d8f5dd9388d3a60e74c0 --> f01273bde2638114cff25a747963223e
02de40331374616f64ba4a92fbb33edd --> ef6d613ca7855a13865933156c79ddea
ef6d613ca7855a13865933156c79ddea --> b6e1f853d077365deddea22b2fdb890d
a8b3d979c7c66aeb3b753408c3da0976(seed<br>quarters)
style a8b3d979c7c66aeb3b753408c3da0976 fill:#f6dbf9,stroke:#a178ca
a8b3d979c7c66aeb3b753408c3da0976 --> 7f20bd2c94ecbd47ab6bd88673c7174f
3261e0991aae6690cf0359a79dee8aaf(seed<br>quarter_start_date)
style 3261e0991aae6690cf0359a79dee8aaf fill:#f6dbf9,stroke:#a178ca
3261e0991aae6690cf0359a79dee8aaf --> 9dc9f9feff38d8f5dd9388d3a60e74c0
7f20bd2c94ecbd47ab6bd88673c7174f --> 9dc9f9feff38d8f5dd9388d3a60e74c0
0afa2b3dbc72afa67170525d1d7532d7 --> 67e92c8765a9bc7fb2d335c459de9eb5
155b8fdb5524f6bfd5adbae4940ad8d5 --> 8f334b03992d62741e1aa4fa13630ebc
45b3d4ae2acc4198368e36fb7c4e1499(seed<br>InputOfUnknownType)
style 45b3d4ae2acc4198368e36fb7c4e1499 fill:#f6dbf9,stroke:#a178ca
45b3d4ae2acc4198368e36fb7c4e1499 --> c6ab827d854e73b6bbf6f862c85356e0
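
The arrows in the graph above appear to mirror the `flow` section of the exported data flow that follows in this comment: each input lists either `seed` or the operation that produces it. A minimal sketch of that mapping, using a two-operation excerpt of the export. `flow_edges` is a hypothetical helper, not part of dffml or cve-bin-tool, and it labels every seed simply as `seed` rather than by definition name as the real graph does.

```python
from typing import Dict, List, Tuple


def flow_edges(flow: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Return (source, operation) pairs for every input and condition."""
    edges = []
    for operation, spec in flow.items():
        origins = []
        for input_origins in spec.get("inputs", {}).values():
            origins.extend(input_origins)
        origins.extend(spec.get("conditions", []))
        for origin in origins:
            if isinstance(origin, dict):
                # Shape is {"producing_operation": "output_name"}
                for producer in origin:
                    edges.append((producer, operation))
            else:
                # A plain string such as "seed"
                edges.append((origin, operation))
    return edges


# Excerpt copied from the "flow" section of the export in this comment
flow = {
    "check_if_valid_git_repository_URL": {"inputs": {"URL": ["seed"]}},
    "clone_git_repo": {
        "conditions": [{"check_if_valid_git_repository_URL": "valid"}],
        "inputs": {"URL": ["seed"], "ssh_key": ["seed"]},
    },
}

edges = flow_edges(flow)
for source, operation in edges:
    print(f"{source} --> {operation}")
```

The condition on `clone_git_repo` yields the same `check_if_valid_git_repository_URL --> clone_git_repo` arrow that the graph shows.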

{
    "definitions": {
        "DirectoryToScan": {
            "links": [
                [
                    [
                        "name",
                        "pathlib.Path.object"
                    ],
                    [
                        "primitive",
                        "object"
                    ],
                    [
                        "links",
                        [
                            [
                                [
                                    "name",
                                    "Path"
                                ],
                                [
                                    "primitive",
                                    "object"
                                ],
                                [
                                    "links",
                                    [
                                        [
                                            [
                                                "name",
                                                "pathlib"
                                            ],
                                            [
                                                "primitive",
                                                "object"
                                            ],
                                            [
                                                "links",
                                                [
                                                    [
                                                        [
                                                            "name",
                                                            "module"
                                                        ],
                                                        [
                                                            "primitive",
                                                            "object"
                                                        ]
                                                    ]
                                                ]
                                            ]
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ]
            ],
            "name": "DirectoryToScan",
            "primitive": "object"
        },
        "InputOfUnknownType": {
            "links": [
                [
                    [
                        "name",
                        "str"
                    ],
                    [
                        "primitive",
                        "str"
                    ]
                ]
            ],
            "name": "InputOfUnknownType",
            "primitive": "str"
        },
        "ScanResults": {
            "links": [
                [
                    [
                        "name",
                        "dict"
                    ],
                    [
                        "primitive",
                        "map"
                    ]
                ]
            ],
            "name": "ScanResults",
            "primitive": "dict"
        },
        "URL": {
            "name": "URL",
            "primitive": "string"
        },
        "author_count": {
            "name": "author_count",
            "primitive": "int"
        },
        "author_line_count": {
            "name": "author_line_count",
            "primitive": "Dict[str, int]"
        },
        "commit_count": {
            "name": "commit_count",
            "primitive": "int"
        },
        "date": {
            "name": "date",
            "primitive": "string"
        },
        "date_pair": {
            "name": "date_pair",
            "primitive": "List[date]"
        },
        "git_branch": {
            "name": "git_branch",
            "primitive": "str"
        },
        "git_commit": {
            "name": "git_commit",
            "primitive": "string"
        },
        "git_grep_found": {
            "name": "git_grep_found",
            "primitive": "string"
        },
        "git_grep_search": {
            "name": "git_grep_search",
            "primitive": "string"
        },
        "git_repo_ssh_key": {
            "default": null,
            "name": "git_repo_ssh_key",
            "primitive": "string"
        },
        "git_repository": {
            "lock": true,
            "name": "git_repository",
            "primitive": "Dict[str, str]",
            "spec": {
                "defaults": {
                    "URL": null
                },
                "name": "GitRepoSpec",
                "types": {
                    "URL": "str",
                    "directory": "str"
                }
            },
            "subspec": false
        },
        "git_repository_checked_out": {
            "lock": true,
            "name": "git_repository_checked_out",
            "primitive": "Dict[str, str]",
            "spec": {
                "defaults": {
                    "URL": null,
                    "commit": null
                },
                "name": "GitRepoCheckedOutSpec",
                "types": {
                    "URL": "str",
                    "commit": "str",
                    "directory": "str"
                }
            },
            "subspec": false
        },
        "language_to_comment_ratio": {
            "name": "language_to_comment_ratio",
            "primitive": "int"
        },
        "lines_by_language_count": {
            "name": "lines_by_language_count",
            "primitive": "Dict[str, Dict[str, int]]"
        },
        "no_git_branch_given": {
            "name": "no_git_branch_given",
            "primitive": "boolean"
        },
        "quarter": {
            "name": "quarter",
            "primitive": "int"
        },
        "quarter_start_date": {
            "name": "quarter_start_date",
            "primitive": "int"
        },
        "quarters": {
            "name": "quarters",
            "primitive": "int"
        },
        "release_within_period": {
            "name": "release_within_period",
            "primitive": "bool"
        },
        "valid_git_repository_URL": {
            "name": "valid_git_repository_URL",
            "primitive": "boolean"
        },
        "work_spread": {
            "name": "work_spread",
            "primitive": "int"
        }
    },
    "flow": {
        "check_if_valid_git_repository_URL": {
            "inputs": {
                "URL": [
                    "seed"
                ]
            }
        },
        "cleanup_git_repo": {
            "inputs": {
                "repo": [
                    {
                        "clone_git_repo": "repo"
                    }
                ]
            }
        },
        "clone_git_repo": {
            "conditions": [
                {
                    "check_if_valid_git_repository_URL": "valid"
                }
            ],
            "inputs": {
                "URL": [
                    "seed"
                ],
                "ssh_key": [
                    "seed"
                ]
            }
        },
        "count_authors": {
            "inputs": {
                "author_lines": [
                    {
                        "git_repo_author_lines_for_dates": "author_lines"
                    }
                ]
            }
        },
        "cve_bin_tool.scanners.dataflow:repo_to_directory": {
            "inputs": {
                "repo": [
                    {
                        "clone_git_repo": "repo"
                    }
                ]
            }
        },
        "cve_bin_tool.scanners.dataflow:scan_directory": {
            "inputs": {
                "arg": [
                    "seed"
                ]
            }
        },
        "dffml_feature_git.feature.operations:git_grep": {
            "inputs": {
                "repo": [
                    {
                        "clone_git_repo": "repo"
                    }
                ],
                "search": [
                    "seed"
                ]
            }
        },
        "git_commits": {
            "inputs": {
                "branch": [
                    {
                        "git_repo_default_branch": "branch"
                    }
                ],
                "repo": [
                    {
                        "clone_git_repo": "repo"
                    }
                ],
                "start_end": [
                    {
                        "quarters_back_to_date": "start_end"
                    }
                ]
            }
        },
        "git_repo_author_lines_for_dates": {
            "inputs": {
                "branch": [
                    {
                        "git_repo_default_branch": "branch"
                    }
                ],
                "repo": [
                    {
                        "clone_git_repo": "repo"
                    }
                ],
                "start_end": [
                    {
                        "quarters_back_to_date": "start_end"
                    }
                ]
            }
        },
        "git_repo_checkout": {
            "inputs": {
                "commit": [
                    {
                        "git_repo_commit_from_date": "commit"
                    }
                ],
                "repo": [
                    {
                        "clone_git_repo": "repo"
                    }
                ]
            }
        },
        "git_repo_commit_from_date": {
            "inputs": {
                "branch": [
                    {
                        "git_repo_default_branch": "branch"
                    }
                ],
                "date": [
                    {
                        "quarters_back_to_date": "date"
                    }
                ],
                "repo": [
                    {
                        "clone_git_repo": "repo"
                    }
                ]
            }
        },
        "git_repo_default_branch": {
            "conditions": [
                "seed"
            ],
            "inputs": {
                "repo": [
                    {
                        "clone_git_repo": "repo"
                    }
                ]
            }
        },
        "git_repo_release": {
            "inputs": {
                "branch": [
                    {
                        "git_repo_default_branch": "branch"
                    }
                ],
                "repo": [
                    {
                        "clone_git_repo": "repo"
                    }
                ],
                "start_end": [
                    {
                        "quarters_back_to_date": "start_end"
                    }
                ]
            }
        },
        "lines_of_code_by_language": {
            "inputs": {
                "repo": [
                    {
                        "git_repo_checkout": "repo"
                    }
                ]
            }
        },
        "lines_of_code_to_comments": {
            "inputs": {
                "langs": [
                    {
                        "lines_of_code_by_language": "lines_by_language"
                    }
                ]
            }
        },
        "make_quarters": {
            "inputs": {
                "number": [
                    "seed"
                ]
            }
        },
        "quarters_back_to_date": {
            "inputs": {
                "date": [
                    "seed"
                ],
                "number": [
                    {
                        "make_quarters": "quarters"
                    }
                ]
            }
        },
        "work": {
            "inputs": {
                "author_lines": [
                    {
                        "git_repo_author_lines_for_dates": "author_lines"
                    }
                ]
            }
        }
    },
    "linked": true,
    "operations": {
        "check_if_valid_git_repository_URL": {
            "inputs": {
                "URL": "URL"
            },
            "name": "check_if_valid_git_repository_URL",
            "outputs": {
                "valid": "valid_git_repository_URL"
            },
            "retry": 0,
            "stage": "processing"
        },
        "cleanup_git_repo": {
            "inputs": {
                "repo": "git_repository"
            },
            "name": "cleanup_git_repo",
            "outputs": {},
            "retry": 0,
            "stage": "cleanup"
        },
        "clone_git_repo": {
            "conditions": [
                "valid_git_repository_URL"
            ],
            "inputs": {
                "URL": "URL",
                "ssh_key": "git_repo_ssh_key"
            },
            "name": "clone_git_repo",
            "outputs": {
                "repo": "git_repository"
            },
            "retry": 0,
            "stage": "processing"
        },
        "count_authors": {
            "inputs": {
                "author_lines": "author_line_count"
            },
            "name": "count_authors",
            "outputs": {
                "authors": "author_count"
            },
            "retry": 0,
            "stage": "processing"
        },
        "cve_bin_tool.scanners.dataflow:repo_to_directory": {
            "inputs": {
                "repo": "git_repository"
            },
            "name": "cve_bin_tool.scanners.dataflow:repo_to_directory",
            "outputs": {
                "result": "DirectoryToScan"
            },
            "retry": 0,
            "stage": "processing"
        },
        "cve_bin_tool.scanners.dataflow:scan_directory": {
            "inputs": {
                "arg": "InputOfUnknownType"
            },
            "name": "cve_bin_tool.scanners.dataflow:scan_directory",
            "outputs": {
                "result": "ScanResults"
            },
            "retry": 0,
            "stage": "processing"
        },
        "dffml_feature_git.feature.operations:git_grep": {
            "inputs": {
                "repo": "git_repository",
                "search": "git_grep_search"
            },
            "name": "dffml_feature_git.feature.operations:git_grep",
            "outputs": {
                "found": "git_grep_found"
            },
            "retry": 0,
            "stage": "processing"
        },
        "git_commits": {
            "inputs": {
                "branch": "git_branch",
                "repo": "git_repository",
                "start_end": "date_pair"
            },
            "name": "git_commits",
            "outputs": {
                "commits": "commit_count"
            },
            "retry": 0,
            "stage": "processing"
        },
        "git_repo_author_lines_for_dates": {
            "inputs": {
                "branch": "git_branch",
                "repo": "git_repository",
                "start_end": "date_pair"
            },
            "name": "git_repo_author_lines_for_dates",
            "outputs": {
                "author_lines": "author_line_count"
            },
            "retry": 0,
            "stage": "processing"
        },
        "git_repo_checkout": {
            "inputs": {
                "commit": "git_commit",
                "repo": "git_repository"
            },
            "name": "git_repo_checkout",
            "outputs": {
                "repo": "git_repository_checked_out"
            },
            "retry": 0,
            "stage": "processing"
        },
        "git_repo_commit_from_date": {
            "inputs": {
                "branch": "git_branch",
                "date": "date",
                "repo": "git_repository"
            },
            "name": "git_repo_commit_from_date",
            "outputs": {
                "commit": "git_commit"
            },
            "retry": 0,
            "stage": "processing"
        },
        "git_repo_default_branch": {
            "conditions": [
                "no_git_branch_given"
            ],
            "inputs": {
                "repo": "git_repository"
            },
            "name": "git_repo_default_branch",
            "outputs": {
                "branch": "git_branch"
            },
            "retry": 0,
            "stage": "processing"
        },
        "git_repo_release": {
            "inputs": {
                "branch": "git_branch",
                "repo": "git_repository",
                "start_end": "date_pair"
            },
            "name": "git_repo_release",
            "outputs": {
                "present": "release_within_period"
            },
            "retry": 0,
            "stage": "processing"
        },
        "lines_of_code_by_language": {
            "inputs": {
                "repo": "git_repository_checked_out"
            },
            "name": "lines_of_code_by_language",
            "outputs": {
                "lines_by_language": "lines_by_language_count"
            },
            "retry": 0,
            "stage": "processing"
        },
        "lines_of_code_to_comments": {
            "inputs": {
                "langs": "lines_by_language_count"
            },
            "name": "lines_of_code_to_comments",
            "outputs": {
                "code_to_comment_ratio": "language_to_comment_ratio"
            },
            "retry": 0,
            "stage": "processing"
        },
        "make_quarters": {
            "expand": [
                "quarters"
            ],
            "inputs": {
                "number": "quarters"
            },
            "name": "make_quarters",
            "outputs": {
                "quarters": "quarter"
            },
            "retry": 0,
            "stage": "processing"
        },
        "quarters_back_to_date": {
            "expand": [
                "date",
                "start_end"
            ],
            "inputs": {
                "date": "quarter_start_date",
                "number": "quarter"
            },
            "name": "quarters_back_to_date",
            "outputs": {
                "date": "date",
                "start_end": "date_pair"
            },
            "retry": 0,
            "stage": "processing"
        },
        "work": {
            "inputs": {
                "author_lines": "author_line_count"
            },
            "name": "work",
            "outputs": {
                "work": "work_spread"
            },
            "retry": 0,
            "stage": "processing"
        }
    }
}
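
One way to sanity check an export like the one above is to confirm that every linked input receives the definition it expects: the producer's output definition in `operations` should equal the consumer's input definition. The sketch below assumes only the JSON shape shown here. `mismatched_links` is a hypothetical helper, not a dffml API.

```python
def mismatched_links(export: dict) -> list:
    """List links whose producer output definition differs from the consumer input definition."""
    operations = export["operations"]
    problems = []
    for consumer, spec in export["flow"].items():
        for input_name, origins in spec.get("inputs", {}).items():
            wanted = operations[consumer]["inputs"][input_name]
            for origin in origins:
                if not isinstance(origin, dict):
                    continue  # "seed" inputs have no producing operation to compare
                for producer, output_name in origin.items():
                    produced = operations[producer]["outputs"][output_name]
                    if produced != wanted:
                        problems.append((producer, output_name, consumer, input_name))
    return problems


# Excerpt of the export above, reduced to the keys this check reads
export = {
    "flow": {
        "cve_bin_tool.scanners.dataflow:repo_to_directory": {
            "inputs": {"repo": [{"clone_git_repo": "repo"}]}
        }
    },
    "operations": {
        "clone_git_repo": {"outputs": {"repo": "git_repository"}},
        "cve_bin_tool.scanners.dataflow:repo_to_directory": {
            "inputs": {"repo": "git_repository"}
        },
    },
}

print(mismatched_links(export))
```

For this excerpt both sides use the `git_repository` definition, so no mismatch is reported.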

johnandersen777 force-pushed the data_flow_scanner branch 2 times, most recently from d9661b4 to 6b285c7 on June 11, 2022 16:12

terriko commented Sep 27, 2022

Hey @pdxjohnny -- I'm trying to clean up some old PR requests before hacktoberfest. What did you want to do with this one? I'm guessing you're not likely to finish it anytime soon but do you need me to leave it open or can it be closed until you're ready to start working again?

johnandersen777 commented

Hi @terriko! Sorry, my GitHub notifications are a mess. I'm getting back to this now.


terriko commented Apr 17, 2023

I'm cleaning up old pull requests in preparation for the hackathon this month. This one's been stagnant a while and has merge errors so I'm guessing it's not going to get merged in the next couple of weeks, but feel free to reopen when you're ready to work on it again.

terriko closed this Apr 17, 2023