InvestigativeActions should be required to produce at least one ProvenanceRecord #146

Open
3 of 17 tasks
ajnelson-nist opened this issue Jan 23, 2024 · 5 comments · May be fixed by #147

@ajnelson-nist
Member

ajnelson-nist commented Jan 23, 2024

Background

Discussion on CASE Issue 136 suggests that an InvestigativeAction should always result in the creation of at least one ProvenanceRecord.

Requirements

Requirement 1

CASE should enforce that an InvestigativeAction results in at least one ProvenanceRecord.

As an implementation note, this would be done with a qualified SHACL constraint.

Edited 2024-02-15: "Must" relaxed to "should".
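To make the "qualified" part of Requirement 1 concrete, here is a minimal sketch in plain Python of what a qualified minimum count checks: only results typed as ProvenanceRecord are counted, so an action with other results but no ProvenanceRecord is still flagged. The triple-set graph model, the `kb:` identifiers, and both function names are assumptions for illustration. This is not how pySHACL implements the constraint, and it ignores subclass reasoning.

```python
# Hypothetical, simplified model: a graph is a set of (subject, predicate, object) triples.
TYPE = "rdf:type"
RESULT = "uco-action:result"
INVESTIGATIVE_ACTION = "investigation:InvestigativeAction"
PROVENANCE_RECORD = "investigation:ProvenanceRecord"


def qualified_count(graph, node, path, cls):
    """Count the values of `path` on `node` that are typed as `cls`."""
    values = {o for (s, p, o) in graph if s == node and p == path}
    return sum(1 for v in values if (v, TYPE, cls) in graph)


def actions_missing_provenance(graph):
    """Return InvestigativeActions with fewer than one ProvenanceRecord result."""
    actions = {s for (s, p, o) in graph if p == TYPE and o == INVESTIGATIVE_ACTION}
    return sorted(
        a for a in actions
        if qualified_count(graph, a, RESULT, PROVENANCE_RECORD) < 1
    )


# Illustrative data only: action-1 has a ProvenanceRecord result, action-2 does not.
graph = {
    ("kb:action-1", TYPE, INVESTIGATIVE_ACTION),
    ("kb:action-1", RESULT, "kb:file-1"),
    ("kb:action-1", RESULT, "kb:provenance-record-1"),
    ("kb:provenance-record-1", TYPE, PROVENANCE_RECORD),
    ("kb:action-2", TYPE, INVESTIGATIVE_ACTION),
    ("kb:action-2", RESULT, "kb:file-2"),
}

print(actions_missing_provenance(graph))  # ['kb:action-2']
```

An unqualified minimum count on uco-action:result would accept kb:action-2 because it has one result; the qualified form does not.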

Requirement 2

CASE should describe in a mechanically discoverable way that an InvestigativeAction is expected to always result in at least one ProvenanceRecord.

As an implementation note, this would be done with a qualified minimum cardinality in an OWL Restriction.

Risk / Benefit analysis

Benefits

  1. Requiring that a ProvenanceRecord always be generated creates a chain-of-custody link in forensic processing for the resultant objects of InvestigativeActions.
  2. Reintroduction of OWL constructs will assist with OWL-specific review mechanisms that do not appear to be possible in SHACL, such as set-satisfiability (e.g. determining through set-theoretic analysis whether a class or restriction has accidentally ended up equating to the empty set, rendering usage conformant with the specification impossible).
    1. This is acknowledged to be a broader issue than this one proposal. However, a minimum cardinality restriction appears to the submitter to be a "safe" reintroduction in terms of complexity.

Risks

  1. Existing SHACL shapes require that a ProvenanceRecord always have at least one member UcoObject. Thus, this proposal would induce a significant requirement on InvestigativeActions: they must always result in something aside from the ProvenanceRecord.
    1. Note that an object being a result of an action does not necessarily imply that the object was created by the action. This stemmed from discussion on UCO Issue 558.
    2. It is possible the definition of ProvenanceRecord is too stringent. It is somewhat a separate concern that there might exist a class of InvestigativeActions that truly have no results. Perhaps: "This action found all files within this directory. There were none."
    3. NOTE: Risk 1 mitigated with resolution of UCO Issue 599. ProvenanceRecords may now be empty.
  2. Some Actions might be desired to be defined in a manner that attempts to restrict the results to a specific class, e.g., IP addresses. If such an action-class were introduced, it could never be an InvestigativeAction, because an InvestigativeAction would be required to include a ProvenanceRecord among its results. Hence, this proposal would end up inducing an upstream design constraint on UCO: action:result can never be constrained with owl:allValuesFrom, because UCO doesn't "know" about case-investigation:ProvenanceRecord.
  3. This proposal does not specify whether there must only be one ProvenanceRecord among the results. This is an inconclusive point from the discussion on CASE Issue 136, and could be affected depending on whether the committee decides a subaction's ProvenanceRecord should also be recorded in the parent action's results.
  4. This proposal suggests restoring OWL practices, starting with a description of at least one of the outputs for any InvestigativeAction. CASE and UCO previously abandoned OWL in UCO 0.7.0 / CASE 0.5.0. This proposal starts a disciplined reintroduction of OWL constructs, testing with the UCO-OWL syntax review mechanisms.
    1. UCO Change Proposal 23 housed discussion, though it appears that document was not exported from the access-controlled UCO Confluence space. (I don't think there is a reason it wasn't, aside from document exports only becoming a mandated part of the proposal process in later releases.)
    2. A test focused on the syntax used will be added in a separate proposal to UCO.
  5. Due to needing SHACL qualified shapes, the CASE testing infrastructure also needs to require pySHACL >= 0.24.0, which incorporates a resolution to pySHACL Issue 213.
  6. (Added 2024-02-15.) In information sharing situations, some data might be restricted from being shared or alluded to, e.g., from legally imposed redactions. If Org1 shares part of a graph with Org2, and includes some InvestigativeAction for, say, its timing and tool-use relevance, but doesn't share the identifier for the generated ProvenanceRecord, the shared data should by itself still be conformant to UCO, and should not impose UCO validation errors when folded into the receiving organization's knowledge base.
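Risk 2 can be illustrated with a small brute-force search. The sketch below is a closed-world toy, not OWL reasoning: it assumes no individual is typed as both an IP address and a ProvenanceRecord, and the individuals, the `observable:IPAddress` class name, and the helper functions are hypothetical. It looks for any set of results that satisfies an "all values from IP address" restriction and a "at least one ProvenanceRecord" restriction at the same time.

```python
from itertools import combinations

# Hypothetical individuals and their asserted classes (assumed to never overlap).
TYPES = {
    "kb:ip-1": {"observable:IPAddress"},
    "kb:ip-2": {"observable:IPAddress"},
    "kb:provenance-record-1": {"investigation:ProvenanceRecord"},
}


def all_values_from(results, cls):
    """Every result must be a member of `cls` (vacuously true for no results)."""
    return all(cls in TYPES[r] for r in results)


def min_qualified_cardinality(results, cls, minimum):
    """At least `minimum` results must be members of `cls`."""
    return sum(1 for r in results if cls in TYPES[r]) >= minimum


def satisfying_result_sets():
    """Enumerate every subset of individuals that meets both restrictions."""
    individuals = sorted(TYPES)
    found = []
    for size in range(len(individuals) + 1):
        for results in combinations(individuals, size):
            if all_values_from(results, "observable:IPAddress") and \
                    min_qualified_cardinality(results, "investigation:ProvenanceRecord", 1):
                found.append(results)
    return found


print(satisfying_result_sets())  # []
```

Under these assumptions no result set works: any set containing the ProvenanceRecord fails the first restriction, and any set without it fails the second.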

Competencies demonstrated

Competencies are omitted from this proposal, as the effects are new restrictions on data, and hence do not enable new expressive abilities.

Solution suggestion

For CASE 1.x.0, add the following to investigation.ttl:

investigation:InvestigativeAction
	rdfs:subClassOf [
		a owl:Restriction ;
		owl:onProperty uco-action:result ;
		owl:onClass investigation:ProvenanceRecord ;
		owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
	] ;
	sh:property [
		sh:message "An InvestigativeAction should have a ProvenanceRecord among its results.  This will be a requirement in CASE 2.0.0."@en ;
		sh:path uco-action:result ;
		sh:qualifiedMinCount "1"^^xsd:integer ;
		sh:qualifiedValueShape [
			a sh:NodeShape ;
			sh:class investigation:ProvenanceRecord ;
		] ;
		sh:severity sh:Warning ;
	] ;
	.

For CASE 2.0.0, remove the sh:message and sh:severity triples from the added sh:PropertyShape.
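The effect of the CASE 2.0.0 step can be sketched as follows, with the property shape modeled as a plain Python dict. The dict layout and both helper names are assumptions for illustration. The one SHACL fact relied on is that a shape without sh:severity defaults to sh:Violation.

```python
# SHACL's default severity when a shape carries no sh:severity triple.
DEFAULT_SEVERITY = "sh:Violation"

# Hypothetical dict rendering of the property shape proposed for CASE 1.x.0.
shape_1_x = {
    "sh:path": "uco-action:result",
    "sh:qualifiedMinCount": 1,
    "sh:qualifiedValueShape": {"sh:class": "investigation:ProvenanceRecord"},
    "sh:severity": "sh:Warning",
    "sh:message": (
        "An InvestigativeAction should have a ProvenanceRecord among its results.  "
        "This will be a requirement in CASE 2.0.0."
    ),
}


def for_case_2(shape):
    """Drop the sh:message and sh:severity entries, as described for CASE 2.0.0."""
    return {k: v for k, v in shape.items() if k not in ("sh:message", "sh:severity")}


def effective_severity(shape):
    """Severity a validator would report for results produced by this shape."""
    return shape.get("sh:severity", DEFAULT_SEVERITY)


shape_2_0 = for_case_2(shape_1_x)
print(effective_severity(shape_1_x))  # sh:Warning
print(effective_severity(shape_2_0))  # sh:Violation
```

Whether a Warning-level result counts against overall conformance depends on the validator and how it is invoked, so that is left out of the sketch.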

Coordination

  • Administrative review completed, proposal announced to Ontology Committees (OCs) on Jan. 26, 2024
  • Requirements to be discussed in OC meeting, date Feb. 15, 2024
  • Risk 1 addressed - InvestigativeActions that have no non-ProvenanceRecord results confirmed supportable.
  • Requirements to be discussed in OC meeting, date TBD.
  • Requirements Review vote has not occurred
  • Requirements development phase completed.
  • Solution announced to OCs on TODO-date
  • Solutions Approval to be discussed in OC meeting, date TBD
  • Solutions Approval vote has not occurred
  • Solutions development phase completed.
  • Backwards-compatible implementation merged into develop for the next release
  • develop state with backwards-compatible implementation merged into develop-2.0.0
  • Backwards-incompatible implementation merged into develop-2.0.0 (or N/A)
  • Milestone linked
  • Documentation logged in pending release page
  • Prerelease publication: CASE develop branch updated to track UCO's updated develop branch
  • Prerelease publication: CASE develop-2.0.0 branch updated to track UCO's updated develop-2.0.0 branch
ajnelson-nist added a commit that referenced this issue Jan 23, 2024
This new shape stemmed from discussion on CASE Issue 136.

As a matter of preserving backwards compatibility, this patch introduces
the shape requiring `ProvenanceRecord`s with a `sh:Warning`-level
severity.  In CASE 2.0.0, this requirement will be strengthened into a
`sh:Violation`.

A separate proposal will be filed with UCO to test the minimum qualified
cardinality OWL structure.  A draft of that syntax review system was
used to test this patch.

This patch adds a version floor for pySHACL to ensure an update in
qualified value shape handling is included, which is necessary for the
new property shape to function when using pySHACL.

Disclaimer:

References:
* RDFLib/pySHACL#213
* #136
* #146

Signed-off-by: Alex Nelson <[email protected]>
@ajnelson-nist ajnelson-nist added this to the CASE 1.x.0 milestone Jan 23, 2024
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Jan 26, 2024
A follow-on patch will regenerate Make-managed files.

References:
* casework/CASE#146

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to casework/casework.github.io that referenced this issue Jan 26, 2024
References:
* casework/CASE#146

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Jan 26, 2024
A follow-on patch will regenerate Make-managed files.

References:
* casework/CASE#146

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Jan 26, 2024
References:
* casework/CASE#146

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Jan 26, 2024
A follow-on patch will regenerate Make-managed files.

References:
* casework/CASE#146

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this issue Jan 26, 2024
References:
* casework/CASE#146

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Jan 26, 2024
A follow-on patch will regenerate Make-managed files.

References:
* casework/CASE#146

Signed-off-by: Alex Nelson <[email protected]>
@sbarnum
Contributor

sbarnum commented Feb 15, 2024

While I agree with this proposal in intended spirit I do not feel it is viable due to Risk 1 and Risk 2 above.

I do not believe either of these risks can be ignored in favor of the intent of this proposal.

I believe that Risk 2 is real and could have a significant impact if ignored.
I believe that Risk 1 is very real and WILL have a critical impact if ignored. There are certainly investigative actions that could have no result.

We can say that an InvestigativeAction SHOULD have at least one ProvenanceRecord but we cannot say MUST.

@ajnelson-nist
Member Author

> While I agree with this proposal in intended spirit I do not feel it is viable due to Risk 1 and Risk 2 above.
>
> I do not believe either of these risks can be ignored in favor of the intent of this proposal.
>
> I believe that Risk 2 is real and could have a significant impact if ignored. I believe that Risk 1 is very real and WILL have a critical impact if ignored. There are certainly investigative actions that could have no result.
>
> We can say that an InvestigativeAction SHOULD have at least one ProvenanceRecord but we cannot say MUST.

More on Risk 1:

I'm more inclined to review and revise that minimum-count 1 SHACL rule on ContextualCompilation. This is not the first place that has caused an issue: the experimental extension ontology in CASE-Corpora is trying an alignment between DCAT-US (in short, a model for datasets) and CASE+UCO. Some things under DCAT-US looked like philosophic kindreds to ContextualCompilation, but would at times be appropriately empty (e.g., datasets with distribution files, but not publicly available distribution files). The sh:minCount 1 rule inherited from ContextualCompilation calls that a data error. So there is some subclassing in that repository that feels ...contortive.

I'm glad you and think it is appropriate to represent investigative actions that have no non-provenance-record results. I think it's a little strange-feeling to have a provenance record with no members as the sole result of an investigative action, but it isn't necessarily wrong. For instance, it could be a further sanity check downstream in CASE analysis if that "empty" provenance record were used by a later investigative action and nothing in the (empty) provenance record was also an input to that same investigative action. (This is inching out of scope of this proposal, but my gut's saying that's a sanity check I would be grateful to have; it sounds like it would catch copy-paste errors stemming from copying the wrong thing.)
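A rough sketch of that sanity check, in plain Python over a set of triples. It assumes action inputs are recorded with uco-action:object and record members with uco-core:object; the `kb:` identifiers and the function names are made up for illustration, and nothing here is an existing CASE tool.

```python
TYPE = "rdf:type"
ACTION_INPUT = "uco-action:object"   # assumed property for an action's inputs
MEMBER = "uco-core:object"           # assumed property for a record's members
PROVENANCE_RECORD = "investigation:ProvenanceRecord"


def objects(graph, subject, predicate):
    """All objects of (subject, predicate, ?) in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def suspicious_record_uses(graph):
    """Flag (action, record) pairs where a ProvenanceRecord is an input to the
    action but none of the record's members is also an input to that action."""
    flagged = []
    for (action, predicate, record) in graph:
        if predicate != ACTION_INPUT:
            continue
        if (record, TYPE, PROVENANCE_RECORD) not in graph:
            continue
        members = objects(graph, record, MEMBER)
        inputs = objects(graph, action, ACTION_INPUT)
        if not (members & inputs):
            flagged.append((action, record))
    return sorted(flagged)


# Illustrative data: record-1 is empty; record-2 has a member that action-3 also takes in.
graph = {
    ("kb:provenance-record-1", TYPE, PROVENANCE_RECORD),
    ("kb:provenance-record-2", TYPE, PROVENANCE_RECORD),
    ("kb:provenance-record-2", MEMBER, "kb:file-1"),
    ("kb:action-2", ACTION_INPUT, "kb:provenance-record-1"),
    ("kb:action-3", ACTION_INPUT, "kb:provenance-record-2"),
    ("kb:action-3", ACTION_INPUT, "kb:file-1"),
}

print(suspicious_record_uses(graph))  # [('kb:action-2', 'kb:provenance-record-1')]
```

An empty record always trips the check when used as an input, which is the copy-paste case described above.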

I think Risk 1 is solely from ContextualCompilation having used SHACL for its minimum member count description instead of OWL. A SHACL minimum-1 count, anywhere, induces validation failures for incomplete information, so it is a construct that must be used sparingly. Should a UCO graph fail validation because it named a set (ContextualCompilation) but said nothing of its members? This is a bigger question for data sharing, which I'm noting here because this might be another risk specific to this proposal. Here's an example:

If Org1 shares part of a graph with Org2, and includes some InvestigativeAction for, say, its timing and tool-use relevance, but doesn't share the identifier for the generated ProvenanceRecord, should that shared data fail validation?
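Here is that scenario as a minimal plain-Python sketch. The graph contents, the timestamp, and the `share` and `has_provenance_result` helpers are all hypothetical; the point is only that the same closed-world check passes on Org1's full graph and fails on the portion Org2 receives.

```python
TYPE = "rdf:type"
RESULT = "uco-action:result"
INVESTIGATIVE_ACTION = "investigation:InvestigativeAction"
PROVENANCE_RECORD = "investigation:ProvenanceRecord"

# Hypothetical graph held by Org1 (illustrative identifiers and values).
org1_graph = {
    ("kb:action-1", TYPE, INVESTIGATIVE_ACTION),
    ("kb:action-1", "uco-action:instrument", "kb:tool-1"),
    ("kb:action-1", "uco-action:startTime", "2024-02-15T09:00:00Z"),
    ("kb:action-1", RESULT, "kb:provenance-record-1"),
    ("kb:provenance-record-1", TYPE, PROVENANCE_RECORD),
}


def share(graph, withheld):
    """Return the triples that do not mention any withheld identifier."""
    return {
        (s, p, o) for (s, p, o) in graph
        if s not in withheld and o not in withheld
    }


def has_provenance_result(graph, action):
    """True if the action has at least one result typed as a ProvenanceRecord."""
    return any(
        s == action and p == RESULT and (o, TYPE, PROVENANCE_RECORD) in graph
        for (s, p, o) in graph
    )


org2_view = share(org1_graph, withheld={"kb:provenance-record-1"})

print(has_provenance_result(org1_graph, "kb:action-1"))  # True
print(has_provenance_result(org2_view, "kb:action-1"))   # False
```

A minimum-count check run on `org2_view` reports the action even though Org1's data was complete. An OWL restriction read under the open-world assumption would not treat the missing triple as an error.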

After discussion on this morning's call, it is likely that that spelling change for ContextualCompilation will be proposed.

@ajnelson-nist
Member Author

From discussion on this morning's call, we felt the risks (including the one realized just prior to the call on information sharing) left us uncertain the requirements are sufficiently captured. We will return to this after proposing at least one upstream matter on UCO to address Risk 1.

@ajnelson-nist
Member Author

The proposal has received some revisions (accompanied by string "2024-02-15"), and an extra step in its coordination checklist.

@ajnelson-nist
Member Author

Risk 1 has been addressed with the resolution of UCO Issue 599.
