Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISCUSSION: SPDX 3 data model #240

Closed
wants to merge 3 commits into from

Conversation

kzantow
Copy link
Collaborator

@kzantow kzantow commented May 14, 2024

DO NOT MERGE!

This PR is intended to provide a spot for an open discussion about what people would like to see for working with SPDX 3 documents. There is a particularly challenging bit, that SPDX 3 is implemented using a single inheritance model, which Go does not directly support. Due to this, I'd like to offer up a few possibilities to determine which users would most like to see.

In the spdx/v3/v3_0 directory, there are several subdirectories, each implementing a different variant of a model that I believe would support SPDX 3. I've implemented a very simple example with each: creating a couple packages, a file, and some relationships. 👀 look at the *_test.go to see the example usage. There, of course, could be certain parts of these different variants used together such as setter errors and first-class embedded structs.

Also check the README with some more information and what I see as pros/cons for each of the approaches.

And, of course, if someone has different ideas, please do share them! Perhaps I've missed some obvious solutions that could simplify things here!

There's a high likelihood that I've made some mistakes with the modeling and usage, but this is more to determine the patterns we would like to use, so we can use the SPDX 3 SHACL model to generate the corresponding Go code here.

Element

RootElements []IElement
Elements []IElement
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the problem with this implementation is that RootElement is not class Element but any class that inherits from Element. In Go, I guess, that translates to "any struct that implements the Element interface".

Copy link
Collaborator Author

@kzantow kzantow May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in this variant, the I* types are all interfaces -- so these are "any types implementing the IElement interface". These could be named differently... naming is hard! there doesn't seem to be a way to avoid interfaces unless we made any reference field an any /interface{} type, but then there's no compiler enforcement of elements going in there. (Also I probably modeled this wrong, I was going by the software profile svg which appears to have * for both of these fields).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think personally I'm leaning towards the embedded_with_interfaces_and_constructors variant, with setters that return error, as the example currently shows. This allows generated code to enforce valid values for all fields and for a user to most quickly understand when things failed rather than needing to run a full validation once all values are set. It also eliminates the need to export the data structs (all the embedded variants need exported structs)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my bad: the property is of type IElement

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work. It would be easier to generate:

package main

import "fmt"

type ElementP struct {
	Name string
}

type Element interface {
	AsElement() ElementP
}

type Artifact struct {
	ElementP
	StandardName []string
}

func (a Artifact) AsElement() ElementP {
	return a.ElementP
}

type ElementCollection struct {
	ElementP
	RootElement Element
}

func main() {
	a := Artifact{
		ElementP: ElementP{
			Name: "AnArtifact",
		},
		StandardName: []string{"TheArtifact"},
	}

	ec := ElementCollection{
		ElementP: ElementP{
			Name: "AnElementCollection",
		},
		RootElement: a,
	}

	fmt.Println(ec)
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more issue: The jinja renderer I am using from the shacl2code project can only generate one file. So if we were to generate a model and a constructor file, we would need to roll our own jinja renderer...which I would rather do once I can handle the code better. You are welcome to try building one that will generate multiple files.

Copy link
Collaborator Author

@kzantow kzantow May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note: there is no requirement to have separate files -- I just did this to separate things logically, but all the code for each of these approaches could be generated in a single file. In other words: if we were to export all 4 types: interface, data struct, constructor, constructor props; all of these could be one after the other in the same file.

From the "generic shacl2code" perspective, I think in the example there's a bit of an inconsistency with the example you posted: only a single struct type gets renamed (ElementP) for an Element interface, but there are no other interfaces created. I think the easiest thing to do in shacl2code is probably: generate an exported interface and a struct for each type. It also probably makes the most sense to name the interface after the shacl type rather than the struct, as one of my examples has. Abstract types are a question: golang would let you indiscriminately construct any abstract type if these structs are exported, and they would need to be generated and exported to be used like superclasses through embedding.

The interfaces_only approach has the potential to model inheritance defined in shacl in the most correct way, I believe -- because abstract classes can omit constructor functions and since the types themselves are not exported, there's no way for a user to create these types incorrectly. It has some complexity that at the time you're generating code for a type, it needs to have access to supertypes to be able to inline all the properties -- a quick look makes me think this wouldn't be too hard to accomplish, since it looks like we can get a reference to a parent class and jinja supports recursion.

An aside: naming is hard! Is the P from ElementP short for Props or Properties? I am sorta partial to Data for these structs despite using Impl in one of the variants, but also like Props, at least for the constructor props types, or other things; happy for any other suggestions but would definitely like to keep it short but maybe more descriptive than a single letter. Interfaces only allows the lowercase name to be used, since it's not exported and we don't have to come up with names, so I guess there's another plus for that one... except for package, which is a reserved word 🤦

Copy link
Collaborator Author

@kzantow kzantow May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an "embedded_second" variant, which I think is similar to what you were suggesting (replace Data with P or some other suffix TBD and the interfaces are the first-class names of the types); I just wanted to point out specifically what the usage would look like, which could be just fine: https://github.com/spdx/tools-golang/pull/240/files#diff-55bf8e9ab2432ee525685cd1e7ef75be715a7e18c4eef3011711b89af64bc389R14

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with you on going with the interface only approach i.e. Abstract class -> Go Interface with getter and setter for each property. Concrete class -> Go struct which implements the getter and setter methods for the abstract class.

I personally would prefer having corresponding internal structs to hold common properties in the concrete classes, but it's not a deal breaker. We have to embed the interfaces anyway (Artifact is actually an abstract class that inherits from Element).

@kzantow kzantow mentioned this pull request May 16, 2024
Signed-off-by: Keith Zantow <[email protected]>
ElementCollection
}

type spdxDocument struct {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the "interfaces only" example, but this is a struct. Is a concrete class a struct and an abstract class an interface?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this struct is hidden though, so users are not able to directly index the struct and will need to rely on interfaces to get/set values.

Comment on lines +13 to +14
spdxID ElementID
name string
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generating each abstract parents' properties using jinja would be ok to do. You think this is a better option than using a struct called elementProperties (purposefully not exporting the struct, but making the concrete class implement the interface Element).

name string
}

var _ Element = (*ElementImpl)(nil)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what this does?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a way to write implements <interface> in go:

var _ Element = (*ElementImpl)(nil)

is a compile-time check that an *ElementImpl type can be assigned to an Element interface, it gets optimized out by the compiler so no runtime overhead is incurred. It's not really necessary for any of the code generation, I mostly added it for my own benefit.

A similar compile-time check would happen with constructor functions, like:

func NewPackage() Package {
   return &packageImpl{}
}

will similarly fail to compile if packageImpl doesn't adhere to the Package interface

Copy link
Collaborator

@lumjjb lumjjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you sooo much @kzantow for this super detailed rundown - I like embedded with interfaces and constructors the best!

@@ -0,0 +1,53 @@
# SPDX 3 Model Patterns

This is a non-exhaustive list of some patterns we are able to use in Golang to work with an SPDX 3 data model.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you sooo much for putting this level of detail and even mocking out all the examples! You've made it super understandable and easy to review!

- pro: modifying structs is simple and idiomatic
- pro: minimal surface area: very few additional functions necessary
- con: construction is confusing -- which nesting has field X?
- con: naming is hard, need `Package` and `IPackage`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an interesting dimension to consider also is addition on new fields and compatibility between versions, i think each of them may have pros/cons in this regard as well. (like in interfaces we can just have the v3.1 interface embed the v3.0 interface and make backward compatibility simple.

- pro: interfaces are simple
- pro: modifying structs is simple and idiomatic
- pro: minimal surface area: very few additional functions necessary
- con: construction is confusing -- which nesting has field X?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea this is something i really dislike about embedding as well, the constructors are very non-intuitive.. especially to a user who does not really have to the inheritance relationships to actually use SPDX.

ElementCollection
}

type spdxDocument struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this struct is hidden though, so users are not able to directly index the struct and will need to rely on interfaces to get/set values.


- [embedded with interfaces and constructors](embedded_with_interfaces_and_constructors) - takes the approach of both
embedding and interfaces, along with constructor functions. This makes creation reasonably simple and working with
existing documents reasonably simple.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this one the best! i think there is one aspect which i'm not sure what the solution to is (which may be generic to anything that is implemented with interfaces and not just to this one)... when we have a field like a slice of elements, like To []Element. Once we get hold of an Element, how do we get back to the child class of the Element? Like a Package is an Element but when we get a hold of the Package as an Element in a relationship, how do we know its a Package?

- pro: eliminates some code duplication seen with `interfaces_only`
- con: getter/setter still not especially idiomatic
- con: larger surface area: at least 4 exported parts for each type
- con: absolutely no JSON/etc. output
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure what you mean by this, is it just because the fields are not public thus no serialized JSON output?

- pro: reasonably simple to work with existing documents
- pro: eliminates some code duplication seen with `interfaces_only`
- con: getter/setter still not especially idiomatic
- con: larger surface area: at least 4 exported parts for each type
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... yea i wonder if there's anyhting we can do with getters/setters here with reflection...

why do we need to export the internal struct again? can you remind me?

@lumjjb
Copy link
Collaborator

lumjjb commented May 30, 2024

@goneall is there a JSON schema spec for v3? Also would love your thoughts on the proposal since you've been doing this with Java as well (since i imagine having inheritance is nice for this data model :))

@goneall
Copy link
Member

goneall commented May 30, 2024

@goneall is there a JSON schema spec for v3? Also would love your thoughts on the proposal since you've been doing this with Java as well (since i imagine having inheritance is nice for this data model :))

Yes - https://spdx.github.io/spdx-spec/v3.0/model/schema.json - This is generated from the SHACL file using the code generator Joshua wrote. I haven't reviewed the above discussion yet, but I thought I'd do a quick reply with a pointer to the schema

@nishakm
Copy link

nishakm commented Jun 5, 2024

Following up: any consensus on what direction we want to take?

@kzantow
Copy link
Collaborator Author

kzantow commented Aug 1, 2024

Thanks everyone for the feedback here! After some experimentation and back and forth with @nishakm, I have a proposed direction and would love feedback there. I opened a new PR for this, feel free to try it out: #247

@kzantow kzantow closed this Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants