Standardize reference types and logic across schemas #3772

Open
dafeder opened this issue Mar 24, 2022 · 2 comments · May be fixed by #3774
@dafeder
Member

dafeder commented Mar 24, 2022

The referencing system contains a lot of implicit rules and reference types, and its business logic is spread across the Referencer and Lifecycle classes. This could be organized much better by defining reference types/classes that contain their own referencing and dereferencing logic, and simply mapping these reference types to properties in the schemas. This change would complement what we're starting to sketch out in #3761 for better abstracting/defining the schema system.

An initial diagram of how this would work:

classDiagram

class ReferenceDefinitionInterface {
  <<interface>>
  +create()*
  +property()
  +schemaId()
  +reference(mixed metadata) string
  +dereference(string identifier) mixed
}

class AbstractReferenceDefinition {
  <<abstract>>
  -property
  -schemaId
  +property()
  +schemaId()
}
ReferenceDefinitionInterface --|> AbstractReferenceDefinition

class JsonReference {
  <<service>>
  +reference()
  +dereference()
}
AbstractReferenceDefinition --|> JsonReference


class ResourceReference {
  <<service>>
  +reference()
  +dereference()
}
AbstractReferenceDefinition --|> ResourceReference

class IdReference {
  <<service>>
  +reference()
  +dereference()
}
AbstractReferenceDefinition --|> IdReference

class ReferencerInterface {
  <<interface>>
  +reference(object metadata) object
}

class Referencer {
  <<service>>
  +reference()
  #referenceProperty()
  #referenceMultiple()
}
ReferencerInterface --|> Referencer

class DereferencerInterface {
  <<interface>>
  +dereference(object metadata) object
}

class Dereferencer {
  <<service>>
  +dereference()
  #dereferenceProperty()
  #dereferenceMultiple()
}
DereferencerInterface --|> Dereferencer

class ReferenceMap {
  <<service>>
  +getAllReferences(schemaId) array
  +getReference(schemaId, propertyName) ReferenceDefinitionInterface
}

Referencer ..> ReferenceMap : Dependency
Dereferencer ..> ReferenceMap : Dependency

The Referencer will use the ReferenceMap service to find all the reference definitions for a schema. It will then iterate through them and call each definition's reference() method (e.g. JsonReference::reference()). The Dereferencer will work essentially the same way, calling dereference() instead. So we should be able to send any incoming or outgoing metastore item through the Referencer or Dereferencer without having schema-specific logic in these classes.
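The flow described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not DKAN code: the real classes would be PHP, `InMemoryReference` is a hypothetical toy definition, and passing `schema_id` explicitly to `reference()`/`dereference()` is an assumption (the diagram only shows a metadata argument).

```python
from abc import ABC, abstractmethod
from typing import Any


class ReferenceDefinition(ABC):
    """Stands in for ReferenceDefinitionInterface plus AbstractReferenceDefinition."""

    def __init__(self, schema_id: str, property_name: str) -> None:
        self.schema_id = schema_id
        self.property_name = property_name

    @abstractmethod
    def reference(self, value: Any) -> str:
        """Store the value somewhere and return an identifier for it."""

    @abstractmethod
    def dereference(self, identifier: str) -> Any:
        """Return the value that the identifier points to."""


class InMemoryReference(ReferenceDefinition):
    """Hypothetical toy definition: keeps referenced values in a dict."""

    def __init__(self, schema_id: str, property_name: str) -> None:
        super().__init__(schema_id, property_name)
        self._store: dict[str, Any] = {}

    def reference(self, value: Any) -> str:
        identifier = f"{self.property_name}-{len(self._store) + 1}"
        self._store[identifier] = value
        return identifier

    def dereference(self, identifier: str) -> Any:
        return self._store[identifier]


class ReferenceMap:
    """Maps (schema ID, property name) to a reference definition."""

    def __init__(self, definitions: list[ReferenceDefinition]) -> None:
        self._by_schema: dict[str, dict[str, ReferenceDefinition]] = {}
        for definition in definitions:
            props = self._by_schema.setdefault(definition.schema_id, {})
            props[definition.property_name] = definition

    def get_all_references(self, schema_id: str) -> list[ReferenceDefinition]:
        return list(self._by_schema.get(schema_id, {}).values())

    def get_reference(self, schema_id: str, property_name: str) -> ReferenceDefinition:
        return self._by_schema[schema_id][property_name]


def _apply(metadata: dict, definitions: list[ReferenceDefinition], method: str) -> dict:
    """Call one method of each definition on its property; lists are handled per item."""
    result = dict(metadata)  # shallow copy so the caller's item is left untouched
    for definition in definitions:
        name = definition.property_name
        if name not in result:
            continue
        convert = getattr(definition, method)
        value = result[name]
        result[name] = [convert(v) for v in value] if isinstance(value, list) else convert(value)
    return result


class Referencer:
    def __init__(self, reference_map: ReferenceMap) -> None:
        self._map = reference_map

    def reference(self, schema_id: str, metadata: dict) -> dict:
        return _apply(metadata, self._map.get_all_references(schema_id), "reference")


class Dereferencer:
    def __init__(self, reference_map: ReferenceMap) -> None:
        self._map = reference_map

    def dereference(self, schema_id: str, metadata: dict) -> dict:
        return _apply(metadata, self._map.get_all_references(schema_id), "dereference")
```

Neither `Referencer` nor `Dereferencer` knows anything about a particular schema here; everything schema-specific lives in the definitions registered with the `ReferenceMap`.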

Currently, we have essentially two types of references: schema/JSON references, which replace a JSON subtree with a Drupal UUID and store the JSON in a second metastore item in Drupal; and resource references, which replace a downloadURL with a resource ID and create a resource/file record in the DKAN datastore.
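As a rough illustration of the difference between the two existing types, here is a minimal Python sketch. Plain dicts stand in for the Drupal metastore and the resource records, and the hash-based resource ID is purely an assumption for the example; neither class reflects DKAN's actual storage or ID format.

```python
import hashlib
import json
import uuid
from typing import Any


class JsonReference:
    """Replaces a JSON subtree with a UUID and stores the JSON as its own item."""

    def __init__(self, metastore: dict[str, str]) -> None:
        self._metastore = metastore  # stand-in for metastore storage (assumption)

    def reference(self, subtree: Any) -> str:
        identifier = str(uuid.uuid4())
        self._metastore[identifier] = json.dumps(subtree)
        return identifier

    def dereference(self, identifier: str) -> Any:
        return json.loads(self._metastore[identifier])


class ResourceReference:
    """Replaces a downloadURL with a resource ID and records the resource."""

    def __init__(self, resources: dict[str, str]) -> None:
        self._resources = resources  # stand-in for resource/file records (assumption)

    def reference(self, download_url: str) -> str:
        # Hypothetical ID scheme: a short hash of the URL, so the same URL
        # always maps to the same resource record.
        identifier = hashlib.sha256(download_url.encode("utf-8")).hexdigest()[:12]
        self._resources[identifier] = download_url
        return identifier

    def dereference(self, identifier: str) -> str:
        return self._resources[identifier]
```

Both classes expose the same `reference()`/`dereference()` pair, which is what would let a generic Referencer treat them interchangeably.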

This will open the door to new types of references in the future. Two that are already in the works include:

  • string/literal references: Similar to the JSON references but would do a better job of handling string values, which need some extra processing to convert into and out of JSON in order not to break logic elsewhere in DKAN. (Not 100% clear this would be handled in referencing logic as opposed to schema logic.)
  • url references: Currently part of the plan for implementing data dictionaries. Like a JSON reference, this would store a Drupal UUID in place of a value, but on dereferencing the UUID would be swapped for a URL pointing to the JSON object rather than for the full JSON object. For example, while some objects, such as distributions, should be included in the full dataset JSON output, data dictionaries under DCAT-AP should be URLs. If using DKAN's native data dictionary functionality we would want to store these as intra-metastore references, but present them as absolute URLs after dereferencing.
  • (further out) file references: it has been suggested we might replace DKAN's bespoke resource system with something more integrated with Drupal's native file entity system. If we do ever do this, it will be easier to swap out if the logic is encapsulated in interchangeable reference classes.
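To make the url reference idea more concrete, here is one possible shape for it, sketched in Python. Everything specific is an assumption made for illustration: the class name, the URL path pattern, and the choice to reject URLs that do not point at the local metastore.

```python
class UrlReference:
    """Stores a bare identifier, but presents an absolute URL when dereferenced."""

    def __init__(self, base_url: str, schema_id: str) -> None:
        # Hypothetical URL pattern for a metastore item; adjust to the real API.
        self._prefix = f"{base_url.rstrip('/')}/api/1/metastore/schemas/{schema_id}/items/"

    def reference(self, url: str) -> str:
        """Turn an absolute URL for a local item into an intra-metastore identifier."""
        if not url.startswith(self._prefix):
            raise ValueError(f"Not a local metastore URL: {url}")
        return url[len(self._prefix):]

    def dereference(self, identifier: str) -> str:
        """Turn a stored identifier back into an absolute URL."""
        return self._prefix + identifier


url_ref = UrlReference("https://example.com/", "data-dictionary")
item_url = "https://example.com/api/1/metastore/schemas/data-dictionary/items/1234"
stored_id = url_ref.reference(item_url)        # "1234"
public_url = url_ref.dereference(stored_id)    # same as item_url
```

Because the base URL is only applied at dereference time, the stored reference would stay valid if the site moved to a different domain.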
@clayliddell
Contributor

@dafeder Nice job! This looks good to me. I'm excited to see how this turns out.

@dafeder
Member Author

dafeder commented Mar 24, 2022

Thanks! And an update after our conversation: these reference classes may make more sense to implement as plugins than as services.
