Merge pull request DSpace#252 from the-library-code/DSpace_duplicate_…
…contract

Duplicate Detection: REST Contract submission section & item link
tdonohue authored Mar 4, 2024
2 parents 84e09b2 + e4cc4eb commit 7b39dc0
Showing 7 changed files with 222 additions and 3 deletions.
105 changes: 105 additions & 0 deletions duplicates.md
@@ -0,0 +1,105 @@
# Duplicate detection endpoint
[Back to the list of all defined endpoints](endpoints.md)

## Main Endpoint
**/api/submission/duplicates**

Provides access to basic duplicate detection services. These services use Solr and the Levenshtein distance operator
to detect potential duplicates of a given item, which is useful during submission and workflow review.

See `dspace/config/modules/duplicate-detection.cfg` for configuration properties and examples.

## Single duplicate

Not implemented. (A duplicate only makes sense in the context of a search by item.)

## All duplicates

Not implemented. (A duplicate only makes sense in the context of a search by item.)

## Search

**GET /api/submission/duplicates/search/findByItem?uuid=<:uuid>**

Returns a list of items that may be duplicates of the item identified by the `uuid` parameter, provided this feature is enabled.

Note that although this endpoint appears in the submission category, the UUID can also refer to an archived item.
Currently, the only frontend use of this feature is in workspace and workflow, so it is categorised as such.
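
As a minimal sketch of how a client might build the search request, the Python below assembles the `findByItem` URL. The base URL `https://demo.example.org/server` and the helper name `find_by_item_url` are hypothetical and not part of this contract; sending the request (and any authentication it may require) is left out.

```python
import uuid
from urllib.parse import urlencode

# Path taken from the contract above; everything else here is illustrative.
SEARCH_PATH = "/api/submission/duplicates/search/findByItem"


def find_by_item_url(base_url: str, item_uuid: str) -> str:
    """Build the duplicate search URL for one item.

    `base_url` is a hypothetical REST root, e.g. "https://demo.example.org/server".
    Raises ValueError if `item_uuid` is not a well-formed UUID.
    """
    parsed = uuid.UUID(item_uuid)  # validate before building the query string
    query = urlencode({"uuid": str(parsed)})
    return f"{base_url.rstrip('/')}{SEARCH_PATH}?{query}"


print(find_by_item_url("https://demo.example.org/server",
                       "5ca83276-f003-460d-98b6-dd3c30708749"))
```

Validating the UUID on the client side is a design choice of this sketch, so that a malformed value fails before any request is made.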

Each potential duplicate has the following attributes:

* title: The item title
* uuid: The item UUID
* owningCollectionName: Name of the owning collection, if present
* workspaceItemId: Integer ID of the workspace item, if present
* workflowItemId: Integer ID of the workflow item, if present
* metadata: A map of metadata fields to lists of values copied from the item, as per configuration
* type: The value is always DUPLICATE. This is the 'type' category used for serialization/deserialization.

Example:

```json
{
  "potentialDuplicates": [
    {
      "title": "Example Item",
      "uuid": "5ca83276-f003-460d-98b6-dd3c30708749",
      "owningCollectionName": "Publishers",
      "workspaceItemId": null,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [
          {
            "value": "Example Item",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ],
        "dspace.entity.type": [
          {
            "value": "Publication",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ]
      },
      "type": "DUPLICATE"
    }, {
      "title": "Example Itom",
      "uuid": "32f8f6e4-c79e-4322-aae7-07ee535f70a6",
      "owningCollectionName": null,
      "workspaceItemId": 51,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [{
          "value": "Example Itom",
          "language": null,
          "authority": null,
          "confidence": -1,
          "place": 0
        }]
      },
      "type": "DUPLICATE"
    }, {
      "title": "Exaple Item",
      "uuid": "0647ff45-48f5-4c1b-b6d7-f5dbbc160856",
      "owningCollectionName": null,
      "workspaceItemId": 52,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [{
          "value": "Exaple Item",
          "language": null,
          "authority": null,
          "confidence": -1,
          "place": 0
        }]
      },
      "type": "DUPLICATE"
    }
  ]
}
```
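
A response of this shape could be consumed roughly as follows. This is a sketch only: the helper name `summarize_duplicates` is hypothetical, and picking the first value of each metadata field is an assumption made for brevity, not something the contract prescribes.

```python
import json

# Trimmed copy of the example response above, used as sample input.
SAMPLE = """
{
  "potentialDuplicates": [
    {
      "title": "Example Item",
      "uuid": "5ca83276-f003-460d-98b6-dd3c30708749",
      "owningCollectionName": "Publishers",
      "workspaceItemId": null,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [{"value": "Example Item", "language": null,
                      "authority": null, "confidence": -1, "place": 0}],
        "dspace.entity.type": [{"value": "Publication", "language": null,
                                "authority": null, "confidence": -1, "place": 0}]
      },
      "type": "DUPLICATE"
    }
  ]
}
"""


def summarize_duplicates(response_body: str) -> list:
    """Reduce each potential duplicate to its uuid, title and first metadata values."""
    payload = json.loads(response_body)
    summaries = []
    for dup in payload.get("potentialDuplicates", []):
        # Keep only the first value of each field (an assumption of this sketch).
        first_values = {
            field: values[0]["value"]
            for field, values in dup.get("metadata", {}).items()
            if values
        }
        summaries.append(
            {"uuid": dup["uuid"], "title": dup["title"], "metadata": first_values}
        )
    return summaries


for summary in summarize_duplicates(SAMPLE):
    print(summary["uuid"], summary["title"], summary["metadata"])
```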
1 change: 1 addition & 0 deletions endpoints.md
@@ -48,6 +48,7 @@
* [/api/integration/suggestions](suggestions.md)
* [/api/integration/suggestionsources](suggestionsources.md)
* [/api/integration/suggestiontargets](suggestiontargets.md)
* [/api/submission/duplicates](duplicates.md)

## Endpoints Under Development/Discussion
* [/api/authz/resourcepolicies](resourcepolicies.md)
2 changes: 1 addition & 1 deletion items.md
@@ -585,4 +585,4 @@ Return codes:
* 204 No content - if the operation succeeds
* 401 Unauthorized - if you are not authenticated
* 403 Forbidden - if you are not logged in with sufficient permissions
- * 404 Not found - if the item doesn't exist (or was already deleted)
+ * 404 Not found - if the item doesn't exist (or was already deleted)
2 changes: 1 addition & 1 deletion submission.md
@@ -32,7 +32,7 @@ This is the WorkspaceItem object you created.
It is **important** to keep the `id` of the WorkspaceItem, as this is necessary to update it or access it again.
For example, using the `id`, you can load up the current state of your WorkspaceItem
```
- GET /api/sumission/workspaceitems/<:id>
+ GET /api/submission/workspaceitems/<:id>
```

In the response, you'll see a list of `sections` which are available to complete for this WorkspaceItem.
1 change: 1 addition & 0 deletions submissionsection-types.md
@@ -13,5 +13,6 @@ cclicense | [/config/submissioncclicenses](submissioncclicenses.md) | [example](
access | [/config/submissionaccessoptions](submissionaccessoptions.md) | [example](workspaceitem-data-access.md)
sherpaPolicies | n/a | [example](workspaceitem-data-sherpa-policy.md)
identifiers | n/a | [example](workspaceitem-data-identifiers.md)
duplicates | n/a | [example](workspaceitem-data-duplicates.md)

n/a --> not applicable. The sectionType doesn't require/support any extra configuration
89 changes: 89 additions & 0 deletions workspaceitem-data-duplicates.md
@@ -0,0 +1,89 @@
# WorkspaceItem data of duplicates sectionType
[Back to the definition of the workspaceitems endpoint](workspaceitems.md)

The data in this section represents a list of potential duplicates of this workspace item.

It is a JSON object with the following structure (it matches the response from the [duplicate search endpoint](duplicates.md)):

```json
{
  "potentialDuplicates": [
    {
      "title": "Example Item",
      "uuid": "5ca83276-f003-460d-98b6-dd3c30708749",
      "owningCollectionName": "Publishers",
      "workspaceItemId": null,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [
          {
            "value": "Example Item",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ],
        "dspace.entity.type": [
          {
            "value": "Publication",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ]
      },
      "type": "DUPLICATE"
    }, {
      "title": "Example Itom",
      "uuid": "32f8f6e4-c79e-4322-aae7-07ee535f70a6",
      "owningCollectionName": null,
      "workspaceItemId": 51,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [{
          "value": "Example Itom",
          "language": null,
          "authority": null,
          "confidence": -1,
          "place": 0
        }]
      },
      "type": "DUPLICATE"
    }, {
      "title": "Exaple Item",
      "uuid": "0647ff45-48f5-4c1b-b6d7-f5dbbc160856",
      "owningCollectionName": null,
      "workspaceItemId": 52,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [{
          "value": "Exaple Item",
          "language": null,
          "authority": null,
          "confidence": -1,
          "place": 0
        }]
      },
      "type": "DUPLICATE"
    }
  ]
}
```
The potential duplicates listed in the section have all been detected by a special Solr search that compares the
Levenshtein edit distance between the in-progress item title and other item titles (after normalisation).
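
To illustrate the idea of edit distance only, the sketch below computes the classic Levenshtein distance in plain Python. It is not the Solr implementation used by DSpace, and the `normalise` helper (lowercasing and collapsing whitespace) is an assumed stand-in for whatever normalisation the server applies.

```python
def normalise(title: str) -> str:
    """Hypothetical normalisation: lowercase and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes or substitutions
    needed to turn `a` into `b` (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


in_progress = normalise("Example Item")
for other in ("Example Itom", "Exaple Item"):
    print(other, levenshtein(in_progress, normalise(other)))
```

Both titles from the example above are one edit away from "Example Item", which is why they would plausibly be reported as potential duplicates under a small distance threshold.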

Each potential duplicate has the following attributes:

* title: The item title
* uuid: The item UUID
* owningCollectionName: Name of the owning collection, if present
* workspaceItemId: Integer ID of the workspace item, if present
* workflowItemId: Integer ID of the workflow item, if present
* metadata: A map of metadata fields to lists of values copied from the item, as per configuration
* type: The value is always DUPLICATE. This is the 'type' category used for serialization/deserialization.
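
A client could use the two ID attributes to decide where a potential duplicate currently lives. The sketch below is one way to do that; the function name `duplicate_location` and the labels it returns are hypothetical, and treating an entry with neither ID as "other" (most likely an archived item) is an assumption rather than something this contract states.

```python
from typing import Any, Mapping


def duplicate_location(dup: Mapping[str, Any]) -> str:
    """Classify a potential duplicate by which in-progress ID is present."""
    if dup.get("workflowItemId") is not None:
        return "workflow"
    if dup.get("workspaceItemId") is not None:
        return "workspace"
    # Neither ID is set: assumed here to be an archived (or otherwise
    # not in-progress) item.
    return "other"


examples = [
    {"title": "Example Item", "workspaceItemId": None, "workflowItemId": None},
    {"title": "Example Itom", "workspaceItemId": 51, "workflowItemId": None},
]
for dup in examples:
    print(dup["title"], duplicate_location(dup))
```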

See `dspace/config/modules/duplicate-detection.cfg` for configuration properties.

## Patch operations
There are no PATCH methods implemented for this section.
25 changes: 24 additions & 1 deletion workspaceitems.md
@@ -60,8 +60,31 @@ Provide detailed information about a specific workspaceitem. The JSON response d
"doi" : "https://doi.org/10.5072/dspace/2",
"otherIdentifiers" : [ ]
},
"duplicates": {
  "potentialDuplicates": [
    {
      "title": "Sample Submission Item",
      "uuid": "5ca83276-f003-460d-98b6-dd3c30708749",
      "owningCollectionName": "Another Collection",
      "workspaceItemId": null,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [
          {
            "value": "Example Item",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ]
      },
      "type": "DUPLICATE"
    }
  ]
},
"traditional-page1": {
- "dc.title" : [{value: "Sample Submission Item"}],
+ "dc.title" : [{value: "Sample Submission Item"}],
"dc.contributor.author" : [
{value: "Bollini, Andrea", authority: "rp00001", confidence: 600}
]
