When updating to DKAN 2.19 from an earlier version, if rows in two different harvest_ID_runs tables happen to have the same timestamp, an SQL error is thrown because the timestamp is treated as the unique identifier. A collision requires two runs to land in the same one-second window, so this is unlikely, but it is possible to encounter in the wild.
> > [notice] Converting runs for home_health__data
> > [error] Drupal\Core\Database\IntegrityConstraintViolationException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1599570670' for key 'PRIMARY': INSERT INTO "harvest_runs" ("id", "harvest_plan_id", "data", "extract_status") VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3); Array
> > (
> > [:db_insert_placeholder_0] => 1599570670
> > [:db_insert_placeholder_1] => home_health__data
> > [:db_insert_placeholder_2] => {"plan":"{\"identifier\":\"home_health__data\",\"extract\":{\"type\":\"\\\\Drupal\\\\pqdc\\\\Harvest\\\\ETL\\\\Extract\\\\DataJson\",\"uri\":\"file:\\\/\\\/\\\/mnt\\\/tmp\\\/data.json\"},\"transforms\":[],\"load\":{\"type\":\"\\\\Drupal\\\\harvest\\\\Load\\\\Dataset\"}}","status":[],"errors":{"extract":"Error decoding JSON."}}
> > [:db_insert_placeholder_3] => FAILURE
> > )
> > in Drupal\mysql\Driver\Database\mysql\ExceptionHandler->handleExecutionException() (line 45 of /var/www/html/docroot/core/modules/mysql/src/Driver/Database/mysql/ExceptionHandler.php).
> > [error] SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '1599570670' for key 'PRIMARY': INSERT INTO "harvest_runs" ("id", "harvest_plan_id", "data", "extract_status") VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3); Array
> > (
> > [:db_insert_placeholder_0] => 1599570670
> > [:db_insert_placeholder_1] => home_health__data
> > [:db_insert_placeholder_2] => {"plan":"{\"identifier\":\"home_health__data\",\"extract\":{\"type\":\"\\\\Drupal\\\\pqdc\\\\Harvest\\\\ETL\\\\Extract\\\\DataJson\",\"uri\":\"file:\\\/\\\/\\\/mnt\\\/tmp\\\/data.json\"},\"transforms\":[],\"load\":{\"type\":\"\\\\Drupal\\\\harvest\\\\Load\\\\Dataset\"}}","status":[],"errors":{"extract":"Error decoding JSON."}}
> > [:db_insert_placeholder_3] => FAILURE
> > )
> >
> > [error] Update failed: harvest_update_8008
> > [notice] Update started: metastore_update_8009
> > [notice] Updated 0 dictionaries. If you have overridden DKAN's core schemas,
> > you must update your site's data dictionary schema after this update. Copy
> > modules/contrib/dkan/schema/collections/data-dictionary.json over you local
> > site version before attempting to read or write any data dictionaries.
> > [notice] Update completed: metastore_update_8009
> > [notice] Update started: metastore_admin_update_8012
> > [notice] Update completed: metastore_admin_update_8012
> [error] Update aborted by: harvest_update_8008
> [error] Finished performing updates.
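The duplicate-entry failure in the log can be reproduced in miniature. This is an illustrative sketch only (SQLite, simplified columns), not DKAN's actual MySQL schema: it shows why a timestamp primary key rejects a second run that lands in the same second.

```python
# Minimal stand-in for the harvest_runs table: the run timestamp is the
# primary key, so two runs in the same second collide on insert.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE harvest_runs ("
    "  id TEXT PRIMARY KEY,"  # timestamp used as the unique identifier
    "  harvest_plan_id TEXT,"
    "  data TEXT,"
    "  extract_status TEXT)"
)
conn.execute(
    "INSERT INTO harvest_runs VALUES (?, ?, ?, ?)",
    ("1599570670", "home_health__data", "{}", "FAILURE"),
)
try:
    # A run from a *different* plan, but sharing the same second:
    conn.execute(
        "INSERT INTO harvest_runs VALUES (?, ?, ?, ?)",
        ("1599570670", "other_plan", "{}", "FAILURE"),
    )
except sqlite3.IntegrityError as err:
    print("duplicate id rejected:", err)
```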
Expected Behavior
The migration of data from one table to another should happen without error.
Steps To Reproduce
Have a data setup where the harvest_ID_runs tables contain the same timestamps.
Run update.php, drush updb, or drush dkan:harvest:update.
See errors.
Relevant log output (optional)
No response
Anything else?
This may be too unlikely a scenario to warrant a try/catch block in HarvestUtility::convertRunTable(), but I will at least provide a drush sqlc command to undo any duplicated IDs.
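The thread does not include the actual cleanup query, but the kind of pre-flight check one could run (via drush sqlc or otherwise) can be sketched. Everything here is an assumption for illustration: SQLite instead of MySQL, and hypothetical legacy table names of the form harvest_&lt;plan&gt;_runs keyed by timestamp.

```python
# Sketch: simulate two legacy per-plan run tables and flag any timestamp
# ids that would collide when merged into a single harvest_runs table.
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("harvest_plan_a_runs", "harvest_plan_b_runs"):
    conn.execute(f"CREATE TABLE {table} (timestamp TEXT, data TEXT)")

# Two runs from different plans that happened in the same second:
conn.execute("INSERT INTO harvest_plan_a_runs VALUES ('1599570670', '{}')")
conn.execute("INSERT INTO harvest_plan_b_runs VALUES ('1599570670', '{}')")

dupes = conn.execute(
    "SELECT timestamp, COUNT(*) AS n FROM ("
    "  SELECT timestamp FROM harvest_plan_a_runs"
    "  UNION ALL"
    "  SELECT timestamp FROM harvest_plan_b_runs"
    ") GROUP BY timestamp HAVING n > 1"
).fetchall()
print(dupes)  # each row is a timestamp that would violate the primary key
```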
Is there any issue with just incrementing the timestamp by 1 second (hoping that would make it unique)? Or is that timestamp used as a unique ID in other places in the system?
It turns out that bumping the timestamp would likely disconnect the harvest run from everything that references it. This also means the problem cannot be addressed with a try/catch.
The new hope is that some variation of this might work:
Change the harvest_runs schema so that the id is NOT the key.
HarvestRunRepository::loadEntity() already treats the id AND the harvest_plan_id as a combined key when looking up the harvest run entity.
So if the id were not the key for the table, there would be no issue with the id needing to be unique. The only remaining risk would be two harvest runs from the same plan taking place in the same second.
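The combined-key idea above can be sketched as follows. This is an illustrative SQLite stand-in, not DKAN's entity schema: it makes (id, harvest_plan_id) the primary key, mirroring how HarvestRunRepository::loadEntity() looks runs up, so same-second runs from different plans no longer conflict.

```python
# Sketch: composite primary key (id, harvest_plan_id) instead of id alone.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE harvest_runs ("
    "  id TEXT,"               # run timestamp
    "  harvest_plan_id TEXT,"
    "  data TEXT,"
    "  PRIMARY KEY (id, harvest_plan_id))"
)
# Same second, different plans: no longer a duplicate-key violation.
conn.execute(
    "INSERT INTO harvest_runs VALUES ('1599570670', 'home_health__data', '{}')"
)
conn.execute(
    "INSERT INTO harvest_runs VALUES ('1599570670', 'other_plan', '{}')"
)
print(conn.execute("SELECT COUNT(*) FROM harvest_runs").fetchone()[0])  # 2
```

Only a second run from the *same* plan in the same second would still collide, which matches the residual risk described above.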
I think the easiest solution to implement would be to add another column holding the actual ID (maybe a UUID) and use that as the unique key. Leave everything else in place, and provide an update path to the new entity schema for both old-style harvest_ID_runs tables and the newer entity tables.
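The surrogate-key variant can also be sketched. Again this is only an illustration under assumed names (SQLite, a uuid column invented for the example): the timestamp id stays in place for existing lookups, while a backfilled UUID becomes the real unique key, so colliding timestamps migrate cleanly.

```python
# Sketch: add a uuid column as the unique key and backfill it during the
# update path, keeping the timestamp id for existing lookups.
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE harvest_runs ("
    "  uuid TEXT PRIMARY KEY,"  # new unique identifier
    "  id TEXT,"                # timestamp, no longer required to be unique
    "  harvest_plan_id TEXT,"
    "  data TEXT)"
)

# "Old" rows keyed only by timestamp, including a collision:
old_rows = [
    ("1599570670", "home_health__data", "{}"),
    ("1599570670", "other_plan", "{}"),
]
for ts, plan, data in old_rows:
    conn.execute(
        "INSERT INTO harvest_runs VALUES (?, ?, ?, ?)",
        (str(uuid.uuid4()), ts, plan, data),
    )
print(conn.execute("SELECT COUNT(*) FROM harvest_runs").fetchone()[0])  # 2
```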
Discussion in CA Slack