dammigration:migratemediatagsinrte _migrateddamuid does not match original dam uid #62

Open
ghost opened this issue Mar 24, 2015 · 2 comments

Comments

@ghost

ghost commented Mar 24, 2015

After running the commands dammigration:migratedamrecords and dammigration:migratedammetadata, the command dammigration:migratemediatagsinrte finishes with several "No FAL record found for dam uid: ..." error messages.

After investigating the problem, I see two possible causes:

  • The extracted dam uid references a deleted dam record. In that case the deleted field of the record in table tx_dam is set to 1.
  • The extracted dam uid doesn't match the field sys_file._migrateddamuid of any sys_file record, but the sys_file record can still be identified by comparing, e.g., the dam record's file_name with sys_file.name.

I think the migration of media tags should therefore fall back to comparing sys_file.identifier with the file_path and file_name fields of the corresponding dam record if the lookup via the _migrateddamuid field fails. Is this a bug?
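
To make the proposed fallback concrete, here is a minimal sketch in Python against an in-memory SQLite database. It is not the extension's code. The table and column names come from this issue, but the `storage_prefix` parameter, the assumption that sys_file.identifier is the DAM path with a leading "fileadmin/" stripped and a leading slash added, and the sample rows are purely illustrative.

```python
import sqlite3


def find_sys_file_uid(conn, dam_uid, storage_prefix="fileadmin/"):
    """Return the sys_file uid for a DAM uid, or None if nothing matches."""
    # First attempt: the lookup via _migrateddamuid described above.
    row = conn.execute(
        "SELECT uid FROM sys_file WHERE _migrateddamuid = ?", (dam_uid,)
    ).fetchone()
    if row:
        return row[0]

    # Fallback: rebuild the path from the DAM record, skipping deleted ones.
    dam = conn.execute(
        "SELECT file_path, file_name FROM tx_dam WHERE uid = ? AND deleted = 0",
        (dam_uid,),
    ).fetchone()
    if dam is None:
        return None

    path = dam[0] + dam[1]
    # Assumption: identifiers are relative to the storage root.
    if path.startswith(storage_prefix):
        path = path[len(storage_prefix):]
    identifier = "/" + path.lstrip("/")
    row = conn.execute(
        "SELECT uid FROM sys_file WHERE identifier = ?", (identifier,)
    ).fetchone()
    return row[0] if row else None


def make_demo_db():
    """Build a tiny database with made-up rows for experimenting."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tx_dam (uid INTEGER PRIMARY KEY, file_path TEXT,
                             file_name TEXT, deleted INTEGER);
        CREATE TABLE sys_file (uid INTEGER PRIMARY KEY, identifier TEXT,
                               name TEXT, _migrateddamuid INTEGER);
        INSERT INTO tx_dam VALUES (10, 'fileadmin/img/', 'a.jpg', 0);
        INSERT INTO tx_dam VALUES (11, 'fileadmin/img/', 'b.jpg', 1);
        INSERT INTO sys_file VALUES (1, '/img/a.jpg', 'a.jpg', 0);
        INSERT INTO sys_file VALUES (2, '/img/b.jpg', 'b.jpg', 0);
        """
    )
    return conn
```

With the demo data, `find_sys_file_uid(make_demo_db(), 10)` resolves through the path fallback, while uid 11 yields `None` because its DAM record is deleted.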

Cheers,
Thomas

@ghost
Author

ghost commented Apr 7, 2015

It seems this is only a problem for me? I find it a bit misleading that the migration process states a file does not exist even though it was migrated successfully.

@dneuge
Contributor

dneuge commented Sep 23, 2015

There's a third possibility, which I am hitting on one of our websites: for some reason we have two DAM metadata records for the same file, with tx_dam.uid 503 and 609. All data except indexing status, record creation time and filesystem information (inode, size, mtime, hash) is identical. The unmigrated website uses both DAM uids in content elements and displays them, since the file exists at that path and neither record is marked as deleted. It appears that DAM simply forgot to remove the older record on re-indexing (503 is marked as having been indexed manually, 609 automatically).

During migration, both records appear to be processed (no missing files are reported). Since DAM and FAL are correlated by file path but there is only one _migrateddamuid field per sys_file record, all DAM records for the same file share the same sys_file record; only the last processed DAM record ends up in _migrateddamuid and can be used for later lookups. So for us, _migrateddamuid is 609, as that is the DAM record that was processed later. As a result, all references to DAM uid 503 are lost in all other migration steps (there is no _migrateddamuid = 503), meaning all content elements using DAM record 503 end up without a FAL record.
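
The overwrite can be reproduced with a small simulation. This is only a sketch of the behaviour described here, not the extension's code; the file path is made up and the in-memory dict stands in for the sys_file table.

```python
def migrate(dam_records):
    """Simulate path-based correlation with one _migrateddamuid per file."""
    sys_file = {}  # path -> simulated sys_file record
    for dam in dam_records:
        path = dam["file_path"] + dam["file_name"]
        # One sys_file record per path, created on first sight.
        record = sys_file.setdefault(path, {"uid": len(sys_file) + 1})
        # The single field is overwritten by every later DAM record.
        record["_migrateddamuid"] = dam["uid"]
    return sys_file


# Hypothetical sample data: two DAM records pointing at the same file.
dam_records = [
    {"uid": 503, "file_path": "fileadmin/docs/", "file_name": "report.pdf"},
    {"uid": 609, "file_path": "fileadmin/docs/", "file_name": "report.pdf"},
]

sys_file = migrate(dam_records)
reachable = {rec["_migrateddamuid"] for rec in sys_file.values()}
print(reachable)  # {609}
```

DAM uid 503 is no longer reachable through _migrateddamuid, which matches the lost references.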

I'm not sure if that can be properly fixed:

The loss of file references could be solved by "reversing" _migrateddamuid: we save the migrated sys_file uid on each tx_dam record and perform all lookups on that field instead. I will look into providing a patch for that.
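
As a rough illustration of the reversed mapping (not the actual patch), the sketch below keeps one sys_file uid per DAM record. The dict `dam_to_sys_file` stands in for a hypothetical new column on tx_dam, and the sample records are made up.

```python
def migrate_reversed(dam_records):
    """Record the sys_file uid per DAM record instead of the other way round."""
    sys_file_uid_by_path = {}  # simulated sys_file table, keyed by path
    dam_to_sys_file = {}       # stands in for a hypothetical tx_dam column
    for dam in dam_records:
        path = dam["file_path"] + dam["file_name"]
        # Reuse the existing sys_file uid for a known path, else assign one.
        sys_file_uid = sys_file_uid_by_path.setdefault(
            path, len(sys_file_uid_by_path) + 1
        )
        dam_to_sys_file[dam["uid"]] = sys_file_uid
    return dam_to_sys_file


# Hypothetical sample data: two DAM records pointing at the same file.
duplicate_dam_records = [
    {"uid": 503, "file_path": "fileadmin/docs/", "file_name": "report.pdf"},
    {"uid": 609, "file_path": "fileadmin/docs/", "file_name": "report.pdf"},
]

mapping = migrate_reversed(duplicate_dam_records)
print(mapping)  # {503: 1, 609: 1}
```

Because the relation is many-to-one from DAM to FAL, storing it on the DAM side keeps every DAM uid resolvable.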

The bigger question is: how do we know which DAM metadata record should be used for migration if more than one non-deleted record exists for the same file? Currently, due to the join on _migrateddamuid in MigrateMetadataService::execSelectMigratedSysFilesQuery, the last processed DAM metadata record is used. As MigrateToStorageService::execSelectNotMigratedDamRecordsQuery requests no specific order, which record is migrated last, and thus which metadata record is used, may be random. How to fix that?

  • Should we use the last edited metadata?
  • Or maybe the metadata of the file matching size/mtime/hash?
  • Simply use the higher uid / later creation timestamp?
  • Maybe it's best to duplicate the file on file system so we get a chance to keep all variations of metadata on migration?

All but the last option discard every variation of the metadata except one.
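
Whichever rule is chosen, the selection could at least be made deterministic. The following sketch combines three of the options in one possible order of preference: records matching the file on disk first, then the last edited one, then the higher uid. The ordering is an assumption for illustration, not a decision made in this thread, and the dict keys (`tstamp`, `file_size`, `file_mtime`) are assumed field names.

```python
def pick_metadata_record(candidates, file_stat=None):
    """Pick one DAM record deterministically among duplicates."""
    if file_stat is not None:
        # Prefer records whose stored size/mtime match the file on disk.
        matching = [
            c for c in candidates
            if c["file_size"] == file_stat["size"]
            and c["file_mtime"] == file_stat["mtime"]
        ]
        if matching:
            candidates = matching
    # Then the last edited record wins; the higher uid breaks ties.
    return max(candidates, key=lambda c: (c["tstamp"], c["uid"]))


# Hypothetical sample data for the two duplicate records.
older = {"uid": 503, "tstamp": 100, "file_size": 10, "file_mtime": 50}
newer = {"uid": 609, "tstamp": 90, "file_size": 12, "file_mtime": 60}

by_edit_time = pick_metadata_record([older, newer])
by_file_match = pick_metadata_record(
    [older, newer], file_stat={"size": 12, "mtime": 60}
)
```

Here `by_edit_time` is record 503 (edited last), while `by_file_match` is record 609 (matches the file on disk), which shows how much the outcome depends on the rule.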

How to handle metadata should remain open to discussion; the choice of metadata record will remain random until then. I'm not sure this is worth any more effort, but we have 2 huge websites which could run into trouble if that DAM inconsistency occurred there as well. I'll have to check whether it's reasonable for us to spend any time on the metadata issue.

dneuge pushed a commit to glutrot/t3ext-dam_falmigration that referenced this issue Sep 23, 2015
Reason is issue b13#62 - DAM may have multiple records for the same file.
Since we correlate DAM and FAL by file path, we need to associate at
least one DAM record to each migrated FAL record, thus the old way of
using one field sys_file._migrateddamuid was incomplete as it only
allowed associating at most one DAM record to each FAL record.

Note that we may have different metadata for the same file as a
result. Before these changes, only the metadata of the last file
having been migrated was used. Now, all metadata will be processed
which may lead to errors. This commit is only intended to record the
change in columns and does not handle that issue yet.

dneuge pushed a commit to glutrot/t3ext-dam_falmigration that referenced this issue Sep 23, 2015
If multiple metadata for same files is encountered, we now print a
warning prompting the user to check migrated metadata of all listed
files (message includes file path and UIDs of sys_file and tx_dam
records).

This is for issue b13#62 again but still doesn't attempt to solve the
conflict, we leave it up to the user to check for any possible
errors.
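
For anyone who wants to check their own data before migrating, the kind of detection such a warning relies on can be sketched as follows. This is not the code from the commit; the function name and the sample records are hypothetical, and deleted records are skipped on the assumption that they are not migrated.

```python
from collections import defaultdict


def find_shared_files(dam_records):
    """Return {path: [dam uids]} for files with more than one DAM record."""
    uids_by_path = defaultdict(list)
    for dam in dam_records:
        if dam.get("deleted"):
            continue  # assumption: deleted DAM records are not migrated
        uids_by_path[dam["file_path"] + dam["file_name"]].append(dam["uid"])
    return {
        path: sorted(uids)
        for path, uids in uids_by_path.items()
        if len(uids) > 1
    }


# Hypothetical sample data.
sample_records = [
    {"uid": 503, "file_path": "fileadmin/docs/", "file_name": "report.pdf"},
    {"uid": 609, "file_path": "fileadmin/docs/", "file_name": "report.pdf"},
    {"uid": 610, "file_path": "fileadmin/docs/", "file_name": "other.pdf"},
]

for path, uids in find_shared_files(sample_records).items():
    print(f"Multiple DAM records for {path}: {uids}")
```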