Let file metadata (i.e. description) be specified during zip upload #724

Closed
raprasad opened this issue Jul 11, 2014 · 4 comments
Labels
Type: Feature (a feature request)

Comments

@raprasad
Contributor


Author Name: Philip Durbin (@pdurbin)
Original Redmine Issue: 3232, https://redmine.hmdc.harvard.edu/issues/3232
Original Date: 2013-08-19


Currently, our zip and tar upload feature does not allow the description field to be populated on a per-file basis. After upload, the user must change the description field for each uploaded file individually, if desired.

In order to set file metadata fields such as "description" we could support some sort of "manifest" file within the zip or tar itself that contains a list of all the files in the archive and the metadata (description, category, possibly md5sum) for each file.
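As a rough illustration of the manifest idea, here is a minimal Python sketch. Everything about the format is an assumption made up for this example: the manifest name `manifest.json`, its JSON layout (file name mapped to `description`, `category`, and `md5sum`), and the function names. It is not an existing DVN, BagIt, or DSpace format, only one way a per-file manifest inside a zip could be read and checked.

```python
import hashlib
import io
import json
import zipfile

# Hypothetical manifest name and layout; not an existing DVN convention.
MANIFEST_NAME = "manifest.json"


def read_file_metadata(archive):
    """Return {filename: metadata} for every data file in a zip archive.

    `archive` is a path or file-like object accepted by zipfile.ZipFile.
    Files without a manifest entry get an empty metadata dict. If an entry
    carries an "md5sum", it is verified against the file contents.
    """
    with zipfile.ZipFile(archive) as zf:
        all_names = zf.namelist()
        manifest = {}
        if MANIFEST_NAME in all_names:
            manifest = json.loads(zf.read(MANIFEST_NAME).decode("utf-8"))

        result = {}
        for name in all_names:
            # Skip the manifest itself and directory entries.
            if name == MANIFEST_NAME or name.endswith("/"):
                continue
            meta = dict(manifest.get(name, {}))
            expected = meta.get("md5sum")
            if expected is not None:
                actual = hashlib.md5(zf.read(name)).hexdigest()
                if actual != expected:
                    raise ValueError(f"md5 mismatch for {name}")
            result[name] = meta
        return result


def build_example_zip():
    """Build a small in-memory zip with one described file and one without."""
    data = b"id,value\n1,42\n"
    manifest = {
        "survey.csv": {
            "description": "Raw survey responses",
            "category": "Data",
            "md5sum": hashlib.md5(data).hexdigest(),
        }
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("survey.csv", data)
        zf.writestr("readme.txt", "no manifest entry")
        zf.writestr(MANIFEST_NAME, json.dumps(manifest))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for filename, meta in read_file_metadata(build_example_zip()).items():
        print(filename, meta)
```

Treating the manifest as optional means archives without one would still upload as they do today, and files that the manifest does not mention simply arrive with no description.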

We could invent our own format or support an existing format such as BagIt ( http://en.wikipedia.org/wiki/BagIt ) or the DSpace Simple Archive Format: https://wiki.duraspace.org/display/DSDOC3x/Importing+and+Exporting+Items+via+Simple+Archive+Format#ImportingandExportingItemsviaSimpleArchiveFormat-ItemImporterandExporter

In addition to zip or tar upload via DVN's web interface, this functionality could also be used in the Data Deposit API (SWORDv2), which supports file upload. Some discussion of file metadata took place with Open Journal Systems (OJS) at http://irclog.iq.harvard.edu/dvn/2013-07-29#i_2752

@raprasad
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2013-08-27T19:42:23Z


Philip Durbin wrote:

Some discussion of file metadata took place with Open Journal Systems (OJS) at http://irclog.iq.harvard.edu/dvn/2013-07-29#i_2752

Jen from OJS and I discussed this again today at http://irclog.iq.harvard.edu/dvn/2013-08-27#i_3226 and she seems OK with limiting the visible fields on the OJS side to those that we can accept via zip upload:

12:49 pdurbin jwhitney: in the past we've talked about our files on the DVN side have metadata such as filename, category, and description. I don't have a way to set descriptions for files. Is this a problem for the OJS use case? See also this ticket about this: https://redmine.hmdc.harvard.edu/issues/3232
13:15 pdurbin jwhitney: does that make sense? I'm trying to ask if it's ok if we don't populate "description" for each file on the DVN side
13:17 jwhitney pdurbin: for now, yes, I think so. OJS gives authors the option to add metadata to supplementary files (title, creators, keywords, etc.) that may differ from article-level metadata for these fields.
13:19 jwhitney pdurbin: so potentially, ojs is collecting more metadata than can currently be sent over to dataverse, unless some of the file-level metadata is propagated upward to fill in absent study-level fields
13:19 jwhitney pdurbin: although that's probably not a great idea
13:20 pdurbin jwhitney: right, OJS is collecting more metadata about files and it wouldn't all appear on the DVN side
13:21 jwhitney pdurbin: yep
13:21 pdurbin jwhitney: I'm looking at your screenshot at author_describe_datafile.png - Google Drive - https://docs.google.com/file/d/0B8Zfl4GMgyejMVV2VUV6QkptN3M/edit
13:22 pdurbin looks like for a file in OJS, you can have Title, Author(s), Keywords, Brief description, Category, and Date
13:22 jwhitney pdurbin: yes, this is based on the supplementary file upload
13:23 pdurbin jwhitney: to support all this, you and I would need to agree on some sort of manifest file, I guess... or some other way to store all this information within the zip file that is sent across via SWORD
13:24 jwhitney pdurbin: I'm wondering if it's better to capture file-level metadata that's only available OJS side, or use a simpler interface to only capture what Dataverse will currently store
13:25 pdurbin jwhitney: oh, are you saying you could expose only a few fields on the OJS side? Only the fields we can receive on the DVN side? (filename and category)
13:27 jwhitney pdurbin: that's what I'm wondering: if that approach would be too limiting for submitters
13:27 pdurbin it might feel limiting, yes
13:27 posixeleni hi pdurbin and jwhitney!
13:28 pdurbin but it would probably be frustrating for submitters if they filled in a bunch of fields that don't get propagated to the DVN side
13:28 jwhitney hello!
13:28 jwhitney pdurbin: agreed
13:28 pdurbin posixeleni: are you following this?
13:28 pdurbin posixeleni: and hello! :)
13:29 posixeleni i saw you chatting about the individual file level metadata and just wanted to ask a quick question about how OJS would capture and send over to us the overall metadata for the Dataverse study
13:29 pdurbin posixeleni: oh, well, that's different... study-level metadata
13:29 posixeleni we got that covered right?
13:30 pdurbin well, let's finish the file-level metadata (i.e. description) discussion
13:30 pdurbin for now anyway :)
13:30 posixeleni cool sorry to interrupt!
13:30 pdurbin I'm in favor of limiting the visible fields on the OJS side to what we can receive on the DVN side (filename and category)
13:31 pdurbin I realize this is limiting, but I'm more worried about the frustration submitters would feel when they realize the description, etc. doesn't get propagated to the DVN side
13:31 jwhitney I agree -- otherwise, I think it's misleading to collect metadata that doesn't get deposited

I also left a note about this on the OJS mockups doc: https://docs.google.com/document/d/1T-i2a4synXIhe3DClYyALI8VYgh2hLdJJMmd6KVVXhc/edit?pli=1

@raprasad
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2013-11-06T14:29:23Z


At https://help.hmdc.harvard.edu/Ticket/Display.html?id=169905#txn-3486070 Eleni pointed out that this blog post mentions BagIt: "Introducing next year’s model, the data-crate; applied standards for data-set packaging" (ptsefton), http://ptsefton.com/2013/11/01/1944.htm

"Crate = Bagit + Zip + x"

@raprasad
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2014-05-23T15:15:53Z


Now that we're developing a "native" API, perhaps we could re-visit this ticket. I just added a Trello card for this: https://trello.com/c/LiD9Xx5u/12-let-file-metadata-i-e-description-be-specified-during-upload

@raprasad
Contributor Author

Duplicate issue. Same as #723
