File API download bypasses terms of use and guestbook #2911

Closed
scolapasta opened this issue Feb 3, 2016 · 12 comments

Comments

@scolapasta
Contributor

Currently, when you download a file through the UI, all logic for creating a GuestbookResponse row is done before hitting the API to download the file.

If you download the file directly from the API, no GuestbookResponse row is created, so the download count does not go up. This also bypasses the terms of use and guestbook completely. We need to make sure that a) a row gets created, so counts are accurate, and b) we decide how to handle the bypassing of the terms of use (via a token?) rather than just acting as if they don't exist.
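The gap described above can be illustrated with a toy in-memory model. This is only a sketch of the behavior, not Dataverse code: `guestbook_responses`, `files`, `api_download`, and `ui_download` are all hypothetical names.

```python
# Toy model of the two download paths; every name here is hypothetical.
guestbook_responses = []            # stands in for the GuestbookResponse table
files = {42: b"col1,col2\n1,2\n"}   # stands in for stored datafiles


def api_download(file_id):
    """Direct API path: returns the bytes and records nothing."""
    return files[file_id]


def ui_download(file_id, user_email):
    """UI path: the guestbook row is written first, then the API is called."""
    guestbook_responses.append({"file_id": file_id, "email": user_email})
    return api_download(file_id)


ui_download(42, "reader@example.org")
api_download(42)
# Two downloads happened, but only the UI one left a row behind.
print(len(guestbook_responses))
```

Because the row is written by the caller rather than by the download endpoint itself, any client that skips the UI also skips the bookkeeping.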

@scolapasta scolapasta added the "Feature: API" and "Component: Code Infrastructure" (formerly "Feature: Code Infrastructure") labels Feb 3, 2016
@djbrooke djbrooke changed the title from "File API does not increment download count / bypasses terms of use" to "File API download bypasses terms of use" Oct 11, 2016
@djbrooke
Contributor

Updating this to cover only the terms of use. The download count not increasing is covered in #3331.

@pdurbin pdurbin added the "Type: Bug" (a defect) label Jun 29, 2017
@pdurbin pdurbin added the "User Role: Curator" (curates and reviews datasets, manages permissions) label Jul 13, 2017
@solhm
Contributor

solhm commented Jan 26, 2018

We are looking forward to this functionality because we are facing copyright issues with organizations that harvest our Dataverse through the API. Since they get a direct download link to the file and put it on their sites, users download the files without any knowledge of, or agreement to, the terms of use.

@pdurbin
Member

pdurbin commented Oct 28, 2019

@solhm thanks for your comment. I just brought up "File API download bypasses terms of use" with @djbrooke @scolapasta and @sekmiller while discussing #3758.

@pdurbin
Member

pdurbin commented Mar 20, 2024

@alejandratenorio
Contributor

alejandratenorio commented Apr 2, 2024

Hi all,

CIMMYT could possibly collaborate on this. As @pdurbin suggested, we would like to have a proposal validated by you before any development.
We think that the file download could work as follows (this is our proposal v. 0.2):

These are our assumptions:

  • You can create an API token only if you have a user account on a Dataverse instance. We have at least each user's last name, first name and email address, and ideally the affiliation.
  • Using the API, anyone can download files with no access restrictions.
  • If someone uses an API token, we can identify the user associated with that token, can't we?
  • As a user, when you request access to a restricted datafile you must accept the Terms of Access for Restricted Files.

File download:
Terms of use:
As a user, when you request access to a restricted datafile you must accept the Terms of Access for Restricted Files and its Terms of use.

  • If its dataset has no Terms of use & the datafile has no access restrictions:
    o No changes.

  • If its dataset has no Terms of use & the datafile has access restrictions:
    o No changes, an API token is required.

  • If its dataset has Terms of use & the datafile has no access restrictions:
    o An API token is required because we must be sure that a user accepts the terms of use.
    o The API would download the file with its terms of use as a txt file.

  • If its dataset has Terms of use & the datafile has access restrictions:
    o An API token is required.
    o Would the API download the file with its terms of use as a txt file? If the user has already accepted the terms of use, is it necessary?

Guestbook:

  • If a Dataset has no guestbook & the datafile has no access restrictions:
    o No changes.

  • If a Dataset has no guestbook & the datafile has access restrictions:
    o No changes, an API token is required.

  • If a Dataset has guestbook & the datafile has no access restrictions:
    o A token will always be required, and the API would create a GuestbookResponse row with the user's first name, last name and email.

  • If a Dataset has guestbook & the datafile has access restrictions:
    o A token will always be required (no change).
    o The API would create a GuestbookResponse row with the user's first name, last name and email.

We underline the proposed changes.
Please let me know your comments and whether this proposal is feasible.
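Read as a decision table, proposal v0.2 could be sketched roughly as follows. This is an illustration only: `DownloadRule` and `v02_rule` are hypothetical names, and because the proposal leaves the txt file as an open question for restricted files with terms of use, the sketch attaches it only in the unrestricted case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DownloadRule:
    token_required: bool
    attach_terms_txt: bool
    create_guestbook_row: bool


def v02_rule(has_terms, has_guestbook, restricted):
    """Combine the terms-of-use and guestbook bullets of proposal v0.2."""
    # Restricted files already need a token; the proposal also requires one
    # whenever the dataset has terms of use or a guestbook.
    token_required = restricted or has_terms or has_guestbook
    # Assumption: the terms txt is attached only for unrestricted files,
    # since the restricted case is posed as a question in the proposal.
    attach_terms_txt = has_terms and not restricted
    # A guestbook row is created whenever the dataset has a guestbook.
    create_guestbook_row = has_guestbook
    return DownloadRule(token_required, attach_terms_txt, create_guestbook_row)


# Unrestricted file, dataset with terms of use but no guestbook.
print(v02_rule(has_terms=True, has_guestbook=False, restricted=False))
```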

@alejandratenorio
Contributor

Hi all,

Due to some observations and comments, we have adjusted our proposal:
These are our assumptions:

  • You can create an API token only if you have a user account on a Dataverse instance. We have at least each user's last name, first name and email address, and ideally the affiliation.
  • Using the API, anyone can download files with no access restrictions.
  • If someone uses an API token, we can identify the user associated with that token, can't we?
  • As a user, when you request access to a restricted datafile you must accept the Terms of Access for Restricted Files.

CIMMYT Proposal - File download:

  • Proposed changes are highlighted in italics.

Guestbook:
Since not all institutions may require these restrictions, we propose adding a global setting to enable this new functionality.

  • If a Dataset has no guestbook & the datafile has no access restrictions:
    o No changes.

  • If a Dataset has no guestbook & the datafile has access restrictions:
    o No changes, an API token is required.

  • If a Dataset has guestbook & the datafile has no access restrictions:
    o A token will always be required, and the API would create a GuestbookResponse row with the user's first name, last name and email.

  • If a Dataset has guestbook & the datafile has access restrictions:
    o A token will always be required, and the API would create a GuestbookResponse row with the user's first name, last name and email.
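The guestbook rules above, including the proposed global setting, might look something like this in code. It is a minimal sketch under stated assumptions: `User`, `TokenRequired`, `guestbook_row_for_download`, and the `enforce` flag (standing in for the global setting) are hypothetical names, and `user` is assumed to be whoever the API token resolves to, or `None` when no token was sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    first_name: str
    last_name: str
    email: str


class TokenRequired(Exception):
    """Raised when a download needs an API token and none was supplied."""


def guestbook_row_for_download(user: Optional[User], has_guestbook: bool,
                               restricted: bool, enforce: bool):
    """Return the row to store for this download, or None if none is needed."""
    guestbook_applies = enforce and has_guestbook
    if (restricted or guestbook_applies) and user is None:
        raise TokenRequired("an API token is required for this file")
    if not guestbook_applies:
        return None  # the "No changes" cases
    # Only the fields every account is assumed to have.
    return {
        "first_name": user.first_name,
        "last_name": user.last_name,
        "email": user.email,
    }
```

With `enforce` switched off, the function falls back to the current behavior: a token is needed only for restricted files and no row is created.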

Terms of use:
a. As a user, when you request access to a restricted datafile you must accept the Terms of Access for Restricted Files and its Terms of use.
b. Since not all institutions may require these restrictions, we propose adding a global setting to enable this new functionality.

  • If its dataset has no Terms of use & the datafile has no access restrictions:
    o No changes.

  • If its dataset has no Terms of use & the datafile has access restrictions:
    o No changes, an API token is required.

  • If its dataset has Terms of use & the datafile has no access restrictions:
    o When a bot or user attempts to download a datafile directly from the API, they will not receive the datafile itself; instead, they will receive a PDF or TXT file containing all the metadata of the datafile and data about the user or bot attempting the download: user agent, IP address, date and time.
    o Additionally, a message will be added to the file similar to: "If you wish to download the datafile XXXX, please go to [insert Datafile URL]."
    o At the end of the file, text will also be added stating that the datafile is subject to usage restrictions and that explicit approval is required.

  • If its dataset has Terms of use & the datafile has access restrictions:
    o An API token is required.
    o No changes; see item a of this block.
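For the "terms of use, no access restrictions" case, the TXT variant of the replacement file could be assembled along these lines. This is a sketch, not a specification: `build_notice` and its parameters are hypothetical, the exact wording and field labels are assumptions based on the bullets above, and the URL in the usage example is a placeholder.

```python
from datetime import datetime, timezone


def build_notice(file_name, file_url, metadata, user_agent, ip, when=None):
    """Build the text returned instead of the datafile."""
    if when is None:
        when = datetime.now(timezone.utc)
    # Datafile metadata first, one "key: value" pair per line.
    lines = [f"{key}: {value}" for key, value in metadata.items()]
    # Then the data about the requester, the pointer to the file page,
    # and the closing statement about usage restrictions.
    lines += [
        f"User agent: {user_agent}",
        f"IP: {ip}",
        f"Date and time: {when.isoformat()}",
        "",
        f"If you wish to download the datafile {file_name}, "
        f"please go to {file_url}",
        "",
        "This datafile is subject to usage restrictions and explicit "
        "approval is required.",
    ]
    return "\n".join(lines)


print(build_notice(
    "yield.csv",
    "https://example.org/file.xhtml?persistentId=doi:10.5072/FK2/EXAMPLE",
    {"Size": "1024 bytes", "Content type": "text/csv"},
    user_agent="curl/8.0",
    ip="203.0.113.7",
))
```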

Private link to accept Terms of Use:
a. Since not all institutions may require these restrictions, we propose adding a global setting to enable this new functionality.

  • If a dataset has Terms of use, Dataverse opens a pop-up window with the terms to ensure that the user cannot download the files without accepting the terms.

We would like to hear your comments and whether you think it could work.

@pdurbin
Member

pdurbin commented Apr 9, 2024

@alejandratenorio thank you for the detailed writeup! Overall, I think this makes a lot of sense. A few questions:

  • For this part... If you wish to download the datafile XXXX, please go to [insert Datafile URL]... would the second URL always be the same or would it vary and expire over time? If it's the latter, perhaps we could re-use SignedUrls from GDCC/7715 Signed Urls for external tools #9001.
  • What do you think about making the new behavior the default, since it's more secure... and if installations don't like it, the configuration option could revert to the old behavior?
  • For guestbook, what about required fields that aren't in the user account? Custom questions can be created and set as required, which complicates things.
  • Have you considered getting additional feedback from the Dataverse community by posting at https://groups.google.com/g/dataverse-community ? I think others might have opinions on this! I'll also mention this in our internal Slack (DONE).
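To make the "vary and expire over time" option in the first bullet concrete, here is a generic sketch of an expiring, HMAC-signed URL. It is not the Signed Urls implementation referenced in #9001; `sign_url`, `verify_url`, the `expires` and `signature` parameter names, and the shared `secret` are all assumptions for illustration, and `base_url` is assumed to carry no query string of its own.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse


def sign_url(base_url, secret, ttl_seconds, now=None):
    """Append an expiry timestamp and an HMAC-SHA256 signature to base_url."""
    issued = int(now if now is not None else time.time())
    payload = f"{base_url}?{urlencode({'expires': issued + ttl_seconds})}"
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}&signature={signature}"


def verify_url(signed_url, secret, now=None):
    """Accept the URL only if the signature matches and it has not expired."""
    payload, sep, signature = signed_url.rpartition("&signature=")
    if not sep:
        return False
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison of the two hex digests.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False
    expires = int(parse_qs(urlparse(payload).query)["expires"][0])
    current = now if now is not None else time.time()
    return current <= expires


url = sign_url("https://example.org/api/access/datafile/42", b"demo-secret", 60)
print(url, verify_url(url, b"demo-secret"))
```

A link like this stops working after `ttl_seconds`, whereas a persistent URL would stay the same for every request.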

@pdurbin pdurbin changed the title from "File API download bypasses terms of use" to "File API download bypasses terms of use and guestbook" Apr 9, 2024
@alejandratenorio
Contributor

Hi @pdurbin,

Thank you very much for your comments.

> For this part... If you wish to download the datafile XXXX, please go to [insert Datafile URL]... would the second URL always be the same or would it vary and expire over time? If it's the latter, perhaps we could re-use SignedUrls from GDCC/7715 Signed Urls for external tools #9001.

We propose to use its Persistent Datafile URL, something like "If you wish to download the datafile [datafile name], please go to [Persistent Datafile URL]."

> What do you think about making the new behavior the default, since it's more secure... and if installations don't like it, the configuration option could revert to the old behavior?

Yeah, great idea.

> For guestbook, what about required fields that aren't in the user account? Custom questions can be created and set as required, which complicates things.

Since we do not have all this information, a solution could also be to download the PDF / TXT file. What do you think?
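The fallback suggested here could be sketched as a simple check of the guestbook's required fields against what the account can supply. All names are hypothetical (`missing_required_fields`, `api_response_kind`, and the field keys), and the account is assumed to be a plain dictionary for the purposes of the example.

```python
def missing_required_fields(required_fields, account):
    """Return the required guestbook fields the account cannot answer."""
    return sorted(f for f in required_fields if not account.get(f))


def api_response_kind(required_fields, account):
    """Serve the datafile only when every required field can be filled."""
    if missing_required_fields(required_fields, account):
        return "notice"    # the PDF / TXT file instead of the data
    return "datafile"


account = {"first_name": "Ada", "last_name": "Lovelace",
           "email": "ada@example.org"}
# A custom required question that no account field can answer.
print(api_response_kind({"email", "intended_use"}, account))
```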

> Have you considered getting additional feedback from the Dataverse community by posting at https://groups.google.com/g/dataverse-community ? I think others might have opinions on this! I'll also mention this in our internal Slack (DONE).

We could have a final proposal together and share it, what do you say?

@pdurbin
Member

pdurbin commented Apr 10, 2024

Sure. I think I'm still a bit confused about the proposed multistep solution for downloading files. Is it something like this?

  • API user tries to download a file with terms. They get a text file instead.
  • The text file has the URL to download the file.

I guess my question is, do they have to parse the text to find the URL? Will this be easy to do?
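As a rough idea of what such parsing might involve on the client side, here is a sketch that assumes the notice uses the wording from the proposal ("please go to ..."). `extract_download_url` is a hypothetical helper, and it would break if the wording of the message changed.

```python
import re

# Assumes the sentence from the proposal: "... please go to <URL>."
URL_PATTERN = re.compile(r"please go to (\S+)")


def extract_download_url(notice_text):
    """Return the URL named in the notice, or None if none is found."""
    match = URL_PATTERN.search(notice_text)
    if match is None:
        return None
    # Drop the full stop that ends the sentence in the proposed wording.
    return match.group(1).rstrip(".")


notice = ("If you wish to download the datafile yield.csv, "
          "please go to https://example.org/file/42.")
print(extract_download_url(notice))
```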

What happens to the existing download URL? Does it stop working? Does the user now get a text file instead?

We can go back to Zulip if that's easier! 😄

Or maybe a Google doc where I can leave comments here or there?

@qqmyers
Member

qqmyers commented May 10, 2024

@DS-INRAE DS-INRAE moved this to 🔍 Interest in Recherche Data Gouv Jul 10, 2024
@cmbz

cmbz commented Aug 20, 2024

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.

@cmbz cmbz closed this as completed Aug 20, 2024
@github-project-automation github-project-automation bot moved this from 🔍 Interest to Done in Recherche Data Gouv Aug 20, 2024
@pdurbin
Member

pdurbin commented Aug 21, 2024
