-
-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[🚀 Feature]: Add File Download Support to Bindings #11443
Comments
This issue is looking for contributors. Please comment below or reach out to us through our IRC/Slack/Matrix channels if you are interested. |
@titusfortner Sure, with a great pleasure. :) But I have few suggestions:
|
For the list I think we would need a separate implementation in the grid, right @krmahadevan? |
So it looks like we have the below asks:
The GET URL that currently responds to (2) is {
"filename":"input.txt",
"contents": "base64EncodedStringThatRepresentsAZippedFileContentsGoesHere"
} To support (1), we could alter the behaviour such that if the user does a GET on [
{
"filename":"input.txt",
"contents": ""
},
{
"filename":"anotherFile.txt",
"contents": ""
}
] To support (3) we can have the user specify a comma separated list of file names which would return back a payload of [
{
"filename":"file1.txt",
"contents": "base64EncodedStringThatRepresentsAZippedFileContentsGoesHere"
},
{
"filename":"file2.txt",
"contents": "base64EncodedStringThatRepresentsAZippedFileContentsGoesHere"
}
] So we wouldn't need to add additional URLs but can work within whatever we have already. Its just that for (1) wherein user asks for the list of files that can be downloaded it would be kind of awkward to send back a response but without the contents. But if we have a client API then we could perhaps abstract out this. The client binding could either reside within the
Please let me know what do you think |
Just a quick comment about waits - I haven't checked the current implementation, but in order to avoid querying for a file that hasn't yet downloaded, should we create a wait or build the wait into fetch of the file (retry with polling intervals and timeout)? |
This makes sense. However, I would put this behind a flag because some users might be using a shared Grid and they just do not want to allow listing the directory contents. Also, it gets tricky because what if more than one session has downloaded a file? Should we list all files no matter the session that downloaded it or should we list only the files that were downloaded in that particular session? This all implies that we could potentially need to manage a state of downloaded files in the Node? I'd really like to avoid such overhead in the Node code (that already handles quite a bit).
I do not see the need for this. What is the use case that is not covered by a single file download? I would not expect users now using the Grid to download large amounts of file per session.
This is the trickiest one because the download happens in a remote machine. AFAIK, there are only workarounds to know if a file has been completely downloaded or not, nothing really standard. That is why our suggestion is to alter the system under test to offer a way of knowing that the file has been downloaded. Another thing I forgot to mention in the initial implementation is that we forgot to add some mechanism to clean downloaded files, otherwise the Node machine can run out of space at some point. |
Lets say that a test basically downloads a file multiple times. Since a test cannot specify the name of the file to be saved as, depending on the OS, the file name gets chosen. For e.g., in OSX I noticed that, if there's a file name already available in the Maybe we can restrict the files on a session id basis (wherein inside the downloads path that a user specifies the respective files would be downloaded under a folder that represents a session id). That way, for a given session a user would be able to retrieve his/her downloaded files, but that list would perhaps never span across sessions and so the ask of
Should we drive it via a flag wherein we say if set to |
In the normal mode without the grid in the picture, today how do you determine if a file has been downloaded?
|
This is covered by allowing the user to list the downloaded files, and then choosing one of them.
AFAIK, this is not possible because the browser already has a downloads directory and the file download process is between the browser and the host OS, WebDriver is not involved in the process of determining the directory or the file name.
This should be handled in a different issue, I think. |
I agree with @diemol wrt not needing the "get all" feature, user can iterate over the list and download whatever they want in subsequent commands. If we don't want scope to go crazy here, we should also refer to this as a beta feature.
Simplest option is you opt into the whole downloads feature with
Current implementation won't differentiate between something downloaded during the session vs already there... I think if we start the grid with
I think driver quit / session end should clear everything in directory with that session id.
I'm not an expert, don't browsers download files to a temp name and then rename it when complete for just this reason? Or is that just a chrome thing? Either way, waits are out of scope, imo. Up to the user to figure out what and how to wait. |
Also I'm not sure we're following the spec to the extent we need to. I wasn't following the original PR, just saw when it was merged. Can we do this without the query string? Is that valid w3c syntax, even for a custom endpoint? Right now we have this for uploading a file to remote:
We can do this like we do cookies:
I'm not sure we can even put file name like that, so probably want to do it like we do elements:
Obviously return syntax needs to put payload inside |
Actually... Why does the grid need to start with the parameter. Couldn't grid access whatever was passed into the capabilities if we make the end user pass it in the capabilities? |
How can you tell the browser to download the file to a directory with the session as a prefix?
It could be a security hole, which would allow a client to access any file on the remote machine. |
There are several options, none of which is perfect. There is lots of hidden tricks and underwater rocks. All browsers do it differently (even chromium-based). @titusfortner Let's keep it very simple:
To implement #1, there are options in most browsers:
|
Oh, shoot, yeah, you don't know the session id, yet
Can the grid map an id to the session to delete the directory at the end? Or is that especially complicated?
Oh, yeah, can we restrict it to HOME? Theoretically sensitive things don't get stored in HOME? But probably smart to require grid to "enable" the feature somehow. Just don't like both the grid config & end user to have to have the same String for it to work... Can we just hard code a specific directory? Follow Selenium Manager template?
@asolntsev I think we're going the same direction, except the grid shouldn't be changing the options passed in. I think the bindings need to have a separate method that enables it, and can set the full directory with unique identifier; then the grid can parse it to create the directory.
|
Keep in mind that the Node is just passing commands along, it does not know the file name either. When a file is downloaded, it will be placed in the downloads directory and all that concerns the Grid is that a command was executed, it does not even know how to wait for the file to be downloaded. Any attempt to figure that out will be a workaround that needs to be maintained. Yeah, it'd be nice to map the session or any given id to the downloaded file and delete all that when the session is over. But again, the Node does not know the file name to keep that mapping. Plus, that would make the Node more complex that it is already. I think the current approach of having a flag with the directory is good enough because it is an opt in, and it gives the user the possibility to define which directory is the one that will be used (of course, given that the browser uses that as well). I would add the endpoints we want, and let people tell us if this is good enough or not. |
Should this be a separate bug that tracks this ask or is this refactoring also being captured in this same feature request? Also what should the endpoint be ? GET or POST ? |
This currently does not exist. So does it mean that one of the feature requirements would be, to be able to list all files in a directory. Is it still the downloads path that was specified at the time of starting the hub or Based on what @asolntsev is suggesting, is this a unique sub-directory under the downloads-path parent directory that the client bindings will pass via the Options is a separate ask. Also is this something that the Grid is going to inject on its own or is the client bindings expected to add this ?
@titusfortner - Wouldn't this now require a test to be able to know the directory structure of an actual node? It needs to know the OS flavour as well to be able to specify a full directory path because the folder needs to be adherant to the OS specifics [ like its @diemol @titusfortner - Just thinking out loud here. What if we did something like this ?
Some questions that crop up:
|
With the current implementation, the directory needs to exist. Plus, that idea of manipulating options from the bindings is complicated because that needs to be set at the beginning of the session, and you do not know if the user wants to download a file during the session. Seems there will be a lot of magic involved, which I do not want to have in the code. This issue is now growing with comments about different features. I will open separate issues for each one:
|
@krmahadevan |
ProblemWe could implement everything as-is in the bindings. User hard codes a location in both node setup and in options, and everything goes to the same directory.
Easiest Partial SolutionThe 3rd issue is obviously the biggest, and we could potentially address it by adding a UUID subdirectory in the bindings.
Any reason why this wouldn't work? Full Solution IdeasIdeally we could handle everything for the user, the trickiest part is knowing the architecture of node if user isn't providing the path.
Questions:
If the answer to both of those is "no" I think we can probably actually rely on Selenium Manager to solve this for us. |
Not clear for me.
Once I get clarity on this part, I will proceed with working on adding changes and raise a PR. |
Thinking through this again after a month away... I think the grid node is powerful enough to do everything we need without the user having to do much. User shouldn't set location of downloads for security reasons. We create a unique directory associated with a session and that's what the user gets. I think this should work:
|
@titusfortner - I know we are going back and forth on this, but how do we deal with the situation of actually controlling where the downloads happen? I remember @diemol mentioned this earlier as well, WebDriver does not have any control in the actual file download, and so we cannot even determine in which directory was the file downloaded. For some browsers, I believe that can be controlled by the user as called out earlier by @asolntsev in this comment This is what confuses me: We can control the entire download folder management part of it, but we really don't have any say in where exactly does the file get downloaded. So how do we move forward. If we had a Webdriver way of controlling the downloads (similar to how we do the upload of a file) then we would have been in a better place I feel. If we go by the approach that you have cited, then we should at-least send back the download directory to the user via the capabilities object and then maybe have an api at the |
Now that I think of it, even this doesn't make sense, because I think for chromium and firefox browsers we would be setting this at the time of webdriver instantiation via the Options class. So this kind of becomes like a chicken and egg problem :( |
The node should be able to do everything without the user needing to send or receive anything specific. The node creates the directory and modifies the vendor capabilities before it passes it in to the driver. The capabilities that need to be modified will be different for each browser (and won't be possible for some browsers). |
(Based on the conversation that I have had with @titusfortner on this) expectations from this feature are as below:
I will work on getting this done and come back in case i have any specific doubts. |
Yes! Except a couple more suggestions. The node should create a UUID to associate with the session to keep items from different sessions in different subdirectories.
I don't think we need to let the user change this directory. I think we should follow what Selenium Manager is doing and put it in |
Maybe we can break this into smaller PRs? Like, one that changes the current endpoint to avoid using the query string (#11466). Then another one to process the capability and create the directory associated to the session, and from there we continue. |
@diemol Breaking this up into the following and closing this one:
|
Feature and motivation
In #11277 there is now support for moving files from a Grid standalone/node to the client machine.
A flag has to be set when the grid is started, but we should provide a Selenium specific endpoint and method for getting that from inside the bindings themselves.
@asolntsev I see you're tracking it for Selenide already, want to help get it in Selenium? selenide/selenide#1687
Usage example
Or:
Or Maybe:
The text was updated successfully, but these errors were encountered: