Script or CLI to bulk import resources #1436

Closed
SteadyCadence opened this issue Apr 27, 2017 · 10 comments

@SteadyCadence

Currently, there is no way to bulk upload resources that are stored locally on a user's computer.

We need a script or command-line interface that allows a user to loop through photos/PDFs stored locally and upload them to the associated party/location/relationship.

The focus is on the ability to (1) upload resources to our AWS server and (2) painlessly associate each resource with its property endpoint.
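A minimal sketch of the local-file loop, assuming resources are recognized by file extension; the extension set and the name `find_resources` are hypothetical, not part of any existing Cadasta tool:

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical set of extensions treated as uploadable resources.
RESOURCE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}


def find_resources(root: str, extensions: Iterable[str] = RESOURCE_EXTENSIONS) -> List[Path]:
    """Recursively collect files under `root` whose extension is in `extensions`.

    Matching is case-insensitive (".JPG" counts as ".jpg"), and the result is
    sorted so repeated runs visit files in the same order.
    """
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

Each returned path would then be uploaded and attached to its party/location/relationship; how files map to those records depends on how the partner organized them.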

Eventually, we can bulk upload resources via the QGIS plugin and/or the web interface.

@wonderchook
Contributor

Who would you see as running this? Would this be specific to the programs team?

I'm thinking that, for our users, drag-and-drop bulk upload through the web UI would be more useful.

@alukach can you outline what you were thinking here?

@clash99
Contributor

clash99 commented Apr 27, 2017

Are we thinking of a dropzone-type area? (The user would still need to be on the correct party/location/relationship. After loading to the project library, a user can already associate them with other entities.)

Something like this where you can load multiple resources at once? (this is old but an example of what I'm trying to describe)
[Screenshot, 2017-04-27: example of an area for loading multiple resources at once]

@dpalomino

> Who would you see as running this? Would this be specific to the programs team?
>
> I'm thinking that for our users being able to drag and drop bulk upload through the web UI would be more useful for them.
>
> @alukach can you outline what you were thinking here?

Yes, there is a need for a specific partner that has potentially hundreds of thousands of resources to upload, so we need an automatic way to do this. It would be one of us running the script, not the partner.

Thanks in advance @alukach, happy to catch up if needed!

> Something like this where you can load multiple resources at once? (this is old but an example of what I'm trying to describe)

Thanks @clash99! I think this will be very useful for less "extreme" cases, where there are fewer resources to upload/import...

@wonderchook
Contributor

@clash99 I was thinking something like that. I think we are potentially talking about two different features, and I want to make sure we build something that will be useful.

@wonderchook
Contributor

@dpalomino we need to decide if this is a one-off vs. a reusable component. This is also something that could be outsourced since it is separate from the platform.

@alukach
Contributor

alukach commented Apr 28, 2017

Agreed that there are two features being discussed. For the sake of discussion, let's operate with the mentality that we're talking about 1000+ resource files.

Drag-n-Drop Solution

As @clash99 mentioned, the user would still need to be on the correct party/location/relationship view. I'm still learning the UI and workflow; can people comment on how arduous this would be for 1k resources? If there are few parties/locations/relationships, this is no problem: the user should be able to select all 1k files and drop them, and the view should handle them in reasonable batches (I'm guessing the user would have to keep the view open and walk away for a bit). However, if the resources are either a) organized in separate directories on their machine or b) affiliated with different parties/locations/relationships, this will be pretty laborious and would be better done with a custom script.
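Handling files "in reasonable batches", whether in the browser or in a script, amounts to chunking the file list. A minimal Python sketch; the default batch size of 50 is an arbitrary placeholder, not a measured platform limit (Python 3.12+ also ships `itertools.batched`, which yields tuples instead of lists):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int = 50) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items, in their original order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # Pull up to `size` items; an empty list means the input is exhausted.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Because it consumes an iterator lazily, it works on a generator of file paths without first loading all 1k names into a list.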

Overall, I think this seems like a sensible feature, and it looks like it's already being thought out by @clash99.

Scripting Solution

Working with @SteadyCadence, I've come to understand that there is no standardized structure for partners to follow when organizing/storing their data, nor a required structure when they submit their data to a member of the Programs team. Additionally, I don't know if it would be realistic to expect people to comply with such a requirement, as everyone organizes their files differently and people use different formats/filetypes. For that reason, I don't think there will ever be a one-size-fits-all script/CLI that handles every format and filetype.

What I'd recommend is throwing together a Cadasta SDK. It would contain a collection of helper components that a member of Programs (or a tech-savvy partner) could easily string together into a workflow that fits their needs and data. It could provide utilities for:

  • logging in, producing a session object that can make other authenticated API requests
  • uploading resources: with Django-buckets this is a multi-step process (request a signed URL from the API, send the file to S3, possibly inform the API of the new file on S3), and this flow could be abstracted into a single function
  • converting data to API-friendly formats (e.g. GeoJSON/Shapefiles/GPX files to WKT)
  • optimization helpers, such as abstractions for threading or multiprocessing
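A minimal sketch of the single-function upload helper, assuming the API hands back a pre-signed S3 URL that accepts a PUT. The endpoint paths (`signed-url/`, `resources/`), the JSON field names, and the function name are all hypothetical, not the actual Cadasta or django-buckets API. The HTTP calls are passed in as functions so the flow can be tested without a network:

```python
from pathlib import Path
from typing import Any, Callable, Dict

# Injected HTTP helpers (hypothetical signatures):
#   post_json(url, payload) -> parsed JSON response body
#   put_bytes(url, data)    -> None, raising on failure
PostJson = Callable[[str, Dict[str, Any]], Dict[str, Any]]
PutBytes = Callable[[str, bytes], None]


def upload_resource(
    project_url: str, path: Path, post_json: PostJson, put_bytes: PutBytes
) -> Dict[str, Any]:
    """Upload one local file and register it as a project resource."""
    # Step 1: ask the API for a signed S3 URL for this file name.
    signed = post_json(f"{project_url}/signed-url/", {"filename": path.name})
    signed_url = signed["url"]

    # Step 2: send the file's bytes straight to S3 using the signed URL.
    put_bytes(signed_url, path.read_bytes())

    # Step 3: tell the API about the new S3 object. The signature lives in the
    # query string, so strip it to get the object's plain location.
    object_url = signed_url.split("?", 1)[0]
    return post_json(
        f"{project_url}/resources/",
        {"name": path.stem, "file": object_url, "original_file": path.name},
    )
```

In practice `post_json` could wrap `session.post(url, json=payload).json()` on a logged-in `requests.Session` (this is where the login helper's session object comes in) and `put_bytes` could wrap `requests.put(url, data=data)`. Note that `read_bytes()` loads the whole file into memory, which is fine for photos and PDFs but not for very large files.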

Additionally, the SDK could also store some examples of common workflows (e.g. login, lookup project ID by name, crawl nested directories looking for resources, send resources to API), providing a few common boilerplates that a user could use as a starting point for their needs.
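Two small building blocks such a workflow might combine: a lookup from project name to ID over a list of project records, and a thread-pool helper of the kind listed under optimization helpers. The record shape (dicts with `id` and `name` keys) and both function names are assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Mapping, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def find_project_id(projects: Iterable[Mapping[str, str]], name: str) -> str:
    """Return the `id` of the single project whose `name` matches exactly."""
    matches = [p["id"] for p in projects if p["name"] == name]
    if len(matches) != 1:
        raise LookupError(f"expected one project named {name!r}, found {len(matches)}")
    return matches[0]


def run_concurrently(
    items: Sequence[T], work: Callable[[T], R], max_workers: int = 8
) -> Tuple[Dict[T, R], Dict[T, Exception]]:
    """Apply `work` to each (hashable) item on a thread pool.

    Returns (results, failures). A failure on one item does not stop the
    others, so a long import can later be re-run for just the failed items.
    """
    results: Dict[T, R] = {}
    failures: Dict[T, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(work, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # record the error and keep going
                failures[item] = exc
    return results, failures
```

Threads suit this job because uploads are I/O-bound; collecting failures instead of aborting means a 1000+ file run ends with a short retry list rather than a restart from scratch.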

If there were some really common operations, the SDK could also include a few CLI tools that would be built on the SDK and would be installed when the SDK is installed.
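A CLI built on the SDK could be little more than an `argparse` front end. This sketch assumes a hypothetical `cadasta-upload` command and takes the actual upload function as a parameter, since its real form would come from the SDK:

```python
import argparse
from pathlib import Path
from typing import Callable, List, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="cadasta-upload",  # hypothetical command name
        description="Upload files under DIRECTORY to a project as resources.",
    )
    parser.add_argument("project", help="project name or ID")
    parser.add_argument("directory", type=Path, help="root directory to crawl")
    parser.add_argument("--pattern", default="*", help="glob for files to include (default: all)")
    parser.add_argument("--dry-run", action="store_true", help="list matching files without uploading")
    return parser


def main(
    argv: Optional[Sequence[str]] = None,
    upload: Optional[Callable[[str, Path], None]] = None,
) -> int:
    """Parse arguments (sys.argv[1:] when argv is None) and upload matching files."""
    args = build_parser().parse_args(argv)
    files: List[Path] = sorted(p for p in args.directory.rglob(args.pattern) if p.is_file())
    for path in files:
        if args.dry_run or upload is None:
            print(f"would upload {path}")
        else:
            upload(args.project, path)
    return 0  # process exit code
```

With setuptools, such a command is installed alongside the package through a `console_scripts` entry point pointing at a wrapper that supplies the SDK's real upload function (module path left to the SDK's layout).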

Finally, the SDK could also be used by the development team when building microservices that interact with our system (e.g. tooling for importing or exporting data).

I've done this for other APIs in the past and did a bit of experimentation with building some tooling for the Cadasta API a couple of weeks ago; it wouldn't be too difficult to move it into a structured, versioned Python package and expand it to support file uploads.

@wonderchook
Contributor

Created a new ticket, #1440, to track the GUI version.

@dpalomino

Thanks a lot @alukach and @wonderchook for the input! I really like the SDK approach.

Of the utilities listed, the most urgent one would be the one for uploading resources (@SteadyCadence, feel free to comment if I'm wrong), followed probably by the conversion to WKT.
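For the WKT conversion, a dependency-free sketch covering only GeoJSON Point, LineString and Polygon geometries. Multi-part geometries, Shapefiles and GPX are out of scope here; in practice a library such as Shapely (`shapely.geometry.shape(geometry).wkt`) would handle the general case, and Shapefiles/GPX would first need a reader such as Fiona or GDAL/OGR:

```python
from typing import Any, Dict, Sequence


def _coords(points: Sequence[Sequence[float]]) -> str:
    # WKT separates x and y with a space and successive points with ", ".
    # GeoJSON stores [longitude, latitude], which maps directly to WKT "x y".
    return ", ".join(" ".join(str(v) for v in point) for point in points)


def geojson_to_wkt(geometry: Dict[str, Any]) -> str:
    """Convert a GeoJSON Point, LineString or Polygon geometry dict to WKT."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        return f"POINT ({_coords([coords])})"
    if kind == "LineString":
        return f"LINESTRING ({_coords(coords)})"
    if kind == "Polygon":
        # Each ring (outer boundary first, then any holes) gets its own parentheses.
        rings = ", ".join(f"({_coords(ring)})" for ring in coords)
        return f"POLYGON ({rings})"
    raise ValueError(f"unsupported geometry type: {kind}")
```

For example, `{"type": "Point", "coordinates": [30, 10]}` becomes `POINT (30 10)`.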

Let's discuss how to best approach the short term partner's need that @SteadyCadence was mentioning (either with an ad-hoc script or with the SDK).

@dpalomino

Hey @alukach. Just moving this to the next sprint to continue supporting partners' activities related to this and to help @SteadyCadence while she is in the field...

@dpalomino

I think we are good to close this issue for now. We can open new issues for specific tasks related to more partner support when importing records.
