Python3 support #1
base: master
## Usage

Create a new harvest source of type "Socrata" and enter the URL of the Socrata catalog you want to harvest from. The default base url to retrieve catalogues is "https://api.us.socrata.com/api/catalog/v1". You can provide a config object to the harvester to change this base url. For example:

```json
{
  "base_url": "https://api.eu.socrata.com/api/catalog/v1"
}
```

For local development, run

```bash
ckan harvester gather-consumer
ckan harvester fetch-consumer
```

to see the harvest jobs being processed.
Adds information about the config object

try:
    from urllib.parse import urlparse
except ImportError:
    from urlparse import urlparse

try:
    from json import JSONDecodeError
except ImportError:
    from simplejson.scanner import JSONDecodeError

Try the python 3 import, fall back to python 2
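
A minimal sketch of the fallback import in use, assuming only the standard library. The source URL is hypothetical; the `.hostname` lookup mirrors how the harvester derives the domain from `harvest_object.source.url` further down in this diff.

```python
# Python 3 location first, Python 2 location as the fallback.
try:
    from urllib.parse import urlparse
except ImportError:  # Python 2
    from urlparse import urlparse

# Hypothetical harvest source URL, for illustration only.
source_url = "https://data.example.org/browse?limit=10"

# .hostname drops the scheme, port, path and query string.
domain = urlparse(source_url).hostname
print(domain)
```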

 from ckan import model
-from ckan.lib.munge import munge_title_to_name, munge_tag
+from ckan.lib.munge import munge_tag
Removed an unused import
@@ -205,11 +213,35 @@ def _build_package_dict(self, context, harvest_object):
                 'url': DOWNLOAD_ENDPOINT_TEMPLATE.format(
                     domain=urlparse(harvest_object.source.url).hostname,
                     resource_id=res['resource']['id']),
-                'format': 'CSV'
+                'format': 'CSV',
+                'name': res['resource']['name']
Avoids all the data files being called "Unnamed Resource"
+
+    def _set_config(self, config_str):
+        self.config = {'base_api_endpoint':BASE_API_ENDPOINT}
+        if config_str:
+            self.config = json.loads(config_str)
+            if 'base_api_endpoint' in self.config:
+                self.base_api_endpoint = self.config['base_api_endpoint']
+
+        log.debug('Using config: %r', self.config)
+
+    def validate_config(self, config):
+        if not config:
+            return config
+
+        config_obj = json.loads(config)
+        if 'base_api_endpoint' in config_obj:
+            try:
+                parsed = urlparse(config_obj['base_api_endpoint'])
+            except AttributeError:
+                raise ValueError('base_api_endpoint must be a valid URL')
+            if not parsed.scheme or not parsed.netloc:
+                raise ValueError('base_api_endpoint must be a valid URL')
+        return config
+
Add configuration for `base_api_endpoint` (we need to use the .eu version). `_set_config` is called by the gather stage; `validate_config` is called when the source is created.
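
One way to keep the default and the user-supplied value in a single place is to overlay the parsed config on a defaults dict and validate the result. The following is only a sketch under stated assumptions, not the plugin's code: `load_config`, `check_endpoint` and `DEFAULT_BASE_API_ENDPOINT` are hypothetical names, the default URL is the one quoted in the README text, and the code targets Python 3. Note that the README example in this PR uses the key `base_url`, while the harvester code reads `base_api_endpoint`; the sketch follows the code.

```python
import json
from urllib.parse import urlparse

# Default quoted in the README text of this PR (hypothetical constant name).
DEFAULT_BASE_API_ENDPOINT = "https://api.us.socrata.com/api/catalog/v1"


def load_config(config_str):
    """Return the defaults overlaid with any user-supplied JSON values."""
    config = {"base_api_endpoint": DEFAULT_BASE_API_ENDPOINT}
    if config_str:
        config.update(json.loads(config_str))
    return config


def check_endpoint(config):
    """Raise ValueError unless base_api_endpoint has a scheme and a host."""
    endpoint = config["base_api_endpoint"]
    if not isinstance(endpoint, str):
        raise ValueError("base_api_endpoint must be a valid URL")
    parsed = urlparse(endpoint)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError("base_api_endpoint must be a valid URL")
    return config


# An empty config keeps the default; a user value replaces it.
print(load_config(None)["base_api_endpoint"])
eu = load_config('{"base_api_endpoint": "https://api.eu.socrata.com/api/catalog/v1"}')
print(check_endpoint(eu)["base_api_endpoint"])
```

Because the defaults are merged rather than replaced, a config that sets other keys but omits `base_api_endpoint` still ends up with a usable endpoint.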
@@ -266,8 +298,9 @@ def _page_datasets(domain, batch_number):
                 _request_datasets_from_socrata(domain, batch_number,
                                                current_offset)
             if datasets is None or len(datasets) == 0:
-                raise StopIteration
+                break
I found raising an error was just stopping the process
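
The likely cause, assuming `_page_datasets` is a generator (the hunk suggests this but does not show the whole function): since PEP 479, which is the default behaviour from Python 3.7, a `StopIteration` raised inside a generator body is converted into a `RuntimeError` instead of silently ending the iteration. The sketch below uses two hypothetical generators to show the difference on Python 3.7 or later.

```python
def pages_with_raise(batches):
    """Ends the old way, by raising StopIteration inside the generator."""
    for batch in batches:
        if not batch:
            raise StopIteration  # becomes RuntimeError on Python 3.7+ (PEP 479)
        yield batch


def pages_with_break(batches):
    """Ends by leaving the loop, so the generator finishes normally."""
    for batch in batches:
        if not batch:
            break
        yield batch


batches = [[1, 2], [3], []]

try:
    list(pages_with_raise(batches))
    outcome = "finished"
except RuntimeError:
    outcome = "RuntimeError"

print(outcome)
print(list(pages_with_break(batches)))
```

A bare `return` would end the generator just as cleanly as `break`; which one fits depends on whether any code after the loop still needs to run.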
@@ -396,7 +429,7 @@ def import_stage(self, harvest_object):

         else:
             # We need to explicitly provide a package ID
-            package_dict['id'] = unicode(uuid.uuid4())
+            package_dict['id'] = str(uuid.uuid4())
Added for python 3 compatibility
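
For context, the `unicode` builtin does not exist in Python 3, where `str` is already the text type. On Python 2, `str(uuid.uuid4())` returns a byte string rather than a `unicode` object, so code that must keep returning text on both versions could use an alias instead. The sketch below illustrates that alternative; `text_type` is a hypothetical name and not part of this PR.

```python
import uuid

# Pick the text type for the running interpreter.
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3

package_id = text_type(uuid.uuid4())
print(package_id)
```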
Updates the plugin to: