Python3 support #1

Open: 3 commits to be merged into base: master

kipparker (Collaborator)

Updates the plugin to:

  • Run on Python 3
  • Allow BASE_API_ENDPOINT to be overridden through the harvester config field
  • Fix an issue where harvest ingestion stopped because a StopIteration error was raised

Comment on lines +21 to +38
## Usage

Create a new harvest source of type "Socrata" and enter the URL of the Socrata catalog you want to harvest from. The default base URL for retrieving catalogs is "https://api.us.socrata.com/api/catalog/v1". To change it, provide a config object to the harvester with a `base_api_endpoint` key. For example:

```json
{
  "base_api_endpoint": "https://api.eu.socrata.com/api/catalog/v1"
}
```

For local development, run

```bash
ckan harvester gather-consumer
ckan harvester fetch-consumer
```

to see the harvest jobs being processed.

Adds information about the config object

Comment on lines +7 to +18

try:
    from urllib.parse import urlparse
except ImportError:
    from urlparse import urlparse

try:
    from json import JSONDecodeError
except ImportError:
    from simplejson.scanner import JSONDecodeError


Try the Python 3 import first, then fall back to the Python 2 one
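
A minimal sketch of the same fallback pattern in isolation. The URL and the `parse_json_or_none` helper are hypothetical and only there for illustration. It also shows a possible dependency-free alternative for the JSON case: `json.JSONDecodeError` is a subclass of `ValueError`, and Python 2's `json.loads` raises a plain `ValueError`, so catching `ValueError` should cover both interpreters.

```python
import json

# Prefer the Python 3 location; fall back to the Python 2 module name.
try:
    from urllib.parse import urlparse
except ImportError:  # Python 2
    from urlparse import urlparse

# Hypothetical URL, used only to show the call works after either import.
parsed = urlparse("https://data.example.org/api/catalog/v1")
hostname = parsed.hostname


def parse_json_or_none(text):
    """Hypothetical helper: return the decoded JSON, or None if invalid."""
    try:
        return json.loads(text)
    except ValueError:  # also catches json.JSONDecodeError on Python 3
        return None
```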


 from ckan import model
-from ckan.lib.munge import munge_title_to_name, munge_tag
+from ckan.lib.munge import munge_tag
Removed an unused import

@@ -205,11 +213,35 @@ def _build_package_dict(self, context, harvest_object):
                 'url': DOWNLOAD_ENDPOINT_TEMPLATE.format(
                     domain=urlparse(harvest_object.source.url).hostname,
                     resource_id=res['resource']['id']),
-                'format': 'CSV'
+                'format': 'CSV',
+                'name': res['resource']['name']
Avoids all the data files being called "Unnamed Resource"

Comment on lines +221 to +244

    def _set_config(self, config_str):
        self.config = {'base_api_endpoint':BASE_API_ENDPOINT}
        if config_str:
            self.config = json.loads(config_str)
        if 'base_api_endpoint' in self.config:
            self.base_api_endpoint = self.config['base_api_endpoint']

        log.debug('Using config: %r', self.config)

    def validate_config(self, config):
        if not config:
            return config

        config_obj = json.loads(config)
        if 'base_api_endpoint' in config_obj:
            try:
                parsed = urlparse(config_obj['base_api_endpoint'])
            except AttributeError:
                raise ValueError('base_api_endpoint must be a valid URL')
            if not parsed.scheme or not parsed.netloc:
                raise ValueError('base_api_endpoint must be a valid URL')
        return config

Adds configuration for base_api_endpoint (we need to use the .eu version). _set_config is called by the gather stage; validate_config is called when the source is created.
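
A standalone sketch of the validate-then-resolve flow, written as plain functions so it can be tried outside CKAN. It is not the plugin code: `resolve_base_api_endpoint` is a hypothetical helper, the default value is the one quoted in the README text, and the sketch assumes the config is a JSON object and targets Python 3.

```python
import json
from urllib.parse import urlparse

# Default endpoint as quoted in the README section of this PR.
BASE_API_ENDPOINT = "https://api.us.socrata.com/api/catalog/v1"


def validate_config(config_str):
    """Return config_str unchanged if acceptable, else raise ValueError."""
    if not config_str:
        return config_str
    config_obj = json.loads(config_str)
    if not isinstance(config_obj, dict):
        raise ValueError("config must be a JSON object")
    if "base_api_endpoint" in config_obj:
        endpoint = config_obj["base_api_endpoint"]
        if not isinstance(endpoint, str):
            raise ValueError("base_api_endpoint must be a valid URL")
        parsed = urlparse(endpoint)
        # A usable URL needs at least a scheme and a host.
        if not parsed.scheme or not parsed.netloc:
            raise ValueError("base_api_endpoint must be a valid URL")
    return config_str


def resolve_base_api_endpoint(config_str):
    """Hypothetical helper: configured endpoint, or the default."""
    config = json.loads(config_str) if config_str else {}
    return config.get("base_api_endpoint", BASE_API_ENDPOINT)
```

With this shape, an empty config or a config without the key falls back to the default endpoint, and a malformed value is rejected before any harvest job runs.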

@@ -266,8 +298,9 @@ def _page_datasets(domain, batch_number):
                 _request_datasets_from_socrata(domain, batch_number,
                                                current_offset)
             if datasets is None or len(datasets) == 0:
-                raise StopIteration
+                break
I found that raising StopIteration here was stopping the whole harvest process instead of just ending the iteration
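
For context, this matches PEP 479: since Python 3.7, a StopIteration raised inside a generator body is converted to a RuntimeError, which is likely why the job was dying (assuming _page_datasets is a generator). A minimal sketch of the difference; the function names and sample pages are made up for illustration and are not taken from the harvester:

```python
def page_with_raise(pages):
    """Old style: signal the end by raising StopIteration."""
    for page in pages:
        if not page:
            raise StopIteration  # converted to RuntimeError on Python 3.7+
        for item in page:
            yield item


def page_with_break(pages):
    """New style: leave the loop and let the generator finish normally."""
    for page in pages:
        if not page:
            break
        for item in page:
            yield item


# Hypothetical batches; the empty list plays the role of "no more datasets".
pages = [["a", "b"], ["c"], []]

clean = list(page_with_break(pages))

try:
    list(page_with_raise(pages))
    raised = None
except RuntimeError as exc:
    raised = exc
```

A bare `return` would work equally well in place of `break` when nothing needs to run after the loop.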

@@ -396,7 +429,7 @@ def import_stage(self, harvest_object):

         else:
             # We need to explicitly provide a package ID
-            package_dict['id'] = unicode(uuid.uuid4())
+            package_dict['id'] = str(uuid.uuid4())
Changed for Python 3 compatibility (the unicode builtin no longer exists there)
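
A quick sketch of why `str` should be safe here: the string form of a UUID is plain ASCII hex digits and hyphens, so it round-trips regardless of text type. The `package_dict` below is a stand-in for illustration, not the real harvest object.

```python
import uuid

# Stand-in dict; in the harvester this is the package being imported.
package_dict = {}
package_dict['id'] = str(uuid.uuid4())

# The id parses back into a UUID, confirming nothing was lost.
round_tripped = uuid.UUID(package_dict['id'])
```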
