Python3 support #1

kipparker · 2023-03-22T17:18:22Z

Updates the plugin to:

- Run under Python 3
- Allow BASE_API_ENDPOINT to be changed through the harvester config field
- Fix an issue where harvest ingestion stopped because a StopIteration error was raised

kipparker · 2023-03-22T17:21:15Z

README.md

+## Usage
+
+Create a new harvest source of type "Socrata" and enter the URL of the Socrata catalog you want to harvest from. The default base URL for retrieving catalogs is "https://api.us.socrata.com/api/catalog/v1". You can provide a config object on the harvest source to change this base URL. For example:
+
+```json
+{
+  "base_api_endpoint": "https://api.eu.socrata.com/api/catalog/v1"
+}
+```
+
+For local development, run
+
+```bash
+ckan harvester gather-consumer
+ckan harvester fetch-consumer
+```
+
+to see the harvest jobs being processed.


Adds information about the config object

kipparker · 2023-03-22T17:21:32Z

ckanext/socrata/plugin.py

+
+try:
+    from urllib.parse import urlparse
+except ImportError:
+    from urlparse import urlparse
+
+try:
+    from json import JSONDecodeError
+except ImportError:
+    from simplejson.scanner import JSONDecodeError
+
+


Try the Python 3 import first, then fall back to the Python 2 one

kipparker · 2023-03-22T17:21:43Z

ckanext/socrata/plugin.py


 from ckan import model
-from ckan.lib.munge import munge_title_to_name, munge_tag
+from ckan.lib.munge import munge_tag


Removed an unused import

kipparker · 2023-03-22T17:22:09Z

ckanext/socrata/plugin.py

@@ -205,11 +213,35 @@ def _build_package_dict(self, context, harvest_object):
            'url': DOWNLOAD_ENDPOINT_TEMPLATE.format(
                domain=urlparse(harvest_object.source.url).hostname,
                resource_id=res['resource']['id']),
-            'format': 'CSV'
+            'format': 'CSV',
+            'name': res['resource']['name']


Avoids all the data files being called "Unnamed Resource"
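
A minimal sketch of the resource dict this hunk builds, runnable outside CKAN. The template string, the `build_resource` helper, and the sample record are assumptions for illustration only; the plugin's actual `DOWNLOAD_ENDPOINT_TEMPLATE` is not shown in this diff.

```python
from urllib.parse import urlparse

# Hypothetical template; the plugin's real DOWNLOAD_ENDPOINT_TEMPLATE is not shown in this diff.
DOWNLOAD_ENDPOINT_TEMPLATE = (
    "https://{domain}/api/views/{resource_id}/rows.csv?accessType=DOWNLOAD"
)


def build_resource(source_url, res):
    """Build a CKAN-style resource dict from one Socrata catalog result (hypothetical helper)."""
    return {
        "url": DOWNLOAD_ENDPOINT_TEMPLATE.format(
            domain=urlparse(source_url).hostname,
            resource_id=res["resource"]["id"],
        ),
        "format": "CSV",
        # Carry the Socrata name over so the file is not shown as "Unnamed Resource".
        "name": res["resource"]["name"],
    }


# Made-up sample record with the same shape the hunk reads from.
sample = {"resource": {"id": "abcd-1234", "name": "Road closures"}}
resource = build_resource("https://data.example.org", sample)
print(resource["name"], resource["url"])
```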

kipparker · 2023-03-22T17:23:17Z

ckanext/socrata/plugin.py

+
+    def _set_config(self, config_str):
+        # Start from the default endpoint, then overlay the source config
+        self.config = {'base_api_endpoint': BASE_API_ENDPOINT}
+        if config_str:
+            self.config.update(json.loads(config_str))
+            log.debug('Using config: %r', self.config)
+        # Always set the attribute, including when no config was supplied
+        self.base_api_endpoint = self.config['base_api_endpoint']
+
+    def validate_config(self, config):
+        if not config:
+            return config
+
+        config_obj = json.loads(config)
+        if 'base_api_endpoint' in config_obj:
+            try:
+                parsed = urlparse(config_obj['base_api_endpoint'])
+            except AttributeError:
+                raise ValueError('base_api_endpoint must be a valid URL')
+            if not parsed.scheme or not parsed.netloc:
+                raise ValueError('base_api_endpoint must be a valid URL')
+        return config
+


Add configuration for base_api_endpoint (we need to use the .eu version). _set_config is called by the gather stage; validate_config is called when the source is created.
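
The behaviour described in this comment can be sketched outside CKAN: start from the default endpoint, overlay whatever the source config supplies, then check that the result is an absolute URL. The function names `load_config` and `check_endpoint` are hypothetical and are not part of the plugin; the default URL is the one quoted in the README.

```python
import json
from urllib.parse import urlparse

# Default endpoint as quoted in the README section of this PR.
BASE_API_ENDPOINT = "https://api.us.socrata.com/api/catalog/v1"


def load_config(config_str):
    """Return the config dict with the default endpoint filled in (hypothetical helper)."""
    config = {"base_api_endpoint": BASE_API_ENDPOINT}
    if config_str:
        # Keys supplied by the harvest source override the defaults.
        config.update(json.loads(config_str))
    return config


def check_endpoint(config):
    """Raise ValueError unless base_api_endpoint has both a scheme and a host."""
    parsed = urlparse(config["base_api_endpoint"])
    if not parsed.scheme or not parsed.netloc:
        raise ValueError("base_api_endpoint must be a valid URL")
    return config


default = load_config("")
custom = check_endpoint(
    load_config('{"base_api_endpoint": "https://api.eu.socrata.com/api/catalog/v1"}')
)
print(default["base_api_endpoint"])
print(custom["base_api_endpoint"])
```

Merging into a dict of defaults means an empty config and a config that omits the key both end up with a usable endpoint.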

kipparker · 2023-03-22T17:23:50Z

ckanext/socrata/plugin.py

@@ -266,8 +298,9 @@ def _page_datasets(domain, batch_number):
                    _request_datasets_from_socrata(domain, batch_number,
                                                   current_offset)
                if datasets is None or len(datasets) == 0:
-                    raise StopIteration
+                    break


I found that raising StopIteration here stopped the whole harvest process, so the loop now exits with break instead
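
A short illustration of why the change matters, assuming `_page_datasets` is a generator (the hunk does not show its `yield`). Since Python 3.7 (PEP 479), a `StopIteration` raised inside a generator body is converted to a `RuntimeError` instead of quietly ending iteration. The two functions below are hypothetical stand-ins, not the plugin's code.

```python
def pages_with_raise(batches):
    """Old style: signal the end by raising StopIteration inside the generator."""
    for batch in batches:
        if not batch:
            raise StopIteration  # converted to RuntimeError on Python 3.7+ (PEP 479)
        yield batch


def pages_with_break(batches):
    """New style: leave the loop and let the generator finish normally."""
    for batch in batches:
        if not batch:
            break
        yield batch


# Made-up batches; the empty list plays the role of an empty Socrata response.
batches = [["a", "b"], ["c"], []]

try:
    list(pages_with_raise(batches))
    outcome = "finished"
except RuntimeError:
    outcome = "RuntimeError"

collected = list(pages_with_break(batches))
print(outcome, collected)
```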

kipparker · 2023-03-22T17:24:17Z

ckanext/socrata/plugin.py

@@ -396,7 +429,7 @@ def import_stage(self, harvest_object):

        else:
            # We need to explicitly provide a package ID
-            package_dict['id'] = unicode(uuid.uuid4())
+            package_dict['id'] = str(uuid.uuid4())


Changed for Python 3 compatibility: the unicode() builtin no longer exists there
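
For reference, a tiny sketch of the replacement call on Python 3, where every `str` is already Unicode text. On Python 2 the same call would return a byte string, which may matter if Python 2 support is still a goal.

```python
import uuid

# str() replaces the Python 2-only unicode() builtin; on Python 3 the result is Unicode text.
package_id = str(uuid.uuid4())
print(package_id)
```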

kipparker added 3 commits March 22, 2023 13:31

Update socrata plugin for python 3, make base url configurable

8a02f50

Name the data files (match the title)

3a51583

remove logging

98d1cf4
