Add Python DB API backend #400

Draft: wants to merge 28 commits into base: main

@khaeru (Member) commented Feb 23, 2021

Do not merge.

This PR adds ixmp.backend.dbapi, a proof-of-concept, partially implemented Backend based on the Python DB API 2.0 (PEP 249). The purpose is exploratory: to provide an informed estimate of how much work, and what kind, would be needed to actually add such a Backend. As of 2021-02-23, no work is ongoing on this PR; it exists only for information and to collect the discussion below.
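For readers less familiar with PEP 249: every compliant module exposes the same core pattern (connect, cursor, execute, fetch, commit), which is what would let one Backend implementation be reused across databases. A minimal sketch with the standard-library sqlite3 module and an in-memory database; the unit table is hypothetical and not taken from ixmp.backend.dbapi.

```python
import sqlite3

# PEP 249 module-level globals: the API version and the placeholder style.
print(sqlite3.apilevel, sqlite3.paramstyle)  # 2.0 qmark

# connect() returns a Connection; ":memory:" is a throwaway database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table, for illustration only.
cur.execute("CREATE TABLE unit (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

# Values are passed separately from the SQL text, never formatted into it.
cur.executemany("INSERT INTO unit (name) VALUES (?)", [("kg",), ("GWa",)])
conn.commit()  # make the pending inserts permanent

cur.execute("SELECT name FROM unit ORDER BY id")
units = [row[0] for row in cur.fetchall()]
print(units)  # ['kg', 'GWa']

conn.close()
```

Swapping in another compliant module (e.g. cx_Oracle or psycopg2) would mainly change the connect() arguments and, as noted below, details of the SQL syntax.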

Discussion

Databases

  • This implementation uses SQLite, because the sqlite3 module is part of the Python standard library, so no dependencies are added.
  • Because the Python DB API is a common interface specification, the code should be adaptable to other DB API-compliant packages.
  • For instance, cx_Oracle is a DB API 2.0-compliant package that connects to Oracle databases. It handles the same kind of connections we currently use through JDBC (docs).
  • Another example is PostgreSQL, via psycopg2 or psycopg 3.
  • However, different packages (and the databases behind them) may require different SQL syntax, e.g. for query parameter placeholders.
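One concrete difference is the parameter placeholder: PEP 249 lets each module declare a paramstyle (qmark, numeric, named, format or pyformat). The sketch below builds an INSERT statement in whatever style the given module reports. The helper names placeholders() and insert_sql() are hypothetical, not part of ixmp; sqlite3 reports "qmark", and to my knowledge cx_Oracle reports "named" and psycopg2 "pyformat".

```python
import sqlite3
from types import ModuleType
from typing import List, Sequence


def placeholders(paramstyle: str, names: Sequence[str]) -> List[str]:
    """Return one SQL placeholder per parameter, following a PEP 249 paramstyle."""
    if paramstyle == "qmark":  # e.g. sqlite3: WHERE a = ?
        return ["?"] * len(names)
    if paramstyle == "numeric":  # WHERE a = :1
        return [f":{i}" for i in range(1, len(names) + 1)]
    if paramstyle == "named":  # e.g. cx_Oracle: WHERE a = :a
        return [f":{name}" for name in names]
    if paramstyle == "format":  # WHERE a = %s
        return ["%s"] * len(names)
    if paramstyle == "pyformat":  # e.g. psycopg2: WHERE a = %(a)s
        return [f"%({name})s" for name in names]
    raise ValueError(f"unknown paramstyle: {paramstyle!r}")


def insert_sql(module: ModuleType, table: str, columns: Sequence[str]) -> str:
    """Build an INSERT statement in the placeholder style of a DB API module.

    *table* and *columns* are pasted into the SQL text, so they must come from
    trusted code: identifiers cannot be passed as query parameters.
    """
    marks = ", ".join(placeholders(module.paramstyle, columns))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({marks})"


sql = insert_sql(sqlite3, "unit", ["name", "symbol"])
print(sql)  # INSERT INTO unit (name, symbol) VALUES (?, ?)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unit (name TEXT, symbol TEXT)")
conn.execute(sql, ("kilogram", "kg"))  # qmark style takes a sequence
print(conn.execute("SELECT * FROM unit").fetchall())  # [('kilogram', 'kg')]
```

With the named and pyformat styles, the parameters must be passed as a mapping (e.g. {"name": "kilogram", "symbol": "kg"}) rather than a tuple, so a real multi-database Backend would need to adapt both the SQL text and the parameter container.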

Schema

  • The proof-of-concept does not use the same database schema as the JDBCBackend.
  • This is deliberate: the goal is to have a minimal working example that satisfies the Backend API, not necessarily a performant or scalable one, and to not be tied to historical choices in the Java code.
  • It is possible to add a distinct ixmp.backend.legacy module, or similar, that is tailored to that existing database structure.
    • I expect this would end up being a larger task because it would need to mirror some of the complexities of the Java code.
    • One of these is the way that data objects are serialized as binary blobs for storage. I don't know the details of this format, and am not sure whether it can be replicated in Python.
    • If that is done, the schema developed for this proof-of-concept could be kept as a second option, or discarded (i.e. this PR not completed).
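To make "minimal working example" concrete, here is a hypothetical sketch of how small a schema covering time series, items and item data could be. The table and column names are invented for illustration; this is neither the schema in this PR nor the JDBCBackend schema. Storing each index key as JSON text is just one simple choice; a real backend might instead use one column per index dimension.

```python
import json
import sqlite3

# Hypothetical minimal schema, for illustration only.
SCHEMA = """
CREATE TABLE timeseries (
    id INTEGER PRIMARY KEY,
    model TEXT NOT NULL,
    scenario TEXT NOT NULL,
    version INTEGER NOT NULL,
    UNIQUE (model, scenario, version)
);
CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    timeseries_id INTEGER NOT NULL REFERENCES timeseries (id),
    name TEXT NOT NULL,
    item_type TEXT NOT NULL CHECK (item_type IN ('set', 'par', 'var', 'equ')),
    UNIQUE (timeseries_id, name)
);
CREATE TABLE item_data (
    item_id INTEGER NOT NULL REFERENCES item (id),
    key_json TEXT NOT NULL,  -- JSON-encoded list of index labels
    value REAL,
    unit TEXT
);
"""


def init_schema(conn: sqlite3.Connection) -> None:
    """Create all tables; executescript() runs several statements at once."""
    # SQLite only enforces REFERENCES constraints when this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


def add_item_data(conn, ts_id, name, item_type, data):
    """Create an item on a time series and store (key, value, unit) rows."""
    cur = conn.execute(
        "INSERT INTO item (timeseries_id, name, item_type) VALUES (?, ?, ?)",
        (ts_id, name, item_type),
    )
    item_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO item_data (item_id, key_json, value, unit) VALUES (?, ?, ?, ?)",
        [(item_id, json.dumps(list(key)), value, unit) for key, value, unit in data],
    )
    return item_id


conn = sqlite3.connect(":memory:")
init_schema(conn)
ts_id = conn.execute(
    "INSERT INTO timeseries (model, scenario, version) VALUES (?, ?, ?)",
    ("model", "scenario", 1),
).lastrowid
add_item_data(conn, ts_id, "demand", "par", [(("node", "2020"), 10.0, "GWa")])
conn.commit()

rows = conn.execute(
    "SELECT i.name, d.key_json, d.value, d.unit FROM item_data AS d "
    "JOIN item AS i ON i.id = d.item_id"
).fetchall()
print(rows)  # [('demand', '["node", "2020"]', 10.0, 'GWa')]
```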

Remaining work

Some of the things that would need to be done to complete this implementation:

  • Remaining methods. Of the roughly 44 methods in the Backend interface, about 15 remain to be implemented.
    • At a glance, none should be particularly complex. However, we would need to detect and mirror any complex or undocumented behaviour of JDBCBackend, especially in the handling of “meta” and “docs” annotations.
  • Automatic contents (see “Disentangle message_ix and ixmp_source”, message_ix#254). All data objects created automatically in Java for Platform, TimeSeries, and Scenario should be replicated in Python. This includes:
    • Things created at instantiation, e.g. units on Platforms.
    • Things created at other events, e.g. during a clone or GDX write operation.
  • GDX file input/output. When using JDBC, this is handled in Java. As we've noticed, there are several Python packages for reading and/or writing GDX, building on the low- and mid-level APIs provided by GAMS Ltd. (I wrote one of them myself…).
    • Choose a package.
    • Implement input and output.
    • ixmp.backend.io should probably be moved to ixmp.io.excel, ixmp.io.gdx, etc., for better organization.
  • Extend the test suite. With pytest, the same test can be run several times, e.g. once for each of several classes, by using a parametrized fixture. This is already done in the ixmp test suite:
    import pytest

    from ixmp import Scenario, TimeSeries

    # Tests of ixmp.TimeSeries.
    #
    # Since Scenario is a subclass of TimeSeries, all TimeSeries functionality should work
    # exactly the same way on Scenario instances. The *ts* fixture is parametrized to yield
    # both TimeSeries and Scenario objects, so every test is run on each type.
    @pytest.fixture(scope="function", params=[TimeSeries, Scenario])
    def ts(request, mp):
        """An empty TimeSeries with a temporary name on the test_mp."""
        # *mp* is a Platform fixture provided elsewhere in the test suite.
        # Use a hash of the pytest node ID to avoid exceeding the maximum
        # length for a scenario name
        node = hash(request.node.nodeid.replace("/", " "))
        # Class of object to yield
        cls = request.param
        yield cls(mp, model=f"test-{node}", scenario="test", version="new")
    • The same could be done (e.g. by extending the test_mp fixture) to run some or all existing tests on Platforms backed by JDBCBackend and by DatabaseBackend.
    • This would help to spot behaviours specific to JDBCBackend, as mentioned above.
  • Streamline. The code currently contains a number of SQL queries that are quite similar to one another; these could probably be condensed considerably using a few helper methods.
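As an illustration of the kind of helper meant, a minimal sketch over sqlite3 that replaces many hand-written "SELECT … WHERE a = ? AND b = ?" queries with one function. The name select() and its signature are hypothetical; a real Backend would more likely make this a method using its own connection and the driver's paramstyle.

```python
import sqlite3
from typing import Any, Iterable, List, Optional, Tuple


def select(
    conn: sqlite3.Connection,
    table: str,
    columns: Iterable[str],
    *,
    order_by: Optional[str] = None,
    **filters: Any,
) -> List[Tuple[Any, ...]]:
    """Return rows of SELECT <columns> FROM <table> WHERE <name> = ? AND ...

    Identifiers (table, columns, order_by, filter names) are pasted into the
    SQL text, so they must come from the backend's own code, never from user
    input. Filter *values* are always passed as parameters.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if filters:
        # dicts preserve insertion order, so names and values stay aligned
        sql += " WHERE " + " AND ".join(f"{name} = ?" for name in filters)
    if order_by:
        sql += f" ORDER BY {order_by}"
    return conn.execute(sql, tuple(filters.values())).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timeseries (model TEXT, scenario TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO timeseries VALUES (?, ?, ?)",
    [("m", "s", 1), ("m", "s", 2), ("m", "t", 1)],
)

# Equivalent to a hand-written
# "SELECT version FROM timeseries WHERE model = ? AND scenario = ? ORDER BY version"
versions = select(conn, "timeseries", ["version"], order_by="version", model="m", scenario="s")
print(versions)  # [(1,), (2,)]
```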

See also

  • Branch backend-xarray, an earlier experiment in a similar direction.

How to review

N/A, discussion only.

PR checklist

  • Continuous integration checks all ✅
  • Add or expand tests; coverage checks both ✅
  • Add, expand, or update documentation.
  • Update release notes.

@codecov bot commented Feb 24, 2021

Codecov Report

Merging #400 (281e84e) into master (33404fd) will increase coverage by 0.0%.
The diff coverage is n/a.

❗ Current head 281e84e differs from pull request most recent head d406b52. Consider uploading reports for the commit d406b52 to get more accurate results

@@          Coverage Diff           @@
##           master    #400   +/-   ##
======================================
  Coverage    96.7%   96.8%           
======================================
  Files          37      37           
  Lines        4247    4248    +1     
======================================
+ Hits         4111    4116    +5     
+ Misses        136     132    -4     
Impacted Files                           Coverage Δ
ixmp/tests/reporting/test_reporter.py    99.0% <0.0%> (-0.1%) ⬇️
ixmp/model/gams.py                       100.0% <0.0%> (ø)
ixmp/backend/jdbc.py                     96.4% <0.0%> (+0.3%) ⬆️
ixmp/testing.py                          79.7% <0.0%> (+0.5%) ⬆️
ixmp/utils.py                            96.0% <0.0%> (+0.9%) ⬆️

khaeru added 26 commits April 6, 2021 18:46
- list_items(), init_item(), item_index(), item_set_elements()
Add a fallback implementation in Scenario.has_solution() to be used when a backend does not define this method.
This code was incorrectly placed in a concrete Backend implementation. It actually applies to all usage of the public Platform and TimeSeries APIs.
@khaeru force-pushed the feature/py-backend branch from 195720b to 3665e9d on April 6, 2021 20:50
@khaeru added the labels “enh” (New features & functionality) and “help welcome” on Jul 18, 2021
@khaeru mentioned this pull request on Jul 22, 2021
@khaeru marked this pull request as draft on August 27, 2021 11:57
@khaeru mentioned this pull request on May 9, 2022
@khaeru changed the title from “🚧 Add Python DB API backend” to “Add Python DB API backend” on Sep 22, 2022