Add Python DB API backend #400

khaeru · 2021-02-23T11:42:03Z

Do not merge.

This PR adds ixmp.backend.dbapi, a proof-of-concept, partially implemented Backend based on the Python DB API 2.0 (PEP 249). The purpose is exploratory: to provide an informed estimate of how much and what kind of work would be needed to actually add such a Backend. As of 2021-02-23, no work is ongoing on this PR; it exists only for information and to collect discussion below.

Discussion

Databases

This implementation uses SQLite, because there is a sqlite3 module in the Python standard library, so there are no added dependencies.
Because the Python DB API is a standard interface implemented by many database packages, the code should be adaptable to other packages/modules.
For instance, cx_Oracle is a DB API 2.0-compliant package that connects to Oracle databases. It handles the same kind of connections we currently use through JDBC (docs).
Another example is PostgreSQL, via psycopg2 or psycopg3.
However, different packages may require different SQL syntax.
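
As a minimal sketch of what this portability looks like, the following uses only calls defined by PEP 249 (connect(), cursor(), execute(), executemany(), fetchall()) through the standard-library sqlite3 module. The table and column names are hypothetical and not taken from the proof-of-concept schema. One concrete difference between packages is the placeholder style, which each module advertises in its paramstyle attribute: sqlite3 reports "qmark" (?), while psycopg2 reports "pyformat" (%s).

```python
import sqlite3

# Placeholder text for some DB API "paramstyle" values (not an exhaustive list)
PLACEHOLDER = {"qmark": "?", "format": "%s", "pyformat": "%s"}


def list_models(conn, module=sqlite3):
    """Return matching model names using only DB API 2.0 calls."""
    mark = PLACEHOLDER[module.paramstyle]
    cur = conn.cursor()
    cur.execute(
        f"SELECT name FROM model WHERE name LIKE {mark} ORDER BY name",
        ("test-%",),
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical table, for illustration only
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE model (name TEXT UNIQUE)")
cur.executemany(
    "INSERT INTO model VALUES (?)", [("test-a",), ("test-b",), ("other",)]
)
conn.commit()

models = list_models(conn)
```

Passing the module as an argument is only one way to handle the placeholder difference; a real Backend would likely fix this choice once, at connection time.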

Schema

The proof-of-concept does not use the same database schema as the JDBCBackend.
This is deliberate: the goal is to have a minimal working example that satisfies the Backend API, not necessarily a performant or scalable one, and to not be tied to historical choices in the Java code.
It would be possible to add a distinct ixmp.backend.legacy or similar backend that is tailored to that existing database structure.
- I expect this would end up being a larger task because it would need to mirror some of the complexities of the Java code.
- One of these is the way that data objects are serialized as binary blobs for storage. I don't know anything about this, and am not sure if it can be replicated in Python.
- If that is done, the schema developed for this proof-of-concept could be kept as a second option, or discarded (i.e. this PR not completed).
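
For comparison, here is a rough sketch of how Python objects can be stored as binary blobs through the DB API, using pickle and sqlite3. This is not the serialization format used by the Java code, which is not described here. The commit list below mentions unpickling in DatabaseBackend._item_data(), but the table, column and function names in this sketch are hypothetical.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical table: one pickled object per named item
cur.execute("CREATE TABLE item_data (item TEXT PRIMARY KEY, value BLOB)")


def store(conn, item, obj):
    """Serialize obj with pickle and store the bytes in a BLOB column."""
    blob = pickle.dumps(obj)
    conn.cursor().execute(
        "INSERT OR REPLACE INTO item_data VALUES (?, ?)", (item, blob)
    )


def load(conn, item):
    """Fetch the BLOB stored for item and deserialize it."""
    cur = conn.cursor()
    cur.execute("SELECT value FROM item_data WHERE item = ?", (item,))
    return pickle.loads(cur.fetchone()[0])


store(conn, "demand", {("node", "year"): [1.0, 2.5]})
restored = load(conn, "demand")
```

Note that pickle should only be used to load data from trusted sources, and that "INSERT OR REPLACE" is SQLite-specific syntax.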

Remaining work

Some of the things that would need to be done to complete this implementation:

Remaining methods. Of about 44 methods in the Backend interface, there are about 15 methods left to implement.
- At a glance none should be particularly complex. We would need to detect and mirror any complex/unadvertised behaviour of JDBCBackend, esp. for the handling of “meta” and “docs” annotations.
Automatic contents (see “Disentangle message_ix and ixmp_source”, message_ix#254). All data objects created automatically in Java for Platform, TimeSeries, and Scenario should be replicated in Python. This includes:
- Things created at instantiation, e.g. units on Platforms.
- Things created at other events, e.g. during a clone or GDX write operation.
GDX file input/output. When using JDBC, this is handled in Java. As we've noticed, there are several Python packages for reading and/or writing GDX, building on the low- and mid-level codes provided by GAMS Ltd. (I wrote one of them myself…).
- Choose a package.
- Implement input and output.
- Probably should move ixmp.backend.io to ixmp.io.excel, ixmp.io.gdx, etc. for better organization.

Extend the test suite. Using pytest, it's possible to run some tests twice, e.g. for different classes. This is already done in the ixmp test suite:

ixmp/ixmp/tests/core/test_timeseries.py

Lines 133 to 148 in 281e84e

    
    # Tests of ixmp.TimeSeries.
    #
    # Since Scenario is a subclass of TimeSeries, all TimeSeries functionality should work
    # exactly the same way on Scenario instances. The *ts* fixture is parametrized to yield
    # both TimeSeries and Scenario objects, so every test is run on each type.
    @pytest.fixture(scope="function", params=[TimeSeries, Scenario])
    def ts(request, mp):
        """An empty TimeSeries with a temporary name on the test_mp."""
        # Use a hash of the pytest node ID to avoid exceeding the maximum
        # length for a scenario name
        node = hash(request.node.nodeid.replace("/", " "))

        # Class of object to yield
        cls = request.param
        yield cls(mp, model=f"test-{node}", scenario="test", version="new")

The same could be done (e.g. by extending the test_mp fixture) to run some or all existing tests on both JDBCBackend- and DatabaseBackend-backed Platforms.
This will help to spot behaviours particular to JDBCBackend, as above.
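
A hedged sketch of such a parametrized fixture follows. The keyword arguments for each backend are assumptions made for illustration; the arguments actually accepted by ixmp.Platform for "jdbc" and "dbapi" may differ. The names BACKEND_ARGS, platform_args and mp_args are hypothetical. To keep the sketch self-contained, it yields the arguments rather than a Platform.

```python
import pytest

# Hypothetical Platform keyword arguments for each backend (assumptions)
BACKEND_ARGS = {
    "jdbc": dict(backend="jdbc", driver="hsqldb", url="jdbc:hsqldb:mem:test"),
    "dbapi": dict(backend="dbapi", path=":memory:"),
}


def platform_args(backend):
    """Return a copy of the keyword arguments for a Platform using backend."""
    return dict(BACKEND_ARGS[backend])


@pytest.fixture(scope="function", params=sorted(BACKEND_ARGS))
def mp_args(request):
    """Yield Platform arguments once per backend.

    A real fixture would instead yield ixmp.Platform(**platform_args(...)),
    so that every test requesting it runs once per backend.
    """
    yield platform_args(request.param)
```

Any test that requests mp_args is then collected twice, once for each key in BACKEND_ARGS, in the same way the ts fixture above runs each test for both TimeSeries and Scenario.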

Streamline. The code currently contains a number of SQL queries that are somewhat similar to one another; these could probably be condensed a lot with some helper methods.
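
One possible shape for such a helper, as a sketch only: a single function that builds the common "SELECT ... WHERE key = value" pattern. The function name select and the run table are hypothetical and do not come from the proof-of-concept code.

```python
import sqlite3


def select(conn, table, columns, **where):
    """Select columns from table, filtered by equality conditions in where.

    Table and column names are interpolated directly into the query, so they
    must come from trusted code; only the values are passed as parameters.
    """
    query = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        query += " WHERE " + " AND ".join(f"{key} = ?" for key in where)
    cur = conn.cursor()
    cur.execute(query, tuple(where.values()))
    return cur.fetchall()


# Hypothetical table, for illustration only
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE run "
    "(id INTEGER PRIMARY KEY, model TEXT, scenario TEXT, version INTEGER)"
)
cur.executemany(
    "INSERT INTO run (model, scenario, version) VALUES (?, ?, ?)",
    [("m", "s", 1), ("m", "s", 2), ("m", "other", 1)],
)

rows = select(conn, "run", ["version"], model="m", scenario="s")
all_rows = select(conn, "run", ["id"])
```

The "?" placeholder is specific to packages with the "qmark" paramstyle, such as sqlite3; a helper like this is also a natural single place to handle that difference.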

How to review

N/A, discussion only.

PR checklist

Continuous integration checks all ✅
Add or expand tests; coverage checks both ✅
Add, expand, or update documentation.
Update release notes.

codecov · 2021-02-24T05:32:47Z

Codecov Report

Merging #400 (281e84e) into master (33404fd) will increase coverage by 0.0%.
The diff coverage is n/a.

❗ Current head 281e84e differs from pull request most recent head d406b52. Consider uploading reports for the commit d406b52 to get more accurate results

@@          Coverage Diff           @@
##           master    #400   +/-   ##
======================================
  Coverage    96.7%   96.8%           
======================================
  Files          37      37           
  Lines        4247    4248    +1     
======================================
+ Hits         4111    4116    +5     
+ Misses        136     132    -4

Impacted Files                          Coverage Δ
ixmp/tests/reporting/test_reporter.py   99.0% <0.0%> (-0.1%) ⬇️
ixmp/model/gams.py                      100.0% <0.0%> (ø)
ixmp/backend/jdbc.py                    96.4% <0.0%> (+0.3%) ⬆️
ixmp/testing.py                         79.7% <0.0%> (+0.5%) ⬆️
ixmp/utils.py                           96.0% <0.0%> (+0.9%) ⬆️


khaeru added 26 commits April 6, 2021 18:46

Add .backend.dbapi

57491f9

Register dbapi in BACKENDS

b829683

Handle dbapi configuration in Config.add_platform()

dcdc9a8

Add .backend.test_dbapi

860f794

Implement DatabaseBackend.init()

149087f

Implement 4 DatabaseBackend methods

8503a05

- list_items(), init_item(), item_index(), item_set_elements()

Implement DatabaseBackend.item_get_elements()

ebc3b61

Implement DatabaseBackend.set_as_default() and .check_out()

620e593

Implement DatabaseBackend.set_data()

33bc5bb

Implement DatabaseBackend.get() and .is_default()

d3f618a

Implement DatabaseBackend.run_id()

2445d7b

.base.Backend.has_solution() is optional

1ca8c89

Add a fallback implementation in Scenario.has_solution() to be used when a backend does not define this method.

Add stub for .backend.io.s_write_gdx()

c7b9303

Implement DatabaseBackend.write_file()

5313259

Handle existing default in DatabaseBackend.set_as_default()

6250813

Unpickle data in DatabaseBackend._item_data()

e4ba883

Handle scalar equ & var in DatabaseBackend.item_get_elements()

6583f7c

Add (failing) test of Scenario.solve() using DatabaseBackend

7fe63a1

Implement DatabaseBackend.get_data()

0f3e556

Implement DatabaseBackend.{get,set}_geo()

7c175f1

Add solve=True argument to .testing.populate_test_platform()

a241852

Implement Platform.{get,set}_meta for parameter handling

b179226

Remove JDBCBackend._validate_meta_args and usage

54c7444

This code was incorrectly placed in a concrete Backend implementation. It actually applies to all usage of the public Platform and TimeSeries APIs.

Temporarily copy test_meta contents into test_dbapi

9486865

Make query under DatabaseBackend.get() reusable in a private method

d5c5371

Partly implement DatabaseBackend.{get,remove,set}_meta()

3665e9d

khaeru force-pushed the feature/py-backend branch from 195720b to 3665e9d Compare April 6, 2021 20:50

khaeru added 2 commits April 6, 2021 23:06

Add a UNIQUE constraint to annotations table

b636869

Remove placeholder for non-Backend method "delete_meta"

d406b52

khaeru added the "enh" (New features & functionality) and "help welcome" labels Jul 18, 2021

khaeru mentioned this pull request Jul 22, 2021

Implement clone() in Python #424

Draft

7 tasks

khaeru marked this pull request as draft August 27, 2021 11:57

khaeru mentioned this pull request May 9, 2022

Add PyomoModel class #420

Draft

6 tasks

khaeru changed the title ~~🚧 Add Python DB API backend~~ Add Python DB API backend Sep 22, 2022

khaeru mentioned this pull request Aug 12, 2024

Add .models.shift_period() iiasa/message_ix#873

Draft

6 tasks
