Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Business Logic #1785

Draft
wants to merge 14 commits into
base: master
Choose a base branch
from
Draft

Business Logic #1785

wants to merge 14 commits into from

Conversation

dsblank
Copy link
Member

@dsblank dsblank commented Oct 10, 2024

This PR is an example of speeding up Gramps by creating a "Business Logic" layer. The Business Logic layer is composed of functions largely moved out of filters and rules. The BusinessLogic class is designed to be a Mixin for every database class.

Once the code has been moved out of the filter/rule, then specific implementations can be implemented in the low level database class.

I'll discuss more of this in the discussion group: https://gramps.discourse.group/t/business-logic-via-underloads/6211?u=dsblank

@dsblank dsblank self-assigned this Oct 10, 2024
@dsblank
Copy link
Member Author

dsblank commented Oct 10, 2024

Oh yeah, the low-level code requires a DB upgrade. I put a draft of that code here: https://github.com/gramps-project/gramps/blob/e1cab51a2ab4466cdc09cf856e3365242bb643af/scripts/unpickle.py

@dsblank
Copy link
Member Author

dsblank commented Oct 10, 2024

And every place that saves the pickle, we'll have to update that too. (Then some day we can get rid of the pickled blobs completely).

@Nick-Hall
Copy link
Member

At a quick glance, I really like this approach. I have made a comment about the JSON format in your other PR.

In general discussions about this type of enhancement should be on the gramps-devel mailing list or here on GitHub. Not all of our developers use the Discourse forum.

@dsblank
Copy link
Member Author

dsblank commented Oct 11, 2024

Sounds good!

@dsblank dsblank changed the title Business Logic via underloads Business Logic Oct 12, 2024
@kulath
Copy link
Member

kulath commented Oct 12, 2024

I think moving business logic into the database layer is a truly terrible idea. It seems to run completely contrary to all the ideas we have developed over many years on modularisation, structured programming and abstraction. The database code should be just about accessing the database. Modularisation is all about making the code maintainable. This (unless I have misunderstood, but commit 645b408 seems to be this) makes the code so much more difficult to maintain.

@dsblank
Copy link
Member Author

dsblank commented Oct 13, 2024

Well, Business Logic appears in code everywhere! In views, gramplets of all kinds, and filters/rules. And is replicated in multiple places. I don’t think that is very organized.

The second part is just optionally implementing those in a lower level.

Honestly, the filters I was looking at hadn’t changed in years (probably never). So, I don’t think there would not be much to maintain once written.

Don’t you think the possibility of speeding up gramps is worth exploring?

@kulath
Copy link
Member

kulath commented Oct 13, 2024

I've replied to dsblank's comment on the Discourse forum.

@Nick-Hall
Copy link
Member

We already have functions such as get_raw_person_data(self, handle) to return the "raw" format data without constructing the object hierarchy.

Perhaps we could just add some new ones like extract_person_data(self, handle, json_path_list) to extract given fields. The json_path_list would be a list JSON paths pointing to the fields to return. Then we could use these fast access methods in the business logic outside of the database layer.

How much quicker is using SQLite to extract JSON fields over returning the entire JSON for an object and using python to extract the fields?

@dsblank
Copy link
Member Author

dsblank commented Oct 14, 2024

How much quicker is using SQLite to extract JSON fields over returning the entire JSON for an object and using python to extract the fields?

I think as long as we can select the data and not have to construct the primary objects, any variation will work.

Perhaps we could just add some new ones like extract_person_data(self, handle, json_path_list) to extract given fields. The json_path_list would be a list JSON paths pointing to the fields to return. Then we could use these fast access methods in the business logic outside of the database layer.

That's an interesting approach. One caveat: the DBAPI versions may have slightly diferrent syntax for JSON path extraction. But we can adjust like we do in Posgresql vs Sqlite.

@dsblank
Copy link
Member Author

dsblank commented Oct 14, 2024

But letting the SQL layer pull out the parts will be as fast as possible I think.

@dsblank
Copy link
Member Author

dsblank commented Oct 14, 2024

Here is an interesting version:

  1. The BusinessLogic is stand-alone and contains all of the functions necessary to support the methods defined here. It uses jsonpath_ng to parse the json_path's in a generic implementation of extract_data() (tested some)
  2. The DBAPI layer contains an extract_data() that uses the json_path's in SQL.

@dsblank
Copy link
Member Author

dsblank commented Oct 14, 2024

Oh, this works really well. I'm going to test what the speed difference is between the generic + jsonpath_ng vs dbapi + JSON_EXTRACT. Place your bets.

@dsblank
Copy link
Member Author

dsblank commented Oct 15, 2024

I can probably refactor the generic jsonpath_ng version so that parser doesn't have to parse the same query over and over. But the time differences are pretty stunningly different so far:

Time to select ["$.handle", "$.gramps_id"] for each person in example database (after conversion to json_data).

  • Generic Python implementation: 21.77 seconds
  • SQL + JSON_EXTRACT(): 0.04 seconds
db = open_database("Example")
start = time.time()
for handle in db.get_person_handles():
    data = extract_data_generic(db, "person", handle, ["$.handle", "$.gramps_id"])
    assert data[0] == handle
print("Generic", time.time() - start)
start = time.time()
for handle in db.get_person_handles():
    data = extract_data_sql(db, "person", handle, ["$.handle", "$.gramps_id"])
    assert data[0] == handle
print("SQL", time.time() - start)
db.close()

@dsblank
Copy link
Member Author

dsblank commented Oct 15, 2024

Caching the parser for each json_path in the generic method gives:

  • Generic: 0.14 seconds
  • SQL: 0.05 seconds

@dsblank
Copy link
Member Author

dsblank commented Oct 15, 2024

Since this works so well, and now works on generic database API, the question is: do we want to confine the use to inside BusinessLogic? Or move extract_data() to DbGeneric and use wherever it is useful over the code base?

It does seem like we can use extract_data() wherever we want (so move to DbGeneric) but it still makes organizational sense to have a collection of business logic separate from the DB.

@dsblank
Copy link
Member Author

dsblank commented Oct 15, 2024

There is a subtle incompatibility between the json_extract() function in SQLite and the json_extract() function in MySQL. The MySQL version of json_extract() always returns JSON. The SQLite version of json_extract() only returns JSON if there are two or more PATH arguments (because the result is then a JSON array) or if the single PATH argument references an array or object. In SQLite, if json_extract() has only a single PATH argument and that PATH references a JSON null or a string or a numeric value, then json_extract() returns the corresponding SQL NULL, TEXT, INTEGER, or REAL value.

From https://www.sqlite.org/json1.html#jex

Regarding Postgresql: not sure if it requires a field type of json, jsonb, or text. But there are ways to support jsonpath.

@dsblank
Copy link
Member Author

dsblank commented Oct 18, 2024

Re compatibility:

Beginning with SQLite version 3.38.0 (2022-02-22), the -> and ->> operators are available for extracting subcomponents of JSON. The SQLite implementation of -> and ->> strives to be compatible with both MySQL and PostgreSQL.

From: https://www.sqlite.org/json1.html#jptr

@Nick-Hall
Copy link
Member

The version of sqlite3 in Ubuntu Jammy (22.04LTS) is v3.37.2. What versions do we want to support in Gramps?

For Gtk and python we support versions that are not EOL at the time of release. However SQLite only supports the latest release. It would probably be unreasonable of us to require the latest release. It would be nice to be able to use the -> and ->> operators though.

@emyoulation
Copy link
Contributor

Add a Gramps glossary placeholder for JSON ( https://gramps-project.org/wiki/index.php/Gramps_Glossary#J ) so we have a place to link introductory material about the BLOB to JSON change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants