Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add Sqlite backend #935

Closed
wants to merge 3 commits into from
Closed

Conversation

mqus
Copy link

@mqus mqus commented Nov 30, 2020

Description

I'm a "Host everything on a raspberry pi baremetal" kind of guy and already hosted the 1.5 sync server. I have a new box and wanted to try out the new sync server but my favourite sqlite backend was not supported yet. So I gave implementing it a try this weekend ;) I already knew some rust and have worked with diesel, so this was not too hard.

These commits basically copy the mysql backend to support sqlite databases.
The migrations directory was split to include a mysql and a sqlite subdirectory, the sqlite subdirectory only contains a single migration, that creates the new sqlite database. This should be ok since there are no syncstorage instances with the sqlite backend in the wild ;)

I also copied the entire db::mysql module and retrofitted everything on a first go through. The new backend can be chosen by using a sqlite:path/file.db URL (theoretically, sqlite::memory: is accepted but will not work since migrations and other queries use different connections for now) I did not copy the diesel_ext module, since that one is highly mysql specific. But I also don't know if the associated spots in the code need any replacement (see comments).

I had to alter the queries/migrations a bit. The main issues were column types (mostly superficial since sqlite understands them as synonyms), Index creation and Upsert statements, especially the last are handled a bit differently in sqlite and needed some tweaking.

I then copied the circleci build-and-test job for testing the same things with an sqlite backend.

The good news: All tests and checks run through successfully (save for one dependency check which is not my fault). I also altered as little as possible in the existing code to ensure that no breakage occurs.

The bad news: There is now much duplicated code, some of which might be irrelevant to the sqlite backend, since I am not all that familiar with the codebase and the mysql/diesel limitations that were worked around. I am not sure how to proceed here as you (the maintainers) surely have your own thoughts and opinions on how to restructure this code. Feel free to leave comments :)

I propose to clean up this new backend code a bit more (but only on the sqlite side) and refactor both backends in another PR. This should make reviewing easier (simply look over the diff of the mysql and the sqlite dirs) and allow for a clean cut between f eature and refactoring.

I also haven't actually tested if this really can sync my firefox data. I will follow up on this in the coming days.

  • Manually test the new backend [with different firefox clients]
  • Clean up some more

Testing

Compile as usual and set SYNC_DATABASE_URL to sqlite:syncstorage.db when running or executing tests

Issue(s)

Issue: #498 Support sqlite/postgresql backend

.select(user_collections::modified)
.filter(user_collections::user_id.eq(user_id))
.filter(user_collections::collection_id.eq(collection_id))
// .lock_in_share_mode()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is necessary in sqlite or how to implement it there

.select(user_collections::modified)
.filter(user_collections::user_id.eq(user_id))
.filter(user_collections::collection_id.eq(collection_id))
// .for_update()
Copy link
Author

@mqus mqus Nov 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, I don't know how to make this work in sqlite or if it is needed.

As far as I understood it (from a very brief read) this works in tandem with the line referenced above. Is this right?

fn check_sync(&self) -> Result<results::Check> {
// Check if querying works
sql_query("SELECT 1").execute(&self.conn)?;
Ok(true)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sqlite does not support SHOW STATUS or similar queries. Since it seemed to just check if the database is reachable, I chose this way. But I'm not entirely sure.

@mqus
Copy link
Author

mqus commented Nov 30, 2020

I just noticed that I didn't understand the relation between this service and https://github.com/mozilla-services/syncserver at all (I thought this is a successor to the whole thing and not just a part), but this backend seems to be needed anyway in the future so 🤷‍♂️ . I'll be happy to help out ;-)

@jrconlin
Copy link
Member

jrconlin commented Dec 1, 2020

First off, thank you for digging into this.

When we originally built out the mysql support side, we had targeted systems like mariadb, since that tends to be fairly widely available and runs on lots of systems. One of the other engineers worked a fair bit on this and can probably offer a lot of insight.

A lot of the locking issues may still be needed because of multiple devices. For instance, someone could have a desktop, laptop and mobile device that all need to sync, and those could request a sync operation at the same time. I know sqlite tends to be reasonably good at that internally and does a table lock during a write, but there can be multiple operations at play and it's generally good to have a known start and stop for a given transaction.

There might be a few other things we could do to reduce the differences between the code. For now, it's probably just best to make sure that the end to end tests pass with it. Those test a lot of the API and function.

As for pointing Android to a private server, it's not super obvious, and unfortunately the team responsible hasn't talked about it a lot for a few reasons. The way to set things up is to

  1. log out of Sync.
  2. tap the 3 vertical dot menu (apparently also called a shish-kebab menu)
  3. Tap "Settings"
  4. Scroll down to "About Firefox" and Tap it.
  5. Tap the Firefox Logo 5 times. (You'll see a pop up at the bottom that tells you that you're X number of taps away from developer mode)
  6. Go back to the settings menu, and you'll see two new settings, "Custom Firefox Account Server" and "Custom Sync server".

Those are how you point to your own server.

We're kind of slammed with a bunch of stuff, but we'll try to look over your code as you progress.

@jrconlin jrconlin marked this pull request as draft December 1, 2020 04:02
@jrconlin
Copy link
Member

jrconlin commented Dec 1, 2020

Also going to mark this as a draft for now.

@fzzzy
Copy link
Contributor

fzzzy commented Dec 2, 2020

This is super cool! Thanks!

syncstorage-rs actually will eventually contain a complete replacement for syncserver, as syncstorage-rs will contain both a tokenserver and a syncstorage server eventually.

@mqus
Copy link
Author

mqus commented Jan 2, 2021

I finally had time and managed to successfully test the server (and sync) on Firefox Desktop (84) on Linux (two different machines with one having a pretty big history). A Windows Machine "playing back a "backup" "(sync, close firefox, remove data folder, open firefox, login again, sync again) also worked.

I also have fennec (68.5.0) on my phone (Android 4.4.2, fenix only works on >=5), but it didn't really work there. Pressing "Sync now" in the sync settings starts and stops the sync ca. 1 second afterwards, with "Last Sync:" staying on "Never". I also can't access any synced data from either side. The API calls didn't even get through my reverse proxy somehow(no logs for the domain and the relevant time window)... I am using the default account server so either my reverse proxy silently drops packets (I had some problems with SNI with firefox sync in the past, can't really remember... I'll try to do tcpdump) or there are some other incompatibilities... Can I debug this somehow? Do you have some pointers? Either way, this should not be an issue with the sync server or my changes :)

I'll try to test fenix next (maybe via emulator... let's see 😉 )

@mqus
Copy link
Author

mqus commented Jan 3, 2021

So I checked, and while there was the issue with sni, there was also the issue of mismatching ciphersuites, which took me a while to fix completely (was entirely a configuration issue with my reverse proxy caddy). Now, syncing with fennec also works 🎉 .

@jrconlin
Copy link
Member

jrconlin commented Jan 4, 2021

Oh, neat! Silly thought, but do you think that writing up a quick thing about the mismatching ciphersuites might be useful for someone else? It's probably uncommon, but you hit it, and it might offer a few clues to someone else.

@mqus
Copy link
Author

mqus commented Jan 4, 2021

Well, the ciphersuite issue is most probably a result of the old android version paired with a reverse proxy geared for security by default and not maximum compatibility... I think I wouldn't have had those problems with nginx... Maybe I'll write it up sometime :)

@mqus
Copy link
Author

mqus commented Feb 6, 2021

So... I added another desktop client (the third or fourth) and today tried fenix on an android emulator. Fenix works fine, just sending tabs to it only seems to register when I manually sync the fenix client (or after some time). But I haven't tested this extensively.

During that I also checked the logs for the syncstorage app and noticed that it threw some errors during the initial sync of the third desktop client (regular stable firefox on linux) a few days prior, that I didn't notice because the sync went nicely anyway. Here are the logs (the prefixes are from systemd since I'm managing syncstorage through it):

Feb 01 00:31:22 basil systemd[1]: Started SyncStorage - Mozilla Sync Storage built with Rust.
Feb 01 00:31:22 basil syncstorage[72120]: Feb 01 00:31:22.745 INFO Starting 2 workers
Feb 01 00:31:22 basil syncstorage[72120]: Feb 01 00:31:22.746 INFO Starting "actix-web-service-127.0.0.1:3014" service on 127.0.0.1:3014
Feb 01 00:31:22 basil syncstorage[72120]: Feb 01 00:31:22.746 INFO Server running on http://127.0.0.1:3014 (sqlite) No quota
Feb 03 01:30:46 basil syncstorage[72120]: Feb 03 01:30:46.891 ERRO Internal Server Error: ApiError { inner:
Feb 03 01:30:46 basil syncstorage[72120]: A database error occurred: database is locked, status: 500 }
Feb 03 01:30:46 basil syncstorage[72120]: Feb 03 01:30:46.900 ERRO Internal Server Error: ApiError { inner:
Feb 03 01:30:46 basil syncstorage[72120]: A database error occurred: database is locked, status: 500 }
Feb 03 01:30:49 basil syncstorage[72120]: Feb 03 01:30:49.346 ERRO Internal Server Error: ApiError { inner:
Feb 03 01:30:49 basil syncstorage[72120]: A database error occurred: database is locked, status: 500 }
Feb 03 01:30:49 basil syncstorage[72120]: Feb 03 01:30:49.499 ERRO Internal Server Error: ApiError { inner:
Feb 03 01:30:49 basil syncstorage[72120]: A database error occurred: database is locked, status: 500 }
Feb 03 01:30:50 basil syncstorage[72120]: Feb 03 01:30:50.046 ERRO Internal Server Error: ApiError { inner:
Feb 03 01:30:50 basil syncstorage[72120]: A database error occurred: database is locked, status: 500 }
Feb 03 01:30:50 basil syncstorage[72120]: Feb 03 01:30:50.064 ERRO Internal Server Error: ApiError { inner:
Feb 03 01:30:50 basil syncstorage[72120]: A database error occurred: database is locked, status: 500 }
Feb 03 01:40:05 basil syncstorage[72120]: Feb 03 01:40:05.525 ERRO Internal Server Error: ApiError { inner:
Feb 03 01:40:05 basil syncstorage[72120]: A database error occurred: database is locked, status: 500 }

I don't really know what happened there and as a user I can't see any issues, but I think this does not happen in the other backends. I know that sqlite does lock the whole database when doing certain things but I have no clue where those issues occur and how to solve them. The locations I already commented on above seem to be related.

I assume that his works anyway since firefox or the sync server just tries to fetch those objects again later, so in a homebox context this is not a major issue and sqlite is not geared for large usecases anyway.

@jrconlin
Copy link
Member

jrconlin commented Feb 6, 2021

Thanks!

The locking issue is something we've seen in the past with SQLite. SQLite has a VERY simple locking mechanism which works out to be basically "writes exclusively lock the file". This can be a problem if you've got multiple threads with writes, as can happen with sync. I didn't work on the old SQLite code, so I can't say with authority, but you might be able to address this by running syncstorage-rs in single thread mode (e.g. set the environment variable RUST_THREADS=1). That will make sync a fair bit slower overall, but since it's just you syncing to the file, you may not notice anything. As you noticed, sync retries when it gets errors.

There may be other tricks you could use as well, like splitting the tables into multiple files (eek!), or there could be some other bits of clever in how you set the locking on the tables to limit the locks as much as possible. I've only done stuff with SQLite for toy projects, so I don't really have a whole lot of in-depth experience with it. (That's also why I switched my installs of HomeAssistant and PiHole to use a SQL database on a different machine, well, that and I didn't want to deal with burning through SD cards.)

Also, for whatever reason, github is SUPER PICKY about signatures on commits. Things have been a bit hectic here, so we've not yet had a chance to land your patch, but we will. I may duplicate it with full credit to you in the comment and link to this patch.

@mqus
Copy link
Author

mqus commented Feb 11, 2021

Re locking issues:
I think limiting syncstorage to a single thread seems to be the most reasonable solution, although there also seem to be more solutions, see https://stackoverflow.com/questions/57123453/how-to-use-diesel-with-sqlite-connections-and-avoid-database-is-locked-type-of . But personally, I'm also not an expert on this and I think this is better suited for exploring in a follow-up PR.

Also, for whatever reason, github is SUPER PICKY about signatures on commits. Things have been a bit hectic here, so we've not yet had a chance to land your patch, but we will. I may duplicate it with full credit to you in the comment and link to this patch.

sure, no pressure :)

@khimaros
Copy link

khimaros commented Jun 6, 2021

@mqus -- is this still something you're hoping to merge? i'm very interested in using it.

@mqus
Copy link
Author

mqus commented Jun 6, 2021

@mqus -- is this still something you're hoping to merge? i'm very interested in using it.

Well, it's a PR, what do you think? :D But I do have stopped working on it atm and will start again as soon as I get reviews :)

mqus added 3 commits July 3, 2021 20:48
- Remove diesel_ext, which only contained barely used extensions for mysql
- Merge migrations for sqlite
- Alter queries that differ in sqlite (mainly UPSERT statements)
@mqus mqus force-pushed the sqlite-support branch from a365a73 to 9ec4066 Compare July 3, 2021 18:50
@mqus
Copy link
Author

mqus commented Jul 3, 2021

rebased on the current master (CI fails because I haven't paid for circleCI)

@Eragonfr
Copy link
Contributor

Eragonfr commented Jan 28, 2024

As this PR didn't got updated since 2021 I tried to update it to the current codebase. I still have a hard time getting the token server running with SQLite, but I seem to have a working SQLite backend for syncstorage.
Here -> Eragonfr/syncstorage-rs:syncstorage-sqlite
My commits are a mess, and the code probably needs to be rewritten to have more shared code between the MySQL backend and SQLite backend, as they have a lot of common functions.
I don't know if I should open an other PR, or if there is a way to reuse this PR, because most of the SQLite specific code has been taken from @mqus work. I only made it work with the current codebase.

@Eragonfr
Copy link
Contributor

I'll open a new PR with my version of the code.

@mqus mqus closed this May 2, 2024
@mqus
Copy link
Author

mqus commented May 2, 2024

Closing since there is an alternative pr now. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants