Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the DB system to be actually... good. #194

Closed
9 tasks done
embolalia opened this issue Feb 28, 2013 · 21 comments
Closed
9 tasks done

Rewrite the DB system to be actually... good. #194

embolalia opened this issue Feb 28, 2013 · 21 comments

Comments

@embolalia
Copy link
Contributor

  • Write new database system
  • Write tests for new database system
  • Document new database functions
  • Add new API functions to 4.x db
  • Add deprecation warnings on old DB functions and MySQL/Postgres db creation.
  • Document how to migrate modules from old db to new
  • Automate migration of old db data to new db
  • Create SQLAlchemy schema representation
  • Document how to use SQLAlchemy with new db

Between using Django's DB wrapper system for another project and trying to make rss work with Willie's DB, I've realized Willie's DB wrapper is kinda shit. It's great for preferences stuff, but it does not scale well. So at some point, it should be rewritten to include such illustrious features as the ability to return more than one row and perform more complicated SELECT statements.

@embolalia
Copy link
Contributor Author

As a side-note: Some form of functionality should be added to handle Nick objects in a meaningful way.

@embolalia
Copy link
Contributor Author

I've been discussing how to track when users have multiple nicks, for the purposes of retrieving settings etc., in #tech with Kylie. It looks like the best way is to have a Nicks table, with an id, slug, and canonical Nick. When a user wants to add an alias, she can .addalias newnick while her current nick is one in the table. A new row, with the id of the current nick and slug and canonical value of the new nick will be created. Config options should allow .addalias to be disabled, or made admin-only (admin use of .addalias would be done with two arguments, the current and new alias).

When an unregistered user sets a setting, or a new Nick entry is otherwise needed, a registration entry should be created automatically.

A .unregister command should be made available to delete the current alias. Possibly, an admin-only .merge command could be used to associate two groups of nicks to each other.

@embolalia
Copy link
Contributor Author

Using the DB system for reminders/tells has been suggested, and I think it's a good idea. This use case is worth considering when designing the new API. Tells currently store from, to, verb (ask or tell), time sent, and message, while reminders store timestamp at which to remind, channel to remind to, whom to remind, and the message. I think that would probably best be served by two tables.

@ghost
Copy link

ghost commented Apr 16, 2013

channels and users in their own tables! this allows for better normalization

@triplem
Copy link

triplem commented Nov 23, 2013

Any chance that this change will include PostgreSQL support?

@tyrope
Copy link

tyrope commented Nov 23, 2013

I personally have no knowledge of PostgreSQL, @triplem, but if you're willing to help I'll make sure to poke you once the db is properly setup for both sqlite and MySQL so we can take a look.

@triplem
Copy link

triplem commented Nov 23, 2013

Yeah, why not. Probably something like SQLAlchemy would be another choice? This provides already a very good abstraction layer and modules could use this as well for additional database storeage.

@embolalia
Copy link
Contributor Author

It's been a long-standing goal not to introduce dependencies into any API-level functionality - including the database.

As far as which backends we'll be supporting, SQLite is the first priority. Really, given the scale and nature of the data we're dealing with here, anything else is overkill. Supporting MySQL (or rather MariaDB, since the plurality of our devs are on Fedora) will come second, after SQLite is stable. Anything else will be "patches welcome".

@embolalia
Copy link
Contributor Author

I'm leaning toward dropping the idea of Willie's DB being a full query-writing engine, since it's never done that terribly well and will never do it as well as SQLAlchemy. Instead, what I'm thinking is having willie.db provide a wrapper only around a key-value store (or rather two) for channel and user settings, with only the most basic assistance for modules in writing queries. (Basically, what you see in RSS now - a way to grab a connection to a shared DB, so that modules don't all need their own DB. Plus a bit better abstraction of substituting into queries, but that's about it.)

I say "key-value store", but I'm not suggesting we use MongoDB, even if it is web-scale.

Basically, we'd have a (My|Postgre)SQL(ite) db with two tables (id, slug, nick) and (id, key, value), for users; and two tables (id, slug, channel) and (id, key, value) for channels. RSS and other modules would make, write SQL for, and do what they need with their own tables.

@elad661
Copy link
Contributor

elad661 commented Dec 15, 2013

Why not a simple json file then? it'll eliminate 3rd party deps.

-Elad Alfassa.

@ari-koivula
Copy link
Contributor

You mean, like a dict? =/

@embolalia
Copy link
Contributor Author

There are a few reasons why I still think an actual DB is a good idea:

  1. Eliminates or reduces the data file hell from jenni. Every module with cross-session storage had its own file for it, in its own format. Some were SQLite, some were one kind of flat file, some were another. And the problem of finding, ensuring permissions on, and parsing these files was solved over and over again.
  2. Encourages cross-module sharing of data. Stuff like time zone settings could easily have been kept to just the clock module, and without a pre-defined way to share that, it likely would have.
  3. Enables third-party access to data. I haven't seen this actually happen yet, but I have always planned to do so. Allowing people to change their settings for channel statistics from within IRC is one example. This is much easier to do with a proper DB than with a flat file. (If you think SQLite is bad with concurrent users, try doing it with a flat file…)

@embolalia embolalia added this to the 5.0.0 milestone Feb 23, 2014
@embolalia
Copy link
Contributor Author

I've started poking at this (see 6f63f7b). I'm following what was said above about a simple key/value store for nicks and channels. Query writing will be dumped entirely, save for the k/v stuff, with just a shell method to do simple executions, but I haven't looked at that too in-depth yet.

@johnlage
Copy link
Contributor

johnlage commented Aug 1, 2014

Would modules still be able to create their own tables and manage them how they wish? Also if not would their be an alternative if it is not Nick based?

@embolalia
Copy link
Contributor Author

You'll still be able to do what you want with the DB, you'll just have to do it in SQL. (Or bring in something like SQLAlchemy. I'll probably write something up on how to do that.)

@johnlage
Copy link
Contributor

johnlage commented Aug 1, 2014

Is there a way that you could maybe have the old DB system in a separate
module for one or 2 releases? (so that we could still use the APIs ​and
update our modules over time)

John Lage

On Fri, Aug 1, 2014 at 7:28 AM, Edward Powell [email protected]
wrote:

You'll still be able to do what you want with the DB, you'll just have to
do it in SQL. (Or bring in something like SQLAlchemy. I'll probably write
something up on how to do that.)


Reply to this email directly or view it on GitHub
#194 (comment).

@tyrope
Copy link

tyrope commented Aug 1, 2014

Why would we do that? It's a major version increase for a reason. API should be expected to be incompatible with 4.x

@johnlage
Copy link
Contributor

johnlage commented Aug 1, 2014

Good Point, nevermind previous comment from me.

John Lage

On Fri, Aug 1, 2014 at 10:11 AM, Dimitri Molenaars <[email protected]

wrote:

Why would we do that? It's a major version increase for a reason. API
should be expected to be incompatible with 4.x


Reply to this email directly or view it on GitHub
#194 (comment).

@elad661
Copy link
Contributor

elad661 commented Sep 7, 2014

I just had to use the db system for the first time today. It sucks ass. We really need to fix that.

@embolalia if SQLAlchemy will make things easier, than we can use it. Introducing deps is better than having a shitty API.

@embolalia
Copy link
Contributor Author

The rewrite is done, but there are still some more tasks left. I've added a tasklist to the OP to keep track of them.

embolalia added a commit that referenced this issue Oct 27, 2014
There's still plenty of potential for problems with this. Stuff like
this is why #194 needs to happen. Related to #645.
@embolalia
Copy link
Contributor Author

I don't want to delay this any more than it already is, so I'm dropping the SQLAlchemy helpers. Once this issue is closed, I'll create another one to track that (likely also perennial) task.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants