Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large database question #32

Open
DamienHarper opened this issue Jan 5, 2018 · 17 comments
Open

Large database question #32

DamienHarper opened this issue Jan 5, 2018 · 17 comments

Comments

@DamienHarper
Copy link
Contributor

Hi,
We're currently building a new large project with a relatively large database (around 200 tables).
We need to keep track of changes done on mostly all tables and be able to show diffs through the UI.
This bundle seems to fit our needs but, if I'm not mistaken, it stores changes in only two tables. So, with a lot of users, I fear that we'll quickly have two gigantic tables with billions of records which will make them slow to lookup or to query (to show diff in the UI) and as a consequence will be harder to backup (and restore).
So, is it possible to split audit logs into more tables (one audit table per audited table for example as simplethings/entity-audit-bundle does) ?
If no, do you have some feedback about similar use cases?

@l3pp4rd
Copy link
Member

l3pp4rd commented Jan 6, 2018

Hi, I'm thinking, that splitting it into the tables, would make unnecessary complexity. Basically this bundle consists only with one core class, the audit event subscriber and it is quite simple to fork and adapt to custom needs.

In general, regarding table sizes, I agree that this will be an issue and in my opinion it should be archived or garbage collected. I would be happy to accept a contribution, or create one later when I have time: which implements a symfony command in this bundle, which would produce only the latest snapshot of each logged entity and removes all old entries. In order to produce the latest snapshot, we could take all entities logged load them from db and create their snapshots, then remove all older entries or archive them somewhere as a command option.

Building separate tables for each entity, would result in the same issue as without split, since in most cases there will be few highly used tables

@l3pp4rd
Copy link
Member

l3pp4rd commented Jan 6, 2018

though, my described option, might not work well if there are more inserts than updates. but for example, records, having lifecycle and being removed later could also be archived. Anyway, that is a complex task and hardly there is one best option, most likely it is not feasible to implement something like that in the library at all.

@DamienHarper
Copy link
Contributor Author

Thanks for your feedback.

@l3pp4rd
Copy link
Member

l3pp4rd commented Jan 8, 2018

but indeed, the lookup in large table will be terrible, since association foreign key is stored in string format, will cause a cast to int and it will be too slow even with more than 100K rows. For such cases, I just can suggest to copy the source and adapt it, changing FK to int already would speed it up, but that will not be enough.

@DamienHarper
Copy link
Contributor Author

Well, I finally chose to develop my own bundle to fit my needs/context: https://github.com/DamienHarper/DoctrineAuditBundle
It relies on one audit table per audited one and provides a command to clean old entries.
Thanks for your bundle, it was an inspiration for me.

@l3pp4rd
Copy link
Member

l3pp4rd commented Feb 10, 2018

glad to hear that, cheers

@ghost
Copy link

ghost commented Apr 19, 2019

I realize this is an older ticket, but I wanted to mention that most RDBMS' have a feature called table partitioning. You can partition on the type to achieve one (internal) "table" per type and partition on things such as dates to avoid issues with very large tables.

This lets you have one actual table in the database, which actually consists of many tables transparently.

Doctrine doesn't support this natively, however, but it would likely work with a partitioned table that you create yourself out of the box as it happens transparently by the database and it's "just another table" to Doctrine.

You also need to ensure that when you fetch records, you add additional WHERE conditions based on the partition condition so the database can exclude certain partitions entirely (thus, improving performance).

@DamienHarper
Copy link
Contributor Author

@tominventisbe sure partitioning can help here! but as you said, there are a few drawbacks:

  • partitioning is not available with all RDBMS (most of them support it though)
  • partitioning generally requires to have a finite set of partitions (or a "predictable" set of partitions) so you'll have to end up partitioning on the operation type (insert/update/association/...) here. Then if your system has many audited tables with many write operations (insert/update), partitions will get crowded and performance hit will happen as well as backup hard times.

Having separate audit tables does not completely solve these problems but pushes the limits further:

  • audit entries are distributed into several tables (similarly to distributed into partitions)
  • audit tables won't all grow at the same rate so probably only a few of them will get big
  • audit cleaning is easier and less risky because you can target which audit tables have to be cleaned
  • locking is less a problem (audits concurrent reads and writes)
  • you can even add partitioning on audit tables

@ghost
Copy link

ghost commented Apr 19, 2019

Indeed. I just thought I'd mention it for anyone looking for an intermediate solution, but didn't intend to disprove the need for the original request.

In case anyone's interested and as mentioned by Damien, several RDBMS' support partitions, such as MySQL and PostgreSQL.

PostgreSQL 11 added some additional advanced features such as default partitions for records that don't fit in any of the existing ones. (At least) with PostgreSQL, you can:

  • Partition on the audit_logs.tbl field, which is close to a per-entity partition if every entity has its own table
  • Make subpartitions (i.e. subpartitions of other partitions) on the table and date ranges to have partitions per table per month or something
  • Set a default partition to avoid problems if you forgot to create a new one

You will still need to do some plumbing in your code to create new partitions, either manually or automatically and you will likely still suffer from locks in this case.

Hope this helps someone!

@DamienHarper
Copy link
Contributor Author

@tominventisbe I understand that you didn't intend to disprove the need for the original request, I hope I didn't offend you (wasn't my goal for sure). Have a nice Easter weekend.

@Gabr-bb
Copy link

Gabr-bb commented Dec 16, 2021

regarding this , is there a way to force the logger to log into a different database just before the data is inserted?

@DamienHarper
Copy link
Contributor Author

@Gabr-bb I don't know if this is possible with this bundle as of now (wasn't possible when I used it 2.5 years ago).

As I mentioned in a previous reply, this bundle has been an inspiration for me and I built my own audit library and bundle that supports multiple storage, it might fit your needs in case this bundle doesn't.

@Gabr-bb
Copy link

Gabr-bb commented Dec 16, 2021

that looks promising, can i use your library instead of this one and it would work the same without changing database structure or lose the old data?

@DamienHarper
Copy link
Contributor Author

No it won't use the same table to store audit data. Have a look to the documentation.

@chancegarcia
Copy link
Contributor

@Gabr-bb I don't know if this is possible with this bundle as of now (wasn't possible when I used it 2.5 years ago).

As I mentioned in a previous reply, this bundle has been an inspiration for me and I built my own audit library and bundle that supports multiple storage, it might fit your needs in case this bundle doesn't.

thanks for making that bundle. we switched form using this one to yours last year and it has been awesome.

@DamienHarper
Copy link
Contributor Author

DamienHarper commented Dec 16, 2021

@Gabr-bb I don't know if this is possible with this bundle as of now (wasn't possible when I used it 2.5 years ago).

Wow, I realize this happened
Almost 5 years ago, time passes too quickly 😅

@indjeto
Copy link
Collaborator

indjeto commented Feb 4, 2024

I added a command in v1.0, that deletes the old records.
bin/console audit-logs:delete-old-logs --retention-period=P6M

It does it on batches of 10k and sleep interval to not block the db server.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants