Proposed guideline: Process data close to the source. #73

Closed
BezPowell opened this issue Feb 1, 2024 · 5 comments
Labels
enhancement New guideline, success criteria, or content
Comments

@BezPowell

This is in some ways related to a few of the other guidelines, but might be worth mentioning as its own topic.

Proposed new guideline:

Process data close to the source.

All information passed between the layers of an application incurs a cost, both in data transferred and in CPU cycles spent on (de)serialisation. Wherever possible, data transformations should be performed close to the source to reduce these costs and avoid processing data that will later be discarded.

Success Criterion - Perform filtering in the database.

When retrieving information from a database, perform all filtering and field selection in SQL. Only retrieve the values that you need and avoid reliance on framework helpers that might defer filtering to later on in the process.
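
As a rough illustration of this criterion, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table, its columns, and the function names are hypothetical and exist only for this example; the same idea should carry over to other databases and query builders.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory database with a hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, name TEXT, email TEXT, bio TEXT, active INTEGER)"
    )
    rows = [
        # Every tenth user is active; `bio` stands in for a large unused column.
        (i, f"user{i}", f"user{i}@example.com", "x" * 1000, int(i % 10 == 0))
        for i in range(1, 1001)
    ]
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def active_emails_filtered_late(conn: sqlite3.Connection) -> list[str]:
    # Anti-pattern: fetch every row and every column, then filter in the application.
    all_rows = conn.execute("SELECT * FROM users ORDER BY id").fetchall()
    return [row[2] for row in all_rows if row[4]]


def active_emails_filtered_in_sql(conn: sqlite3.Connection) -> list[str]:
    # Preferred: the database returns only the rows and the single column needed.
    rows = conn.execute(
        "SELECT email FROM users WHERE active = 1 ORDER BY id"
    ).fetchall()
    return [email for (email,) in rows]
```

Both functions return the same list, but the second one never moves the 900 inactive rows or the large `bio` column out of the database.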

Benefits

Environmental:

Filtering out unneeded data at a deeper level of the application may reduce energy usage, as less processing is required for (de)serialisation.

Performance:

Relational databases and other specialist data stores are generally heavily optimised for data filtering and retrieval. Performing transformations at this level of the application may lead to reduced CPU time and faster responses.
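
To make the performance point concrete, the sketch below compares aggregating in application code with aggregating in the database. It is only an illustration: the `orders` table and the function names are assumptions made up for this example, and it makes no claim about actual timings.

```python
import sqlite3


def make_orders_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, total_pence INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, total_pence) VALUES (?, ?)",
        [("alice", 1200), ("bob", 500), ("alice", 300), ("bob", 250), ("carol", 990)],
    )
    return conn


def totals_in_application(conn: sqlite3.Connection) -> dict[str, int]:
    # Every order row is transferred and deserialised before being summed.
    totals: dict[str, int] = {}
    for customer, total_pence in conn.execute(
        "SELECT customer, total_pence FROM orders"
    ):
        totals[customer] = totals.get(customer, 0) + total_pence
    return totals


def totals_in_database(conn: sqlite3.Connection) -> dict[str, int]:
    # The database does the aggregation and returns one row per customer.
    query = "SELECT customer, SUM(total_pence) FROM orders GROUP BY customer"
    return dict(conn.execute(query).fetchall())
```

With five orders the difference is negligible, but the number of rows crossing the database boundary grows with the number of orders in the first version and only with the number of customers in the second.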

@AlexDawsonUK
Member

I agree entirely with this one, though do you have any third-party links to support the guideline which we could include as reference material (we try to have weighted evidence whenever possible)?

@AlexDawsonUK AlexDawsonUK self-assigned this Feb 1, 2024
@AlexDawsonUK AlexDawsonUK added the enhancement New guideline, success criteria, or content label Feb 1, 2024
@AlexDawsonUK AlexDawsonUK modified the milestones: v1.0-D2, v1.0-D5 Feb 1, 2024
@BezPowell
Author

Hi Alex. I don't remember any references off the top of my head, but will try to dig some out this week. This has mainly come up for me recently with Laravel, and I do seem to remember reading a few articles that had benchmarks in them.

@BezPowell
Author

Very basic, but does actually have some benchmarks: https://onextrapixel.com/mysql-has-functions-part-5-php-vs-mysql-performance/
Hopefully, I'll be able to find some other sources with actual figures, but this seems to be one of those 'common knowledge' things where everyone knows aggregating data in SQL is more efficient, but doesn't actually have any numbers to back it up.

@AlexDawsonUK
Member

Thinking about this more, "only accessing the database once" is mentioned by 3.24, as are the perils associated with "repeated requests" for information (also note we don't want to be SQL-specific, as there are other database types on the Web).

I think the best course of action may be to do the following:

  • The SC and Benefits should be merged into 3.24, as they are largely covered in the Database Guideline.
  • The content on "Process data close to the source" has a wider appeal (beyond databases) and possibly falls into Edge-Computing, which might work well in 4.10 as a new SC dedicated to tightening the relational stack.

I'll see what I can cook up (noting this useful link on Query perf) for the next draft.
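
For readers unfamiliar with the "repeated requests" problem mentioned above, here is a small sketch of the pattern and one way to avoid it. It uses Python's `sqlite3` module purely for illustration; the `authors` and `posts` tables and the function names are hypothetical, and the same idea applies to non-SQL data stores.

```python
import sqlite3


def make_blog_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical `authors` and `posts` tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO posts VALUES (1, 1, 'First'), (2, 2, 'Second'), (3, 1, 'Third');
        """
    )
    return conn


def titles_with_authors_repeated(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    # One query for the posts, then one extra query per post for its author.
    posts = conn.execute(
        "SELECT author_id, title FROM posts ORDER BY id"
    ).fetchall()
    result = []
    for author_id, title in posts:
        (name,) = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()
        result.append((title, name))
    return result


def titles_with_authors_single(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    # A single round trip: the join is resolved inside the database.
    query = (
        "SELECT posts.title, authors.name "
        "FROM posts JOIN authors ON authors.id = posts.author_id "
        "ORDER BY posts.id"
    )
    return conn.execute(query).fetchall()
```

The first function issues one query plus one per post, while the second issues exactly one regardless of how many posts exist.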

@AlexDawsonUK
Member

The new SC and content update have been integrated into the living draft.
They will be published along with other changes in the next version of the specification.

AlexDawsonUK added a commit that referenced this issue Feb 11, 2024

2 participants