Proposed guideline: Process data close to the source. #73

BezPowell · 2024-02-01T10:48:10Z

This is in some ways related to a few of the other guidelines, but might be worth mentioning as its own topic.

Proposed new guideline:

Process data close to the source.

All information passed between the layers of an application incurs a cost, both in the data transferred and in the CPU cycles spent on (de)serialisation. Wherever possible, data transformations should be performed close to the source to reduce these costs and to avoid processing data that will later be discarded.

Success Criterion - Perform filtering on the database.

When retrieving information from a database, perform all filtering and field selection in SQL. Only retrieve the values that you need, and avoid relying on framework helpers that might defer filtering until later in the process.
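A minimal sketch of this criterion, assuming Python's built-in sqlite3 module and a hypothetical orders table (the table, columns and function names are illustrative only, not from any real application). Both functions return the same result, but only the second lets the database do the filtering and field selection:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical `orders` table: 100 rows, a quarter of them "shipped",
    # each carrying a 200-character `notes` column the report never needs.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL, notes TEXT)"
    )
    rows = [
        (i, "shipped" if i % 4 == 0 else "pending", float(i), "x" * 200)
        for i in range(1, 101)
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    return conn


def shipped_totals_in_app(conn: sqlite3.Connection) -> list[tuple[int, float]]:
    # Filtering far from the source: every row and column is fetched and
    # deserialised, then most of it is thrown away in Python.
    all_rows = conn.execute("SELECT * FROM orders ORDER BY id").fetchall()
    return [(r[0], r[2]) for r in all_rows if r[1] == "shipped"]


def shipped_totals_in_db(conn: sqlite3.Connection) -> list[tuple[int, float]]:
    # Filtering at the source: WHERE drops unwanted rows and the column list
    # drops unwanted fields before anything leaves the database.
    return conn.execute(
        "SELECT id, total FROM orders WHERE status = ? ORDER BY id", ("shipped",)
    ).fetchall()


conn = make_db()
print(shipped_totals_in_db(conn)[:2])  # [(4, 4.0), (8, 8.0)]
```

In the first version all 100 rows, including the unused notes text, are read out and deserialised before 75 of them are discarded; in the second, only the 25 matching rows and two columns leave the database.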

Benefits

Environmental:

Filtering out unneeded data at a deeper level of the application may reduce energy usage, as less processing is required for (de)serialisation.

Performance:

Relational databases and other specialist data stores are generally heavily optimised for data filtering and retrieval. Performing transformations at this level of the application may lead to reduced CPU time and faster responses.

AlexDawsonUK · 2024-02-01T11:00:51Z

I agree entirely with this one, though do you have any third-party links to support the guideline which we could include as reference material (we try to have weighted evidence whenever possible)?

BezPowell · 2024-02-01T11:12:52Z

Hi Alex. I don't remember any references off the top of my head, but I'll try to dig some out this week. This has mainly come up for me recently with Laravel, and I do remember reading a few articles that included benchmarks.

BezPowell · 2024-02-01T21:05:53Z

Very basic, but does actually have some benchmarks: https://onextrapixel.com/mysql-has-functions-part-5-php-vs-mysql-performance/
Hopefully I'll be able to find some other sources with actual figures, but this seems to be one of those 'common knowledge' things: everyone knows that aggregating data in SQL is more efficient, but nobody actually has any numbers to back it up.
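To illustrate the kind of comparison the linked article makes (it uses PHP and MySQL; this sketch swaps in Python and SQLite, and the orders data is made up), here is the same per-status total computed in application code and with GROUP BY in the database. It is not a benchmark and makes no claim about the size of any saving; it only shows where the aggregation happens:

```python
import sqlite3
from collections import defaultdict


def make_db() -> sqlite3.Connection:
    # Hypothetical `orders` data, invented for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders (status, total) VALUES (?, ?)",
        [("shipped", 10.0), ("pending", 5.0), ("shipped", 2.5), ("cancelled", 1.0)],
    )
    return conn


def totals_in_app(conn: sqlite3.Connection) -> dict[str, float]:
    # Aggregating in the application: every row crosses the boundary first,
    # then the sums are built up in Python.
    sums: defaultdict[str, float] = defaultdict(float)
    for status, total in conn.execute("SELECT status, total FROM orders"):
        sums[status] += total
    return dict(sums)


def totals_in_db(conn: sqlite3.Connection) -> dict[str, float]:
    # Aggregating in the database: only one row per status is returned.
    return dict(conn.execute("SELECT status, SUM(total) FROM orders GROUP BY status"))
```

With a handful of rows the difference is negligible; the argument is that the number of rows returned by the GROUP BY version grows with the number of distinct statuses rather than with the size of the table.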

AlexDawsonUK · 2024-02-11T02:33:01Z

Thinking about this more, "only accessing the database once" is mentioned in 3.24, as are the perils associated with "repeated requests" for information (also note that we don't want to be SQL-specific, as there are other database types on the Web).
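A minimal sketch of "repeated requests" versus a single query, assuming Python's sqlite3 module and hypothetical customers and orders tables. The first function issues one query for the orders plus one more per order (often called the N+1 pattern); the second resolves the relationship with a single JOIN. An in-memory SQLite database has no network round trip, so this shows the shape of the queries rather than their cost:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema and data, invented for illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, 7.5), (3, 1, 2.5);
        """
    )
    return conn


def names_with_repeated_requests(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    # 1 query for the orders, then 1 extra query per order: N + 1 in total.
    result = []
    orders = conn.execute("SELECT id, customer_id FROM orders ORDER BY id").fetchall()
    for order_id, customer_id in orders:
        (name,) = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        result.append((order_id, name))
    return result


def names_with_single_query(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    # A single JOIN lets the database resolve the relationship in one request.
    return conn.execute(
        "SELECT o.id, c.name FROM orders o "
        "JOIN customers c ON c.id = o.customer_id ORDER BY o.id"
    ).fetchall()
```

Both return the same list of (order id, customer name) pairs; the difference is that the first makes four separate requests for three orders, while the second makes one.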

I think the best course of action may be to do the following:

The SC and Benefits should be merged into 3.24, as they are largely covered by the Database Guideline.
The content on "Process data close to the source" has a wider appeal (beyond databases) and possibly falls into Edge-Computing which might work well in 4.10 as a new SC dedicated to tightening the relational stack.

I'll see what I can cook up (noting this useful link on Query perf) for the next draft.

AlexDawsonUK · 2024-02-11T12:56:27Z

The new SC and content update have been integrated into the living draft.
They will be published along with other changes in the next version of the specification.

AlexDawsonUK self-assigned this Feb 1, 2024

AlexDawsonUK added the enhancement New guideline, success criteria, or content label Feb 1, 2024

AlexDawsonUK modified the milestones: v1.0-D2, v1.0-D5 Feb 1, 2024

AlexDawsonUK closed this as completed Feb 11, 2024

AlexDawsonUK added a commit that referenced this issue Feb 11, 2024

New SC & Content Update #73

7655b53
