Add functions to efficiently serialize records #24

idem-s1n · 2023-07-17T08:22:20Z

When serializing records, looping through all the columns is time-consuming on big databases.

Therefore, I have added a function that extracts only the fields defined in a record and then maps them to the right columns.
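
As an illustration of the idea described above, here is a minimal sketch in plain Python. It does not use the dissect.esedb API: the `TABLE_COLUMNS` schema, the dictionary-based record, and both function names are hypothetical stand-ins, chosen only to contrast "visit every column" with "visit only the fields the record defines".

```python
from typing import Any, Dict

# Hypothetical schema: column ID -> column name (not taken from a real database).
TABLE_COLUMNS: Dict[int, str] = {
    1: "Id",
    2: "Bit",
    3: "UnsignedByte",
    128: "Name",
    256: "Comment",
}


def serialize_all_columns(stored: Dict[int, Any]) -> Dict[str, Any]:
    """Baseline: look up every column of the table, whether it is stored or not."""
    return {name: stored.get(col_id) for col_id, name in TABLE_COLUMNS.items()}


def serialize_defined_only(stored: Dict[int, Any]) -> Dict[str, Any]:
    """Visit only the column IDs stored in the record, then map them to names."""
    return {TABLE_COLUMNS[col_id]: value for col_id, value in stored.items()}


# A record that only defines two of the five columns.
record = {1: 7, 3: 213}
full = serialize_all_columns(record)
sparse = serialize_defined_only(record)
```

In this toy model the second function does work proportional to the number of stored fields rather than the number of table columns, which is where the saving on wide tables would come from.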

idem-s1n · 2023-08-17T12:15:45Z

Thank you for requesting a review for this PR.
Is there a reason why it is still pending?

Schamper · 2023-08-17T20:43:00Z

Apologies @idem-s1n. I meant to read a bit into ESEDB again before starting the review so I could provide better suggestions, however I hadn't gotten around to that yet (conferences, time off, other work). Specifically, I wanted to propose a method that reduces the amount of code duplication with the other (regular) way of retrieving a column. I could've just commented that, so please forgive me for only doing that now 😄.

Would it work and perform well enough if you just did the column lookup and then simply called .get(column) with that column? That should only add a few lines for each of fixed/variable/tagged, instead of duplicating most of the parsing code.

reduced code duplication

idem-s1n · 2023-08-24T09:19:29Z

Thanks @Schamper for the feedback.

I did some tests with a call to .get(column), and there is a slight loss of performance. However, calling ._parse_value(column) gives better results.

Therefore, I have updated the serialize() function in order to reduce duplication of the parsing code. Changes were pushed on our main branch.
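
The trade-off discussed above can be sketched with a toy class. Only the names `get`, `_parse_value`, `serialize` and `_iter_column_id` come from this thread; the class, its constructor, and the integer decoding are assumptions made for illustration and are not the dissect.esedb implementation.

```python
from typing import Any, Dict, Iterator


class ToyRecord:
    """Hypothetical stand-in for a record; not the dissect.esedb implementation."""

    def __init__(self, raw: Dict[int, bytes], names: Dict[int, str]) -> None:
        self._raw = raw      # column ID -> raw bytes stored in the record
        self._names = names  # column ID -> column name from the table schema

    def _parse_value(self, column_id: int) -> Any:
        # Single place where raw bytes are decoded (here: little-endian signed int).
        return int.from_bytes(self._raw[column_id], "little", signed=True)

    def get(self, column_id: int) -> Any:
        # Public accessor: performs extra checks before delegating to the parser.
        if column_id not in self._names:
            raise KeyError(column_id)
        if column_id not in self._raw:
            return None
        return self._parse_value(column_id)

    def _iter_column_id(self) -> Iterator[int]:
        # Only the columns that are actually stored in this record.
        yield from sorted(self._raw)

    def serialize(self) -> Dict[str, Any]:
        # The IDs are already known to be present, so the parser is called directly.
        return {self._names[cid]: self._parse_value(cid) for cid in self._iter_column_id()}


record = ToyRecord(
    raw={
        1: (1).to_bytes(4, "little", signed=True),
        4: (-1337).to_bytes(2, "little", signed=True),
    },
    names={1: "Id", 4: "Short", 5: "Long"},
)
result = record.serialize()
```

Both paths share one decoding routine, so the parsing code is not duplicated; `serialize()` merely skips the checks that `get()` repeats for every call.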

dissect/esedb/record.py

PR24 Co-authored-by: Erik Schamper <[email protected]>

codecov · 2023-09-01T21:59:57Z

Codecov Report

Merging #24 (974543b) into main (f477082) will increase coverage by 0.02%.
The diff coverage is 82.35%.

@@            Coverage Diff             @@
##             main      #24      +/-   ##
==========================================
+ Coverage   79.12%   79.15%   +0.02%     
==========================================
  Files          15       15              
  Lines        1255     1271      +16     
==========================================
+ Hits          993     1006      +13     
- Misses        262      265       +3

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `79.15% <82.35%> (+0.02%)` | ⬆️ |

| Files Changed | Coverage Δ | |
|---|---|---|
| dissect/esedb/record.py | `84.26% <82.35%> (-0.20%)` | ⬇️ |

Schamper · 2023-09-04T10:29:22Z

I don't have permission to push to your branch, so could you please apply the following changes? They add a unit test and fix a range error for the column IDs. They also correct the linting issue.

diff --git a/dissect/esedb/record.py b/dissect/esedb/record.py
index 4d2d95d..0cbe4c2 100644
--- a/dissect/esedb/record.py
+++ b/dissect/esedb/record.py
@@ -4,7 +4,7 @@ import functools
 import struct
 from binascii import hexlify
 from functools import lru_cache
-from typing import TYPE_CHECKING, Any, Optional, Iterator
+from typing import TYPE_CHECKING, Any, Iterator, Optional

 from dissect.util.xmemoryview import xmemoryview

@@ -128,7 +128,9 @@ class RecordData:
             if num_variable > 0 and len(self.data) >= 4 + (num_variable * 2):
                 # Parse the variable offsets already, if we have them
                 # There can only be 128 at most, so this shouldn't be an expensive operation
-                self._variable_offsets = struct.unpack("<%dH" % num_variable, self.data[self._variable_offset_start : self._variable_data_start])
+                self._variable_offsets = struct.unpack(
+                    "<%dH" % num_variable, self.data[self._variable_offset_start : self._variable_data_start]
+                )

             self._tagged_data_start = self._variable_data_start
             if self._variable_offsets:
@@ -190,10 +192,10 @@ class RecordData:

         def _iter_column_id() -> Iterator[Column]:
             # Fixed
-            yield from range(1, self._last_fixed_id)
-
+            yield from range(1, self._last_fixed_id + 1)
+
             # Variable
-            yield from range(128, self._last_variable_id)
+            yield from range(128, self._last_variable_id + 1)

             # Tagged
             for idx in range(self._tagged_data_count):
diff --git a/tests/test_record.py b/tests/test_record.py
new file mode 100644
index 0000000..afea747
--- /dev/null
+++ b/tests/test_record.py
@@ -0,0 +1,40 @@
+from typing import BinaryIO
+
+from dissect.esedb.esedb import EseDB
+
+
+def test_as_dict(basic_db: BinaryIO):
+    db = EseDB(basic_db)
+    table = db.table("basic")
+
+    records = list(table.records())
+    assert len(records) == 2
+
+    assert [r.as_dict() for r in records] == [
+        {
+            "Id": 1,
+            "Bit": False,
+            "UnsignedByte": 213,
+            "Short": -1337,
+            "Long": -13371337,
+            "Currency": 1337133713371337,
+            "IEEESingle": 1.0,
+            "IEEEDouble": 13371337.13371337,
+            "DateTime": 4675210852477960192,
+            "UnsignedLong": 13371337,
+            "LongLong": -13371337,
+            "GUID": "3f360af1-6766-46dc-9af2-0dacf295c2a1",
+            "UnsignedShort": 1337,
+        },
+        {
+            "Id": 2,
+            "Bit": True,
+            "UnsignedByte": 255,
+            "Short": 1339,
+            "Long": 13391339,
+            "Currency": -1339133913391339,
+            "IEEESingle": -2.0,
+            "IEEEDouble": -13391339.13391339,
+            "DateTime": -4537072128574357504,
+        },
+    ]

I'll also kick the CLA bot.
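
The range fix in the diff above is an off-by-one correction: `range(a, b)` excludes `b`, so the last fixed and last variable column IDs were skipped. A minimal standalone sketch follows; the two function names are hypothetical, it assumes the last IDs are meant to be inclusive (as the `+ 1` in the diff implies), and it leaves out the tagged columns.

```python
from typing import Iterator, List


def iter_ids_exclusive(last_fixed_id: int, last_variable_id: int) -> Iterator[int]:
    """Before the fix: range() stops one short, so the last IDs are never yielded."""
    yield from range(1, last_fixed_id)
    yield from range(128, last_variable_id)


def iter_ids_inclusive(last_fixed_id: int, last_variable_id: int) -> Iterator[int]:
    """After the fix: adding 1 to the stop value makes the last IDs inclusive."""
    yield from range(1, last_fixed_id + 1)
    yield from range(128, last_variable_id + 1)


# Example: fixed columns 1..3 and variable columns 128..129 are present.
before: List[int] = list(iter_ids_exclusive(3, 129))
after: List[int] = list(iter_ids_inclusive(3, 129))
```

With these inputs, the exclusive version misses column 3 and column 129, which is the kind of gap the added unit test is meant to catch.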

DissectBot · 2023-09-04T10:30:01Z

@idem-s1n thank you for your contribution! As this is your first code contribution, please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information:

@DissectBot agree [company="{your company}"]

Options:

(default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.

(when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.

Contributor License Agreement

Contribution License Agreement

This Contribution License Agreement ("Agreement") governs your Contribution(s) (as defined below) and conveys certain license rights to Fox-IT B.V. ("Fox-IT") for your Contribution(s) to Fox-IT's open source Dissect project. This Agreement covers any and all Contributions that you ("You" or "Your"), now or in the future, Submit (as defined below) to this project. This Agreement is between Fox-IT B.V. and You and takes effect when you click an “I Accept” button, check box presented with these terms, otherwise accept these terms or, if earlier, when You Submit a Contribution.

1. Definitions.
"Contribution" means any original work of authorship, including any modifications or additions to an existing work, that is intentionally submitted by You to Fox-IT for inclusion in, or documentation of, any of the software products owned or managed by, or on behalf of, Fox-IT as part of the Project (the "Work").
"Project" means any of the projects owned or managed by Fox-IT and offered under a license approved by the Open Source Initiative (www.opensource.org).
"Submit" means any form of electronic, verbal, or written communication sent to Fox-IT or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, Fox-IT for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by You as "Not a Contribution."
2. Grant of Copyright License. Subject to the terms and conditions of this Agreement, You hereby grant to Fox-IT and to recipients of software distributed by Fox-IT a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Your Contributions and such derivative works.
3. Grant of Patent License. Subject to the terms and conditions of this Agreement, You hereby grant to Fox-IT and to recipients of software distributed by Fox-IT a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, maintain, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by You that are necessarily infringed by Your Contribution(s) alone or by combination of Your Contribution(s) with the Work to which such Contribution(s) was submitted. If any entity institutes patent litigation against You or any other entity (including a cross-claim or counterclaim in a lawsuit) alleging that your Contribution, or the Work to which you have contributed, constitutes direct or contributory patent infringement, then any patent licenses granted to that entity under this Agreement for that Contribution or Work shall terminate as of the date such litigation is filed.
4. Representations. You represent that:
- You are legally entitled to grant the above license.
- each of Your Contributions is Your original creation (see section 8 for submissions on behalf of others).
- Your Contribution submissions include complete details of any third-party license or other restriction (including, but not limited to, related patents and trademarks) of which you are personally aware and which are associated with any part of Your Contributions.
5. Employer. If Your Contribution is made in the course of Your work for an employer or Your employer has intellectual property rights in Your Submission by contract or applicable law, You must secure permission from Your employer to make the Contribution before signing this Agreement. In that case, the term "You" in this Agreement will refer to You and the employer collectively. If You change employers in the future and desire to Submit additional Contribution for the new employer, then You agree to sign a new Agreement and secure permission from the new employer before Submitting those Contributions.
6. Support. You are not expected to provide support for Your Contribution, unless You choose to do so. Any such support provided to the Project is provided free of charge.
7. Warranty. Unless required by applicable law or agreed to in writing, You provide Your Contributions on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE.
8. Third party material. Should You wish to submit work that is not Your original creation, You may only submit it to Fox-IT separately from any Contribution, identifying the complete details of its source and of any license or other restriction (including, but not limited to, related patents, trademarks, and license agreements) of which You are personally aware, and conspicuously marking the work as "Submitted on behalf of a third-party: [named here]".
9. Notify. You agree to notify Fox-IT of any facts or circumstances of which You become aware that would make the above representations inaccurate in any respect.
10. Governing law / competent court. This Agreement is governed by the laws of the Netherlands. Any disputes that may arise are resolved by arbitration in accordance with the Arbitration Regulations of the Foundation for the Settlement of Automation Disputes (Stichting Geschillenoplossing Automatisering – SGOA – www.sgoa.eu), this without prejudice to either party's right to request preliminary relief in preliminary relief proceedings or arbitral preliminary relief proceedings. Arbitration proceedings take place in Amsterdam, or in any other place designated in the Arbitration Regulations. Arbitration shall take place in English.

idem-s1n · 2023-09-04T13:10:55Z

@DissectBot agree company="Synacktiv"

idem-s1n · 2023-09-06T16:10:06Z

@Schamper Is it possible to plan a new version release so these changes are included in the pip package?

Schamper · 2023-09-07T18:34:36Z

> @Schamper Is it possible to plan a new version release so these changes are included in the pip package?

I believe we have a new release planned for the end of next week/beginning of the week after!

Add functions to efficiently serialize records

d9d5e0d

Schamper self-requested a review August 3, 2023 10:44

Factoring code of the record serialization process

9a59fc4

reduced code duplication

Schamper requested changes Aug 28, 2023

Schamper reviewed Aug 28, 2023

idem-s1n and others added 2 commits August 30, 2023 19:21

Apply suggestions from code review

782e9b8

PR24 Co-authored-by: Erik Schamper <[email protected]>

Fix range error

79059f3

Fix missing import

4dd28d0

Fix range error and linting issue. Add unit test for record

974543b

Schamper approved these changes Sep 4, 2023

Schamper merged commit af92e25 into fox-it:main Sep 4, 2023
10 checks passed

Schamper mentioned this pull request Sep 4, 2023

Add support for XPRESS10 compression #25

Open
