Add functions to efficiently serialize records #24
Hi @Schamper, thank you for requesting a review for this PR.
Apologies, @idem-s1n. I meant to read up on ESEDB again before starting the review so that I could provide better suggestions, but I hadn't gotten around to it yet (conferences, time off, other work). Specifically, I wanted to propose a method that reduces code duplication with the other (regular) way of retrieving a column. I could have just commented that, so please forgive me for only doing it now 😄. Would it work, or give sufficient performance, if you just did the column lookup and then simply called …
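One possible reading of this suggestion, as a minimal sketch: build an ID-to-column index once per table, resolve each column ID stored in a record through it, and then hand the resulting column to the same getter used for regular single-column access, so the decoding logic lives in one place. The classes `Column`, `Table` and `Record`, the `columns_by_id` property and the `get()` method are hypothetical stand-ins, not the dissect.esedb API, and the raw record data is faked with a plain dict.

```python
from dataclasses import dataclass
from functools import cached_property
from typing import Any, Iterator


@dataclass(frozen=True)
class Column:
    # Hypothetical column descriptor: ESE column ID plus its name.
    id: int
    name: str


class Table:
    def __init__(self, columns: list[Column]) -> None:
        self.columns = columns

    @cached_property
    def columns_by_id(self) -> dict[int, Column]:
        # Built once per table and reused for every record.
        return {column.id: column for column in self.columns}


class Record:
    def __init__(self, table: Table, raw: dict[int, Any]) -> None:
        self.table = table
        # Stand-in for the raw record bytes: only the IDs actually stored.
        self._raw = raw

    def _iter_column_id(self) -> Iterator[int]:
        # Hypothetical: yields only the column IDs present in this record.
        yield from self._raw

    def get(self, column: Column) -> Any:
        # The "regular" single-column path; decoding would live only here.
        return self._raw.get(column.id)

    def as_dict(self) -> dict[str, Any]:
        # Look up the Column for each stored ID, then reuse get() instead
        # of duplicating its decoding logic.
        result: dict[str, Any] = {}
        for column_id in self._iter_column_id():
            column = self.table.columns_by_id.get(column_id)
            if column is not None:
                result[column.name] = self.get(column)
        return result


table = Table([Column(1, "Id"), Column(2, "Bit"), Column(256, "Note")])
record = Record(table, {1: 1, 2: False})
print(record.as_dict())  # {'Id': 1, 'Bit': False}
```

The trade-off in this sketch is one extra dictionary lookup per stored field in exchange for a single decoding path; whether that overhead is acceptable on real databases is exactly what would need measuring.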
reduced code duplication
Thanks for the feedback, @Schamper. I did some tests with a call to … Therefore, I have updated the …
PR24 Co-authored-by: Erik Schamper <[email protected]>
Codecov Report
```diff
@@            Coverage Diff             @@
##             main      #24      +/-   ##
==========================================
+ Coverage   79.12%   79.15%   +0.02%
==========================================
  Files          15       15
  Lines        1255     1271      +16
==========================================
+ Hits          993     1006      +13
- Misses        262      265       +3
```
Flags with carried forward coverage won't be shown.
I don't have permission to push to your branch, so could you please apply the following changes? They add a unit test, fix an off-by-one error in the column ID ranges, and correct the linting issue.

```diff
diff --git a/dissect/esedb/record.py b/dissect/esedb/record.py
index 4d2d95d..0cbe4c2 100644
--- a/dissect/esedb/record.py
+++ b/dissect/esedb/record.py
@@ -4,7 +4,7 @@ import functools
 import struct
 from binascii import hexlify
 from functools import lru_cache
-from typing import TYPE_CHECKING, Any, Optional, Iterator
+from typing import TYPE_CHECKING, Any, Iterator, Optional
 from dissect.util.xmemoryview import xmemoryview
@@ -128,7 +128,9 @@ class RecordData:
         if num_variable > 0 and len(self.data) >= 4 + (num_variable * 2):
             # Parse the variable offsets already, if we have them
             # There can only be 128 at most, so this shouldn't be an expensive operation
-            self._variable_offsets = struct.unpack("<%dH" % num_variable, self.data[self._variable_offset_start : self._variable_data_start])
+            self._variable_offsets = struct.unpack(
+                "<%dH" % num_variable, self.data[self._variable_offset_start : self._variable_data_start]
+            )
         self._tagged_data_start = self._variable_data_start
         if self._variable_offsets:
@@ -190,10 +192,10 @@ class RecordData:
         def _iter_column_id() -> Iterator[Column]:
             # Fixed
-            yield from range(1, self._last_fixed_id)
-
+            yield from range(1, self._last_fixed_id + 1)
+
             # Variable
-            yield from range(128, self._last_variable_id)
+            yield from range(128, self._last_variable_id + 1)
             # Tagged
             for idx in range(self._tagged_data_count):
diff --git a/tests/test_record.py b/tests/test_record.py
new file mode 100644
index 0000000..afea747
--- /dev/null
+++ b/tests/test_record.py
@@ -0,0 +1,40 @@
+from typing import BinaryIO
+
+from dissect.esedb.esedb import EseDB
+
+
+def test_as_dict(basic_db: BinaryIO):
+    db = EseDB(basic_db)
+    table = db.table("basic")
+
+    records = list(table.records())
+    assert len(records) == 2
+
+    assert [r.as_dict() for r in records] == [
+        {
+            "Id": 1,
+            "Bit": False,
+            "UnsignedByte": 213,
+            "Short": -1337,
+            "Long": -13371337,
+            "Currency": 1337133713371337,
+            "IEEESingle": 1.0,
+            "IEEEDouble": 13371337.13371337,
+            "DateTime": 4675210852477960192,
+            "UnsignedLong": 13371337,
+            "LongLong": -13371337,
+            "GUID": "3f360af1-6766-46dc-9af2-0dacf295c2a1",
+            "UnsignedShort": 1337,
+        },
+        {
+            "Id": 2,
+            "Bit": True,
+            "UnsignedByte": 255,
+            "Short": 1339,
+            "Long": 13391339,
+            "Currency": -1339133913391339,
+            "IEEESingle": -2.0,
+            "IEEEDouble": -13391339.13391339,
+            "DateTime": -4537072128574357504,
+        },
+    ]
```

I'll also kick the CLA bot.
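The column ID fix in the patch above comes down to Python's half-open `range(start, stop)`, which never yields `stop`. Assuming, as the patch implies, that `_last_fixed_id` and `_last_variable_id` hold the highest fixed and variable column IDs actually present in the record, the old bounds silently skipped the last column of each group. A standalone illustration; the two function names are hypothetical and only mirror the before and after versions of `_iter_column_id`:

```python
from typing import Iterator


def column_ids_before_patch(last_fixed_id: int, last_variable_id: int) -> Iterator[int]:
    # Old bounds: range() stops *before* its second argument, so the
    # highest fixed ID and the highest variable ID are never yielded.
    yield from range(1, last_fixed_id)
    yield from range(128, last_variable_id)


def column_ids_after_patch(last_fixed_id: int, last_variable_id: int) -> Iterator[int]:
    # Patched bounds: "+ 1" makes both upper limits inclusive.
    yield from range(1, last_fixed_id + 1)
    yield from range(128, last_variable_id + 1)


# Hypothetical record: fixed columns 1..3, variable columns 128..129.
print(list(column_ids_before_patch(3, 129)))  # [1, 2, 128]
print(list(column_ids_after_patch(3, 129)))   # [1, 2, 3, 128, 129]
```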
@idem-s1n thank you for your contribution! As this is your first code contribution, please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information:
Contributor License Agreement
Contribution License Agreement
This Contribution License Agreement ("Agreement") governs your Contribution(s) (as defined below) and conveys certain license rights to Fox-IT B.V. ("Fox-IT") for your Contribution(s) to Fox-IT's open source Dissect project. This Agreement covers any and all Contributions that you ("You" or "Your"), now or in the future, Submit (as defined below) to this project. This Agreement is between Fox-IT B.V. and You and takes effect when you click an “I Accept” button, check box presented with these terms, otherwise accept these terms or, if earlier, when You Submit a Contribution.
@DissectBot agree company="Synacktiv"
@Schamper, is it possible to plan a new release so that these changes are included in the pip package?
I believe we have a new release planned for the end of next week or the beginning of the week after!
When serializing records, looping through all the columns of a table is time-consuming on large databases.
Therefore, I have added a function that extracts only the fields actually stored in a record and then maps them to the corresponding columns.
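A minimal sketch of the difference described above, assuming a hypothetical `lookup(column_id)` callable that stands in for decoding one column from the raw record data: the full scan performs one lookup per column defined in the table, while the approach in this PR starts from the IDs stored in the record and maps each one back to its column name. `serialize_full_scan`, `serialize_present` and `CountingLookup` are illustrative names only, not dissect.esedb code.

```python
from typing import Any, Callable, Iterable


def serialize_full_scan(
    names_by_id: dict[int, str], lookup: Callable[[int], Any]
) -> dict[str, Any]:
    # Visit every column the table defines, even those the record does not
    # store; cost grows with the table's column count.
    out: dict[str, Any] = {}
    for column_id, name in names_by_id.items():
        value = lookup(column_id)
        if value is not None:
            out[name] = value
    return out


def serialize_present(
    names_by_id: dict[int, str],
    present_ids: Iterable[int],
    lookup: Callable[[int], Any],
) -> dict[str, Any]:
    # Start from the IDs stored in the record itself and map each one back
    # to its column name; cost grows with the number of stored fields.
    return {
        names_by_id[column_id]: lookup(column_id)
        for column_id in present_ids
        if column_id in names_by_id
    }


class CountingLookup:
    # Hypothetical stand-in for per-column decoding that counts its calls.
    def __init__(self, stored: dict[int, Any]) -> None:
        self.stored = stored
        self.calls = 0

    def __call__(self, column_id: int) -> Any:
        self.calls += 1
        return self.stored.get(column_id)


names = {column_id: f"col{column_id}" for column_id in range(1, 1001)}  # a wide table
scan = CountingLookup({1: 1, 2: True})
sparse = CountingLookup({1: 1, 2: True})
print(serialize_full_scan(names, scan), scan.calls)            # {'col1': 1, 'col2': True} 1000
print(serialize_present(names, [1, 2], sparse), sparse.calls)  # {'col1': 1, 'col2': True} 2
```

Both variants return the same dictionary; only the number of lookups differs. Returning only stored fields is consistent with the unit test in this PR, where the second expected record has no `UnsignedLong`, `LongLong`, `GUID` or `UnsignedShort` keys.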