-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a FS for any KS-generated parser. #13
base: master
Are you sure you want to change the base?
Conversation
@GreyCat, @dgelessus, now it seems to be working. |
I've done a bit of testing - it seems to be working for the most part, but I got this exception repeatedly: Uncaught exception from FUSE operation read, returning errno.EINVAL.
Traceback (most recent call last):
File "/home/user/.local/lib/python3.7/site-packages/fuse.py", line 734, in _wrapper
return func(*args, **kwargs) or 0
File "/home/user/.local/lib/python3.7/site-packages/fuse.py", line 846, in read
offset, fh)
File "/home/user/.local/lib/python3.7/site-packages/fuse.py", line 1251, in __call__
ret = getattr(self, op)(path, *args)
File "/home/user/work/Kaitai Struct/kaitai_fs/kaitaifs/fs/AnyKSFS.py", line 317, in read
data = self.get_file_body(obj, offset, length)
File "/home/user/work/Kaitai Struct/kaitai_fs/kaitaifs/fs/AnyKSFS.py", line 321, in get_file_body
return obj[offset:offset + length]
TypeError: 'SignatureType' object is not subscriptable I was testing with the Other than that it seems to be working well. Though I don't quite understand the logic behind how the fields are exposed as files - most fields are grouped together into a single |
In fact it is not necessarily json, it can also be msgpack or bson or yaml, depending on filter chain. Though I have not yet implemented configuring it from CLI. Also it is possible to represent objects of other types as different files depending on their types by adding a filter. I have not yet decided what to do with strings, to put them into a dict or to put them into txt files, probably we should act depending on their sizes.
if property type is subclass of
Yes, I have not yet implemented different naming for arrays.
Good catch. Though I am unsure how to represent it. Probably we should switch to from json to yaml and add enum name as an end line comment. Alsk I wanted to implement caching. Currently to get sizes of json objects they are generated and then discarded. We should cache the sizes. |
I don't think this would work well in practice - if you do that, every program that wants to read a string field needs to check both for a separate file and in the
It might be a good idea to cache the entire string/bytes representation of the JSON objects, that way reading them is also faster. Though to be honest I don't think performance is a big issue in practice - my "generic FS" implementation (#17) has no caching at all, and I never had any noticeable performance problems with it (even on a 2010 ThinkPad with HDD). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some general comments:
- The code doesn't follow PEP 8 - almost all names are
camelCase
where they should besnake_case
, and spaces are missing in many places (especially in assignments and type annotations). - Some parts of the code are IMHO not very understandable currently, especially the
TypeSplitter
class hierarchy - it's not obvious which methods/fields are meant to be used externally and which ones are just implementation details. In general there should be more comments and docstrings. - Some of the
TypeSplitter
subclasses are meant as alternatives to each other and cannot be used together. It would be helpful to document this somewhere. - A few third-party libraries (e. g.
ruamel.yaml
,msgpack
) are used even though they are not listed as dependencies insetup.cfg
.
|
||
from fuse import FUSE | ||
|
||
from functools import partial | ||
|
||
from kaitaifs.parser.rar import Rar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the Rar
import used anywhere in __main__
?
def getParserCtor(moduleName:str): | ||
import importlib | ||
import ast | ||
from pathlib import Path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason why the modules are imported inside the function? The usual Python convention is to import everything at top level, unless there's a specific reason not to (like lazy loading for performance reasons).
FILESYSTEMS = { | ||
'rar': RarFS, | ||
'quake_pak': QuakePakFS, | ||
'heroes_of_might_and_magic_agg': HeroesOfMightAndMagicAggFS, | ||
'iso9660': Iso9660FS, | ||
} | ||
|
||
def getParserCtor(moduleName:str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name should be get_parser_ctor
, as per PEP 8. Also, get_parser_class
might be a better name (from my experience, the term "ctor" is not used often in Python).
from pathlib import Path | ||
try: | ||
mod=importlib.import_module("."+moduleName, package="kaitaifs.parser") | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be except ImportError
to avoid catching unrelated exceptions.
I'm also not sure if this try
block is a good idea. The main use case for a "generic" KaitaiFS is for working with specs that KaitaiFS doesn't support natively, so there's usually no need to use it with the specs included with KaitaiFS.
Also, this could lead to confusing behavior when a spec exists both at top level and under kaitaifs.parser
. If you run kaitaifs --any rar ...
, the current code would always use kaitaifs.parser.rar
, even if you have a rar
module on your PYTHONPATH. If this try
block is kept, the logic should be reversed so that it first searches for a top-level module, and only looks into kaitaifs.parser
if no top-level module is found.
a=ast.parse(p.read_text()) | ||
for el in a.body: | ||
if isinstance(el, ast.ClassDef): | ||
return getattr(mod, el.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ast
here might be a little overkill. Also, do we know for sure that the top-level class will always be the first class in the generated file?
For context, my implementation originally required the user to pass both module name and class name explicitly. Later I changed it so that the class name is automatically extracted from the module name (by taking the last part of the module name and converting it to CamelCase).
class TypeSplitter: | ||
fileExtension = None | ||
def __init__(self): | ||
raise NotImplementedError() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a good way to implement abstract classes, because it prevents subclasses from calling super().__init__(...)
. It would be better to use Python's standard abc
module here.
objectIndexOverhead = 2 # pair of quotes | ||
|
||
def __init__(self): | ||
import ruamel.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any practical differences between this module and PyYAML? I'm asking because in my experience PyYAML is more widely used (I haven't heard of ruamel.yaml before).
return res | ||
|
||
#def getxattr(self, path, name, position=0): | ||
# return {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be removed instead of commented out.
} | ||
|
||
if isinstance(obj, (bytes, bytearray)): | ||
res['st_size'] = len(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens for other object types? I remember getting odd behavior with some programs when files don't have a correct size.
'st_gid': 0, | ||
'st_uid': 0, | ||
'st_gid': os.getgid() if not isWin else 0, | ||
'st_uid': os.geteuid() if not isWin else 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this condition necessary? I know that there's some kind of Windows port of FUSE, but I would expect it to handle file UIDs/GIDs properly (or at least ignore them if they don't apply on Windows).
If this condition is really needed, it would be good to add a comment explaining why.
IMHO it is not programs, but is for humans manually examining files and executing hex editors on different unknown chunks or doing diffing with generic tools. Finished specialized programs should use the code generated by KSC directly.
Yes. it is just not comfortable to view large or multiline strings in json. We should definitely use YAML by default. |
Another thing we may want to have is symlinks when multiple instances point to the same object. |
It's buggy, doesn't work (I don't see there hinary blobs for now), but I still publish it.