Support CSV dialects #3864
Comments
@dafeder: This is of interest to us. In Germany the delimiter is usually a semicolon, and even MS Excel and the like use the semicolon delimiter by default in German versions. Am I right that the delimiter is currently hardcoded here: https://github.com/GetDKAN/dkan/blob/2.x/modules/datastore/src/Service/ImportService.php#L166 and that right now there is no configuration option for this? Regarding
Do you already have something in mind? Would extending the distribution schema with an optional field to define the CSV dialect be a viable option from your point of view?
We are in a tricky spot because we are trying to stay as close to DCAT as possible, but this is kind of outside the scope of DCAT. As a stopgap, I think we should figure out some relatively straightforward way to override that hardcoded value. A better solution, though, may be to have a system completely outside of the metastore for storing file resources, perhaps as part of the datastore, and to decouple that as much as possible from the metastore. This is sort of already the case, but Resources are basically just a URL and a timestamp at the moment.
Also, there is a way to do this, sort of, with event listeners:
$events[Import::EVENT_CONFIGURE_PARSER][] = ['set'];
[...]
public function set(Event $event) {
  // Fetch the parser configuration, override the delimiter, and hand it back.
  $parserConfiguration = $event->getData();
  $parserConfiguration['delimiter'] = ';';
  $event->setData($parserConfiguration);
}
h/t @janette
@dafeder: Thanks a lot for the hint. This works nicely to change the delimiter to a semicolon globally. I had missed that; I still often look only for the good old hooks and forget about the new Symfony events. By the way, and off-topic: is there a documentation standard for events, like there is for hooks? I searched a bit and could not find anything useful. Regarding conditional logic: does the event get only the parser configuration as data? If so, I have no information about the resource being parsed. Or am I missing something again?
I think you're right, it's basically all or nothing, sorry to lead you astray there. And yeah, documenting those events has been on our to-do list for a long time now, this is a good reminder.
fyi #4176
There are a lot of permutations of CSV out there, from TSV to semicolon-delimited files to different escaping methods. Even though both DKAN's native CSV parser and the MySQL LOAD DATA importer can be configured to support most of these permutations, there is no easy way to do this in DKAN, on either a per-resource or a system-wide level.
The Frictionless Data project has a spec designed to address just this issue: CSV Dialect. We should explore ways to support different dialects in importers, and figure out the most efficient way to communicate which dialect to use to the importer on a per-resource basis.
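To make the per-resource idea a bit more concrete, here is a minimal sketch in plain PHP (not DKAN code) of how a CSV Dialect descriptor could be translated into parser options. The function names `dialectToParserOptions` and `parseLine` are hypothetical, the fallback values are assumptions made for this sketch, and only three dialect properties (`delimiter`, `quoteChar`, `escapeChar`) are handled. How such a descriptor would be stored with a resource and passed to DKAN's importers is exactly the open question here and is not shown.

```php
<?php

/**
 * Translate a CSV Dialect descriptor into options for str_getcsv().
 *
 * Hypothetical helper for illustration only; not part of DKAN.
 * Missing properties fall back to defaults assumed for this sketch.
 */
function dialectToParserOptions(array $dialect): array {
  return [
    'delimiter' => $dialect['delimiter'] ?? ',',
    'quote' => $dialect['quoteChar'] ?? '"',
    'escape' => $dialect['escapeChar'] ?? '\\',
  ];
}

/**
 * Parse a single CSV line according to the given dialect descriptor.
 */
function parseLine(string $line, array $dialect): array {
  $options = dialectToParserOptions($dialect);
  return str_getcsv($line, $options['delimiter'], $options['quote'], $options['escape']);
}

// A descriptor as it might be attached to one semicolon-delimited resource.
$descriptor = json_decode('{"delimiter": ";", "quoteChar": "\""}', true);

// The comma inside the quoted field is kept as data, not treated as a separator.
print_r(parseLine('Berlin;"3,7 Mio.";DE', $descriptor));
```

With the descriptor above, the sample line is split into three fields (`Berlin`, `3,7 Mio.`, `DE`), while a resource without a descriptor would still be parsed as ordinary comma-separated data.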