Parallel processing #7

lefterav · 2013-01-29T11:07:14Z

Feature calculation/extraction consists of tasks that could run in parallel. In addition, some tasks could be split into parts and finish faster if those parts ran in parallel. Parallelization could take place across multiple CPUs or on a grid engine.
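To make the two kinds of parallelism concrete, here is a minimal Python sketch, not the project's actual code. The extractors `token_count` and `char_count`, and the helpers `split_into_chunks` and `extract_parallel`, are hypothetical names invented for illustration. It uses a thread pool so that it is easy to run anywhere; CPU-bound extractors would probably need `ProcessPoolExecutor` instead, which offers the same `submit` interface.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal parts."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical feature extractors: each maps one sentence to one value.
def token_count(sentence):
    return len(sentence.split())


def char_count(sentence):
    return len(sentence)


def extract_chunk(extractor, chunk):
    """Apply one extractor to every sentence in one chunk."""
    return [extractor(sentence) for sentence in chunk]


def extract_parallel(sentences, extractors, n_chunks=4, max_workers=4):
    """Run every extractor on every chunk as an independent job.

    Independent extractors run side by side (task parallelism), and each
    extractor's input is split into chunks (data parallelism).
    """
    chunks = split_into_chunks(sentences, n_chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all jobs first so they can overlap, then collect results.
        futures = {
            name: [pool.submit(extract_chunk, fn, chunk) for chunk in chunks]
            for name, fn in extractors.items()
        }
        # Chunks are joined in submission order, so the output order
        # matches the input order.
        return {
            name: [value for future in jobs for value in future.result()]
            for name, jobs in futures.items()
        }


if __name__ == "__main__":
    data = ["a b c", "d e", "f", "g h i j", "k"]
    print(extract_parallel(data, {"tokens": token_count, "chars": char_count}))
```

On a grid engine the same split would apply, with each (extractor, chunk) pair submitted as a separate job rather than a pool task.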

In the current implementation, all execution is serial, which is a major drawback when data sets are big.

A possible fix would be to break the pipeline into many inter-dependent executables and then run the entire process through EMS (experiment.perl), after specifying the data-flow dependencies in a configuration file. This solution has the advantage that the machine learning part can be included in the same pipeline, and that only some parts of the pre-processing need to be re-run if the configuration changes. The downside is that all feature extraction steps have to be command-line executables with a common input/output format.
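As a rough illustration of what declaring data-flow dependencies buys, the Python sketch below derives which steps can run side by side and which must be re-run after a change. It does not use the EMS configuration format: the step names in `PIPELINE` and the helpers `parallel_stages` and `steps_to_rerun` are hypothetical. It relies on `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "tokenize": set(),
    "parse": {"tokenize"},
    "lm_features": {"tokenize"},
    "parse_features": {"parse"},
    "train_model": {"lm_features", "parse_features"},
}


def parallel_stages(dependencies):
    """Group steps into stages; steps within a stage can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all steps whose inputs exist
        stages.append(ready)
        sorter.done(*ready)  # mark finished, which unlocks dependents
    return stages


def steps_to_rerun(dependencies, changed):
    """Return the changed step plus every step downstream of it."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for step, deps in dependencies.items():
            if step not in affected and deps & affected:
                affected.add(step)
                grew = True
    return affected


if __name__ == "__main__":
    for number, stage in enumerate(parallel_stages(PIPELINE), start=1):
        print(number, stage)
    print(sorted(steps_to_rerun(PIPELINE, "parse")))
```

In this toy graph, changing the configuration of `parse` leaves `tokenize` and `lm_features` untouched, which is the partial re-run described above.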
