Use custom CollectionReader? #20
Hm, I would say in principle using a non-filesystem-based reader should work. If the implementation currently expects a filesystem-based reader, it may be possible to change that. Did you try customizing the DkproHadoopDriver such that you could e.g. specify "-none-" as input and/or output and that the driver handles that by simply not setting the job input/output?
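The "-none-" convention suggested above could look roughly like the following. This is a hypothetical sketch, not existing DkproHadoopDriver code; the Hadoop calls shown in comments (FileInputFormat.setInputPaths, Path) are real APIs, but the flag handling itself is an assumption.

```java
// Hypothetical sketch: only wire up job input/output when a real path was
// given, so a non-filesystem CollectionReader can supply the input instead.
public class NoneFlag {
    static final String NONE = "-none-";

    // Treat null, empty, or the literal "-none-" as "no path given".
    static boolean isRealPath(String arg) {
        return arg != null && !arg.isEmpty() && !arg.equals(NONE);
    }

    public static void main(String[] args) {
        // In a customized DkproHadoopDriver.run(String[] args) one might do:
        // if (isRealPath(args[0])) {
        //     FileInputFormat.setInputPaths(job, new Path(args[0]));
        // } // else: the CollectionReader (e.g. a JDBC-based one) provides input
        System.out.println(isRealPath("-none-"));            // false
        System.out.println(isRealPath("hdfs:///data/input")); // true
    }
}
```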
I did not try that yet. I've had some doubts about the role of the
I was rather thinking about adding a condition like
Hm... it seems rather odd to me that the CASWritableSequenceFileWriter should try writing to inputPath. Maybe changing that to write to a (temporary) HDFS working directory wouldn't hurt anyway.
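The temporary-working-directory idea above could be sketched as follows. The path construction is plain string logic; the HDFS calls in the comments (FileSystem.mkdirs, deleteOnExit) are real Hadoop APIs, but using them here for CASWritableSequenceFileWriter is an assumption, not existing code.

```java
import java.util.UUID;

// Hypothetical sketch: build a unique temporary working path instead of
// reusing the job's input path.
public class TempWorkDir {
    static String tempWorkingDir(String baseDir, String jobName) {
        // e.g. /tmp/dkpro-bigdata/cas-writer-<uuid>
        return baseDir + "/" + jobName + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String dir = tempWorkingDir("/tmp/dkpro-bigdata", "cas-writer");
        System.out.println(dir);
        // On an actual cluster the directory would then be created and
        // cleaned up via Hadoop's FileSystem API, e.g.:
        // FileSystem fs = FileSystem.get(conf);
        // fs.mkdirs(new Path(dir));
        // fs.deleteOnExit(new Path(dir));
    }
}
```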
According to the documentation, DKPro BigData can process 'any file we have a UIMA collection reader for'.
However, the method DkproHadoopDriver.run(String[] args) strictly requires input and output paths to be specified. I have a use case in which the reader should read from a MySQL database (using the JdbcReader), hence not reading from the file system, with the following class. When I submit the job to the Hadoop (or YARN) cluster without any parameters, the process is aborted with the following message:
When I add fake parameters (test), an InvalidInputException is thrown: [...]
Should this behaviour be fixed or is it currently just not possible to use an input reader like this?