
InlineFirehose does not work with index_parallel ingestion #8673

Closed
vogievetsky opened this issue Oct 15, 2019 · 4 comments · Fixed by #8682

Comments

@vogievetsky (Contributor) commented Oct 15, 2019

In Druid 0.16.0

As the title says, if you try to submit a task like:

{
  "type": "index_parallel",
  "spec": {
    "type": "index_parallel",
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "inline",
        "data": "{\"name\":\"Vadim\"}"
      }
    },
    "tuningConfig": {
      "type": "index_parallel"
    },
    "dataSchema": {
      "dataSource": "some_data",
      "granularitySpec": {
        "type": "uniform",
        "queryGranularity": "HOUR",
        "rollup": true,
        "segmentGranularity": "DAY"
      },
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "!!!_no_such_column_!!!",
            "missingValue": "2010-01-01T00:00:00Z"
          },
          "dimensionsSpec": {
            "dimensions": [
              "name"
            ]
          }
        }
      },
      "metricsSpec": [
        {
          "name": "count",
          "type": "count"
        }
      ]
    }
  }
}

You get an error of:

{"error":"Instantiation of [simple type, class org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask] value failed: [InlineFirehoseFactory] should implement FiniteFirehoseFactory"}

However, changing the ingestion type to plain index:

{
  "type": "index",
  "spec": {
    "type": "index",
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "inline",
        "data": "{\"name\":\"Vadim\"}"
      }
    },
    "tuningConfig": {
      "type": "index"
    },
    "dataSchema": {
      "dataSource": "some_data",
      "granularitySpec": {
        "type": "uniform",
        "queryGranularity": "HOUR",
        "rollup": true,
        "segmentGranularity": "DAY"
      },
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "!!!_no_such_column_!!!",
            "missingValue": "2010-01-01T00:00:00Z"
          },
          "dimensionsSpec": {
            "dimensions": [
              "name"
            ]
          }
        }
      },
      "metricsSpec": [
        {
          "name": "count",
          "type": "count"
        }
      ]
    }
  }
}

makes it all work.

@ccaominh (Contributor)

To work with parallel indexing, one possibility is to modify InlineFirehoseFactory to implement FiniteFirehoseFactory so that it is splittable and always returns 1 split (that contains the entire contents).
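The single-split idea can be sketched as follows. Note this is a simplified illustration, not Druid's actual code: InputSplit, FiniteFirehoseFactory, and InlineFirehoseFactorySketch below are minimal stand-ins (the real interfaces in org.apache.druid.data.input have more methods and different signatures).

```java
import java.util.Collections;
import java.util.List;

// Minimal stand-ins for Druid's interfaces (hypothetical simplifications;
// the real FiniteFirehoseFactory has a richer API).
interface InputSplit<T> { T get(); }

interface FiniteFirehoseFactory<S> {
    boolean isSplittable();
    List<InputSplit<S>> getSplits();
    int getNumSplits();
}

// Sketch of the proposed fix: the inline factory reports exactly one split
// containing its entire payload, so parallel ingestion can accept it.
class InlineFirehoseFactorySketch implements FiniteFirehoseFactory<String> {
    private final String data;

    InlineFirehoseFactorySketch(String data) { this.data = data; }

    // The inline data cannot be divided further.
    @Override public boolean isSplittable() { return false; }

    // One split holding the full inline payload.
    @Override public List<InputSplit<String>> getSplits() {
        return Collections.singletonList(() -> data);
    }

    @Override public int getNumSplits() { return 1; }
}

public class Main {
    public static void main(String[] args) {
        InlineFirehoseFactorySketch f =
            new InlineFirehoseFactorySketch("{\"name\":\"Vadim\"}");
        System.out.println(f.getNumSplits());            // 1
        System.out.println(f.getSplits().get(0).get());  // the inline data
    }
}
```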

@jihoonson (Contributor)

I was writing the same comment as @ccaominh. InlineFirehoseFactory could implement FiniteFirehoseFactory, but it should always run in sequential mode.

@gianm gianm added the Starter label Oct 15, 2019
@jnaous (Contributor) commented Oct 15, 2019 via email

@gianm (Contributor) commented Oct 15, 2019

If this gets fixed, is it possible to fix these cryptic messages that are sent out? I assume it's because of Jackson deserialization exceptions being regurgitated to the user?

Yeah, those are Jackson exceptions, and it would be nice to make them more pleasant in a systematic way. I'm not sure if there's an easy way to do that without destroying the information in the error message or needing to parse their text. Maybe Jackson provides a way to get errors in a more programmatic way and we can write our own messages.
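One way to picture the "more programmatic" approach: Jackson's mapping exceptions carry a structured reference path to the failing field, which could be used to build a friendlier message instead of echoing the raw exception text. The sketch below is self-contained and hypothetical: MappingError is a stand-in for a real Jackson JsonMappingException, and friendlyMessage is an invented helper, not Druid or Jackson code.

```java
// Hypothetical sketch of the idea above: surface a structured error path
// plus reason, rather than the raw deserialization exception text.
class MappingError extends RuntimeException {
    final String[] path;   // e.g. ["spec", "ioConfig", "firehose"]

    MappingError(String reason, String... path) {
        super(reason);
        this.path = path;
    }
}

public class FriendlyErrors {
    // Build a human-readable message from the structured parts.
    static String friendlyMessage(MappingError e) {
        return "Invalid value at '" + String.join(".", e.path)
            + "': " + e.getMessage();
    }

    public static void main(String[] args) {
        MappingError e = new MappingError(
            "[InlineFirehoseFactory] should implement FiniteFirehoseFactory",
            "spec", "ioConfig", "firehose");
        System.out.println(friendlyMessage(e));
        // Invalid value at 'spec.ioConfig.firehose': [InlineFirehoseFactory] should implement FiniteFirehoseFactory
    }
}
```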
