Error when result is large #639

Closed
ghost opened this issue Sep 8, 2015 · 2 comments

Comments


ghost commented Sep 8, 2015

When I use an SQL query to fetch a large table with a large result (about 300,000 rows), elasticsearch-jdbc cannot import all of the rows into Elasticsearch.
I get the message below:

[16:55:19,855][INFO ][metrics.source.plain ][pool-4-thread-1] totalrows = 262080, 30 seconds = 30061 ms, 155219389 = 148.03 MB bytes, 592.0 bytes = 592 avg size, 8,718.273 dps, 5.042 MB/s
[16:55:19,855][INFO ][metrics.sink.plain ][pool-4-thread-1] 30 seconds = 30062 ms, submitted = 262080, succeeded = 186411, failed = 34633, 268009203 = 255.59 MB bytes, 1022.0 bytes = 1,022 avg size, 8,717.983 dps, 8.706 MB/s

Log output:

[17:12:39,380][INFO ][importer.jdbc.context.standard][pool-2-thread-1] metrics thread started
[17:12:39,381][INFO ][importer.jdbc.context.standard][pool-2-thread-1] found sink class org.xbib.elasticsearch.jdbc.strategy.standard.StandardSink@6ef5f8ea
[17:12:39,384][INFO ][importer.jdbc.context.standard][pool-2-thread-1] found source class org.xbib.elasticsearch.jdbc.strategy.standard.StandardSource@43c19940
[17:12:39,411][INFO ][BaseTransportClient ][pool-2-thread-1] creating transport client, java version 1.7.0_85, effective settings {cluster.name=elasticsearch, host.0=localhost, port=9300, sniff=false, autodiscover=false, name=importer, client.transport.ignore_cluster_name=false, client.transport.ping_timeout=5s, client.transport.nodes_sampler_interval=5s}
[17:12:39,438][INFO ][org.elasticsearch.plugins][pool-2-thread-1] [importer] loaded [support-1.7.1.0-b344fa4], sites []
[17:12:39,817][INFO ][BaseTransportClient ][pool-2-thread-1] trying to connect to [inet[localhost/127.0.0.1:9300]]
[17:12:39,893][INFO ][BaseTransportClient ][pool-2-thread-1] connected to [[Mister Hyde][MAvfNKBjQf-AK79_FCmUHA][foo.com][inet[localhost/127.0.0.1:9300]]]
[17:12:45,109][ERROR][BulkTransportClient ][elasticsearch[importer][transport_client_worker][T#1]{New I/O worker #1}] bulk [1] failed with 301 failed items, failure message = failure in bulk execution:
[24]: index [log], type [bar2], id [61684], message [MapperParsingException[object mapping for [bar2] tried to parse field [Message] as object, but got EOF, has a concrete value been provided to it?]]
[49]: index [log], type [bar2], id [61709], message [MapperParsingException[object mapping for [bar2] tried to parse field [Message] as object, but got EOF, has a concrete value been provided to it?]]
[51]: index [log], type [bar2], id [61711], message [MapperParsingException[object mapping for [bar2] tried to parse field [Message] as object, but got EOF, has a concrete value been provided to it?]]
[59]: index [log], type [bar2], id [61719], message [MapperParsingException[object mapping for [bar2] tried to parse field [Message] as object, but got EOF, has a concrete value been provided to it?]]
[60]: index [log], type [bar2], id [61720], message [MapperParsingException[object mapping for [bar2] tried to parse field [Message] as object, but got EOF, has a concrete value been provided to it?]]
... (the same error repeated for other ids)

The script I use:

#!/bin/bash
# BASH_SOURCE is a bash feature, so run this with bash rather than plain sh.
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
bin=${DIR}/../bin
lib=${DIR}/../lib

echo '
{
    "type" : "jdbc",
    "jdbc" : {
        "statefile" : "statefile.json",
        "url" : "jdbc:mysql://path.to.mysql.com/database",
        "user" : "user",
        "password" : "password",
        "sql" : [
                {
                 "statement" : "select *, ID as _id, \"log\" as _index, \"bar1\" as _type from foo where `tag` LIKE ? ",
                 "parameter" : [ "bar1%" ]
                },
                {
                 "statement" : "select *, ID as _id, \"log\" as _index, \"bar2\" as _type from foo where `tag` LIKE ? ",
                 "parameter" : [ "bar2%" ]
                }
        ],
        "index_settings" : {
            "index" : {
                "number_of_shards" : 1
            }
        },
        "elasticsearch.host" : "localhost",
        "elasticsearch.port" : "9300",
        "metrics" : {
            "enabled" : true
        }
    }
}
' | java \
    -cp "${lib}/*" \
    -Dlog4j.configurationFile="${bin}/log4j2.xml" \
    org.xbib.tools.Runner \
    org.xbib.tools.JDBCImporter

jprante (Owner) commented Sep 9, 2015

The data you index is not valid for Elasticsearch: you cannot index both a plain value and a structured object into the same field.

ghost (Author) commented Sep 10, 2015

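I found the cause.
My Message column is a string in MySQL that holds a JSON object, and one of its values contains the sequence \x22, a hex escape for a double quote. As far as I can tell, when elasticsearch-jdbc passes this JSON object to Elasticsearch, Elasticsearch takes \x22 as a double quote, considers the JSON object finished at that point, and fails with the error above. Note that \x is not a legal escape in JSON at all: the only escapes allowed are \", \\, \/, \b, \f, \n, \r, \t and \uXXXX, so a double quote inside a JSON string has to be written as \" or \u0022.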

The content of my Message field (a string in MySQL) is:

{  
   "remote_addr":"123.45.67.89",
   "remote_user":"-",
   "time":"22/Jul/2015:17:01:02 -0700",
   "request":"GET /+BKg5naO8d0=/1zqjhy'%\x22()%7B%7D%3Cx%3E:1zqjhy;9 HTTP/1.1",
   "request_method":"GET",
   "request_length":"1214",
   "request_time":"0.040",
   "status":400,
   "body_bytes_sent":1193,
   "http_referer":"https://foo.com/+BKg5naO8d0=",
   "http_user_agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36",
   "http_x_forwarded_for":"-"
}
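A quick way to confirm that the \x22 escape is the problem is to feed a trimmed copy of the "request" value above to a strict JSON parser. This is a minimal sketch that assumes python3 is on the PATH and uses its standard json.tool module as the parser; the helper name is_valid_json is hypothetical. The same value with \u0022 instead of \x22 parses fine.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if $1 is valid JSON.
# Assumes python3 is installed; json.tool reads stdin and exits non-zero on invalid JSON.
is_valid_json() {
  printf '%s' "$1" | python3 -m json.tool > /dev/null 2>&1
}

# Trimmed copy of the "request" value from the Message field (contains \x22).
bad='{"request":"GET /+BKg5naO8d0=/1zqjhy'"'"'%\x22()%7B%7D%3Cx%3E:1zqjhy;9 HTTP/1.1"}'
# Same value with the hex escape written as a JSON unicode escape.
good='{"request":"GET /+BKg5naO8d0=/1zqjhy'"'"'%\u0022()%7B%7D%3Cx%3E:1zqjhy;9 HTTP/1.1"}'

is_valid_json "$bad"  && echo "bad: valid"  || echo "bad: invalid"   # prints "bad: invalid"
is_valid_json "$good" && echo "good: valid" || echo "good: invalid"  # prints "good: valid"
```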

And elasticsearch-jdbc sends it as:

"{  
   \"remote_addr\":\"123.45.67.89\",
   \"remote_user\":\"-\",
   \"time\":\"22/Jul/2015:17:01:02 -0700\",
   \"request\":\"GET /+BKg5naO8d0=/1zqjhy'%\\x22()%7B%7D%3Cx%3E:1zqjhy;9 HTTP/1.1\",
   \"request_method\":\"GET\",
   \"request_length\":\"1214\",
   \"request_time\":\"0.040\",
   \"status\":400,
   \"body_bytes_sent\":1193,
   \"http_referer\":\"https://foo.com/+BKg5naO8d0=\",
   \"http_user_agent\":\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36\",
   \"http_x_forwarded_for\":\"-\"
}"

When Elasticsearch unescapes this string, \\x22 turns back into \x22, which stands for a double quote but is not valid JSON, so Elasticsearch cannot parse the JSON object.
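If the data cannot be fixed where it is produced, one possible workaround is to rewrite the hex escapes into JSON unicode escapes in the raw column value before it is imported (for example on an exported dump). The sketch below is a hypothetical helper, fix_hex_escapes, not part of elasticsearch-jdbc. It assumes a sed that supports -E (GNU or BSD), only converts ASCII escapes \x00 to \x7F (higher values may be bytes of a UTF-8 sequence and would need real decoding instead), and leaves alone a \x that follows an escaped backslash.

```shell
#!/bin/bash
# Hypothetical helper (not part of elasticsearch-jdbc): rewrite \xHH escapes,
# which JSON does not allow, into the equivalent \u00HH escapes.
# - Only ASCII escapes (\x00 to \x7F) are converted.
# - The match requires an odd number of backslashes before the x, so \\x22
#   (an escaped backslash followed by the text "x22") is left unchanged.
# - The :a / ta loop repeats the substitution until nothing is left to replace,
#   so adjacent escapes such as \x22\x22 are all converted.
fix_hex_escapes() {
  sed -E \
    -e ':a' \
    -e 's/(^|[^\\])((\\\\)*)\\x([0-7][0-9A-Fa-f])/\1\2\\u00\4/' \
    -e 'ta'
}

printf '%s\n' '{"request":"GET /1zqjhy'"'"'%\x22()%7B%7D HTTP/1.1"}' | fix_hex_escapes
# prints: {"request":"GET /1zqjhy'%\u0022()%7B%7D HTTP/1.1"}
```

Because it works on text rather than on parsed JSON, this should be run on the raw Message value, not on the doubly escaped form that elasticsearch-jdbc sends.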

@ghost ghost closed this as completed Sep 15, 2015