
[ML] Need to be able to quickly/temporarily adjust timeout when uploading to File Data Visualizer #69624

Open
slightlybent opened this issue Jun 19, 2020 · 2 comments

@slightlybent

(introduced in Elastic Stack 6.5 - file data visualizer)

Suggestion: a link for a user with the admin role to be able to easily adjust the timeout for file import in the experimental "Visualize data from a log file" feature under Machine Learning, or at least a way to temporarily override the timeout when uploading via the File Data Visualizer.

Currently, when attempting to import a file of just under 10 MB, I get a "File could not be read: Request Time-out" error.

@timroes added the :ml label Jun 22, 2020
@elasticmachine (Contributor)

Pinging @elastic/ml-ui (:ml)

@peteharverson changed the title from "Need to be able to quickly/temporarily adjust timeout when uploading to File Data Visualizer" to "[ML] Need to be able to quickly/temporarily adjust timeout when uploading to File Data Visualizer" Jun 23, 2020
@peteharverson added the Feature:File and Index Data Viz (ML file and index data visualizer) label Jun 23, 2020
@droberts195 (Contributor)

@slightlybent this is a valid enhancement request in its own right, but we might be able to do more to help.

Which version are you using? If it's 6.6.1 or above, you can use the change made in elastic/elasticsearch#38191 to get a clue about why it took so long.

If you run it from the command line, you'll see the "explanation so far" in the timeout exception. For example:

```sh
curl -u elastic:password -s -H "Content-Type: application/json" \
  -XPOST "http://localhost:9200/_ml/find_file_structure?pretty&explain" \
  -T my_10mb_file.txt
```
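
Incidentally, since the underlying problem here is the timeout itself: when calling the endpoint directly like this you can also raise it for a one-off analysis, because the find_file_structure API accepts a `timeout` query parameter (it defaults to 25s). A minimal sketch, reusing the same file and credentials as above:

```sh
# One-off analysis with a longer timeout; find_file_structure's "timeout"
# parameter defaults to 25s, so 120s gives the analysis far more headroom.
curl -u elastic:password -s -H "Content-Type: application/json" \
  -XPOST "http://localhost:9200/_ml/find_file_structure?pretty&explain&timeout=120s" \
  -T my_10mb_file.txt
```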

Often when you get a timeout it's because you have a CSV file where one of the lines has a different number of fields to the others, causing the CSV format to be rejected and the file to be analysed as a semi-structured log file. Then attempting to build a Grok pattern takes ages as there's so much structure, but you really want the file analysed as CSV. If this is your problem then in the "explanation so far" in the timeout exception you'll see something like "not CSV because line 491 had a different number of fields to the first line".

We have also recently made a change to make CSV parsing more lenient if the format is explicitly overridden to CSV (see elastic/elasticsearch#55735), but unfortunately that is of no practical benefit until #38868 is implemented. At least you know improvements are on the way.
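
Until that Kibana work lands, a possible workaround (sketched here against the API directly, not through the UI) is to override the detection yourself: find_file_structure's `format` parameter accepts `delimited`, and `delimiter` pins down the separator:

```sh
# Force delimited (CSV) analysis instead of letting the structure finder
# guess; "%2C" is a URL-encoded comma for the delimiter parameter.
curl -u elastic:password -s -H "Content-Type: application/json" \
  -XPOST "http://localhost:9200/_ml/find_file_structure?pretty&explain&format=delimited&delimiter=%2C" \
  -T my_10mb_file.txt
```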

If it's not misdetection of CSV and you're running a version earlier than 7.3.0, then you're probably suffering from slow timestamp determination. This was made much faster in elastic/elasticsearch#41948, and you should upgrade to 7.3.0 or above if you can.
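
If upgrading isn't an option straight away, one thing that might reduce the determination cost (an assumption on my part, not something I've measured against your file) is to tell the endpoint where the timestamp is and what format it's in, via the `timestamp_field` and `timestamp_format` parameters:

```sh
# Pin the timestamp column (hypothetical field name "time") and its format
# so the structure finder doesn't have to search through candidate formats.
curl -u elastic:password -s -H "Content-Type: application/json" \
  -XPOST "http://localhost:9200/_ml/find_file_structure?pretty&explain&timestamp_field=time&timestamp_format=ISO8601" \
  -T my_10mb_file.txt
```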

Finally, if it's neither of those things then it would be interesting if we could see your file so we can find out why it's so slow. If it's not too hard, would you be able to anonymise it and replace anything at all confidential, then zip it and attach it to this issue? It's only worth doing this if it's not one of the well-known problems I mentioned above.
