AAE Failing hourly, with exit value {badarg,46}, [{base64,decode_binary[{base64.erl... [JIRA: RIAK-1557] #450

Closed
ghost opened this issue Feb 10, 2015 · 23 comments

@ghost

ghost commented Feb 10, 2015

We are getting these error messages hourly, I assume when AAE tries to rebuild its trees.

emulator Error in process <#.####.###> on node '[email protected]' with exit value: {{badarg,46},[{base64,decode_binary,2,[{file,"base64.erl"},{line,191}]},{yz_solr,to_pair,1,[{file,"src/yz_solr.erl"},{line,414}]}, {yz_solr,'-get_pairs/1-lc$^0/1-0-',1,[{file,"src/yz_solr.erl"},{line,411}]},{yz_solr,entropy_data,2,[{file...

Unfortunately Erlang appears to cut off the error logs, so that's all I can provide currently.

Would appreciate any tips on how to proceed in terms of debugging the issue.
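
For anyone reading the truncated trace: the 46 in {badarg,46} looks like the byte value that base64 decoding rejected. That reading is an assumption based on how Erlang's base64 module usually reports bad input, not something the log states. A minimal Python sketch (the helper name `describe_badarg` is made up for illustration) shows which character that would be:

```python
import string

# Standard base64 alphabet (RFC 4648) plus '=' padding.
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def describe_badarg(code: int) -> str:
    """Explain a byte value reported next to badarg.

    Assumption: the integer is the offending input byte.
    """
    char = chr(code)
    verdict = "valid" if char in BASE64_ALPHABET else "not valid"
    return f"byte {code} is {char!r}, which is {verdict} in standard base64"


print(describe_badarg(46))
# prints: byte 46 is '.', which is not valid in standard base64
```

If that reading holds, the decoder was handed a string containing a '.', which suggests it was given the wrong field rather than a corrupted hash.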

@zeeshanlakhani
Contributor

Hey @Boardom. You should be able to see the entire error in the crash.log file. That'd help us take a look at what's going on. If you need more info about Logs in Riak, check out http://docs.basho.com/riak/latest/community/faqs/logs/.

@ghost
Author

ghost commented Feb 10, 2015

Yup. I had already looked there... Only difference is that one prints out the ===ERROR REPORT==== and then the line.

Any other suggestions?

@zeeshanlakhani
Contributor

@Boardom have you tried deleting the data_dir directory where the AAE trees are stored before adding data again? See http://docs.basho.com/riak/latest/ops/advanced/configs/configuration-files/#Active-Anti-Entropy

@ghost
Author

ghost commented Feb 11, 2015

We actually pushed updated schemas, which appears to have resolved the issue. Unfortunate, as there's probably a data handling issue somewhere. Thanks for the help.

Feel free to close.

@zeeshanlakhani
Contributor

@Boardom yeah... surprised that there were no more descriptive data handling/corruption errors in any of the logs, especially solr.log, if the validation issues occurred on the Solr side, which would be my guess here. We have an issue open to better expose these Solr errors in the logs, #446.

Thanks.

@ghost
Author

ghost commented Feb 11, 2015

I looked into it a bit a while back, and it stemmed from something with
Erlang and the max buffer size on the error reporting/stack trace stuff. I
didn't follow up far enough, obviously, but it's definitely odd. Truncating
stack traces is just evil.


@stevegaron

Could you please re-open this ticket? It would seem that the AAE issues I mentioned on the mailing list here: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2015-February/016782.html are caused by this very bug. Every time I clear the yz_anti_entropy directory and attach to a node to run yz_entropy_mgr:init([]), I get the {badarg,46} error in the logs.

@Basho-JIRA Basho-JIRA changed the title AAE Failing hourly, with exit value {badarg,46}, [{base64,decode_binary[{base64.erl... AAE Failing hourly, with exit value {badarg,46}, [{base64,decode_binary[{base64.erl... [JIRA: RIAK-1557] Feb 20, 2015
@zeeshanlakhani
Contributor

@stevegaron it seems like the issue may be in the data itself, which is causing the AAE errors; though, as per #446, we need a better way to log issues with the data.

@stevegaron

I have the same issue on both my prod server and my dev server. My dev server is nearly empty, so I can run any debugging there. Can I run AAE manually in a shell so I can see the full error message?

@zeeshanlakhani
Contributor

@stevegaron I'd try the commands (and set_envs) shown at this point in the video: https://www.youtube.com/watch?v=ETJqu5SmwOc#t=2290 where he's calling reset_build_tokens on both yz_entropy_mgr and riak_kv_entropy_manager. Otherwise, the way to see what's going on somewhat manually is tracing in Erlang with Redbug/dbg. Do you have any experience tracing Erlang programs?

@zeeshanlakhani
Contributor

@stevegaron I'm wondering if one of your keys contains spaces, as this may be due to #436. If so, I'll be working on that fix next week.
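
To illustrate why spaces in keys could plausibly end in a base64 decode error, here is a minimal Python sketch. The record layout (partition, bucket, key, and base64 hash joined by single spaces) and the names `build_record` and `parse_record` are hypothetical, chosen for illustration only; this is not Yokozuna's actual entropy data format or code.

```python
import base64
import binascii
from typing import Tuple


def build_record(partition: int, bucket: str, key: str, obj_hash: bytes) -> str:
    """Join the fields with single spaces (hypothetical layout)."""
    encoded = base64.b64encode(obj_hash).decode("ascii")
    return " ".join([str(partition), bucket, key, encoded])


def parse_record(record: str) -> Tuple[str, bytes]:
    """Naively split on spaces and decode the field expected to hold the hash."""
    fields = record.split(" ")
    key, encoded = fields[2], fields[3]
    # validate=True rejects any character outside the base64 alphabet.
    return key, base64.b64decode(encoded, validate=True)


good = build_record(7, "logs", "report.pdf", b"\x01\x02\x03")
print(parse_record(good))

bad = build_record(7, "logs", "report 2015.pdf", b"\x01\x02\x03")
try:
    parse_record(bad)
except binascii.Error as exc:
    # The key's second half ("2015.pdf") landed in the hash slot.
    print("decode failed:", exc)
```

With a space in the key, the split yields one field too many, so part of the key ends up where the hash is expected and the decoder rejects it. A delimiter that cannot appear in keys, or escaping the key before joining, would avoid the ambiguity.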

@stevegaron

Yes, I have 5000-ish keys with spaces...

@zeeshanlakhani
Contributor

@stevegaron ok... we already have something close to a fix for this on the Java side. I should be able to PR something next week. /cc @Boardom @seancribbs. Sorry for the inconvenience.

@stevegaron

@zeeshanlakhani I'll remove all keys with spaces in my dev cluster and let you know if AAE finishes. Will this fix make it into 2.0.5?
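
As a rough aid for that cleanup, here is a minimal Python sketch that picks out the affected keys. It assumes the keys have already been listed into an iterable of strings; how they are fetched from the cluster is out of scope, and the helper name `keys_with_spaces` is made up for illustration.

```python
from typing import Iterable, List


def keys_with_spaces(keys: Iterable[str]) -> List[str]:
    """Return the keys that contain at least one space character."""
    return [key for key in keys if " " in key]


sample = ["user:1", "annual report", "img_001.png", "notes 2015 02"]
print(keys_with_spaces(sample))
# prints: ['annual report', 'notes 2015 02']
```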

@stevegaron

@zeeshanlakhani I've started to see numbers showing up in the 'riak-admin search aae-status' output and I haven't seen any errors so far... looks promising. Hopefully you guys can release 2.0.5 with that patch in fairly soon.

@seancribbs

@stevegaron We are trying to solidify and verify 2.0.5 right now. We discussed this issue today in our weekly meeting and decided that, at least if the fix doesn't make it into 2.0.5, we will make available a patch to you and other affected users. We will update this issue as soon as we know whether the patch will be in 2.0.5.

Thank you for working with @zeeshanlakhani on confirming the bug.

@seancribbs

@stevegaron The fix did not make it into 2.0.5, but @zeeshanlakhani tells me the patch is nearly ready. We'll update this again soon.

@ghost
Author

ghost commented Feb 27, 2015

Fellas,

Just noticed the space issue wasn't raised as a 'Known Issue' in the 2.0.5
release notes... It's a pretty easy thing to avoid if you know it's there,
pretty vicious bite in the ass if you aren't aware. Might be worthwhile to
add.


@zeeshanlakhani
Contributor

@Boardom Release Notes have been updated... and the patch is in the review stage: #459.

@Basho-JIRA

PR -> #459

_[posted via JIRA by Zeeshan Lakhani]_

@Basho-JIRA

PR -> #459... S3 updated w/ yokozuna-2.jar

_[posted via JIRA by Zeeshan Lakhani]_

@Basho-JIRA

Code review done, awaiting CI.

_[posted via JIRA by Sean Cribbs]_

@Basho-JIRA

Updated "fix version" to 2.0.6 since Zeeshan added the related GH to the 2.0.6 release notes based on this fix landing in 2.0.6.

_[posted via JIRA by Patricia Brewer]_
