
Cannot re-create Solr index [JIRA: CLIENTS-529] #408

Open
dnohales opened this issue Jul 28, 2015 · 6 comments

Comments

@dnohales

I'm having problems with Riak and the Python client while trying to delete and re-create a search index after a schema update. Our system checks an XML file containing the Solr schema for changes; when the file changes, we refresh the index (update the schema, delete the index, create the index, and store all the bucket data in the index again).

Roughly, what we are doing is the following:

import riak
c = riak.RiakClient(nodes=[{'host': '127.0.0.1', 'pb_port': 8087}],
                    protocol='pbc')
schema_data = <xml>

c.bucket('my_bucket').clear_properties()
c.delete_search_index('my_index')

c.create_search_schema('my_schema', schema_data)

c.create_search_index('my_index', 'my_schema')
c.bucket('my_bucket').set_property('search_index', 'my_index')

print c.bucket('my_bucket').search('*:*', index='my_index')

With 5 records in my_bucket (and in my_index), after executing all these statements the search method still returns the same data as before. I was expecting Solr's my_index to be empty (since this code does not re-index the bucket data into my_index). I also cannot see my changes in the Solr schema browser.

But then, after some trial and error, I noticed the following. If I execute this script:

import riak
c = riak.RiakClient(nodes=[{'host': '127.0.0.1', 'pb_port': 8087}],
                    protocol='pbc')

c.bucket('my_bucket').clear_properties()
c.delete_search_index('my_index')

And then, in another Python instance, I execute this script:

import riak
c = riak.RiakClient(nodes=[{'host': '127.0.0.1', 'pb_port': 8087}],
                    protocol='pbc')
schema_data = <xml>
c.create_search_schema('my_schema', schema_data)

c.create_search_index('my_index', 'my_schema')
c.bucket('my_bucket').set_property('search_index', 'my_index')

print c.bucket('my_bucket').search('*:*', index='my_index')

In other words, if I delete the index and create it again separately, everything works fine: my_index no longer has any data, and the Solr schema browser shows the updated schema. I've also noticed that create_search_index took much longer to execute (in the first script, everything executed instantly).

Am I doing something wrong or is this a bug?

@Basho-JIRA Basho-JIRA changed the title Cannot re-create Solr index Cannot re-create Solr index [JIRA: CLIENTS-529] Jul 28, 2015
@hazen

hazen commented Jul 28, 2015

@eagleoneraptor I'm wondering if this is a timing issue in that it takes a while for the changes in Riak to propagate over to Solr. @zeeshanlakhani pointed out to me that this has to happen to all nodes in the cluster, not just one. FYI, for our testing we use this function wait_for_yz_index() to wait until the index has settled: https://github.com/basho/riak-python-client/blob/master/riak/tests/test_yokozuna.py#L28-L38 It's ugly, but more reliable than a sleep(). Let me know if this helps at all, otherwise I'll investigate further.
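The linked wait_for_yz_index() helper polls the index until a search returns the expected number of documents. The general pattern can be sketched as a standalone polling helper (a sketch in Python 3; wait_until, its defaults, and the commented Riak usage are illustrative, not part of the client library):

```python
import time

def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True or the timeout expires.

    Returns True if the predicate eventually succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

# Hypothetical usage against a live Riak cluster, waiting for 5 docs
# to become searchable (assumes client `c` from the snippets above):
# settled = wait_until(
#     lambda: len(c.bucket('my_bucket')
#                  .search('*:*', index='my_index')['docs']) == 5)
```

A timeout keeps the loop from hanging forever when the condition can never become true, which is exactly the failure mode described in the next comment.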

@dnohales
Author

@javajolt If I sleep for 30 seconds after clear_properties and delete_search_index, the index is created properly afterwards (in the same script), but I ran into problems trying to use your busy-waiting suggestion.

  • If I busy-wait until the Solr index has no records after executing clear_properties, I wait forever; the Solr index is never updated just by calling clear_properties (which in fact makes sense).
  • If I busy-wait after executing clear_properties and delete_search_index, I get an index not found error...

How exactly should I implement the busy waiting in my case (deleting the whole index)?

@hazen

hazen commented Jul 28, 2015

I guess I would keep searching until I got the not-found error; then you would know the index has been successfully deleted. That sounds like a useful feature to add to the client library too, so you wouldn't have to implement it yourself.

@dnohales
Author

So, you mean something like:

c.delete_search_index('my_index')

bucket = c.bucket('my_bucket')
success = True
while success:
    try:
        # 'key' is one of the keys stored in my_bucket
        print bucket.search('_yz_rk:' + key, index='my_index')['docs']
    except Exception as e:
        print e
        success = False

# Create index

That is not working: on the very first call to search I get a "No index my_index found" error, and the index-creation code that runs afterwards doesn't do anything at all (create_search_index executes instantly and the index remains intact).

@dnohales
Author

So, I think I figured out how to hack it without the sleep, though I'd like your opinion. I really don't know how Riak or Yokozuna works internally, but I believe what's happening is that when I delete the index, Riak marks it as deleted instantly, so it returns "Index not found" exceptions regardless of whether the index is still present in Solr. So what I did is bypass Riak and talk to Solr directly:

import time
import urllib2

def wait_for_delete_index():
    while True:
        try:
            urllib2.urlopen('http://localhost:8093/internal_solr/my_index/select')
            # Index still found in Solr, try again
            time.sleep(1)
        except urllib2.HTTPError:
            # Index not found in Solr
            return

c.delete_search_index('my_index')
wait_for_delete_index()
c.create_search_schema('my_schema', schema_data)
c.create_search_index('my_index', 'my_schema')

What do you think?

@hazen

hazen commented Jul 30, 2015

Interesting. I would bet it works, but it's unfortunate that you have to go directly to Solr. That said, it's the safest solution, because Solr is the component we are actually waiting for. The only downside is that if you have multiple nodes, I'm not sure querying a single Solr node tells you the index has been deleted on ALL of the nodes in your cluster. There is one Solr instance per Riak node.
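Extending the wait_for_delete_index() idea to a whole cluster would mean polling every node's internal Solr endpoint, not just localhost. A sketch in Python 3 (the node URLs are hypothetical; the thread's snippets use the Python 2 urllib2 module instead):

```python
import time
import urllib.error
import urllib.request

# Hypothetical: one internal Solr endpoint per Riak node in the cluster.
SOLR_NODES = [
    'http://riak1:8093',
    'http://riak2:8093',
    'http://riak3:8093',
]

def index_gone_everywhere(index, nodes=SOLR_NODES):
    """Return True only when no node's Solr instance still serves the index."""
    for base in nodes:
        try:
            urllib.request.urlopen('%s/internal_solr/%s/select' % (base, index))
            return False  # this node still answers queries for the index
        except urllib.error.HTTPError:
            pass  # index not found on this node; check the rest
    return True

def wait_for_delete_index_all_nodes(index, interval=1.0):
    """Block until every node reports the index as gone."""
    while not index_gone_everywhere(index):
        time.sleep(interval)
```

This keeps the same talk-to-Solr-directly approach from the previous comment, just applied to each node in turn.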

@lukebakken lukebakken modified the milestone: riak-python-client-2.7.1 Dec 16, 2016
@lukebakken lukebakken modified the milestones: riak-python-client-2.7.1, riak-python-client-3.0.0 Feb 22, 2017