Upgrade fields with dot character to 2.0.0 #15122

Closed
pkr1234 opened this issue Nov 30, 2015 · 7 comments

@pkr1234

pkr1234 commented Nov 30, 2015

Hi,

I have a small number of fields with a "." character in their names in my Elasticsearch 1.5 cluster, which gets its data from Logstash 1.4.2. I take snapshots to S3 daily.

The problem is that I can't start a 2.0.0 Elasticsearch and restore from the snapshot, because it complains about the "." character. I checked for fields with a "." character using curl -XGET 'http://localhost:9200/_all/_mapping'. I know about the Logstash de-dot filter, but it does not help me here as it cannot go back and fix existing data.

How do I restore my snapshot? If the option is to delete the offending data then I'm ok with it. Could anyone let me know how?

@clintongormley
Contributor

Hi @pkr1234

Dots are no longer allowed because they introduce an ambiguity in field lookup that is impossible to work around. So your options are:

  • reindex the data (in 1.x) into a new index without dots in the field names
  • delete the old data
  • keep a small 1.x cluster around to look at the old indices when you need to

@pkr1234
Author

pkr1234 commented Dec 1, 2015

Thanks Clinton.

I went for the second option: I deleted the data using delete by query and took another snapshot. But the _restore still fails. I checked the _mapping, and the mapping for the offending fields still exists. Is that what makes the restore fail now?

How do I get around this?

@clintongormley
Contributor

You have to delete the index itself.

@pkr1234
Author

pkr1234 commented Dec 1, 2015

That's not an option for me. I keep 45-60 days of logs pushed in by logstash for display by Kibana. These field names appear in all of the indexes logstash-yyyy.mm.dd .

Is there a script that renames the existing fields and reindexes anywhere that I can use? I'm stuck.

@clintongormley
Contributor

@pkr1234
Author

pkr1234 commented Dec 2, 2015

Thanks Clinton. The python library is quite easy to install. I can't say the same about the perl library which fails to install using cpan. It complains about 0.20 Hijk. Anyway, I have worked out a way and here are my steps:

A. Open all indexes
curl -XPOST 'http://localhost:9200/logstash-*/_open'

B. Identify my dodgy fields with a "." character
curl -XGET 'http://localhost:9200/_all/_mapping?pretty' | grep "\."
Note down every field name that contains the dodgy character.
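
The grep in step B also matches the dots in index names such as logstash-2015.12.02, so its output needs a manual pass. As a rough alternative, the mapping JSON can be walked in Python so that only dotted field names are reported, grouped by index. This is a minimal sketch that assumes the 1.x mapping layout (index, then "mappings", then type, then "properties"); the function name `dotted_fields` and the sample mapping are made up for illustration.

```python
def dotted_fields(all_mappings):
    """Return {index_name: sorted list of field paths whose name contains a dot}."""
    def walk(properties, prefix):
        for name, spec in properties.items():
            path = prefix + [name]
            if "." in name:
                yield " > ".join(path)
            # Object fields carry their own nested "properties" section.
            for found in walk(spec.get("properties", {}), path):
                yield found

    result = {}
    for index, body in all_mappings.items():
        hits = []
        for doc_type, mapping in body.get("mappings", {}).items():
            hits.extend(walk(mapping.get("properties", {}), [doc_type]))
        if hits:
            result[index] = sorted(hits)
    return result


# Hypothetical sample in the shape returned by GET /_all/_mapping
sample = {
    "logstash-2015.12.02": {
        "mappings": {
            "syslog": {
                "properties": {
                    "message": {"type": "string"},
                    "DODGY.FIELD.NAME": {"type": "string"},
                    "geoip": {"properties": {"lat.lon": {"type": "double"}}},
                }
            }
        }
    },
    "logstash-2015.12.03": {
        "mappings": {"syslog": {"properties": {"message": {"type": "string"}}}}
    },
}

for index, fields in sorted(dotted_fields(sample).items()):
    print(index, fields)
```

The keys of the returned dict are also the indexes that would need reindexing later on.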

C. Count the records that have a dodgy field

 curl -XGET 'http://localhost:9200/_all/_count' -d '{
   "query" : {
     "filtered" : {
         "filter" : {
            "exists" : { "field" : "DODGY.FIELD.NAME"}
         }
     }
   }
}'

D. Once we know how many records will be affected, and we are OK with purging them, delete those records. I'm OK with a purge but others may not be. I don't know how to rename the fields; maybe you could help us with that.

 curl -XDELETE  'http://localhost:9200/_all/_query' -d '{
   "query" : {
     "filtered" : {
         "filter" : {
            "exists" : { "field" : "DODGY.FIELD.NAME"}
         }
     }
   }
}'
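
When more than one field is affected, the request body used in steps C and D could be generated instead of edited by hand for each field. A minimal sketch, assuming the 1.x `bool` filter with `should` clauses (a record matches if any of the listed fields exists); the helper name `exists_any_query` and the second field name are hypothetical.

```python
import json


def exists_any_query(field_names):
    """Build a 1.x-style body matching records where any of the fields exists."""
    filters = [{"exists": {"field": name}} for name in field_names]
    return {"query": {"filtered": {"filter": {"bool": {"should": filters}}}}}


body = exists_any_query(["DODGY.FIELD.NAME", "ANOTHER.DODGY.FIELD"])
# Paste the printed JSON after -d in the curl commands of steps C and D.
print(json.dumps(body, indent=2))
```

Running the count with this body first, before the delete, keeps the check in step C meaningful.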

E. Now the records are deleted, but the mapping still exists and the snapshot restore will still fail, so we need to reindex. Install the Python library:

yum install python-pip
pip install elasticsearch

I have written a script that reindexes an old index under a new name. It works on 1.5.x. Older versions of Elasticsearch may have a problem with closed indexes, because cat did not list them; on those versions, simply open all indexes using curl first. The first argument is the Elasticsearch host and the second is the name of the index to reindex. The target is the index name + 'a'. The script closes the old index after the reindex.

#!/usr/bin/python

#########################
# pkr1234
# A script to reindex. It takes an index name and reindexes to the index name
# followed by the character 'a', i.e. logstash-2015.12.02 -> logstash-2015.12.02a
#########################

import sys
import traceback
import time
from elasticsearch import Elasticsearch
from elasticsearch.helpers import reindex

def isempty(name, value):
    # Returns True (and prints why) when the value is missing or blank
    if value is None:
        print('[' + name + '] must be defined!')
        return True
    if len(value.strip()) == 0:
        print('[' + name + '] cannot be empty!')
        return True

    return False

def printexit(msg, exitcode):
    print("\n" + msg + "\n")
    sys.exit(exitcode)

def existsindex(indexlist, indexname):
    # Expects a python list of indexes as in cat.
    # Closed indexes are listed as "close name", open ones as "health status name ..."
    found = None
    for item in indexlist:
        itemsplitted = item.split()
        if len(itemsplitted) < 2:
            continue
        if itemsplitted[0] == 'close':
            indname = itemsplitted[1]
            indstatus = itemsplitted[0]
        elif len(itemsplitted) >= 3:
            indname = itemsplitted[2]
            indstatus = itemsplitted[1]
        else:
            continue
        if indname == indexname:
            found = [indname, indstatus]
            break
    return found


if len(sys.argv) != 3:
    printexit("You must supply two arguments: elasticsearch host as first argument and index name as second argument!", 1)


elhost = sys.argv[1]
elindex = sys.argv[2]
elnewindex = elindex + 'a'

# Stop here if either argument is blank
if isempty('elhost', elhost) or isempty('elindex', elindex):
    sys.exit(1)

es = None
esindiceslist = None

try:
    es = Elasticsearch(host=elhost)
    esindiceslist = es.cat.indices().splitlines()
except Exception:
    traceback.print_exc()
    printexit("Unable to connect to elasticsearch and list indices! [" + elhost + "]", 1)

nelind = existsindex(esindiceslist, elnewindex)
if nelind:
    printexit("New index [" + elnewindex + "] already exists with status [" + nelind[1] + "]. Cannot create!", 1)

elind = existsindex(esindiceslist, elindex)
if elind is None:
    printexit("Index specified [" + elindex + "] does not exist. Cannot reindex!", 1)


if elind[1] == "close":
    # We have to open it for reindex to proceed
    print("Now opening source [" + elindex + "] and sleeping 120 sec..")
    es.indices.open(elindex)
    time.sleep(120)

print("Now indexing source [" + elindex + "] to target [" + elnewindex + "] ..")
reindex(client=es, source_index=elindex, target_index=elnewindex)

# Close old index
es.indices.close(elindex)

# Open new index
es.indices.open(elnewindex)


reindex.py 'localhost'  'logstash-2015.12.02' 

This will result in logstash-2015.12.02 being closed and a new index created called logstash-2015.12.02a

F. Call the above script for each and every index that contains a dodgy field. Once the reindex is complete, list the indexes:

curl 'http://localhost:9200/_cat/indices?pretty'

It will show both old and new indexes (as in 1.5.0). For older elasticsearch, all indexes will have to be opened with a wildcard in a curl command.

G. Once satisfied, delete the old indices (ones not ending in 'a')

curl -XDELETE http://localhost:9200/[OLDINDEXNAME]
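
One way to make "once satisfied" in step G concrete is to compare document counts between each old index and its 'a' copy before deleting anything. A minimal sketch, assuming the counts have already been collected into a dict (for example from the docs.count column of _cat/indices, with the old indexes reopened, since a closed index reports no count); the function name `deletable_old_indices` and the sample numbers are hypothetical.

```python
def deletable_old_indices(doc_counts, suffix="a"):
    """Return old index names whose copy (name + suffix) holds the same number of docs."""
    ready = []
    for name, count in doc_counts.items():
        copy = name + suffix
        if copy in doc_counts and doc_counts[copy] == count:
            ready.append(name)
    return sorted(ready)


# Hypothetical counts: index name -> number of documents
counts = {
    "logstash-2015.12.02": 100,
    "logstash-2015.12.02a": 100,   # copy complete
    "logstash-2015.12.03": 50,
    "logstash-2015.12.03a": 42,    # copy incomplete, keep the original
    "logstash-2015.12.04": 7,      # not reindexed yet
}

for name in deletable_old_indices(counts):
    print("safe to delete:", name)
```

Matching counts only show that nothing was dropped during the copy; they say nothing about the mapping, so the check in step B is still worth repeating on the new indexes.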

H. Once the new indices are created and the old ones deleted, take a snapshot using the snapshot API. Take it over to the new 2.0.0 cluster, where it should restore because it no longer contains fields with a "." character.

Hope this helps someone. Ideally, a migration script should've been provided. Reindex is a slow process. I have about 45 days' worth of logs (each 1 GB) - it will take a couple of overnight jobs to do it.

For new events, the de-dot filter in Logstash does the job fine, renaming dodgy fields on the fly by replacing the dots with underscores.

For rename - I don't know how to do it. My process is based upon deleting the offending records.
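
For anyone who needs to keep the records, renaming could in principle happen during the reindex itself: read each document, replace the dots in its field names, and index the result into the new index. The sketch below only shows the transformation step on plain dicts and is untested against a 1.5 cluster. The names `dedot` and `renamed_actions` are made up, and the underscore replacement simply mirrors what the de-dot filter does for new events.

```python
def dedot(value, replacement="_"):
    """Recursively replace dots in dict keys; other values are returned unchanged."""
    if isinstance(value, dict):
        return {key.replace(".", replacement): dedot(item, replacement)
                for key, item in value.items()}
    if isinstance(value, list):
        return [dedot(item, replacement) for item in value]
    return value


def renamed_actions(hits, target_index):
    """Turn search hits into bulk-style actions aimed at target_index."""
    for hit in hits:
        yield {
            "_index": target_index,
            "_type": hit["_type"],
            "_id": hit["_id"],
            "_source": dedot(hit["_source"]),
        }


# Hypothetical hit in the shape returned by a search or scan
hits = [{
    "_index": "logstash-2015.12.02",
    "_type": "syslog",
    "_id": "1",
    "_source": {"DODGY.FIELD.NAME": "x", "nested": {"a.b": [{"c.d": 1}]}},
}]

for action in renamed_actions(hits, "logstash-2015.12.02a"):
    print(action)
```

With the Python library, the hits would presumably come from `elasticsearch.helpers.scan` and the actions would be passed to `elasticsearch.helpers.bulk`. Note that a document holding both "a.b" and "a_b" would lose one of the two values, so it is worth checking for such collisions first.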

@clintongormley
Contributor

> Hope this helps someone. Ideally, a migration script should've been provided. Reindex is a slow process.

Won't help you now, but we're working on making this better: #15125
