increase autocomplete 'phrase:slop' setting from 2->3 #489

missinglink · 2016-04-07T10:56:40Z

By increasing the phrase slop from 2 to 3, we are able to successfully match "52 Görlitzer Straße" with "Görlitzer Straße 52".
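Why 3 and not 2? Below is a minimal sketch of how Lucene's sloppy phrase matching scores this case. It assumes each term occurs exactly once and uses a naive lowercase/whitespace tokenizer in place of the real `peliasQueryFullToken` analyzer, which may apply further filters. Each query term contributes an offset (document position minus query position), and the phrase matches when the spread of those offsets (max minus min) is at most the slop. `requiredSlop` and `tokenize` are illustrative helpers, not Pelias code.

```javascript
// Simplified model of Lucene sloppy phrase matching, assuming every term
// occurs exactly once in both the query and the document.
// offset = docPosition - queryPosition; the phrase matches when
// max(offset) - min(offset) <= slop.
function requiredSlop(queryTokens, docTokens) {
  const offsets = queryTokens.map((token, queryPos) => {
    const docPos = docTokens.indexOf(token);
    if (docPos === -1) {
      throw new Error(`token not in document: ${token}`);
    }
    return docPos - queryPos;
  });
  return Math.max(...offsets) - Math.min(...offsets);
}

// Naive stand-in for the real analyzer: lowercase, split on whitespace.
const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);

const doc = tokenize('52 Görlitzer Straße');   // ['52', 'görlitzer', 'straße']
const query = tokenize('Görlitzer Straße 52'); // ['görlitzer', 'straße', '52']
console.log(requiredSlop(query, doc)); // 3
```

The two street-name tokens each sit one position later in the document (offset +1), while "52" sits two positions earlier (offset -2), so the spread is 1 - (-2) = 3. That is consistent with the observation that a slop of 2 fails and 3 passes.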

I will add integration tests against pelias/schema in a separate PR. The value of 3 was confirmed with the following test case (a slop of 2 fails, 3 passes):

```javascript
// note: assumes `elastictest` and the `schema` module are required at the
// top of the pelias/schema integration test file

// test the minimum amount of slop required to retrieve documents
module.exports.tests.slop = function(test, common){
  test( 'slop', function(t){

    var suite = new elastictest.Suite( null, { schema: schema } );
    suite.action( function( done ){ setTimeout( done, 500 ); }); // wait for es to bring some shards up

    // index a document
    suite.action( function( done ){
      suite.client.index({
        index: suite.props.index,
        type: 'test',
        id: '1',
        body: { name: { default: '52 Görlitzer Straße' } }
      }, done);
    });

    // search using 'peliasQueryFullToken'
    // in this case we require a slop of 3 to return the same
    // record with the street number and street name reversed
    // (as is common in European countries, such as Germany).
    suite.assert( function( done ){
      suite.client.search({
        index: suite.props.index,
        type: 'test',
        body: { query: { match: {
          'name.default': {
            'analyzer': 'peliasQueryFullToken',
            'query': 'Görlitzer Straße 52',
            'type': 'phrase',
            'slop': 3
          }
        }}}
      }, function( err, res ){
        t.equal( err, undefined );
        t.equal( res.hits.total, 1, 'document found' );
        done();
      });
    });

    suite.run( t.end );
  });
};
```

closes pelias/pelias#307
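The `'phrase:slop'` setting changed in this PR ends up as the `slop` parameter of a phrase query like the one in the test. Below is a minimal sketch of that mapping, assuming a flat defaults object keyed in the style of the PR title. Only `'phrase:slop'` and its value come from this PR. The `'phrase:field'` and `'phrase:analyzer'` keys and the `buildPhraseQuery` helper are hypothetical and do not reproduce the pelias/api source.

```javascript
// Hypothetical flat defaults; only 'phrase:slop' (now 3) comes from this PR.
const defaults = {
  'phrase:field': 'name.default',
  'phrase:analyzer': 'peliasQueryFullToken',
  'phrase:slop': 3
};

// Build a match/phrase body in the same shape the test sends to Elasticsearch
// (older `match` + `type: 'phrase'` syntax; newer versions use `match_phrase`).
function buildPhraseQuery(settings, text) {
  return {
    query: {
      match: {
        [settings['phrase:field']]: {
          analyzer: settings['phrase:analyzer'],
          query: text,
          type: 'phrase',
          slop: settings['phrase:slop']
        }
      }
    }
  };
}

const body = buildPhraseQuery(defaults, 'Görlitzer Straße 52');
console.log(JSON.stringify(body, null, 2));
```

Keeping the slop in a single defaults entry means a change like 2 to 3 is a one-line config edit rather than a change to the query-building code.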

hannesj · 2016-04-07T12:22:58Z

This still wouldn't fix all issues, as there are street names consisting of three parts, such as Minna Canthin katu.
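This point can be checked with the same simplified offset model of Lucene's sloppy phrase matching. Assuming unique tokens and naive whitespace tokenization, moving the house number from one end of an n-token street name to the other costs n + 1. A three-part name therefore needs a slop of 4, one more than the new setting. `slopForMovedNumber` is a hypothetical helper for illustration.

```javascript
// Simplified sloppy-phrase model (unique tokens assumed):
// required slop = max(offset) - min(offset), offset = docPos - queryPos.
function requiredSlop(queryTokens, docTokens) {
  const offsets = queryTokens.map((t, q) => docTokens.indexOf(t) - q);
  return Math.max(...offsets) - Math.min(...offsets);
}

// Document indexed as "<number> <street>", queried as "<street> <number>".
function slopForMovedNumber(streetName, number) {
  const name = streetName.toLowerCase().split(/\s+/);
  return requiredSlop([...name, number], [number, ...name]);
}

console.log(slopForMovedNumber('Görlitzer Straße', '52'));   // 3
console.log(slopForMovedNumber('Minna Canthin katu', '32')); // 4
```

Every street-name token shifts by one position (offset +1), while the number moves across all n of them (offset -n). The spread is n + 1, so any fixed slop leaves longer street names unmatched.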

riordan · 2016-04-07T13:52:48Z

@hannesj: yes, you're absolutely correct.

However, when combined with libpostal's improved address parsing, we'll be able to apply this rule more precisely, based on the parsed address components.

e.g.

```
In [4]: parser.parse_address(" Minna Canthin katu 32")
Out[4]: [(u'minna canthin katu', u'road'), (u'32', u'house_number')]

In [5]: parser.parse_address("katu Minna Canthin 32")
Out[5]: [(u'katu minna canthin', u'road'), (u'32', u'house_number')]

In [6]: parser.parse_address("32 Minna Cathin katu")
Out[6]: [(u'32', u'house_number'), (u'minna', u'house'), (u'cathin katu', u'road')]

In [7]: parser.parse_address("Görlitzer Straße 52")
Out[7]: [(u'goerlitzer strasse', u'road'), (u'52', u'house_number')]
```

It didn't parse "32 Minna Cathin katu" correctly, but we can work on building in some redundancy and add such failures to the training data for the next model.
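Applying the rule "based on component parts" could look roughly like the sketch below. It takes libpostal-style (value, label) pairs, like the outputs above, and matches each component separately, so the order of street and house number in the input no longer matters. This is a hypothetical sketch, not the Pelias implementation: the field names `address_parts.street` and `address_parts.number` are assumptions, and the index would need the same normalization libpostal applies (e.g. "goerlitzer strasse").

```javascript
// Hypothetical: build an Elasticsearch bool query from libpostal-style
// [value, label] pairs. Field names are assumptions for illustration.
function buildComponentQuery(parsed) {
  const must = [];
  for (const [value, label] of parsed) {
    if (label === 'road') {
      // street name must appear as a phrase, in its own field
      must.push({ match_phrase: { 'address_parts.street': value } });
    } else if (label === 'house_number') {
      must.push({ match: { 'address_parts.number': value } });
    }
    // other labels (e.g. 'house') are ignored in this sketch
  }
  return { query: { bool: { must } } };
}

const parsed = [['goerlitzer strasse', 'road'], ['52', 'house_number']];
console.log(JSON.stringify(buildComponentQuery(parsed), null, 2));
```

The mis-parse of "32 Minna Cathin katu" shows the weakness of this approach. There, "minna" is labelled `house` and is dropped, so the street clause only contains "cathin katu". A parser error therefore changes the query itself, which is why redundancy and retraining matter.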

hannesj · 2016-04-07T13:55:39Z

Thanks. Do you have an estimate of when you will start working on the libpostal integration?

orangejulius · 2016-04-12T21:53:55Z

This gets my 👍 after an acceptance test and a load test on dev or prod_build.

add slop tests for pelias/api#489

increase autocomplete 'phrase:slop' setting from 2->3

3a789b4

missinglink self-assigned this Apr 7, 2016

missinglink added the in review label Apr 7, 2016

missinglink added this to the Autocomplete Improvements milestone Apr 7, 2016

missinglink added a commit to pelias/schema that referenced this pull request Apr 7, 2016

add slop tests for pelias/api#489

1a04114

missinglink mentioned this pull request Apr 7, 2016

add slop tests for https://github.com/pelias/api/pull/489 pelias/schema#122

Merged

missinglink added a commit to pelias/schema that referenced this pull request Apr 13, 2016

Merge pull request #122 from pelias/slop_tests

3bb8558

add slop tests for pelias/api#489

missinglink mentioned this pull request Apr 29, 2016

autocomplete milestone #526

Merged

orangejulius merged commit 3a789b4 into master Apr 29, 2016

orangejulius removed the in review label Apr 29, 2016

orangejulius deleted the autocomplete_increase_slop branch May 25, 2016 01:42
