increase autocomplete 'phrase:slop' setting from 2->3 #489

Merged
merged 1 commit into from
Apr 29, 2016

missinglink
Member

This PR resolves pelias/pelias#307.

By increasing the phrase slop from 2 to 3, we are able to match "52 Görlitzer Straße" with "Görlitzer Straße 52".

I will add integration tests against the schema repository in a separate PR. The value of 3 was confirmed with the following test case (a slop of 2 fails, but 3 passes):
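A minimal sketch of why the reversed address needs a slop of 3, assuming Lucene's sloppy-phrase rule: for each query term, take its position in the document minus its position in the query; the phrase matches when the spread (max minus min) of those offsets is at most the slop. The function `requiredSlop` is hypothetical, and the sketch assumes one token per word, consecutive positions, and no repeated terms. The real `peliasQueryFullToken` analyzer may emit tokens differently.

```javascript
// Hypothetical helper illustrating a Lucene-style sloppy phrase distance.
// Assumptions: one token per word, consecutive token positions, and no term
// repeated in either the query or the document.
function requiredSlop(queryTokens, docTokens) {
  // Offset of each query term = (position in document) - (position in query).
  const offsets = queryTokens.map((term, queryPos) => {
    const docPos = docTokens.indexOf(term);
    return docPos === -1 ? NaN : docPos - queryPos;
  });
  if (offsets.some(Number.isNaN)) {
    return Infinity; // a term is missing, so no amount of slop can match
  }
  // The phrase matches when the spread of the offsets is <= slop.
  return Math.max(...offsets) - Math.min(...offsets);
}

const query = ['görlitzer', 'straße', '52'];
const doc = ['52', 'görlitzer', 'straße'];

console.log(requiredSlop(query, doc));             // 3
console.log(requiredSlop(['a', 'b'], ['b', 'a'])); // 2
```

Under this rule, swapping two adjacent words costs 2, so moving the house number across a two-word street name costs 3.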

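var elastictest = require('elastictest');
var schema = require('../schema'); // path assumed; point this at the schema module

// test the minimum amount of slop required to retrieve documents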
module.exports.tests.slop = function(test, common){
  test( 'slop', function(t){

    var suite = new elastictest.Suite( null, { schema: schema } );
    suite.action( function( done ){ setTimeout( done, 500 ); }); // wait for es to bring some shards up

    // index a document
    suite.action( function( done ){
      suite.client.index({
        index: suite.props.index,
        type: 'test',
        id: '1',
        body: { name: { default: '52 Görlitzer Straße' } }
      }, done);
    });

    // search using 'peliasQueryFullToken'
    // in this case we require a slop of 3 to return the same
    // record with the street number and street name reversed.
    // (as is common in European countries, such as Germany).
    suite.assert( function( done ){
      suite.client.search({
        index: suite.props.index,
        type: 'test',
        body: { query: { match: {
          'name.default': {
            'analyzer': 'peliasQueryFullToken',
            'query': 'Görlitzer Straße 52',
            'type': 'phrase',
            'slop': 3,
          }
        }}}
      }, function( err, res ){
        t.equal( err, undefined );
        t.equal( res.hits.total, 1, 'document found' );
        done();
      });
    });

    suite.run( t.end );
  });
};

closes pelias/pelias#307

@hannesj
Contributor

hannesj commented Apr 7, 2016

This still wouldn't fix all issues, as some street names consist of three parts, such as Minna Canthin katu.
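A worked check of this point, assuming Lucene's sloppy-phrase rule (each query term found at document position $p_i$ and query position $i$ has offset $p_i - i$; the phrase matches when the spread of offsets is at most the slop) and one token per word. Querying "Minna Canthin katu 32" against a document reading "32 Minna Canthin katu":

```latex
\begin{aligned}
\text{minna}:&\quad 1 - 0 = 1 \\
\text{canthin}:&\quad 2 - 1 = 1 \\
\text{katu}:&\quad 3 - 2 = 1 \\
\text{32}:&\quad 0 - 3 = -3 \\
\text{required slop} &= \max_i (p_i - i) - \min_i (p_i - i) = 1 - (-3) = 4
\end{aligned}
```

In general, moving the house number across an $n$-word street name gives a spread of $n + 1$. For $n = 2$ ("Görlitzer Straße") that is 3, while for $n = 3$ it is 4, so a slop of 3 would not be enough.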

@riordan
Contributor

riordan commented Apr 7, 2016

@hannesj: yes, you're absolutely correct.

However, when combined with Libpostal's improved address parsing, we'll be able to better apply this rule based on component parts.

e.g.

In [4]: parser.parse_address(" Minna Canthin katu 32")
Out[4]: [(u'minna canthin katu', u'road'), (u'32', u'house_number')]

In [5]: parser.parse_address("katu Minna Canthin 32")
Out[5]: [(u'katu minna canthin', u'road'), (u'32', u'house_number')]

In [6]: parser.parse_address("32 Minna Cathin katu")
Out[6]: [(u'32', u'house_number'), (u'minna', u'house'), (u'cathin katu', u'road')]

In [7]: parser.parse_address("Görlitzer Straße 52")
Out[7]: [(u'goerlitzer strasse', u'road'), (u'52', u'house_number')]

It didn't get "32 Minna Cathin katu" right, but we can build in some redundancy and add failures like this one to the training data for the next model.
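One way to "apply this rule based on component parts" can be sketched as follows: once the parser has split the input into a road and a house number, each part is matched on its own, so the order of the parts in the input no longer affects the slop needed. This is a hypothetical illustration, not Pelias's implementation. The function `buildComponentQuery` and the field names `address.street` and `address.number` are assumptions; only the `bool`, `match_phrase`, and `match` clauses are standard Elasticsearch query DSL.

```javascript
// Hypothetical sketch: build an Elasticsearch bool query from libpostal-style
// [value, label] pairs. Field names are assumptions for illustration only.
function buildComponentQuery(parsed) {
  const must = [];
  for (const [value, label] of parsed) {
    if (label === 'road') {
      // Match the street name as a phrase against the street field only,
      // so the house number's position in the input is irrelevant.
      must.push({ match_phrase: { 'address.street': { query: value, slop: 0 } } });
    } else if (label === 'house_number') {
      must.push({ match: { 'address.number': value } });
    }
    // Other labels (e.g. 'house') are ignored in this sketch.
  }
  return { query: { bool: { must } } };
}

// Parser output taken from the 'Görlitzer Straße 52' example above.
const body = buildComponentQuery([
  ['goerlitzer strasse', 'road'],
  ['52', 'house_number'],
]);
console.log(JSON.stringify(body, null, 2));
```

The trade-off is that this relies on the parser labelling components correctly, which is why the mis-parse of "32 Minna Cathin katu" matters.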

@hannesj
Contributor

hannesj commented Apr 7, 2016 via email

@orangejulius
Member

This gets my 👍 after an acceptance-test and loadtest on dev or prod_build.

missinglink added a commit to pelias/schema that referenced this pull request Apr 13, 2016
@orangejulius orangejulius merged commit 3a789b4 into master Apr 29, 2016
@orangejulius orangejulius deleted the autocomplete_increase_slop branch May 25, 2016 01:42
Development

Successfully merging this pull request may close these issues.

autocomplete: documents should be retrievable using their label text
4 participants