proto elastic suggester #1549

Merged: 31 commits, Nov 12, 2019
Commits
de4c163
proto elastic suggester
rmelisson Oct 23, 2019
7637bcb
elastic suggester : indexation ok
rmelisson Oct 23, 2019
f38c613
Elastic suggester : indexing / API / tests - first pass
rmelisson Oct 28, 2019
b0ff2ed
use structured json suggestion
rmelisson Oct 28, 2019
f871a32
add test cases
rmelisson Oct 29, 2019
49b382f
lucene suggester : cleaning, populate
rmelisson Oct 29, 2019
dcade21
elastic suggester: uncomment
rmelisson Oct 29, 2019
8d33b67
Elastic suggester : fix data/ tests
rmelisson Oct 29, 2019
0efb283
Elastic suggester : try to link it to frontend
rmelisson Oct 29, 2019
82e175d
elastic suggester : Docker / download suggestions
rmelisson Oct 30, 2019
653f9da
elastic suggester : linter issue
rmelisson Oct 30, 2019
2e49e55
elastic suggester : linter issue
rmelisson Oct 30, 2019
a7984d3
Docker data : alpine install curl
rmelisson Oct 30, 2019
4fd977a
linter dockerfile
rmelisson Oct 30, 2019
d001022
docker file linter hell
rmelisson Oct 30, 2019
91576bc
docker update curl version
rmelisson Oct 30, 2019
bf09498
add small boost on weight
rmelisson Oct 30, 2019
72e50ee
add stop words
rmelisson Oct 30, 2019
314549e
update prefix search : replace match_phrase_prefix by match_phrase an…
rmelisson Nov 5, 2019
7b38f8e
include comments from PR
rmelisson Nov 5, 2019
51cc2f5
update test
rmelisson Nov 5, 2019
3fbe94c
fake edit to relaunch ci
rmelisson Nov 5, 2019
3c13b1c
remove python suggestion endpoint
rmelisson Nov 5, 2019
54403ea
remove python suggester - continuation
rmelisson Nov 5, 2019
f4e4c41
python suggester continuation : Dockerfile update
rmelisson Nov 6, 2019
1ec6976
remove changelog comment
rmelisson Nov 6, 2019
4a42de1
remove python suggester : scissors are too big
rmelisson Nov 6, 2019
1984f54
remove python suggester : stopwords are hidden in the suggestion files
rmelisson Nov 6, 2019
7629c4e
yarn lock problem
rmelisson Nov 6, 2019
84b35fe
restore changelog
rmelisson Nov 6, 2019
98670af
elastic suggester : update according to PR remarks
rmelisson Nov 12, 2019
1 change: 1 addition & 0 deletions .gitignore
@@ -5,5 +5,6 @@ dist
docker-compose.override.yml
__pycache__
packages/code-du-travail-data/dump.json
packages/code-du-travail-data/dataset/suggestions.txt
package-lock.json
yarn-error.log
2 changes: 1 addition & 1 deletion .k8s/frontend/deployment.dev.yml
@@ -45,7 +45,7 @@ spec:
- name: PIWIK_URL
value: ${CI_PIWIK_URL}
- name: SUGGEST_URL
- value: "${NLP_URL}/api/suggest"
+ value: "${API_URL}/api/v1/suggest"
- name: VERSION
value: "${VERSION}"
initContainers:
2 changes: 1 addition & 1 deletion packages/code-du-travail-api/package.json
@@ -12,7 +12,7 @@
"dev": "NLP_URL=http://localhost:1337/nlp nodemon ./src/server/index.js",
"dev-with-nlp": "nodemon ./src/server/index.js",
"pretest": "NODE_ENV=test node -r esm tests/create_indexes.js",
- "test": "ELASTICSEARCH_DOCUMENT_INDEX=cdtn_document_test ELASTICSEARCH_CONVENTION_INDEX=cdtn_convention_test ELASTICSEARCH_THEME_INDEX=cdtn_theme_test jest",
+ "test": "ELASTICSEARCH_SUGGESTION_INDEX=cdtn_suggestion_test ELASTICSEARCH_DOCUMENT_INDEX=cdtn_document_test ELASTICSEARCH_CONVENTION_INDEX=cdtn_convention_test ELASTICSEARCH_THEME_INDEX=cdtn_theme_test jest",
"elastic": "node scripts/elastic.js"
},
"repository": {
2 changes: 2 additions & 0 deletions packages/code-du-travail-api/src/server/index.js
@@ -14,6 +14,7 @@ const searchRoutes = require("./routes/search");
const versionRoutes = require("./routes/version");
const docsRoutes = require("./routes/docs");
const themesRoute = require("./routes/themes");
const suggestRoute = require("./routes/suggest");

const { logger } = require("./utils/logger");

@@ -55,6 +56,7 @@ app.use(searchRoutes.routes());
app.use(itemsRoutes.routes());
app.use(versionRoutes.routes());
app.use(themesRoute.routes());
app.use(suggestRoute.routes());

app.use(docsRoutes);

@@ -0,0 +1,49 @@
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`ensure results are only returned when enough characters passed 1`] = `Array []`;

exports[`fuzzy matching results are lower than exact matchs 1`] = `
Array [
"contractuelle",
"composition",
]
`;


exports[`fuzzy matching works 1`] = `
Array [
"retraite",
]
`;

exports[`return suggestions for re in the right format 1`] = `
Array [
"renseignements",
"repos",
"retraite",
"réintégration",
"déplacement régulier",
]
`;

exports[`when query match several suggestions with same prefix,
ensure order is based on rank 1`] = `
Array [
"renseignements",
"repos",
"retraite",
"réintégration",
"déplacement régulier",
]
`;

exports[`when query match several suggestions with same rank,
ensure order is based on query prefix matching position 1`] = `
Array [
"renseignements",
"repos",
"retraite",
"réintégration",
"déplacement régulier",
]
`;
@@ -0,0 +1,49 @@
const request = require("supertest");
const Koa = require("koa");
const router = require("../suggest");

const app = new Koa();
app.use(router.routes());

function getSuggestions(query) {
  return request(app.callback()).get(`/api/v1/suggest?q=` + query);
}

test("return suggestions for re in the right format", async () => {
  const response = await getSuggestions("re");
  expect(response.status).toBe(200);
  expect(response.get("Content-Type")).toMatch(/json/);
  expect(response.body).toMatchSnapshot();
});

test("accentuation is ignored", async () => {
  const response = await getSuggestions("ré");
  expect(response.body.includes("retraite")).toBeTruthy();
});

test(`when query match several suggestions with same rank,
ensure order is based on query prefix matching position`, async () => {
  const response = await getSuggestions("ré");
  expect(response.body).toMatchSnapshot();
});

test(`when query match several suggestions with same prefix,
ensure order is based on rank`, async () => {
  const response = await getSuggestions("re");
  expect(response.body).toMatchSnapshot();
});

test("fuzzy matching works", async () => {
  const response = await getSuggestions("reta");
  expect(response.body).toMatchSnapshot();
});

test("fuzzy matching results are lower than exact matchs", async () => {
  const response = await getSuggestions("con");
  expect(response.body).toMatchSnapshot();
});

test("ensure results are only returned when enough characters passed", async () => {
  const response = await getSuggestions("d");
  expect(response.body).toMatchSnapshot();
});
36 changes: 36 additions & 0 deletions packages/code-du-travail-api/src/server/routes/suggest/index.js
@@ -0,0 +1,36 @@
const Router = require("koa-router");
const API_BASE_URL = require("../v1.prefix");
const elasticsearchClient = require("../../conf/elasticsearch.js");
const { getSuggestQuery } = require("./suggest.elastic.js");

const index = process.env.ELASTICSEARCH_SUGGESTION_INDEX || "cdtn_suggestions";

const router = new Router({ prefix: API_BASE_URL });

const minQueryLength = 2;
const suggestionsSize = 5;

/**
 * Return the search suggestion
 *
 * @example
 * http://localhost:1337/api/v1/suggest?q=aba
 *
 * @returns {Object} List of matching suggestions.
 */
router.get("/suggest", async ctx => {
  const { q = "", size = suggestionsSize } = ctx.request.query;

  if (q.length >= minQueryLength) {
    const body = getSuggestQuery(q, size);
    const response = await elasticsearchClient.search({
      index,
      body
    });
    ctx.body = response.body.hits.hits.map(t => t._source.title);
  } else {
    ctx.body = [];
  }
});

module.exports = router;
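
Review note: a caller of this endpoint has to percent-encode accented queries such as "ré". The sketch below shows one way to build the request URL. `buildSuggestUrl` is a hypothetical helper that is not part of this PR, and the host is only the one used in the JSDoc example above.

```javascript
// Hypothetical helper (not part of this PR): builds the URL for
// GET /api/v1/suggest and percent-encodes the query text.
function buildSuggestUrl(baseUrl, q, size) {
  const url = new URL("/api/v1/suggest", baseUrl);
  url.searchParams.set("q", q);
  // "size" is optional; the route falls back to its own default when absent.
  if (size !== undefined) {
    url.searchParams.set("size", String(size));
  }
  return url.toString();
}

console.log(buildSuggestUrl("http://localhost:1337", "ré", 3));
```

The route reads `size` from the query string, so it arrives as a string. Whether that needs validating before it reaches Elasticsearch is left to the maintainers.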
@@ -0,0 +1,40 @@
function getSuggestQuery(query, size) {
  return {
    _source: ["title"],
    size: size,
    query: {
      bool: {
        must: [
          {
            match: {
              title: {
                query,
                fuzziness: "auto"
              }
            }
          }
        ],
        should: [
          {
            match_phrase_prefix: {
              "title.prefix": {
                query
              }
            }
          },
          {
            rank_feature: {
              field: "ranking",
              log: {
                scaling_factor: 1
              },
              boost: 3
            }
          }
        ]
      }
    }
  };
}

module.exports = { getSuggestQuery };
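
Review note: the snapshots list "renseignements" (ranking 42) first, and the `rank_feature` clause is the likely reason. Elasticsearch documents its `log` function as `log(scaling_factor + S)`, multiplied by the clause boost. The sketch below only evaluates that formula for the rankings in the test fixture. `rankFeatureBoost` is a hypothetical name. Real scores will differ, because the match clauses add their own score and Lucene stores feature values with reduced precision.

```javascript
// Hypothetical helper: approximate score added by the rank_feature clause
// (log function, natural logarithm), using the PR's parameters as defaults.
function rankFeatureBoost(ranking, scalingFactor = 1, boost = 3) {
  // The test fixture stores rankings as strings, hence Number().
  return boost * Math.log(scalingFactor + Number(ranking));
}

for (const ranking of ["1", "2", "42"]) {
  console.log(ranking, rankFeatureBoost(ranking).toFixed(2));
}
```

Because the contribution grows logarithmically, a ranking of 42 still outweighs a ranking of 1, but by far less than a factor of 42.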
14 changes: 14 additions & 0 deletions packages/code-du-travail-api/tests/create_indexes.js
@@ -11,12 +11,15 @@ import documents from "./cdtn_document_data.json";
import { conventionCollectiveMapping } from "@cdt/data/indexing/convention_collective.mapping";
import conventions from "./convention_data.json";
import { themesMapping } from "@cdt/data/indexing/themes.mapping";
import { suggestionMapping } from "@cdt/data/indexing/suggestion.mapping";
import suggestions from "./suggestions_data.json";

const themes = documents.filter(document => document.source === SOURCES.THEMES);

const documentIndexName = "cdtn_document_test";
const themeIndexName = "cdtn_theme_test";
const conventionsIndexName = "cdtn_convention_test";
const suggestionsIndexName = "cdtn_suggestion_test";

async function main() {
await version({ client });
@@ -52,6 +55,17 @@ async function main() {
    indexName: conventionsIndexName,
    documents: conventions
  });

  await createIndex({
    client,
    indexName: suggestionsIndexName,
    mappings: suggestionMapping
  });
  await indexDocumentsBatched({
    client,
    indexName: suggestionsIndexName,
    documents: suggestions
  });
}

main();
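
Review note: the `suggestionMapping` imported above does not appear in this excerpt. The query targets `title`, `title.prefix` and a `rank_feature` clause on `ranking`, so a compatible mapping needs at least those three fields. The sketch below is a guess at the minimal shape and is not the actual `@cdt/data/indexing/suggestion.mapping` file. The real mapping presumably also configures analyzers, for example for accent folding and stop words, which are omitted here.

```javascript
// Hypothetical minimal mapping compatible with the suggest query.
// Assumption only: this is NOT the real suggestion.mapping from @cdt/data.
const sketchSuggestionMapping = {
  properties: {
    title: {
      type: "text",
      fields: {
        // Sub-field targeted by match_phrase_prefix ("title.prefix").
        prefix: { type: "text" }
      }
    },
    // A rank_feature query needs a rank_feature (or rank_features) field.
    ranking: { type: "rank_feature" }
  }
};

console.log(JSON.stringify(sketchSuggestionMapping, null, 2));
```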
14 changes: 14 additions & 0 deletions packages/code-du-travail-api/tests/suggestions_data.json
@@ -0,0 +1,14 @@
[
{ "ranking": "1", "title": "préavis" },
{ "ranking": "1", "title": "urgent" },
{ "ranking": "1", "title": "déduction" },
{ "ranking": "1", "title": "avertissement" },
{ "ranking": "1", "title": "retraite" },
{ "ranking": "1", "title": "repos" },
{ "ranking": "42", "title": "renseignements" },
{ "ranking": "1", "title": "réintégration" },
{ "ranking": "1", "title": "férié" },
{ "ranking": "2", "title": "déplacement régulier" },
{ "ranking": "1", "title": "contractuelle" },
{ "ranking": "1", "title": "composition" }
]
1 change: 1 addition & 0 deletions packages/code-du-travail-data/.dockerignore
@@ -4,3 +4,4 @@
**/.docz
**/coverage
**/fiches_service_public/data*
dataset/suggestions.txt
6 changes: 6 additions & 0 deletions packages/code-du-travail-data/Dockerfile
@@ -12,6 +12,8 @@ FROM ${NLP_IMAGE} as cdtn-nlp-image

FROM node:10-alpine

RUN apk add --no-cache curl=7.64.0-r3

COPY ./package.json /app/package.json
COPY --from=cdtn-base-image /app/packages/code-du-travail-data/dist /app/dist
COPY --from=cdtn-nlp-image /app/data/dump.tf.json /app/dump.tf.json
@@ -22,6 +24,10 @@ COPY ./dataset/stop_words/stop_words.json ./dataset/stop_words/stop_words.json
COPY ./dataset/synonyms/synonyms.json ./dataset/synonyms/synonyms.json
COPY ./dataset/datafiller/themes.data.json ./dataset/datafiller/themes.data.json

ENV SUGGEST_DATA_URL=https://gist.github.com/rmelisson/31a6a17284d4022baa1faeda13afcc3a/raw/05d8138deed49cbab058b474fa8a0594b233bca2/cdtn_entities.txt
RUN curl -L $SUGGEST_DATA_URL -o ./dataset/suggestions.txt

WORKDIR /app
ENV SUGGEST_FILE=../dataset/suggestions.txt
ENV DUMP_PATH=../dump.tf.json
ENTRYPOINT ["yarn", "populate"]