diff --git a/docs/notebooks/api_user_guide/4_search.ipynb b/docs/notebooks/api_user_guide/4_search.ipynb index ade3ccabb..0337a757f 100644 --- a/docs/notebooks/api_user_guide/4_search.ipynb +++ b/docs/notebooks/api_user_guide/4_search.ipynb @@ -2228,7 +2228,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In the previous request we made use of the whoosh query language which can be used to do complex text search. It supports the boolean operators AND, OR and NOT to combine the search terms. If a space is given between two words as in the example above, this corresponds to the operator AND. Brackets '()' can also be used. The example above also shows the use of the wildcard operator '*' which can represent any numer of characters. The wildcard operator '?' always represents only one character. It is also possible to match a range of terms by using square brackets '[]' and TO, e.g. [A TO D] will match all words in the lexical range between A and D. Below you can find some examples for the different operators." + "In the previous request we made use of the [whoosh query language](https://whoosh.readthedocs.io/en/latest/querylang.html#the-default-query-language) which can be used to do complex text search. It supports the boolean operators `AND`, `OR` and `NOT` to combine the search terms. If a space is given between two words as in the example above, this corresponds to the operator `AND`. Brackets `()` can also be used. The example above also shows the use of the wildcard operator `*` which can represent any number of characters. The wildcard operator `?` always represents only one character. It is also possible to match a range of terms by using square brackets `[]` and `TO`, e.g. `[A TO D]` will match all words in the lexical range between A and D. Below you can find some examples for the different operators." 
] }, { @@ -2274,9 +2274,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "returns all product types where the platform is either LANDSAT or SENTINEL1:\n", - "\n", - "['L57_REFLECTANCE', 'LANDSAT_C2L1', 'LANDSAT_C2L2', 'LANDSAT_C2L2ALB_BT', 'LANDSAT_C2L2ALB_SR', 'LANDSAT_C2L2ALB_ST', 'LANDSAT_C2L2ALB_TA', 'LANDSAT_C2L2_SR', 'LANDSAT_C2L2_ST', 'LANDSAT_ETM_C1', 'LANDSAT_ETM_C2L1', 'LANDSAT_ETM_C2L2', 'LANDSAT_TM_C1', 'LANDSAT_TM_C2L1', 'LANDSAT_TM_C2L2', 'S1_SAR_GRD', 'S1_SAR_OCN', 'S1_SAR_RAW', 'S1_SAR_SLC']" + "returns all product types where the platform is either LANDSAT or SENTINEL1." ] }, { @@ -2319,9 +2317,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "returns all product types which contain either the keywords LANDSAT and collection2 or the keyword SAR:\n", - "\n", - "['LANDSAT_C2L1', 'LANDSAT_C2L2', 'LANDSAT_C2L2ALB_BT', 'LANDSAT_C2L2ALB_SR', 'LANDSAT_C2L2ALB_ST', 'LANDSAT_C2L2ALB_TA', 'LANDSAT_C2L2_SR', 'LANDSAT_C2L2_ST', 'LANDSAT_ETM_C2L1', 'LANDSAT_ETM_C2L2', 'LANDSAT_TM_C2L1', 'LANDSAT_TM_C2L2', 'S1_SAR_GRD', 'S1_SAR_OCN', 'S1_SAR_RAW', 'S1_SAR_SLC']" + "returns all product types which contain either the keywords LANDSAT and collection2 or the keyword SAR." ] }, { @@ -2366,9 +2362,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "returns all product types where the platformSerialIdentifier is composed of 'L' and one other character:\n", - "\n", - "['L57_REFLECTANCE', 'L8_OLI_TIRS_C1L1', 'L8_REFLECTANCE', 'LANDSAT_C2L1', 'LANDSAT_C2L2', 'LANDSAT_C2L2ALB_BT', 'LANDSAT_C2L2ALB_SR', 'LANDSAT_C2L2ALB_ST', 'LANDSAT_C2L2ALB_TA', 'LANDSAT_C2L2_SR', 'LANDSAT_C2L2_ST', 'LANDSAT_ETM_C1', 'LANDSAT_ETM_C2L1', 'LANDSAT_ETM_C2L2', 'LANDSAT_TM_C1', 'LANDSAT_TM_C2L1', 'LANDSAT_TM_C2L2']" + "returns all product types where the platformSerialIdentifier is composed of 'L' and one other character." 
] }, { @@ -2439,9 +2433,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "returns all product types where the platform is SENTINEL1, SENTINEL2 or SENTINEL3:\n", - "\n", - "['S1_SAR_GRD', 'S1_SAR_OCN', 'S1_SAR_RAW', 'S1_SAR_SLC', 'S2_MSI_L1C', 'S2_MSI_L2A', 'S2_MSI_L2A_COG', 'S2_MSI_L2A_MAJA', 'S2_MSI_L2B_MAJA_SNOW', 'S2_MSI_L2B_MAJA_WATER', 'S2_MSI_L3A_WASP', 'S3_EFR', 'S3_ERR', 'S3_LAN', 'S3_OLCI_L2LFR', 'S3_OLCI_L2LRR', 'S3_OLCI_L2WFR', 'S3_OLCI_L2WRR', 'S3_RAC', 'S3_SLSTR_L1RBT', 'S3_SLSTR_L2AOD', 'S3_SLSTR_L2FRP', 'S3_SLSTR_L2LST', 'S3_SLSTR_L2WST', 'S3_SRA', 'S3_SRA_A', 'S3_SRA_BS', 'S3_SY_AOD', 'S3_SY_SYN', 'S3_SY_V10', 'S3_SY_VG1', 'S3_SY_VGP', 'S3_WAT']" + "returns all product types where the platform is SENTINEL1, SENTINEL2 or SENTINEL3." ] }, { @@ -2454,7 +2446,7 @@ }, { "cell_type": "code", - "execution_count": 74, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -2477,7 +2469,7 @@ " 'LANDSAT_TM_C2L2']" ] }, - "execution_count": 74, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } @@ -2491,29 +2483,68 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "['LANDSAT_C2L1', \n", - "'L57_REFLECTANCE', \n", - "'LANDSAT_C2L2', \n", - "'LANDSAT_C2L2ALB_BT', \n", - "'LANDSAT_C2L2ALB_SR', \n", - "'LANDSAT_C2L2ALB_ST', \n", - "'LANDSAT_C2L2ALB_TA', \n", - "'LANDSAT_C2L2_SR', \n", - "'LANDSAT_C2L2_ST', \n", - "'LANDSAT_ETM_C1', \n", - "'LANDSAT_ETM_C2L1', \n", - "'LANDSAT_ETM_C2L2', \n", - "'LANDSAT_TM_C1', \n", - "'LANDSAT_TM_C2L1', \n", - "'LANDSAT_TM_C2L2']" + "The product types in the result are ordered by how well they match the criteria. In the example above only the first product type (LANDSAT_C2L1) matches the second parameter (platformSerialIdentifier=\"L1\"), all other product types only match the first criterion. Therefore, it is usually best to use the first product type in the list as it will be the one that fits best." 
] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "The product types in the result are ordered by how well they match the criteria. In the example above only the first product type (LANDSAT_C2L1) matches the second parameter (platformSerialIdentifier=\"L1\"), all other product types only match the first criterion. Therefore, it is usually best to use the first product type in the list as it will be the one that fits best." + "Per-parameter guesses are joined using a `UNION` by default (`intersect=False`). This can also be changed to an intersection:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['LANDSAT_C2L1']" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dag.guess_product_type(platform=\"LANDSAT\", platformSerialIdentifier=\"L1\", intersect=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A [Whoosh query language](https://whoosh.readthedocs.io/en/latest/querylang.html#the-default-query-language) *free text search* can also be passed to the method; it will be used to search in the `title`, `abstract` and `keywords` fields:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['ERA5_SL_MONTHLY',\n", + " 'ERA5_PL_MONTHLY',\n", + " 'ERA5_LAND_MONTHLY',\n", + " 'ERA5_SL',\n", + " 'ERA5_PL',\n", + " 'GLOFAS_SEASONAL_REFORECAST',\n", + " 'SEASONAL_MONTHLY_PL',\n", + " 'SEASONAL_MONTHLY_SL']" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dag.guess_product_type(\"ECMWF AND MONTHLY\")" ] }, { diff --git a/eodag/api/core.py b/eodag/api/core.py index 2407732d8..92716831e 100644 --- a/eodag/api/core.py +++ b/eodag/api/core.py @@ -289,7 +289,7 @@ def build_index(self) -> None: product_types_schema = Schema( ID=fields.STORED, 
alias=fields.ID, - abstract=fields.STORED, + abstract=fields.TEXT, instrument=fields.IDLIST, platform=fields.ID, platformSerialIdentifier=fields.IDLIST, @@ -297,7 +297,7 @@ def build_index(self) -> None: sensorType=fields.ID, md5=fields.ID, license=fields.ID, - title=fields.ID, + title=fields.TEXT, missionStartDate=fields.ID, missionEndDate=fields.ID, keywords=fields.KEYWORD(analyzer=kw_analyzer), @@ -914,9 +914,22 @@ def get_alias_from_product_type(self, product_type: str) -> str: return self.product_types_config[product_type].get("alias", product_type) - def guess_product_type(self, **kwargs: Any) -> List[str]: - """Find eodag product types codes that best match a set of search params + def guess_product_type( + self, + free_text_filter: Optional[str] = None, + intersect: bool = False, + **kwargs: Any, + ) -> List[str]: + """Find eodag product types ids that best match a set of search params + + See https://whoosh.readthedocs.io/en/latest/querylang.html#the-default-query-language + for syntax. 
+ :param free_text_filter: whoosh compatible free text search filter used to search + `title`, `abstract` and `keywords` + :type free_text_filter: Optional[str] + :param intersect: join results for each parameter using INTERSECT instead of UNION + :type intersect: bool :param kwargs: A set of search parameters as keywords arguments :returns: The best match for the given parameters :rtype: list[str] @@ -924,6 +937,9 @@ def guess_product_type(self, **kwargs: Any) -> List[str]: """ if kwargs.get("productType", None): return [kwargs["productType"]] + free_text_search_params = ( + ["title", "abstract", "keywords"] if free_text_filter else [] + ) supported_params = { param for param in ( @@ -934,6 +950,8 @@ def guess_product_type(self, **kwargs: Any) -> List[str]: "sensorType", "keywords", "md5", + "abstract", + "title", ) if kwargs.get(param, None) is not None } @@ -941,19 +959,35 @@ def guess_product_type(self, **kwargs: Any) -> List[str]: raise EodagError("Missing product types index") with self._product_types_index.searcher() as searcher: results = None - # For each search key, do a guess and then upgrade the result (i.e. when - # merging results, if a hit appears in both results, its position is raised - # to the top. This way, the top most result will be the hit that best + # Using `upgrade_and_extend`, for each search key, do a guess and + # then upgrade the result (i.e. when merging results, + # if a hit appears in both results, its position is raised + # to the top). This way, the top most result will be the hit that best # matches the given queries. 
Put another way, this best guess is the one # that crosses the highest number of search params from the given queries + + # Always use UNION to join free_text_search results + for search_key in free_text_search_params: + query = QueryParser(search_key, self._product_types_index.schema).parse( + free_text_filter + ) + if results is None: + results = searcher.search(query, limit=None) + else: + results.upgrade_and_extend(searcher.search(query, limit=None)) + + # join results from kwargs using UNION or INTERSECT for search_key in supported_params: query = QueryParser(search_key, self._product_types_index.schema).parse( kwargs[search_key] ) if results is None: results = searcher.search(query, limit=None) + elif intersect: + results.filter(searcher.search(query, limit=None)) else: results.upgrade_and_extend(searcher.search(query, limit=None)) + guesses: List[str] = [r["ID"] for r in results or []] if guesses: return guesses diff --git a/eodag/resources/stac.yml b/eodag/resources/stac.yml index 1c408e434..bf9765e9b 100644 --- a/eodag/resources/stac.yml +++ b/eodag/resources/stac.yml @@ -62,6 +62,7 @@ conformance: - https://api.stacspec.org/v1.0.0/ogcapi-features#query - https://api.stacspec.org/v1.0.0/ogcapi-features#sort - https://api.stacspec.org/v1.0.0/collections + - https://api.stacspec.org/v1.0.0/collection-search#free-text - http://www.opengis.net/spec/ogcapi-features-1/1.0/conf/core - http://www.opengis.net/spec/ogcapi-features-1/1.0/conf/oas30 - http://www.opengis.net/spec/ogcapi-features-1/1.0/conf/geojson diff --git a/eodag/resources/stac_api.yml b/eodag/resources/stac_api.yml index 112ad4799..5b37bd903 100644 --- a/eodag/resources/stac_api.yml +++ b/eodag/resources/stac_api.yml @@ -174,9 +174,12 @@ paths: operationId: getCollections parameters: - $ref: '#/components/parameters/provider' + - $ref: '#/components/parameters/q' responses: '200': $ref: '#/components/responses/Collections' + '202': + $ref: '#/components/responses/Accepted' '500': $ref: 
'#/components/responses/ServerError' /collections/{collectionId}: @@ -1913,6 +1916,12 @@ components: text/html: schema: type: string + Accepted: + description: The request has been accepted, but the data is not yet ready. Please wait a few minutes before trying again. + content: + application/json: + schema: + $ref: '#/components/schemas/exception' Collections: description: >- The feature collections shared by this API. diff --git a/eodag/rest/stac.py b/eodag/rest/stac.py index de6d76c1f..fef6fb098 100644 --- a/eodag/rest/stac.py +++ b/eodag/rest/stac.py @@ -637,10 +637,29 @@ def __get_product_types( """ if filters is None: filters = {} + free_text_filter = filters.pop("q", None) + + # product types matching filters try: - guessed_product_types = self.eodag_api.guess_product_type(**filters) + guessed_product_types = ( + self.eodag_api.guess_product_type(**filters) if filters else [] + ) except NoMatchingProductType: guessed_product_types = [] + + # product types matching free text filter + if free_text_filter and not guessed_product_types: + whooshable_filter = " OR ".join( + [f"({x})" for x in free_text_filter.split(",")] + ) + try: + guessed_product_types = self.eodag_api.guess_product_type( + whooshable_filter + ) + except NoMatchingProductType: + guessed_product_types = [] + + # list product types with all metadata using guessed ids if guessed_product_types: product_types = [ pt diff --git a/tests/resources/ext_product_types_free_text_search.json b/tests/resources/ext_product_types_free_text_search.json new file mode 100644 index 000000000..498bf4423 --- /dev/null +++ b/tests/resources/ext_product_types_free_text_search.json @@ -0,0 +1,59 @@ +{ + "astraea_eod": { + "providers_config": { + "foo": { + "productType": "foo", + "metadata_mapping": { + "cloudCover": "$.null" + } + }, + "bar": { + "productType": "bar", + "metadata_mapping": { + "cloudCover": "$.null" + } + }, + "foobar": { + "productType": "foobar", + "metadata_mapping": { + "cloudCover": "$.null" + } 
+ } + }, + "product_types_config": { + "foo": { + "abstract": "abstractFOO - This is FOO. FooAndBar", + "instrument": "Not Available", + "platform": "Not Available", + "platformSerialIdentifier": "Not Available", + "processingLevel": "Not Available", + "keywords": "suspendisse", + "license": "WTFPL", + "title": "titleFOO - Lorem FOO collection", + "missionStartDate": "2012-12-12T00:00:00.000Z" + }, + "bar": { + "abstract": "abstractBAR - This is BAR", + "instrument": "Not Available", + "platform": "Not Available", + "platformSerialIdentifier": "Not Available", + "processingLevel": "Not Available", + "keywords": "lectus,lectus_bar_key", + "license": "WTFPL", + "title": "titleBAR - Lorem BAR collection (FooAndBar)", + "missionStartDate": "2012-12-12T00:00:00.000Z" + }, + "foobar": { + "abstract": "abstract FOOBAR - This is FOOBAR", + "instrument": "Not Available", + "platform": "Not Available", + "platformSerialIdentifier": "Not Available", + "processingLevel": "Not Available", + "keywords": "tortor", + "license": "WTFPL", + "title": "titleFOOBAR - Lorem FOOBAR collection", + "missionStartDate": "2012-12-12T00:00:00.000Z" + } + } + } +} diff --git a/tests/units/test_core.py b/tests/units/test_core.py index 29ab7b818..b6eaa26ff 100644 --- a/tests/units/test_core.py +++ b/tests/units/test_core.py @@ -517,6 +517,53 @@ def test_list_product_types_fetch_providers(self, mock_fetch_product_types_list) self.dag.list_product_types(provider="peps", fetch_providers=True) mock_fetch_product_types_list.assert_called_once_with(self.dag, provider="peps") + def test_guess_product_type_with_filter(self): + """Testing the search terms""" + + with open( + os.path.join(TEST_RESOURCES_PATH, "ext_product_types_free_text_search.json") + ) as f: + ext_product_types_conf = json.load(f) + self.dag.update_product_types_list(ext_product_types_conf) + + # Free text search: match in the abstract + filter = "ABSTRACTFOO" + product_types_ids = self.dag.guess_product_type(filter) + 
self.assertListEqual(product_types_ids, ["foo"]) + filter = "(ABSTRACTFOO)" + product_types_ids = self.dag.guess_product_type(filter) + self.assertListEqual(product_types_ids, ["foo"]) + filter = " FOO THIS IS " + product_types_ids = self.dag.guess_product_type(filter) + self.assertListEqual(product_types_ids, ["foo"]) + + # Free text search: match in the keywords + filter = "LECTUS_BAR_KEY" + product_types_ids = self.dag.guess_product_type(filter) + self.assertListEqual(product_types_ids, ["bar"]) + + # Free text search: match in the title + filter = "COLLECTION FOOBAR" + product_types_ids = self.dag.guess_product_type(filter) + self.assertListEqual(product_types_ids, ["foobar"]) + + # Free text search: multiple terms + filter = "(This is FOOBAR) OR (This is BAR)" + product_types_ids = self.dag.guess_product_type(filter) + self.assertListEqual(sorted(product_types_ids), ["bar", "foobar"]) + + # Free text search: multiple terms joined with param search (UNION) + filter = "(This is FOOBAR) OR (This is BAR)" + product_types_ids = self.dag.guess_product_type(filter, title="FOO*") + self.assertListEqual(sorted(product_types_ids), ["bar", "foo", "foobar"]) + + # Free text search: multiple terms joined with param search (INTERSECT) + filter = "(This is FOOBAR) OR (This is BAR)" + product_types_ids = self.dag.guess_product_type( + filter, intersect=True, title="titleFOO*" + ) + self.assertListEqual(sorted(product_types_ids), ["foobar"]) + def test_update_product_types_list(self): """Core api.update_product_types_list must update eodag product types list""" with open(os.path.join(TEST_RESOURCES_PATH, "ext_product_types.json")) as f: diff --git a/tests/units/test_http_server.py b/tests/units/test_http_server.py index db6d8f6d9..82b337326 100644 --- a/tests/units/test_http_server.py +++ b/tests/units/test_http_server.py @@ -964,7 +964,6 @@ def test_list_product_types_ok(self, list_pt: Mock, guess_pt: Mock): """A simple request for product types with(out) a provider must 
succeed""" for url in ("/collections",): r = self.app.get(url) - self.assertTrue(guess_pt.called) self.assertTrue(list_pt.called) self.assertEqual(200, r.status_code) self.assertListEqual( @@ -1377,3 +1376,14 @@ def test_cql_post_search(self): } }, ) + + @mock.patch("eodag.rest.core.eodag_api.list_product_types", autospec=True) + @mock.patch("eodag.rest.core.eodag_api.guess_product_type", autospec=True) + def test_collection_free_text_search(self, guess_pt: Mock, list_pt: Mock): + """Test STAC Collection free-text search""" + + url = "/collections?q=TERM1,TERM2" + r = self.app.get(url) + list_pt.assert_called_once_with(provider=None) + guess_pt.assert_called_once_with("(TERM1) OR (TERM2)") + self.assertEqual(200, r.status_code)