CS-SI · sbrunato · Mar 8, 2024 · Oct 30, 2023 · Oct 31, 2023 · Nov 20, 2023
diff --git a/docs/notebooks/api_user_guide/4_search.ipynb b/docs/notebooks/api_user_guide/4_search.ipynb
@@ -2228,7 +2228,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "In the previous request we made use of the whoosh query language which can be used to do complex text search. It supports the boolean operators AND, OR and NOT to combine the search terms. If a space is given between two words as in the example above, this corresponds to the operator AND. Brackets '()' can also be used. The example above also shows the use of the wildcard operator '*' which can represent any numer of characters. The wildcard operator '?' always represents only one character. It is also possible to match a range of terms by using square brackets '[]' and TO, e.g. [A TO D] will match all words in the lexical range between A and D. Below you can find some examples for the different operators."
+    "In the previous request we made use of the [whoosh query language](https://whoosh.readthedocs.io/en/latest/querylang.html#the-default-query-language), which can be used for complex text searches. It supports the boolean operators `AND`, `OR` and `NOT` to combine search terms. A space between two words, as in the example above, is equivalent to the `AND` operator. Parentheses `()` can be used to group terms. The example above also shows the wildcard operator `*`, which matches any number of characters, whereas the wildcard operator `?` matches exactly one character. A range of terms can be matched using square brackets `[]` and `TO`: e.g. `[A TO D]` matches all words in the lexical range between A and D. Below you can find some examples for the different operators."
    ]
   },
   {
@@ -2274,9 +2274,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "returns all product types where the platform is either LANDSAT or SENTINEL1:\n",
-    "\n",
-    "['L57_REFLECTANCE', 'LANDSAT_C2L1', 'LANDSAT_C2L2', 'LANDSAT_C2L2ALB_BT', 'LANDSAT_C2L2ALB_SR', 'LANDSAT_C2L2ALB_ST', 'LANDSAT_C2L2ALB_TA', 'LANDSAT_C2L2_SR', 'LANDSAT_C2L2_ST', 'LANDSAT_ETM_C1', 'LANDSAT_ETM_C2L1', 'LANDSAT_ETM_C2L2', 'LANDSAT_TM_C1', 'LANDSAT_TM_C2L1', 'LANDSAT_TM_C2L2', 'S1_SAR_GRD', 'S1_SAR_OCN', 'S1_SAR_RAW', 'S1_SAR_SLC']"
+    "returns all product types where the platform is either LANDSAT or SENTINEL1."
    ]
   },
   {
@@ -2319,9 +2317,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "returns all product types which contain either the keywords LANDSAT and collection2 or the keyword SAR:\n",
-    "\n",
-    "['LANDSAT_C2L1', 'LANDSAT_C2L2', 'LANDSAT_C2L2ALB_BT', 'LANDSAT_C2L2ALB_SR', 'LANDSAT_C2L2ALB_ST', 'LANDSAT_C2L2ALB_TA', 'LANDSAT_C2L2_SR', 'LANDSAT_C2L2_ST', 'LANDSAT_ETM_C2L1', 'LANDSAT_ETM_C2L2', 'LANDSAT_TM_C2L1', 'LANDSAT_TM_C2L2', 'S1_SAR_GRD', 'S1_SAR_OCN', 'S1_SAR_RAW', 'S1_SAR_SLC']"
+    "returns all product types which contain either the keywords LANDSAT and collection2 or the keyword SAR."
    ]
   },
   {
@@ -2366,9 +2362,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "returns all product types where the platformSerialIdentifier is composed of 'L' and one other character:\n",
-    "\n",
-    "['L57_REFLECTANCE', 'L8_OLI_TIRS_C1L1', 'L8_REFLECTANCE', 'LANDSAT_C2L1', 'LANDSAT_C2L2', 'LANDSAT_C2L2ALB_BT', 'LANDSAT_C2L2ALB_SR', 'LANDSAT_C2L2ALB_ST', 'LANDSAT_C2L2ALB_TA', 'LANDSAT_C2L2_SR', 'LANDSAT_C2L2_ST', 'LANDSAT_ETM_C1', 'LANDSAT_ETM_C2L1', 'LANDSAT_ETM_C2L2', 'LANDSAT_TM_C1', 'LANDSAT_TM_C2L1', 'LANDSAT_TM_C2L2']"
+    "returns all product types where the platformSerialIdentifier is composed of 'L' and one other character."
    ]
   },
   {
@@ -2439,9 +2433,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "returns all product types where the platform is SENTINEL1, SENTINEL2 or SENTINEL3:\n",
-    "\n",
-    "['S1_SAR_GRD', 'S1_SAR_OCN', 'S1_SAR_RAW', 'S1_SAR_SLC', 'S2_MSI_L1C', 'S2_MSI_L2A', 'S2_MSI_L2A_COG', 'S2_MSI_L2A_MAJA', 'S2_MSI_L2B_MAJA_SNOW', 'S2_MSI_L2B_MAJA_WATER', 'S2_MSI_L3A_WASP', 'S3_EFR', 'S3_ERR', 'S3_LAN', 'S3_OLCI_L2LFR', 'S3_OLCI_L2LRR', 'S3_OLCI_L2WFR', 'S3_OLCI_L2WRR', 'S3_RAC', 'S3_SLSTR_L1RBT', 'S3_SLSTR_L2AOD', 'S3_SLSTR_L2FRP', 'S3_SLSTR_L2LST', 'S3_SLSTR_L2WST', 'S3_SRA', 'S3_SRA_A', 'S3_SRA_BS', 'S3_SY_AOD', 'S3_SY_SYN', 'S3_SY_V10', 'S3_SY_VG1', 'S3_SY_VGP', 'S3_WAT']"
+    "returns all product types where the platform is SENTINEL1, SENTINEL2 or SENTINEL3."
    ]
   },
   {
@@ -2454,7 +2446,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 74,
+   "execution_count": 4,
    "metadata": {},
    "outputs": [
     {
@@ -2477,7 +2469,7 @@
        " 'LANDSAT_TM_C2L2']"
       ]
      },
-     "execution_count": 74,
+     "execution_count": 4,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -2491,29 +2483,68 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "['LANDSAT_C2L1', \n",
-    "'L57_REFLECTANCE', \n",
-    "'LANDSAT_C2L2', \n",
-    "'LANDSAT_C2L2ALB_BT', \n",
-    "'LANDSAT_C2L2ALB_SR', \n",
-    "'LANDSAT_C2L2ALB_ST', \n",
-    "'LANDSAT_C2L2ALB_TA', \n",
-    "'LANDSAT_C2L2_SR', \n",
-    "'LANDSAT_C2L2_ST', \n",
-    "'LANDSAT_ETM_C1', \n",
-    "'LANDSAT_ETM_C2L1', \n",
-    "'LANDSAT_ETM_C2L2', \n",
-    "'LANDSAT_TM_C1', \n",
-    "'LANDSAT_TM_C2L1', \n",
-    "'LANDSAT_TM_C2L2']"
+    "The product types in the result are ordered by how well they match the criteria. In the example above, only the first product type (LANDSAT_C2L1) matches the second parameter (platformSerialIdentifier=\"L1\"); all other product types only match the first criterion. Therefore, it is usually best to use the first product type in the list, as it is the one that fits best."
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The product types in the result are ordered by how well they match the criteria. In the example above only the first product type (LANDSAT_C2L1) matches the second parameter (platformSerialIdentifier=\"L1\"), all other product types only match the first criterion. Therefore, it is usually best to use the first product type in the list as it will be the one that fits best."
+    "Per-parameter guesses are joined using a `UNION` by default (`intersect=False`). Set `intersect=True` to join them using an intersection instead:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['LANDSAT_C2L1']"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "dag.guess_product_type(platform=\"LANDSAT\", platformSerialIdentifier=\"L1\", intersect=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A [Whoosh query language](https://whoosh.readthedocs.io/en/latest/querylang.html#the-default-query-language) *free text search* filter can also be passed to the method; it is used to search the `title`, `abstract` and `keywords` fields:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['ERA5_SL_MONTHLY',\n",
+       " 'ERA5_PL_MONTHLY',\n",
+       " 'ERA5_LAND_MONTHLY',\n",
+       " 'ERA5_SL',\n",
+       " 'ERA5_PL',\n",
+       " 'GLOFAS_SEASONAL_REFORECAST',\n",
+       " 'SEASONAL_MONTHLY_PL',\n",
+       " 'SEASONAL_MONTHLY_SL']"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "dag.guess_product_type(\"ECMWF AND MONTHLY\")"
    ]
   },
   {

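The wildcard and range operators described in the notebook above can be approximated in plain Python to see which terms they would select. This is a minimal sketch only: it uses the standard library's `fnmatch.fnmatchcase` as a stand-in for whoosh's wildcard matching (only `*` and `?` are used) and plain string comparison for an inclusive `[A TO D]` range. The helper names and sample terms are hypothetical, and whoosh's analyzers (e.g. lowercasing at index time) are not reproduced.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def wildcard_matches(terms: Iterable[str], pattern: str) -> List[str]:
    """Terms matching `pattern`, where `*` stands for any number of
    characters and `?` for exactly one (fnmatch used as a stand-in)."""
    return [t for t in terms if fnmatchcase(t, pattern)]


def range_matches(terms: Iterable[str], low: str, high: str) -> List[str]:
    """Terms within the inclusive lexical range [low TO high]."""
    return [t for t in terms if low <= t <= high]


# Hypothetical platformSerialIdentifier-like terms, for illustration only
serial_ids = ["L1", "L8", "L57", "S1A", "S2B"]
print(wildcard_matches(serial_ids, "L?"))  # ['L1', 'L8']
print(wildcard_matches(serial_ids, "L*"))  # ['L1', 'L8', 'L57']
print(range_matches(["a", "b", "d", "dog", "e"], "a", "d"))  # ['a', 'b', 'd']
```

In this sketch, `dog` falls outside `[a TO d]` because it sorts after `d`: an inclusive lexical range ending at `d` does not cover every word that merely starts with d.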
diff --git a/eodag/api/core.py b/eodag/api/core.py
@@ -289,15 +289,15 @@ def build_index(self) -> None:
             product_types_schema = Schema(
                 ID=fields.STORED,
                 alias=fields.ID,
-                abstract=fields.STORED,
+                abstract=fields.TEXT,
                 instrument=fields.IDLIST,
                 platform=fields.ID,
                 platformSerialIdentifier=fields.IDLIST,
                 processingLevel=fields.ID,
                 sensorType=fields.ID,
                 md5=fields.ID,
                 license=fields.ID,
-                title=fields.ID,
+                title=fields.TEXT,
                 missionStartDate=fields.ID,
                 missionEndDate=fields.ID,
                 keywords=fields.KEYWORD(analyzer=kw_analyzer),
@@ -914,16 +914,32 @@ def get_alias_from_product_type(self, product_type: str) -> str:
 
         return self.product_types_config[product_type].get("alias", product_type)
 
-    def guess_product_type(self, **kwargs: Any) -> List[str]:
-        """Find eodag product types codes that best match a set of search params
+    def guess_product_type(
+        self,
+        free_text_filter: Optional[str] = None,
+        intersect: bool = False,
+        **kwargs: Any,
+    ) -> List[str]:
+        """Find eodag product type ids that best match a set of search params
+
+        See https://whoosh.readthedocs.io/en/latest/querylang.html#the-default-query-language
+        for syntax.
 
+        :param free_text_filter: whoosh compatible free text search filter used to search
+                                 `title`, `abstract` and `keywords`
+        :type free_text_filter: Optional[str]
+        :param intersect: join results for each parameter using INTERSECT instead of UNION
+        :type intersect: bool
         :param kwargs: A set of search parameters as keywords arguments
         :returns: The best match for the given parameters
         :rtype: list[str]
         :raises: :class:`~eodag.utils.exceptions.NoMatchingProductType`
         """
         if kwargs.get("productType", None):
             return [kwargs["productType"]]
+        free_text_search_params = (
+            ["title", "abstract", "keywords"] if free_text_filter else []
+        )
         supported_params = {
             param
             for param in (
@@ -934,26 +950,44 @@ def guess_product_type(self, **kwargs: Any) -> List[str]:
                 "sensorType",
                 "keywords",
                 "md5",
+                "abstract",
+                "title",
             )
             if kwargs.get(param, None) is not None
         }
         if not self._product_types_index:
             raise EodagError("Missing product types index")
         with self._product_types_index.searcher() as searcher:
             results = None
-            # For each search key, do a guess and then upgrade the result (i.e. when
-            # merging results, if a hit appears in both results, its position is raised
-            # to the top. This way, the top most result will be the hit that best
+            # Using `upgrade_and_extend`, for each search key, do a guess and
+            # then upgrade the result (i.e. when merging results,
+            # if a hit appears in both results, its position is raised
+            # to the top). This way, the top most result will be the hit that best
             # matches the given queries. Put another way, this best guess is the one
             # that crosses the highest number of search params from the given queries
+
+            # Always use UNION to join free_text_search results
+            for search_key in free_text_search_params:
+                query = QueryParser(search_key, self._product_types_index.schema).parse(
+                    free_text_filter
+                )
+                if results is None:
+                    results = searcher.search(query, limit=None)
+                else:
+                    results.upgrade_and_extend(searcher.search(query, limit=None))
+
+            # join results from kwargs using UNION or INTERSECT
             for search_key in supported_params:
                 query = QueryParser(search_key, self._product_types_index.schema).parse(
                     kwargs[search_key]
                 )
                 if results is None:
                     results = searcher.search(query, limit=None)
+                elif intersect:
+                    results.filter(searcher.search(query, limit=None))
                 else:
                     results.upgrade_and_extend(searcher.search(query, limit=None))
+
             guesses: List[str] = [r["ID"] for r in results or []]
         if guesses:
             return guesses

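The merge loop in `guess_product_type` can be modelled with plain lists to show how `intersect` changes the result. This is a minimal sketch, not whoosh's implementation: `upgrade_and_extend` and `filter_hits` are hypothetical helpers that mimic the documented behavior of whoosh's `Results.upgrade_and_extend()` (hits present in both result sets move to the top, hits only in the other set are appended) and `Results.filter()` (hits absent from the other set are dropped). The per-parameter hit lists are made up for illustration.

```python
from typing import Dict, List, Optional


def upgrade_and_extend(current: List[str], other: List[str]) -> List[str]:
    """Move hits also found in `other` to the front (keeping their relative
    order), then append the hits that only appear in `other`."""
    other_set = set(other)
    upgraded = [h for h in current if h in other_set]
    upgraded += [h for h in current if h not in other_set]
    seen = set(current)
    return upgraded + [h for h in other if h not in seen]


def filter_hits(current: List[str], other: List[str]) -> List[str]:
    """Keep only the hits of `current` that also appear in `other`."""
    other_set = set(other)
    return [h for h in current if h in other_set]


def merge_guesses(
    hits_per_param: Dict[str, List[str]], intersect: bool = False
) -> List[str]:
    """Join per-parameter hit lists like the loop in the diff: the first
    list seeds the result, the following ones are merged into it."""
    results: Optional[List[str]] = None
    for hits in hits_per_param.values():
        if results is None:
            results = list(hits)
        elif intersect:
            results = filter_hits(results, hits)
        else:
            results = upgrade_and_extend(results, hits)
    return results or []


# Hypothetical hits for platform="LANDSAT" and platformSerialIdentifier="L1"
hits = {
    "platform": ["LANDSAT_TM_C1", "LANDSAT_C2L2", "LANDSAT_C2L1"],
    "platformSerialIdentifier": ["LANDSAT_C2L1"],
}
print(merge_guesses(hits))  # ['LANDSAT_C2L1', 'LANDSAT_TM_C1', 'LANDSAT_C2L2']
print(merge_guesses(hits, intersect=True))  # ['LANDSAT_C2L1']
```

With the union, the hit matching both parameters is ranked first and the others are kept; with the intersection, only that hit survives, which mirrors the notebook's `intersect=True` output.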
diff --git a/eodag/resources/stac.yml b/eodag/resources/stac.yml
@@ -62,6 +62,7 @@ conformance:
     - https://api.stacspec.org/v1.0.0/ogcapi-features#query
     - https://api.stacspec.org/v1.0.0/ogcapi-features#sort
     - https://api.stacspec.org/v1.0.0/collections
+    - https://api.stacspec.org/v1.0.0/collection-search#free-text
     - http://www.opengis.net/spec/ogcapi-features-1/1.0/conf/core
     - http://www.opengis.net/spec/ogcapi-features-1/1.0/conf/oas30
     - http://www.opengis.net/spec/ogcapi-features-1/1.0/conf/geojson

diff --git a/eodag/resources/stac_api.yml b/eodag/resources/stac_api.yml
@@ -174,9 +174,12 @@ paths:
       operationId: getCollections
       parameters:
         - $ref: '#/components/parameters/provider'
+        - $ref: '#/components/parameters/q'
       responses:
         '200':
           $ref: '#/components/responses/Collections'
+        '202':
+          $ref: '#/components/responses/Accepted'
         '500':
           $ref: '#/components/responses/ServerError'
   /collections/{collectionId}:
@@ -1913,6 +1916,12 @@ components:
         text/html:
           schema:
             type: string
+    Accepted:
+      description: The request has been accepted, but the data is not yet ready. Please wait a few minutes before trying again.
+      content:
+        application/json:
+          schema:
+            $ref: '#/components/schemas/exception'
     Collections:
       description: >-
         The feature collections shared by this API.

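The new `q` parameter and `202` response on `GET /collections` suggest a client pattern: send the free text query, and wait and retry while the server answers `202 Accepted`. The sketch below is an illustration under stated assumptions, not part of eodag: the base URL is hypothetical, the HTTP call is injected as a `fetch` callable returning `(status, json_body)` so the retry logic can run without a network, and the retry count and delay are arbitrary choices.

```python
import time
from typing import Any, Callable, Dict, Tuple
from urllib.parse import urlencode

# (status_code, parsed JSON body), as returned by the injected `fetch` callable
Response = Tuple[int, Dict[str, Any]]


def collections_url(base_url: str, q: str) -> str:
    """Build a GET /collections URL carrying the free text `q` parameter."""
    return f"{base_url.rstrip('/')}/collections?{urlencode({'q': q})}"


def get_collections(
    fetch: Callable[[str], Response],
    base_url: str,
    q: str,
    retries: int = 3,
    delay_s: float = 60.0,
) -> Dict[str, Any]:
    """Request GET /collections, waiting and retrying while the server
    answers 202 Accepted (data not ready yet)."""
    url = collections_url(base_url, q)
    for attempt in range(retries + 1):
        status, body = fetch(url)
        if status == 200:
            return body
        if status == 202 and attempt < retries:
            time.sleep(delay_s)  # give the server time to prepare the data
            continue
        raise RuntimeError(f"GET {url} returned HTTP {status}")
    raise RuntimeError("no request was sent (retries < 0)")


# Usage with a fake `fetch`: the first call is accepted, the second succeeds
fake_responses = iter([(202, {}), (200, {"collections": []})])
print(
    get_collections(
        lambda url: next(fake_responses),
        "http://localhost:5000",  # hypothetical server address
        "ECMWF,MONTHLY",
        delay_s=0,
    )
)  # {'collections': []}
```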
diff --git a/eodag/rest/stac.py b/eodag/rest/stac.py
@@ -637,10 +637,29 @@ def __get_product_types(
         """
         if filters is None:
             filters = {}
+        free_text_filter = filters.pop("q", None)
+
+        # product types matching filters
         try:
-            guessed_product_types = self.eodag_api.guess_product_type(**filters)
+            guessed_product_types = (
+                self.eodag_api.guess_product_type(**filters) if filters else []
+            )
         except NoMatchingProductType:
             guessed_product_types = []
+
+        # product types matching free text filter
+        if free_text_filter and not guessed_product_types:
+            whooshable_filter = " OR ".join(
+                [f"({x})" for x in free_text_filter.split(",")]
+            )
+            try:
+                guessed_product_types = self.eodag_api.guess_product_type(
+                    whooshable_filter
+                )
+            except NoMatchingProductType:
+                guessed_product_types = []
+
+        # list product types with all metadata using guessed ids
         if guessed_product_types:
             product_types = [
                 pt

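In `__get_product_types`, a comma-separated STAC `q` value is turned into a whoosh query by wrapping each term in parentheses and joining the terms with `OR`, so that any of them may match; in the diff, this free text guess is only attempted when the other filters produced no product type. The same transformation, isolated as a hypothetical standalone helper:

```python
def to_whoosh_query(free_text_filter: str) -> str:
    """Convert a comma-separated STAC `q` value into a whoosh OR query,
    as done in `__get_product_types`: each term is parenthesized so that
    operators inside a term keep their own grouping."""
    return " OR ".join(f"({x})" for x in free_text_filter.split(","))


print(to_whoosh_query("ECMWF AND MONTHLY,SAR"))  # (ECMWF AND MONTHLY) OR (SAR)
```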
diff --git a/tests/resources/ext_product_types_free_text_search.json b/tests/resources/ext_product_types_free_text_search.json
@@ -0,0 +1,59 @@
+{
+  "astraea_eod": {
+    "providers_config": {
+      "foo": {
+        "productType": "foo",
+        "metadata_mapping": {
+          "cloudCover": "$.null"
+        }
+      },
+      "bar": {
+        "productType": "bar",
+        "metadata_mapping": {
+          "cloudCover": "$.null"
+        }
+      },
+      "foobar": {
+        "productType": "foobar",
+        "metadata_mapping": {
+          "cloudCover": "$.null"
+        }
+      }
+    },
+    "product_types_config": {
+      "foo": {
+        "abstract": "abstractFOO - This is FOO. FooAndBar",
+        "instrument": "Not Available",
+        "platform": "Not Available",
+        "platformSerialIdentifier": "Not Available",
+        "processingLevel": "Not Available",
+        "keywords": "suspendisse",
+        "license": "WTFPL",
+        "title": "titleFOO - Lorem FOO collection",
+        "missionStartDate": "2012-12-12T00:00:00.000Z"
+      },
+      "bar": {
+        "abstract": "abstractBAR - This is BAR",
+        "instrument": "Not Available",
+        "platform": "Not Available",
+        "platformSerialIdentifier": "Not Available",
+        "processingLevel": "Not Available",
+        "keywords": "lectus,lectus_bar_key",
+        "license": "WTFPL",
+        "title": "titleBAR - Lorem BAR collection (FooAndBar)",
+        "missionStartDate": "2012-12-12T00:00:00.000Z"
+      },
+      "foobar": {
+        "abstract": "abstract FOOBAR - This is FOOBAR",
+        "instrument": "Not Available",
+        "platform": "Not Available",
+        "platformSerialIdentifier": "Not Available",
+        "processingLevel": "Not Available",
+        "keywords": "tortor",
+        "license": "WTFPL",
+        "title": "titleFOOBAR - Lorem FOOBAR collection",
+        "missionStartDate": "2012-12-12T00:00:00.000Z"
+      }
+    }
+  }
+}
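The fixture gives each product type distinct tokens in `title`, `abstract` or `keywords`, which makes it easy to reason about which ids a free text query should return. As a rough, hypothetical approximation (not whoosh's analyzer: it only lowercases and splits on non-word characters, ignores stop words and handles a single term), the expected matches can be computed from the fixture values:

```python
import re
from typing import Dict, Set

# Fields searched by free text, copied from the fixture's product_types_config
PRODUCT_TYPES: Dict[str, Dict[str, str]] = {
    "foo": {
        "title": "titleFOO - Lorem FOO collection",
        "abstract": "abstractFOO - This is FOO. FooAndBar",
        "keywords": "suspendisse",
    },
    "bar": {
        "title": "titleBAR - Lorem BAR collection (FooAndBar)",
        "abstract": "abstractBAR - This is BAR",
        "keywords": "lectus,lectus_bar_key",
    },
    "foobar": {
        "title": "titleFOOBAR - Lorem FOOBAR collection",
        "abstract": "abstract FOOBAR - This is FOOBAR",
        "keywords": "tortor",
    },
}


def tokens(text: str) -> Set[str]:
    """Lowercase `text` and split it on non-word characters."""
    return set(re.findall(r"\w+", text.lower()))


def naive_free_text_match(term: str) -> Set[str]:
    """Ids whose title, abstract or keywords contain `term` as a whole token."""
    term = term.lower()
    return {
        pt_id
        for pt_id, fields in PRODUCT_TYPES.items()
        if any(term in tokens(value) for value in fields.values())
    }


print(sorted(naive_free_text_match("FooAndBar")))  # ['bar', 'foo']
print(sorted(naive_free_text_match("foo")))  # ['foo']
```

Under this approximation, `FooAndBar` hits `foo` through its abstract and `bar` through its title, while `foobar` is not returned because `FOOBAR` is a different token.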