[formrecognizer] Adding 2022-06-30-preview work (Azure#24701)

* move June beta work to main repo
* [formrecognizer] Fix documentation (Azure#24269)
* fix missing brackets in docs
* remove extra brackets
* add case insensitive enum meta
* merge pylint changes
* fix broken links (Azure#24127)

Co-authored-by: Krista Pratico <[email protected]>
sarkar-rajarshi · Jun 9, 2022 · 4f9784a
1 parent 21fe8d7
commit 4f9784a

Showing 271 changed files with 208,322 additions and 158,180 deletions.
diff --git a/.vscode/cspell.json b/.vscode/cspell.json
@@ -72,6 +72,7 @@
     "sdk/eventhub/azure-eventhub/**",
     "sdk/graphrbac/azure-graphrbac/**",
     "sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/**",
+    "sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/**",
     "sdk/identity/azure-identity/images/**",
     "sdk/identity/azure-identity/tests/pod-identity/**",
     "sdk/identity/azure-identity/tests/managed-identity-live/service-fabric/**",

diff --git a/sdk/formrecognizer/azure-ai-formrecognizer/CHANGELOG.md b/sdk/formrecognizer/azure-ai-formrecognizer/CHANGELOG.md
@@ -3,8 +3,19 @@
 ## 3.2.0b5 (Unreleased)
 
 ### Features Added
+- Added `paragraphs` property on `AnalyzeResult`.
+- Added new `DocumentParagraph` model to represent document paragraphs.
+- Added new `AddressValue` model to represent address fields found in documents.
+- Added `caption` and `footnotes` properties on `DocumentTable`.
+- Added `DocumentCaption` and `DocumentFootnote` models to represent captions and footnotes found in the document.
+- Added `kind` property on `DocumentPage`.
 
 ### Breaking Changes
+- Renamed `bounding_box` to `polygon` on `BoundingRegion`, `DocumentContentElement`, `DocumentLine`, `DocumentSelectionMark`, `DocumentWord`.
+- Renamed `language_code` to `locale` on `DocumentLanguage`.
+- Some models that previously returned string for address related fields may now return `AddressValue`. TIP: Use `get_model()` on `DocumentModelAdministrationClient` to see updated prebuilt model schemas.
+- Removed `entities` property on `AnalyzeResult`.
+- Removed `DocumentEntity` model.
 
 ### Bugs Fixed
 
@@ -38,7 +49,7 @@
 
 ### Breaking Changes
 - Added new required parameter `build_mode` to `begin_build_model()`.
-- Some models that previously returned float for currency related fields may now return a `CurrencyValue`. TIP: Use `get_model()` to see updated prebuilt model schemas.
+- Some models that previously returned float for currency related fields may now return a `CurrencyValue`. TIP: Use `get_model()` on `DocumentModelAdministrationClient` to see updated prebuilt model schemas.
 
 ### Bugs Fixed
 - Default the `percent_completed` property to 0 when not returned with model operation information.
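
Reviewer note on the `bounding_box` to `polygon` rename listed under Breaking Changes above: callers that must run against both older betas and this one could read the attribute defensively. The following is a minimal sketch under stated assumptions, not part of this commit. The helper name `get_outline` is hypothetical, and the `SimpleNamespace` objects are stand-ins for SDK models such as `DocumentLine`, so nothing here imports or calls the real library.

```python
from types import SimpleNamespace


def get_outline(element):
    """Return the element's outline points under either attribute name.

    Hypothetical helper: prefers the new `polygon` attribute and falls
    back to the pre-rename `bounding_box` attribute when it is absent.
    """
    polygon = getattr(element, "polygon", None)
    if polygon is not None:
        return polygon
    return getattr(element, "bounding_box", None)


# Stand-in objects (assumptions), shaped like a line before and after the rename.
points = [(0, 0), (1, 0), (1, 1), (0, 1)]
new_style_line = SimpleNamespace(content="Total", polygon=points)
old_style_line = SimpleNamespace(content="Total", bounding_box=points)

for line in (new_style_line, old_style_line):
    print("Line '{}' has outline {}".format(line.content, get_outline(line)))
```

A shim like this is only useful during a transition; once the older betas are no longer a concern, reading `polygon` directly, as the updated README samples do, is simpler.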

diff --git a/sdk/formrecognizer/azure-ai-formrecognizer/MIGRATION_GUIDE.md b/sdk/formrecognizer/azure-ai-formrecognizer/MIGRATION_GUIDE.md
@@ -1,6 +1,8 @@
 # Guide for migrating azure-ai-formrecognizer to version 3.2.x from versions 3.1.x and below
 
-This guide is intended to assist in the migration to `azure-ai-formrecognizer (3.2.x)` from versions `3.1.x` and below. It will focus on side-by-side comparisons for similar operations between versions. Please note that version `3.2.0b1` will be used for comparison with `3.1.2`.
+This guide is intended to assist in the migration to `azure-ai-formrecognizer (3.2.x)` from versions `3.1.x` and below. It will focus on side-by-side comparisons for similar operations between versions. Please note that version `3.2.0b1` will be used for comparison with `3.1.2`. 
+
+> NOTE: Please read the [CHANGELOG][changelog] to see important changes that have occurred since version `3.2.0b1` of the SDK.
 
 Familiarity with `azure-ai-formrecognizer (3.1.x and below)` package is assumed. For those new to the Azure Form Recognizer client library for Python please refer to the [README][readme] rather than this guide.
 
@@ -666,6 +668,7 @@ Differences between the versions:
 
 For additional samples please take a look at the [Form Recognizer Samples][samples_readme] for more guidance.
 
+[changelog]: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/formrecognizer/azure-ai-formrecognizer/CHANGELOG.md
 [readme]: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/formrecognizer/azure-ai-formrecognizer/README.md
 [samples_readme]: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/formrecognizer/azure-ai-formrecognizer/samples/README.md
 [fr_labeling_tool]: https://aka.ms/azsdk/formrecognizer/formrecognizerstudio
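
Reviewer note on the new CHANGELOG pointer in the migration guide: one of the post-`3.2.0b1` changes it refers to is that address-related fields which previously returned a string may now return an `AddressValue`. A rough sketch of code that tolerates both shapes is shown below. The function name `address_to_text` is hypothetical, the attribute names `street_address`, `city`, `state`, and `postal_code` are assumptions about the `AddressValue` model that should be checked against the SDK reference, and `SimpleNamespace` is used as a stand-in so the snippet does not depend on the library.

```python
from types import SimpleNamespace


def address_to_text(value):
    """Render an address field value as a single display string.

    Accepts None, a plain string (older behavior), or an object with
    address attributes (assumed shape of the newer structured value).
    """
    if value is None:
        return ""
    if isinstance(value, str):
        return value.strip()
    # Attribute names below are assumptions; missing ones are skipped.
    names = ("street_address", "city", "state", "postal_code")
    parts = [getattr(value, name, None) for name in names]
    return ", ".join(str(part) for part in parts if part)


# Stand-in for a structured address value (assumption, not an SDK object).
structured = SimpleNamespace(
    street_address="1 Main St", city="Redmond", state="WA", postal_code="98052"
)

print(address_to_text("  1 Main St, Redmond  "))
print(address_to_text(structured))
```

As the CHANGELOG tip suggests, calling `get_model()` on `DocumentModelAdministrationClient` is the reliable way to see which fields of a prebuilt model actually changed type.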
diff --git a/sdk/formrecognizer/azure-ai-formrecognizer/README.md b/sdk/formrecognizer/azure-ai-formrecognizer/README.md
@@ -3,7 +3,7 @@
 Azure Cognitive Services Form Recognizer is a cloud service that uses machine learning to analyze text and structured data from your documents. It includes the following main features:
 
 - Layout - Extract content and structure (ex. words, selection marks, tables) from documents.
-- Document - Analyze key-value pairs and entities in addition to general layout from documents.
+- Document - Analyze key-value pairs in addition to general layout from documents.
 - Read - Read page information and detected languages from documents.
 - Prebuilt - Extract common field values from select document types (ex. receipts, invoices, business cards, ID documents, U.S. W-2 tax documents) using prebuilt models.
 - Custom - Build custom models from your own data to extract tailored field values in addition to general layout from documents.
@@ -28,13 +28,13 @@ Install the Azure Form Recognizer client library for Python with [pip][pip]:
 pip install azure-ai-formrecognizer --pre
 ```
 
-> Note: This version of the client library defaults to the `2022-01-30-preview` version of the service.
+> Note: This version of the client library defaults to the `2022-06-30-preview` version of the service.
 
 This table shows the relationship between SDK versions and supported API versions of the service:
 
 |SDK version|Supported API version of service
 |-|-
-|3.2.0b4 - Latest beta release | 2.0, 2.1, 2022-01-30-preview
+|3.2.0b5 - Latest beta release | 2.0, 2.1, 2022-06-30-preview
 |3.1.X - Latest GA release| 2.0, 2.1 (default)
 |3.0.0| 2.0
 
@@ -45,7 +45,7 @@ This table shows the relationship between SDK versions and supported API version
 
 |API version|Supported clients
 |-|-
-|2022-01-30-preview | DocumentAnalysisClient and DocumentModelAdministrationClient
+|2022-06-30-preview | DocumentAnalysisClient and DocumentModelAdministrationClient
 |2.1 | FormRecognizerClient and FormTrainingClient
 |2.0 | FormRecognizerClient and FormTrainingClient
 
@@ -163,7 +163,7 @@ Use the `model` parameter to select the type of model for analysis.
 |Model| Features
 |-|-
 |`prebuilt-layout`| Text extraction, selection marks, tables
-|`prebuilt-document`| Text extraction, selection marks, tables, key-value pairs and entities
+|`prebuilt-document`| Text extraction, selection marks, tables, and key-value pairs
 |`prebuilt-read`|Text extraction and detected languages
 |`prebuilt-invoices`| Text extraction, selection marks, tables, and pre-trained fields and values pertaining to English invoices
 |`prebuilt-businessCard`| Text extraction and pre-trained fields and values pertaining to English business cards
@@ -239,10 +239,10 @@ for page in result.pages:
 
     for line_idx, line in enumerate(page.lines):
         print(
-            "...Line # {} has content '{}' within bounding box '{}'".format(
+            "...Line # {} has content '{}' within bounding polygon '{}'".format(
                 line_idx,
                 line.content,
-                line.bounding_box,
+                line.polygon,
             )
         )
 
@@ -255,9 +255,9 @@ for page in result.pages:
 
     for selection_mark in page.selection_marks:
         print(
-            "...Selection mark is '{}' within bounding box '{}' and has a confidence of {}".format(
+            "...Selection mark is '{}' within bounding polygon '{}' and has a confidence of {}".format(
                 selection_mark.state,
-                selection_mark.bounding_box,
+                selection_mark.polygon,
                 selection_mark.confidence,
             )
         )
@@ -273,7 +273,7 @@ for table_idx, table in enumerate(result.tables):
             "Table # {} location on page: {} is {}".format(
                 table_idx,
                 region.page_number,
-                region.bounding_box
+                region.polygon
             )
         )
     for cell in table.cells:
@@ -287,7 +287,7 @@ for table_idx, table in enumerate(result.tables):
 ```
 
 ### Using the General Document Model
-Analyze entities, key-value pairs, tables, styles, and selection marks from documents using the general document model provided by the Form Recognizer service.
+Analyze key-value pairs, tables, styles, and selection marks from documents using the general document model provided by the Form Recognizer service.
 Select the General Document Model by passing `model="prebuilt-document"` into the `begin_analyze_document` method:
 
 ```python
@@ -305,13 +305,6 @@ with open("<path to your document>", "rb") as fd:
 poller = document_analysis_client.begin_analyze_document("prebuilt-document", document)
 result = poller.result()
 
-print("----Entities found in document----")
-for entity in result.entities:
-    print("Entity '{}' has category '{}' with sub-category '{}'".format(
-        entity.content, entity.category, entity.sub_category
-    ))
-    print("...with confidence {}\n".format(entity.confidence))
-
 print("----Key-value pairs found in document----")
 for kv_pair in result.key_value_pairs:
     if kv_pair.key:
@@ -341,7 +334,7 @@ for table_idx, table in enumerate(result.tables):
             "Table # {} location on page: {} is {}".format(
                 table_idx,
                 region.page_number,
-                region.bounding_box,
+                region.polygon,
             )
         )
 
@@ -362,11 +355,11 @@ for page in result.pages:
     for line_idx, line in enumerate(page.lines):
         words = line.get_words()
         print(
-            "...Line # {} has {} words and text '{}' within bounding box '{}'".format(
+            "...Line # {} has {} words and text '{}' within bounding polygon '{}'".format(
                 line_idx,
                 len(words),
                 line.content,
-                line.bounding_box,
+                line.polygon,
             )
         )
 
@@ -379,9 +372,9 @@ for page in result.pages:
 
     for selection_mark in page.selection_marks:
         print(
-            "...Selection mark is '{}' within bounding box '{}' and has a confidence of {}".format(
+            "...Selection mark is '{}' within bounding polygon '{}' and has a confidence of {}".format(
                 selection_mark.state,
-                selection_mark.bounding_box,
+                selection_mark.polygon,
                 selection_mark.confidence,
             )
         )

diff --git a/sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/__init__.py b/sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/__init__.py
@@ -40,11 +40,13 @@
     AnalyzeResult,
     AnalyzedDocument,
     BoundingRegion,
+    AddressValue,
     CurrencyValue,
     DocumentBuildMode,
+    DocumentCaption,
     DocumentContentElement,
-    DocumentEntity,
     DocumentField,
+    DocumentFootnote,
     DocumentKeyValuePair,
     DocumentKeyValueElement,
     DocumentLanguage,
@@ -103,11 +105,13 @@
     "AnalyzeResult",
     "AnalyzedDocument",
     "BoundingRegion",
+    "AddressValue",
     "CurrencyValue",
     "DocumentBuildMode",
+    "DocumentCaption",
     "DocumentContentElement",
-    "DocumentEntity",
     "DocumentField",
+    "DocumentFootnote",
     "DocumentKeyValueElement",
     "DocumentKeyValuePair",
     "DocumentLanguage",

diff --git a/sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/_api_versions.py b/sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/_api_versions.py
@@ -11,7 +11,7 @@ class DocumentAnalysisApiVersion(str, Enum, metaclass=CaseInsensitiveEnumMeta):
     """Form Recognizer API versions supported by DocumentAnalysisClient and DocumentModelAdministrationClient."""
 
     #: This is the default version
-    V2022_01_30_PREVIEW = "2022-01-30-preview"
+    V2022_06_30_PREVIEW = "2022-06-30-preview"
 
 
 class FormRecognizerApiVersion(str, Enum, metaclass=CaseInsensitiveEnumMeta):

diff --git a/...rmrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/_document_analysis_client.py b/...rmrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/_document_analysis_client.py
@@ -63,7 +63,7 @@ class DocumentAnalysisClient(FormRecognizerClientBase):
     def __init__(self, endpoint, credential, **kwargs):
         # type: (str, Union[AzureKeyCredential, TokenCredential], Any) -> None
         api_version = kwargs.pop(
-            "api_version", DocumentAnalysisApiVersion.V2022_01_30_PREVIEW
+            "api_version", DocumentAnalysisApiVersion.V2022_06_30_PREVIEW
         )
         super(DocumentAnalysisClient, self).__init__(
             endpoint=endpoint,
@@ -127,7 +127,7 @@ def begin_analyze_document(self, model, document, **kwargs):
 
         return self._client.begin_analyze_document(  # type: ignore
             model_id=model,
-            analyze_request=document,
+            analyze_request=document,  # type: ignore
             content_type="application/octet-stream",
             string_index_type="unicodeCodePoint",
             continuation_token=continuation_token,
@@ -176,7 +176,7 @@ def begin_analyze_document_from_url(self, model, document_url, **kwargs):
 
         return self._client.begin_analyze_document(  # type: ignore
             model_id=model,
-            analyze_request={"url_source": document_url},
+            analyze_request={"urlSource": document_url},  # type: ignore
             string_index_type="unicodeCodePoint",
             continuation_token=continuation_token,
             cls=cls,

diff --git a/.../azure-ai-formrecognizer/azure/ai/formrecognizer/_document_model_administration_client.py b/.../azure-ai-formrecognizer/azure/ai/formrecognizer/_document_model_administration_client.py
@@ -86,7 +86,7 @@ class DocumentModelAdministrationClient(FormRecognizerClientBase):
     def __init__(self, endpoint, credential, **kwargs):
         # type: (str, Union[AzureKeyCredential, TokenCredential], Any) -> None
         api_version = kwargs.pop(
-            "api_version", DocumentAnalysisApiVersion.V2022_01_30_PREVIEW
+            "api_version", DocumentAnalysisApiVersion.V2022_06_30_PREVIEW
         )
         super(DocumentModelAdministrationClient, self).__init__(
             endpoint=endpoint,

diff --git a/...formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/_form_recognizer_client.py b/...formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/_form_recognizer_client.py
@@ -158,7 +158,7 @@ def begin_recognize_receipts(self, receipt, **kwargs):
                     "'pages' is only available for API version V2_1 and up"
                 )
         return self._client.begin_analyze_receipt_async(  # type: ignore
-            file_stream=receipt,
+            file_stream=receipt,  # type: ignore
             content_type=content_type,
             include_text_details=include_field_elements,
             cls=cls,
@@ -227,7 +227,7 @@ def begin_recognize_receipts_from_url(self, receipt_url, **kwargs):
                     "'pages' is only available for API version V2_1 and up"
                 )
         return self._client.begin_analyze_receipt_async(  # type: ignore
-            file_stream={"source": receipt_url},
+            file_stream={"source": receipt_url},  # type: ignore
             include_text_details=include_field_elements,
             cls=cls,
             polling=True,
@@ -289,7 +289,7 @@ def begin_recognize_business_cards(self, business_card, **kwargs):
 
         try:
             return self._client.begin_analyze_business_card_async(  # type: ignore
-                file_stream=business_card,
+                file_stream=business_card,  # type: ignore
                 content_type=content_type,
                 include_text_details=include_field_elements,
                 cls=kwargs.pop("cls", self._prebuilt_callback),
@@ -337,7 +337,7 @@ def begin_recognize_business_cards_from_url(self, business_card_url, **kwargs):
 
         try:
             return self._client.begin_analyze_business_card_async(  # type: ignore
-                file_stream={"source": business_card_url},
+                file_stream={"source": business_card_url},  # type: ignore
                 include_text_details=include_field_elements,
                 cls=kwargs.pop("cls", self._prebuilt_callback),
                 polling=True,
@@ -403,7 +403,7 @@ def begin_recognize_identity_documents(self, identity_document, **kwargs):
 
         try:
             return self._client.begin_analyze_id_document_async(  # type: ignore
-                file_stream=identity_document,
+                file_stream=identity_document,  # type: ignore
                 content_type=content_type,
                 include_text_details=include_field_elements,
                 cls=kwargs.pop("cls", self._prebuilt_callback),
@@ -450,7 +450,7 @@ def begin_recognize_identity_documents_from_url(
 
         try:
             return self._client.begin_analyze_id_document_async(  # type: ignore
-                file_stream={"source": identity_document_url},
+                file_stream={"source": identity_document_url},  # type: ignore
                 include_text_details=include_field_elements,
                 cls=kwargs.pop("cls", self._prebuilt_callback),
                 polling=True,
@@ -518,7 +518,7 @@ def begin_recognize_invoices(self, invoice, **kwargs):
 
         try:
             return self._client.begin_analyze_invoice_async(  # type: ignore
-                file_stream=invoice,
+                file_stream=invoice,  # type: ignore
                 content_type=content_type,
                 include_text_details=include_field_elements,
                 cls=kwargs.pop("cls", self._prebuilt_callback),
@@ -564,7 +564,7 @@ def begin_recognize_invoices_from_url(self, invoice_url, **kwargs):
 
         try:
             return self._client.begin_analyze_invoice_async(  # type: ignore
-                file_stream={"source": invoice_url},
+                file_stream={"source": invoice_url},  # type: ignore
                 include_text_details=include_field_elements,
                 cls=kwargs.pop("cls", self._prebuilt_callback),
                 polling=True,
@@ -669,7 +669,7 @@ def begin_recognize_content(self, form, **kwargs):
                 )
 
         return self._client.begin_analyze_layout_async(  # type: ignore
-            file_stream=form,
+            file_stream=form,  # type: ignore
             content_type=content_type,
             cls=kwargs.pop("cls", self._content_callback),
             polling=True,
@@ -737,7 +737,7 @@ def begin_recognize_content_from_url(self, form_url, **kwargs):
                 )
 
         return self._client.begin_analyze_layout_async(  # type: ignore
-            file_stream={"source": form_url},
+            file_stream={"source": form_url},  # type: ignore
             cls=kwargs.pop("cls", self._content_callback),
             polling=True,
             **kwargs
@@ -822,7 +822,7 @@ def analyze_callback(
                 )
 
         return self._client.begin_analyze_with_custom_model(  # type: ignore
-            file_stream=form,
+            file_stream=form,  # type: ignore
             model_id=model_id,
             include_text_details=include_field_elements,
             content_type=content_type,
@@ -889,7 +889,7 @@ def analyze_callback(
                 )
 
         return self._client.begin_analyze_with_custom_model(  # type: ignore
-            file_stream={"source": form_url},
+            file_stream={"source": form_url},  # type: ignore
             model_id=model_id,
             include_text_details=include_field_elements,
             cls=callback,

diff --git a/sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/_form_training_client.py b/sdk/formrecognizer/azure-ai-formrecognizer/azure/ai/formrecognizer/_form_training_client.py
@@ -321,7 +321,7 @@ def get_custom_model(self, model_id, **kwargs):
         )
         if (
             hasattr(response, "composed_train_results")
-            and response.composed_train_results
+            and response.composed_train_results  # type: ignore
         ):
             return CustomFormModel._from_generated_composed(response)
         return CustomFormModel._from_generated(response, api_version=self._api_version)
@@ -489,7 +489,7 @@ def _compose_callback(
         continuation_token = kwargs.pop("continuation_token", None)
         try:
             return self._client.begin_compose_custom_models_async(  # type: ignore
-                {"model_ids": model_ids, "model_name": model_name},
+                {"model_ids": model_ids, "model_name": model_name}, # type: ignore
                 cls=kwargs.pop("cls", _compose_callback),
                 polling=LROBasePolling(
                     timeout=polling_interval,