From 441c7be1402067872764ade5a8030b5fab786418 Mon Sep 17 00:00:00 2001
From: Ronny H <138828701+ron-unstructured@users.noreply.github.com>
Date: Wed, 4 Oct 2023 16:40:31 -0700
Subject: [PATCH] Apply suggestions from code review

Co-authored-by: qued <64741807+qued@users.noreply.github.com>
Co-authored-by: shreyanid <42684285+shreyanid@users.noreply.github.com>
---
 docs/source/metadata.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/source/metadata.rst b/docs/source/metadata.rst
index 3a008b5b91..eddb32400c 100644
--- a/docs/source/metadata.rst
+++ b/docs/source/metadata.rst
@@ -32,7 +32,7 @@ the source file:
 +-----------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | ``filetype`` | File Type | |
 +-----------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ``type`` | Element Type | Categorizes elements into types such as Title, NarrativeText. Not a metadata field |
+| ``type`` | Element Type | Categorizes elements into types such as Title, NarrativeText. Not a metadata field. |
 +-----------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | ``coordinates`` | XY Bounding Box Coordinates | |
 +-----------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -44,18 +44,18 @@ the source file:
 +-----------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | ``text_as_html`` | HTML representation of extracted tables | Only applicable to ``Table`` Elements |
 +-----------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ``languages`` | Document Languages | At document level or element level |
+| ``languages`` | Document Languages | At document level or element level. List is ordered by probability of being the primary language of the text. |
 +-----------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | ``emphasized_text_contents``| Emphasized text (bold or italic) in the original document| |
 +-----------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | ``emphasized_text_tags`` | Tags on text that is emphasized in the original document | |
 +-----------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ``num_characters`` | The number of characters used | Used for chunking |
+| ``num_characters`` | The number of characters used | Used for chunking. |
 | | for max_characters in add_chunking_strategy | |
 +-----------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | ``is_continuation`` | True if element is a continuation of a previous element | Only relevant for chunking, if an element was divided into two due to ``max_characters`` |
 +-----------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ``detection_class_prob`` | Detection Model Class Probabilities | From unstructured-inference, hi-res strategy |
+| ``detection_class_prob`` | Detection model class probabilities | From unstructured-inference, hi-res strategy. |
 +-----------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 
 :raw-html:`<br />`
@@ -110,7 +110,7 @@ Additional Metadata Fields by Document Type
 ###########################################
 
 +-------------------------+---------------------+--------------------------------------------------------+
-| ``Field Name`` | Applicable Doc Types| Short Description |
+| Field Name | Applicable Doc Types| Short Description |
 +=========================+=====================+========================================================+
 | ``page_number`` | DOCX,PDF, PPT,XLSX | Page Number |
 +-------------------------+---------------------+--------------------------------------------------------+