
Update how_to_do_kie_en.md #10731

Merged 1 commit on Oct 11, 2023
8 changes: 4 additions & 4 deletions ppstructure/kie/how_to_do_kie_en.md
@@ -21,7 +21,7 @@ For the document images in a specific scene, the position and layout of the key

KIE in a document image generally involves 2 subtasks, as shown below.

- * (1) SER: semantic entity recognition, which classifies each detected textline, such as dividing it into name and ID card. As shown in the red boxes in the following figure.
+ * (1) SER: semantic entity recognition, which classifies each detected textline, such as dividing it into name and ID No. As shown in the red boxes in the following figure.

* (2) RE: relationship extraction, which matches the question and answer based on SER results. As shown in the figure below, the yellow arrows match the question and answer.

@@ -51,7 +51,7 @@ For more detailed introduction of the algorithms, please refer to Chapter 6 of [
Token-based methods such as LayoutXLM are implemented in PaddleOCR. What's more, in PP-StructureV2, we simplified the LayoutXLM model and proposed VI-LayoutXLM, in which the visual feature extraction module is removed for speed-up. The textline sorting strategy conforming to the human reading order and the UDML knowledge distillation strategy are utilized for higher model accuracy.


- In the non end-to-end KIE method, KIE needs at least ** 2 steps**. Firstly, the OCR model is used to extract the text and its position. Secondly, the KIE model is used to extract the key information according to the image, text position and text content.
+ In the non end-to-end KIE method, KIE needs at least **2 steps**. Firstly, the OCR model is used to extract the text and its position. Secondly, the KIE model is used to extract the key information according to the image, text position and text content.
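The two-step pipeline above can be sketched as follows. This is a minimal illustration only: `run_ocr` and `run_ser` are hypothetical stand-ins for real OCR and KIE models, not PaddleOCR APIs, and the returned values are hard-coded placeholders.

```python
# Sketch of the non end-to-end KIE pipeline:
# step 1 extracts textlines and positions, step 2 labels them.
# All function names and data here are hypothetical placeholders.

def run_ocr(image):
    # Step 1 (hypothetical): return (text, box) pairs for each detected textline.
    return [("Name", [10, 10, 60, 30]), ("Zhang San", [70, 10, 150, 30])]

def run_ser(image, ocr_results):
    # Step 2 (hypothetical): classify each textline using image, position and text.
    known_keys = {"Name", "DOB", "ID No."}
    return [
        {"text": text, "box": box,
         "label": "question" if text in known_keys else "answer"}
        for text, box in ocr_results
    ]

image = "id_card.jpg"  # placeholder for real image data
ser_results = run_ser(image, run_ocr(image))
print([r["label"] for r in ser_results])  # -> ['question', 'answer']
```

The point of the sketch is the data flow: the KIE (SER) step consumes both the OCR output and the image, rather than working end-to-end from pixels.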


### 2.1 Train OCR Models
@@ -132,7 +132,7 @@ In terms of model, it is recommended to use the VI-layoutXLM model proposed in P

The SER model is mainly used to identify all keys and values in the document image, and the RE model is mainly used to match all keys and values.

- Taking the ID card scenario as an example, the key information generally includes key information such as `name`, `DOB`, etc. in the SER stage, we need to identify all questions (keys) and answers (values). The demo annotation is as follows. All keys can be annotated as `question`, and all keys can be annotated as `answer`.
+ Taking the ID card scenario as an example, the key information generally includes key information such as `name`, `DOB`, etc. In the SER stage, we need to identify all questions (keys) and answers (values). The demo annotation is as follows. All keys can be annotated as `question`, and all values can be annotated as `answer`.


<div align="center">
@@ -151,7 +151,7 @@ For each textline, you need to add 'ID' and 'linking' field information. The 'ID

**Note:**

- During annotation, if value is multiple textines, a key value pair can be added in linking, such as `[[0, 1], [0, 2]]`.
+ During annotation, if value is multiple text lines, a key-value pair can be added in linking, such as `[[0, 1], [0, 2]]`.
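The `linking` convention for a value spanning multiple textlines can be illustrated with a small annotation snippet. The `[[0, 1], [0, 2]]` example comes from the text above; the field names and transcriptions here are hypothetical, not necessarily the exact PaddleOCR label format.

```python
# Hypothetical annotation: one key (id 0) linked to a value split across
# two textlines (ids 1 and 2), so its linking holds two key-value pairs.
annotation = [
    {"id": 0, "transcription": "Address",       "label": "question", "linking": [[0, 1], [0, 2]]},
    {"id": 1, "transcription": "No. 5 Xihu Rd", "label": "answer",   "linking": [[0, 1]]},
    {"id": 2, "transcription": "Hangzhou",      "label": "answer",   "linking": [[0, 2]]},
]

# Collect all distinct key-value links in the sample.
links = sorted({tuple(pair) for item in annotation for pair in item["linking"]})
print(links)  # -> [(0, 1), (0, 2)]
```

In the RE stage, the model's job is to recover exactly these key-value links from the SER labels.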

In terms of data, generally speaking, for relatively fixed scenes, about **50** training images can achieve acceptable effects.
