Skip to content

Commit

Permalink
Feat: weighted average table metrics (#3348)
Browse files Browse the repository at this point in the history
This PR uses (number of actual table) weighted average instead of
average without weights for table metrics.

- pages where there are ground truth tables the weight is proportional
to the number of ground truth tables in that page
- pages where there are no ground truth tables but has predicted tables
(false positive) are assigned as 1 table worth of weight for the whole
page for calculating the mean value of `table_level_acc`
- pages with false positive tables do not contribute to table structural
or table content metrics

## test

This PR updates the existing test for evaluating table metrics:
- adds a second file with just 1 table vs. the existing file with 2
tables
- test the weighted average is written to the report
  • Loading branch information
badGarnet authored Nov 20, 2024
1 parent 85ecdab commit 3b9b01c
Show file tree
Hide file tree
Showing 7 changed files with 2,902 additions and 9 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
## 0.16.6-dev1
## 0.16.6-dev2

### Enhancements
- **Every <table> tag is considered to be ontology.Table** Added special handling for tables in HTML partitioning. This change is made to improve the accuracy of table extraction from HTML documents.
- **Every HTML has default ontology class assigned** When parsing HTML to ontology each defined HTML in the Ontology has assigned default ontology class. This way it is possible to assign ontology class instead of UncategorizedText when the HTML tag is predicted correctly without class assigned class
- **Use (number of actual table) weighted average for table metrics** In evaluating table metrics the mean aggregation now uses the actual number of tables in a document to weight the metric scores

### Features

Expand Down
Loading

0 comments on commit 3b9b01c

Please sign in to comment.