Error at the dataset check step when training the SLANet_plus model #2463

Open
Bb91234567890 opened this issue Nov 12, 2024 · 5 comments
@Bb91234567890

Problem description

While training the table structure recognition model SLANet_plus, the dataset check step fails with this error: paddlex.utils.errors.dataset_checker.CheckFailedError: Check dataset failed. We encountered the following error:
The number of cells needs to be consistent with the number of tokens but the number of cells is {boxes_num}, and the number of tokens is {tokens_num}.

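Reproduction

  1. Have you successfully run the tutorial we provide?

  2. Have you modified the code on top of the tutorial? If so, please provide the code you ran.

  3. Which dataset are you using?
    The images directory contains one image, 401.jpg.

Contents of train.txt: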
{"filename": "images/401.jpg", "html": {"structure": {"tokens": ["<tbody>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td", " rowspan=\"6\"", ">", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td", " rowspan=\"4\"", ">", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"3\"", ">", "</td>", "<td", " rowspan=\"3\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</tbody>"]}, "cells": [{"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 7], [145, 7], [145, 75], [41, 75]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 7], [311, 7], [311, 75], [159, 75]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[324, 8], [538, 8], [538, 77], [324, 77]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[552, 10], [854, 10], [854, 75], [552, 75]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[865, 8], [1212, 8], [1212, 77], [865, 77]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 82], [145, 82], [145, 
151], [41, 151]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 82], [311, 82], [311, 151], [159, 151]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[324, 85], [538, 85], [538, 116], [324, 116]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[553, 86], [853, 86], [853, 115], [553, 115]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[865, 85], [1211, 85], [1211, 114], [865, 114]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[324, 123], [537, 123], [537, 156], [324, 156]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[552, 125], [853, 125], [853, 153], [552, 153]]}, {"tokens": ["U", "r", "e", "l", "=", "0", ".", "1", "2", "%"], "bbox": [[865, 125], [1211, 125], [1211, 192], [865, 192]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 162], [145, 162], [145, 230], [41, 230]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 162], [311, 162], [311, 231], [159, 231]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 164], [537, 164], [537, 194], [325, 194]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[552, 164], [853, 164], [853, 194], [552, 194]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 202], [536, 202], [536, 233], [325, 233]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[552, 203], [853, 203], [853, 232], [552, 232]]}, {"tokens": ["U", "r", "e", "l", "=", "0", ".", "0", "6", "%"], "bbox": [[865, 203], [1212, 203], [1212, 430], [865, 430]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 244], [145, 244], [145, 312], [41, 312]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 244], [311, 244], [311, 313], [159, 313]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 241], [536, 241], [536, 274], [325, 274]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[554, 242], [853, 242], [853, 272], [554, 272]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[324, 282], [537, 282], [537, 311], [324, 311]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[555, 283], [853, 283], [853, 310], [555, 
310]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 323], [145, 323], [145, 391], [41, 391]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 322], [311, 322], [311, 390], [159, 390]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 321], [537, 321], [537, 350], [325, 350]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[554, 322], [852, 322], [852, 349], [554, 349]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 361], [537, 361], [537, 388], [325, 388]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[554, 361], [851, 361], [851, 391], [554, 391]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 401], [145, 401], [145, 469], [41, 469]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 401], [311, 401], [311, 469], [159, 469]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 400], [537, 400], [537, 427], [325, 427]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[554, 400], [851, 400], [851, 427], [554, 427]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 438], [538, 438], [538, 470], [325, 470]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[555, 439], [851, 439], [851, 468], [555, 468]]}, {"tokens": ["U", "r", "e", "l", "=", "0", ".", "1", "2", "%"], "bbox": [[866, 440], [1211, 440], [1211, 589], [866, 589]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 478], [145, 478], [145, 588], [41, 588]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 478], [311, 478], [311, 588], [159, 588]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 479], [537, 479], [537, 510], [325, 510]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[556, 479], [851, 479], [851, 508], [556, 508]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 519], [537, 519], [537, 548], [325, 548]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[556, 518], [851, 518], [851, 546], [556, 546]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 557], [538, 557], [538, 584], [325, 584]]}, {"tokens": ["最", "大", "允", 
"许", "误"], "bbox": [[557, 558], [852, 558], [852, 588], [557, 588]]}]}, "gt": "<html><body><table><tbody><tr><td>最大允许误</td><td>最大允许误</td><td>最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td rowspan=\"2\">最大允许误</td><td rowspan=\"2\">最大允许误</td><td>最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td><td rowspan=\"2\">Urel=0.12%</td></tr><tr><td rowspan=\"2\">最大允许误</td><td rowspan=\"2\">最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td><td rowspan=\"6\">Urel=0.06%</td></tr><tr><td rowspan=\"2\">最大允许误</td><td rowspan=\"2\">最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td></tr><tr><td rowspan=\"2\">最大允许误</td><td rowspan=\"2\">最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td></tr><tr><td rowspan=\"2\">最大允许误</td><td rowspan=\"2\">最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td><td rowspan=\"4\">Urel=0.12%</td></tr><tr><td rowspan=\"3\">最大允许误</td><td rowspan=\"3\">最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td></tr></tbody></table></body></html>"}

[Image: 401]

  4. Please provide the error message and the relevant log.
    aistudio@jupyter-13749311-8521962:~/PaddleX$ python main.py -c paddlex/configs/table_recognition/SLANet_plus.yaml -o Global.mode=check_dataset -o Global.dataset_dir=./train_data
    Traceback (most recent call last):
      File "/home/aistudio/PaddleX/paddlex/utils/result_saver.py", line 29, in wrap
        result = func(self, *args, **kwargs)
      File "/home/aistudio/PaddleX/paddlex/engine.py", line 38, in run
        return self._model.check_dataset()
      File "/home/aistudio/PaddleX/paddlex/model.py", line 90, in check_dataset
        return dataset_checker.check()
      File "/home/aistudio/PaddleX/paddlex/modules/base/dataset_checker/dataset_checker.py", line 75, in check
        attrs = self.check_dataset(dataset_dir)
      File "/home/aistudio/PaddleX/paddlex/modules/table_recognition/dataset_checker/__init__.py", line 67, in check_dataset
        return check(dataset_dir, self.global_config.output, sample_num=10)
      File "/home/aistudio/PaddleX/paddlex/modules/table_recognition/dataset_checker/dataset_src/check_dataset.py", line 75, in check
        raise CheckFailedError(
    paddlex.utils.errors.dataset_checker.CheckFailedError: Check dataset failed. We encountered the following error:
    The number of cells needs to be consistent with the number of tokens but the number of cells is {boxes_num}, and the number of tokens is {tokens_num}.
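
The `{boxes_num}` and `{tokens_num}` placeholders appear literally in the message above, which suggests the counts were never substituted into the string. One possible cause is a message built without the `f` prefix; this is an assumption and is not confirmed anywhere in this thread. A minimal sketch of that Python behavior, using made-up counts:

```python
# Made-up counts for illustration only; they are not taken from the log.
boxes_num, tokens_num = 46, 47

# Without the f prefix the braces are ordinary characters.
plain = "the number of cells is {boxes_num}, and the number of tokens is {tokens_num}"

# With the f prefix the variables are interpolated.
interpolated = f"the number of cells is {boxes_num}, and the number of tokens is {tokens_num}"

print(plain)
print(interpolated)
```

Whatever the cause, the log does not show the real counts, so they have to be computed from the annotation files directly.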

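Environment

  1. Please provide the versions of PaddlePaddle and PaddleX you are using.
    paddlepaddle:3.0.0beta2
    paddlex:3.0-beta1

  2. Please provide your operating system, e.g. Linux/Windows/MacOS.
    linux

  3. Which Python version are you using?
    3.10.10

  4. Which CUDA/cuDNN version are you using?
    11.8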

@zhang-prog
Collaborator

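The validation set's txt file appears to be empty. Could you try splitting the dataset properly?
Also, make sure that every tokens entry has a corresponding bbox; that is worth checking too.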
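
Both suggestions can be checked offline before rerunning the dataset check. The sketch below is a hedged illustration, not PaddleX's own checker: `count_cells` and `find_problems` are hypothetical helpers, and the counting rule (one bbox per `<td>` or `<td` structure token) is an assumption based on the error message and the train.txt sample in this issue.

```python
import json

# Assumption: each cell opens with "<td>" or with the split token "<td"
# (used when an attribute such as rowspan follows), as in the sample above.
CELL_OPEN_TOKENS = ("<td>", "<td")


def count_cells(annotation_line):
    """Return (cells opened in the structure tokens, cells carrying a bbox)."""
    record = json.loads(annotation_line)
    structure = record["html"]["structure"]["tokens"]
    cells = record["html"]["cells"]
    n_structure = sum(1 for token in structure if token in CELL_OPEN_TOKENS)
    n_bbox = sum(1 for cell in cells if "bbox" in cell)
    return n_structure, n_bbox


def find_problems(lines):
    """List problems for the lines of one annotation file (train.txt or val.txt)."""
    records = [line for line in lines if line.strip()]
    if not records:
        return ["annotation file is empty"]
    problems = []
    for index, line in enumerate(records, start=1):
        n_structure, n_bbox = count_cells(line)
        if n_structure != n_bbox:
            problems.append(
                f"record {index}: {n_structure} cell tokens but {n_bbox} bboxes"
            )
    return problems
```

Passing the lines of each annotation file to `find_problems` returns an empty list when the file is non-empty and every record is consistent under this rule. An empty validation file is reported on its own, which separates the two causes mentioned above.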

@flow3rdown

Hello, is the Acc normal during your training?

@zhang-prog
Collaborator

The acc problem is discussed here: #2460

@Bb91234567890
Author

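The validation set's txt file appears to be empty. Could you try splitting the dataset properly? Also, make sure that every tokens entry has a corresponding bbox; that is worth checking too.

OK, I'll give it a try.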

@zhang-prog
Collaborator

OK.
