Dataset check fails when training SLANet_plus #2463

Bb91234567890 · 2024-11-12T05:30:16Z

Describe the problem

While training the table structure recognition model SLANet_plus, the dataset check step fails with: paddlex.utils.errors.dataset_checker.CheckFailedError: Check dataset failed. We encountered the following error:
The number of cells needs to be consistent with the number of tokens but the number of cells is {boxes_num}, and the number of tokens is {tokens_num}.

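Reproduction

Have you successfully run the tutorial we provide?
Yes
Have you modified the code on top of the tutorial? If so, please provide the code you ran.
No
Which dataset are you using?
The images directory contains an image, 401.jpg.

Contents of train.txt: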
{"filename": "images/401.jpg", "html": {"structure": {"tokens": ["<tbody>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td", " rowspan=\"6\"", ">", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"2\"", ">", "</td>", "<td", " rowspan=\"2\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td", " rowspan=\"4\"", ">", "</td>", "</tr>", "<tr>", "<td", " rowspan=\"3\"", ">", "</td>", "<td", " rowspan=\"3\"", ">", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</tbody>"]}, "cells": [{"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 7], [145, 7], [145, 75], [41, 75]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 7], [311, 7], [311, 75], [159, 75]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[324, 8], [538, 8], [538, 77], [324, 77]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[552, 10], [854, 10], [854, 75], [552, 75]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[865, 8], [1212, 8], [1212, 77], [865, 77]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 82], [145, 82], [145, 
151], [41, 151]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 82], [311, 82], [311, 151], [159, 151]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[324, 85], [538, 85], [538, 116], [324, 116]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[553, 86], [853, 86], [853, 115], [553, 115]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[865, 85], [1211, 85], [1211, 114], [865, 114]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[324, 123], [537, 123], [537, 156], [324, 156]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[552, 125], [853, 125], [853, 153], [552, 153]]}, {"tokens": ["U", "r", "e", "l", "=", "0", ".", "1", "2", "%"], "bbox": [[865, 125], [1211, 125], [1211, 192], [865, 192]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 162], [145, 162], [145, 230], [41, 230]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 162], [311, 162], [311, 231], [159, 231]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 164], [537, 164], [537, 194], [325, 194]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[552, 164], [853, 164], [853, 194], [552, 194]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 202], [536, 202], [536, 233], [325, 233]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[552, 203], [853, 203], [853, 232], [552, 232]]}, {"tokens": ["U", "r", "e", "l", "=", "0", ".", "0", "6", "%"], "bbox": [[865, 203], [1212, 203], [1212, 430], [865, 430]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 244], [145, 244], [145, 312], [41, 312]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 244], [311, 244], [311, 313], [159, 313]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 241], [536, 241], [536, 274], [325, 274]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[554, 242], [853, 242], [853, 272], [554, 272]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[324, 282], [537, 282], [537, 311], [324, 311]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[555, 283], [853, 283], [853, 310], [555, 
310]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 323], [145, 323], [145, 391], [41, 391]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 322], [311, 322], [311, 390], [159, 390]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 321], [537, 321], [537, 350], [325, 350]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[554, 322], [852, 322], [852, 349], [554, 349]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 361], [537, 361], [537, 388], [325, 388]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[554, 361], [851, 361], [851, 391], [554, 391]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 401], [145, 401], [145, 469], [41, 469]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 401], [311, 401], [311, 469], [159, 469]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 400], [537, 400], [537, 427], [325, 427]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[554, 400], [851, 400], [851, 427], [554, 427]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 438], [538, 438], [538, 470], [325, 470]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[555, 439], [851, 439], [851, 468], [555, 468]]}, {"tokens": ["U", "r", "e", "l", "=", "0", ".", "1", "2", "%"], "bbox": [[866, 440], [1211, 440], [1211, 589], [866, 589]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[41, 478], [145, 478], [145, 588], [41, 588]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[159, 478], [311, 478], [311, 588], [159, 588]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 479], [537, 479], [537, 510], [325, 510]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[556, 479], [851, 479], [851, 508], [556, 508]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 519], [537, 519], [537, 548], [325, 548]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[556, 518], [851, 518], [851, 546], [556, 546]]}, {"tokens": ["最", "大", "允", "许", "误"], "bbox": [[325, 557], [538, 557], [538, 584], [325, 584]]}, {"tokens": ["最", "大", "允", 
"许", "误"], "bbox": [[557, 558], [852, 558], [852, 588], [557, 588]]}]}, "gt": "<html><body><table><tbody><tr><td>最大允许误</td><td>最大允许误</td><td>最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td rowspan=\"2\">最大允许误</td><td rowspan=\"2\">最大允许误</td><td>最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td><td rowspan=\"2\">Urel=0.12%</td></tr><tr><td rowspan=\"2\">最大允许误</td><td rowspan=\"2\">最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td><td rowspan=\"6\">Urel=0.06%</td></tr><tr><td rowspan=\"2\">最大允许误</td><td rowspan=\"2\">最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td></tr><tr><td rowspan=\"2\">最大允许误</td><td rowspan=\"2\">最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td></tr><tr><td rowspan=\"2\">最大允许误</td><td rowspan=\"2\">最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td><td rowspan=\"4\">Urel=0.12%</td></tr><tr><td rowspan=\"3\">最大允许误</td><td rowspan=\"3\">最大允许误</td><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td></tr><tr><td>最大允许误</td><td>最大允许误</td></tr></tbody></table></body></html>"}

Please provide the error message and relevant logs
aistudio@jupyter-13749311-8521962:~/PaddleX$ python main.py -c paddlex/configs/table_recognition/SLANet_plus.yaml -o Global.mode=check_dataset -o Global.dataset_dir=./train_data
Traceback (most recent call last):
  File "/home/aistudio/PaddleX/paddlex/utils/result_saver.py", line 29, in wrap
    result = func(self, *args, **kwargs)
  File "/home/aistudio/PaddleX/paddlex/engine.py", line 38, in run
    return self._model.check_dataset()
  File "/home/aistudio/PaddleX/paddlex/model.py", line 90, in check_dataset
    return dataset_checker.check()
  File "/home/aistudio/PaddleX/paddlex/modules/base/dataset_checker/dataset_checker.py", line 75, in check
    attrs = self.check_dataset(dataset_dir)
  File "/home/aistudio/PaddleX/paddlex/modules/table_recognition/dataset_checker/__init__.py", line 67, in check_dataset
    return check(dataset_dir, self.global_config.output, sample_num=10)
  File "/home/aistudio/PaddleX/paddlex/modules/table_recognition/dataset_checker/dataset_src/check_dataset.py", line 75, in check
    raise CheckFailedError(
paddlex.utils.errors.dataset_checker.CheckFailedError: Check dataset failed. We encountered the following error:
The number of cells needs to be consistent with the number of tokens but the number of cells is {boxes_num}, and the number of tokens is {tokens_num}.
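
A side note on the message itself: it prints `{boxes_num}` and `{tokens_num}` literally instead of the actual counts. One plausible explanation, and this is only a guess rather than a confirmed reading of the PaddleX source, is that the string was written without the `f` prefix. The sketch below shows the difference in plain Python; the two counts are made-up values for illustration.

```python
# Hypothetical counts, chosen only to illustrate string formatting.
boxes_num, tokens_num = 3, 2

# Without the f prefix the braces are kept literally, as in the log above.
plain = "the number of cells is {boxes_num}, and the number of tokens is {tokens_num}."

# With the f prefix the current values are interpolated.
formatted = f"the number of cells is {boxes_num}, and the number of tokens is {tokens_num}."

print(plain)
print(formatted)
```

If that guess is right, the real counts are not recoverable from the log, so they have to be recomputed from the annotation file.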

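Environment

Please provide the PaddlePaddle and PaddleX versions you are using
paddlepaddle: 3.0.0beta2
paddlex: 3.0-beta1
Please provide your operating system, e.g. Linux/Windows/MacOS
Linux
Which Python version are you using?
3.10.10
Which CUDA/cuDNN version are you using?
11.8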

zhang-prog · 2024-11-13T08:03:06Z

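The validation set txt file appears to be empty. Could you try splitting the dataset properly?
Also, make sure every tokens entry has a corresponding bbox; that is worth checking too.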
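
The two suggestions above can be checked mechanically before running `check_dataset`. The following is a minimal sketch, not PaddleX's own checker: it assumes each annotation is one JSON object per line in the format shown in the issue, and it assumes that a cell is opened by one of the structure tokens `<td>`, `<td`, or `<td></td>`. The function names are hypothetical.

```python
import json

# Assumption: each of these structure tokens opens exactly one table cell.
CELL_OPENERS = ("<td>", "<td", "<td></td>")


def count_cells_and_tokens(annotation_line):
    """Return (boxes_num, tokens_num) for one JSON annotation line."""
    html = json.loads(annotation_line)["html"]
    boxes_num = len(html["cells"])
    tokens_num = sum(1 for tok in html["structure"]["tokens"] if tok in CELL_OPENERS)
    return boxes_num, tokens_num


def find_mismatches(lines):
    """Return (line_number, filename, boxes_num, tokens_num) for inconsistent samples."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        boxes_num, tokens_num = count_cells_and_tokens(line)
        if boxes_num != tokens_num:
            filename = json.loads(line).get("filename", "?")
            problems.append((line_number, filename, boxes_num, tokens_num))
    return problems


def check_file(path):
    """Print a warning for an empty annotation file and for every mismatched sample."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    if not any(line.strip() for line in lines):
        print(f"{path}: no annotation lines found")
    for line_number, filename, boxes_num, tokens_num in find_mismatches(lines):
        print(f"{path}:{line_number} {filename}: {boxes_num} cells vs {tokens_num} cell tokens")
```

Calling `check_file` on `train.txt` and on the validation annotation file (commonly `val.txt`, which is an assumption here) inside `./train_data` would report an empty file or the first samples whose counts disagree.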

flow3rdown · 2024-11-13T11:22:54Z

Hello, is the Acc normal during your training?

zhang-prog · 2024-11-14T12:05:22Z

The acc issue is tracked here: #2460

Bb91234567890 · 2024-11-17T02:33:31Z

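The validation set txt file appears to be empty. Could you try splitting the dataset properly? Also, make sure every tokens entry has a corresponding bbox; that is worth checking too.

OK, I'll give it a try.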
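
For readers who want to produce a non-empty validation file by hand, here is a minimal sketch of a random split over annotation lines. It is a generic helper, not a PaddleX API; the function names, the 10% default ratio, and the output file names are assumptions.

```python
import random


def split_annotations(lines, val_ratio=0.1, seed=0):
    """Shuffle non-empty annotation lines and split them into (train, val)."""
    records = [line.rstrip("\n") for line in lines if line.strip()]
    random.Random(seed).shuffle(records)
    # Keep at least one validation sample whenever there are two or more records.
    val_count = max(1, round(len(records) * val_ratio)) if len(records) > 1 else 0
    return records[val_count:], records[:val_count]


def write_split(src_path, train_path, val_path, val_ratio=0.1, seed=0):
    """Read all annotations from src_path and write the two split files."""
    with open(src_path, encoding="utf-8") as f:
        train, val = split_annotations(f, val_ratio, seed)
    for path, records in ((train_path, train), (val_path, val)):
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(record + "\n" for record in records)
```

Note that with a single annotated image the validation list stays empty, so if the dataset really contains only `401.jpg`, more annotated samples would be needed before any split can help.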

zhang-prog · 2024-11-19T03:05:11Z

OK.

TingquanGao assigned zhang-prog Nov 12, 2024
