Merge pull request #886 from opendatalab/release-0.9.2
Release 0.9.2
myhloli authored Nov 6, 2024
2 parents 3fd024d + aeae1d0 commit b25ff7a
Showing 8 changed files with 98 additions and 74 deletions.
50 changes: 27 additions & 23 deletions README.md
@@ -42,7 +42,7 @@
</div>

# Changelog
- 2024/11/06 0.9.1 released. Integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition functionality.
- 2024/11/06 0.9.2 released. Integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition functionality.
- 2024/10/31 0.9.0 released. This is a major new version with extensive code refactoring, addressing numerous issues, improving performance, reducing hardware requirements, and enhancing usability:
- Refactored the sorting module code to use [layoutreader](https://github.com/ppaanngggg/layoutreader) for reading order sorting, ensuring high accuracy in various layouts.
- Refactored the paragraph concatenation module to achieve good results in cross-column, cross-page, cross-figure, and cross-table scenarios.
@@ -138,13 +138,14 @@ There are three different ways to experience MinerU:
- [Quick CPU Demo (Windows, Linux, Mac)](#quick-cpu-demo)
- [Linux/Windows + CUDA](#Using-GPU)

**⚠️ Pre-installation Notice—Hardware and Software Environment Support**

To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. This ensures that users deploying and running the project on recommended system configurations will get the best performance with the fewest compatibility issues.

By focusing resources on the mainline environment, our team can more efficiently resolve potential bugs and develop new features.

In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.
> [!WARNING]
> **Pre-installation Notice—Hardware and Software Environment Support**
>
> To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. This ensures that users deploying and running the project on recommended system configurations will get the best performance with the fewest compatibility issues.
>
> By focusing resources on the mainline environment, our team can more efficiently resolve potential bugs and develop new features.
>
> In non-mainline environments, due to the diversity of hardware and software configurations, as well as third-party dependency compatibility issues, we cannot guarantee 100% project availability. Therefore, for users who wish to use this project in non-recommended environments, we suggest carefully reading the documentation and FAQ first. Most issues already have corresponding solutions in the FAQ. We also encourage community feedback to help us gradually expand support.
<table>
<tr>
@@ -224,11 +225,13 @@ Refer to [How to Download Model Files](docs/how_to_download_models_en.md) for de
After completing the [2. Download model weight files](#2-download-model-weight-files) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.

> [!TIP]
> The user directory for Windows is "C:\\Users\\username", for Linux it is "/home/username", and for macOS it is "/Users/username".
You can modify certain configurations in this file to enable or disable features, such as table recognition:


> [!NOTE]
> If the following items are not present in the JSON, please manually add the required items and remove the comment content (standard JSON does not support comments).
```json
@@ -257,13 +260,14 @@ If your device supports CUDA and meets the GPU requirements of the mainline envi
- [Ubuntu 22.04 LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_en_US.md)
- [Windows 10/11 + GPU](docs/README_Windows_CUDA_Acceleration_en_US.md)
- Quick Deployment with Docker
> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
>
> Before running this Docker, you can use the following command to check if your device supports CUDA acceleration on Docker.
>
> ```bash
> docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
> ```
> [!IMPORTANT]
> Docker requires a GPU with at least 16GB of VRAM, and all acceleration features are enabled by default.
>
> Before running this Docker, you can use the following command to check if your device supports CUDA acceleration on Docker.
>
> ```bash
> docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
> ```
```bash
wget https://github.com/opendatalab/MinerU/raw/master/Dockerfile
docker build -t mineru:latest .
@@ -325,8 +329,8 @@ The results will be saved in the `{some_output_dir}` directory. The output file
├── some_pdf_spans.pdf # smallest granularity bbox position information diagram
└── some_pdf_content_list.json # Rich text JSON arranged in reading order
```

For more information about the output files, please refer to the [Output File Description](docs/output_file_en_us.md).
> [!TIP]
> For more information about the output files, please refer to the [Output File Description](docs/output_file_en_us.md).
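As a quick way to inspect the reading-order output, the sketch below counts the blocks of each type in a `*_content_list.json` file. It assumes the file is a JSON array of objects that carry a `type` field; that schema is an assumption here, so treat the Output File Description as authoritative. `count_block_types` is a hypothetical helper name, and Python 3 is used only as a JSON parser.

```bash
#!/usr/bin/env bash
# Summarize a *_content_list.json file by counting blocks of each type.
# Assumption: the file is a JSON array of objects with a "type" field
# (for example "text", "image", "table"); adjust if your output differs.
count_block_types() {
  python3 - "$1" <<'EOF'
import collections, json, sys

with open(sys.argv[1], encoding="utf-8") as f:
    blocks = json.load(f)
# Count each block type; blocks without a "type" key are tallied as "unknown".
counts = collections.Counter(b.get("type", "unknown") for b in blocks)
for block_type, n in sorted(counts.items()):
    print(f"{block_type}\t{n}")
EOF
}
```

Usage: `count_block_types path/to/some_pdf_content_list.json` prints one tab-separated `type count` line per block type, sorted by type name.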
### API

@@ -377,12 +381,12 @@ TODO

# TODO

- 🗹 Reading order based on the model
- 🗹 Recognition of `index` and `list` in the main text
- 🗹 Table recognition
- Code block recognition in the main text
- [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- Geometric shape recognition
- [x] Reading order based on the model
- [x] Recognition of `index` and `list` in the main text
- [x] Table recognition
- [ ] Code block recognition in the main text
- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- [ ] Geometric shape recognition

# Known Issues

52 changes: 30 additions & 22 deletions README_zh-CN.md
@@ -43,7 +43,7 @@

# Changelog

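- 2024/11/06 0.9.1 released. Integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition.
- 2024/11/06 0.9.2 released. Integrated the [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B) model for table recognition.
- 2024/10/31 0.9.0 released. This is a brand-new version with extensive code refactoring that resolves numerous issues, improves performance, lowers hardware requirements, and improves usability:
- Refactored the sorting module to use [layoutreader](https://github.com/ppaanngggg/layoutreader) for reading-order sorting, achieving very high accuracy across various layouts.
- Refactored the paragraph concatenation module to achieve good results across columns, pages, figures, and tables.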
@@ -139,13 +139,15 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
- [Quick CPU Demo (Windows, Linux, Mac)](#使用cpu快速体验)
- [Linux/Windows + CUDA](#使用gpu)

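**⚠️ Pre-installation Notice: Hardware and Software Environment Support**

To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. As a result, users who deploy and run the project on the recommended system configurations get the best performance and the fewest compatibility issues.

By concentrating resources and effort on the mainline environment, our team can resolve potential bugs more efficiently and develop new features promptly.

In non-mainline environments, because of the diversity of hardware and software configurations and compatibility issues with third-party dependencies, we cannot guarantee that the project will be fully usable. Users who wish to use this project in a non-recommended environment should therefore read the documentation and FAQ carefully first; most issues already have solutions in the FAQ. We also encourage community feedback so that we can gradually expand support.
> [!WARNING]
> **Pre-installation Notice: Hardware and Software Environment Support**
>
> To ensure the stability and reliability of the project, we only optimize and test for specific hardware and software environments during development. As a result, users who deploy and run the project on the recommended system configurations get the best performance and the fewest compatibility issues.
>
> By concentrating resources and effort on the mainline environment, our team can resolve potential bugs more efficiently and develop new features promptly.
>
> In non-mainline environments, because of the diversity of hardware and software configurations and compatibility issues with third-party dependencies, we cannot guarantee that the project will be fully usable. Users who wish to use this project in a non-recommended environment should therefore read the documentation and FAQ carefully first; most issues already have solutions in the FAQ. We also encourage community feedback so that we can gradually expand support.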
<table>
<tr>
@@ -211,7 +213,8 @@ https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c

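#### 1. Install magic-pdf

Synchronization of the latest version to mirrors in China may be delayed; please be patient.
> [!NOTE]
> Synchronization of the latest version to mirrors in China may be delayed; please be patient.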
```bash
conda create -n MinerU python=3.10
@@ -227,10 +230,13 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h

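After completing the [2. Download model weight files](#2-下载模型权重文件) step, the script will automatically generate the magic-pdf.json file in the user directory and configure the default model path.
You can find the magic-pdf.json file in your user directory.

> [!TIP]
> The user directory for Windows is "C:\\Users\\username", for Linux it is "/home/username", and for macOS it is "/Users/username".
You can modify certain configurations in this file to turn features on or off, such as table recognition:

> [!NOTE]
> If the following items are not present in the JSON, please add the required items manually and remove the comments (standard JSON does not support comments).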
```json
@@ -259,13 +265,14 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
- [Ubuntu22.04LTS + GPU](docs/README_Ubuntu_CUDA_Acceleration_zh_CN.md)
- [Windows10/11 + GPU](docs/README_Windows_CUDA_Acceleration_zh_CN.md)
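- Quick Deployment with Docker
> Docker requires a GPU with at least 16GB of VRAM; all acceleration features are enabled by default.
>
> Before running this Docker image, you can use the following command to check whether your device supports CUDA acceleration in Docker.
>
> ```bash
> docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
> ```
> [!IMPORTANT]
> Docker requires a GPU with at least 16GB of VRAM; all acceleration features are enabled by default.
>
> Before running this Docker image, you can use the following command to check whether your device supports CUDA acceleration in Docker.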
>
> ```bash
> docker run --rm --gpus=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
> ```
```bash
wget https://github.com/opendatalab/MinerU/raw/master/Dockerfile
docker build -t mineru:latest .
@@ -329,7 +336,8 @@ magic-pdf -p {some_pdf} -o {some_output_dir} -m auto
└── some_pdf_content_list.json # Rich text JSON arranged in reading order
```

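For more information about the output files, please refer to the [Output File Description](docs/output_file_zh_cn.md).
> [!TIP]
> For more information about the output files, please refer to the [Output File Description](docs/output_file_zh_cn.md).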
### API

@@ -380,12 +388,12 @@ TODO

# TODO

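- 🗹 Reading order based on the model
- 🗹 Recognition of the table of contents and lists in the main text
- 🗹 Table recognition
- Code block recognition in the main text
- [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- Geometric shape recognition
- [x] Reading order based on the model
- [x] Recognition of the table of contents and lists in the main text
- [x] Table recognition
- [ ] Code block recognition in the main text
- [ ] [Chemical formula recognition](docs/chemical_knowledge_introduction/introduction.pdf)
- [ ] Geometric shape recognition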

# Known Issues

20 changes: 11 additions & 9 deletions docs/README_Ubuntu_CUDA_Acceleration_en_US.md
@@ -8,7 +8,8 @@ nvidia-smi

If you see information similar to the following, it means that the NVIDIA drivers are already installed, and you can skip Step 2.

Notice: `CUDA Version` should be >= 12.1. If the displayed version number is less than 12.1, please upgrade the driver.
> [!NOTE]
> Notice: `CUDA Version` should be >= 12.1. If the displayed version number is less than 12.1, please upgrade the driver.
```plaintext
+---------------------------------------------------------------------------------------+
@@ -64,14 +65,14 @@ conda activate MinerU
```sh
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
```

After installation, make sure to check the version of `magic-pdf` using the following command:

```sh
magic-pdf --version
```

If the version number is less than 0.7.0, please report the issue.
> [!IMPORTANT]
> After installation, make sure to check the version of `magic-pdf` using the following command:
>
> ```sh
> magic-pdf --version
> ```
>
> If the version number is less than 0.7.0, please report the issue.
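The version check can also be scripted. A minimal sketch, assuming GNU `sort -V` (available on Ubuntu) and assuming the version number is the last whitespace-separated field printed by `magic-pdf --version`; `version_at_least` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Return success (0) if version $1 is >= version $2, using GNU sort's
# version ordering (sort -V). Assumes GNU coreutils, as on Ubuntu.
version_at_least() {
  # sort -V places the smaller version first; if $2 comes first
  # (or the two are equal), then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: the version number is the last field of `magic-pdf --version`.
if command -v magic-pdf >/dev/null 2>&1; then
  installed="$(magic-pdf --version | awk '{print $NF}')"
  if version_at_least "$installed" "0.7.0"; then
    echo "magic-pdf $installed is OK"
  else
    echo "magic-pdf $installed is older than 0.7.0; please report the issue"
  fi
fi
```

Version ordering matters here: a plain string comparison would rank `0.10.0` below `0.7.0`, while `sort -V` orders them correctly.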
### 6. Download Models
@@ -84,6 +85,7 @@ Refer to detailed instructions on [how to download model files](how_to_download_
After completing the [6. Download Models](#6-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.
> [!TIP]
> The user directory for Linux is "/home/username".
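To confirm that the file is where the tip says and that any manual edits left it as valid JSON (standard JSON does not allow comments), a small check like the following may help. It assumes the file sits directly under `$HOME` and uses Python's standard `json.tool` module as the validator; both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Locate magic-pdf.json in the user directory ($HOME, i.e. /home/username
# on Linux) and check that it parses as standard JSON.
magic_pdf_config_path() {
  printf '%s/magic-pdf.json\n' "${HOME:?HOME is not set}"
}

# Succeeds if file $1 is valid JSON, using Python's built-in json.tool.
is_valid_json() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}

config="$(magic_pdf_config_path)"
if [ ! -f "$config" ]; then
  echo "Not found: $config (complete the model download step first)"
elif is_valid_json "$config"; then
  echo "OK: $config"
else
  echo "Invalid JSON in $config (check for leftover comments)"
fi
```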
16 changes: 9 additions & 7 deletions docs/README_Ubuntu_CUDA_Acceleration_zh_CN.md
@@ -8,7 +8,8 @@ nvidia-smi

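If you see information similar to the following, the NVIDIA driver is already installed and you can skip Step 2.

Note: the `CUDA Version` shown should be >= 12.1. If the displayed version number is less than 12.1, please upgrade the driver.
> [!NOTE]
> The `CUDA Version` shown should be >= 12.1. If the displayed version number is less than 12.1, please upgrade the driver.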
```plaintext
+---------------------------------------------------------------------------------------+
@@ -65,7 +66,8 @@ conda activate MinerU
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i https://mirrors.aliyun.com/pypi/simple
```

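> ❗️After the download completes, be sure to verify that the magic-pdf version is correct with the following command:
> [!IMPORTANT]
> After the download completes, be sure to verify that the magic-pdf version is correct with the following command: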
>
> ```bash
> magic-pdf --version
@@ -83,7 +85,7 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
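After completing the [6. Download Models](#6-下载模型) step, the script will automatically generate the magic-pdf.json file in the user directory and configure the default model path.
You can find the magic-pdf.json file in your user directory.
> [!TIP]
> The user directory for Linux is "/home/username".
## 8. First Run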
@@ -112,8 +114,8 @@ magic-pdf -p small_ocr.pdf -o ./output
```bash
magic-pdf -p small_ocr.pdf -o ./output
```
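> Tip: you can roughly judge whether CUDA acceleration is in effect from the per-stage cost times printed in the log. Normally, `layout detection cost` and `mfr time` should be more than 10 times faster.
> [!TIP]
> You can roughly judge whether CUDA acceleration is in effect from the per-stage cost times printed in the log. Normally, `layout detection cost` and `mfr time` should be more than 10 times faster.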
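The speedup check in the tip is a simple ratio: the time a stage takes on CPU divided by the time it takes with CUDA. A minimal sketch with hypothetical helper names; the two timings are passed in by hand because the exact log format is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare one stage's timing from a CPU run and a
# CUDA run of the same PDF. Read both numbers from the respective logs.

# $1 = CPU seconds, $2 = GPU seconds; prints CPU/GPU to one decimal place.
speedup() {
  awk -v cpu="$1" -v gpu="$2" 'BEGIN { printf "%.1f\n", cpu / gpu }'
}

# Succeeds if the speedup is at least $3 (default 10, per the tip).
cuda_effective() {
  awk -v cpu="$1" -v gpu="$2" -v min="${3:-10}" \
    'BEGIN { exit !(cpu / gpu >= min) }'
}
```

For example, `cuda_effective 42.0 3.1` succeeds because 42.0 / 3.1 is about 13.5, which is at least 10.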
## 10. Enable CUDA Acceleration for OCR
@@ -128,5 +130,5 @@ python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.
```bash
magic-pdf -p small_ocr.pdf -o ./output
```
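> Tip: you can roughly judge whether CUDA acceleration is in effect from the per-stage cost times printed in the log. Normally, `ocr cost` should be more than 10 times faster.
> [!TIP]
> You can roughly judge whether CUDA acceleration is in effect from the per-stage cost times printed in the log. Normally, `ocr cost` should be more than 10 times faster.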
8 changes: 5 additions & 3 deletions docs/README_Windows_CUDA_Acceleration_en_US.md
@@ -28,7 +28,8 @@ conda activate MinerU
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
```

> ❗️After installation, verify the version of `magic-pdf`:
> [!IMPORTANT]
> After installation, verify the version of `magic-pdf`:
>
> ```bash
> magic-pdf --version
@@ -45,6 +46,7 @@ Refer to detailed instructions on [how to download model files](how_to_download_
After completing the [5. Download Models](#5-download-models) step, the script will automatically generate a `magic-pdf.json` file in the user directory and configure the default model path.
You can find the `magic-pdf.json` file in your user directory.
> [!TIP]
> The user directory for Windows is "C:/Users/username".
### 7. First Run
@@ -65,8 +67,8 @@ If your graphics card has at least 8GB of VRAM, follow these steps to test CUDA-
```
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
> ❗️Ensure the following versions are specified in the command:
> [!IMPORTANT]
> Ensure the following versions are specified in the command:
>
> ```
> torch==2.3.1 torchvision==0.18.1
15 changes: 9 additions & 6 deletions docs/README_Windows_CUDA_Acceleration_zh_CN.md
@@ -29,7 +29,8 @@ conda activate MinerU
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i https://mirrors.aliyun.com/pypi/simple
```

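> ❗️After the download completes, be sure to verify that the magic-pdf version is correct with the following command:
> [!IMPORTANT]
> After the download completes, be sure to verify that the magic-pdf version is correct with the following command: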
>
> ```bash
> magic-pdf --version
@@ -46,7 +47,7 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
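After completing the [5. Download Models](#5-下载模型) step, the script will automatically generate the magic-pdf.json file in the user directory and configure the default model path.
You can find the magic-pdf.json file in your user directory.
> [!TIP]
> The user directory for Windows is "C:/Users/username".
## 7. First Run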
@@ -68,7 +69,8 @@ pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com -i h
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https://download.pytorch.org/whl/cu118
```
> ❗️Be sure to specify the following versions in the command:
> [!IMPORTANT]
> Be sure to specify the following versions in the command:
>
> ```bash
> torch==2.3.1 torchvision==0.18.1
@@ -90,7 +92,8 @@ pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 --index-url https
magic-pdf -p small_ocr.pdf -o ./output
```
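> Tip: you can roughly judge whether CUDA acceleration is in effect from the time taken by each stage in the log. Normally, `layout detection time` and `mfr time` should be more than 10 times faster.
> [!TIP]
> You can roughly judge whether CUDA acceleration is in effect from the time taken by each stage in the log. Normally, `layout detection time` and `mfr time` should be more than 10 times faster.
## 9. Enable CUDA Acceleration for OCR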
@@ -105,5 +108,5 @@ pip install paddlepaddle-gpu==2.6.1
```bash
magic-pdf -p small_ocr.pdf -o ./output
```
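> Tip: you can roughly judge whether CUDA acceleration is in effect from the per-stage cost times printed in the log. Normally, `ocr time` should be more than 10 times faster.
> [!TIP]
> You can roughly judge whether CUDA acceleration is in effect from the per-stage cost times printed in the log. Normally, `ocr time` should be more than 10 times faster.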
5 changes: 3 additions & 2 deletions docs/how_to_download_models_en.md
@@ -20,12 +20,13 @@ The configuration file can be found in the user directory, with the filename `ma

## 1. Models downloaded via Git LFS

> [!IMPORTANT]
> Because some users reported that downloading model files with git lfs produced incomplete or corrupted files, this method is no longer recommended.
>
> For versions 0.9.x and later, due to the repository change and the addition of the layout sorting model in PDF-Extract-Kit 1.0, the models cannot be updated using the `git pull` command. Instead, a Python script must be used for one-click updates.
For magic-pdf <= 0.8.1, if you previously downloaded the model files via git lfs, you can go to the original download directory and update the models with the `git pull` command.

> For versions 0.9.x and later, due to the repository change and the addition of the layout sorting model in PDF-Extract-Kit 1.0, the models cannot be updated using the `git pull` command. Instead, a Python script must be used for one-click updates.
## 2. Models downloaded via Hugging Face or Model Scope

If you previously downloaded models via Hugging Face or Model Scope, you can rerun the Python script used for the initial download. This will automatically update the model directory to the latest version.