Merge pull request #5 from Icekey/rework_fork
reworked project structure
thomasgruebl authored Mar 26, 2023
2 parents a006805 + 4738b83 commit d59e462
Showing 15 changed files with 642 additions and 640 deletions.
9 changes: 4 additions & 5 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "rusty-tesseract"
version = "1.0.1"
version = "1.1.1"
edition = "2021"
authors = ["thomasgruebl"]
description = "A Rust wrapper for Google Tesseract"
@@ -11,8 +11,7 @@ repository = "https://github.com/thomasgruebl/rusty-tesseract"

[dependencies]
subprocess = "0.2.8"
polars = "0.18.0"
ndarray = "0.15.4"
substring = "1.4.5"
multimap = "0.8.3"
image = "0.23.14"
image = "0.23.14"
thiserror = "1.0.40"
tempfile = "3.4.0"
96 changes: 48 additions & 48 deletions README.md
@@ -18,7 +18,7 @@ A Rust wrapper for Google Tesseract
Add the following line to your <b>Cargo.toml</b> file:

```rust
rusty-tesseract = "1.0.1"
rusty-tesseract = "1.1.1"
```

## Description
@@ -35,88 +35,88 @@ Tesseract: https://github.com/tesseract-ocr/tesseract

### 1. Read Image

Create an Image object by specifying a path or alternatively an image array in (height, width, channel) format (similar to Python's numpy array for opencv).
Note: Leave the Array3 parameter as is if you don't intend to use it.
Create an Image object by specifying a path or alternatively a DynamicImage from the image crate https://docs.rs/image/latest/image/

```rust
let _ = Image::new(
String::from("img/string.png"),
Array3::<u8>::zeros((100, 100, 3)),
);
// you can use the from_path function
let _ = Image::from_path("img/string.png");

// alternatively instantiate directly:

let mut img = Image {
path: String::from("img/string.png"),
ndarray: Array3::<u8>::zeros((100, 100, 3)), // example: creates an 100x100 pixel image with 3 colour channels (RGB)
};
// or instantiate Image from a DynamicImage
let dynamic_image = ImageReader::open("img/string.png")
.unwrap()
.decode()
.unwrap();
let img = Image::from_dynamic_image(&dynamic_image).unwrap();
```

### 2. Set tesseract parameters

Set tesseract parameters using the Args struct.

```rust
let default_args = Args::new();
let default_args = Args::default();

// the default parameters are
/* pub fn new() -> Args {
Args {
config: HashMap::new(),
lang: "eng",
out_filename: "out",
dpi: 150,
boxfile: false
}
}
/*
Args {
lang: "eng",
dpi: 150,
psm: 3,
oem: 3,
}
*/

// fill your own argument struct if needed
let mut my_args = Args {
out_filename: "out", // name of output_file
lang: "eng", // model language (tesseract default = 'eng')
config: HashMap::new(), // create empty hashmap to fill with command line parameters such as --psm or --oem (see tesseract --help-extra)
dpi: 150, // specify DPI for input image
boxfile: false // specify whether the output should be a bounding box or string output
lang: "eng", // model language (tesseract default = 'eng')
dpi: 150, // specify DPI for input image
psm: 3, // define page segmentation mode 6 (i.e. "Assume a single uniform block of text")
oem: 3, // define optical character recognition mode 3 (i.e. "Default, based on what is available")
};
image_to_string_args.config.insert("psm", "6"); // define page segmentation mode 6 (i.e. "Assume a single uniform block of text")
image_to_string_args.config.insert("oem", "3"); // define optical character recognition mode 3 (i.e. "Default, based on what is available")
```

### 3. Get the tesseract model output

Choose either string, bounding box or data output:

```rust
// string output
let output = rusty_tesseract::image_to_string(&img, my_args);
println!("The String output is: {:?}", output.output);

// define bounding box parameters
let mut image_to_boxes_args = Args {
out_filename: "font_name.font.exp0",
// define parameters
let mut my_args = Args {
lang: "eng",
config: HashMap::new(),
dpi: 150,
boxfile: true
psm: 6,
oem: 3
};
image_to_boxes_args.config.insert("psm", "6");
image_to_boxes_args.config.insert("oem", "3");

// boxes printed in OUTPUT_DICT or OUTPUT_DATAFRAME format store the key as a string (i.e. the character) and
// store the value as a list of strings (if the same character occurs more than once)
let boxes = rusty_tesseract::image_to_boxes(&img, image_to_boxes_args);
println!("The Boxfile output is: {:?}", boxes.dataframe);
// string output
let output = rusty_tesseract::image_to_string(&img, &my_args).unwrap();
println!("The String output is: {:?}", output);


// image_to_data prints out both the "image_to_string()" and "image_to_boxes()" information + a creates a TSV table with confidences
let data = rusty_tesseract::image_to_data(&img, default_args);
println!("The data output is: {:?}", data.dict);

// image_to_boxes creates a BoxOutput containing the parsed output from Tesseract when using the "makebox" Parameter
let box_output = rusty_tesseract::image_to_boxes(&img, &my_args).unwrap();
println!(
"The first boxfile symbol is: {}",
box_output.boxes[0].symbol
);
println!("The full boxfile output is:\n{}", box_output.output);

// image_to_data creates a DataOutput containing the parsed output from Tesseract when using the "TSV" Parameter
let data_output = rusty_tesseract::image_to_data(&img, &my_args).unwrap();
let first_text_line = &data_output.data[4];
println!(
"The first text is '{}' with confidence {}",
first_text_line.text, first_text_line.conf
);
println!("The full data output is:\n{}", data_output.output);
```

### Get tesseract version

```rust
let tesseract_version = rusty_tesseract::get_tesseract_version();
let tesseract_version = rusty_tesseract::get_tesseract_version().unwrap();
println!("The tesseract version is: {:?}", tesseract_version);
```
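
Reviewer note on the README hunk above: the rework replaces the free-form `config` HashMap with typed `psm` and `oem` fields and swaps `Args::new()` for `Args::default()`. The following is a minimal sketch of how such a struct could map onto the `tesseract` command line. It is not taken from the crate's source: the field types, the `to_command_line` helper and the argument order are assumptions made for illustration. Only the flags `-l`, `--dpi`, `--psm` and `--oem` are real Tesseract CLI options.

```rust
// Hypothetical sketch, not the crate's actual implementation.
// Field names follow the README diff; the types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct Args {
    pub lang: String,
    pub dpi: i32,
    pub psm: i32,
    pub oem: i32,
}

impl Default for Args {
    // Defaults as listed in the updated README: eng, 150 DPI, psm 3, oem 3.
    fn default() -> Self {
        Args {
            lang: "eng".to_string(),
            dpi: 150,
            psm: 3,
            oem: 3,
        }
    }
}

/// Builds the argument list for a `tesseract` invocation.
/// `to_command_line` is an illustrative helper name, not a crate function.
pub fn to_command_line(input: &str, output_base: &str, args: &Args) -> Vec<String> {
    vec![
        input.to_string(),
        output_base.to_string(),
        "-l".to_string(),
        args.lang.clone(),
        "--dpi".to_string(),
        args.dpi.to_string(),
        "--psm".to_string(),
        args.psm.to_string(),
        "--oem".to_string(),
        args.oem.to_string(),
    ]
}
```

With typed fields, a caller can override one setting and keep the rest, for example `Args { psm: 6, ..Args::default() }`, and an invalid key such as a misspelled `"pms"` becomes a compile error rather than a silently ignored map entry.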

22 changes: 0 additions & 22 deletions example_output/eng.testcase.exp0.box

This file was deleted.

22 changes: 0 additions & 22 deletions example_output/font_name.font.exp0.box

This file was deleted.

10 changes: 0 additions & 10 deletions example_output/out.tsv

This file was deleted.

2 changes: 0 additions & 2 deletions example_output/out.txt

This file was deleted.

50 changes: 0 additions & 50 deletions src/error.rs

This file was deleted.

5 changes: 1 addition & 4 deletions src/lib.rs
@@ -1,7 +1,4 @@
pub mod error;
pub mod tesseract;

pub use error::*;
pub use image;
pub use tesseract::*;

pub use ndarray;
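
Reviewer note: the README changes add `.unwrap()` to every call, which suggests the reworked functions return a `Result`, and `image_to_boxes` now exposes parsed entries such as `box_output.boxes[0].symbol`. The sketch below shows one way a single box-file line could be parsed into a typed value with an explicit error type. It assumes Tesseract's box-file layout of `symbol left bottom right top page`. `BoxLine`, `ParseError` and `parse_box_line` are hypothetical names, not the crate's API, and the `Display` and `Error` impls are written by hand here, where a crate such as `thiserror` could derive them.

```rust
use std::fmt;

// Hypothetical error type for illustration only.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    MissingField(&'static str),
    InvalidNumber(String),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::MissingField(name) => write!(f, "missing field: {}", name),
            ParseError::InvalidNumber(raw) => write!(f, "invalid number: {}", raw),
        }
    }
}

impl std::error::Error for ParseError {}

// One parsed line of a box file (assumed layout, see lead-in).
#[derive(Debug, PartialEq)]
pub struct BoxLine {
    pub symbol: String,
    pub left: i32,
    pub bottom: i32,
    pub right: i32,
    pub top: i32,
    pub page: i32,
}

/// Parses a single whitespace-separated box-file line.
pub fn parse_box_line(line: &str) -> Result<BoxLine, ParseError> {
    let mut fields = line.split_whitespace();
    let symbol = fields
        .next()
        .ok_or(ParseError::MissingField("symbol"))?
        .to_string();
    // Reads the next field and converts it to an integer, naming the
    // field in the error if it is absent.
    let mut next_int = |name: &'static str| -> Result<i32, ParseError> {
        let raw = fields.next().ok_or(ParseError::MissingField(name))?;
        raw.parse::<i32>()
            .map_err(|_| ParseError::InvalidNumber(raw.to_string()))
    };
    Ok(BoxLine {
        symbol,
        left: next_int("left")?,
        bottom: next_int("bottom")?,
        right: next_int("right")?,
        top: next_int("top")?,
        page: next_int("page")?,
    })
}
```

Returning a `Result` lets callers choose between `unwrap()`, as the updated README does, and propagating the error with `?`.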