Reading text from desktop applications - use OCR or copy text to clipboard #926

chuenlim · 2021-02-02T18:15:48Z

Hello. I've been trying to figure out how to perform an OCR "read" from a page, following the instructions in #113,

but I'm unable to find the "page.png" image mentioned there, and I wasn't able to run the sample script successfully.

Any help would be much appreciated!

kensoh · 2021-02-03T00:23:22Z

Hi @chuenlim, first make sure you have OpenJDK installed - https://tagui.readthedocs.io/en/latest/setup.html

It is required for the computer vision and OCR capabilities to work (through the SikuliX engine).

I made a video that covers the following examples:

```
// screenshot of webpage
snap page to result.png

// screenshot of whole desktop
snap page.png to result.png

// read webpage HTML
read page to result_text

// read desktop text using OCR
read page.png to result_text
```

Essentially, page.png tells TagUI to use computer vision or OCR to interact with the whole screen, rather than only with the webpage. You don't need an actual page.png file to run the steps above. In the video, I also gave examples of getting text with OCR from a rectangular region or from an image frame, and of getting text from desktop apps without OCR, for example by selecting and copying the text with the keyboard shortcut [ctrl]c and then accessing the data directly with clipboard(). I'll add the video link right after this, let me know how it goes!
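A minimal sketch of the three text-reading approaches described above, in TagUI flow syntax. The coordinates, the image file `invoice_frame.png` and the variable names are hypothetical placeholders. The region and image forms of `read` rely on the same SikuliX-based OCR as `read page.png`, so OpenJDK needs to be installed. On macOS the copy shortcut would likely use `[cmd]` instead of `[ctrl]`.

```
// OCR the text inside a screen rectangle, given (x1,y1)-(x2,y2) corner coordinates
// (coordinates below are hypothetical)
read (100,200)-(600,400) to region_text
echo `region_text`

// OCR the text inside the on-screen area that matches a cropped image
// (invoice_frame.png is a hypothetical screenshot you capture yourself)
read invoice_frame.png to frame_text
echo `frame_text`

// no OCR: select all text in the focused desktop app, copy it, then read the clipboard
keyboard [ctrl]a
keyboard [ctrl]c
copied_text = clipboard()
echo `copied_text`
```

The clipboard route avoids OCR recognition errors entirely, so it is usually worth trying first when the desktop app lets you select its text.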

kensoh · 2021-02-03T00:29:03Z

Link to the video - https://www.youtube.com/watch?v=3I-lbvB0vNc

chuenlim · 2021-02-03T02:51:31Z

Your help and support are super, @kensoh, thanks so much! Both the speed and the quality of your explanation are great!

One more question after checking out your video: you mentioned that users can interact with screen elements using an "identifier" as the first approach. I believe you used the XPath of an element for websites in your other videos. Is there an equivalent approach for interacting with "non-web" applications, i.e. other Windows applications? Or is CV (image matching) the only available approach?

kensoh · 2021-02-03T16:50:12Z

Thanks for your encouraging words! I'm trying out something new: replying to support issues not only here but also with a video recording, so that things can be demoed and additional context can be shared with users :)

That's right, XPath is used for websites. TagUI also uses a smart algorithm to find the best match if you provide only some attribute of the element (for example id, name, href or text()).
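To illustrate the difference, a short sketch in TagUI flow syntax: the first step gives a full XPath, the second gives only a plain identifier and lets TagUI search the element's attributes for the best match. The `login` identifier and the `/login` href are hypothetical; exactly which attributes TagUI checks, and in what order, is defined in the TagUI documentation rather than here.

```
// full XPath: targets one specific element (hypothetical link)
click //a[@href="/login"]

// plain identifier: TagUI looks for an element whose attributes (e.g. id, name, href, text()) match "login"
click login
```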

For desktop apps, to keep the project manageable and to let TagUI also work for Mac and Linux users, only the computer vision method plus keyboard shortcuts is supported. So far, this has worked well for various apps, and it makes it easy to visualise exactly which steps are performed.
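A minimal sketch of the computer vision plus keyboard approach for a desktop app, in TagUI flow syntax. It assumes you have already captured two hypothetical screenshots with a snipping tool, `notepad_icon.png` (the app's taskbar icon) and `notepad_text_area.png` (its editing area), and saved them next to the flow file.

```
// click the app's taskbar icon, located on screen by image matching (hypothetical image)
click notepad_icon.png
wait 2

// type into the area that matches the screenshot of the editing area (hypothetical image)
type notepad_text_area.png as hello from TagUI

// keyboard shortcut to save (on macOS this would likely be [cmd]s)
keyboard [ctrl]s
```

Because every target is an image, the same flow can in principle run on Windows, Mac or Linux, as long as the screenshots match how the app looks on that machine.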

Later this year, when we implement an AI Recorder, users will no longer need a snipping tool to capture screenshots. They can just record, and the TagUI workflow and image files will be generated automatically and work across desktop and web apps.

(cc @ruthtxh for info)

chuenlim · 2021-02-03T19:05:36Z

The video recording is a nice touch :) I'm sure others will find it useful.
Thanks for the explanation, Ken, looking forward to the upcoming updates!

kensoh changed the title ~~Using page.png to read from windows applications~~ Reading text from desktop applications - use OCR or copy text to clipboard Feb 3, 2021

kensoh added the query label Feb 3, 2021

kensoh closed this as completed Feb 18, 2021

kensoh self-assigned this Mar 22, 2021
