Reading text from desktop applications - use OCR or copy text to clipboard #926

Closed
chuenlim opened this issue Feb 2, 2021 · 5 comments

@chuenlim

chuenlim commented Feb 2, 2021

Hello. I've been trying to figure out how to perform an OCR "read" from a page as per the instructions here #113

but I'm unable to find the "page.png" image mentioned on that page, nor was I able to run the sample script successfully.

Any help would be much appreciated!

@kensoh
Member

kensoh commented Feb 3, 2021

Hi @chuenlim, first make sure you have OpenJDK installed - https://tagui.readthedocs.io/en/latest/setup.html

OpenJDK is required for the computer vision and OCR capabilities to work (they run through the SikuliX engine).

I did a video which covers the following examples -

screenshot of webpage

snap page to result.png

screenshot of whole desktop

snap page.png to result.png

read webpage HTML

read page to result_text

read desktop text using OCR

read page.png to result_text

Essentially, page.png tells TagUI to interact with the whole screen using computer vision or OCR, rather than with just the webpage. You don't need an actual page.png file to run the steps above.

In the video, I also give some examples of getting text using OCR from a rectangular region or an image frame, and of getting text from desktop apps without using OCR. For example, you can select and copy the text using the keyboard step with [ctrl]c and then access the data directly using clipboard(). I'll add the video link right after this, let me know how it goes!
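
A minimal sketch of the two approaches, assuming a Windows desktop app whose window you have captured as an image. The file name notepad_window.png and the variable names are hypothetical, and on macOS you would use [cmd] in place of [ctrl]:

```
// click a screenshot of the app window to bring it into focus
// (notepad_window.png is a hypothetical image you capture yourself)
click notepad_window.png

// select all the text and copy it to the clipboard
keyboard [ctrl]a
keyboard [ctrl]c

// read the copied text directly, no OCR involved
app_text = clipboard()
echo `app_text`

// alternatively, OCR everything visible on the screen
read page.png to screen_text
echo `screen_text`
```

Where the app allows text to be selected, the clipboard route is likely to return more exact text, since OCR can misread characters.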

@kensoh kensoh changed the title Using page.png to read from windows applications Reading text from desktop applications - use OCR or copy text to clipboard Feb 3, 2021
@kensoh kensoh added the query label Feb 3, 2021
@kensoh
Member

kensoh commented Feb 3, 2021

Link to the video - https://www.youtube.com/watch?v=3I-lbvB0vNc

@chuenlim
Author

chuenlim commented Feb 3, 2021

Your help and support are super, @kensoh. Thanks so much for both the speed and the quality of your explanation!

One more question after checking out your video. You mentioned that users can interact with screen elements using an "identifier" as the first approach; I believe you used the XPath to an element for websites in your other videos. I was wondering whether there is an equivalent approach for interacting with "non-web" applications, i.e. other Windows applications, or is using CV (image matching) the only available approach?

@kensoh
Member

kensoh commented Feb 3, 2021

Thanks for your encouraging words! I'm trying out something new: replying to support issues not only here but also with a video recording, so that things can be demoed and additional context can be shared with users :)

That's right, XPath is used for websites. TagUI will also use a smart algorithm to detect the best match if you give just an attribute of the element (id, name, href or text(), for example).
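
As an illustration only, here is how the two styles of web identifier might look in a flow. The page, the email field and the Sign in button are hypothetical:

```
// full XPath to a hypothetical email field
type //input[@id="email"] as user@example.com

// shorter form: give only the id (or name, text, etc.)
// and let TagUI work out the matching element
type email as user@example.com

// match a hypothetical button by its visible text
click Sign in
```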

For desktop apps, to keep the project manageable and to let TagUI work for Mac and Linux users as well, only the computer vision method plus keyboard combinations is available. So far, it has been working well for various apps, and it makes it easy to visualise exactly which steps are performed.
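
A rough sketch of that style of desktop automation, assuming you have captured the relevant parts of the app as images. All the image file names here are hypothetical:

```
// double-click a screenshot of the app's icon to launch it
dclick app_icon.png
wait 3

// type into the field that matches a screenshot of it
type name_field.png as Jane Doe

// move on and submit using keyboard combinations
keyboard [tab]
keyboard [enter]
```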

Later this year, when we implement an AI Recorder, users will no longer need a snipping tool to capture the screenshots. They can just record, and the TagUI workflow and image files will be generated automatically and work across desktop and web apps.

(cc @ruthtxh for info)

@chuenlim
Author

chuenlim commented Feb 3, 2021

The video recording is a nice touch :) I'm sure others will find it useful.
Thanks for the explanation Ken, looking forward to the upcoming updates!

@kensoh kensoh closed this as completed Feb 18, 2021
@kensoh kensoh self-assigned this Mar 22, 2021