Reading text from desktop applications - use OCR or copy text to clipboard #926
Hi @chuenlim, first make sure you have OpenJDK installed (https://tagui.readthedocs.io/en/latest/setup.html). It is required for the computer vision and OCR capabilities to work, since they run through the SikuliX engine. I made a video that covers the following examples:
- screenshot of a webpage
- screenshot of the whole desktop
- read webpage HTML
- read desktop text using OCR

Essentially,

Link to the video: https://www.youtube.com/watch?v=3I-lbvB0vNc
Your help and support are superb, @kensoh, thanks so much for both the speed and the quality of your explanation! One more question after watching your video: you mentioned that users can interact with screen elements using an "identifier" as the first approach, and I believe you used an element's XPath for websites in your other videos. Is there an equivalent approach for interacting with non-web applications, i.e. other Windows applications? Or is computer vision (image matching) the only available approach?
Thanks for your encouraging words! I'm trying out something new: replying to support issues not only here but also with a video recording, so that things can be demoed and additional context can be shared with users :) That's right, XPath is used for websites, though TagUI will also use a smart algorithm to detect the best match if you use some attributes of the element (for example id, name, href or text()). For desktop apps, to keep the project manageable and to let TagUI also work for Mac and Linux users, only the computer vision method plus keyboard combinations is applicable. So far it has been working well for various apps, and it is easy to visualise exactly which steps are performed. Later this year, when we implement an AI Recorder, users will no longer need a snipping tool to capture screenshots. They can just record, and the TagUI workflow and image files will be generated automatically and work across desktop and web apps. (cc @ruthtxh for info)
The video recording is a nice touch :) I'm sure others will find it useful.
Hello. I've been trying to figure out how to perform an OCR "read" from a page following the instructions in #113, but I'm unable to find the "page.png" image mentioned there, and I couldn't run the sample script successfully either.
Any help would be much appreciated!