Analysis marks incorrect terms #269

Open
marketaadamcova opened this issue Apr 22, 2022 · 12 comments
Labels
bug (Something isn't working) · text analysis (The ticket shows up in TermIt but concerns text analysis)

Comments

@marketaadamcova

Problem description

Describe:

  • I created a vocabulary by going through the file 'dataset1_filtered', then ran the analysis.
  • The analysis marked terms, but for some of them (roughly 30%) a completely different term is assigned.
  • At first glance the term appears to be marked correctly (the term occurs in the vocabulary), but after clicking on it, a completely different assigned term is displayed.

image
image
image

Problem severity

  • describe how serious the problem is for your work with TermIt:
  1. it is a serious problem,

@marketaadamcova added the bug (Something isn't working) label Apr 22, 2022
@ledsoft added the text analysis (The ticket shows up in TermIt but concerns text analysis) label Apr 25, 2022
@blcham

blcham commented Apr 27, 2022

@ledsoft I wanted to download the annotated document, but there is no button to download it. Didn't we have this functionality?

image

@marketaadamcova At our last Skype meeting about text analysis we discussed the following:

  • your input text should be normal sentences or all lower case, not all upper case like you have
  • your terms should be written in lower case (please read every "?" hint where you enter data)

image

So I suggest you try the following experiment with the term "sidewall panel":

  • create the following term:
     label: "sidewall panel"
     synonym: --------------- (leave this empty, you don't need to put plurals there)
  • create a new document that has only 2 lines, all in lower case, e.g.
5382503 during sidewall panel replacement were found several nutplates loosened /missing. shm - pls perform nutplate replacement

these sidewall panels were found worn out of limit at fwd cargo compartment: 453a1410-7, 453a1410-9.

Additional notes:

  • I wanted to experiment on a small document because text analysis is very fast in that case
  • let me know how it works

@ledsoft
Contributor

ledsoft commented Apr 28, 2022

@blcham Indeed we did. But it was removed by @psiotwo when refactoring documents/files processing. I was pushing for it being supported again, but it was never a priority. Anyway, now that I am the only developer of TermIt, I suppose I can implement it back finally.

@marketaadamcova
Author

@blcham
Result of the analysis of the original document:
image

Result of the analysis of the lowercase text:
image

Any tip on how to convert upper-case text in an xls document to lower case?

I found another problem:
I cannot delete a term (I accidentally created a term in the plural and cannot delete it now).
image

@blcham

blcham commented Apr 28, 2022

> Any tip on how to convert upper-case text in an xls document to lower case?

See the first hit:
https://www.google.com/search?q=excel+convert+capital+to+lowercase&oq=excel+convert+capital+&aqs=chrome.0.0i512j69i57j0i512j0i22i30l4j0i390.6022j0j7&client=ubuntu&sourceid=chrome&ie=UTF-8

Another solution would be to export the sheet to CSV, convert upper case to lower case in a text editor, and import the CSV back into Excel.

A text editor that can convert upper case to lower case is, for example, vim:
https://stackoverflow.com/questions/1102859/how-to-convert-all-text-to-lowercase-in-vim#:~:text=If%20you%20want%20to%20convert,the%20character%20under%20the%20cursor.
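
The CSV route can also be scripted instead of done in a text editor. Below is a minimal Python sketch, assuming the sheet has already been exported to CSV. The function names `lowercase_csv_text` and `lowercase_csv_file` and the file names in the usage comment are hypothetical and not part of TermIt; the sample row is only an illustration.

```python
import csv
import io


def lowercase_csv_text(csv_text: str) -> str:
    """Return the CSV content with every cell converted to lower case."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # str.lower() leaves digits and punctuation untouched
        writer.writerow([cell.lower() for cell in row])
    return out.getvalue()


def lowercase_csv_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> None:
    """Read a CSV file, lower-case every cell, and write the result to dst_path."""
    with open(src_path, newline="", encoding=encoding) as src:
        converted = lowercase_csv_text(src.read())
    with open(dst_path, "w", newline="", encoding=encoding) as dst:
        dst.write(converted)


# Hypothetical usage; the file names are assumptions:
# lowercase_csv_file("dataset1_filtered.csv", "dataset1_filtered_lower.csv")
sample = 'ID,TEXT\n5382503,"DURING SIDEWALL PANEL REPLACEMENT, NUTPLATES MISSING"\n'
print(lowercase_csv_text(sample))
```

Going through the `csv` module rather than lower-casing the raw file keeps quoted cells that contain commas intact. The encoding of the Excel export may differ from UTF-8, so the `encoding` parameter may need adjusting.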

@blcham

blcham commented Apr 28, 2022

> I cannot delete a term (I accidentally created a term in the plural and cannot delete it now).

@ledsoft what is the best way to solve it? Should I delete it using GraphDB, or is there a better way?

@marketaadamcova
Author

@ledsoft
A similar problem occurs when deleting a whole vocabulary:
image

@marketaadamcova
Author

> > Any tip on how to convert upper-case text in an xls document to lower case?
>
> See the first hit: https://www.google.com/search?q=excel+convert+capital+to+lowercase&oq=excel+convert+capital+&aqs=chrome.0.0i512j69i57j0i512j0i22i30l4j0i390.6022j0j7&client=ubuntu&sourceid=chrome&ie=UTF-8
>
> Another solution would be to export the sheet to CSV, convert upper case to lower case in a text editor, and import the CSV back into Excel.
>
> A text editor that can convert upper case to lower case is, for example, vim: https://stackoverflow.com/questions/1102859/how-to-convert-all-text-to-lowercase-in-vim#:~:text=If%20you%20want%20to%20convert,the%20character%20under%20the%20cursor.

solved, thanks

@ledsoft
Contributor

ledsoft commented Apr 28, 2022

> > I cannot delete a term (I accidentally created a term in the plural and cannot delete it now).
>
> @ledsoft what is the best way to solve it? Should I delete it using GraphDB, or is there a better way?

@blcham Delete the occurrences and assignments of the specified term from the repository. Then it should be possible to remove the term via TermIt.

@ledsoft
Contributor

ledsoft commented Apr 28, 2022

@blcham As for the vocabulary: as long as it is not empty, it cannot be removed. Either remove all its terms and then remove the vocabulary or, perhaps a quicker solution, remove the whole vocabulary context from the repository (as long as there are no references to it from elsewhere).

@blcham

blcham commented Apr 28, 2022

@marketaadamcova let me know what you want me to remove, thanks.

@ahmadjana

ahmadjana commented Jun 15, 2022

@blcham
Regarding "analysis marks incorrect concepts or terms" (technical perspective): I think it is caused by the word score.
The analysis returns the terms with a higher score.
Ker is used to determine how important a word in the documents is by assigning it a score.
The same applies to issue #240 and issue #241.

@blcham

blcham commented Jul 18, 2022

@ahmadjana I am not sure exactly what you mean here. What do you mean by "incorrect concepts or terms"?

We have many concrete examples where it fails. Could you explain explicitly and in greater detail, using those examples, why it does not work?
