"Invalid hash string, length does not match any known encoding" > Error Occurs. #99

Closed
soimkim opened this issue Apr 14, 2021 · 7 comments

Comments

@soimkim

soimkim commented Apr 14, 2021

When a hash value created with py_tlsh (4.5.0) is passed to the Java library, an "Invalid hash string, length does not match any known encoding" error occurs.

Could you tell me what needs to be corrected in the Java code to fix this error?

@jonjoliver
Collaborator

Hi @soimkim

Two issues required us to change some elements of TLSH:

  1. There was a potential division by 0 (issue #79, "Bug in div by 0 in q3 after find_quartile returns a q3=0") - the C++ code caught it with a try-catch
  2. The C++ code had problems with files larger than 2 GB (issue #84, "tlsh calculation on >2GB files causing OutOfArrayIndexException in java implementation")

We fixed the C++ code base (this repository)
and released a new Python library that also fixed problems with Python on Windows.
This is the py-tlsh package (4.5.0).

These fixes changed the hash generated for a few files (edge cases).
Even so, we added a "T1" version string to the front of the hash to track these changes.
The C++ code and the Python code are backwards compatible with old hashes.
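
As an illustration of how a reader could tell the two formats apart, here is a minimal sketch. It assumes a tagged hash is simply the older hex string with the characters "T1" in front, as described above. The class and method names (`TlshFormat`, `detect`) are hypothetical and not part of any TLSH library.

```java
// Hypothetical helper, not part of the TLSH libraries.
final class TlshFormat {

    // The two hash layouts discussed in this thread.
    enum Version { LEGACY, T1 }

    private TlshFormat() {}

    // Assumes a non-null hash string. 'T' is not a hexadecimal digit,
    // so a legacy all-hex hash can never begin with "T1".
    static Version detect(String hash) {
        return hash.startsWith("T1") ? Version.T1 : Version.LEGACY;
    }
}
```

Because the tag starts with a non-hex character, a check like this cannot mistake an old hash for a new one.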

Which Java library of TLSH are you using?
I think we need to bring it up to date with these changes.

Cheers
jono

@soimkim
Author

soimkim commented Apr 15, 2021

Hi @jonjoliver,

I use the library (tlsh_3.7.1.jar) built from the java folder of this repository and call the fromTlshStr and totalDiff functions to compare the TLSH values extracted from the tool, as in the following example.

 
import com.trendmicro.tlsh.Tlsh;

public class MainTest {

    public static void main(String[] args) {
        // TLSH hash strings of the two inputs to compare
        String srcTlsh = "21272383E754E01BE4FF953116996103B3853D588A42A31A1790F6EE39BFCC63F86E85";
        String targetTlsh = "540612D3F355F42BC636C53271A24222519BCDE48703EB266506F7B9ACFBE854980BD8";

        // Parse both hash strings, then print the distance between them
        Tlsh tlshTest1 = Tlsh.fromTlshStr(srcTlsh);
        Tlsh tlshTest2 = Tlsh.fromTlshStr(targetTlsh);
        System.out.println(tlshTest1.totalDiff(tlshTest2, true));
    }
}

I need to call it from a Spring-based web service, so I need the Java implementation of TLSH.

Thanks & regards,
Soim

@jonjoliver
Collaborator

I am just chasing down the developer who did the Java port.
I will set up the environment and then make some fairly minor tweaks to move it from 3.7.1 -> 4.5.0.
This may take me a while (once I have the environment (gradle etc.) set up, it should be very quick).

Workaround: you could add a few lines of code to remove the T1 from the start of the hash that you get from Python
before you do anything in Java (???)
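
A minimal sketch of that workaround, assuming the only difference in the Python output is the leading "T1" and that the rest of the string is what the 3.7.1 Java library expects. The class and method names (`TlshCompat`, `stripVersionPrefix`) are hypothetical and not part of the TLSH Java library.

```java
// Hypothetical helper for the workaround; not part of the TLSH Java library.
final class TlshCompat {

    // Version tag that py-tlsh 4.5.0 puts in front of the hash.
    private static final String VERSION_PREFIX = "T1";

    private TlshCompat() {}

    // Returns the hash without a leading "T1"; hashes without the tag
    // (and null) are returned unchanged.
    static String stripVersionPrefix(String hash) {
        if (hash != null && hash.startsWith(VERSION_PREFIX)) {
            return hash.substring(VERSION_PREFIX.length());
        }
        return hash;
    }
}
```

The result would then be passed to `Tlsh.fromTlshStr(...)` in place of the raw Python output. Old hashes pass through untouched, so the same call can be used for both formats.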

@soimkim
Author

soimkim commented Apr 15, 2021

Thank you very much for your quick reply!
The workaround you suggested is also a good approach.

mrpolyonymous pushed a commit that referenced this issue Apr 22, 2021
* Update TLSH Java implementation to understand new TLSH version data
* Partial fix for issue #84 to allow TLSH on files up to 4GiB
* Fix unit tests to work with new test data files
* Remove bintray hosting as it is being shut down
jonjoliver added a commit that referenced this issue Apr 23, 2021
jonjoliver pushed a commit that referenced this issue Apr 23, 2021
23/04/2021
        Merging in pull requests
        issue #99 - new Java version that solves large file problem (Thanks Daniel)
        Add architecture ppc64le to travis build (Thanks ddeka2910)
        Fix tmpArray is undefined in JavaScript version (Thanks carbureted)
@jonjoliver
Collaborator

@soimkim Could you test version 4.6.0?
Thanks

@soimkim
Author

soimkim commented Apr 27, 2021

@jonjoliver,
I tested with the jar file built from v4.6.0, and comparing TLSH values for hashes output by py_tlsh (4.5.0) now works without error.

Thanks for responding quickly!

@soimkim
Author

soimkim commented Apr 27, 2021

This issue is resolved in v4.6.0.

@soimkim soimkim closed this as completed Apr 27, 2021