Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Old WhatsApp databases parsing affected by forwarded message feature [4.1.3 regression] #1774

Closed
lfcnassif opened this issue Jul 17, 2023 · 25 comments · Fixed by #1777
Closed
Assignees
Labels

Comments

@lfcnassif
Copy link
Member

lfcnassif commented Jul 17, 2023

@thalespr reported this to me and sent a log with several entries like this:

java.sql.SQLException: Column forwarded not found. no such column
	at iped.parsers.sqlite.SQLiteUndeleteTableResultSetAdapter.getMappedColumnName(SQLiteUndeleteTableResultSetAdapter.java:76) ~[iped-parsers-impl-4.2-snapshot.jar:?]
	at iped.parsers.sqlite.SQLiteUndeleteTableResultSetAdapter.getInt(SQLiteUndeleteTableResultSetAdapter.java:377) ~[iped-parsers-impl-4.2-snapshot.jar:?]
	at iped.parsers.whatsapp.ExtractorAndroid.createMessageFromDBRow(ExtractorAndroid.java:504) ~[iped-parsers-impl-4.2-snapshot.jar:?]
	at iped.parsers.whatsapp.ExtractorAndroid.extractMessages(ExtractorAndroid.java:387) [iped-parsers-impl-4.2-snapshot.jar:?]
	at iped.parsers.whatsapp.ExtractorAndroid.extractChatList(ExtractorAndroid.java:213) [iped-parsers-impl-4.2-snapshot.jar:?]
	at iped.parsers.whatsapp.Extractor.getChatList(Extractor.java:34) [iped-parsers-impl-4.2-snapshot.jar:?]
	at iped.parsers.whatsapp.WhatsAppParser.extractChatList(WhatsAppParser.java:529) [iped-parsers-impl-4.2-snapshot.jar:?]
	at iped.parsers.whatsapp.WhatsAppParser.parseDB(WhatsAppParser.java:399) [iped-parsers-impl-4.2-snapshot.jar:?]
	at iped.parsers.whatsapp.WhatsAppParser.parseAndCheckIfIsMainDb(WhatsAppParser.java:461) [iped-parsers-impl-4.2-snapshot.jar:?]
	at iped.parsers.whatsapp.WhatsAppParser.parse(WhatsAppParser.java:243) [iped-parsers-impl-4.2-snapshot.jar:?]
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) [tika-core-2.4.0-p1.jar:2.4.0]
	at iped.parsers.standard.StandardParser.parse(StandardParser.java:245) [iped-parsers-impl-4.2-snapshot.jar:?]
	at iped.engine.io.ParsingReader$BackgroundParsing.run(ParsingReader.java:247) [iped-engine-4.2-snapshot.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
	at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
	at java.lang.Thread.run(Unknown Source) [?:?]

I was able to reproduce, chats are rendered without messages. It seems to happen when the parser tries to recover deleted messages, disabling it doesn't avoid the issue, but results in no chats decoded. Unfortunately this wasn't detected on #1287 huge regression test with the huge 300 DBs data set since I disabled deleted message recovery because of OOME issues when running the test on all DBs at the same time...

@lfcnassif lfcnassif added the bug label Jul 17, 2023
@lfcnassif
Copy link
Member Author

lfcnassif commented Jul 17, 2023

Anyone could take a look at this? I won't have time this week since I'm in charge of the whole unit. I can send the sample privately.

@wladimirleite
Copy link
Member

wladimirleite commented Jul 17, 2023

I will be back on Friday. If nobody takes care of this before that, I will look into it (please, send me the triggering database, if that is the case).

@lfcnassif
Copy link
Member Author

I will be back on Friday. If nobody take care of this before that, I will look into it (please, send me the triggering database, if that is the case).

Thanks @tc-wleite, but don't worry, since this seems to affect just very old databases, please enjoy your vacation.

@gfd2020
Copy link
Collaborator

gfd2020 commented Jul 17, 2023

Anyone could take a look at this? I won't have time this week since I'm in charge of the whole unit. I can send the sample privately.

The only way I could handle this error was by testing if the column exists. If it exists, do the normal select, if it doesn't exist, put the collumn with zero value, for example in the select: ' ... 0 as forwarded FROM ....'.
Sorry I should have warned you about this, I forgot. Since the regression tests passed, I didn't even bother checking the code.

Just make a test, just like orthers tests ...

private boolean databaseHasForwarded(Connection conn) throws SQLException {
    boolean result = false;
    DatabaseMetaData md = conn.getMetaData();
    try (ResultSet rs = md.getColumns(null, null, "messages", "forwarded")) {
        if (rs.next()) {
            result = true;
        }
    }
    return result;
}

@lfcnassif
Copy link
Member Author

Not sure, but I'll try to take a look at this in the (later) evening or at night...

@hauck-jvsh
Copy link
Member

hauck-jvsh commented Jul 17, 2023

Maybe we can just add a try catch in the createMessageFromDBRow (ExtractorAndroid.java:504)

 try {
        m.setForwarded(rs.getInt("forwarded") > 0);
    } catch (SQLException e) {
        m.setForwarded(false);
    }

@wladimirleite
Copy link
Member

As far as I remember, there is an auxiliary function that can be used (checkIfColumnExists).

@lfcnassif
Copy link
Member Author

I'm reopening this to don't forget: @thalespr reported commit ffbcac0 may miss the thumbnails table with some old DBs.

@lfcnassif lfcnassif reopened this Aug 2, 2023
@lfcnassif
Copy link
Member Author

I wanted to create a PR with commit f8832ba for review, but I pushed it to master non intentionally. I would appreciate if anyone could take a look at it, since the last attempt to fix this caused a regression to some old DBs, missing the thumbnails table.

@lfcnassif lfcnassif changed the title Old WhatsApp databases parsing broken by forwarded message feature Old WhatsApp databases parsing affected by forwarded message feature Aug 9, 2023
@lfcnassif
Copy link
Member Author

lfcnassif commented Aug 11, 2023

Does someone have some time to take a look and test commit f8832ba so we can include it in v4.1.4 safely?

@wladimirleite
Copy link
Member

Does someone have some time to take a look and test commit f8832ba so we can include it in v4.1.4 safely?

Hi @lfcnassif! I can test it later today, if nobody else is available.

@paulobreim
Copy link

Does someone have some time to take a look and test commit f8832ba so we can include it in v4.1.4 safely?

I can, but I don´t have the WA files to test.

@lfcnassif
Copy link
Member Author

Hi @lfcnassif! I can test it later today, if nobody else is available.

Thank you @tc-wleite! I made a few tests, but I would appreciate a second opinion given the previous regression. But don't hurry, 4.1.4 can wait next week.

I can, but I don´t have the WA files to test.

Thanks @paulobreim. But since @tc-wleite knows the code and has samples, I think it would be easier for him, when he has some spare time.

@lfcnassif lfcnassif changed the title Old WhatsApp databases parsing affected by forwarded message feature Old WhatsApp databases parsing affected by forwarded message feature [4.1.3 regression] Aug 11, 2023
@wladimirleite
Copy link
Member

Thank you @tc-wleite! I made a few tests, but I would appreciate a second opinion given the previous regression. But don't hurry, 4.1.4 can wait next week.

I just collected ~10 older databases (from 2014 to 2020).
I will run some tests and report here later.

@wladimirleite
Copy link
Member

wladimirleite commented Aug 12, 2023

Tests finished and everything seems fine.
I also included more recent databases in the tests.
Only one of the older databases triggered the issue, when processing with 4.1.3.
By the way, sorry for overlooking this while implementing the forwarded messages feature.

@lfcnassif
Copy link
Member Author

Thank you for all tests @tc-wleite! Don't worry, I didn't detect this regression in my large test with 300 DBs either...

@gfd2020
Copy link
Collaborator

gfd2020 commented Aug 17, 2023

I have an old Whatsapp database (2017) and it is not working on the master branch.
No chat retrieved.

Not Working:
4.0.2
4.1.2
4.1.3
4.1.4
Master-branch (downloaded yesterday)

Working ( all chats retrieved.) :
3.15.6
3.17-rc1
3.18.14

@wladimirleite
Copy link
Member

Hi @gfd2020!
Can you share this database privately?
Did you check the processing log for errors related to this file? Errors related to missing columns/tables usually can be found in the log.

@lfcnassif
Copy link
Member Author

If confirmed, I think we should open another issue, since it seems a 4.0 regression not related to forwarded column.

@gfd2020
Copy link
Collaborator

gfd2020 commented Aug 17, 2023

Hi @gfd2020! Can you share this database privately? Did you check the processing log for errors related to this file? Errors related to missing columns/tables usually can be found in the log.

I found the problem. If you order the msgstore.db file to be processed directly, IPED does not parse the chats, but if it is inside a directory and you order it to be processed, it works.
Is this behavior expected?

Example:
iped.exe -d C:\dir\msgtore.db -o out ( does not work)
iped.exe -d C:\dir -o out ( works)

PS: The old versions (3.x) don't have this problem.

@lfcnassif
Copy link
Member Author

Not sure, but I think this was intentional, to avoid parsing the whole root evidence into UI when clicking on it, since it can be huge.

@wladimirleite
Copy link
Member

wladimirleite commented Aug 17, 2023

Not sure, but I think this was intentional, to avoid parsing the whole root evidence into UI when clicking on it, since it can be huge.

Current command line help says:

    -d, -data
      input data (can be used multiple times): folder, DD, 001, E01 images
      (+AFF on Linux), ISO, physical drive, or *.iped file (with tagged files
      to export and reindex)

Are there situations in which processing a single file of other types is useful?
If not, to avoid confusion, it would be possible to reject, during the initial check of input parameters, other regular files as input.
AD1, UFDR and probably some archive types should also be accepted (7z, zip, rar, gz, tar).

@lfcnassif
Copy link
Member Author

Current command line help says:

    -d, -data
      input data (can be used multiple times): folder, DD, 001, E01 images
      (+AFF on Linux), ISO, physical drive, or *.iped file (with tagged files
      to export and reindex)

Hum... this is outdated and we should add Ex01, VHD, VHDX, VMDK, AD1, UFDR to the help.

Are there situations in which processing a single file of other types is useful?

I already did what @gfd2020 tried in the past to test processing of a single test file when it was located together with other files in the same folder into my regression test corpus. But in real use cases, I don't see an advantage doing this. Actually, the single file used to be rendered using a hard drive icon in the Evidence Tree, what is wrong, not sure how it is rendered today.

@wladimirleite
Copy link
Member

Actually, the single file used to be rendered using a hard drive icon in the Evidence Tree, what is wrong, not sure how it is rendered today.

It uses the associated icon (not fixed as a hard drive anymore).

@gfd2020
Copy link
Collaborator

gfd2020 commented Aug 18, 2023

Sorry for the confusion. I didn't know about this IPED behavior, as I use the 3.x version more, I thought it would be the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants