[Export] Inconsistency in the number of exported entities #7925

Closed
Lhorus6 opened this issue Aug 1, 2024 · 16 comments
Assignees
Labels
bug use for describing something not working as expected export Functional scope : EXPORT solved use to identify issue that has been solved (must be linked to the solving PR)
Milestone

Comments

@Lhorus6

Lhorus6 commented Aug 1, 2024

Description

When exporting a list of entities, the number of exported entities is inconsistent across formats and with the UI.

Here's an example:

On testing, when I filter on the Indicators list "Platform creation date > 7/30/2024", I get 137 results according to the UI

Screenshot 2024-08-01 200505

  • When I export everything in csv format (simple export), I get 200 entities (nb: 201 lines - 1 header line) -> how can I have more than I have in the UI?

Screenshot 2024-08-01 200752

  • When I export everything in plain/txt format (simple export), I get 100 entities (nb: 100 lines, 1 Indicator per line) -> is there a limit of 100 for txt export? If not, why do I have fewer than in the UI?

Screenshot 2024-08-01 201207

  • When I export everything in json/stix format (simple export), I get 137 Indicators, so the STIX export seems correct.

Environment

OCTI 6.2.10

Reproducible Steps

Steps to create the smallest reproducible scenario:

  1. Go to the list of indicators.
  2. Apply filters to get 100 - 500 entities.
  3. Try exporting in different formats (simple export).
  4. Compare numbers of elements between exports, and values in the interface.
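
The comparison in step 4 can be scripted. Below is a minimal sketch, assuming the three export files have already been downloaded and read into strings; the function names are hypothetical and not part of OpenCTI. It parses the CSV with a real CSV reader rather than counting raw lines, since a quoted field containing a line break would otherwise inflate the count.

```python
import csv
import io
import json


def count_csv_rows(text: str) -> int:
    """Count data rows in a CSV export, excluding the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    return max(len(rows) - 1, 0)


def count_txt_lines(text: str) -> int:
    """Count non-empty lines in a text export (one indicator per line)."""
    return sum(1 for line in text.splitlines() if line.strip())


def count_stix_objects(text: str, stix_type: str = "indicator") -> int:
    """Count objects of one type in a STIX bundle serialized as JSON."""
    bundle = json.loads(text)
    return sum(1 for obj in bundle.get("objects", []) if obj.get("type") == stix_type)
```

Running the three counters on the exports of the same filtered list and comparing them with the number shown in the UI gives the figures reported in the description.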

Expected Output

Having the same number of elements everywhere

Additional information

A customer reported the following numbers, observed on his side:

  • The GUI shows 708 elements.
  • CSV and TXT exports return 407 items.
  • The STIX export returns 708 elements.
  • He also had a case where he could only find 107 elements.
@Lhorus6 Lhorus6 added bug use for describing something not working as expected needs triage use to identify issue needing triage from Filigran Product team labels Aug 1, 2024
@nino-filigran

I've tried the following on Demo (public):

  • Filter on IOCs with original creation date greater than/equal 07/30/2024 AND Main obs type = Artifact OR Hostname OR IPV4 OR IPV6 OR Network Traffic OR Software, resulting in 104 entities.

  • All exports show 104 entities (exported with Marking = TLP RED to ensure I got all IOCs).

  • Filter on IOCs with original creation date greater than/equal 07/01/2024 AND Main obs type = Artifact OR Hostname OR IPV4 OR IPV6 OR Network Traffic OR Software, resulting in 493 entities.

  • All exports show 493 entities (exported with Marking = TLP RED to ensure I got all IOCs).

Will try on testing and keep you posted

@nino-filigran

@Lhorus6 Even on testing, I'm not able to reproduce this morning. I filtered on original creation date greater than/equal to 08/01/24, which gave me 111 entities, and I got the same number in all exports. I know we had an issue with markings until yesterday that could affect connectors. Could you please retry on your side, or ask the customer to retry using the latest version, and see if this is still happening?

@Kedae
Member

Kedae commented Aug 2, 2024

Additionally, I think the CSV export doesn't handle "simple" export and always does a full export.

@labo-flg
Member

labo-flg commented Aug 2, 2024

I managed to reproduce the issue by:

  • adding a user with marking access limited to TLP:GREEN (but still with all the required permissions, like any other connector)
  • setting this user's token in the configuration of the export-file-txt connector
  • getting a list of indicators, all TLP:CLEAR => got 72 items
  • updating one of them to TLP:RED
  • exporting in text

=> only 71 items were exported; the one in TLP:RED was not exported.

@Lhorus6 Could you check whether the user associated with the connectors has the right config (allowed markings)?
I mean, whether the token used in the config corresponds to the right user.

@Lhorus6
Author

Lhorus6 commented Aug 2, 2024

All export* connectors' users are part of the Connectors group. This group can access all markings.

@labo-flg
Member

labo-flg commented Aug 2, 2024

And are the connectors using the right tokens? (Maybe there is a misconfiguration in the config.yml files.)

@labo-flg
Member

labo-flg commented Aug 2, 2024

Latest findings: if the connector user belongs to several groups, and one group has a very restrictive max_shareable_markings, the most restrictive configuration applies. So the connector user might be allowed to see an entity in TLP:RED if its allowed markings reach TLP:RED, but still be unable to export it if the max_shareable_marking of one of the groups is TLP:GREEN, for instance.

Also note that if a max_shareable_marking list does not contain a marking category, this means entities with any marking of this category cannot be shared.
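
To make the two rules above concrete, here is a minimal sketch of the behavior as described in this comment, not OpenCTI's actual implementation. The data shapes, function names, and the TLP ordering list are assumptions for illustration; in particular, treating a category that is missing from any one group as non-shareable is my reading of the rule when several groups are combined.

```python
# Assumed ordering, from least to most restricted (illustrative only).
MARKING_ORDER = {
    "TLP": ["TLP:CLEAR", "TLP:GREEN", "TLP:AMBER", "TLP:AMBER+STRICT", "TLP:RED"],
}


def rank(category, marking):
    """Position of a marking inside its category; higher means more restricted."""
    return MARKING_ORDER[category].index(marking)


def effective_max_shareable(groups):
    """Combine per-group limits with the 'most restrictive applies' rule.

    `groups` is a list of dicts mapping a marking category to the highest
    marking that group may share. A category absent from any group is dropped.
    """
    if not groups:
        return {}
    categories = set(groups[0])
    for group in groups[1:]:
        categories &= set(group)
    return {
        cat: min((g[cat] for g in groups), key=lambda m: rank(cat, m))
        for cat in categories
    }


def can_export(entity_markings, effective):
    """True if every marking on the entity is within the effective limits."""
    for category, marking in entity_markings.items():
        if category not in effective:
            return False  # no limit listed for this category: not shareable
        if rank(category, marking) > rank(category, effective[category]):
            return False
    return True
```

With one group limited to TLP:GREEN and another allowing TLP:RED, the effective limit is TLP:GREEN, so a TLP:RED entity is visible to the user but excluded from the export.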

I've run some tests playing with these parameters, and I do get inconsistent numbers.

@labo-flg
Member

labo-flg commented Aug 2, 2024

Still investigating, though: this does not seem to be the cause in this particular issue. We tested with the connector user configured with allowed markings = all markings and max_shareable markings = all markings.

It works as expected when I test locally, but it still fails on the target setup.

@Lhorus6
Author

Lhorus6 commented Aug 2, 2024

"the connectors using the right tokens" -> everything is scripted, there's no reason why it shouldn't be.
I can't check, it's in a secret in the connector VMs.

@Lhorus6
Author

Lhorus6 commented Aug 2, 2024

  • max_shareable markings -> isn't that just for public dashboards?
  • "most restrictive configuration applies" -> I don't think that's logical. If I'm allowed to read A on one side and, A and B on the other, why can I only read A?

@labo-flg
Member

labo-flg commented Aug 5, 2024

"the connectors using the right tokens" -> everything is scripted, there's no reason why it shouldn't be.

OK, I would be surprised too.

max_shareable markings -> isn't that just for public dashboards?

No, it's for sharing data through files too, so applies in our case.

"most restrictive configuration applies" -> I don't think that's logical. If I'm allowed to read A on one side and, A and B on the other, why can I only read A?

I honestly don't know why we have this behavior and why it would be relevant. I'm interested however in being consistent across the app. @nino-filigran wdyt ?

@nino-filigran nino-filigran added needs more info Intel needed about the use case and removed needs triage use to identify issue needing triage from Filigran Product team labels Aug 5, 2024
@nino-filigran

I would agree with @Lhorus6 on this. If I'm not mistaken, when you're in several groups (let's say group A with TLP Green and group B with TLP Red), we grant you access to TLP Red. The same applies to capabilities: we grant you the capabilities of the group with the highest ones. So to remain coherent, I think it would make more sense to have the same behavior here: if you're in two groups and the max shareable marking is different, always take the "highest".
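
As a small sketch of the difference between the two policies discussed here (the function names and the TLP ordering list are assumptions for illustration, not OpenCTI code):

```python
# Assumed ordering, from least to most restricted (illustrative only).
TLP_ORDER = ["TLP:CLEAR", "TLP:GREEN", "TLP:AMBER", "TLP:AMBER+STRICT", "TLP:RED"]


def most_restrictive(group_limits):
    """Behavior observed earlier in the thread: the lowest limit wins."""
    return min(group_limits, key=TLP_ORDER.index)


def highest(group_limits):
    """Proposed behavior: the highest limit wins, as for allowed markings."""
    return max(group_limits, key=TLP_ORDER.index)
```

For a user in group A (TLP:GREEN) and group B (TLP:RED), the first policy yields TLP:GREEN and the second yields TLP:RED.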

But what I don't get is: is this the root cause of the problem here?

cc @romain-filigran FYI

@labo-flg
Member

labo-flg commented Aug 5, 2024

We can work on fixing the behavior for max_shareable_marking, that's for sure.

But what I don't get is: is this the root cause of the problem here?

This could have been the root cause of the problem (as I stated earlier, I can more or less reproduce the issue this way), but apparently it's not the case.

I still don't know what's causing the difference in numbers. I've mostly investigated around marking access, as it seemed a good suspect, but now I'm stuck. No idea.

@nino-filigran nino-filigran added the export Functional scope : EXPORT label Aug 5, 2024
@JeremyCloarec
Contributor

JeremyCloarec commented Aug 23, 2024

We found the cause of the issue: pagination isn't handled correctly when sorting by _score.
Exports were done sorting by _score, but pagination is inconsistent when doing so: the _score can change between the first and second query, which breaks the pagination. This explains the inconsistent behavior, with the export sometimes working properly and other times exporting an incoherent number of entities.
This can also be reproduced on the front end: when opening a list and ordering by _score, the infinite scrolling breaks after scrolling for a while.
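
The failure mode can be illustrated with a small simulation. This is a deliberately exaggerated sketch, assuming cursor-style pagination where each query returns the rows that sort after the last row of the previous page; the function names and the score drift (the ranking is fully reversed between queries) are invented for illustration and do not reflect the platform's actual queries.

```python
def fetch_page(scores, after, page_size):
    """Return one page of (score, id) pairs, highest score first.

    `after` is the (score, id) of the last row of the previous page; only
    rows that sort strictly after it are returned.
    """
    def sort_key(pair):
        score, item_id = pair
        return (-score, item_id)

    rows = sorted(((s, i) for i, s in scores.items()), key=sort_key)
    if after is not None:
        rows = [row for row in rows if sort_key(row) > sort_key(after)]
    return rows[:page_size]


def export_all(n_items, page_size, scores_change):
    """Page through all items and return the ids in the order they were exported."""
    scores = {i: n_items - i for i in range(n_items)}
    exported, after = [], None
    while True:
        page = fetch_page(scores, after, page_size)
        if not page:
            return exported
        exported.extend(item_id for _, item_id in page)
        after = page[-1]
        if scores_change:
            # Exaggerated drift: every score is mirrored between two queries.
            scores = {i: (n_items + 1) - s for i, s in scores.items()}


stable = export_all(10, 5, scores_change=False)
unstable = export_all(10, 5, scores_change=True)
print(len(stable), len(set(stable)))      # 10 10
print(len(unstable), len(set(unstable)))  # 12 8
```

With stable scores, each of the 10 items is exported once. With drifting scores, the same loop returns 12 rows covering only 8 distinct items: some are exported twice and others never appear, which matches the kind of over- and under-counting reported in this issue.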

@JeremyCloarec
Contributor

A fix is being worked on to make _score ordering work.
In the meantime, a workaround is available: before exporting a list, order it by any column. This makes the export use that ordering instead of _score, so the export works properly.
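
Why the workaround helps can be sketched as follows, under the same simplified cursor-pagination assumption: when the sort key is a stored column plus the id as a tie-breaker, the key of each row never changes between queries, so a drifting relevance score cannot move rows across page boundaries. The field names and helper functions are hypothetical.

```python
from datetime import date


def fetch_page(rows, after, page_size):
    """Return rows ordered by (created_at, id) that sort strictly after the cursor."""
    ordered = sorted(rows, key=lambda r: (r["created_at"], r["id"]))
    if after is not None:
        ordered = [r for r in ordered if (r["created_at"], r["id"]) > after]
    return ordered[:page_size]


def export_all(rows, page_size):
    """Page through all rows and return their ids in export order."""
    exported, after = [], None
    while True:
        page = fetch_page(rows, after, page_size)
        if not page:
            return exported
        exported.extend(r["id"] for r in page)
        last = page[-1]
        after = (last["created_at"], last["id"])
        # Scores still drift between queries, but they no longer drive the order.
        for r in rows:
            r["score"] = -r["score"]


rows = [
    {"id": f"indicator--{i}", "created_at": date(2024, 8, 1 + i % 3), "score": float(i)}
    for i in range(10)
]
exported = export_all(rows, page_size=4)
```

Several rows share the same created_at here on purpose: the id tie-breaker is what keeps the order total, so each row is exported exactly once.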

@nino-filigran nino-filigran added this to the Bugs backlog milestone Sep 18, 2024
@JeremyCloarec
Contributor

This issue was fixed by PR OpenCTI-Platform/client-python#733 in the Python client.

@JeremyCloarec JeremyCloarec added the solved use to identify issue that has been solved (must be linked to the solving PR) label Sep 27, 2024
@labo-flg labo-flg modified the milestones: Bugs backlog, Release 6.3.4 Sep 27, 2024
5 participants