Filtering events that are not replies (presence & absence filters) #523
Comments
Nostream and nostr-rs-relay (IIRC) match no events when passing an empty array for a generic tag filter... so I don't see that behaviour as useful, so what @alexgleason proposes makes sense to me. |
I will do some testing tomorrow to see what the query and performance look like. Agree with @cameri, this is a useless query to do today (it returns zero events); however, I think that is the logical behavior (a tag match requires at least one member of the provided list to match, so a zero-element list implies no match). My first thought is we should leave it as is, and searching for the absence of a tag is not a path I want to go down. But I will test and think on it a bit more. I definitely understand the value of the proposal. |
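The matching rule described in this comment can be illustrated with a minimal sketch. It is not taken from any relay's code: the function name `matches_tag_filters` and the sample events are made up for illustration, and only the `#<letter>` keys of a filter are considered.

```python
from typing import Any

Event = dict[str, Any]
Filter = dict[str, Any]


def matches_tag_filters(event: Event, flt: Filter) -> bool:
    """Check only the "#<letter>" conditions of a filter against an event."""
    for key, wanted in flt.items():
        if not key.startswith("#"):
            continue  # ids, kinds, since, etc. are out of scope for this sketch
        tag_name = key[1:]
        # Values carried by the event's tags of this name (second element of each tag).
        present = {tag[1] for tag in event["tags"] if len(tag) > 1 and tag[0] == tag_name}
        # At least one wanted value must be present; an empty list can never satisfy this.
        if not any(value in present for value in wanted):
            return False
    return True


# Hypothetical sample events: a root post and a reply to it.
root = {"id": "a", "kind": 1, "tags": []}
reply = {"id": "b", "kind": 1, "tags": [["e", "a"]]}

print(matches_tag_filters(reply, {"#e": ["a"]}))  # True
print(matches_tag_filters(root, {"#e": []}))      # False
print(matches_tag_filters(reply, {"#e": []}))     # False
print(matches_tag_filters(root, {}))              # True
```

Under this reading, a missing key places no constraint on the event, while an empty list matches nothing, which is why the query returns zero events.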
I think you're right. My OP was based on the idea relays were ignoring the empty "#e" filter and returning all events, but that does not seem to be the case. I agree it's logical to not return any events. I'll filter the events client-side for now. |
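The client-side workaround mentioned here could look roughly like the following. It is only a sketch: `is_root` is a hypothetical helper name, and it assumes that "not a reply" simply means "has no tag named e", as in the opening post.

```python
def is_root(event: dict) -> bool:
    """Treat an event as a root post when none of its tags is named "e"."""
    return not any(tag and tag[0] == "e" for tag in event.get("tags", []))


# Hypothetical events as they might arrive from a relay subscription.
events = [
    {"id": "a", "kind": 1, "tags": []},
    {"id": "b", "kind": 1, "tags": [["e", "a"], ["p", "alice"]]},
    {"id": "c", "kind": 1, "tags": [["t", "nostr"]]},
]

# Replies are still downloaded; they are only dropped after the fact.
roots = [event for event in events if is_root(event)]
print([event["id"] for event in roots])  # ['a', 'c']
```

The cost of this approach is bandwidth, since every reply is still transferred before being discarded.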
Filter sets can either be (1) missing, (2) empty or (3) have contents. Those can have three different meanings. But I am wary of assigning special meaning to the empty array. It would also break gossip's current code quite badly. (1) Missing from the filter means "do not filter on this field". |
We could use
|
I think NIP-01 is pretty clear on what should happen:
But yeah, an empty array is useless and a pretty common source of misunderstanding. I almost want to make it throw an error here "your query is broken". Filtering for events that have 0 |
Does it damage anyone if we do the |
I think it is a very different kind of query than we do in the other cases, so like @hoytech mentioned, it could necessitate a new index. I would need a subquery to deal with it, which isn't great for performance, but a proper benchmark is still on my short-term todo list. I think it is simpler to not have different behavior for null, and I would prefer to just throw an error. I would like to see more use cases if possible to justify this search-for-no-tags option. |
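To make the "subquery" concern concrete, here is a sketch of what an absence query might look like in SQL, run through Python's built-in `sqlite3`. The two-table schema is an assumption made up for this example and is not the schema of any relay discussed in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per event, one row per tag.
conn.executescript("""
CREATE TABLE events (id TEXT PRIMARY KEY, kind INTEGER, created_at INTEGER);
CREATE TABLE tags (event_id TEXT REFERENCES events(id), name TEXT, value TEXT);
CREATE INDEX tags_by_event_name ON tags (event_id, name);
""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("a", 1, 100), ("b", 1, 101), ("c", 1, 102)],
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [("b", "e", "a"), ("c", "t", "bitcoin")],
)

# Absence of an "e" tag is expressed as a correlated NOT EXISTS subquery,
# evaluated once per candidate event.
rows = conn.execute("""
SELECT id FROM events
WHERE kind = 1
  AND NOT EXISTS (
      SELECT 1 FROM tags
      WHERE tags.event_id = events.id AND tags.name = 'e'
  )
ORDER BY created_at
""").fetchall()

root_ids = [row[0] for row in rows]
print(root_ids)  # ['a', 'c']
```

Unlike a normal tag match, which starts from an index lookup on the tag value, this query has to start from the events and probe the tag table for each one, which is the performance worry raised above.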
The media tab wants only events which have any media tag, but it doesn't care what its value is. Let's pretend for a moment we implemented media attachments the right way on Nostr, by using an "m" tag. I want to filter out events which do not contain an "m" tag. I have no idea what that filter looks like. The more I think about it, I realize my own ask in this issue is flawed. But maybe there's some way to extend filters in a way that's more flexible and makes sense. |
This could be achieved for kind 1 events if clients agree to using a depth tag. In my opinion, an interesting feed would include not only root events (like this issue says, with no |
I'm going to close this. Thank you all for your feedback and ideas. There's maybe still a problem to be solved here, but it's not the one that's stated. |
For a global feed it would make a lot of sense to have some way to query events without an e tag. On some relays I need to fetch over 400 events before getting a non-reply event. This seems like a huge waste of bandwidth. Regardless of what the exact syntax would be (empty array, null, array with 1 null item, new tag), I strongly think this "Filtering events that are not replies" should somehow be possible. |
Over 400! This issue should be reopened =0? @alexgleason In this brief discussion I mentioned this possible solution: What about pushing for this addition: I suspect relay databases won't have a hard time storing and indexing one-letter tag occurrence counts. |
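The proposed filter syntax is not preserved in this thread, so the following only sketches the data a relay might store to support it: a per-event count of each single-letter tag, computed once at ingest time. The helper name `one_letter_tag_counts` is hypothetical.

```python
from collections import Counter


def one_letter_tag_counts(event: dict) -> Counter:
    """Count occurrences of each single-letter (a-z, A-Z) tag name in an event."""
    return Counter(
        tag[0]
        for tag in event.get("tags", [])
        if tag and len(tag[0]) == 1 and tag[0].isascii() and tag[0].isalpha()
    )


# Hypothetical event with two "e" tags, one "p" tag and a multi-letter tag.
event = {"tags": [["e", "x"], ["e", "y"], ["p", "z"], ["client", "foo"]]}
counts = one_letter_tag_counts(event)

# A Counter returns 0 for names that never occurred, so "zero occurrences"
# needs no special case.
print(counts["e"], counts["p"], counts["t"])  # 2 1 0
```

A root post would then be any event whose stored count for "e" is zero.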
Not a bad idea. |
Two different versions supporting OR query:
1) is an improvement but doesn't support, for example, 0 #t occurrences OR "bitcoin", because "count.#t" + "#t" are an AND clause (although it would be possible with 2 separate filters) |
Could also do something like |
@staab The runes thing was considered too complex. NIP-26 used a simpler version so no >= nor <= for example. a) It would lead us to this: a) Is easier. #t count AND #e count must match |
So the question is, which one is best: 1), 2), a) or b)? |
I agree the runes were too complex, but a simpler version might work here. Of all of these options #1 is probably best, but |
Without the |
|
Wait, I think NIP-01 says like Sorry I'm confused. How would be your example, like |
Sorry yeah, that's what I mean. And you're right, that's an OR, not an AND, it would be a firehose. So my example won't work. |
Ok I removed the options that I think had problems (I can explain why if needed). Options to query by one-letter tag occurrence count: |
Nah I want to check if it's present in supported_nips before attempting the query. I like your idea, though. |
@alexgleason a) b) or c) idea? 🙃️ |
These ideas are going way too far in treating relays like databases. These things you're coming up with are basically mongodb queries. Highly centralizing. |
By the way, the |
@fiatjaf this issue is about fetching only root events. |
How is querying relays centralizing? Clients can remove/hide anything anyway. Relays are just databases, are they not?
The problem is not everyone does and not everyone will. |
I didn't like any of the proposals in this discussion except the ones from way back. I don't like the term "count" (confusing) or "length". Honestly I think that's just too complex for relays and does too much. This issue was about getting events that are not replies, and I think a simple solution would be good enough for now without locking us out of some more advanced approach like a runes-based approach later on. Here are the simple things you can put in your "#e" query:
I'm in favor of (4) meaning "please give me only events that do not have 'e' tags". That's simple. It doesn't introduce a bunch of stuff that is hard to reason about and hard to code into relays. And it solves the problem this issue was opened to solve. |
Related: I would like to get all events within a time range which contain ANY hashtag. I would use this to calculate trending hashtags. It's not about "count", it's about the "presence" or "absence" of a specific tag.
|
A simple way to do this might be something like:
|
Considering the events do have an empty array I think a filter with |
It seems like it's very hard for databases to do what I want (presence or absence of ANY tag), because it would require a boolean index of every possible tag on every event. You could have a partial index for only presence, or only absence, but even then you'd have to have it for every possible tag. The only way it seems doable is to do a full table scan of all events. Maybe some database genius here knows differently. |
Agree. That's the only reason I suggested expanding it to an exclude filter for all filterable items looking ahead to other potential benefits of an exclude filter, like the hashtag trending feature you're thinking about. It doesn't really seem like there's a good way to do something like that without it. If maybe that's too complex I'd be fine with the easiest route for now, but an exclude filter would be a great future add on. |
#683 proposes a presence filter, with a syntax we haven't seen yet. |
After some more research I think it is very possible for databases to achieve presence and absence filters. It's harder for some databases than others, and only particular tags would be able to support this. I think it should not be expected to be a standard feature of Nostr, and only something that particular relays implement. But I do think a way to represent the intent is needed. So I think there should be a NIP for this. How about this syntax:
Other notes:
I will open an MR for a NIP at some point. There are bigger tofu to fry at the moment. |
@alexgleason Why do you think that? #772 adds I could add For a client to ask just for root (tag absence) or just for reply events (tag presence), it needs confidence that ALL/MOST relays implement the filter, especially considering that most of the time the client shouldn't choose the relays it prefers but instead pick strictly what is inside NIP-65 events or other relay hints. That's why it should be a NIP-01 addition or else no client is going to use it. |
I'm planning to use the syntax like I would approve a NIP for presence and absence filters. |
I propose adding another new filter
If # is given in a filter, REQ returns events that contain all of the tag names in the list that follows. Motivation: the current specification cannot find only the events that have a g tag. Events that have a geohash:
Events that have both e and p:
|
That complicates the queries on the relay side. How about using a different kind for events that are always expected to have That's the purpose of kinds. |
The fact multiple devs have independently decided they need presence filters indicates a pain-point in the protocol. The workarounds are not great, or impossible, to do solely on the client. |
I prefer a different kind for replies, so for example kind 1 for roots and kind 11 for replies, but this would break everything so maybe a flag day 6 months from now would help. |
@alexgleason the protocol has multiple pain-points that come from the fact that Nostr isn't a centralized MongoDB. Our goal should be to work around them in a way that doesn't introduce code bloat, performance issues or complexity that results in centralization. Also there are many more clients that work perfectly well and didn't need this. |
@fabianfabian in retrospect I also think it would have been better to use a different kind for replies, but I wouldn't want to change that at this point. However we could try to use different kinds for different use cases from now on. I'm interested in learning what is the concrete use case of @mattn and @alexgleason for wanting these features so we can come up with a solution together that can be standardized -- I'm pretty sure it can be done with either a new kind or a new normal tag, or both, without having to change the relay query language. |
On Tue, Oct 24, 2023 at 02:15:02PM -0700, fiatjaf_ wrote:
However we could try to use different kinds for different use cases
from now on. I'm interested in learning what is the concrete use case
of @mattn and @alexgleason for wanting these features so we can come up
with a solution together that can be standardized -- I'm pretty sure it
can be done with either a new kind or a new normal tag, or both,
without having to change the relay query language.
one use case I ran into the other day was returning all kind1 events
with hashtags so that damus could build trending hashtag stats locally.
Right now we're relying on a fixed set of hashtags or simply everything
which is not ideal. It's a pretty niche usecase though.
|
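The trending-hashtag use case mentioned above can be sketched as a purely client-side computation. This is not Damus code: the function name `trending_hashtags`, its parameters and the sample events are assumptions for illustration. A presence filter would only reduce how many events have to be fetched before this step.

```python
from collections import Counter


def trending_hashtags(events, since, until, top=3):
    """Rank hashtags by the number of kind 1 events in [since, until] that use them."""
    counts = Counter()
    for event in events:
        if event["kind"] != 1 or not (since <= event["created_at"] <= until):
            continue
        # Count each hashtag once per event, case-insensitively.
        hashtags = {
            tag[1].lower()
            for tag in event["tags"]
            if len(tag) > 1 and tag[0] == "t"
        }
        counts.update(hashtags)
    return counts.most_common(top)


# Hypothetical events; without a presence filter the untagged one is fetched too.
events = [
    {"kind": 1, "created_at": 10, "tags": [["t", "Bitcoin"], ["t", "bitcoin"]]},
    {"kind": 1, "created_at": 20, "tags": [["t", "bitcoin"], ["t", "nostr"]]},
    {"kind": 1, "created_at": 30, "tags": []},
    {"kind": 1, "created_at": 999, "tags": [["t", "late"]]},
    {"kind": 7, "created_at": 15, "tags": [["t", "nostr"]]},
]

print(trending_hashtags(events, since=0, until=100, top=2))
# [('bitcoin', 2), ('nostr', 1)]
```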
I want to fetch only events that are not replies, i.e. they do not contain any "e" tags. Using the filter { "#e": [] }, the empty array is ignored, and I receive events from relays that contain "e" tags.
Passing an empty array to a tag filter is ambiguous, and I bet different relay software handles it differently. So I think it should be specified, and that an empty array should return events which specifically do not contain that tag.
I'm trying to adapt the following design to Nostr, where top-level posts are displayed in a separate tab from replies.