feat: support for conversation_id #115

ri72miieop · 2024-10-10T21:24:53Z

This PR adds the field conversation_id to the tweets table. It has a value when the whole chain of replies up to the main tweet is found in the archive; otherwise it stays NULL.
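To make the rule concrete, here is a minimal Python sketch. It is not the PR's SQL. It assumes each tweet has a tweet_id and an optional reply_to_tweet_id, and that a conversation is identified by the tweet_id of its root tweet (an assumption, not something stated in this PR). The function name resolve_conversation_id and the sample data are hypothetical.

```python
from typing import Dict, Optional


def resolve_conversation_id(
    tweet_id: str, reply_to: Dict[str, Optional[str]]
) -> Optional[str]:
    """Walk up the reply chain and return the root's id, or None if a link is missing.

    `reply_to` maps every archived tweet_id to its reply_to_tweet_id
    (None for a tweet that is not a reply).
    """
    current = tweet_id
    seen = set()
    while current in reply_to and current not in seen:
        seen.add(current)
        parent = reply_to[current]
        if parent is None:
            return current  # reached the root: the whole chain is archived
        current = parent
    return None  # a parent is missing from the archive (or the data is cyclic)


# Hypothetical archive: tweet "4" was never uploaded, so "5" cannot be resolved.
archive = {"1": None, "2": "1", "3": "2", "5": "4"}
resolved = {t: resolve_conversation_id(t, archive) for t in archive}
```

In this sample, tweets "1", "2" and "3" resolve to "1", while "5" stays None, which corresponds to the NULL case described above.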

When the user finishes uploading the archive and upload_stage changes to 'complete', a new job with the key 'update_conversation_ids' is queued. This job calls a function that runs the update over all tweets and then refreshes the materialized view. Running over all tweets is needed so that the conversation_id of tweets from other users who interacted with the uploader's tweets is updated too, although there is room for optimization here.

The materialized view 'main_thread_view' contains only the tweets in the main thread: it excludes replies from other users, as well as any continuation of a conversation between the thread's author and others.
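As an illustration of that filter, the following Python sketch keeps a tweet only if it was written by the root tweet's author and replies to a tweet that is already in the main thread. This membership rule is my reading of the description, not the view's actual SQL. The tuple layout, the `author` field and the function name main_thread are hypothetical, and the sketch assumes a reply always has a numerically larger id than its parent.

```python
from typing import List, Optional, Tuple

# (tweet_id, author, reply_to_tweet_id); hypothetical layout for illustration
Tweet = Tuple[str, str, Optional[str]]


def main_thread(tweets: List[Tweet], root_id: str) -> List[str]:
    """Return the ids of the root tweet and the author's unbroken self-reply chain."""
    by_id = {t[0]: t for t in tweets}
    if root_id not in by_id:
        return []
    root_author = by_id[root_id][1]
    in_thread = {root_id}
    # Visit tweets in ascending numeric id so a parent is seen before its replies.
    for tweet_id, author, parent in sorted(tweets, key=lambda t: int(t[0])):
        if author == root_author and parent in in_thread:
            in_thread.add(tweet_id)
    return sorted(in_thread, key=int)


sample = [
    ("1", "alice", None),  # root of the thread
    ("2", "alice", "1"),   # self-reply: part of the main thread
    ("3", "bob", "2"),     # reply from another user: excluded
    ("4", "alice", "3"),   # alice answering bob: excluded
    ("5", "alice", "2"),   # alice continuing her own thread: included
]
thread_ids = main_thread(sample, "1")
```

Here thread_ids contains "1", "2" and "5"; the exchange between bob and alice ("3" and "4") is left out.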

related to #70

vercel · 2024-10-10T21:24:57Z

@ri72miieop is attempting to deploy a commit to the theexgenesis' projects Team on Vercel.

A member of the Team first needs to authorize it.

vercel · 2024-10-11T10:08:46Z

The latest updates on your projects.

Name	Status	Updated (UTC)
community-archive	✅ Ready	Oct 28, 2024 8:00pm

TheExGenesis · 2024-10-14T10:04:28Z

@ri72miieop you can run pnpm build locally to check if it'll build in production - in this case it was this error

TheExGenesis · 2024-10-14T10:26:04Z

btw for long-running materialized view refreshes it's best to do it concurrently, otherwise selects are blocked while the refresh runs:

REFRESH MATERIALIZED VIEW CONCURRENTLY main_thread_view;

TheExGenesis · 2024-10-14T10:27:43Z

I also wouldn't put conversation_id in the main tweets table, I'd make a separate table, just bc it's something we're adding on

TheExGenesis · 2024-10-14T10:28:40Z

I would also put all these functions in the private schema; the only reason for them to be in public is if we need to call them from the client side, as a user.

TheExGenesis · 2024-10-14T10:35:37Z

I'm also wondering if that materialized view is necessary. Do you have a specific purpose in mind for it? Might it be enough to have a regular view that computes just in time?

* 'main' of https://github.com/ri72miieop/community-archive:
  fix: import formatteduser type
  script: check archives in storage but not db and vice versa
  refactor: user info, add user followers to featured profiles, likes to user dir
  feat: word occurrences pg fn
  Update archive_data.md
  Update archive_data.md
  style: format counts
  script: anon key download storage
  fix: don't require like.js
  fix: delete timeout migration
  doc: what data we use from archive
  fix: delete archive timeout


TheExGenesis · 2024-10-14T16:30:45Z

for context here


ri72miieop · 2024-10-15T18:23:37Z

Changes made:

Move conversation_id from tweets table to a new table called conversations
Replace the materialized view that calculated info for all threads with a function that does the same for a single conversation_id
Create view tweets_w_conversation_id

It compiles on my machine but seems to be failing on Vercel. I'm not sure whether that's because some manual step is needed or because it really fails to compile; if there's any issue, let me know.

TheExGenesis · 2024-10-24T10:45:48Z

@ri72miieop Thanks for updating! Looping over individual tweets seems inefficient, have you considered starting from root tweets and processing them in layers? I asked Claude to draft a proposal. Haven't tested yet.
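The layered idea can be sketched as follows. This is not the drafted proposal, which is not shown in this thread, only a minimal Python illustration under assumptions: a root tweet is one with no reply_to_tweet_id, its conversation_id is its own tweet_id, and each later layer consists of the direct replies to the previous layer. The function name assign_by_layers and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def assign_by_layers(reply_to: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Assign conversation ids breadth-first, one reply layer at a time."""
    # Index replies by parent so each layer is found without rescanning all tweets.
    children: Dict[str, List[str]] = defaultdict(list)
    for tweet_id, parent in reply_to.items():
        if parent is not None:
            children[parent].append(tweet_id)

    # Layer 0: root tweets start their own conversation.
    conversation = {t: t for t, parent in reply_to.items() if parent is None}
    frontier = list(conversation)
    while frontier:
        next_frontier = []
        for parent in frontier:
            for child in children.get(parent, []):
                if child not in conversation:
                    conversation[child] = conversation[parent]
                    next_frontier.append(child)
        frontier = next_frontier
    # Tweets whose chain never reaches an archived root are absent (NULL in the db).
    return conversation


layers_result = assign_by_layers(
    {"1": None, "2": "1", "3": "2", "5": "4", "6": "5"}
)
```

With this sample, "1", "2" and "3" all receive "1", and "5" and "6" are left unassigned because tweet "4" is not in the archive.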

ri72miieop · 2024-10-24T17:40:01Z

Yeah this is similar to what I was doing before. I tested it now, results:

Going one by one seems inefficient, but I am taking advantage of the order of the tweets. When I look up the tweet with id reply_to_tweet_id in the temporary table, I have a guarantee about what I find: either it is there and already has its conversation_id computed, or the result is null, which means we don't have the complete thread in the db yet and can ignore the tweet for now.
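A rough Python equivalent of this one-pass idea, with a dictionary standing in for the temporary table. The ordering key is an assumption: the sketch sorts by numeric tweet_id, on the premise that a reply always has a larger id than the tweet it replies to. The function name assign_in_order and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def assign_in_order(
    tweets: List[Tuple[str, Optional[str]]]
) -> Dict[str, Optional[str]]:
    """Single pass over (tweet_id, reply_to_tweet_id) pairs in ascending id order."""
    conversation: Dict[str, Optional[str]] = {}
    for tweet_id, parent in sorted(tweets, key=lambda t: int(t[0])):
        if parent is None:
            # Not a reply: the tweet starts its own conversation.
            conversation[tweet_id] = tweet_id
        else:
            # One lookup per tweet. The parent was already processed, so this is
            # either its final conversation id or None (incomplete thread).
            conversation[tweet_id] = conversation.get(parent)
    return conversation


ordered_result = assign_in_order(
    [("3", "2"), ("1", None), ("2", "1"), ("6", "5"), ("5", "4")]
)
```

Each tweet costs a single lookup regardless of how deep its thread is, which is the property the ordering buys.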

Meanwhile, in the recursive case, a thread with something like 300 elements has to be searched recursively many times for each of its tweets (this is why it was taking 1m with my test data, which had 10% of the tweets of the archive, and why it was timing out with the full archive).

My SQL is rusty, so if I missed something let me know, but I think this is why the recursive approach scales so much worse with more data than just going through the tweets one by one.

TheExGenesis · 2024-10-28T19:19:22Z

testing the latest functions
btw there's a typo in conversations table, needs a comma after "text"


CREATE TABLE IF NOT EXISTS "public"."conversations" (
    "tweet_id" text NOT NULL PRIMARY KEY,
    "conversation_id" text,
    FOREIGN KEY (tweet_id) REFERENCES public.tweets(tweet_id)
);

ri72miieop added 2 commits October 10, 2024 21:42

add conversation_id to tweets

faf3459

fix database-types

72c5754

vercel bot had a problem deploying to Preview October 11, 2024 10:10 Failure

ri72miieop and others added 2 commits October 13, 2024 14:08

Merge branch 'main' into main

d85f12c

Merge branch 'main' into main

df96727

vercel bot had a problem deploying to Preview October 14, 2024 10:04 Failure

ri72miieop added 2 commits October 14, 2024 13:32

fix unused imports

1729a9e

DefenderOfBasic mentioned this pull request Oct 14, 2024

refresh materialized view concurrently #124

Merged

Change schema of functions from public to private;

16acef9

Added index on main_thread_view Refresh materialized view concurrently

ri72miieop added 4 commits October 15, 2024 17:38

Move conversation_id from tweets table to a new table called conversa…

eaee98b

…tions Change materilized view to calculate info for all threads for a function that does that same for a single conversation_id Create view tweets_w_conversation_id

fix usage conversation_id

7764a2f

fix database-types.ts

53b4ad0

Merge branch 'main' into main

c9a4182

vercel bot deployed to Preview October 15, 2024 19:24 View deployment

Merge branch 'main' into main

e40b7d5

vercel bot deployed to Preview October 18, 2024 15:03 View deployment

TheExGenesis and others added 2 commits October 18, 2024 16:04

Merge branch 'main' into main

71f16aa

improved performance

302ff9d

ri72miieop changed the title ~~feat: Add conversation_id to tweets and created materialized view with the main threads of every user~~ feat: support for conversation_id Oct 18, 2024

vercel bot deployed to Preview October 19, 2024 16:46 View deployment

Merge branch 'main' into main

c4e0cf4

vercel bot deployed to Preview October 24, 2024 10:29 View deployment

Merge branch 'main' into main

206a562

vercel bot deployed to Preview October 28, 2024 19:02 View deployment

Merge branch 'main' into main

65ec272

vercel bot deployed to Preview October 28, 2024 20:00 View deployment

TheExGenesis merged commit e4639d6 into TheExGenesis:main Oct 28, 2024
1 check passed
