
Keep feeds in Redis synchronized with wiki over time #294

Closed

humphd opened this issue Nov 20, 2019 · 16 comments

@humphd
Contributor

humphd commented Nov 20, 2019

This is related to #266, #264. We are going to be downloading the feed list (names, urls) from the wiki, and placing them in Redis. We'll need to keep this data in sync:

  1. we need to re-run this download code over and over at some interval (every 10 minutes?). It should be configurable in the env (see the sketch after this list)
  2. when we re-download, we have to be careful to remove any feeds/posts from Redis that shouldn't be there. For example, if we download the wiki feeds the first time, and it includes "David Humphrey" and "https://david.humphrey.com/feed", we'll add that to Redis. But, on the second download, this feed could be removed (user deleted it). In this case, we need to clean up Redis to remove the feed and posts (if any) for that user.
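
A rough sketch of the polling loop from (1), with the interval read from the env (FEED_SYNC_INTERVAL_MS is a hypothetical variable name, and downloadAndSyncFeeds stands in for whatever does the actual work):

// Hypothetical sketch: re-run the wiki feed download on a configurable interval
const SYNC_INTERVAL_MS = Number(process.env.FEED_SYNC_INTERVAL_MS) || 10 * 60 * 1000; // default: 10 minutes

function startFeedSync(downloadAndSyncFeeds) {
  const run = () =>
    downloadAndSyncFeeds().catch((err) => console.error('Error syncing feeds', err));
  run(); // run once at startup...
  setInterval(run, SYNC_INTERVAL_MS); // ...then repeat on the interval
}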

I suspect that this will involve altering how we store feeds in Redis so that we can easily look them up by name/url, which we can't do at the moment.
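
One possible shape for that storage, assuming an ioredis-style client (the key names here are illustrative, not Telescope's actual schema): a hash per feed keyed by id = hash(url), plus a set of all feed ids, so feeds can be found and removed individually.

// Hypothetical key layout: one hash per feed, plus a set of all feed ids
const addFeed = (redis, feed) =>
  redis
    .multi()
    .hmset(`feed:${feed.id}`, { name: feed.name, url: feed.url })
    .sadd('feeds', feed.id)
    .exec();

const removeFeed = (redis, id) =>
  redis.multi().del(`feed:${id}`).srem('feeds', id).exec();

const getFeed = (redis, id) => redis.hgetall(`feed:${id}`);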

@humphd humphd added the area: back-end and area: redis labels Nov 20, 2019
@sukhbeersingh
Contributor

I'd like to take this up.

@sukhbeersingh
Contributor

This is potentially a duplicate of #503.

@sukhbeersingh sukhbeersingh removed their assignment Jan 20, 2020
@humphd
Contributor Author

humphd commented Jan 21, 2020

This one is different. In #503 we added a way to loop forever and re-download/parse feeds.

This issue is about keeping the feeds in Redis up-to-date with what's in the wiki. The work in #503 won't do anything to remove a feed from Redis if it's removed from the wiki, for example.

@humphd
Contributor Author

humphd commented Jan 30, 2020

@Silvyre, I was thinking about this today. Our wiki feed list is (currently) our "source of truth," and we want to use it as the official list of what we store in our system. Given this, here are the cases I can think of that we need to deal with:

  1. the wiki feed list has 100 feeds in it, redis has zero feeds in it (first run): we need to put all 100 feeds into redis
  2. the wiki feed list has 100 feeds in it, redis has the same 100 feeds in it (second, third, ... normal run). We don't need to add anything to redis
  3. the wiki feed list has 101 feeds in it, redis has 100 feeds in it (new feed added to wiki): we need to add the new feed to redis
  4. the wiki feed list has 99 feeds in it, redis has 100 feeds in it (feed removed from wiki): we need to remove this feed (and all posts associated with it) from redis
  5. the wiki feed list has 100 feeds in it, redis has 100 feeds in it, but they don't match (one feed was deleted from the wiki, and a new one added): we need to delete the one that was deleted, and add the new one.

The code that currently does a bunch of this is located in src/backend/index.js. When we download the feed list, we parse it into Objects, then try to get an existing feed from Redis that matches each feed's URL (using Feed.byId(), where the id is a hash of the URL).
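
For reference, that id derivation is presumably something like the following (this mirrors the urlToId helper that appears later in this thread; the require paths are illustrative):

// Hypothetical sketch: a feed's id is the hash of its normalized URL
const normalizeUrl = require('normalize-url');
const hash = require('./data/hash'); // Telescope's hash helper

const urlToId = (url) => hash(normalizeUrl(url));
// e.g. Feed.byId(urlToId('https://david.humphrey.com/feed')) looks up that feed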

What I think we also need to do is get the complete set of all feed IDs from Redis, and put them in a JavaScript Set. We then need to create a second Set, and put the IDs of all the parsed feed Objects (i.e., from the wiki) into it. Now we can easily calculate the difference between the two and get the feeds to add and remove. Here's some pseudo code to show you what I mean:

// Create a set of IDs for feeds in the wiki
let wiki = new Set(['one', 'two', 'three'])

// Create a set of IDs for feeds in redis
let redis = new Set(['one', 'two', 'four'])

// Figure out which feeds need to get added to Redis (in wiki, not in redis)
let newFeeds = new Set([...wiki].filter(feed => !redis.has(feed)))
// this will give us: Set { 'three' }

// Figure out which feeds need to get removed from redis (gone from wiki, but still in redis)
let deletedFeeds = new Set([...redis].filter(feed => !wiki.has(feed)))
// this will give us: Set { 'four' }

So now we can iterate newFeeds and add all of these to Redis, and iterate through all of deletedFeeds and remove them (along with any associated keys, posts, etc).
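
Applying that difference might look roughly like this (a sketch: Feed.byId() and feed.delete() are assumed from the existing backend models, wikiFeedsById is an assumed Map from id to the parsed wiki feed object, and feed.save() stands in for whatever creates the feed in Redis):

// Sketch: apply the computed difference (sets of ids) back to Redis
async function applyDiff(newFeeds, deletedFeeds, wikiFeedsById) {
  // Add feeds that are on the wiki but missing from Redis
  await Promise.all(
    [...newFeeds].map((id) => {
      const feed = wikiFeedsById.get(id); // parsed { name, url } from the wiki
      return feed.save(); // assumed Feed model method
    })
  );

  // Remove feeds (and their posts/keys) that were deleted from the wiki
  await Promise.all(
    [...deletedFeeds].map(async (id) => {
      const feed = await Feed.byId(id);
      if (feed) {
        await feed.delete(); // should also clean up associated posts
      }
    })
  );
}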

Writing tests for this will be a bit of work, but we can do that in a follow-up. What do you think of this?

@Silvyre
Contributor

Silvyre commented Feb 10, 2020

I plan to collaborate on this as part of Release 0.7

@Silvyre
Contributor

Silvyre commented Feb 19, 2020

We agreed to delay this until the user structure (including 'null'/'wiki' users and/or 'admin' users) is implemented, e.g. #709

@Silvyre Silvyre added the Blocked label Mar 2, 2020
@Silvyre
Contributor

Silvyre commented Mar 30, 2020

Unblocked by #902 + #898, I believe? @humphd

@Silvyre Silvyre removed the Blocked label Mar 30, 2020
@cindyorangis
Contributor

@Silvyre Did you have your own triage meeting today? How did you remember this issue from November? XD

@Silvyre
Contributor

Silvyre commented Mar 30, 2020

> @Silvyre Did you have your own triage meeting today? How did you remember this issue from November? XD

This particular issue has earned a special place in my heart, as it is the only one I have had to completely scrap my initial work on.

@cindyorangis
Contributor

> > @Silvyre Did you have your own triage meeting today? How did you remember this issue from November? XD
>
> This particular issue has earned a special place in my heart, as it is the only one I have had to completely scrap my initial work on.

Bring it back! Bring it back! I'm actually planning to bring back my own issue from November as well (#290), after @agarcia-caicedo and I finish #896.

@Silvyre
Contributor

Silvyre commented Mar 30, 2020

We'll see if I have time before the code freeze! 🥶

@cindyorangis
Contributor

Same

@Silvyre
Contributor

Silvyre commented Apr 23, 2020

> @Silvyre, I was thinking about this today. Our wiki feed list is (currently) our "source of truth," and we want to use it as the official list of what we store in our system. Given this, here are the cases I can think of that we need to deal with:
>
> 1. the wiki feed list has 100 feeds in it, redis has zero feeds in it (first run): we need to put all 100 feeds into redis
> 2. the wiki feed list has 100 feeds in it, redis has the same 100 feeds in it (second, third, ... normal run). We don't need to add anything to redis
> 3. the wiki feed list has 101 feeds in it, redis has 100 feeds in it (new feed added to wiki): we need to add the new feed to redis

As far as I know, our system currently handles the above three use cases.

> 4. the wiki feed list has 99 feeds in it, redis has 100 feeds in it (feed removed from wiki): we need to remove this feed (and all posts associated with it) from redis

We have yet to implement this use case, I believe.

> 5. the wiki feed list has 100 feeds in it, redis has 100 feeds in it, but they don't match (one feed was deleted from the wiki, and a new one added): we need to delete the one that was deleted, and add the new one.

This use case is a combination of use cases 3 (handled) and 4 (not yet handled). As such, resolving use case 4 should resolve this one, as well.

> The code that currently does a bunch of this is located in src/backend/index.js. When we download the feed list, we parse it into Objects, then try to get an existing feed from Redis that matches each feed's URL (using Feed.byId(), where the id is a hash of the URL).
>
> What I think we also need to do is get the complete set of all feed IDs from Redis, and put them in a JavaScript Set. We then need to create a second Set, and put the IDs of all the parsed feed Objects (i.e., from the wiki) into it. Now we can easily calculate the difference between the two and get the feeds to add and remove. Here's some pseudo code to show you what I mean:
>
> // Create a set of IDs for feeds in the wiki
> let wiki = new Set(['one', 'two', 'three'])
>
> // Create a set of IDs for feeds in redis
> let redis = new Set(['one', 'two', 'four'])
>
> // Figure out which feeds need to get added to Redis (in wiki, not in redis)
> let newFeeds = new Set([...wiki].filter(feed => !redis.has(feed)))
> // this will give us: Set { 'three' }
>
> // Figure out which feeds need to get removed from redis (gone from wiki, but still in redis)
> let deletedFeeds = new Set([...redis].filter(feed => !wiki.has(feed)))
> // this will give us: Set { 'four' }
>
> So now we can iterate newFeeds and add all of these to Redis, and iterate through all of deletedFeeds and remove them (along with any associated keys, posts, etc).
>
> Writing tests for this will be a bit of work, but we can do that in a follow-up. What do you think of this?

I believe we could implement the remaining missing logic (i.e., for use case 4) in src/backend/index.js...

async function processAllFeeds() {
  try {
    // Get an Array of Feed objects from the wiki feed list and Redis
    const [all, wiki] = await Promise.all([Feed.all(), getWikiFeeds()]);
    // Process these feeds into the database and feed queue
    await processFeeds([...all, ...wiki]);
  } catch (err) {
    logger.error({ err }, 'Error queuing feeds');
  }
}

...like so:

// telescope/src/backend/index.js
// ...
const normalizeUrl = require('normalize-url');
const hash = require('./data/hash');
const urlToId = (url) => hash(normalizeUrl(url));
// ...

async function processAllFeeds() {
  try {
    // Get an Array of Feed objects from the wiki feed list and Redis
    let [all, wiki] = await Promise.all([Feed.all(), getWikiFeeds()]);

    // Ensure no duplicate feeds exist. Note: new Set() only removes
    // duplicate object *references*, so dedupe by each feed's URL id instead.
    const dedupe = (feeds) =>
      [...new Map(feeds.map((feed) => [urlToId(feed.url), feed])).values()];
    all = dedupe(all);
    wiki = dedupe(wiki);

    const wikiIds = wiki.map((feed) => urlToId(feed.url));

    // Prevent deleted wiki feeds from being processed, and delete such feeds
    // from Redis. Note: Array.filter() can't take an async callback (the
    // returned Promise is always truthy), so partition synchronously first,
    // then delete.
    const deleted = all.filter((feed) => !wikiIds.includes(feed.id));
    all = all.filter((feed) => wikiIds.includes(feed.id));
    await Promise.all(deleted.map((feed) => feed.delete()));

    // Process remaining feeds into the database and feed queue
    await processFeeds([...all, ...wiki]);
  } catch (err) {
    logger.error({ err }, 'Error queuing feeds');
  }
}

On a related note, there now exists a new problematic use case that we may wish to address within a separate issue:

  • Situation: a Wiki feed is deleted via the My Feeds page.
  • Current behaviour: When the Wiki feeds list is subsequently fetched, the previously-deleted feed will be added back to Telescope.
  • Desired behaviour: Wiki feeds deleted via the My Feeds page should be 'blacklisted', preventing Telescope from re-adding previously-deleted feeds when the Wiki feeds list is fetched.
    • (Previously-deleted feeds should still be able to be re-added via the My Feeds page. In this edge case, its 'blacklisted' status should be lifted.)
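
One possible shape for that blacklist, sketched against an ioredis-style client (the key name and helper names are hypothetical):

// Hypothetical sketch of a feed blacklist in Redis
const BLACKLIST_KEY = 'blacklisted_feeds'; // hypothetical key name

// Called when a wiki feed is deleted via the My Feeds page
const blacklistFeed = (redis, id) => redis.sadd(BLACKLIST_KEY, id);

// Called when a user re-adds the feed, lifting its blacklisted status
const unblacklistFeed = (redis, id) => redis.srem(BLACKLIST_KEY, id);

// Checked before re-adding feeds from the fetched wiki feeds list
const isBlacklisted = async (redis, id) =>
  (await redis.sismember(BLACKLIST_KEY, id)) === 1;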

@humphd
Contributor Author

humphd commented Apr 23, 2020

Here's another way to look at this problem:

What if we got rid of the wiki feed list?

All we need is a way for an Admin to be able to bulk insert a bunch of feeds once, and then this problem goes away. Maybe we should morph this bug into:

"Allow an Admin to import an OPML or Planet Feed List formatted file, and add all the associated feeds."

I don't see any value in keeping the feed list now that we have Add Feed via authenticated My Feeds.
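
For what it's worth, the OPML half of that import could be fairly small. A sketch, assuming the xml2js package (the helper name and feed shape are illustrative):

// Hypothetical sketch: extract feed names/URLs from an OPML file
const { parseStringPromise } = require('xml2js');

async function parseOpml(xml) {
  const doc = await parseStringPromise(xml);
  const feeds = [];
  // OPML outlines can nest, so walk them recursively
  const walk = (outlines = []) =>
    outlines.forEach((outline) => {
      const attrs = outline.$ || {};
      if (attrs.xmlUrl) {
        feeds.push({ name: attrs.title || attrs.text, url: attrs.xmlUrl });
      }
      walk(outline.outline);
    });
  walk(doc.opml.body[0].outline);
  return feeds;
}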

@humphd
Contributor Author

humphd commented Apr 23, 2020

I filed #1058, #1059 to move us away from needing to solve this.

@humphd
Contributor Author

humphd commented Feb 23, 2022

This problem is going away now that feeds are moving into Supabase. Closing.

@humphd humphd closed this as completed Feb 23, 2022