
Keep feeds in Redis synchronized with wiki over time #294

Closed

humphd opened this issue Nov 20, 2019 · 16 comments

@humphd
Contributor

humphd commented Nov 20, 2019

This is related to #266, #264. We are going to be downloading the feed list (names, urls) from the wiki, and placing them in Redis. We'll need to keep this data in sync:

  1. we need to re-run this download code over and over at some interval (every 10 minutes?). It should be configurable in the env (see the sketch after this list)
  2. when we re-download, we have to be careful to remove any feeds/posts from Redis that shouldn't be there. For example, if we download the wiki feeds the first time, and it includes "David Humphrey" and "https://david.humphrey.com/feed", we'll add that to Redis. But, on the second download, this feed could be removed (user deleted it). In this case, we need to clean up Redis to remove the feed and posts (if any) for that user.
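
A rough sketch of the polling loop from (1), with the interval read from the env (FEED_SYNC_INTERVAL_MS is a hypothetical variable name, and downloadAndSyncFeeds stands in for whatever does the actual work):

// Hypothetical sketch: re-run the wiki feed download on a configurable interval
const SYNC_INTERVAL_MS = Number(process.env.FEED_SYNC_INTERVAL_MS) || 10 * 60 * 1000; // default: 10 minutes

function startFeedSync(downloadAndSyncFeeds) {
  const run = () =>
    downloadAndSyncFeeds().catch((err) => console.error('Error syncing feeds', err));
  run(); // run once at startup...
  setInterval(run, SYNC_INTERVAL_MS); // ...then repeat on the interval
}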

I suspect that this will involve altering how we store feeds in Redis so that we can easily look them up by name/url, which we can't do at the moment.
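
One possible shape for that storage, assuming an ioredis-style client (the key names here are illustrative, not Telescope's actual schema): a hash per feed keyed by id = hash(url), plus a set of all feed ids, so feeds can be found and removed individually.

// Hypothetical key layout: one hash per feed, plus a set of all feed ids
const addFeed = (redis, feed) =>
  redis
    .multi()
    .hmset(`feed:${feed.id}`, { name: feed.name, url: feed.url })
    .sadd('feeds', feed.id)
    .exec();

const removeFeed = (redis, id) =>
  redis.multi().del(`feed:${id}`).srem('feeds', id).exec();

const getFeed = (redis, id) => redis.hgetall(`feed:${id}`);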

@humphd humphd added the area: back-end and area: redis labels Nov 20, 2019
@sukhbeersingh
Contributor

I'd like to take this up.

@sukhbeersingh
Contributor

This is potentially a duplicate of #503.

@sukhbeersingh sukhbeersingh removed their assignment Jan 20, 2020
@humphd
Contributor Author

humphd commented Jan 21, 2020

This one is different. In #503 we added a way to loop forever and re-download/parse feeds.

This issue is about keeping the feeds in Redis up-to-date with what's in the wiki. The work in #503 won't do anything to remove a feed from Redis if it's removed from the wiki, for example.

@humphd
Contributor Author

humphd commented Jan 30, 2020

@Silvyre, I was thinking about this today. Our wiki feed list is (currently) our "source of truth," and we want to use it as the official list of what we store in our system. Given this, here are the cases I can think of that we need to deal with:

  1. the wiki feed list has 100 feeds in it, redis has zero feeds in it (first run): we need to put all 100 feeds into redis
  2. the wiki feed list has 100 feeds in it, redis has the same 100 feeds in it (second, third, ... normal run). We don't need to add anything to redis
  3. the wiki feed list has 101 feeds in it, redis has 100 feeds in it (new feed added to wiki): we need to add the new feed to redis
  4. the wiki feed list has 99 feeds in it, redis has 100 feeds in it (feed removed from wiki): we need to remove this feed (and all posts associated with it) from redis
  5. the wiki feed list has 100 feeds in it, redis has 100 feeds in it, but they don't match (one feed was deleted from the wiki, and a new one added): we need to delete the one that was deleted, and add the new one.

The code that currently does a bunch of this is located in src/backend/index.js. When we download the feed list, we parse it into Objects, then try to get an existing feed from Redis that matches each feed's URL (using Feed.byId(), where the id is a hash of the URL).
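
For reference, that id derivation is presumably something like the following (this mirrors the urlToId helper that appears later in this thread; the require paths are illustrative):

// Hypothetical sketch: a feed's id is the hash of its normalized URL
const normalizeUrl = require('normalize-url');
const hash = require('./data/hash'); // Telescope's hash helper

const urlToId = (url) => hash(normalizeUrl(url));
// e.g. Feed.byId(urlToId('https://david.humphrey.com/feed')) looks up that feed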

What I think we also need to do is get the complete set of all feed IDs from Redis, and put them in a JavaScript Set. We then need to create a second Set, and put the IDs of all the parsed feed Objects (i.e., from the wiki) into it. Now we can easily calculate the difference between the two and get the feeds to add and remove. Here's some pseudo code to show you what I mean:

// Create a set of IDs for feeds in the wiki
let wiki = new Set(['one', 'two', 'three'])

// Create a set of IDs for feeds in redis
let redis = new Set(['one', 'two', 'four'])

// Figure out which feeds need to get added to Redis (in wiki, not in redis)
let newFeeds = new Set([...wiki].filter(feed => !redis.has(feed)))
// this will give us: Set { 'three' }

// Figure out which feeds need to get removed from redis (gone from wiki, but still in redis)
let deletedFeeds = new Set([...redis].filter(feed => !wiki.has(feed)))
// this will give us: Set { 'four' }

So now we can iterate newFeeds and add all of these to Redis, and iterate through all of deletedFeeds and remove them (along with any associated keys, posts, etc).
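
Applying that difference might look roughly like this (a sketch: Feed.byId() and feed.delete() are assumed from the existing backend models, wikiFeedsById is an assumed Map from id to the parsed wiki feed object, and feed.save() stands in for whatever creates the feed in Redis):

// Sketch: apply the computed difference (sets of ids) back to Redis
async function applyDiff(newFeeds, deletedFeeds, wikiFeedsById) {
  // Add feeds that are on the wiki but missing from Redis
  await Promise.all(
    [...newFeeds].map((id) => {
      const feed = wikiFeedsById.get(id); // parsed { name, url } from the wiki
      return feed.save(); // assumed Feed model method
    })
  );

  // Remove feeds (and their posts/keys) that were deleted from the wiki
  await Promise.all(
    [...deletedFeeds].map(async (id) => {
      const feed = await Feed.byId(id);
      if (feed) {
        await feed.delete(); // should also clean up associated posts
      }
    })
  );
}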

Writing tests for this will be a bit of work, but we can do that in a follow-up. What do you think of this?

@Silvyre
Contributor

Silvyre commented Feb 10, 2020

I plan to collaborate on this as part of Release 0.7

@Silvyre
Contributor

Silvyre commented Feb 19, 2020

We agreed to delay this until the user structure (including 'null'/'wiki' users and/or 'admin' users) is implemented, e.g. #709

@Silvyre Silvyre added the Blocked label Mar 2, 2020
@Silvyre
Contributor

Silvyre commented Mar 30, 2020

Unblocked by #902 + #898, I believe? @humphd

@Silvyre Silvyre removed the Blocked label Mar 30, 2020
@cindyorangis
Contributor

@Silvyre Did you have your own triage meeting today? How did you remember this issue from November? XD

@Silvyre
Contributor

Silvyre commented Mar 30, 2020

> @Silvyre Did you have your own triage meeting today? How did you remember this issue from November? XD

This particular issue has earned a special place in my heart, as it is the only one I have had to completely scrap my initial work on.

@cindyorangis
Contributor

> > @Silvyre Did you have your own triage meeting today? How did you remember this issue from November? XD
>
> This particular issue has earned a special place in my heart, as it is the only one I have had to completely scrap my initial work on.

Bring it back! Bring it back! I'm actually planning to bring back my own issue from November as well (#290), after @agarcia-caicedo and I finish #896.

@Silvyre
Contributor

Silvyre commented Mar 30, 2020

We'll see if I have time before the code freeze! 🥶

@cindyorangis
Contributor

Same

@Silvyre
Contributor

Silvyre commented Apr 23, 2020

> @Silvyre, I was thinking about this today. Our wiki feed list is (currently) our "source of truth," and we want to use it as the official list of what we store in our system. Given this, here are the cases I can think of that we need to deal with:
>
> 1. the wiki feed list has 100 feeds in it, redis has zero feeds in it (first run): we need to put all 100 feeds into redis
> 2. the wiki feed list has 100 feeds in it, redis has the same 100 feeds in it (second, third, ... normal run). We don't need to add anything to redis
> 3. the wiki feed list has 101 feeds in it, redis has 100 feeds in it (new feed added to wiki): we need to add the new feed to redis

As far as I know, our system currently handles the above three use cases.

> 4. the wiki feed list has 99 feeds in it, redis has 100 feeds in it (feed removed from wiki): we need to remove this feed (and all posts associated with it) from redis

We have yet to implement this use case, I believe.

> 5. the wiki feed list has 100 feeds in it, redis has 100 feeds in it, but they don't match (one feed was deleted from the wiki, and a new one added): we need to delete the one that was deleted, and add the new one.

This use case is a combination of use cases 3 (handled) and 4 (not yet handled). As such, resolving use case 4 should resolve this one, as well.

> The code that currently does a bunch of this is located in src/backend/index.js. When we download the feed list, we parse it into Objects, then try to get an existing feed from Redis that matches each feed's URL (using Feed.byId(), where the id is a hash of the URL).
>
> What I think we also need to do is get the complete set of all feed IDs from Redis, and put them in a JavaScript Set. We then need to create a second Set, and put the IDs of all the parsed feed Objects (i.e., from the wiki) into it. Now we can easily calculate the difference between the two and get the feeds to add and remove. Here's some pseudo code to show you what I mean:
>
> // Create a set of IDs for feeds in the wiki
> let wiki = new Set(['one', 'two', 'three'])
>
> // Create a set of IDs for feeds in redis
> let redis = new Set(['one', 'two', 'four'])
>
> // Figure out which feeds need to get added to Redis (in wiki, not in redis)
> let newFeeds = new Set([...wiki].filter(feed => !redis.has(feed)))
> // this will give us: Set { 'three' }
>
> // Figure out which feeds need to get removed from redis (gone from wiki, but still in redis)
> let deletedFeeds = new Set([...redis].filter(feed => !wiki.has(feed)))
> // this will give us: Set { 'four' }
>
> So now we can iterate newFeeds and add all of these to Redis, and iterate through all of deletedFeeds and remove them (along with any associated keys, posts, etc).
>
> Writing tests for this will be a bit of work, but we can do that in a follow-up. What do you think of this?

I believe we could implement the remaining missing logic (i.e., for use case 4) in src/backend/index.js...

async function processAllFeeds() {
  try {
    // Get an Array of Feed objects from the wiki feed list and Redis
    const [all, wiki] = await Promise.all([Feed.all(), getWikiFeeds()]);
    // Process these feeds into the database and feed queue
    await processFeeds([...all, ...wiki]);
  } catch (err) {
    logger.error({ err }, 'Error queuing feeds');
  }
}

...like so:

// telescope/src/backend/index.js
// ...
const normalizeUrl = require('normalize-url');
const hash = require('./data/hash');
const urlToId = (url) => hash(normalizeUrl(url));
// ...

async function processAllFeeds() {
  try {
    // Get an Array of Feed objects from the wiki feed list and Redis
    let [all, wiki] = await Promise.all([Feed.all(), getWikiFeeds()]);

    // Ensure no duplicate feeds exist. Note: new Set() only removes
    // duplicate object *references*, so dedupe by each feed's URL id instead.
    const dedupe = (feeds) =>
      [...new Map(feeds.map((feed) => [urlToId(feed.url), feed])).values()];
    all = dedupe(all);
    wiki = dedupe(wiki);

    const wikiIds = wiki.map((feed) => urlToId(feed.url));

    // Prevent deleted wiki feeds from being processed, and delete such feeds
    // from Redis. Note: Array.filter() can't take an async callback (the
    // returned Promise is always truthy), so partition synchronously first,
    // then delete.
    const deleted = all.filter((feed) => !wikiIds.includes(feed.id));
    all = all.filter((feed) => wikiIds.includes(feed.id));
    await Promise.all(deleted.map((feed) => feed.delete()));

    // Process remaining feeds into the database and feed queue
    await processFeeds([...all, ...wiki]);
  } catch (err) {
    logger.error({ err }, 'Error queuing feeds');
  }
}

On a related note, there now exists a new problematic use case that we may wish to address within a separate issue:

  • Situation: a Wiki feed is deleted via the My Feeds page.
  • Current behaviour: When the Wiki feeds list is subsequently fetched, the previously-deleted feed will be added back to Telescope.
  • Desired behaviour: Wiki feeds deleted via the My Feeds page should be 'blacklisted', preventing Telescope from re-adding previously-deleted feeds when the Wiki feeds list is fetched.
    • (Previously-deleted feeds should still be able to be re-added via the My Feeds page. In this edge case, its 'blacklisted' status should be lifted.)
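
One possible shape for that blacklist, sketched against an ioredis-style client (the key name and helper names are hypothetical):

// Hypothetical sketch of a feed blacklist in Redis
const BLACKLIST_KEY = 'blacklisted_feeds'; // hypothetical key name

// Called when a wiki feed is deleted via the My Feeds page
const blacklistFeed = (redis, id) => redis.sadd(BLACKLIST_KEY, id);

// Called when a user re-adds the feed, lifting its blacklisted status
const unblacklistFeed = (redis, id) => redis.srem(BLACKLIST_KEY, id);

// Checked before re-adding feeds from the fetched wiki feeds list
const isBlacklisted = async (redis, id) =>
  (await redis.sismember(BLACKLIST_KEY, id)) === 1;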

@humphd
Contributor Author

humphd commented Apr 23, 2020

Here's another way to look at this problem:

What if we got rid of the wiki feed list?

All we need is a way for an Admin to be able to bulk insert a bunch of feeds once, and then this problem goes away. Maybe we should morph this bug into:

"Allow an Admin to import an OPML or Planet Feed List formatted file, and add all the associated feeds."

I don't see any value in keeping the feed list now that we have Add Feed via authenticated My Feeds.
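
For what it's worth, the OPML half of that import could be fairly small. A sketch, assuming the xml2js package (the helper name and feed shape are illustrative):

// Hypothetical sketch: extract feed names/URLs from an OPML file
const { parseStringPromise } = require('xml2js');

async function parseOpml(xml) {
  const doc = await parseStringPromise(xml);
  const feeds = [];
  // OPML outlines can nest, so walk them recursively
  const walk = (outlines = []) =>
    outlines.forEach((outline) => {
      const attrs = outline.$ || {};
      if (attrs.xmlUrl) {
        feeds.push({ name: attrs.title || attrs.text, url: attrs.xmlUrl });
      }
      walk(outline.outline);
    });
  walk(doc.opml.body[0].outline);
  return feeds;
}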

@humphd
Contributor Author

humphd commented Apr 23, 2020

I filed #1058, #1059 to move us away from needing to solve this.

@humphd
Contributor Author

humphd commented Feb 23, 2022

This problem is going away now that feeds are moving into Supabase. Closing.

@humphd humphd closed this as completed Feb 23, 2022