Consider the time period between card creation date and first review date in S0 calculation #713

Open
user1823 opened this issue Dec 4, 2024 · 13 comments
Labels: enhancement (New feature or request), research (spaced repetition algorithm research)

Comments

@user1823
Collaborator

user1823 commented Dec 4, 2024

Consider three cards:

  • Card 1: Created and rated Good today
  • Card 2: Created 1 week ago and rated Good today
  • Card 3: Created 1 month ago and rated Good today

Currently, FSRS assigns the same S0 to all of these cards. But, if we can find a way to use the time period between card creation date and first review date in calculating S0, we can assign a higher S0 to Card 3 than Card 2 and Card 1.

Obviously, this feature will be useful only for people who make their own cards and not for those who use pre-made decks. But still...

@user1823 added the enhancement and research labels Dec 4, 2024
@Expertium
Collaborator

Expertium commented Dec 4, 2024

I don't think that's possible

  1. As far as I know, in Anki we can't access the card creation date during optimization anyway
  2. Even if we can do 1, we can't distinguish between pre-made cards and cards made by the user himself
  3. I can't think of any meaningful way to incorporate this into S0

@brishtibheja

brishtibheja commented Dec 4, 2024

After spending this year in the forums, I think most users make their own cards rather than get shared decks. Could be useful for a lot of people.


Just throwing an idea out there: might the note modification date be better than the card creation date? You can modify a note to change the flashcard's formulation, or you may cloze some additional parts of the note to create new cards.

@user1823
Collaborator Author

user1823 commented Dec 4, 2024

> As far as I know, in Anki we can't access the card creation date during optimization anyway

The card creation date can be deduced from the cid, which is obviously available.

> we can't distinguish between pre-made cards and cards made by the user himself

In general, the time difference would be much larger for pre-made cards. Just for research purposes, one can filter out cards whose first review was done at least 1 year after card creation date. Once we get some idea on how to use the info, we can develop a better filter.

> I can't think of any meaningful way to incorporate this into S0

One crude idea:
Let (first revlog id - cid)/(86400 * 1000) = x

For each first rating, group cards by x.
In each group, calculate S0 just like we do it now.

For each first rating, find a curve that fits the S0 values for each value of x.
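The grouping step above could be sketched roughly as follows. This is only an illustration, not optimizer code: the function names, the input tuple layout, and the one-year cutoff parameter are hypothetical, and it assumes that both the cid and the first revlog id are Unix timestamps in milliseconds.

```python
from collections import defaultdict

MS_PER_DAY = 86400 * 1000


def days_before_first_review(cid, first_revlog_id):
    # x = (first revlog id - cid) / (86400 * 1000), truncated to whole days.
    # Assumes both ids are Unix timestamps in milliseconds.
    return (first_revlog_id - cid) // MS_PER_DAY


def group_by_rating_and_delay(first_reviews, max_delay_days=365):
    """Group card ids by (first rating, x).

    first_reviews: iterable of (cid, first_revlog_id, first_rating) tuples
    (a hypothetical layout chosen for this sketch).
    Cards first reviewed max_delay_days or more after creation are skipped,
    as a crude filter for pre-made cards.
    """
    groups = defaultdict(list)
    for cid, first_revlog_id, rating in first_reviews:
        x = days_before_first_review(cid, first_revlog_id)
        if x >= max_delay_days:
            continue
        groups[(rating, x)].append(cid)
    return dict(groups)
```

S0 would then be estimated separately for each `(rating, x)` group, which is the part this sketch leaves out.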

@Expertium
Collaborator

Oh, so cid is a unix timestamp? Interesting, I didn't know that.

I don't understand your suggestion, though. Wouldn't that result in having to calculate tons of S0's for every possible x? x is the number of days between the card creation date and the first review, right?

@user1823
Collaborator Author

user1823 commented Dec 4, 2024

> Wouldn't that result in having to calculate tons of S0's for every possible x?

Yes, but after that, we can try to find a curve that fits most of them.

> Oh, so cid is a unix timestamp? Interesting, I didn't know that.

In Anki, almost every type of id is a unix timestamp.
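As a small illustration, an id can be turned into a readable date like this. The helper name is made up for this example, and it assumes the id is a Unix timestamp in milliseconds (which is why the formula for x divides by 86400 * 1000).

```python
from datetime import datetime, timezone


def id_to_datetime(anki_id):
    # Interpret the id as milliseconds since the Unix epoch (assumption).
    return datetime.fromtimestamp(anki_id / 1000, tz=timezone.utc)


print(id_to_datetime(1733270400000).isoformat())
# 2024-12-04T00:00:00+00:00
```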

@Expertium
Collaborator

> Yes, but after that, we can try to find a curve that fits most of them.

Can you write some Python code or pseudocode to make it easier to understand exactly what you have in mind?

@user1823
Collaborator Author

user1823 commented Dec 4, 2024

I still don't know what the data (S0 for each x) would look like. Once the data is available, we can try to find a function that relates S0 to x, so that we won't have to store the individual S0 values in the parameters. After that, we will only need to store the 1-2 parameters required by that function.
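To make the "1-2 parameters" idea concrete, here is one possible shape such a fit could take. The power-law form S0 = a * (1 + x)^b is purely an assumption picked for illustration, since the real relation between S0 and x is not known yet; the function names are hypothetical too.

```python
import math


def fit_power_law(xs, s0s):
    """Fit S0 = a * (1 + x) ** b by least squares in log-log space.

    xs: days between card creation and first review, one per group.
    s0s: the S0 value estimated for each group (all must be positive).
    Returns the two parameters (a, b).
    """
    us = [math.log1p(x) for x in xs]
    vs = [math.log(s) for s in s0s]
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    var_u = sum((u - mean_u) ** 2 for u in us)
    cov_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    b = cov_uv / var_u
    a = math.exp(mean_v - b * mean_u)
    return a, b


def predict_s0(x, a, b):
    # Evaluate the fitted curve for a given delay x in days.
    return a * (1 + x) ** b
```

With something like this, only `a` and `b` would need to be stored per first rating instead of one S0 per value of x. Whether a power law is a sensible choice can only be judged once the per-group S0 values exist.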

@Expertium
Collaborator

Alright, but I still don't think that this is better than just calculating S0 the way it's done right now.

@user1823
Collaborator Author

Just to record my thoughts:

> I still don't think that this is better than just calculating S0 the way it's done right now.

When the user makes a card themselves and reviews it in Anki, they are not using Anki to learn the information. Rather, they are using Anki to retain what they have learnt from other sources.

So, if the first rating is Good, it means that they could successfully recall what they learnt earlier outside Anki. So, realistically speaking, we are calculating the next stability and not the initial stability.

Clearly, the next stability depends upon the previous stability (the one before the first rating). Unfortunately, we don't have a good way to measure the previous one. But, at least, we can say that the cards rated Good after 30 days will, on average, have a greater stability than the cards rated Good after 7 days.

Currently, FSRS treats these cards in the same way. Calculating the S0 separately after grouping the cards by number of days between card creation and first review will likely improve the calculation of S0.


A quick and dirty way to test this idea:

  • Group the cards by number of days between card creation and first review (x)
  • Use the current formula to calculate S0 for each group and prepare a matrix containing S0 for each rating and x.
  • Use this matrix to get the S0 for calculating the subsequent memory states.

Advantage over the previously proposed method:

  • It can be done quickly (for testing); we don't need to spend time thinking about the relation between x and S0.

Disadvantages compared to the previously proposed method:

  • Can't be used in Anki (the matrix is too large to be stored in Anki)
  • Some groups will have a low number of cards, which makes this approach more susceptible to noise. Using a mathematical relation (as previously proposed) would solve this issue.
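A rough sketch of the lookup-matrix idea, for testing only. Everything here is hypothetical: `estimate_s0` stands in for whatever the current S0 calculation is (it is passed in as a callable rather than implemented), and the `min_cards` threshold with a fallback to the ordinary per-rating S0 is an extra assumption added to limit noise from small groups.

```python
def build_s0_table(groups, estimate_s0, min_cards=50):
    """Build a {(rating, x): S0} table.

    groups: {(rating, x): list of per-card data} as produced by grouping
    cards by first rating and by days between creation and first review.
    estimate_s0: placeholder callable for the existing S0 calculation.
    Groups with fewer than min_cards cards are left out of the table.
    """
    table = {}
    for key, cards in groups.items():
        if len(cards) >= min_cards:
            table[key] = estimate_s0(cards)
    return table


def lookup_s0(table, rating, x, default_s0):
    # Fall back to the ordinary per-rating S0 when the group is missing.
    return table.get((rating, x), default_s0[rating])
```

The table returned by `build_s0_table` plays the role of the matrix, and `lookup_s0` is what would be called when computing the first memory state of a card.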

@DerIshmaelite

> When the user makes a card themselves and reviews it in Anki, they are not using Anki to learn the information. Rather, they are using Anki to retain what they have learnt from other sources.

That is not true. There are people (I for one) who use Anki as the starting point for acquiring information.

@user1823
Collaborator Author

user1823 commented Dec 14, 2024

If you are creating your own cards, you must first see that information outside Anki, right? It's ok if you don't know it that well but you still know it at the time you are creating the card.

Even if my assumption about your use case is wrong, there are at least some (or maybe even many) people who use Anki to retain information learnt outside Anki.

I believe that my suggestion would significantly improve the accuracy of FSRS for them. Once we have it working, we can discuss how to identify the users who use Anki in this way.

@Expertium
Collaborator

Honestly, I highly doubt it will be worth it, and it will probably be very difficult to figure out how to calculate the new S0. So unless Jarrett wants to work on it, I don't think this will be done, since I don't see the point.

@user1823
Collaborator Author

Since there was no response from @L-M-Sherlock, I decided to do some testing on my own collection. The results are interesting. @Expertium

I used the following code in the debug console to find cards whose first review was less than 10 days after the card was created. (Copy the debug console output and paste it into the browser window to search.)

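from aqt import mw
from aqt.utils import showText

# Select cards whose first review was less than 10 days after creation.
# id (revlog) and cid are both Unix timestamps in milliseconds.
sql = """
SELECT
    cid
FROM
    revlog AS r1
WHERE
    ROUND((id - cid) / (86400 * 1000)) < 10
AND
    id = (SELECT MIN(id) FROM revlog AS r2 WHERE r1.cid = r2.cid)
"""

# Build a browser search string of the form "cid:1,2,3".
cids = (str(cid) for cid in mw.col.db.list(sql))
search_query = f'"cid:{",".join(cids)}"'
showText(search_query, copyBtn=True)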

Then I moved these cards into a new deck, gave it a separate preset, and optimized both decks (presets).


Observations

Parameters of complete deck (before any of the above operations):
1.4843, 5.2091, 34.2467, 85.0440, 7.2761, 0.9915, 3.9806, 0.0010, 1.9269, 0.3428, 1.4228, 1.7287, 0.0304, 0.4317, 2.3433, 0.0171, 6.0000, 0.9510, 0.6386

Evaluation: Log loss: 0.2098, RMSE(bins): 1.32%.

Deck 1 (x < 10d):

No. of reviews: 67,467
Parameters:
0.9310, 3.0256, 17.7261, 35.8606, 7.3785, 0.9810, 3.9037, 0.0010, 1.7643, 0.2812, 1.2616, 1.8370, 0.1155, 0.4837, 2.0510, 0.0000, 6.0000, 0.9777, 0.6662

Evaluation with new parameters: Log loss: 0.2221, RMSE(bins): 1.69%.
Evaluation with parameters of complete deck: Log loss: 0.2227, RMSE(bins): 1.75%.

Deck 2 (x ≥ 10d):

No. of reviews: 79,443
Parameters:
2.4860, 8.1461, 48.3187, 73.8801, 7.1577, 0.9967, 3.9325, 0.0010, 1.9119, 0.3130, 1.4081, 2.0449, 0.0167, 0.3757, 2.3958, 0.0000, 6.0000, 0.7905, 0.6943

Evaluation with new parameters: Log loss: 0.1981, RMSE(bins): 1.34%.
Evaluation with parameters of complete deck: Log loss: 0.1986, RMSE(bins): 1.47%.

Conclusion

  • There is a significant difference in the initial stabilities between the two subdecks.
  • Splitting up the deck improved the RMSE of both subdecks (by about 3.5% and 9%, respectively, relative to the RMSE obtained with the parameters of the complete deck).
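For reference, the relative RMSE improvements can be recomputed from the evaluation figures listed above. The helper name is just for this example; the inputs are the RMSE(bins) values with the complete-deck parameters versus the per-deck parameters.

```python
def relative_improvement(before, after):
    # Percentage reduction of `after` relative to `before`.
    return (before - after) / before * 100


deck1 = relative_improvement(1.75, 1.69)  # Deck 1 (x < 10d)
deck2 = relative_improvement(1.47, 1.34)  # Deck 2 (x >= 10d)

print(f"Deck 1: {deck1:.1f}%")
print(f"Deck 2: {deck2:.1f}%")
# Deck 1: 3.4%
# Deck 2: 8.8%
```

These match the rounded percentages quoted in the conclusion.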


4 participants