Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random sort option for front page #2363

Merged
merged 28 commits into from
May 30, 2023
Merged

Conversation

jecorn
Copy link
Contributor

@jecorn jecorn commented Apr 25, 2023

What type of PR is this?

  • feature

What this PR does / why we need it:

Often, recipe managers (and paper-and-ink cookbooks) are used as inspiration for what to cook. The various sort options on the landing page (nightly) are still relatively static, and offer views that do not change much over time. This is good when (for example) going back to something that one is about to make and has recently viewed. But not as useful when planning new things to make from a recipe collection. The current "Random" button is a step towards inspiration, but is awkward when scanning through multiple recipes looking for something to make.

This PR adds a new sorting option called "Random". Choosing this option pages (infinite scroll) through the database just like all the other sort options, but recipes are randomly ordered. Choosing Ascending/Descending order has little meaning here, but choosing these different orders re-randomizes the page. Navigating away from Random order (e.g. to Alphabetical) and then back also re-randomizes the page. Category/tag/search/etc filtering work as normal in Random view. The ordering is just random.

randomsort

Implementing this involves just two tiny changes: a hook for the random sort type in repository_generic, and adding the random sort order to the index.vue.

Which issue(s) this PR fixes:

Addresses #2307

Testing

All automated tests pass. I did extensive interactive testing via the frontend and really enjoyed scrolling through various random recipe sets.

Release Notes

@michael-genson
Copy link
Collaborator

Huh, TIL about sqlalchemy func.random(). Love this

Can you add a test for this? Just a test for the "random" keyword and that it works (calling it more than once results in different items)

@jecorn
Copy link
Contributor Author

jecorn commented Apr 25, 2023

A test is a great idea. But I'm not sure how to write a test for a frontend experience. Or you mean just testing the programmatic calls to the repo backend?

PS - the sqlalchemy func.X namespace is pretty cool. All kinds of cool things in there! And so far as I can tell, sqlalchemy wraps them up neatly so that it's agnostic to database implementation (e.g. func.random() will call RANDOM() on sqlite/postgres and RAND() on SQL Server). https://docs.sqlalchemy.org/en/20/core/functions.html#selected-known-functions

@michael-genson
Copy link
Collaborator

Or you mean just testing the programmatic calls to the repo backend?

Yup just the backend route, no worries on the frontend, I don't think we have any tests there besides manual confirmation that things look right

@jecorn
Copy link
Contributor Author

jecorn commented Apr 25, 2023

OK, I just added several randomness tests with and without category, tag, tool, and search filters. Note that in my token and fuzzy search PR, made some large changes to the search test. The randomness search test here should be compatible with the search PR, so hopefully merging is smooth once the search PR is accepted.

@jecorn
Copy link
Contributor Author

jecorn commented Apr 25, 2023

By the way, randomness is weird with paging. This is a known thing and there are a lot of discussions about it on stack exchange. Basically, sql db’s all call random per page. Which means that each page will not have duplicates, but successive pages might (and with lots of calls they definitely will). This is very hard to get around. But since mealie has infinite scroll rather than numbered paging, the user experience is actually quite nice.

@michael-genson
Copy link
Collaborator

Ah interesting. While we do have infinite scroll, it actually still paginates (it just doesn't render the new data until you get there, and then it fetches the next page when you get near the end).

In that case, we may want to consider increasing the page size on the frontend when using the random sort, or disabling pagination (and just stop fetching data after the first page).

Or just leave it as-is, not sure how much it matters (if you don't like that it repeats, stop scrolling 🙂). Matter of opinion really. Otherwise LGTM

@jecorn
Copy link
Contributor Author

jecorn commented Apr 26, 2023

Mealie’s pagination is why I mentioned it: full disclosure for feature review, since it’s hard to notice. Personally, I like it as-is for a browsing-for-something-to-make experience. Unless you have a small recipe db (e.g. just enough for a few pages), then you may not even notice the repeats. The main question is whether you & @hay-kot are happy with the effect.

@hay-kot
Copy link
Collaborator

hay-kot commented May 5, 2023

Mealie’s pagination is why I mentioned it: full disclosure for feature review, since it’s hard to notice. Personally, I like it as-is for a browsing-for-something-to-make experience. Unless you have a small recipe db (e.g. just enough for a few pages), then you may not even notice the repeats. The main question is whether you & @hay-kot are happy with the effect.

I'm 100% sure this is going to lead to lots of issues in the bug tracker and discord. The only way this feature would get merged is if we can use some sort of deterministic random ordering, which I believe Postgres supports, but I don't know about SQLite

https://khanduri.github.io/learn/2016/02/26/fetch-rows-in-random-order-with-seed-support

@hay-kot hay-kot marked this pull request as draft May 5, 2023 22:20
@jecorn
Copy link
Contributor Author

jecorn commented May 6, 2023

I thought of a possible db-independent solution last week, so I’ll see what I can get to work.

@jecorn
Copy link
Contributor Author

jecorn commented May 8, 2023

@hay-kot The latest commit in this PR has a working sort backend, but the time-of-search timestamp I'm trying to set in the frontend for seeding never reaches the backend.

There are a few variables I set to help me debug: a default for timestamps instead of None, which helps see that they are set once and then never changed by the frontend, a frontend timestamp that makes it into the address bar while a timestamp destined for the passedQuery never reaches the backend. Plus some logger warnings for debugging (I wasn't getting debug log output).

Let me know if anything is non-obvious. But as @fleshgolem said in Discord, it's not really difficult. All the more mysterious to me why the timestamp disappears into the vastness of space.

Copy link
Collaborator

@hay-kot hay-kot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jecorn
Copy link
Contributor Author

jecorn commented May 10, 2023

Pretty sure you need the pass the timestamp to this component and use it here.

https://github.com/hay-kot/mealie/blob/b333ba2969a7ecda1c3997aba197020b3c2bfae2/frontend/components/Domain/Recipe/RecipeCardSection.vue#L223-L235

Yes!!! I was already passing the timestamp in the props.query RecipeSearchQuery, but I didn't notice that it was then unpacked and used in frontend/composables/recipes/use_recipes.ts:api.recipes.getAll()! Adding one line separate out the timestamp in use_recipes.ts:getAll() did it! Random ordering is now stable across pagination but re-randomizes every page load (plus every time the user re-chooses the random ordering). Thanks so much for looking at this.

@jecorn jecorn marked this pull request as ready for review May 10, 2023 07:36
@michael-genson
Copy link
Collaborator

For the seed used to generate the random order, would it be better for the backend to take any arbitrary string instead of a timestamp that is then converted into a string? I'm not sure how I feel about randomness being coupled to time when it doesn't have to be. Then we can call the backend param something like "random_seed" which is just unused if using random sort.

For the frontend, it can just pass the current datetime as a string and use that as the seed, so the behavior/logic is the same.

I like the method used for generating randomness, assuming it's performant :)

@jecorn
Copy link
Contributor Author

jecorn commented May 10, 2023

This actually reminds me that I forgot to remove the timestamp string from the address bar in index.vue. Doesn't really make sense to clutter things up there. The timestamp string in the frontend was just part of the address bar.

The backend random sorting uses any arbitrary float (in this case the timestamp), since it's just handing off to the python random.seed method. Random sorting resets the pagination time/seed to be the initial search time/seed, so that the random order is stable across pages (the paging method can only take a PaginationQuery and not a RecipeSearchQuery). The variables in the backend RecipeSearchQuery and PaginationQuery are named "timestamp", since I thought knowing the time of search or pagination could be useful at some point in the future. But they also could be any arbitrary float. And one can programmatically use, re-use, or change the timestamps, since they are just class variables. I'm using python's datetime module to provide the timestamp, which is accurate down to the microsecond and so has a very low chance of seed collision.

So I think this is already pretty close to what you suggested (just using float instead of string).

@jecorn
Copy link
Contributor Author

jecorn commented May 11, 2023

PS - at least in my testing, this case-based method is pretty fast. I didn't notice any delays in search or paging with thousands of recipes in the database. Probably because grabbing just the id for ordering means the database can make use of indices.

@jecorn
Copy link
Contributor Author

jecorn commented May 11, 2023

Sorry for the back-and-forth on the last 3 edge-case handling commits. I'm on a crazy timezone and it's starting to catch up with me. I forgot that random seeding can take a NoneType, so that doesn't need to be caught in call cases. It's better to instead check the random-ordering specific case and only ensure population and propagation of the timestamp then.

Copy link
Collaborator

@hay-kot hay-kot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, clever trick by randomly shuffling the ID's before querying all recipes.

Do you have any benchmarks on performance with random vs a regular search? That would be helpful.

@@ -31,6 +32,7 @@ class PaginationQuery(MealieModel):
order_by: str = "created_at"
order_direction: OrderDirection = OrderDirection.desc
query_filter: str | None = None
timestamp: float | None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using a string and defaulting to empty would help avoid some possible null errors. Maybe be worth renaming this to randomSeed for clarity as well

Suggested change
timestamp: float | None
timestamp: string = ""

Copy link
Contributor Author

@jecorn jecorn May 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking here is that timestamps can be used as seeds, but seeds cannot be used as timestamps. So months down the road, if someone wants to do something else with it (time between searches?), they could easily see this variable name and know that it contains the timestamp. Would that make sense?

For turning it into a string, that would make things more complicated and maybe inteoduce other bugs. random.seed() takes NoneType, so if an empty string somehow made it past validation that would be an Exception. But if the current implementation somehow sneaks past unset, then random.seed() still works (though it would reset the seed every page). And I think of a time stamp as an int/float but a date/time as a string. So to me, it would make sense to leave it this a float/None. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking here is that timestamps can be used as seeds, but seeds cannot be used as timestamps. So months down the road, if someone wants to do something else with it (time between searches?), they could easily see this variable name and know that it contains the timestamp. Would that make sense?

But there's no validation that is in a timestamp, so it's really just any float to be used as a seed.

so if an empty string somehow made it past validation that would be an Exception.

I don't think that's a reasons to adjust programming. You can easily test that empty strings are validated correctly.

But if the current implementation somehow sneaks past unset, then random.seed() still works (though it would reset the seed every page). And I think of a time stamp as an int/float but a date/time as a string.

If the users input is invalid, like in the case of order_by = "random" and timestamp = "" then we should return the user an error so they can use the API correctly. We shouldn't accept malformed data and try to do something with it. That always leads to confusion on the consumer side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True that there’s no validation it’s a time stamp if you provide it via the API. Which is why I was setting it to now() if it wasn’t provided.

I don’t really feel strongly about the future-proofing I mentioned, so I’m happy to just rename the variable, pass as a string to random, and return an invalid error if random is set but randomSeed is not.

@@ -23,6 +23,7 @@ class RecipeSearchQuery(MealieModel):
require_all_tools: bool = False
require_all_foods: bool = False
search: str | None
timestamp: float | None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using a string and defaulting to empty would help avoid some possible null errors. Maybe be worth renaming this to randomSeed for clarity as well

Suggested change
timestamp: float | None
timestamp: string = ""

Comment on lines 257 to 258
if not search_query.timestamp:
search_query.timestamp = datetime.now().timestamp()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of supplying our own here as a safeguard, we should instead use a Pydantic validation step to ensure that if the order_by is 'random' they provide a timestamp, if the client doesn't provide a timestamp, we should respond with an error code.

https://docs.pydantic.dev/latest/usage/validators/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about pydantic validators, and they look very cool. But maybe overkill here? Several other methods in recipe_crud_routes just use try/except blocks to raise an HTTPException if input is invalid. Maybe that would be simpler, since it's just testing whether the timestamp exists?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be doing as much validation on the edges of the application as possible. if the client provides the wrong data, we shouldn't let the request proceed and return a useful error message. That's why we're using Pydantic, for that client data validation layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hay-kot I finally had a moment to read up on validators. I saw that one use case is to set default values at validation time, with a specific example in the pydantic docs given to set a timestamp to now. So I just want to double-check your preference.

Coming in via the front end, the random_seed will get set by index.vue. But coming in from the API, random_seed is a settable parameter. You previously suggested raising a ValueError if order_by="random" and random_seed = "". But we could instead set random_seed = datetime.now(), and users would be buffered from forgetting to set a seed.

Do you prefer explicit (users must always set a seed or it's an error), or implicit (failing to set a seed uses datetime.now)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hay-kot I finally had a moment to read up on validators. I saw that one use case is to set default values at validation time, with a specific example in the pydantic docs given to set a timestamp to now. So I just want to double-check your preference.

Coming in via the front end, the random_seed will get set by index.vue. But coming in from the API, random_seed is a settable parameter. You previously suggested raising a ValueError if order_by="random" and random_seed = "". But we could instead set random_seed = datetime.now(), and users would be buffered from forgetting to set a seed.

Do you prefer explicit (users must always set a seed or it's an error), or implicit (failing to set a seed uses datetime.now)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, since pagination is completely broken without a seed, as an end user, if I don't supply a seed, I would want to see an error. Otherwise, if I paginate without supplying a seed, I'm getting random pages with repeats and I think the API is broken

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe be worth renaming this to randomSeed for clarity as well

I also like this, since that way it's clear that you can generate any value, not necessarily a timestamp

Copy link
Contributor Author

@jecorn jecorn May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, since this seems like it would be the only pydantic validation inside recipe_crud_routes, I'm having trouble figuring out examples to see how to get this ValidationError back to the user as an HTTP error code. Right now my ValidationErrors get stuck on the command line.

The other error handlings in recipe_crud_routes is a simple if/else or try/except block that raises HTTPException when there's a problem. For example, in the same recipe_crud_routes where I would be validating the random_seed:

if search_query.cookbook is None: raise HTTPException(status_code=404, detail="cookbook not found")

Would it be fine for you to just follow the precedent in recipe_crud_routes and do the validation in a simpler way without resorting to pydantic? Or show me an example of how one can get this ValidationError back to the user and I’ll set it up that way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pydantic validation errors are automatically propagated up with a clear error message.

https://fastapi.tiangolo.com/tutorial/body/#results

if search_query.cookbook is None: raise HTTPException(status_code=404, detail="cookbook not found")

TBH we shouldn't be throwing HTTP there anyways. Also looking at that code I'm pretty sure it's not even doing what is was intended to do. It should be checking if the cookbook_data is None and not search_query.cookbook which is already guarded right above it.


This or more or less what you should need

    @validator("random_seed", always=True, pre=True)
    def validate_random_seed(random_seed: str, values):  # type: ignore
        if values.get("order_by") == "random" and not random_seed:
            raise ValueError("Random Seed Required for Random Order")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I have basically that already and it’s just spitting the ValidationError to the command line and an Internal Server Error back to the API. Hence my confusion about propagation. But good to know this is supposed to work automagically.

I probably have a stupid error somewhere, but my daughter just woke up so I’ll troubleshoot later.

@jecorn
Copy link
Contributor Author

jecorn commented May 14, 2023

Performance is great on my 6,000 recipe database. I don't notice any difference between sorting by Category or Random. This is with a less-than-fast backend (2nd gen intel Core CPU). Grabbing just the recipe ids takes advantage of the index, which really speeds things up. And after that, it's just a python list shuffle. Which might be slow for a million-recipe database, but not for thousands.

@jecorn
Copy link
Contributor Author

jecorn commented May 16, 2023

@hay-kot Here's the state of random sorting. Stable, randomly sorted pagination works coming in from both the frontend and through the API. But the validator to make sure that the paginationSeed is set lands on the command line instead of as a 422 response. There's a test demonstrating that in test_recipe_crud.py.

@hay-kot
Copy link
Collaborator

hay-kot commented May 16, 2023

@jecorn - Should be fixed now. It had to do with, what I would classify as a bug with FastAPI not applying the correct logic to query parameters that are Pydantic models. Though the issue was moved to discussions and remains unaddressed so I doubt it will get fixed upstream 🤷

Discussed

Workaround

@jecorn
Copy link
Contributor Author

jecorn commented May 16, 2023

@hay-kot super cool! Many thanks. What did you think about the fastAPI docs comments on not returning a response on purpose?

PR ready to merge? Or would you like to see any more changes?

@hay-kot
Copy link
Collaborator

hay-kot commented May 16, 2023

super cool! Many thanks. What did you think about the fastAPI docs comments on not returning a response on purpose?

It doesn't apply in this case because the model we're validating is directly from user provided content, so it should validate and behave just like the request body, but it doesn't - so we have to add this weird work-around. :(

I agree with the documentation, if it's a validation error on the response type, that's 100% an internal issue that shouldn't be presented to the user.

PR ready to merge? Or would you like to see any more changes?

Will give it another deeper look over this weekend, but I believe it's good to go.

@hay-kot hay-kot merged commit e1d3a24 into mealie-recipes:mealie-next May 30, 2023
@jecorn jecorn deleted the randomsort branch September 26, 2023 10:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants