Index path with original path name #5785
We already have this information from the model
This is going to break all projects, and they will need to trigger a build to be fixed... Guessing we don't want that. So, my thoughts: I think we should serve this file from our servers so that, if we change something in the search API, we have control over how we represent it.
Yea, I think we should start indexing the full filename in a new field in ES, then add it as a new endpoint in the docsearch API. Our new code can use this, and the old stuff will continue to work.
Let me know if you want me to upload the minified files here.
if 'current_page_name' in data:
    path = data['current_page_name']
else:
    log.info('Unable to index file due to no name %s', filename)
    return None
If we return None from this function, get_processed_json is going to return None too, and we always expect a dict from here. And get_processed_json has a default: https://github.com/stsewd/readthedocs.org/blob/a2e0e3f5e442b0072a4b7cbac9efadbe2a41c224/readthedocs/projects/models.py#L1254-L1261
Could you edit the description of the PR to include a summary of the problem and the solution proposed by the PR? It's hard for me to review without this context.
@humitos updated ^
Hold the review for a moment, I think there is a better way to put the new field.
Done!
This is a much simpler change, and lets us slowly roll out the change across all the places we use search. 👍
-    log.info('Unable to read file: %s', filename)
-    return None
+    log.info('Unable to read file: %s', fjson_filename)
+    raise
Won't this cause all indexing to fail if a single file is missing?
Ah, I see we're catching it at a higher level, 👍
We were returning None, but we expect to always have a dict. This is caught by https://github.com/stsewd/readthedocs.org/blob/96a85fa8af3cac8b139bdf99598d119eae0e0163/readthedocs/projects/models.py#L1244-L1260, which returns a default dict.
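For readers following the thread, here is a minimal sketch of the pattern being discussed. All names are hypothetical (`process_file`, `get_processed_json_safe`, and the keys of the default dict are illustrative, not the actual Read the Docs code). The low-level function logs and re-raises when a file cannot be read, and the caller catches the exception and falls back to a default dict, so one missing file does not abort indexing and callers always receive a dict.

```python
import json
import logging

log = logging.getLogger(__name__)


def process_file(fjson_filename):
    """Read and parse one .fjson file; log and re-raise if that fails."""
    try:
        with open(fjson_filename, encoding='utf-8') as f:
            return json.load(f)
    except (OSError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        log.info('Unable to read file: %s', fjson_filename)
        raise


def get_processed_json_safe(fjson_filename, path):
    """Always return a dict, using a default when processing fails."""
    try:
        return process_file(fjson_filename)
    except Exception:
        log.warning('Could not process file: %s', fjson_filename)
        # Hypothetical default shape; the real default lives in models.py.
        return {'path': path, 'title': '', 'sections': []}
```

The design choice is that the function doing the I/O never returns None, so the "always a dict" guarantee is enforced in exactly one place, the caller.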
Added to the deploy card that we need to do a full re-index to see this taking effect.
When we index to ES from the ImportedFiles models, we don't save the path from the model, but instead the page name from the fjson file. The page name doesn't include the extension; we used to rely on the doctype in the resolver, but that got removed.
We talked about reindexing everything with path being the realpath from the model, but that would require more changes, and we are not sure that's the solution (removing page name = path). So, for now I'm just adding a new field with the original path value.
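A rough sketch of the idea, under stated assumptions: the function name `build_page_document` and the field name `full_path` are hypothetical and only illustrate keeping the existing page-name value while adding the original path next to it.

```python
def build_page_document(fjson_data, model_path):
    """Build the dict to index for one page.

    fjson_data is the parsed .fjson content; model_path is the path
    stored on the model, which includes the file extension.
    """
    return {
        # Existing field: page name from the fjson file, no extension.
        'path': fjson_data['current_page_name'],
        # New field (hypothetical name): original path, extension kept.
        'full_path': model_path,
        'title': fjson_data.get('title', ''),
    }


doc = build_page_document(
    {'current_page_name': 'guides/install', 'title': 'Install'},
    'guides/install.html',
)
print(doc['path'])       # guides/install
print(doc['full_path'])  # guides/install.html
```

Because the existing field is left untouched, code that already reads it keeps working, and new code can opt in to the extra field once a full re-index has run.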
Fix #5397