This repository has been archived by the owner on Feb 9, 2022. It is now read-only.

Update mdanalysis 01 #4751

Merged 3 commits on Oct 20, 2021


orbeckst
Contributor

Pull request motivation(s)

Focus search results on relevant content

  • remove code examples for time being
  • remove duplicates in the dynamic blog (just use the archived content) and in the docs (always use "stable"; exclude versioned or dev content)

What is the current behaviour?

Some content is duplicated or appears in different versions of the docs. Without custom faceting, this is too confusing.

What is the expected behaviour?

Show relevant/recent content.

NB: Do you want to request a feature or report a bug?

update

NB2: Any other feedback / questions ?

part of work on #4699 — there's still content that does not get indexed and I cannot figure out the correct text selectors. Most of the pages below should have more than zero records. We do want to index our index.html pages.

> DocSearch: https://www.mdanalysis.org 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/ 0 records)
> DocSearch: https://www.mdanalysis.org/GridDataFormats/gridData/basic.html 0 records)
> DocSearch: https://www.mdanalysis.org/GridDataFormats/gridData/core.html 0 records)
> DocSearch: https://www.mdanalysis.org/GridDataFormats/gridData/formats.html 0 records)
> DocSearch: https://www.mdanalysis.org/GridDataFormats/gridData/formats/CCP4.html 0 records)
> DocSearch: https://www.mdanalysis.org/GridDataFormats/gridData/formats/OpenDX.html 0 records)
> DocSearch: https://www.mdanalysis.org/GridDataFormats/gridData/formats/gOpenMol.html 0 records)
> DocSearch: https://www.mdanalysis.org/GridDataFormats/gridData/overview.html 0 records)
> DocSearch: https://www.mdanalysis.org/GridDataFormats/index.html 0 records)
> DocSearch: https://www.mdanalysis.org/GridDataFormats/installation.html 0 records)
> DocSearch: https://www.mdanalysis.org/GridDataFormats/search.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/CG_fiber.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/PEG_1chain.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/adk_equilibrium.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/adk_transitions.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/contributing.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/credits.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/helpers.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/ifabp_water.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/index.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/install.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/membrane_peptide.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/nhaa_equilibrium.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/usage.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/vesicles.html 0 records)
> DocSearch: https://www.mdanalysis.org/MDAnalysisData/yiip_equilibrium.html 0 records)
> DocSearch: https://www.mdanalysis.org/distopia/api/distopia.html 0 records)
> DocSearch: https://www.mdanalysis.org/distopia/api/helper_functions.html 0 records)
> DocSearch: https://www.mdanalysis.org/distopia/api/vector_triple.html 0 records)
> DocSearch: https://www.mdanalysis.org/distopia/building_distopia.html 0 records)
> DocSearch: https://www.mdanalysis.org/distopia/index.html 0 records)
> DocSearch: https://www.mdanalysis.org/pytng/documentation_pages/API.html 0 records)
> DocSearch: https://www.mdanalysis.org/pytng/documentation_pages/Blocks.html 0 records)
> DocSearch: https://www.mdanalysis.org/pytng/documentation_pages/Errors.html 0 records)
> DocSearch: https://www.mdanalysis.org/pytng/documentation_pages/Examples.html 0 records)
> DocSearch: https://www.mdanalysis.org/pytng/index.html 0 records)

The output was produced by a scraper run that does not submit results to the actual index, using

./docsearch run ../docsearch-configs/configs/mdanalysis.json 2>&1 | tee RUN.log
cat RUN.log | grep "[^0-9]0 records" | sort

and a few results were removed where we know why we got 0 records, namely

- remove duplicates
- only index "stable" docs and User Guide (instead of explicit
  version information)
- removed indexing of code elements to reduce clutter
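
The grep filter above can also be sketched in Python, which makes it easier to group the zero-record pages by sub-project. This is only a minimal sketch, assuming the log lines look like the "> DocSearch: <url> 0 records)" lines quoted above. The function names, the grouping by first path segment, and the third sample line (a made-up page with 10 records) are illustrative assumptions, not part of the DocSearch scraper.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Assumed line format, taken from the log excerpt: "> DocSearch: <url> <n> records)".
# Requiring a blank before "0" keeps counts such as "10 records)" from matching.
ZERO_RECORDS = re.compile(r"DocSearch:[ \t]+(\S+)[ \t]+0 records\)")


def zero_record_urls(log_text):
    """Return the sorted, de-duplicated URLs that were crawled but yielded 0 records."""
    return sorted({m.group(1) for m in ZERO_RECORDS.finditer(log_text)})


def group_by_project(urls):
    """Group URLs by their first path segment; the site root is filed under ''."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        groups[segments[0] if segments else ""].append(url)
    return dict(groups)


# Two lines copied from the log above plus one hypothetical non-zero line.
sample_log = """\
> DocSearch: https://www.mdanalysis.org 0 records)
> DocSearch: https://www.mdanalysis.org/pytng/index.html 0 records)
> DocSearch: https://www.mdanalysis.org/example/page.html 10 records)
"""

for project, urls in group_by_project(zero_record_urls(sample_log)).items():
    print(project or "(site root)", len(urls))
```

To run it against a real log, read the file (for example RUN.log) into a string and pass it to zero_record_urls. If the scraper output contains terminal color codes, the pattern may need adjusting.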
@shortcuts
Member

shortcuts commented Oct 20, 2021

> there's still content that does not get indexed and I cannot figure out the correct text selectors. Most of the pages below should have more than zero records. We do want to index our index.html pages.

When pages are retrieved but without records, it's usually related to the selectors.

Testing document.querySelectorAll("[itemprop='articleBody'] > .section h1, .page h1, .post h1, .body > .section h1"); on https://www.mdanalysis.org/GridDataFormats/gridData/basic.html for example, returns no results.

You can either make your selectors broader (we often go with .class heading) to also retrieve content from these pages, or add a new selectors_key field in the start_urls.
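
The second option could look roughly like the sketch below, which builds the relevant part of a config as a Python dict and prints it as JSON. It assumes the legacy DocSearch config layout, where a start_urls entry may be an object with "url" and "selectors_key", and "selectors" is keyed by those names plus "default". The key name "sphinx_subproject", every selector string under it, the "text" selectors, and the choice of lvl0 for the h1 selector are illustrative guesses, not tested against the actual pages.

```python
import json

# Hypothetical config fragment; only the h1 selector string is taken from the
# comment above, everything else is an assumption for illustration.
config = {
    "start_urls": [
        {
            "url": "https://www.mdanalysis.org/GridDataFormats/",
            "selectors_key": "sphinx_subproject",  # hypothetical key name
        },
        "https://www.mdanalysis.org/",  # plain entries use the "default" selectors
    ],
    "selectors": {
        "default": {
            "lvl0": "[itemprop='articleBody'] > .section h1, .page h1, "
                    ".post h1, .body > .section h1",
            "text": ".body p, .body li",  # assumed
        },
        "sphinx_subproject": {
            # Broader ".class heading" style selectors (assumed class name).
            "lvl0": ".body h1",
            "lvl1": ".body h2",
            "text": ".body p, .body li",
        },
    },
}


def undefined_selector_keys(cfg):
    """List selectors_key values used in start_urls that have no selectors entry."""
    used = {
        entry["selectors_key"]
        for entry in cfg["start_urls"]
        if isinstance(entry, dict) and "selectors_key" in entry
    }
    return sorted(used - set(cfg["selectors"]))


print(undefined_selector_keys(config))
print(json.dumps(config, indent=2))
```

The small check only catches a selectors_key that has no matching entry under "selectors". Whether the selectors actually match anything still has to be verified in the browser with document.querySelectorAll, as described above.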

@orbeckst
Contributor Author

Thanks — I am not a CSS wizard so it will take some time to figure out why the selector isn't working. When I looked at the page with the Firefox Web Developer Tools it looked as if the content was correctly described. Furthermore, all these pages are generated in the same way with sphinx so I would have expected the CSS to be identical.

The changes to the config file in the PR are still valuable to us as they are so if you could merge them then that would be appreciated. (I'll then summarize/link/copy the missing records to #4699 .)

Thanks!

@shortcuts shortcuts merged commit 074b7e5 into algolia:master Oct 20, 2021
@shortcuts
Member

> Thanks, I am not a CSS wizard so it will take some time to figure out why the selector isn't working. When I looked at the page with the Firefox Web Developer Tools it looked as if the content was correctly described. Furthermore, all these pages are generated in the same way with sphinx so I would have expected the CSS to be identical.
>
> The changes to the config file in the PR are still valuable to us as they are so if you could merge them then that would be appreciated. (I'll then summarize/link/copy the missing records to #4699 .)
>
> Thanks!

If you have not found a solution by then, I'll check tomorrow (Paris time) whether I can help you!

@orbeckst orbeckst deleted the update-mdanalysis-01 branch October 20, 2021 19:47