Algolia crawling fails #855

Closed
KnorpelSenf opened this issue Jul 4, 2023 · 11 comments

Comments

@KnorpelSenf
Member

There was a blocking error and the crawler has been paused. Please resolve it before moving forward. Without intervention, the crawl will be discarded and retried on the next schedule (2 attempts remaining).

IMG_20230704_193726_301

Related: https://www.algolia.com/doc/tools/crawler/apis/configuration/

@quadratz
Contributor

quadratz commented Jul 7, 2023

Can you show me your crawler configuration? Don't forget to hide the API key

*And for the sake of the meme:

AgADGgIAAkiWcEU.jpg

@KnorpelSenf
Member Author

We didn't configure it; Algolia hosts it for us. They used to store their configuration in https://github.com/algolia/docsearch-configs/blob/master/configs/grammy.json and https://github.com/algolia/docsearch-configs/blob/master/deployed-configs/g/grammy.js but then they migrated their infrastructure to … something very different (I never really tried to understand it) and now it's stored somewhere on their servers.

I can try to dig through the dashboards in some time and see if I find anything that's related to this issue. Either way, the config has never been changed.

@quadratz
Contributor

quadratz commented Jul 7, 2023

https://github.com/algolia/docsearch-configs/blob/master/deployed-configs/g/grammy.js

Ah, I think we found the culprit. They are looking for a VuePress class which doesn't exist anymore. My cm-grammy.netlify.dev just got approved for their crawling program. I will try to run some experiments tonight and give you the fixed config asap (hopefully).
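
A stale selector of this kind can be spotted with a quick check of the rendered HTML. The sketch below is a rough heuristic only: `findMissingClasses` is a hypothetical helper (not part of Algolia's tooling), the markup and selectors are made up for illustration, and it only looks at class names rather than evaluating full CSS selectors.

```javascript
// Hypothetical helper: lists class names that are used in crawler selectors
// but never appear in a class="..." attribute of the given HTML.
function findMissingClasses(selectors, html) {
  // Collect every class name present in the page.
  const present = new Set();
  for (const match of html.matchAll(/class\s*=\s*"([^"]*)"/g)) {
    for (const name of match[1].split(/\s+/)) {
      if (name) present.add(name);
    }
  }
  // Collect class names referenced by the selectors (".foo" tokens)
  // that the page does not contain.
  const missing = new Set();
  for (const selector of selectors) {
    for (const match of selector.matchAll(/\.([A-Za-z_][\w-]*)/g)) {
      if (!present.has(match[1])) missing.add(match[1]);
    }
  }
  return [...missing];
}

// Made-up markup and selectors, for illustration only.
const sampleHtml = '<main class="content"><h1 class="title">Hello</h1></main>';
const sampleSelectors = [".old-theme-content h1", ".content p"];
console.log(findMissingClasses(sampleSelectors, sampleHtml));
```

Any class name it reports is a hint that the crawler config still targets markup from the previous site generator.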

@KnorpelSenf
Member Author

Don't bother. I got time to look into this. I found https://docsearch.algolia.com/docs/templates/#vitepress-template and will fix it now.

@quadratz
Contributor

quadratz commented Jul 7, 2023

Nice. Pretty outdated though. For example, lvl0 should select the active navlink in sidebar. Still better than nothing.

@KnorpelSenf
Member Author

Yep it works pretty poorly, I'm still investigating

@KnorpelSenf
Member Author

KnorpelSenf commented Jul 7, 2023

The old index for the VuePress site has this many records:

image

With the new config, the index for the VitePress site only has this many:

image

So for some reason it does not find all the content. I am not sure why.

@quadratz
Contributor

quadratz commented Jul 9, 2023

Pretty much the same:

image

Either we failed to index some information or the new index is more optimized. However, compared with the VuePress site, the search results are the same or perhaps even better, with more results.

Vuepress: #833 (comment)
Vitepress: https://cm-grammy.netlify.app

Screenshot

Vitepress
Vuepress

Config
new Crawler({
  appId: "1FFMAU2VMZ",
  apiKey: "xxxxxx",
  rateLimit: 8,
  maxDepth: 10,
  startUrls: ["https://cm-grammy.netlify.app"],
  renderJavaScript: false,
  sitemaps: ["https://cm-grammy.netlify.app/sitemap.xml"],
  ignoreCanonicalTo: false,
  discoveryPatterns: ["https://cm-grammy.netlify.app/**"],
  actions: [
    {
      indexName: "grammy",
      pathsToMatch: ["https://cm-grammy.netlify.app/**"],
      recordExtractor: ({ helpers }) => {
        return helpers.docsearch({
          recordProps: {
            content: ".content p, .content li",
            lvl0: {
              selectors: ".VPSidebarItem.is-active .text",
              defaultValue: "Documentation",
            },
            lvl1: ".content h1",
            lvl2: ".content h2",
            lvl3: ".content h3",
            lvl4: ".content h4",
            lvl5: ".content h5",
            lvl6: ".content h6",
          },
          indexHeadings: true,
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
  ],
  safetyChecks: { beforeIndexPublishing: { maxLostRecordsPercentage: 10 } },
  initialIndexSettings: {
    grammy: {
      attributesForFaceting: ["type", "lang"],
      attributesToRetrieve: [
        "hierarchy",
        "content",
        "anchor",
        "url",
        "url_without_anchor",
        "type",
      ],
      attributesToHighlight: ["hierarchy", "content"],
      attributesToSnippet: ["content:10"],
      camelCaseAttributes: ["hierarchy", "content"],
      searchableAttributes: [
        "unordered(hierarchy.lvl0)",
        "unordered(hierarchy.lvl1)",
        "unordered(hierarchy.lvl2)",
        "unordered(hierarchy.lvl3)",
        "unordered(hierarchy.lvl4)",
        "unordered(hierarchy.lvl5)",
        "unordered(hierarchy.lvl6)",
        "content",
      ],
      distinct: true,
      attributeForDistinct: "url",
      customRanking: [
        "desc(weight.pageRank)",
        "desc(weight.level)",
        "asc(weight.position)",
      ],
      ranking: [
        "words",
        "filters",
        "typo",
        "attribute",
        "proximity",
        "exact",
        "custom",
      ],
      highlightPreTag: '<span class="algolia-docsearch-suggestion--highlight">',
      highlightPostTag: "</span>",
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      allowTyposOnNumericTokens: false,
      minProximity: 1,
      ignorePlurals: true,
      advancedSyntax: true,
      attributeCriteriaComputedByMinProximity: true,
      removeWordsIfNoResults: "allOptional",
    },
  },
});
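
The `maxLostRecordsPercentage: 10` safety check in the config above can be reasoned about with a little arithmetic. This is a minimal sketch, assuming the check compares the record count of the new crawl with that of the previous one (that is my reading of the Algolia crawler docs, not their actual implementation). The function names and the record counts are hypothetical.

```javascript
// Percentage of records lost between two crawls (0 if the index grew).
function lostRecordsPercentage(previousCount, newCount) {
  if (previousCount <= 0) return 0; // nothing to lose on a first crawl
  const lost = Math.max(0, previousCount - newCount);
  return (lost / previousCount) * 100;
}

// Assumption: publishing is blocked when the loss exceeds the threshold.
function exceedsLossThreshold(previousCount, newCount, maxLostRecordsPercentage = 10) {
  return lostRecordsPercentage(previousCount, newCount) > maxLostRecordsPercentage;
}

// Hypothetical record counts, for illustration only.
console.log(exceedsLossThreshold(1000, 950)); // 5% lost
console.log(exceedsLossThreshold(1000, 600)); // 40% lost
```

Under this assumption, a new config that extracts far fewer records than the previous crawl would trip the check even if the search results themselves look fine.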


@KnorpelSenf
Member Author

Let's just go ahead then. If we find ways to improve the search in the future, we can still implement them.

This change isn't going to disrupt anything, so it shouldn't be blocking us. I will take care of updating the crawler config and index tomorrow.

@KnorpelSenf
Member Author

See vuejs/vitepress#2592 (comment)

@KnorpelSenf
Member Author

Fixed.
