Algolia crawling fails #855

KnorpelSenf · 2023-07-04T17:38:15Z

There was a blocking error and the crawler has been paused. Please resolve it before moving forward. Without intervention, the crawl will be discarded and retried on the next schedule (2 attempts remaining).

Related: https://www.algolia.com/doc/tools/crawler/apis/configuration/


quadratz · 2023-07-07T02:47:11Z

Can you show me your crawler configuration? Don't forget to hide the API key

*And for the sake of the meme:

KnorpelSenf · 2023-07-07T08:21:31Z

We didn't configure it; Algolia hosts it for us. They used to have their configuration stored in https://github.com/algolia/docsearch-configs/blob/master/configs/grammy.json and https://github.com/algolia/docsearch-configs/blob/master/deployed-configs/g/grammy.js but then they migrated their infrastructure to … something very different (I never really tried to understand it) and now it's stored somewhere on their servers.

I can try to dig through the dashboards in some time and see if I find anything that's related to this issue. Either way, the config has never been changed.

quadratz · 2023-07-07T09:02:43Z

https://github.com/algolia/docsearch-configs/blob/master/deployed-configs/g/grammy.js

Ah, I think we found the culprit. They are looking for a VuePress class which doesn't exist. My cm-grammy.netlify.dev just got approved for their crawling program. I will try to run some experiments tonight and give you the fixed config asap (hopefully).
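To illustrate the kind of mismatch described above, here is a minimal sketch that checks which class names a selector set relies on are actually present in a page. The HTML snippet is hypothetical, the "old" VuePress class names are an assumption about the legacy config, and the "new" ones are taken from the VitePress config posted later in this thread. It is a plain string check, not how the Algolia crawler evaluates selectors.

```javascript
// Hypothetical page fragment resembling VitePress output (not copied from the real site).
const vitepressHtml = `
<aside><div class="VPSidebarItem is-active"><p class="text">Guide</p></div></aside>
<main><div class="content"><h1>Getting Started</h1><p>Hello</p></div></main>`;

// Collect every class name that appears in a class="..." attribute.
function classesIn(html) {
  const found = new Set();
  for (const match of html.matchAll(/class="([^"]*)"/g)) {
    for (const name of match[1].split(/\s+/).filter(Boolean)) {
      found.add(name);
    }
  }
  return found;
}

// Return the class names from `classNames` that the page does not contain.
function missingClasses(html, classNames) {
  const present = classesIn(html);
  return classNames.filter((name) => !present.has(name));
}

// Assumed VuePress-era class names (illustrative only).
const oldVuePressClasses = ["theme-default-content", "sidebar-heading"];
// Class names used by the selectors in the VitePress config in this thread.
const newVitePressClasses = ["content", "VPSidebarItem", "is-active", "text"];

console.log(missingClasses(vitepressHtml, oldVuePressClasses));
console.log(missingClasses(vitepressHtml, newVitePressClasses));
```

With this sample page, every assumed VuePress class is reported as missing and none of the VitePress ones are, which is the situation a crawler config written for the old site would run into.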

KnorpelSenf · 2023-07-07T10:22:30Z

Don't bother. I got time to look into this. I found https://docsearch.algolia.com/docs/templates/#vitepress-template and will fix it now.

quadratz · 2023-07-07T10:35:08Z

Nice. Pretty outdated though. For example, lvl0 should select the active nav link in the sidebar. Still better than nothing.

KnorpelSenf · 2023-07-07T10:35:57Z

Yep, it works pretty poorly. I'm still investigating.

KnorpelSenf · 2023-07-07T11:11:58Z

The old index for the VuePress site has this many records:

The new config for the VitePress site only has this many:

So for some reason it does not find all the content. I am not sure why.

quadratz · 2023-07-09T13:56:19Z

Pretty much the same:

Either we failed to index some information or the new one is more optimized. However, when comparing the search results with those of the VuePress site, the outcome is the same or perhaps even better, with more results.

VuePress: #833 (comment)
VitePress: https://cm-grammy.netlify.app

Screenshot

Config

new Crawler({
  appId: "1FFMAU2VMZ",
  apiKey: "xxxxxx",
  rateLimit: 8,
  maxDepth: 10,
  startUrls: ["https://cm-grammy.netlify.app"],
  renderJavaScript: false,
  sitemaps: ["https://cm-grammy.netlify.app/sitemap.xml"],
  ignoreCanonicalTo: false,
  discoveryPatterns: ["https://cm-grammy.netlify.app/**"],
  actions: [
    {
      indexName: "grammy",
      pathsToMatch: ["https://cm-grammy.netlify.app/**"],
      recordExtractor: ({ helpers }) => {
        return helpers.docsearch({
          recordProps: {
            content: ".content p, .content li",
            lvl0: {
              selectors: ".VPSidebarItem.is-active .text",
              defaultValue: "Documentation",
            },
            lvl1: ".content h1",
            lvl2: ".content h2",
            lvl3: ".content h3",
            lvl4: ".content h4",
            lvl5: ".content h5",
            lvl6: ".content h6",
          },
          indexHeadings: true,
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
  ],
  safetyChecks: { beforeIndexPublishing: { maxLostRecordsPercentage: 10 } },
  initialIndexSettings: {
    grammy: {
      attributesForFaceting: ["type", "lang"],
      attributesToRetrieve: [
        "hierarchy",
        "content",
        "anchor",
        "url",
        "url_without_anchor",
        "type",
      ],
      attributesToHighlight: ["hierarchy", "content"],
      attributesToSnippet: ["content:10"],
      camelCaseAttributes: ["hierarchy", "content"],
      searchableAttributes: [
        "unordered(hierarchy.lvl0)",
        "unordered(hierarchy.lvl1)",
        "unordered(hierarchy.lvl2)",
        "unordered(hierarchy.lvl3)",
        "unordered(hierarchy.lvl4)",
        "unordered(hierarchy.lvl5)",
        "unordered(hierarchy.lvl6)",
        "content",
      ],
      distinct: true,
      attributeForDistinct: "url",
      customRanking: [
        "desc(weight.pageRank)",
        "desc(weight.level)",
        "asc(weight.position)",
      ],
      ranking: [
        "words",
        "filters",
        "typo",
        "attribute",
        "proximity",
        "exact",
        "custom",
      ],
      highlightPreTag: '<span class="algolia-docsearch-suggestion--highlight">',
      highlightPostTag: "</span>",
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      allowTyposOnNumericTokens: false,
      minProximity: 1,
      ignorePlurals: true,
      advancedSyntax: true,
      attributeCriteriaComputedByMinProximity: true,
      removeWordsIfNoResults: "allOptional",
    },
  },
});
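The `safetyChecks.beforeIndexPublishing.maxLostRecordsPercentage: 10` setting above is relevant to the record-count drop discussed earlier. Below is a rough sketch of the arithmetic, assuming the check compares the previous crawl's record count with the new one and blocks publishing when the loss exceeds the threshold. The function names and the counts in the usage lines are hypothetical, and the exact comparison Algolia performs (for example, whether the limit is inclusive) is not confirmed here.

```javascript
// Percentage of records lost between the previous crawl and the new one.
// Gaining records counts as 0% lost.
function lostRecordsPercentage(previousCount, newCount) {
  if (previousCount <= 0) return 0;
  const lost = Math.max(0, previousCount - newCount);
  return (lost * 100) / previousCount;
}

// Assumption: publishing is blocked when the loss is strictly above the limit.
function wouldBlockPublishing(previousCount, newCount, maxLostRecordsPercentage = 10) {
  return lostRecordsPercentage(previousCount, newCount) > maxLostRecordsPercentage;
}

// Hypothetical counts, not the real numbers from the grammY index.
console.log(lostRecordsPercentage(1000, 800));
console.log(wouldBlockPublishing(1000, 800));
console.log(wouldBlockPublishing(1000, 950));
```

If a selector change legitimately shrinks the index by more than the limit, the threshold may need to be raised temporarily for the first crawl with the new config.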


KnorpelSenf · 2023-07-09T18:45:05Z

Let's just go ahead then. If we find ways to improve the search in the future, we can still implement them.

This change isn't going to disrupt anything, so it shouldn't be blocking us. I will take care of updating the crawler config and index tomorrow.

KnorpelSenf · 2023-07-09T18:46:10Z

See vuejs/vitepress#2592 (comment)

KnorpelSenf · 2023-11-21T13:04:51Z

Fixed.

quadratz mentioned this issue Jul 5, 2023

Use Algolia search #861

Merged

quadratz mentioned this issue Jul 11, 2023

Add support for locales grammyjs/docs-bot#24

Draft

KnorpelSenf closed this as completed Nov 21, 2023
