Algolia crawling fails #855
We didn't configure it; Algolia hosts it for us. They used to have their configuration stored in https://github.com/algolia/docsearch-configs/blob/master/configs/grammy.json and https://github.com/algolia/docsearch-configs/blob/master/deployed-configs/g/grammy.js, but then they migrated their infrastructure to … something very different (I never really tried to understand it), and now it's stored somewhere on their servers. I can try to dig through the dashboards some time and see if I find anything related to this issue. Either way, the config has never been changed.
Ah, I think we found the culprit. They are looking for a VuePress class which doesn't exist. My cm-grammy.netlify.dev just got approved for their crawling program. I will try to run some experiments tonight and give you the fixed config ASAP (hopefully).
Don't bother. I got time to look into this. I found https://docsearch.algolia.com/docs/templates/#vitepress-template and will fix it now.
Nice. Pretty outdated, though. For example, lvl0 should select the active nav link in the sidebar. Still better than nothing.
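For context, here is a rough sketch (plain JavaScript, not Algolia's actual implementation; the element shapes and function name are assumptions) of how a DocSearch-style record extractor maps selectors like lvl0/lvl1 and a content selector onto hierarchical records:

```js
// Hypothetical sketch of DocSearch-style hierarchy extraction.
// Each content node becomes one record carrying the headings seen so far.
function extractRecords(elements, lvl0) {
  const hierarchy = { lvl0, lvl1: null, lvl2: null, lvl3: null };
  const records = [];
  for (const el of elements) {
    if (el.tag === "h1") {
      hierarchy.lvl1 = el.text;
      hierarchy.lvl2 = hierarchy.lvl3 = null; // a new h1 resets deeper levels
    } else if (el.tag === "h2") {
      hierarchy.lvl2 = el.text;
      hierarchy.lvl3 = null;
    } else if (el.tag === "h3") {
      hierarchy.lvl3 = el.text;
    } else if (el.tag === "p" || el.tag === "li") {
      records.push({ type: "content", content: el.text, hierarchy: { ...hierarchy } });
    }
  }
  return records;
}

const records = extractRecords(
  [
    { tag: "h1", text: "Getting Started" },
    { tag: "p", text: "Install grammY first." },
    { tag: "h2", text: "Bots" },
    { tag: "p", text: "Create a bot." },
  ],
  "Documentation",
);
console.log(records.length); // 2
```

This is why the lvl0 selector matters: every record under a page inherits it, so a broken lvl0 (e.g. a VuePress-only class) degrades grouping across the whole index.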
Yep, it works pretty poorly; I'm still investigating.
Pretty much the same: either we failed to index some information, or the new setup is more optimized. However, when comparing the search results with the VuePress version, the outcome is the same or perhaps even better, with more results. VuePress: #833 (comment)

<details>
<summary>Config</summary>

```js
new Crawler({
  appId: "1FFMAU2VMZ",
  apiKey: "xxxxxx",
  rateLimit: 8,
  maxDepth: 10,
  startUrls: ["https://cm-grammy.netlify.app"],
  renderJavaScript: false,
  sitemaps: ["https://cm-grammy.netlify.app/sitemap.xml"],
  ignoreCanonicalTo: false,
  discoveryPatterns: ["https://cm-grammy.netlify.app/**"],
  actions: [
    {
      indexName: "grammy",
      pathsToMatch: ["https://cm-grammy.netlify.app/**"],
      recordExtractor: ({ helpers }) => {
        return helpers.docsearch({
          recordProps: {
            content: ".content p, .content li",
            lvl0: {
              selectors: ".VPSidebarItem.is-active .text",
              defaultValue: "Documentation",
            },
            lvl1: ".content h1",
            lvl2: ".content h2",
            lvl3: ".content h3",
            lvl4: ".content h4",
            lvl5: ".content h5",
            lvl6: ".content h6",
          },
          indexHeadings: true,
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
  ],
  safetyChecks: { beforeIndexPublishing: { maxLostRecordsPercentage: 10 } },
  initialIndexSettings: {
    grammy: {
      attributesForFaceting: ["type", "lang"],
      attributesToRetrieve: [
        "hierarchy",
        "content",
        "anchor",
        "url",
        "url_without_anchor",
        "type",
      ],
      attributesToHighlight: ["hierarchy", "content"],
      attributesToSnippet: ["content:10"],
      camelCaseAttributes: ["hierarchy", "content"],
      searchableAttributes: [
        "unordered(hierarchy.lvl0)",
        "unordered(hierarchy.lvl1)",
        "unordered(hierarchy.lvl2)",
        "unordered(hierarchy.lvl3)",
        "unordered(hierarchy.lvl4)",
        "unordered(hierarchy.lvl5)",
        "unordered(hierarchy.lvl6)",
        "content",
      ],
      distinct: true,
      attributeForDistinct: "url",
      customRanking: [
        "desc(weight.pageRank)",
        "desc(weight.level)",
        "asc(weight.position)",
      ],
      ranking: [
        "words",
        "filters",
        "typo",
        "attribute",
        "proximity",
        "exact",
        "custom",
      ],
      highlightPreTag: '<span class="algolia-docsearch-suggestion--highlight">',
      highlightPostTag: "</span>",
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      allowTyposOnNumericTokens: false,
      minProximity: 1,
      ignorePlurals: true,
      advancedSyntax: true,
      attributeCriteriaComputedByMinProximity: true,
      removeWordsIfNoResults: "allOptional",
    },
  },
});
```

</details>
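One setting in that config worth calling out is `distinct: true` with `attributeForDistinct: "url"`. A simplified sketch of the effect at query time (my own illustration, not Algolia's internals; real ranking is richer than a pre-sorted list):

```js
// Hedged sketch of distinct-by-url: among hits sharing the same url,
// only the best-ranked one survives. `hits` is assumed pre-sorted by rank.
function deduplicateByUrl(hits) {
  const seen = new Set();
  const out = [];
  for (const hit of hits) {
    if (!seen.has(hit.url)) {
      seen.add(hit.url);
      out.push(hit);
    }
  }
  return out;
}

const deduped = deduplicateByUrl([
  { url: "/guide", content: "best match" },
  { url: "/guide", content: "weaker match" },
  { url: "/api", content: "other page" },
]);
console.log(deduped.length); // 2
```

This keeps the dropdown from being flooded by many records from a single page, which matters here because every paragraph and list item on a page becomes its own record.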
Let's just go ahead then. If we find ways to improve the search in the future, we can still implement them. This change isn't going to disrupt anything, so it shouldn't be blocking us. I will take care of updating the crawler config and index tomorrow.
Fixed. |
> There was a blocking error and the crawler has been paused. Please resolve it before moving forward. Without intervention, the crawl will be discarded and retried on the next schedule (2 attempts remaining).
Related: https://www.algolia.com/doc/tools/crawler/apis/configuration/
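This kind of pause is what the `safetyChecks.beforeIndexPublishing.maxLostRecordsPercentage: 10` line in the config above guards against. A minimal sketch of that check (function name and exact semantics are my assumptions, based on the setting's documented intent):

```js
// Hedged sketch: block publishing if the new crawl lost more than
// maxLostRecordsPercentage of the previously indexed records.
function shouldBlockPublishing(previousCount, newCount, maxLostRecordsPercentage) {
  if (previousCount === 0) return false; // nothing to compare against
  const lostPct = Math.max(0, ((previousCount - newCount) / previousCount) * 100);
  return lostPct > maxLostRecordsPercentage;
}

console.log(shouldBlockPublishing(1000, 950, 10)); // false: only 5% lost
console.log(shouldBlockPublishing(1000, 800, 10)); // true: 20% lost
```

In other words, a crawl that suddenly produces far fewer records (e.g. because selectors stopped matching after a theme migration) is held back rather than silently replacing a good index.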