This isn't indexed by Google #71

Closed
merlijn-sebrechts opened this issue Sep 17, 2023 · 7 comments
Comments

@merlijn-sebrechts
Contributor

Site owners can request indexing manually: https://support.google.com/programmable-search/answer/4513925?hl=en

@tpmccallum

tpmccallum commented Sep 21, 2023

Yes, you are right, @merlijn-sebrechts; I can confirm this.
If you paste the following search term into Google (the first four words of the docs home page, restricted to the site's URL), it shows that component-model.bytecodealliance.org is not indexed.

"The WebAssembly Component Model" inurl:component-model.bytecodealliance.org
Screenshot 2023-09-21 at 09 57 54
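
For anyone who wants to repeat this check later, here is a minimal sketch that turns the search term above into a shareable Google search URL. The helper name `google_search_url` is hypothetical, and the `https://www.google.com/search?q=` endpoint is the ordinary public search URL rather than an official API.

```python
from urllib.parse import urlencode


def google_search_url(phrase: str, site: str) -> str:
    """Build a Google search URL for an exact phrase restricted with inurl:."""
    # Quote the phrase so Google matches it exactly, then restrict by URL.
    query = f'"{phrase}" inurl:{site}'
    return "https://www.google.com/search?" + urlencode({"q": query})


url = google_search_url(
    "The WebAssembly Component Model",
    "component-model.bytecodealliance.org",
)
print(url)
```

Opening the printed URL in a browser runs the same query as pasting the search term by hand.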

Recommendation

My recommendation would be to follow this guide that Google has created. It will also help with all other search engines, so there is no need to duplicate effort: they all follow similar conventions (robots.txt, sitemap.xml and so on). If you need any help, please ping me here, and I will do what I can to help.

@itowlson
Collaborator

That would be awesome @tpmccallum. Thanks! I don't think any of us are site owners so can't do the search console stuff (@tschneidereit do you know who can do this?).

(Hopefully the problem will solve itself once the book is more widely used and we gain incoming links. But it would still be good to bootstrap that process!)

@tpmccallum

tpmccallum commented Sep 21, 2023

Yeah, highly recommend setting up the Google Search Console!

Just FYI, from the Google documentation: "You don't have to sign up for Search Console to be included in Google Search results". However, if we can find someone to set this up and verify the domain, that would be brilliant. The Search Console lets you ask Google to fetch and index pages (by pasting in the URL of a new article, etc.). It also lets you check the health of pages (how they are displayed on different devices), shows the geo-locations that page views are coming from, and so much more.

For starters, if setting up the Search Console is a little way off in the future, we could just focus on the robots.txt and sitemap.xml. Once those are created, the crawlers will slurp them up.
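
To make the two files concrete, here is a minimal sketch of what they could contain, generated with the Python standard library. It is only an illustration: the `https://` scheme, the page paths in `PAGES`, and the helper names are assumptions, not the files that were eventually added to the repository.

```python
from xml.etree import ElementTree as ET

# Assumption: the site is served over https at this host.
SITE = "https://component-model.bytecodealliance.org"
# Hypothetical page paths, for illustration only.
PAGES = ["/", "/introduction.html"]

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_robots_txt(site: str) -> str:
    """Allow every crawler and point it at the sitemap."""
    return f"User-agent: *\nAllow: /\n\nSitemap: {site}/sitemap.xml\n"


def build_sitemap_xml(site: str, pages: list[str]) -> str:
    """List one <url><loc> entry per page, following the sitemaps.org schema."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = site + path
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


robots_txt = build_robots_txt(SITE)
sitemap_xml = build_sitemap_xml(SITE, PAGES)
print(robots_txt)
print(sitemap_xml)
```

Both files would be served from the root of the site, so that crawlers find robots.txt first and follow its `Sitemap:` line.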

@tpmccallum

Hi @merlijn-sebrechts @tschneidereit @itowlson
I have created a new PR that adds the sitemap.xml and robots.txt files.
#72

@tpmccallum

Thanks @itowlson
I am just documenting progress in this issue for visibility. @merlijn-sebrechts, thanks for raising this issue. We now have robots.txt and sitemap.xml in place. We should give it a few days to a week and then run the "The WebAssembly Component Model" inurl:component-model.bytecodealliance.org search in Google again to confirm that the site has been indexed on its own.

Screenshot 2023-09-21 at 12 50 30 Screenshot 2023-09-21 at 12 50 39

@tpmccallum

Communicating again here for visibility. @itowlson has created a new PR to automatically generate a new sitemap whenever changes are made to the site (new pages added, etc.). The PR was approved by @kate-goldenring and is now merged. We now have the robots.txt in place, which tells all web crawlers to go ahead and index the site, and it links the crawler to the autogenerated sitemap. Technically, the site will be indexed whenever the crawler finally gets around to it, which could take some time. As the image below shows, this has not happened … yet … in the last few hours at least :)
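
As a rough way to sanity-check a setup like this, the sketch below parses a robots.txt with Python's standard `urllib.robotparser` and confirms that it permits crawling and advertises a sitemap. The `ROBOTS_TXT` content and the `https://` URLs are assumptions for illustration, not a copy of the deployed file, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the deployed file may differ.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://component-model.bytecodealliance.org/sitemap.xml
"""

parser = RobotFileParser()
# Parse from a string so the check works offline.
parser.parse(ROBOTS_TXT.splitlines())

# True if the rules let Googlebot crawl the home page.
allowed = parser.can_fetch(
    "Googlebot", "https://component-model.bytecodealliance.org/"
)
# Sitemap URLs declared in the file.
sitemaps = parser.site_maps()

print("Crawling allowed:", allowed)
print("Sitemaps:", sitemaps)
```

To check the live file instead, `parser.set_url(...)` followed by `parser.read()` would fetch it over the network. Passing this check only shows that crawlers are permitted, not that the site has actually been indexed.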

Screenshot 2023-09-22 at 07 21 17

The next step is to verify the site via the Google Search Console dashboard. This way, we can explicitly tell Google that we have a site and want it to be indexed. Once we have Console access, we can start to enhance discoverability through the many tools that the Search Console provides for free.

I can create a PR to verify the site by uploading a file to the base of the book (this website). At that point, we will be able to log in to the Console and move things forward. Let us know how you would like to proceed.

@tpmccallum

Hi,
Great news: this documentation has now been indexed by Google.

It is still highly recommended to obtain site verification with a Google Search Console account and so forth. But this is a really great result.
