Generate a static SEO version of a site using Google's escaped fragments
I forked this project from hazart's version because it was using too much memory for my needs (due to jsdom). I removed the jsdom calls and changed the way PhantomJS opens pages: we now open one PhantomJS instance per link, because otherwise PhantomJS crashes at some point during the process.
This plugin requires Grunt ~0.4.1
If you haven't used Grunt before, be sure to check out the Getting Started guide, as it explains how to create a Gruntfile as well as install and use Grunt plugins. Once you're familiar with that process, you may install this plugin with this command:
npm install skrafft/grunt-escaped-seo --save-dev
Once the plugin has been installed, it may be enabled inside your Gruntfile with this line of JavaScript:
grunt.loadNpmTasks('grunt-escaped-seo');
You can use the example file sitemap.js.
This plugin requires a local installation of PhantomJS (http://phantomjs.org/):
npm install -g phantomjs
It also requires the npm "phantom" module ~0.6.1.
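You can install it with npm, for example:
npm install [email protected]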
Thanks to Mathieu Desvé (https://github.com/mazerte), who came up with the idea and contributed some of the code.
Use this plugin to generate a static version of your Ajax-driven "single page application". This static version is what Googlebot will parse, and the generated sitemap.xml will help you tell Google to index your site.
To work with Googlebot you need to follow Google's specification (https://developers.google.com/webmasters/ajax-crawling/docs/specification): use the #! hash fragment in your URLs, or add a meta tag to your HTML page:
<meta name="fragment" content="!">
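To illustrate the scheme with a hypothetical page name: a link such as http://yourdomain.com/#!contact will be fetched by Googlebot as
http://yourdomain.com/?_escaped_fragment_=contact
and it is this _escaped_fragment_ URL that your server must answer with the pre-rendered HTML.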
Don't forget to add a rewrite rule. For example, in an .htaccess file with an Apache server:
<ifModule mod_rewrite.c>
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=$
RewriteRule ^$ /seo/index.html [L]
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
RewriteRule ^$ /seo/%1.html [L]
</ifModule>
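With a hypothetical page name, these rules map the escaped-fragment requests to the generated files like this:
http://yourdomain.com/?_escaped_fragment_=         ->  /seo/index.html
http://yourdomain.com/?_escaped_fragment_=contact  ->  /seo/contact.html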
And if you are using pushState:
<ifModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|YandexBot|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit) [NC]
RewriteRule ^$ /seo/index.html [QSA,L]
RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker|DoCoMo|Twitterbot|TweetmemeBot|Twikle|Netseer|Daumoa|SeznamBot|Ezooms|MSNBot|Exabot|MJ12bot|sogou\sspider|YandexBot|bitlybot|ia_archiver|proximic|spbot|ChangeDetection|NaverBot|MetaJobBot|magpie-crawler|Genieo\sWeb\sfilter|Qualidator.com\sBot|Woko|Vagabondo|360Spider|ExB\sLanguage\sCrawler|AddThis.com|aiHitBot|Spinn3r|BingPreview|GrapeshotCrawler|CareerBot|ZumBot|ShopWiki|bixocrawler|uMBot|sistrix|linkdexbot|AhrefsBot|archive.org_bot|SeoCheckBot|TurnitinBot|VoilaBot|SearchmetricsBot|Butterfly|Yahoo!|Plukkie|yacybot|trendictionbot|UASlinkChecker|Blekkobot|Wotbox|YioopBot|meanpathbot|TinEye|LuminateBot|FyberSpider|Infohelfer|linkdex.com|Curious\sGeorge|Fetch-Guess|ichiro|MojeekBot|SBSearch|WebThumbnail|socialbm_bot|SemrushBot|Vedma|alexa\ssite\saudit|SEOkicks-Robot|Browsershots|BLEXBot|woriobot|AMZNKAssocBot|Speedy|oBot|HostTracker|OpenWebSpider|WBSearchBot|FacebookExternalHit) [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^[#!/]*([\w\/\-_]*)$ /seo/$1.html [QSA,L]
</ifModule>
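Once deployed, you can sanity-check this user-agent rule from the command line with curl, assuming a page named contact exists:
curl -A "Googlebot" http://yourdomain.com/contact
This should return the pre-rendered /seo/contact.html instead of the JavaScript application.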
In your project's Gruntfile, add a section named 'escaped-seo' to the data object passed into grunt.initConfig():
grunt.initConfig({
  'escaped-seo': {
    options: {
      domain: 'http://yourdomain.com'
    }
  }
})
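You can then run the task with grunt escaped-seo. If your site is served locally by another task, for instance grunt-contrib-connect (an assumption here, adapt to your setup), you can chain them in a hypothetical alias:
// Start the local server, then generate the static SEO pages.
grunt.loadNpmTasks('grunt-contrib-connect');
grunt.registerTask('generate-seo', ['connect:server', 'escaped-seo']);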
options.domain
Type: String
The final domain of your site, used in the generated sitemap.xml.
options.server
Type: String
Default value: options.domain
The server to crawl when generating the static version and the site tree. By default, options.domain is used.
options.delay
Type: Number
Default value: 2000
Time to wait (in milliseconds) before capturing each page, so the JavaScript has time to generate the whole page.
options.public
Type: String
Default value: dist
Your local folder corresponding to the public document root. The sitemap and the static version will be created inside it.
options.folder
Type: String
Default value: seo
The local folder (inside the public folder) into which the static HTML files will be created.
options.changefreq
Type: String
Default value: daily
The changefreq value to use in the sitemap.xml (one of the sitemap protocol values: always, hourly, daily, weekly, monthly, yearly or never).
options.replace
Type: Object
Default value: none
You can define replace rules for the static HTML versions in this object. Each value (String or RegExp) will be replaced by the corresponding key. If you use a String instead of a RegExp, only the first occurrence will be replaced.
A full configuration example:
grunt.initConfig({
  'escaped-seo': {
    options: {
      domain: 'http://pr0d.fr',
      server: 'http://localhost:9001',
      public: 'dist',
      folder: 'seo',
      changefreq: 'daily',
      delay: 2000,
      replace: {
        '[email protected]': /[a-z0-9_\-\.]+@[a-z0-9_\-\.]+\.[a-z]*/gi
      }
    }
  }
})
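For reference, a minimal sketch of how such a rule is assumed to be applied to each captured page (the html value here is hypothetical):
// Sketch only: each value (String or RegExp) is replaced by its key.
var html = 'Contact me at [email protected]';
var replace = {
  '[email protected]': /[a-z0-9_\-\.]+@[a-z0-9_\-\.]+\.[a-z]*/gi
};
Object.keys(replace).forEach(function (key) {
  // A String value would replace only the first occurrence; a global RegExp replaces all matches.
  html = html.replace(replace[key], key);
});
// html === 'Contact me at [email protected]'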
In lieu of a formal styleguide, take care to maintain the existing coding style.
Release history:
0.6.0 Added a custom stderr handler to catch PhantomJS crashes and restart the process.
0.5.1 Removed the jsdom code; one PhantomJS instance is now opened per link. no-follow has been removed as well.
0.5.0 Added the no-follow option.
0.4.1 Added an index on files inside folders when needed.
0.4.0 Added the protocol inside the sitemap loc.
0.3.1 Fixed a bug with the sitemap domain.
0.3.0 Fixed a bug with the redirection domain.
0.2.0 Added pushState compatibility.