
Building your first index

cardinal252 edited this page Sep 25, 2013 · 13 revisions

To get started with Lucinq, install the Lucinq NuGet package in any .NET project, or download the source from GitHub.

The sample builds a basic index from a folder of RSS feeds downloaded from the BBC website. The full code is available in the unit tests in the source.

Indexing

Index building with Lucinq is a breeze.

First, we must open a folder for indexing - in this case we are going to use the static Open() method of the native Lucene FSDirectory class.

var indexFolder = FSDirectory.Open(new DirectoryInfo(GeneralConstants.Paths.BBCIndex));

Now that we have the directory to work with, we need to choose an analyzer. There are many analyzers available, which this tutorial does not cover. The most common is the standard analyzer (shown below):

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);

To write to an index, we need to create an IndexWriter object. Note that these are disposable, so we will wrap it in a using block to keep things neat.

using (IndexWriter indexWriter = new IndexWriter(indexFolder, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
// our code goes here
}

Ok - so now we are ready to write to our index, but what should we write? For the purposes of this tutorial, I have downloaded every RSS feed from the BBC news website that I could find - this gives me some free content that I can index and later search on.

string[] rssFiles = Directory.GetFiles(GeneralConstants.Paths.RSSFeed);
foreach (var rssFile in rssFiles)
{
// do something with our rss feed
}
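Each file in the loop has to be turned into article objects before anything can be indexed. The tutorial relies on a ReadFeed helper for this, whose implementation is not shown on this page. The following is a minimal sketch of what such a helper might look like, assuming standard RSS 2.0 files. The FeedReader class, the ParseFeed method and the shape of NewsArticle are assumptions made for illustration (only the property names are taken from the indexing code in this tutorial); the real helper lives in the Lucinq unit tests and may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Assumed shape: only the properties that the indexing code in this tutorial reads.
public class NewsArticle
{
    public string Title { get; set; }
    public string Description { get; set; }
    public string Copyright { get; set; }
    public string Link { get; set; }
    public DateTime PublishDateTime { get; set; }
}

// Hypothetical stand-in for the ReadFeed helper used in the tutorial.
public static class FeedReader
{
    // Loads an RSS file from disk and converts its items to articles.
    public static List<NewsArticle> ReadFeed(string path)
    {
        return ParseFeed(XDocument.Load(path));
    }

    // Split out from ReadFeed so the parsing can be exercised without touching the file system.
    public static List<NewsArticle> ParseFeed(XDocument feed)
    {
        XElement channel = feed.Root == null ? null : feed.Root.Element("channel");
        if (channel == null)
        {
            return new List<NewsArticle>();
        }

        // RSS 2.0 declares copyright once per channel, so it is copied onto every article.
        string copyright = (string)channel.Element("copyright") ?? string.Empty;

        return channel.Elements("item")
            .Select(item => new NewsArticle
            {
                Title = (string)item.Element("title") ?? string.Empty,
                Description = (string)item.Element("description") ?? string.Empty,
                Copyright = copyright,
                Link = (string)item.Element("link") ?? string.Empty,
                PublishDateTime = ParseDate((string)item.Element("pubDate"))
            })
            .ToList();
    }

    // Falls back to DateTime.MinValue when pubDate is missing or cannot be parsed.
    private static DateTime ParseDate(string value)
    {
        DateTime parsed;
        return DateTime.TryParse(value, CultureInfo.InvariantCulture,
                                 DateTimeStyles.AdjustToUniversal, out parsed)
            ? parsed
            : DateTime.MinValue;
    }
}
```

Missing elements become empty strings rather than nulls, which is a design choice made here so that every article can be passed to the field helpers without null checks.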

Ok - so now for the important bit! We need to actually write the Lucene document:

List<NewsArticle> newsArticles = ReadFeed(rssFile); // gets a list of news
newsArticles.ForEach(
	newsArticle => 
		indexWriter.AddDocument
		(
			x => x.AddAnalysedField(BBCFields.Title, newsArticle.Title, true), // adds an analysed & stored field to the index (thanks to the overload taking a store flag)
			x => x.AddAnalysedField(BBCFields.Description, newsArticle.Description, true), // also analysed & stored
			x => x.AddAnalysedField(BBCFields.Copyright, newsArticle.Copyright), // adds an analysed & non-stored field to the index
			x => x.AddStoredField(BBCFields.Link, newsArticle.Link), // adds a non-analysed & stored field to the index for later retrieval
			x => x.AddNonAnalysedField(BBCFields.PublishDate, TestHelpers.GetDateString(newsArticle.PublishDateTime), true)) // adds a non-analysed & stored field to the index for querying for exact matches
		);
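For comparison, the same document can be built with the native Lucene.Net API. This is only a rough sketch under stated assumptions: it guesses that the Lucinq helpers map onto the Field.Store and Field.Index flags shown in the comments (in particular that AddStoredField means stored but not indexed), it uses made-up field name strings in place of the BBCFields constants, and it formats the date with a plain format string instead of TestHelpers.GetDateString. NativeDocumentSketch is a hypothetical name, not part of Lucinq.

```csharp
using System;
using System.Globalization;
using Lucene.Net.Documents;

// Hypothetical helper, for illustration only.
public static class NativeDocumentSketch
{
    public static Document Build(string title, string description, string copyright,
                                 string link, DateTime published)
    {
        var document = new Document();

        // Analysed and stored: searchable by term, and the original text can be read back.
        document.Add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
        document.Add(new Field("description", description, Field.Store.YES, Field.Index.ANALYZED));

        // Analysed but not stored: searchable, but the text is not kept for retrieval.
        document.Add(new Field("copyright", copyright, Field.Store.NO, Field.Index.ANALYZED));

        // Stored only (assumption): kept for retrieval, not indexed for search.
        document.Add(new Field("link", link, Field.Store.YES, Field.Index.NO));

        // Not analysed: indexed as a single term, which suits exact-match queries.
        // The date format is an assumption; a fixed-width format keeps string order equal to date order.
        string dateText = published.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
        document.Add(new Field("publishdate", dateText, Field.Store.YES, Field.Index.NOT_ANALYZED));

        return document;
    }
}
```

A document built this way would be passed to the native indexWriter.AddDocument(document) overload. The Lucinq lambda syntax above expresses the same choices per field with less ceremony.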

Finally - we need to optimize the index and close the writer. Both calls go inside the using block, after the loop over the feeds:

indexWriter.Optimize();
indexWriter.Close();

Great - your index is now ready to query! Here is the final sample code:

var indexFolder = FSDirectory.Open(new DirectoryInfo(GeneralConstants.Paths.BBCIndex));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
using (IndexWriter indexWriter = new IndexWriter(indexFolder, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
	string[] rssFiles = Directory.GetFiles(GeneralConstants.Paths.RSSFeed);
	foreach (var rssFile in rssFiles)
	{
		var newsArticles = ReadFeed(rssFile);
		newsArticles.ForEach(
			newsArticle => 
				indexWriter.AddDocument
				(
					x => x.AddAnalysedField(BBCFields.Title, newsArticle.Title, true),
					x => x.AddAnalysedField(BBCFields.Description, newsArticle.Description, true),
					x => x.AddAnalysedField(BBCFields.Copyright, newsArticle.Copyright),
					x => x.AddStoredField(BBCFields.Link, newsArticle.Link),
					x => x.AddNonAnalysedField(BBCFields.PublishDate, TestHelpers.GetDateString(newsArticle.PublishDateTime), true))
				);
	}

	indexWriter.Optimize();
	indexWriter.Close();
}

What Next?

Now that you have a working index, the next step is basic querying, so you can search your data.