Azure Library for Lucene.Net
License:
-Microsoft Public License (Ms-PL)
-
-
-
-
- Microsoft Public
- License (Ms-PL) |
-
AzureDirectory Library for Lucene.Net
Project
-description
Lucene.Net is a robust open source search technology which
-has an abstract interface called a Directory for defining how the index is
-stored. AzureDirectory is an implementation of that
-interface for Windows Azure Blob Storage.
About
-
This project allows you to create Lucene Indexes and use them in Azure.
-
-This project implements a low level Lucene Directory
-object called AzureDirectory around Windows Azure BlobStorage.
Background
-
Lucene.NET
-
Lucene is a mature Java based open source full text indexing and
-search engine and property store.
-Lucene.NET is a mature port of that library to C#.
-Lucene/Lucene.Net provides:
-Super simple API for storing
-documents with arbitrary properties
-Complete control over what is
-indexed and what is stored for retrieval
-Robust control over where and how
-things are indexed, how much memory is used, etc.
-Superfast and super rich query
-capabilities
o
-Sorted results
o
-Rich constraint semantics AND/OR/NOT
-etc.
o
-Rich text semantics (phrase match,
-wildcard match, near, fuzzy match etc)
o
-Text query syntax (example:
-Title:(dog AND cat) OR Body:Lucen* )
o
-Programmatic expressions
o
-Ranked results with custom ranking
-algorithms
AzureDirectory
AzureDirectory smartly uses a local Directory to cache files as they are
-created and automatically pushes them to Azure blob storage as appropriate.
-Likewise, it smartly caches blob files on the client when they change. This
-provides with a nice blend of just in time syncing of data local to indexers or
-searchers across multiple machines.
-
-With the flexibility that Lucene provides over data
-in memory versus storage and the just in time blob transfer that AzureDirectory provides you have great control over the composibility of where data is indexed and how it is
-consumed.
-
-To be more concrete: you can have 1..N worker roles adding documents to an
-index, and 1..N searcher webroles searching over the
-catalog in near real time.
Usage
-
-To use you need to create a blob storage account on http://azure.com .
-
-Create an App.Config or Web.Config
-and configure your accountinfo:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
-<appSettings>
<!-- azure SETTINGS -->
<add key="BlobStorageEndpoint" value="http://YOURACCOUNT.blob.core.windows.net"/>
<add key="AccountName" value="YOURACCOUNTNAME"/>
<add key="AccountSharedKey" value="YOURACCOUNTKEY"/>
-</appSettings>
</configuration>
-To add documents to a catalog is as simple as
-
-
AzureDirectory
-azureDirectory = new AzureDirectory("TestCatalog");
IndexWriter
-indexWriter = new IndexWriter(azureDirectory, new StandardAnalyzer(),
-true);
Document doc = new Document();
doc.Add(new Field("id", DateTime.Now.ToFileTimeUtc().ToString(),
-Field.Store.YES, Field.Index.TOKENIZED,
-Field.TermVector.NO));
doc.Add(new Field("Title", “this is my title”, Field.Store.YES, Field.Index.TOKENIZED,
-Field.TermVector.NO));
doc.Add(new Field("Body", “This is my body”, Field.Store.YES, Field.Index.TOKENIZED,
-Field.TermVector.NO));
indexWriter.AddDocument(doc);
indexWriter.Close();
}
-And searching is as easy as:
-
-
IndexSearcher
-searcher = new IndexSearcher(azureDirectory);
Lucene.Net.QueryParsers.QueryParser
-parser = QueryParser("Title", new StandardAnalyzer());
Lucene.Net.Search.Query
-query = parser.Parse("Title:(Dog AND
-Cat)");
Hits hits
-= searcher.Search(query);
for (int i = 0; i < hits.Length();
-i++)
{
Document doc = hits.Doc(i);
Console.WriteLine(doc.GetField("Title").StringValue());
}
Caching
-and Compression
-AzureDirectory compresses blobs before sent to the
-blob storage. Blobs are automatically cached local to reduce roundtrips for
-blobs which haven't changed.
-
-By default AzureDirectory stores this local cache in
-a temporary folder. You can easily control where the local cache is stored by
-passing in a Directory object for whatever type and location of storage you
-want.
-
-This example stores the cache in a ram directory:
AzureDirectory azureDirectory = new AzureDirectory("MyIndex", new RAMDirectory());
-And this example stores in the file system in C:\myindex
AzureDirectory azureDirectory = new AzureDirectory("MyIndex", new FSDirectory(@"c:\myindex"));
-
-
Notes
-on settings
-Just like a normal Lucene index, calling optimize too
-often causes a lot of churn and not calling it enough causes too many segment
-files to be created, so call it "just enough" times. That will
-totally depend on your application and the nature of your pattern of adding and
-updating items to determine (which is why Lucene
-provides so many knobs to configure its behavior).
-
-The default compound file support that Lucene uses reduces
-the number of files that are generated...this means it deletes and merges files
-regularly which causes churn on the blob storage. Calling indexWriter.SetCompoundFiles(false)
-will give better performance.
-
-The version of Lucene.NET checked in as a binary is Version 2.3.1, but you can
-use any version of Lucene.NET you want by simply enlisting from the above open
-source site.
Related
-
There is a LINQ to Lucene provider http://linqtoLucene.codeplex.com/Wiki/View.aspx?title=Project%20Documentation
-on codeplex which allows you to define your schema as
-a strongly typed object and execute LINQ expressions against the index.