NoSQL Database Advanced Topics

Arpillai edited this page Jan 26, 2018 · 6 revisions

Note: This is a Pre-release Version of the document.

Components of a Document

Interacting with Document Data

You can interact with the document's data just as you interact with any Dictionary:

let document1: Document = ["aString": "string data", "anInt": 123, "aDouble": 3.14]
let stringValue = document1["aString"]

// The value of "stringValue" will be "string data"

You can change the Document data as shown below:

let document1: Document = ["aString": "string data", "anInt": 123, "aDouble": 3.14]
document1["aString"] = "new value"

let stringValue = document1["aString"]
// The value of "stringValue" will be "new value"

// this will add a new key/value pair:
document1["anotherKey"] = "A new key/value pair"

And, of course, Document data can be iterated over:

let document1: Document = ["aString": "string data", "anInt": 123, "aDouble": 3.14]

for (key, value) in document1 {
	print("key: \(key) : value: \(value)")
}

// Will print: 
// 		key: aString : value: string data
// 		key: anInt : value: 123
// 		key: aDouble : value: 3.14

Metadata

All Documents have associated metadata. The metadata is kept separate from the document's dictionary data and is accessed through the metaData property of the document. Some, but not all, of the metadata is read-only after the document is created.

Useful Metadata:

  • id : Unique identifier of the Document
  • createDate : Date the Document was created, if available
  • lastChange : Date the Document was last saved, if available
  • type : User-specified string, useful for organizing documents
  • channels : String array of channels, used in replication to control document access

You can set metadata when a Document is initialized by including it in the document data:

let document1: Document = ["id": "my_document_id"]

print(document1.metaData.id) 

// Will print:
// my_document_id

You can update the metadata properties _type_ and _channels_ as these are read/write properties. All other metadata is read-only.

Attachments

Documents may contain associated blobs of data called attachments. The data in attachments is saved and replicated with the document. Attachments are useful for images, sound clips, videos, etc. You can create an attachment from either a Data object or a URL. You can access the attachments via the _attachments_ array property of the document.

Note - Keep attachments small. Multiple large attachments take up device space and take longer to synchronize for databases replicating with PredixSync. Additionally, PredixSync limits the size of an individual document, and that size includes the attachments. For example, a document containing two 10MB attachments counts as a document of more than 20MB on PredixSync.

Subclassing

While data access from Documents is simple using name/value pairs, Documents are designed to be subclassed for more specific data models. A subclassed document can expose properties that make sense for a data model and ensure that at initialization time the required properties are included. You are encouraged to subclass Documents as needed.
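The following is a minimal sketch of the subclassing idea. The SDK's Document class is not reproduced here, so SketchDocument is a hypothetical stand-in that models only dictionary storage and a subscript; InvoiceDocument and its property names are likewise illustrative.

```swift
// Hypothetical stand-in for the SDK's Document class; only the dictionary
// storage and subscript access described above are modeled.
class SketchDocument {
    private var storage: [String: Any]

    init(_ data: [String: Any]) {
        storage = data
    }

    subscript(key: String) -> Any? {
        get { return storage[key] }
        set { storage[key] = newValue }
    }
}

// A subclass that exposes typed properties for an invoice data model and
// requires both values at initialization time.
final class InvoiceDocument: SketchDocument {
    init(invoiceNumber: String, totalCost: Double) {
        super.init(["InvoiceNumber": invoiceNumber, "TotalCost": totalCost])
    }

    var invoiceNumber: String {
        return self["InvoiceNumber"] as? String ?? ""
    }

    var totalCost: Double {
        get { return self["TotalCost"] as? Double ?? 0 }
        set { self["TotalCost"] = newValue }
    }
}

let invoice = InvoiceDocument(invoiceNumber: "INV-001", totalCost: 125.50)
invoice.totalCost = 130.00
print(invoice.invoiceNumber, invoice.totalCost)
```

Callers work with typed properties, while the underlying name/value pairs remain available through the subscript.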

Document Database Interaction

All manipulation of a Document object happens in memory. A document must be saved to persist it to the Database, and fetched to retrieve it from the Database.

Database methods that write generally return an UpdateResult enumeration, which on success includes an updated Document object. On failure, the enumeration contains an Error with error details.
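As a rough illustration of handling such a result, the sketch below uses hypothetical stand-ins. The case names, payloads, and the save method are assumptions based on the description above, not the SDK's actual declarations.

```swift
// Hypothetical stand-ins: the names and payloads below are assumptions.
struct SketchRecord {
    var id: String
    var data: [String: String]
}

enum SketchUpdateResult {
    case saved(SketchRecord)   // success carries the updated document
    case failed(Error)         // failure carries the error details
}

struct SketchSaveError: Error {
    let reason: String
}

// A toy in-memory "database" whose write method returns the result enumeration.
final class SketchDatabase {
    private(set) var stored: [String: SketchRecord] = [:]

    func save(_ record: SketchRecord) -> SketchUpdateResult {
        guard !record.id.isEmpty else {
            return .failed(SketchSaveError(reason: "missing document id"))
        }
        stored[record.id] = record
        return .saved(record)
    }
}

let database = SketchDatabase()
let result = database.save(SketchRecord(id: "doc-1", data: ["aString": "string data"]))

switch result {
case .saved(let record):
    print("Saved document \(record.id)")
case .failed(let error):
    print("Save failed: \(error)")
}
```

Switching over the result forces the caller to handle both the success and the failure case.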

Triggers

DatabaseChangeDelegate

You can associate a databaseChangeDelegate with the database to establish a trigger to receive information on any database changes. The databaseChangeDelegate is called for all database changes. The delegate receives an array of DocumentChangedDetails, each of which includes the id of the changed document, whether the source of the change was replication, and whether the change was a deletion.
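A delegate of this kind might look like the sketch below. The protocol, method, and property names are hypothetical stand-ins modeled on the description above; consult the SDK for the real declarations.

```swift
// Hypothetical stand-ins; none of these names are taken from the SDK.
struct SketchChangeDetails {
    let documentID: String
    let fromReplication: Bool
    let deleted: Bool
}

protocol SketchChangeDelegate: AnyObject {
    func databaseChanged(_ changes: [SketchChangeDetails])
}

// A delegate that records which documents were deleted by replication.
final class ChangeLogger: SketchChangeDelegate {
    private(set) var remoteDeletions: [String] = []

    func databaseChanged(_ changes: [SketchChangeDetails]) {
        for change in changes where change.fromReplication && change.deleted {
            remoteDeletions.append(change.documentID)
        }
    }
}

let logger = ChangeLogger()
logger.databaseChanged([
    SketchChangeDetails(documentID: "a", fromReplication: true, deleted: true),
    SketchChangeDetails(documentID: "b", fromReplication: false, deleted: true),
    SketchChangeDetails(documentID: "c", fromReplication: true, deleted: false),
])
print(logger.remoteDeletions)
```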

Id Factory

Documents created without an id are automatically given one. The document id is generated in a static closure on the Document class: idFactory. By default, a document id is generated from a UUID. However, you can replace this static closure and use a custom scheme to generate document ids.
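The sketch below illustrates replacing an id factory. SketchIdentifiedDocument is a hypothetical stand-in for the Document class, and the closure type () -> String is an assumption; only the idea of a replaceable static closure comes from the text above.

```swift
import Foundation

// Hypothetical stand-in; only the static idFactory closure is modeled.
final class SketchIdentifiedDocument {
    // Default scheme: generate the id from a UUID.
    static var idFactory: () -> String = { UUID().uuidString }

    let id: String

    init(id: String? = nil) {
        // Use the supplied id if there is one, otherwise ask the factory.
        self.id = id ?? SketchIdentifiedDocument.idFactory()
    }
}

// Replace the default UUID scheme with a prefixed counter.
var nextSequence = 0
SketchIdentifiedDocument.idFactory = {
    nextSequence += 1
    return "invoice-\(nextSequence)"
}

let first = SketchIdentifiedDocument()
let second = SketchIdentifiedDocument()
let explicit = SketchIdentifiedDocument(id: "my_document_id")
print(first.id, second.id, explicit.id)
```

A custom scheme must still produce unique ids; the counter here is only for illustration and would not be unique across devices.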

Date Formatter

While the JSON format does not recognize a Date type, the Document object recognizes data that is a date and automatically converts it to a Date type. By default, the Document class uses the ISO8601 date format standard. However, you can use a custom DateFormatter to replace this by assigning an object to the static dateFormatter property.
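To make the difference concrete, the sketch below parses the same instant from an ISO 8601 string and from a custom format, using only Foundation's formatters. The custom pattern "yyyy/MM/dd HH:mm" is an illustrative assumption. Assigning a formatter to the static dateFormatter property is not shown, because the property's exact type is not given here.

```swift
import Foundation

// JSON carries dates as strings; ISO 8601 is the default format named above.
let isoFormatter = ISO8601DateFormatter()
let parsedISO = isoFormatter.date(from: "2018-01-26T12:30:00Z")

// A custom DateFormatter for data that stores dates as "yyyy/MM/dd HH:mm".
// A fixed locale and time zone keep parsing independent of device settings.
let customFormatter = DateFormatter()
customFormatter.locale = Locale(identifier: "en_US_POSIX")
customFormatter.timeZone = TimeZone(secondsFromGMT: 0)
customFormatter.dateFormat = "yyyy/MM/dd HH:mm"
let parsedCustom = customFormatter.date(from: "2018/01/26 12:30")

// Both strings describe the same instant, so the parsed dates compare equal.
print(parsedISO == parsedCustom)
```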

Default Database Configuration

The default database configuration returned by OpenDatabaseConfiguration.default uses "pm" as the default name and a subdirectory under the Application Support directory for its path. You can change these by using a subclass of OpenDatabaseConfiguration and overriding the defaultDatabaseName() and defaultLocation() methods. This allows you to create several subclasses of OpenDatabaseConfiguration to support several defaults easily.
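The override pattern can be sketched as follows. SketchConfiguration is a hypothetical stand-in: the method names come from the text above, but declaring them as class methods, their return types, and the placeholder paths under the temporary directory are all assumptions.

```swift
import Foundation

// Hypothetical stand-in for OpenDatabaseConfiguration.
class SketchConfiguration {
    let name: String
    let location: URL

    init(name: String, location: URL) {
        self.name = name
        self.location = location
    }

    class func defaultDatabaseName() -> String {
        return "pm"
    }

    class func defaultLocation() -> URL {
        // Placeholder path; the SDK uses a subdirectory of Application Support.
        return FileManager.default.temporaryDirectory.appendingPathComponent("databases")
    }

    // Builds the default from whichever overrides the receiving class provides.
    class var `default`: SketchConfiguration {
        return SketchConfiguration(name: defaultDatabaseName(), location: defaultLocation())
    }
}

// A subclass that supplies a second set of defaults.
final class ArchiveConfiguration: SketchConfiguration {
    override class func defaultDatabaseName() -> String {
        return "archive"
    }

    override class func defaultLocation() -> URL {
        return FileManager.default.temporaryDirectory.appendingPathComponent("archive-databases")
    }
}

let standard = SketchConfiguration.default
let archive = ArchiveConfiguration.default
print(standard.name, archive.name)
```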

CompletionQueue

By default, all completion handlers for the asynchronous database methods are called back on the main queue. If another queue is desired, the OpenDatabaseConfiguration initializer includes a completionQueue parameter that can be used to provide a custom completion handler queue for the database.

Equality

Two Database.Configuration objects are equal if their database name and file location are equal. All databases opened with equal configurations return the same database object.

Replication

Replication offers two ways of controlling which documents are replicated. On the client side, you can establish a filter to prevent certain documents from being sent from the client to the server. Additionally, you can use channels to prevent the server from sending certain documents to the client.

Filters

Assign a replicationFilterDelegate to the database to establish a filter. The replicationFilterDelegate is called to evaluate each local document and returns true if the document needs to be synchronized to the server. The delegate's method receives the document being evaluated and an optional dictionary of filterParameters. These filter parameters are set in the ReplicationConfiguration object that initiated the replication.
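A filter of this shape could be sketched as below. The protocol and method names are hypothetical, a plain dictionary stands in for Document, and the "status" field is an illustrative assumption.

```swift
// Hypothetical stand-ins; the real delegate protocol is defined by the SDK.
typealias SketchFilterDocument = [String: Any]

protocol SketchReplicationFilterDelegate {
    func shouldReplicate(_ document: SketchFilterDocument, filterParameters: [String: Any]?) -> Bool
}

// Only send documents whose "status" matches the status named in the
// filter parameters; with no parameters, send everything.
struct StatusFilter: SketchReplicationFilterDelegate {
    func shouldReplicate(_ document: SketchFilterDocument, filterParameters: [String: Any]?) -> Bool {
        guard let wanted = filterParameters?["status"] as? String else {
            return true
        }
        return document["status"] as? String == wanted
    }
}

let localDocuments: [SketchFilterDocument] = [
    ["id": "a", "status": "final"],
    ["id": "b", "status": "draft"],
    ["id": "c", "status": "final"],
]
let statusFilter = StatusFilter()
let outgoing = localDocuments.filter {
    statusFilter.shouldReplicate($0, filterParameters: ["status": "final"])
}
let outgoingIDs = outgoing.compactMap { $0["id"] as? String }
print(outgoingIDs)
```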

Channels

Channels are part of the Predix Sync service. In the metadata of all documents is an array of strings, the channels property. These are the channel names. On the server side, you can configure Predix Sync to limit the channels a user has access to. Additionally, the ReplicationConfiguration has a limitToChannels property. An empty limitToChannels property (the default) results in the client receiving all documents that the user can access. However, if channel names are added to the limitToChannels property, only those documents that contain those channels are sent from the server to the client. This doesn't override the server security settings, but further reduces the documents the client receives for that replication configuration.
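The selection rule can be modeled with a small function. This is a toy model only: the real decision is made by the server, and documentsToSend, SketchRemoteDocument, and the parameter names are hypothetical.

```swift
// A toy model of the channel rule described above.
struct SketchRemoteDocument {
    let id: String
    let channels: [String]
}

func documentsToSend(_ documents: [SketchRemoteDocument],
                     userChannels: Set<String>,
                     limitToChannels: [String]) -> [SketchRemoteDocument] {
    return documents.filter { document in
        // Server security: the user must have access to one of the document's channels.
        let accessible = document.channels.contains(where: { userChannels.contains($0) })
        // Client narrowing: an empty limit means "everything the user can access".
        let requested = limitToChannels.isEmpty || document.channels.contains(where: { limitToChannels.contains($0) })
        return accessible && requested
    }
}

let serverDocuments = [
    SketchRemoteDocument(id: "1", channels: ["sales"]),
    SketchRemoteDocument(id: "2", channels: ["engineering"]),
    SketchRemoteDocument(id: "3", channels: ["sales", "archive"]),
    SketchRemoteDocument(id: "4", channels: ["finance"]),
]
let userChannels: Set<String> = ["sales", "engineering"]

let everything = documentsToSend(serverDocuments, userChannels: userChannels, limitToChannels: [])
let engineeringOnly = documentsToSend(serverDocuments, userChannels: userChannels, limitToChannels: ["engineering"])
let financeOnly = documentsToSend(serverDocuments, userChannels: userChannels, limitToChannels: ["finance"])
print(everything.map { $0.id }, engineeringOnly.map { $0.id }, financeOnly.map { $0.id })
```

Requesting the "finance" channel returns nothing, because the limit narrows what the user can already access and never widens it.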

Indexes and Queries

Indexes

An array of indexes is configured as part of the OpenDatabaseConfiguration structure used to open the database. An index is stored as part of the database's data after it is created; therefore, when using indexes it's critical to include the index array every time the database is opened.

Indexes adhere to the Indexer protocol. A basic implementation of this protocol is used in the Database.Index class. You can subclass this class or provide your own implementation of an Indexer.

Name — The index name is a string that uniquely identifies the index and is used when running queries. It is a best practice to ensure this name is descriptive.

Version — The index version is a string that uniquely identifies the code used to map the index. Changes to an index require special handling. If any changes are made to the index closures, the _version_ string must be changed to ensure the index is properly updated. Failure to change this value when updating the code will lead to unpredictable results.

Mapping — An index is similar to a table or dictionary, where you have a key and an optional value. The key is used during the query to filter the results, and the value is extra data that is easily accessed without needing to retrieve the entire document from the database during the query execution. The job of the index's _Map_ closure is to add rows or key/value pairs to this dictionary.

The map closure is defined as:

typealias Map = (_ document: Document, _ addIndexRow: @escaping AddIndexRow) -> Void

and AddIndexRow is a closure and defined as:

typealias AddIndexRow = (_ key: Any, _ value: Any?) -> Void

So, in the map closure, the code receives a Document, and an AddIndexRow closure. The document is then used to determine what rows to add to the index. You can add these rows by calling addIndexRow which provides the index key and the optional value.

Example:


let map = { document, addIndexRow in

	if let totalCost = document["TotalCost"] {
		addIndexRow(totalCost, document["InvoiceNumber"])
	}
}

Breaking down the above example, you have the following flow:

  1. If the document contains an element called "TotalCost"
  2. Add a row to the index where the key is this total cost,
  3. Associate the index value as the value of an element called "InvoiceNumber"

In this example system, you can run a query against this index to search for a range of total costs and get the matching invoice numbers. The query results are sorted by the TotalCost value, and accessing the InvoiceNumber is extremely fast since it's part of the index data.
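To see the flow end to end, the sketch below runs a map closure of the same shape over a few documents and then performs a range lookup. It is a simulation only: a dictionary stands in for Document (SketchIndexedDocument is hypothetical), a sorted array stands in for the index, and the SDK's query API is not used.

```swift
// Hypothetical stand-ins; only the closure shapes follow the definitions above.
typealias SketchIndexedDocument = [String: Any]
typealias AddIndexRow = (_ key: Any, _ value: Any?) -> Void
typealias Map = (_ document: SketchIndexedDocument, _ addIndexRow: @escaping AddIndexRow) -> Void

let map: Map = { document, addIndexRow in
    if let totalCost = document["TotalCost"] {
        addIndexRow(totalCost, document["InvoiceNumber"])
    }
}

// The simulated index: rows of (key, value), kept sorted by key.
var rows: [(key: Double, value: String?)] = []
let addRow: AddIndexRow = { key, value in
    if let cost = key as? Double {
        rows.append((key: cost, value: value as? String))
    }
}

let documents: [SketchIndexedDocument] = [
    ["InvoiceNumber": "INV-3", "TotalCost": 300.0],
    ["InvoiceNumber": "INV-1", "TotalCost": 100.0],
    ["Note": "not an invoice"],
    ["InvoiceNumber": "INV-2", "TotalCost": 200.0],
]
for document in documents {
    map(document, addRow)
}
rows.sort { $0.key < $1.key }

// Range lookup: invoice numbers whose total cost is between 150 and 300.
let matches = rows.filter { (150.0...300.0).contains($0.key) }.compactMap { $0.value }
print(matches)
```

The document without a "TotalCost" element adds no row, and the invoice numbers are read straight from the index rows without touching the documents again.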

Map/Reduce

Indexes can optionally provide a Reduce function. The Reduce function is defined as:

typealias Reduce = (_ keys: [Any], _ values: [Any], _ rereduce: Bool) -> (Any)

This allows map/reduce queries, in which the result rows of the query are summarized by the reduce function before the results are returned.

The reduce function takes an ordered list of key/value pairs. These are the keys and values from the index, as specified by the query parameters. The reduce function then aggregates these results together into a single object and then returns that object.

Common use cases are to provide subtotals, or averages, or summations of data.
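A summation can be sketched with a closure of the Reduce shape defined above. The example assumes an index whose keys are customer names and whose values are total costs as Double; that index layout is an illustrative assumption.

```swift
typealias Reduce = (_ keys: [Any], _ values: [Any], _ rereduce: Bool) -> (Any)

// Sum the numeric values of the index rows. Partial sums handed back on a
// second pass are also numbers, so the same code handles both passes.
let sumReduce: Reduce = { _, values, _ in
    let numbers = values.compactMap { $0 as? Double }
    let sum: Double = numbers.reduce(0, +)
    return sum
}

let total = sumReduce(["acme", "acme", "globex"], [100.0, 200.0, 300.0], false)
let combined = sumReduce([], [300.0, 300.0], true)
print(total, combined)
```

A sum can be combined from partial sums directly. An average cannot, so a reduce that produces averages would need to carry a sum and a count and divide only at the end.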

Rereduce

The rereduce flag is used when querying large data sets. When the data set is large, the underlying system breaks the map/reduce into smaller chunks, runs the reduce function on each chunk, and then runs the reduce function again on the reduced chunks. On that second pass the rereduce flag is true, the key array is empty, and the value array contains the partially reduced values.

Example:

The following example assumes an index that emits the type string of each document as the key, with no value.

 let reduce =  { ( keys, values, rereduce) in
 
     var result: [String: Int] = [:]
 
     // if this is not a rereduce
     if !rereduce {
         // count each unique key value
         if let sKeys = keys as? [String] {
             for key in sKeys {
                 var count = result[key] ?? 0
                 count += 1
                 result[key] = count
             }
         }
     } else {
         // This is a rereduce, so the value array is an array of the
         // dictionaries of unique key values and their counts from above.
         if let partials = values as? [[String: Int]] {
             // for each partial result dictionary
             for partial in partials {
                 // for each key in the partial result
                 for (key, partialCount) in partial {
                     // add the partial count to the final result dictionary
                     result[key, default: 0] += partialCount
                 }
             }
         }
     }
 
     // Return the unique key values
     // Note that regardless of the rereduce flag the result is the same data type. 
     // This must always be the case.
     return result
 }
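One way to check a reduce function is to compare a single pass with a chunked run. The sketch below does that for a counting reduce in which the rereduce pass adds up the partial counts. The chunkedReduce driver is hypothetical and only mimics how an underlying system might split the work.

```swift
typealias Reduce = (_ keys: [Any], _ values: [Any], _ rereduce: Bool) -> (Any)

// Counts documents per type. On a rereduce pass the partial counts are summed.
let countByType: Reduce = { keys, values, rereduce in
    var result: [String: Int] = [:]
    if !rereduce {
        for key in keys.compactMap({ $0 as? String }) {
            result[key, default: 0] += 1
        }
    } else {
        for partial in values.compactMap({ $0 as? [String: Int] }) {
            for (key, count) in partial {
                result[key, default: 0] += count
            }
        }
    }
    return result
}

// Hypothetical driver: reduce each chunk, then rereduce the partial results.
func chunkedReduce(keys: [String], chunkSize: Int, using reduce: Reduce) -> Any {
    var partials: [Any] = []
    var start = 0
    while start < keys.count {
        let end = min(start + chunkSize, keys.count)
        // The index emits no value, so an empty value array is passed here.
        partials.append(reduce(Array(keys[start..<end]), [], false))
        start = end
    }
    return reduce([], partials, true)
}

let typeKeys = ["invoice", "order", "invoice", "invoice", "order", "receipt", "invoice"]
let singlePass = countByType(typeKeys, [], false) as? [String: Int]
let chunked = chunkedReduce(keys: typeKeys, chunkSize: 3, using: countByType) as? [String: Int]
print(singlePass == chunked)
```

If the rereduce branch added 1 per key instead of the partial count, the chunked result would undercount and the comparison would fail.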

Sorting

Indexes are sorted according to their keys. For simple data types like strings and number types this order is obvious. However, using a key that is an array is particularly useful to achieve a grouped sorting. Elements are compared in order of their array index. That is, the first elements of the arrays are compared, then the second elements, and so on.
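The element-by-element comparison resembles lexicographic ordering, which the sketch below demonstrates with the Swift standard library. It assumes every key element is a String. The index's own collation rules, for example for mixed types, may differ.

```swift
// Array keys of the form [country, region, city].
let arrayKeys: [[String]] = [
    ["US", "Texas", "Austin"],
    ["CA", "Ontario", "Toronto"],
    ["US", "California", "San Diego"],
    ["US", "California", "Los Angeles"],
]

// Compare first elements, then second elements, and so on.
let sortedKeys = arrayKeys.sorted { $0.lexicographicallyPrecedes($1) }

for key in sortedKeys {
    print(key.joined(separator: " / "))
}
```

Rows that share a country end up adjacent, and within a country the rows that share a region are adjacent, which is the grouped sorting described above.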

Observing Queries

You can run queries in the background and have a closure called when the query results change. This is known as "observing" the query. The database function observeQuery(on:with:changeHandler:) is used to observe the query. The changeHandler parameter is a closure that receives the QueryResultEnumerator whenever changes to the database cause the query results to update. This function returns a QueryObserver object. You can use the database method removeObserver() to stop observing the query and clean up system resources.
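The observe and remove life cycle can be sketched with a toy in-memory database. Everything below is a hypothetical stand-in: the class names, the simplified observeQuery(type:changeHandler:) signature, and the synchronous callbacks are assumptions, whereas the SDK runs the query in the background and passes a QueryResultEnumerator.

```swift
// Hypothetical stand-ins; none of these declarations are the SDK's own.
final class SketchQueryObserver {
    let changeHandler: ([String]) -> Void

    init(changeHandler: @escaping ([String]) -> Void) {
        self.changeHandler = changeHandler
    }
}

final class SketchObservableDatabase {
    private var documentTypes: [String: String] = [:]   // document id -> type
    private var observers: [(observer: SketchQueryObserver, type: String)] = []

    // Observe the ids of all documents with the given type.
    func observeQuery(type: String, changeHandler: @escaping ([String]) -> Void) -> SketchQueryObserver {
        let observer = SketchQueryObserver(changeHandler: changeHandler)
        observers.append((observer: observer, type: type))
        return observer
    }

    func removeObserver(_ observer: SketchQueryObserver) {
        observers.removeAll(where: { $0.observer === observer })
    }

    func save(id: String, type: String) {
        documentTypes[id] = type
        // Re-run the "query" for every observer whose results may have changed.
        for entry in observers where entry.type == type {
            let ids = documentTypes.filter { $0.value == type }.keys.sorted()
            entry.observer.changeHandler(ids)
        }
    }
}

let observedDatabase = SketchObservableDatabase()
var latestInvoices: [String] = []
let invoiceObserver = observedDatabase.observeQuery(type: "invoice") { latestInvoices = $0 }

observedDatabase.save(id: "inv-1", type: "invoice")
observedDatabase.save(id: "ord-1", type: "order")
observedDatabase.save(id: "inv-2", type: "invoice")
observedDatabase.removeObserver(invoiceObserver)
observedDatabase.save(id: "inv-3", type: "invoice")   // no longer observed
print(latestInvoices)
```

After the observer is removed, later saves no longer reach the change handler, so the last observed result stays as it was.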