This repository has been archived by the owner on Sep 16, 2024. It is now read-only.

Loading data

Rob Rudin edited this page Sep 7, 2023 · 6 revisions

The 3.13.0 release provides a new command for loading data, built on the support provided in ml-javaclient-util. The intent is to provide a simple mechanism for loading data that should be part of every deployment. ml-javaclient-util already uses this support to load modules and schemas from disk as documents, so this feature simply extends it to loading arbitrary data.

In an ml-gradle project, you'll likely still have plenty of reasons to use mlcp to load data, as mlcp provides a number of useful options, such as parsing delimited data and loading data from zip files. This feature is focused strictly on loading a directory of files into a database so that the documents in the database mirror the files in the directory.

This feature has the following default behavior:

  1. src/main/ml-data is the default path for finding files to load.
  2. Any file in any data path will be loaded with a URI relative to the data path that it belongs to; e.g. the file src/main/ml-data/my/data/test.json will be loaded with a URI of /my/data/test.json.
  3. Collections and permissions can be specified per directory via collections.properties and permissions.properties files.
  4. The files are loaded via a DatabaseClient that connects to the port returned by appConfig.getRestPort(). If you specify a database name to load the files into, appConfig.getAppServicesPort() is used for the connection instead.
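The URI and port rules above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not ml-javaclient-util's implementation: the class DataLoadingSketch and its methods toUri and choosePort are hypothetical, and the port numbers in main (8003 for the REST server, 8000 for App-Services) are placeholder values.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of ml-javaclient-util. It mirrors the default
// behavior described above for computing URIs and choosing a port.
class DataLoadingSketch {

    // Builds a URI from the file's path relative to the data path it belongs to,
    // always using "/" as the separator regardless of the operating system.
    static String toUri(Path dataPath, Path file) {
        Path relative = dataPath.toAbsolutePath().normalize()
                .relativize(file.toAbsolutePath().normalize());
        StringBuilder uri = new StringBuilder();
        for (Path part : relative) {
            uri.append('/').append(part.toString());
        }
        return uri.toString();
    }

    // Uses the App-Services port when a database name is set, otherwise the REST port.
    static int choosePort(String databaseName, int restPort, int appServicesPort) {
        boolean noDatabase = databaseName == null || databaseName.isEmpty();
        return noDatabase ? restPort : appServicesPort;
    }

    public static void main(String[] args) {
        Path dataPath = Paths.get("src/main/ml-data");
        Path file = Paths.get("src/main/ml-data/my/data/test.json");
        System.out.println(toUri(dataPath, file));                 // /my/data/test.json
        System.out.println(choosePort(null, 8003, 8000));          // 8003 (placeholder REST port)
        System.out.println(choosePort("my-content", 8003, 8000));  // 8000 (placeholder App-Services port)
    }
}
```

Iterating over the relative Path yields its name elements, which keeps the URI separator as "/" even on Windows.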

The following properties are available for configuring this feature (all, unless otherwise noted, were introduced in 3.13.0):

| Property | Description |
| --- | --- |
| mlDataBatchSize | The number of documents to include in each call to MarkLogic. Defaults to 100. |
| mlDataCollections | Comma-delimited list of collection names assigned to each document. No default value. |
| mlDataDatabaseName | Database to load documents into; if set, then ml-app-deployer will connect via the App-Services app server to load the documents. No default value. |
| mlDataLoadingEnabled | Whether this feature is enabled. Defaults to true. |
| mlDataLogUris | Whether the URI of every document inserted should be logged. Defaults to true. |
| mlDataPaths | Comma-delimited list of data paths. Defaults to src/main/ml-data. |
| mlDataPermissions | Comma-delimited list of permissions (role1,capability1,role2,capability2,etc.) assigned to each document. No default value (which typically means you'll get rest-reader/read and rest-writer/update as permissions on each document). |
| mlDataReplaceTokens | Whether tokens should be replaced in each document, where tokens are obtained from the custom tokens map on the AppConfig object. Defaults to true. |
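As a rough sketch of how the mlDataPermissions and mlDataBatchSize values are interpreted, the hypothetical PermissionsSketch class below splits the comma-delimited permissions into role/capability pairs and computes how many calls a given number of documents needs. Both methods are assumptions for illustration, not ml-app-deployer's parsing code. The sketch uses a record, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ml-app-deployer's code: interprets the
// "role1,capability1,role2,capability2" format and the batch size setting.
class PermissionsSketch {

    record RoleCapability(String role, String capability) {}

    // Splits the value into consecutive role/capability pairs.
    static List<RoleCapability> parse(String value) {
        String[] tokens = value.split(",");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("Expected role/capability pairs: " + value);
        }
        List<RoleCapability> pairs = new ArrayList<>();
        for (int i = 0; i < tokens.length; i += 2) {
            pairs.add(new RoleCapability(tokens[i].trim(), tokens[i + 1].trim()));
        }
        return pairs;
    }

    // Number of calls to MarkLogic when each call carries at most batchSize documents.
    static int batchCount(int documentCount, int batchSize) {
        return (documentCount + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        for (RoleCapability rc : parse("rest-reader,read,rest-writer,update")) {
            System.out.println(rc.role() + " -> " + rc.capability());
        }
        System.out.println(batchCount(250, 100)); // 3 calls with the default batch size
    }
}
```

Rejecting an odd number of tokens surfaces a malformed property early instead of silently dropping the last role.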

Filtering content

Starting in 3.15.0, the DataConfig object belonging to AppConfig defaults its fileFilter property to an instance of DefaultFileFilter, which ignores every file whose name starts with "." as well as every file in a directory whose name starts with ".".
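The filtering rule can be approximated with java.nio.file.Path. The hypothetical DotFileFilterSketch below only checks path segments below the data path, so a hidden directory higher up (for example, a checkout under ~/.projects) does not cause every file to be rejected. That scoping is an assumption about the intended behavior, not a description of DefaultFileFilter's actual code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical filter approximating the DefaultFileFilter behavior described above:
// reject a file if its own name, or the name of any directory between the data path
// and the file, starts with ".". Not the library's implementation.
class DotFileFilterSketch {

    private final Path dataPath;

    DotFileFilterSketch(Path dataPath) {
        this.dataPath = dataPath.toAbsolutePath().normalize();
    }

    boolean accept(Path file) {
        Path relative = dataPath.relativize(file.toAbsolutePath().normalize());
        for (Path part : relative) {
            if (part.toString().startsWith(".")) {
                return false; // hidden file, or file inside a hidden directory
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DotFileFilterSketch filter = new DotFileFilterSketch(Paths.get("src/main/ml-data"));
        System.out.println(filter.accept(Paths.get("src/main/ml-data/my/data/test.json"))); // true
        System.out.println(filter.accept(Paths.get("src/main/ml-data/.DS_Store")));         // false
        System.out.println(filter.accept(Paths.get("src/main/ml-data/.git/config")));       // false
    }
}
```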

Cascading collections and permissions

As of 4.6.0, the properties mlCascadeCollections and mlCascadePermissions can be set to true so that the settings in collections.properties and permissions.properties are applied to child directories, unless a child directory has its own collections.properties or permissions.properties file.
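One way to picture cascading is as a nearest-ancestor lookup. In the hypothetical CascadeSketch below, a map records which directories (relative to the data path, with "" meaning the data path itself) have their own collections.properties file, and a plain string stands in for that file's contents. With cascading enabled, a directory without its own file inherits from the closest ancestor that has one. This models the behavior described above under those assumptions; it is not ml-javaclient-util's code, and the library's exact semantics may differ.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of cascading: ownSettings maps directories that contain their
// own properties file to a string standing in for that file's settings.
class CascadeSketch {

    static final Path ROOT = Paths.get(""); // the data path itself

    static String effectiveSettings(Map<Path, String> ownSettings, Path dir, boolean cascade) {
        if (!cascade) {
            return ownSettings.get(dir); // only the directory's own file applies
        }
        // Walk up toward the data path; the nearest directory with its own file wins.
        for (Path current = dir; current != null; current = current.getParent()) {
            if (ownSettings.containsKey(current)) {
                return ownSettings.get(current);
            }
        }
        return ownSettings.get(ROOT);
    }

    public static void main(String[] args) {
        Map<Path, String> own = new HashMap<>();
        own.put(ROOT, "top-level");
        own.put(Paths.get("reference"), "reference-data");

        System.out.println(effectiveSettings(own, Paths.get("orders/2023"), true));     // top-level
        System.out.println(effectiveSettings(own, Paths.get("reference/codes"), true)); // reference-data
        System.out.println(effectiveSettings(own, Paths.get("orders/2023"), false));    // null
    }
}
```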