
Per project management and workflow

Vladimir Kotal edited this page Apr 1, 2020 · 32 revisions

Motivation

OpenGrok can be run with or without projects. A project is simply a directory directly underneath the OpenGrok source root directory. A project can have zero or more Source Code Management repositories underneath. In a setup without projects, all of the data has to be indexed at once. With projects, however, each project has its own index, so projects can be indexed in parallel, which speeds up the overall process.

When working with project data, there are two types of processing that can take a long time:

  • synchronization: updating project data so that it matches its origin
    • usually involves running commands like git pull in all the repositories of the given project
  • indexing: updating the index so that it matches the project data

For some projects, either or both steps can take a long time. For example, a repository may have its origin on an NFS share across the Atlantic, so latency is high, and it may use a legacy VCS that operates on individual files rather than on changesets, so synchronization is slow. Or a repository may contain so many files that the initial phase of indexing always takes a long time (the whole project directory tree is scanned for changed files) even though the incremental changes are small.

Or maybe there are lots of projects that exhibit some of these characteristics.

Previously, it was necessary to index the whole source root in order to discover new projects and add them to the configuration. The indexer either has to discover projects and their repositories during indexing preparation, or it has to know them in advance.

Starting with OpenGrok 1.1, it is possible to manage and index projects separately.

As a result, indexing the complete source root is only necessary when upgrading between OpenGrok versions with incompatible Lucene indexes.

Combine these procedures with the parallel processing tools (see repository synchronization) and you have per-project management with parallel processing.

The following examples assume that the OpenGrok install base is under the /opengrok directory.

Workflow

TODO

Building blocks

The following assumes that the opengrok-projadm, opengrok-groups and opengrok-config-merge tools are in PATH. You can install them from the opengrok-tools Python package available in the release tarball.

Projects can be managed with the opengrok-projadm tool, which uses the opengrok-config-merge tool and the RESTful API.

Configuration backup

The next sections start by suggesting that you back up the current configuration. This can be done, for example, by copying the configuration.xml file aside (it is written by the indexer when the -W indexer option is used) or by taking a file-system snapshot of the directory in which the configuration is stored.

This is a precaution in case something goes wrong.
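The "copy the file aside" variant could look roughly like the sketch below. The default path /opengrok/etc/configuration.xml and the backup directory are assumptions made for illustration, as is the backup_config function name; use whatever path your indexer's -W option actually writes to.

```shell
#!/bin/sh
# Sketch: keep a timestamped copy of the OpenGrok configuration.
# ASSUMPTION: the configuration is at /opengrok/etc/configuration.xml;
# pass the real path as the first argument if yours differs.
# backup_config is a hypothetical helper, not part of opengrok-tools.

backup_config() {
    config="${1:-/opengrok/etc/configuration.xml}"
    backup_dir="${2:-$(dirname "$config")/backup}"

    if [ ! -f "$config" ]; then
        echo "configuration file not found: $config" >&2
        return 1
    fi

    mkdir -p "$backup_dir" || return 1

    # Name the copy after the original file plus a timestamp.
    dest="$backup_dir/$(basename "$config").$(date +%Y%m%d-%H%M%S)"
    cp -p "$config" "$dest" || return 1

    # Print the location of the copy so callers can record it.
    echo "$dest"
}
```

Calling backup_config with no arguments uses the assumed default path; the function prints where the copy was written and returns non-zero if the configuration file is missing.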

Indexing a project

The indexing part of the wiki explains how to run the indexer.

TODO

Adding and indexing a project

  • back up the current configuration
  • add the project data to a directory under the source root directory
    • this usually involves running a VCS command such as git clone, extracting source code from an archive, etc.
  • perform any necessary authorization adjustments
  • add the project to the configuration (this also refreshes the configuration on disk):
   opengrok-projadm -b /opengrok -a PROJECT
  • change any per-project settings (see Web services)
  • index the project
    • it is recommended to use the opengrok-reindex-project script (it downloads fresh configuration from the webapp)
  • save the configuration (this is necessary so that the indexed flag of the project persists). The -R indexer option can be used to supply the path to a read-only configuration, which is then merged with the current configuration.
   opengrok-projadm -b /opengrok -r
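The steps above can be strung together in a small wrapper, sketched here under a few assumptions. The two opengrok-projadm invocations are the ones shown in the list. The variables OPENGROK_BASE, PROJADM and INDEX_CMD, and the function name, are hypothetical names introduced for this sketch. The arguments of opengrok-reindex-project are not covered on this page, so indexing is delegated to a command supplied by the caller.

```shell
#!/bin/sh
# Sketch of the "add and index a project" sequence.
# From the text: opengrok-projadm -b BASE -a PROJECT and -b BASE -r.
# ASSUMPTIONS (hypothetical, for this sketch only):
#   OPENGROK_BASE  install base, defaults to /opengrok
#   PROJADM        command used for project administration
#   INDEX_CMD      caller-supplied command that indexes one project,
#                  invoked as: $INDEX_CMD PROJECT

add_and_index_project() {
    project="${1:-}"
    base="${OPENGROK_BASE:-/opengrok}"
    projadm="${PROJADM:-opengrok-projadm}"
    index_cmd="${INDEX_CMD:-}"

    if [ -z "$project" ]; then
        echo "usage: add_and_index_project PROJECT" >&2
        return 2
    fi
    # Refuse to start without an indexing command, so that the project
    # is not left added but never indexed.
    if [ -z "$index_cmd" ]; then
        echo "INDEX_CMD is not set" >&2
        return 1
    fi

    # Add the project to the configuration (also refreshes it on disk).
    "$projadm" -b "$base" -a "$project" || return 1

    # Index the project; word splitting of index_cmd is intentional
    # so that it may carry its own arguments.
    $index_cmd "$project" || return 1

    # Save the configuration so that the indexed flag persists.
    "$projadm" -b "$base" -r
}
```

A call might then look like `INDEX_CMD=my_reindex_wrapper add_and_index_project myproject`, where my_reindex_wrapper is a placeholder for a site-specific script around opengrok-reindex-project. Backing up the configuration and copying the project data under the source root are left out here and still have to happen beforehand.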

Deleting a project

  • back up the current configuration
  • delete the project from the configuration (this deletes the project's index data and refreshes the on-disk configuration). The -R indexer option can be used to supply the path to a read-only configuration, which is then merged with the current configuration.
   opengrok-projadm -b /opengrok -d PROJECT
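Because deletion also removes the project's index data, it may be worth tying the backup and the deletion together so that the second never runs without the first. The following is only a sketch. The opengrok-projadm -d call is the one shown above, while the function name, the OPENGROK_BASE, OPENGROK_CONFIG and PROJADM variables, and the etc/configuration.xml default location are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: back up the configuration, then delete a project.
# From the text: opengrok-projadm -b BASE -d PROJECT.
# ASSUMPTIONS (hypothetical, for this sketch only):
#   OPENGROK_BASE    install base, defaults to /opengrok
#   OPENGROK_CONFIG  configuration file, defaults to
#                    $OPENGROK_BASE/etc/configuration.xml
#   PROJADM          command used for project administration

delete_project() {
    project="${1:-}"
    base="${OPENGROK_BASE:-/opengrok}"
    projadm="${PROJADM:-opengrok-projadm}"
    config="${OPENGROK_CONFIG:-$base/etc/configuration.xml}"

    if [ -z "$project" ]; then
        echo "usage: delete_project PROJECT" >&2
        return 2
    fi

    # Stop if there is nothing to back up or the copy fails.
    if [ ! -f "$config" ]; then
        echo "configuration file not found: $config" >&2
        return 1
    fi
    cp -p "$config" "$config.pre-delete.$project" || return 1

    # Delete the project; this removes its index data and refreshes
    # the on-disk configuration.
    "$projadm" -b "$base" -d "$project"
}
```

If the deletion turns out to be a mistake, the copy saved next to the configuration file holds the previous configuration; the index data itself is not backed up by this sketch.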