Development view

This document provides a quick introduction to the organization of the development process of Neo4j. It is intended for new developers, to help them get acquainted with the system.

First, the section Module Structure discusses how Neo4j is composed of several components and modules, which work together to provide Neo4j's functionality. Then the organization of the source code is discussed. This includes an explanation of the directory structure of the source code repository, instructions on how to build and test the system, and an overview of the configuration management of the source code.

Module Structure

This section describes how the source code of Neo4j is split up into smaller pieces called components and modules. Note that some components and modules are skipped, either because they are deprecated (e.g. cypher-plugin) or because we do not regard them as required for an initial understanding of the main system (e.g. testing-utils or the code examples used in the manual).

Main Components

Neo4j is composed of several main components. Each such component is a Maven component and may consist of one or more modules. The main components are:

  • community-build
    The main Neo4j database, which can be used under a GPL license. It corresponds to the community version described here. This component can be found in the community directory of the source repository.

    If you are looking to fix a bug or add a feature to the community edition, this is the place to be.

  • advanced-build
    This component adds extra functionality to the community-build and corresponds to the advanced version described here. The code for the extra functionality can be found in the advanced directory of the repository.

  • enterprise-build
    This enterprise build adds extra features to the advanced build and corresponds to the enterprise version described here. The source code for these extra features can be found in the enterprise directory of the repository.

  • packaging-build
    This component is responsible for packaging all the other components. It consists of the neo4j-server-qa module, containing tests, and the neo4j-standalone module, which is responsible for creating standalone installers from the other components. If the standalone installers are not working (as discussed here), this component is the place to go.

  • neo4j-manual
    This component pulls together the documentation of the other components and generates a single manual from it.

    This component is only important if you want to change the main outline of the manual. For example, to add a new section or make a new top-level chapter. If you are looking to fix or expand existing documentation, the specific manual files can be found in the directory of the component that they document.

Main Modules

[Figure: module-overview]
This figure gives an overview of the modules of which the main components are comprised. The arrows indicate dependencies between modules. Some modules and their dependencies are colored to help keep crossing arrows apart.

We will now briefly describe the modules and their roles.

Community Modules

  • server-api
    This module provides classes which can be used to create plugins on the Neo4j server. More information about its usage can be found here.

  • neo4j-udc
    This module is a Usage Data Collector which can gather usage data to help improve Neo4j. For more information on what data it gathers and how it can be disabled, see the manual.

  • neo4j-graphviz
    The graphviz module is a library that allows visualization of Neo4j graph data using graphviz. For more information about its usage, see this blogpost by Peter Neubauer.

  • neo4j-jmx
    The jmx module offers a JMX interface to different Neo4j modules. Its details and usage information can be found here.

  • neo4j-graph-algo
    This module offers a number of graph algorithms that can be run on your graphs. They can be called using any of the interfaces to the system, such as the REST API as documented here, or the traversal API as documented here. The graph algorithms are also integrated into the Cypher Query Language, as illustrated here.

  • neo4j-graph-matching
    This module provides an API to perform pattern matching on graphs. It is mainly intended for internal use by the Cypher Execution Engine. Although its API can be used when using an embedded version of Neo4j, this is not recommended.

  • neo4j-lucene-index
    This module provides indexing capabilities that allow users to look up nodes based on their properties. As described in the manual, this way of indexing is no longer recommended if schema indexing would suffice.

  • neo4j-server
    The community version of the Neo4j server. It contains the functionality of the REST API and the WebAdmin tool.

  • neo4j/neo4j-community
    The neo4j module and the neo4j-community module are nearly identical. The overview only shows the dependencies of neo4j-community, but these are the same as those of the neo4j module. Both modules are dependencies of other modules and are therefore still in use; they both exist for historical reasons. It is important to note, however, that only the neo4j module contains documentation.

    Each of these modules is simply a meta-package that allows you to include all of its dependencies at once.

  • neo4j-shell
    The shell module provides a simple shell to monitor a Neo4j database. It is documented here and its manpage can be found in the manpages section of the manual.

  • neo4j-cypher
    This module provides the Cypher Execution Engine, which allows users to query using the Cypher Query Language. As it is partly written in Scala, working with this module requires some extra setup as discussed here.

  • neo4j-kernel
    The kernel is the core of Neo4j. It contains the custom storage system, the embedded API of Neo4j and transaction support as listed here.

Advanced Modules

  • neo4j-server-advanced
    This module contains the extra monitoring features of the advanced version of the Neo4j server.
  • neo4j-advanced
    Just like the neo4j-community module, this is simply a meta-package that allows easy inclusion of the other Maven modules. This meta-package includes the modules from the advanced version and also includes neo4j-community.
  • neo4j-management
    The management module extends the neo4j-jmx module with extra monitoring features. These features are documented here.

Enterprise Modules

  • neo4j-ha
    The high availability (ha) module allows the Neo4j server to be clustered, to allow for fault-tolerance and read-scalability as discussed here.
  • neo4j-cluster
    This module is a library that provides Heartbeat and Paxos (http://en.wikipedia.org/wiki/Paxos_(computer_science)) implementations, which are used by the high availability cluster.
  • neo4j-backup
    This module makes it easy to create backups, even from remote machines. The features of this module and its usage are documented here, and the manpage can be found in the manpages section of the manual.
  • neo4j-com
    The communication module supports the communication between the nodes in the high availability cluster.
  • neo4j-consistency-check
    This module contains a tool to check the consistency of a Neo4j data store. It is used by the backup module.
  • neo4j-server-enterprise
    This version of the Neo4j server incorporates the high availability and clustering features into the Neo4j server. It also contains all the features of the advanced and community server.
  • neo4j-enterprise
    This meta package can be used to easily include a lot of the other modules of Neo4j.

Codeline Model

This section covers the codeline organization of Neo4j. The code is currently hosted on GitHub and is mainly located in the Neo4j repository.

First, an overview of the directory structure of the repository is given. Then, the build and test approach is discussed. Finally, the use of Git and GitHub for source code configuration management is discussed.

Overview of the directory structure

The main repository reflects the structure of components and modules as discussed in the Module Structure section.

The top-level directories in the repository contain the main components:

  • community contains the community-build component
  • advanced contains the advanced-build component
  • enterprise contains the enterprise-build component
  • packaging contains the packaging-build component
  • manual contains the neo4j-manual component

Inside these component directories, you will find a subdirectory for each module. For example, the community/cypher directory contains the neo4j-cypher module contained in the community component. The directory names may differ a bit from the module names (cypher versus neo4j-cypher), but it should not be a problem to figure out where to find a specific module.

Each module is organized according to Maven conventions. Source code can be found in src/main/java for Java code and src/main/scala for Scala code. Tests are located in the src/test/java and src/test/scala directories. More information about the Maven conventions can be found here.

Finally, each module can have documentation. This documentation is located in the src/docs folder, which is organized as described here. These documentation files can be incorporated into the manual by including them in the neo4j-manual component.
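As a rough illustration of the layout described above, the following Python sketch derives the conventional locations inside a module directory. The component-to-directory mapping and the sub-directory names are taken from this document; the function name `module_paths` and the dictionary keys are hypothetical and only serve the example. Not every module will actually contain all of these directories (for instance, only modules with Scala code have a src/main/scala directory).

```python
from pathlib import PurePosixPath

# Top-level directory of each main component, as listed in this document.
COMPONENT_DIRS = {
    "community-build": "community",
    "advanced-build": "advanced",
    "enterprise-build": "enterprise",
    "packaging-build": "packaging",
    "neo4j-manual": "manual",
}

# Conventional sub-directories of a module (Maven layout plus src/docs).
# The keys are hypothetical labels chosen for this sketch.
MODULE_LAYOUT = {
    "java_sources": "src/main/java",
    "scala_sources": "src/main/scala",
    "java_tests": "src/test/java",
    "scala_tests": "src/test/scala",
    "docs": "src/docs",
}


def module_paths(component, module_dir):
    """Return the conventional paths of a module, relative to the repository root.

    `module_dir` is the directory name of the module (e.g. "cypher"),
    which may differ from the module name (e.g. "neo4j-cypher").
    """
    root = PurePosixPath(COMPONENT_DIRS[component]) / module_dir
    return {role: root / sub for role, sub in MODULE_LAYOUT.items()}


if __name__ == "__main__":
    for role, path in module_paths("community-build", "cypher").items():
        print(f"{role}: {path}")
```

For the neo4j-cypher module this lists, among others, community/cypher/src/main/scala for the Scala sources and community/cypher/src/docs for the documentation.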

Build, Integration, Test approach

The source code of Neo4j can be built and tested using Apache Maven, a build automation tool used primarily for Java projects.

To build from the sources and run the unit tests, a simple mvn clean install in the main repository should suffice. If you don't want to run the unit tests, add -DskipTests to the Maven call; the tests are then still compiled but not executed. If you don't even want to compile the tests, use -Dmaven.test.skip=true instead. For more information about building Neo4j, please consult the main readme. For further instructions on building the manual, please refer to the manual component's readme.
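The three build variants can be summarized in a small Python sketch that assembles the corresponding Maven command line. The Maven goals and flags are the ones named above; the helper `maven_build_command` and its parameter names are hypothetical and exist only for this illustration.

```python
def maven_build_command(skip_tests=False, skip_test_compilation=False):
    """Assemble the Maven call for building Neo4j from the repository root.

    skip_tests            -> compile the tests but do not run them
    skip_test_compilation -> neither compile nor run the tests
    """
    command = ["mvn", "clean", "install"]
    if skip_test_compilation:
        # Skipping compilation implies that the tests cannot run either.
        command.append("-Dmaven.test.skip=true")
    elif skip_tests:
        command.append("-DskipTests")
    return command


if __name__ == "__main__":
    print(" ".join(maven_build_command()))
    print(" ".join(maven_build_command(skip_tests=True)))
    print(" ".join(maven_build_command(skip_test_compilation=True)))
```

The returned list could be passed to `subprocess.run(command, cwd=<repository root>, check=True)` to start the build; typing the printed command in a shell inside the main repository has the same effect.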

Unit tests must be named according to the configuration of the Maven Surefire plugin, otherwise they will not be picked up. At the time of writing, this configuration can be found in the grandparent pom file, and the following names are allowed (using * as a wildcard for any number of characters):

  • Test*.java
  • *Test.java
  • *Tests.java
  • *TestCase.java

For unit tests related to the documentation there is additional configuration in the main repository's pom file, which allows the following names:

  • DocTest*.java
  • *DocTest.java
  • *DocTests.java
  • *DocTestCase.java

Integration tests are run using the Maven Failsafe plugin, so they should be named according to its configuration. At the time of writing the default configuration is used, so please name your integration tests accordingly:

  • IT*.java
  • *IT.java
  • *ITCase.java

Again, there is extra configuration for the test cases related to documentation. This also allows the following names:

  • DocIT*.java
  • *DocIT.java
  • *DocITCase.java
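To check quickly which of the naming rules above a test file falls under, the patterns can be matched with a few lines of Python. This is only a sketch: the pattern lists are copied from this document rather than read from the pom files, the matching is done on the bare file name only (the real plugin configuration may include directory prefixes), and the category labels, the function `matching_categories` and the example file names are hypothetical.

```python
from fnmatch import fnmatchcase

# File name patterns per test category, copied from the lists above.
NAME_PATTERNS = {
    "unit test": ["Test*.java", "*Test.java", "*Tests.java", "*TestCase.java"],
    "documentation unit test": [
        "DocTest*.java", "*DocTest.java", "*DocTests.java", "*DocTestCase.java",
    ],
    "integration test": ["IT*.java", "*IT.java", "*ITCase.java"],
    "documentation integration test": [
        "DocIT*.java", "*DocIT.java", "*DocITCase.java",
    ],
}


def matching_categories(file_name):
    """Return every category whose patterns match the given file name.

    Matching is case-sensitive. A name can match more than one category,
    because e.g. *DocTest.java is a special case of *Test.java.
    """
    return [
        category
        for category, patterns in NAME_PATTERNS.items()
        if any(fnmatchcase(file_name, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the patterns.
    for name in ["NodeTest.java", "BackupIT.java", "CypherDocTest.java", "Node.java"]:
        print(name, "->", matching_categories(name) or "not recognized as a test")
```

A file name for which the list comes back empty, such as Node.java here, matches none of the patterns listed in this document.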

Contributing Process

If you want to contribute to the system, please read this section of the manual, which describes some general guidelines for contributing. Note that test-driven development (write tests first, code later) is recommended. Your contribution should adhere to the structure as described in section Overview of the directory structure.

Configuration management

Git is used as the version control system for the source code. Work can be done on different releases at the same time, as each release is located on its own branch.

Configuration files for the Eclipse and IntelliJ IDEs can be found here. These files will configure your IDE to use the Neo4j coding style. There is also a configuration file for PMD located here, which can help you find bugs or code smells in your contribution.