Contributors
If you are interested in being a contributor, we will need you to complete a contributor agreement. Please contact the project managers for more details.
To edit the wiki, you must have write access to the blazegraph/database GitHub repository.
Please use the GitHub Issues for project discussion.
The previous SourceForge developer list is archived and can be searched using this link:
http://sourceforge.net/mailarchive/forum.php?forum_name=bigdata-developers
The main development branch should remain stable at all times. Releases are tagged as branches for maintenance. Major change sets should be created in branches (see below). Discussion regarding the project should take place on the developers list so everyone can participate and benefit.
Consistency and coherence in the architecture and the implementation are critical for database correctness and performance. Coordinate with component owners before making changes to those components. When in doubt, ask first on the developers list. Final resolution of questions concerning the database architecture will be made by the project administrators.
Issues are maintained on JIRA.
Developers must:
- file an issue on jira.blazegraph.com for any planned work;
- accept the issue before making changes; and
- update the status for accepted issues at least weekly (Friday morning).
This gives everyone oversight of planned and active change sets via the JIRA dashboard and makes it easier to minimize conflicts in the code base.
The proper process for getting changes into the code base is:
- Discuss the feature on GitHub. Do this first to make sure that the concept has traction with the developer community. Make sure that you are subscribed to the mailing list first since it will not accept email if you are not subscribed.
- Create a ticket for the feature.
- Create a feature branch.
- Do your work in that branch.
- Make sure that you have not broken the tests.
- Create a pull request.
Do not commit directly to master. Changes will be merged into master from the pull request by one of the project maintainers.
EGit has an extremely useful feature that lets you hover over a line of code to see who last modified it. Make sure that EGit is installed and configure the Git perspective to point to your local Git repository. Then right-click in an editor and select Team => Show Annotations.
We strongly recommend taking an hour to work through a Git tutorial:
https://www.atlassian.com/git/tutorials/using-branches
Branching and merging are much, much easier under Git. To create your own new branch:
git checkout -b my_branch
To check out someone else's branch:
git checkout --track origin/daves_branch
To switch back to master:
git checkout master
Note: The following puts you in a detached HEAD state, in which you are not on any local branch and any new commits will not belong to a branch:
git checkout origin/master
This is generally undesirable. To recover from it, run:
git checkout master
To check out a tagged release:
git checkout tags/BIGDATA_RELEASE_1_5_0
Again, this puts you in a detached HEAD state, so run
git checkout master
to get back to master.
If your feature branch is behind master, you can merge in the latest changes from master with the following command (run git fetch first so that origin/master is up to date):
git merge origin/master
Individual developers interested in exploring new concepts may create a private branch to serve as a sandbox in which they can explore those ideas without introducing changes into master.
To discard all changes and revert to a previous commit, first find the commit to restore (or just look at the history on GitHub):
git log
Reset to that commit point:
git reset --hard FULL-COMMIT-HASH
CI results are available at [2]. You can download the results of test suite runs. There is an additional artifact for analyzing the logs of the HA CI test suite. If you need a specific branch to be added to CI, please contact one of the project admins or, if you have GitHub access, follow the guide below.
Jenkins is accessed using your GitHub credentials. It is configured to pull automatically from GitHub and to dynamically spawn up to four EC2 instances to handle the CI workload. In general, you should not need to create new Jenkins jobs, since CI should be run through the GitHub pull request integration.
ANT_OPTIONS="-XX:MaxPermSize=256m -Dfile.encoding=UTF-8 -Xmx8g -server -Dsun.jnu.encoding=UTF-8"
The maven options are set in the global Jenkins configuration, but are also included here for reference.
MAVEN_OPTIONS="-XX:MaxPermSize=256m -Dfile.encoding=UTF-8 -Xmx8g -server -Dsun.jnu.encoding=UTF-8"
To get a thread dump, you must have your SSH public key installed on the Jenkins SSH slave EC2 image. Create a JIRA ticket to request this and include your public SSH key. Then determine the IP address of the slave that ran the job and ssh directly to that machine to grab the thread dump.
Generating an SSH Public Key
ssh-keygen -t rsa -b 2048 -f ~/.ssh/blazegraph
Press Enter twice for a blank passphrase, or choose one.
cat ~/.ssh/blazegraph.pub
Include the output of your public key in the JIRA ticket.
To initiate a CI run from GitHub, create a pull request (PR) on GitHub; the CI job will run automatically. If you need to retest without a code change, add a comment to the PR containing the text "Computer, please test this". This triggers a CI run in Jenkins. The results are posted back to the PR, and you can take appropriate action based on the CI outcome (Success, Failure, or Error).
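Jenkins jobs are named according to the following patterns, where * stands for the GitHub module name:
- * : CI job for master for that GitHub module. The exception is the bigdata master, which is called GIT_DEVELOPMENT_MAVEN.
- *-PR-tester : pull request tester for that GitHub module, e.g. bigdata-github-maven-PR-tester.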
Bigdata has a large and growing test suite. Whether you write the unit tests first or afterwards, do not commit code without writing tests for it and running the test suites that cover your changes and any affected modules. When in doubt, ask, or run the entire test suite. After you commit, please review the CI results to see whether you have broken anything.
Some of the test suites use a "proxy" pattern that allows the same test suite to execute against different implementations, or different parameterizations of a given implementation. This feature is heavily used to:
- exercise different backend storage models (the RWStore, MemStore, etc.);
- run (nearly identical) test suites in triples vs RDR vs quads modes; and
- exercise the REST API test suite against both embedded and scale-out architectures.
You can specify the delegate for the proxy using
-DtestClass=fully-qualified-class-name
You may need to hunt around a little (typically in the TestAll suite) to find the proxy class names that can be used with a given proxy test suite. Some of the common ones are:
- TestBigdataSailWithQuads
- TestLocalTripleStore
- TestRWJournal
- TestWORMStrategy
- TestNanoSparqlServerWithProxyIndexManager
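The proxy pattern can be illustrated with a minimal, self-contained sketch. This is not Blazegraph's actual test harness: the interface `Delegate`, the classes `RWDelegate` and `MemDelegate`, and the methods `resolveDelegate` and `testBackendIsNamed` are hypothetical names chosen for illustration. The idea shown is that a check is written once against an interface, and the concrete delegate is chosen at run time from the `testClass` system property (set with `-DtestClass=...`), falling back to a default when the property is absent.

```java
/**
 * Minimal sketch of a "proxy" test suite. All names are hypothetical;
 * this is not Blazegraph's actual test harness.
 */
public class ProxySuiteSketch {

    /** Hypothetical contract implemented by each delegate. */
    public interface Delegate {
        String backendName();
    }

    /** Hypothetical stand-in for a delegate such as TestRWJournal. */
    public static class RWDelegate implements Delegate {
        @Override
        public String backendName() {
            return "RWStore";
        }
    }

    /** Hypothetical stand-in for an in-memory backend delegate. */
    public static class MemDelegate implements Delegate {
        @Override
        public String backendName() {
            return "MemStore";
        }
    }

    /**
     * Instantiates the class named by -DtestClass, or the fallback when
     * the property is not set. The named class needs an accessible
     * no-arg constructor.
     */
    public static Delegate resolveDelegate(
            final Class<? extends Delegate> fallback) {
        final String name = System.getProperty("testClass");
        try {
            final Class<?> cls =
                    (name == null) ? fallback : Class.forName(name);
            return (Delegate) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Cannot instantiate delegate: " + name, e);
        }
    }

    /** A check written once, against the Delegate interface only. */
    public static boolean testBackendIsNamed(final Delegate d) {
        final String n = d.backendName();
        return n != null && !n.isEmpty();
    }

    public static void main(final String[] args) {
        final Delegate d = resolveDelegate(RWDelegate.class);
        System.out.println(d.backendName() + ": "
                + (testBackendIsNamed(d) ? "passed" : "FAILED"));
    }
}
```

For example, `java -DtestClass='ProxySuiteSketch$MemDelegate' ProxySuiteSketch` would run the same check against the in-memory stand-in. The `$` is the JVM binary name separator for a nested class, quoted so the shell does not expand it.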
You can run the entire test suite using:
mvn clean package
You can run the tests in an individual class in the test suite using:
mvn clean package -Dtest=com.bigdata.journal.jini.ha.TestHA1GroupCommit
An example using the proxy test suite from the command line:
mvn -DtestClass=com.bigdata.rdf.sail.webapp.TestNanoSparqlServerWithProxyIndexManager -Dtest=Test_REST_ASK test
Many of the test suites can be run directly under Eclipse. However, some test suites depend on external services that must be running before the tests are executed:
- The external text index feature depends on SOLR
These external resources are set up through the Maven POM associated with the appropriate projects. You can also start these resources yourself; there are examples of how to do this for HA/scale-out at the bottom of this page.
To profile a CI job with YourKit, add the following under Manage Jenkins => Configure, in "Global Properties" => "Environment variables". The specific path depends on the version of YourKit installed on the CI node.
LD_LIBRARY_PATH /nas/install/yjp-2014-build-14100/bin/linux-x86-64
Add the following to the Advanced options of the Jenkins project configuration (e.g., where it says "-server -ea", etc.). This command starts the profiler with everything disabled; once you connect to the process, you can selectively enable features. Replace port=XXXXX with something like port=10001. This is the port you will use to connect to YourKit. See here for background on setting this up.
-agentlib:yjpagent=disableexceptiontelemetry,disablestacktelemetry,port=XXXXX
Set up local port forwarding to the CI machine and ssh into it. Again, replace XXXXX with the specific port.
# ~/.ssh/config
Host ci.bigdata.com
#...
LocalForward XXXXX localhost:XXXXX
You can then start YourKit locally and connect to the running CI job (if any).
Everyone who is a contributor is bound by a signed contributor license agreement (CLA).
Your contributions MUST be your own work. DO NOT incorporate code from other projects or other sources. There MUST be an explicit contribution made to the copyright holders before 3rd party intellectual property may be incorporated into the project. Please refer any such matters to the project administrators.
The choice of a dependency is very important and must be made in consultation with the project administrators. In addition to choosing technically sound dependencies, there are also a number of legal rules that must be followed to properly acknowledge the copyright for the dependency and a number of administrative tasks that must be performed to ensure that the dependency is correctly integrated into development, CI, and the various deployment environments.
You MUST NOT add a dependency without contacting the project administrators.
The following all need to be addressed when adding a dependency:
File or directory | Required action |
---|---|
build.properties | The dependency version number needs to be declared. |
build.xml | The dependency needs to be integrated into the WAR, stage, bundleJar, javadoc (external links), and various other deployment targets. This is both tricky and vital. |
pom.xml | The dependency needs to be declared. |
Depends.java | The dependency needs to be declared. This is responsible for generating the list of dependencies at runtime as part of the banner. |
bigdata-XXX/lib | The dependency needs to be placed into an appropriate library directory within the correct bigdata module. The choice of module depends on the scope in which the dependency will be used. |
bigdata-XXX/LEGAL | The license for the dependency must be placed into the LEGAL directory within the module in which the dependency is housed. The name of the license should include the name of the dependency. E.g., "jetty-license.txt". Many dependencies have the same license, but a separate license file MUST be present for each dependency. |
bigdata/NOTICE | This file must include any text from a NOTICE file associated with the dependency. This is a requirement of the Apache license! |
You MUST verify that the license associated with a dependency has not changed BEFORE updating that dependency.
You MUST NOT update a dependency if there has been a license change. Instead, refer the matter to the project administrators.
The correct comment block for the head of each source file is the GPL license block as follows:
/*
Copyright (C) SYSTAP, LLC 2006-2014. All rights reserved.
Contact:
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
[email protected]
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
Author tags should be provided on each class you create and on each class where you make major changes. This helps us track who is most knowledgeable about a given class.
Please use the following tags to mark todos in the code:
- FIXME - Use for more important tasks.
- TODO - Use for minor tasks or possible future directions in the code.
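As an illustration of the author tag and the FIXME/TODO markers, here is a small hypothetical class. The author name, e-mail address, and the `sum` method are placeholders, and the exact form of the author tag used in the code base may differ. The GPL license header is omitted for brevity; real source files must start with the license block above.

```java
/**
 * Hypothetical class illustrating the author tag and the FIXME/TODO
 * markers. The name and e-mail address below are placeholders.
 *
 * @author <a href="mailto:jane.doe@example.com">Jane Doe</a>
 */
public class CommentTagsExample {

    /**
     * Returns the sum of the given values (0 for an empty array).
     */
    public static long sum(final int[] values) {
        // FIXME A null argument currently throws NullPointerException;
        // decide whether to reject it explicitly or treat it as empty.
        long total = 0L;
        for (final int v : values) {
            total += v;
        }
        // TODO Consider a parallel reduction for very large arrays.
        return total;
    }

    public static void main(final String[] args) {
        System.out.println(sum(new int[] {1, 2, 3}));
    }
}
```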
Please:
- Set margins to 80 columns.
- Wrap comments and code at the margin.
- Set display width of tabs to 4 spaces and set editor to convert tabs to spaces (4). In Eclipse, to set tabs to spaces, there are two settings that must be updated:
- Preferences => Java => Code Style => Formatter => Indentation => Tab Policy := Spaces Only
- Preferences => General => Editors => Text Editors => [x] Insert spaces for tabs
- Please do not broadly reformat existing code, especially code for which you are not the primary maintainer, since that makes it significantly more difficult to handle merges.
Each class that produces log output should declare its own logger. Loggers should be private, static, and final. Logging at INFO, DEBUG, or TRACE MUST be conditional, using the pattern:
if (log.isInfoEnabled()) {
    log.info(...);
}
Conditional logging is critical for performance. Generating log messages (anything other than a constant string such as "Hello") produces a tremendous amount of heap churn from String concatenation. Heap churn is evil and must be avoided for performance, hence the conditional logging pattern.
Do NOT use either System.out or System.err in anything other than a main() routine. It is very difficult to locate the code that produces such output, and unconditional output not only churns the heap but also clogs the CI servers, since CI buffers the output of the test suite in memory during the test run.
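The following sketch shows a private static final logger and the conditional logging guard. It uses `java.util.logging` from the JDK instead of log4j only so that it runs without extra dependencies; here `isLoggable(Level.FINE)` plays roughly the role of log4j's `isDebugEnabled()`. The class name and the `messagesBuilt` counter are hypothetical; the counter exists only to make visible that the message string is never built when the level is disabled. Console output appears only in `main()`, as required above.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Conditional logging sketch using java.util.logging as a stand-in for
 * log4j. All names here are hypothetical.
 */
public class ConditionalLoggingExample {

    /** One private, static, final logger per class. */
    private static final Logger log =
            Logger.getLogger(ConditionalLoggingExample.class.getName());

    /** Counts how often a log message was actually built (demo only). */
    static int messagesBuilt = 0;

    /** Builds the message; the concatenation allocates new Strings. */
    private static String describe(final int count, final long total) {
        messagesBuilt++;
        return "processed " + count + " values, total=" + total;
    }

    /** Sums the values, logging a summary at FINE (~ log4j DEBUG). */
    public static long process(final int[] values) {
        long total = 0L;
        for (final int v : values) {
            total += v;
        }
        // Guard: describe() runs only when FINE is enabled, so no
        // message Strings are built (no heap churn) otherwise.
        if (log.isLoggable(Level.FINE)) {
            log.fine(describe(values.length, total));
        }
        return total;
    }

    /** Console output is acceptable here because this is main(). */
    public static void main(final String[] args) {
        final long total = process(new int[] {1, 2, 3});
        System.out.println("total=" + total
                + ", messagesBuilt=" + messagesBuilt);
    }
}
```

The guard costs one level check per call; the String concatenation inside `describe()` is paid only when someone has actually enabled that level.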
Eclipse-based developers can colorize their console output using grep-console. The defaults colorize java.util.logging output; they can be edited (by removing the square brackets) to also colorize log4j output.