-
Notifications
You must be signed in to change notification settings - Fork 0
Contributing
Pravega really encourages anyone to contribute and to make our system better. We ask contributors, however, to read this guide carefully and follow the established guidelines. We don't claim this is perfect, so these guidelines will change over time as we feel that it is not complete or appropriate. Suggestions on how to improve this guide for contributors are also welcome.
To produce a pull request against Pravega, follow these steps:
- Create an issue: Create an issue and describe what you are trying to solve. It doesn't matter whether it is a new feature, a bug fix, or an improvement. All pull requests need to be associated to an issue.
-
Issue branch: Create a new branch on your clone of the repository. Typically, you need to branch off master, but there could be exceptions. To branch off
master
, usegit checkout master; git checkout -b <new-branch-name>
. -
Push the changes: To be able to create a pull request, push the changes to origin:
git push --set-upstream origin <new-branch-name>
. I'm assuming thatorigin
is your personal repo, e.g.,fpj/pravega.git
. -
Branch name: Use the following pattern to create your new branch name:
issue-number-description
, e.g.,issue-1023-reformat-testutils
. -
Create a pull request: Github gives you the option of creating a pull request. Give it a title following this format
Issue ###: Description
, _e.g.,Issue 1023: Reformat testutils
. Follow the guidelines in the description and try to provide as much information as possible to help the reviewer understand what is being addressed. It is important that you try to do a good job with the description to make the job of the code reviewer easier. A good description not only reduces review time, but also reduces the probability of a misunderstanding with the pull request. -
Merging: If you are a committer and you have write access, make sure that the merge message follows these points:
- It needs to start with the title of the pull request plus the pull request number, i.e.,
Issue ###: Description (#PR number)
. - Clean up the description and remove any commit messages that aren't relevant, like
PR comments
orFixed typo
. Ideally, replace all the text in the text box with the text underChange log description
in the pull request description. I say ideally because it depends on how good of a job the developer has done with the description.
- It needs to start with the title of the pull request plus the pull request number, i.e.,
Another important point to consider is how to keep up with changes against the base the branch (the one your pull request is comparing against). Let's assume that the base branch is master
. To make sure that your changes reflect the recent commits, we recommend that you rebase frequently. The command we suggest you use is:
git pull --rebase upstream master
git push --force origin <pr-branch-name>
Very important: in the above, I'm assuming that:
-
upstream
ispravega/pravega.git
-
origin
isyouraccount/pravega.git
The rebase might introduce conflicts, so you better do it frequently to avoid outrageous sessions of conflict resolving.
The base for these is Oracle Code Conventions - a pretty verbose document, if you have time to read it.
Please refer to our checkstyle rules for all enabled checkstyle rules. The same folder has code style files that can be imported in IntelliJ and Eclipse. Use them.
-
Comments
- Every comment starts with a space, followed by a capital letter (just like a real sentence). Except commented out code.
- Every comment ends with a period (it's a sentence).
- Empty line required before a line of comment (except if precended by another comment line or the first line in a code block)
- No 'author' JavaDoc tags.
-
Code format
- Empty line after the closing brace of each code block.
- If a line needs to wrap around, place each parameter on its own line (i.e., if we have 3 params on a line, then 1, then 2, it looks really ugly)
- Order class components like this:
- Class variable declarations
- Constructor
- Auto-closeable implementation (if applicable)
- Public methods (includes any interface implementations)
- Package-private methods
- Protected methods
- Private methods
- Static methods (public first, followed by private)
- Nested classes, interfaces, enums
-
Annotations
- Use the @Override attribute to mark every method that is implemented from an interface or overridden from a super class.
- All guarded fields should be annotated by '@GuardedBy'.
- Use Lombok for equals()/hash() if possible.
-
Class organization
- No public nested classes. The only exception is data-only classes (no logic).
- For example, if class Foo has class Bar that only has get() methods, this is OK. But if Bar has other complex logic baked in it, then Bar needs to be its top level class.
- Public nested interfaces or enums are OK, as long as they are mainly used inside the hosting class (in method signatures) and not used extensively outside of the class.
- Every class, constructor, method should have the minimum possible visibility. If only private is needed, then make it private. Otherwise package-private, protected, public.
- If a method needs to be made visible for testing, use the '@VisibleForTesting' tag.
- Every public method should do its best to validate arguments, by means of using Preconditions (guava) or Exceptions (our extensions to Preconditions)
- No singletons. Static constructors or factories are preferred.
- No static non-final member variables.
- No public nested classes. The only exception is data-only classes (no logic).
-
Code organization
- Imports at the top of the file should be ordered alphabetically followed by all of the static imports ordered alphabetically.
- Use "assert" to validate state when needed (after complex internal computations or at the beginning of private methods)
- Don't hide a member variable with a parameter name or local variable name. Exception: constructor arguments.
- All members should be final if possible.
- No code labels.
- Methods whose name starts with 'get' should not block the current thread. Return a Future if you need to do async work.
- Check indentation (currently disabled because checkstyle does not like lambda expressions).
- No tabs and 4-space indentation.
- Annotations should go on their own line above the thing they are annotating (AnnotationLocation with default values)
- Max line length (120).
- Javadoc
- Enforce Javadoc on every public method of every protected/public class (currently disabled because there's lots of places where this is missing).
- No fallthrough in switch statement. (IE: each case should end with break;)
- No newlines between the keyword 'new' and name of the class being instantiated.
All top level and non-private classes should be thread safe.
This is accomplished as follows:
- If a class can be made immutable; it should be.
- This is the easiest way to ensure thread safety.
- If a class has members that change; they should be protected by a lock.
- When this lock is held, calls to 'external' objects that are not 'wholly owned' by the class with the lock should not be performed. This generally implies, but does not strictly require that blocking operations not be performed while a lock is held.
- If a blocking operation is performed under lock, it should be very obvious and clear that it is absolutely necessary.
- If there is a non-threadsafe class it should be annotated @NotThreadSafe and the class that holds it is responsible for its synchronization.
- This implies that any non-threadsafe class must have exactly one owner and should not be passed around.
- When a member is protected by a lock it should be annotated with @GuardedBy and the name of the lock object.
- Prefer Lombok's synchronized to the Java synchronized keyword.
- Classes passed into or returned from methods should not be used to circumvent ownership, lock ordering etc. IE: Passing a mutable map to a setter is not ok, even if the map itself is threadsafe as the caller could retain a copy of it and modify it later to the surprise of the class holding it.
- This is generally not a problem for "top level" classes being "wired together", but other mutable class being passed into or returned from a method should be viewed with suspicion, and avoided if possible.
Within the Pravega host there are some more specific rules:
- Multiple operations can occur in parallel to each of the components of the system, even involving the same stream.
- All modifications to stream data should have a strictly defined order.
- Updates to the state for a stream should be atomic and involve no IO.
- Each read should see data from one version. (IE: If events 1, 2, 3 are written it is unacceptable for 1 and 3 to be visible but not 2)
- All IO proceeds in parallel and is performed by dedicated pools.
- Callbacks should not perform blocking operations. They may update in-memory objects and queue blocking operations to be done later.
To be consistent on what log levels mean what, use the following as a rough guide:
- Errors - Something bad happened that can't be automatically be handled, e.g., "There is no configured endpoint to connect to", "Data from X is corrupted".
- Warnings - Something bad has happened that can be handled, but you still might want to alarm on if there is a ton of them, e.g., "A connection has dropped".
- Info - Anything that would be required to be known by someone looking at a log to understand the state of the server in order to make sense of a strange error that happened subsequently. Note that this should not be frequent or very detailed that it imposes a significant performance penalty, e.g., "Connecting to host X."
- Debug - Information that is useful for debugging, it could be frequent and detailed. Keep in mind that occasionally we will want to run in production with debug on, so there should be some effort to minimize the impact to performance. E.g., "Event published successfully"
- Trace - Anything too detailed or frequent to be Debug. We do not expect production systems to run with trace level enabled.
Full stack traces should be logged at Error levels, but never at info levels or below. They should be logged at Warn level only if it is likely to have anything interesting. (e.g., A connection drop would not).
-
Specificity
- Applicability: everywhere.
- Make sure you throw adequate/specific exceptions, so that when someone does catch them, or when they see it in a log, they have a better idea on what to do.
- Example 1: Do not throw general "Exception" - unless you have no other choice.
- Example 2: Do not throw general "IOException" if you have a more specific exception that derives from IOException.
- Example 3: Do not throw general IllegalStateException if you have a more specific exception. It is OK to use Preconditions.checkState to throw this, (see below), but in case you want to indicate an object has been closed and is no longer usable, you should use ObjectClosedException (which derives from IllegalStateException, but delivers a more specific message) - callers can still catch IllegalStateException (which will handle both), or they may decide to catch ObjectClosedException and take action based on that alone.
- Create specific exceptions that derive from a general exception (example: ObjectClosedException extends IllegalStateException)
- Declaring exceptions
- You should declare each type of checked (non-runtime) exception that you may throw
- It is desirable to aggregate some of these into their closest parent exception (i.e., IOException is good, but Throwable is not OK) if the list is longer than 4 , but you need to document them in the method's JavaDoc.
- Ex1: if you have E1, E2, E3, E4 extend IOException, and method foo() throws all of them, it is OK for foo() to declare it throws IOException (provided its JavaDoc describes everything).
- Ex2: If you have E1, E2, E3, E4 extend IOException, but method foo() throws only E1 and E2, then foo() must declare it throws E1 and E2.
-
API/3rd party exceptions
- Applicability: in those modules/classes dealing directly with external (non-Pravega, non Java-library) components.
- Rethrow external exceptions wrapped in your own exception so you still have a trace of the original, without forcing the caller to have a hard dependency on that external API. Not doing this would break encapsulation - your code's users would need to know how your code works and design around it.
- A deviation from this rule is acceptable when the original is a type built into Java that makes sense. However, bear in mind that if you are working on an abstraction layer (such at Tier 1/2/Cache abstraction), then you may want to translate the actual implementation's exceptions into your own standard; otherwise every time you change the implementation you'll get a different set of exceptions, which may cause the calling code not to behave properly.
-
Preconditions
- Applicability
- Public methods arguments
- Constructor arguments (especially in those constructors which just store an argument into a private member)
- Usage
- Check whether the operation is valid given the current state of the object
- Preconditions.checkState()
- Exceptions.checkNotClosed - for those objects implementing AutoCloseable.
- Check whether the arguments are valid
- Preconditions.checkNotNull for null arguments
- Exceptions.checkNotNullOrEmpty for string arguments that cannot be null or empty
- Preconditions.checkArgument will generally satisfy the rest
- Check out Preconditions and Exceptions for a whole list of checks that can be used.
- Check whether the operation is valid given the current state of the object
- Applicability
-
Assertions
- Applicability:
- Private methods arguments
- Non-trivial computations
- Usage
- Use assertions (assert : ) to validate that your input/output/invariant is as expected
- Notes
- Assertions are not enabled by default in Java. It is highly recommended you run your JVM with the "-ea" options to enable this locally.
- If you want assertions that always fire, consider doing "if() { throw new AssertionError();}". This will always execute like regular Java code regardlessof whether "-ea" is passed or not.
- If assertions are disabled, or if is false, the statement on the right () is never evaluated, and it will have no consequences.
- Assertions are not enabled by default in Java. It is highly recommended you run your JVM with the "-ea" options to enable this locally.
- Applicability:
- If you don't know what to do with an exception, don't catch it.
- When catching an exception, never print it to the console, whether using System.out.print or Exception.printStackTrace. This is ok for your own local development, but it makes no sense in a service environment.
- Don't catch, log and then rethrow the exception, because this results in the exception being duplicated in the log.
- When rethrowing the exception, make sure you include the original exception as "cause" - otherwise it will be lost.
- When catching multiple exceptions that are handled with the same logic, the Java 8 multi-catch is preferred to using a parent class.
- Never silently swallow exceptions. Even if you think it's impossible for an exception to be thrown (but its method declares so), consider using @SneakyThrows or rethrowing it as a RuntimeException (see below)
- "Silently" means just catching and doing absolutely nothing about it. It is OK if the exception is expected behavior and you properly handle the exception, without rethrowing it.
- @SneakyThrows
- For the specific case where you know with certainty an exception cannot occur, (i.e., the IOException when writing to a ByteArrayOutputStream) @SneakyThrows can be used to force the compiler to treat the exception as a runtime exception when it is naturally a checked exception. Because it hides the exception from callers @SneakyThrows should be used judiciously, only where this behavior is desired.
- It can be used to allow an exception that is expected and handled to be thrown through an intermediate class who's signature prohibits it. When doing a CompletableFuture.thenApply(...) the function is not allowed to throw, even though if it does the exception is handled correctly by causing the future to fail with the corresponding exception. So rather than wrapping the checked exception in a Runtime which would lose the type, it can be annotated with @SneakyThrows so that the exception does not have to be wrapped in runtime and then unwrapped by the caller.
- Catching Throwable
- Java has Exceptions and Errors. Both inherit from Throwable. Normally you need to only catch Exception, but in rare occasions you may need to catch Throwable (i.e. AssertionError is an Error) so that you may fail a Future that someone else may be waiting for.
- If you need to catch Throwable, ALWAYS check if it's safe to do anything with it. Use ExceptionHelpers.mustRethrow() to verify if it's a terminal exception (OutOfMemory/StackOverflow). If so, you need to rethrow the exception immediately and not do anything else.
- Catching InterruptedException
- When you catch InterruptedException it removes the interrupt from the thread. For this reason, if something throws InterruptedException you should use Exceptions.handleInterrupted() or FutureHelpers.getAndHandleExceptions() if it is coming from a future. This ensures that the current thread will also be interrupted.
- Consider having all Pravega exceptions (except runtime exceptions (NullPtr, InvalidState, InvalidArgument)) derive from a common class - this will make it easier to spot & manage:
-
PravegaException: top-level exception.
- SegmentStoreException: top-level exception that indicates it was raised by the Segment Store Service (derived from PravegaException)
- ControllerException: top-level exception that indicates it was raised by the Controller (derived from PravegaException)
- PravegaClientException: top-level exception that indicates it was raised by the Client (derived from PravegaException).
-
PravegaException: top-level exception.
- Guidelines for when you are creating an exception type, deciding if it is runtime or checked
- Runtime exceptions should be used in cases where the caller cannot reasonably be expected to do anything about it. IllegalArgmentException and IllegalStateException are good examples of what should be RuntimeExceptions
- Checked Exceptions should be used in cases where the caller will almost certainly be handling the exception. An example of this would be an exception on an RPC.
-
Examples from SegmentStore:
- SegmentStoreException: top-level exception, which derives directly from Exception. Like the name implies, this is a very vague exception, but it does imply it has to do with the SegmentStore.
- StreamSegmentException: derived from SegmentStoreException, it indicates an exception dealing with a segment. Its constructor requires a segment name.
- Various exceptions that derive from StreamSegmentException (SegmentExists/SegmentNotExists/SegmentMerged/SegmentSealed, etc)
- The beauty of this is that all exceptions derived from this will have a common message format and are guaranteed to include the segment name. This makes debugging light years faster.
- ContainerException: similar to StreamSegmentException.
By default, all pull requests should include at least one test case. If it is a bug fix, then the test case should reproduce the issue it is fixing. If it is a new feature, then the test case(s) should cover all the new code and changes. Reviewers are expected to enforce this and request test cases from the contributor accordingly.
In some instances, providing a test case is really hard if not impossible, so it is acceptable for some pull requests to not have a test case, e.g., for some nasty race condition that is really hard to reproduce. In some cases, when doing the test is difficult but doable, we expect the contributor to go that extra mile and possibly build some classes that help with future test cases.
We also have a framework for system tests. Consider writing system tests to see how it runs in a real cluster.
We don't want a broken test case with say a deadlock preventing a build from completing indefinitely. To avoid such situations, we typically add timeouts to test case. A fine-grained way of adding a timeout
value is to set a parameter along with the @Test
annotation like this:
@Test(timeout = 30000)
Although we prefer to set a timeout to each test case individually so that we have more control, it is also acceptable to set the same timeout to all test cases in a class by using a @Rule
:
@Rule
public final Timeout globalTimeout = new Timeout(30, TimeUnit.SECONDS);
The default value we suggest is 30s, but if you really think that your test is simple enough to always complete in less, then pick a smaller value. Note that jobs share resources in some build systems, so a conservative value is desirable.
Pravega uses lombok to reduce boiler plate code. If you use an IDE you'll need to install a plugin to make the IDE understand it. See their instruction on how to do that.
Our code makes extensive use of:
These libraries should be used whenever possible rather than adding more dependencies. Additionally feel free to make use of annotations in JSR-305. However additional dependencies need to be weighed carefully. If possible create a reusable class and place it in Common rather than adding an additional external dependency.
If an additional dependency is needed it should be scoped using a gradle target as narrowly as possible, so that other code does not accidently depend on it.
We don't like TODOs in the code. It is really best if you sort out all issues you can see with the changes before we check the changes in. However, if you exceptionally need to leave a TODO comment in the code, we ask you to:
- Create an issue explaining the remaining work that needs to be done
- Reference the issue number in the comment
E.g., // TODO: Writing to /dev/null sounds like a bad idea (Issue #666)
Pravega - Streaming as a new software defined storage primitive
- Contributing
- Guidelines for committers
- Testing
-
Pravega Design Documents (PDPs)
- PDP-19: Retention
- PDP-20: Txn timeouts
- PDP-21: Protocol revisioning
- PDP-22: Bookkeeper based Tier-2
- PDP-23: Pravega Security
- PDP-24: Rolling transactions
- PDP-25: Read-Only Segment Store
- PDP-26: Ingestion Watermarks
- PDP-27: Admin Tools
- PDP-28: Cross routing key ordering
- PDP-29: Tables
- PDP-30: Byte Stream API
- PDP-31: End-to-end Request Tags
- PDP-32: Controller Metadata Scalability
- PDP-33: Watermarking
- PDP-34: Simplified-Tier-2
- PDP-35: Move controller metadata to KVS
- PDP-36: Connection pooling
- PDP-37: Server-side compression
- PDP-38: Schema Registry
- PDP-39: Key-Value Tables
- PDP-40: Consistent order guarantees for storage flushes
- PDP-41: Enabling Transport Layer Security (TLS) for External Clients
- PDP-42: New Resource String Format for Authorization
- PDP-43: Large Events
- PDP-44: Lightweight Transactions
- PDP-45: Healthcheck
- PDP-46: Read Only Permissions For Reading Data
- PDP-47: Pravega Message Queues