An API for reading, editing, and writing SVN dump files.
You can use this library to modify the history of existing Subversion repositories. Some use cases are:
- removing large binary files from the Subversion revision history
- adding a revision 0 so that you can upgrade old repositories to work with a newer version of Subversion tools
- converting Subversion repositories to other version control systems, like git (there is no support for this out-of-the-box, but you can process all the Subversion history and execute the corresponding git commands)
SVN dump files are created via the svnadmin dump or svnrdump dump commands, and contain all the history of an SVN repository.
Example command for dumping a random SourceForge project:
svnrdump dump https://svn.code.sf.net/p/barbecue/code > barbecue.dump
This will create a file named barbecue.dump, which follows the SVN dump file format.
The SVN dump file format is a "serialized description of the actions required to
(re)build a version history" (see original docs).
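The format is plain text: each record is a block of "Header: value" lines, a blank line, and then Content-length bytes of content (properties and file text). The following is a minimal sketch, not this library's parser, that walks those records and counts Node-path records per Revision-number. DumpScanner is a hypothetical name, and the sketch assumes ASCII content so that character offsets equal the byte lengths recorded in the dump.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of this library): counts Node-path records per revision. */
final class DumpScanner {
    static Map<Integer, Integer> countNodes(String dump) {
        Map<Integer, Integer> nodesPerRevision = new LinkedHashMap<>();
        Integer currentRevision = null;
        int pos = 0;
        while (pos < dump.length()) {
            // Blank lines separate records; skip them.
            if (dump.charAt(pos) == '\n') {
                pos++;
                continue;
            }
            // Read "Key: value" header lines up to the blank line that ends the block.
            int contentLength = 0;
            while (pos < dump.length() && dump.charAt(pos) != '\n') {
                int eol = dump.indexOf('\n', pos);
                if (eol < 0) {
                    eol = dump.length();
                }
                String line = dump.substring(pos, eol);
                pos = eol + 1;
                int sep = line.indexOf(": ");
                if (sep < 0) {
                    continue; // not a header line; this sketch just ignores it
                }
                String key = line.substring(0, sep);
                String value = line.substring(sep + 2);
                switch (key) {
                    case "Revision-number":
                        currentRevision = Integer.parseInt(value);
                        nodesPerRevision.put(currentRevision, 0);
                        break;
                    case "Node-path":
                        if (currentRevision != null) {
                            nodesPerRevision.merge(currentRevision, 1, Integer::sum);
                        }
                        break;
                    case "Content-length":
                        contentLength = Integer.parseInt(value);
                        break;
                    default:
                        break; // other headers are irrelevant for counting
                }
            }
            pos++;                // step over the blank line that ends the headers
            pos += contentLength; // skip properties and file text without parsing them
        }
        return nodesPerRevision;
    }
}
```

Because content is skipped using Content-length, a file whose text happens to contain a "Node-path:" line is not miscounted as a node.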
An SVN dump file contains a list of revisions (see Revision), and each revision contains a list of nodes (see Node).
Revisions can have properties such as author, date, and commit message. Nodes can have properties too, which are maintained on a node by node basis.
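In the dump file, both revision and node properties are stored as a property block: pairs of "K <length>" / key and "V <length>" / value lines, terminated by "PROPS-END". The sketch below parses one such block into a map. PropertyBlockParser is a hypothetical helper, not this library's API; it assumes ASCII text (the real lengths are byte counts) and does not handle the "D <length>" deletion entries used by incremental dumps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of this library): parses one K/V property block. */
final class PropertyBlockParser {
    static Map<String, String> parse(String block) {
        Map<String, String> props = new LinkedHashMap<>();
        int pos = 0;
        while (true) {
            String line = readLine(block, pos);
            pos += line.length() + 1;
            if (line.equals("PROPS-END")) {
                return props;
            }
            // "K <n>" is followed by exactly n characters of key, then a newline.
            String key = readSized(block, pos, line, "K ");
            pos += key.length() + 1;
            // "V <n>" works the same way; the length lets values span several lines.
            String valueHeader = readLine(block, pos);
            pos += valueHeader.length() + 1;
            String value = readSized(block, pos, valueHeader, "V ");
            pos += value.length() + 1;
            props.put(key, value);
        }
    }

    private static String readLine(String s, int from) {
        int eol = s.indexOf('\n', from);
        if (eol < 0) {
            throw new IllegalArgumentException("unterminated property block");
        }
        return s.substring(from, eol);
    }

    private static String readSized(String s, int from, String header, String prefix) {
        if (!header.startsWith(prefix)) {
            throw new IllegalArgumentException("expected '" + prefix + "<length>' but got: " + header);
        }
        int length = Integer.parseInt(header.substring(prefix.length()));
        return s.substring(from, from + length);
    }
}
```

For example, a commit message spanning two lines is stored as "V 12" followed by "first", a newline, and "commit", and comes back as a single value.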
I'm not the first one to have this idea. Here are some links:
- svndumpfilter: comes with svn, limited functionality
- svndumpmultitool: very similar project to this one, written in Python
The SvnDumpFileParser is an auto-generated parser for SVN dump files (files created with svnadmin dump). It parses SVN dump files into a Repository object.
The Repository representation is meant to be very lightweight and does minimal validation.
The parser is auto-generated using JavaCC (Java Compiler Compiler) from the svndump.jj grammar file. This grammar generates a parser that is dependent on the Java interfaces and classes in this project.
To get an svn log-like summary of your dump file, you can use the RepositorySummary (sample output here).
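To illustrate what an svn log-style summary is built from, here is a minimal sketch that formats one header line from a revision's standard properties (svn:author, svn:date, svn:log). LogLineFormatter is hypothetical and is not how RepositorySummary is implemented; unlike svn log, it prints svn:date exactly as stored rather than reformatting it.

```java
import java.util.Map;

/** Hypothetical sketch (not the library's RepositorySummary): one svn-log-style header line. */
final class LogLineFormatter {
    static String format(int revision, Map<String, String> revisionProps) {
        // Standard revision properties stored in the dump; placeholders mimic svn log when absent.
        String author = revisionProps.getOrDefault("svn:author", "(no author)");
        String date = revisionProps.getOrDefault("svn:date", "(no date)");
        String log = revisionProps.getOrDefault("svn:log", "");
        // svn log reports the number of lines in the commit message.
        long lines = log.lines().count();
        return "r" + revision + " | " + author + " | " + date + " | "
                + lines + (lines == 1 ? " line" : " lines");
    }
}
```

Revision 0 usually carries only svn:date, so it is summarized with the "(no author)" placeholder and "0 lines".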
A RepositoryConsumer consumes the various pieces of a Repository. Specializations of a consumer are:
- RepositoryMutator: changes the Repository in some way
- RepositoryValidator: validates the correctness of the Repository in some way
- RepositoryWriter: writes the Repository in some format
Consumers (and therefore any of their specializations) can be chained together to achieve complex operations on SVN dump files using the continueTo(RepositoryConsumer) method.
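The chaining idea can be sketched with a tiny event model: each consumer handles an event and then forwards it to the next consumer in the chain. The class names and the two event methods below are hypothetical and deliberately much smaller than RepositoryConsumer; only the continueTo name is borrowed from this library.

```java
/** Hypothetical base class illustrating continueTo-style chaining; not this library's API. */
abstract class ChainedConsumer {
    private ChainedConsumer next;

    /** Attaches a consumer at the end of the chain and returns this consumer. */
    ChainedConsumer continueTo(ChainedConsumer consumer) {
        if (next == null) {
            next = consumer;
        } else {
            next.continueTo(consumer);
        }
        return this;
    }

    // Default behavior: forward every event unchanged to the next consumer.
    void consumeRevision(int number) {
        if (next != null) next.consumeRevision(number);
    }

    void consumeNode(String path) {
        if (next != null) next.consumeNode(path);
    }
}

/** Counts nodes, then forwards them (a stand-in for a validator or summary). */
final class NodeCounter extends ChainedConsumer {
    int nodes;

    @Override
    void consumeNode(String path) {
        nodes++;
        super.consumeNode(path);
    }
}

/** Records whatever reaches it (a stand-in for a writer at the end of the chain). */
final class TextWriter extends ChainedConsumer {
    final StringBuilder out = new StringBuilder();

    @Override
    void consumeRevision(int number) {
        out.append("r").append(number).append('\n');
        super.consumeRevision(number);
    }

    @Override
    void consumeNode(String path) {
        out.append("  ").append(path).append('\n');
        super.consumeNode(path);
    }
}
```

Usage: after counter.continueTo(writer), every event fed to the counter also reaches the writer, so one pass over the dump drives the whole chain.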
The API allows for changing an SVN dump file via RepositoryMutator implementations.
Some useful mutators are:
- ClearRevision - empties a revision (removes all changes; the revision is preserved so that references to revision numbers still work)
- PathChange - updates file/dir paths
- NodeRemove - removes an individual file change from a revision
- NodeAdd - adds some newly crafted change to a specific revision
- NodeHeaderChange - changes a specific property on an existing SvnNode
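As a rough illustration of what two of these mutators do, the sketch below applies ClearRevision-like and PathChange-like edits to a simplified in-memory model (revision number mapped to the list of changed paths). MutatorSketch is hypothetical: the real mutators are consumers that work on full nodes, and a real path change would also have to rewrite copy-from paths, which this sketch ignores.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch; not this library's ClearRevision or PathChange classes. */
final class MutatorSketch {
    /** Like ClearRevision: drops all changes but keeps the revision number in the map. */
    static void clearRevision(Map<Integer, List<String>> repo, int revision) {
        List<String> nodes = repo.get(revision);
        if (nodes != null) {
            nodes.clear();
        }
    }

    /** Like PathChange: rewrites a leading prefix (end it with "/" to avoid matching "trunk2/"). */
    static void changePath(Map<Integer, List<String>> repo, String oldPrefix, String newPrefix) {
        for (List<String> nodes : repo.values()) {
            nodes.replaceAll(p -> p.startsWith(oldPrefix)
                    ? newPrefix + p.substring(oldPrefix.length())
                    : p);
        }
    }
}
```

Keeping the emptied revision in place is the point of ClearRevision: later revisions keep their numbers, so anything that refers to them still resolves.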
To apply multiple mutators in sequence, you can chain them together using RepositoryConsumer.continueTo(RepositoryConsumer).
When you start messing with your SVN history via the mutators, you can be left with an SVN dump file that cannot be imported back into an SVN repository. To make changing SVN history easier, the API has the concept of a RepositoryValidator.
Validation is done while the data is in memory, which is much faster than running it through svnadmin load.
Some useful validators:
- PathCollisionValidator - checks that file operations are valid (don't delete non-existent files, don't double add files, check that files exist when making copies)
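The three checks listed for PathCollisionValidator can be sketched with a set of currently existing paths. PathCollisionSketch is a hypothetical, simplified stand-in: it ignores directories (deleting a directory also removes its children), replace actions, and the copy-from revision, all of which a real validator has to track.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified path-collision checks; not this library's PathCollisionValidator. */
final class PathCollisionSketch {
    private final Set<String> existing = new HashSet<>();
    final List<String> errors = new ArrayList<>();

    void add(int revision, String path) {
        // Adding a path that already exists is a double add.
        if (!existing.add(path)) {
            errors.add("r" + revision + ": add of existing path " + path);
        }
    }

    void delete(int revision, String path) {
        // Deleting a path that does not currently exist is invalid.
        if (!existing.remove(path)) {
            errors.add("r" + revision + ": delete of missing path " + path);
        }
    }

    void copy(int revision, String fromPath, String toPath) {
        // The copy source must exist; the destination is checked like an add.
        if (!existing.contains(fromPath)) {
            errors.add("r" + revision + ": copy from missing path " + fromPath);
        }
        add(revision, toPath);
    }
}
```

Collecting errors instead of throwing on the first one lets a single in-memory pass report every problem before you attempt svnadmin load.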
The bin/run-java shell script will run the CliConsumer.
The current usage pattern is to modify the CliConsumer and create your chain programmatically, then run:
mvn clean install dependency:copy-dependencies
cat file.dump | ./bin/run-java > output.dump
or, if your repository is too large for a single file:
mvn clean install dependency:copy-dependencies
svnadmin create /path/to/newrepo
svnadmin dump /path/to/repo | ./bin/run-java | svnadmin load -q /path/to/newrepo
To see how all these pieces fit together to allow you to edit SVN history, you can look at an SVN repository cleanup that I did for the AgreementMaker project. All the operations on the SVN dump file are detailed in this test.
Parsing an SVN dump file is straightforward. Here's an example that uses a single consumer (loads the SVN dump into memory as a Repository):
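import java.io.FileInputStream;
import java.io.InputStream;

// Parse svn.dump and collect it in memory (RepositoryInMemory, SvnDumpFileParser and Repository are from this project).
RepositoryInMemory inMemory = new RepositoryInMemory();
InputStream is = new FileInputStream("svn.dump");
SvnDumpFileParser.consume(is, inMemory);
Repository svnRepository = inMemory.getRepo();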
See SvnDumpFileParserTest for usage patterns of the parser.
To get a JaCoCo coverage report, run the following:
mvn clean test jacoco:report
The coverage report output will be in HTML format in target/site/jacoco/index.html.