I’m currently working on a project where we are migrating from SVN to Git. We have a number of large applications with fairly complicated build processes that will have to be adjusted during the transition. Because of this, I cannot simply “import the code and flip the switch” to start using Git. The transition takes days or weeks, during which we have to edit Maven POM files, change embedded unit test property files, and make other adjustments to the code while we set up automated build jobs to work with the new code.
As a result, I need a way to import code from SVN into Git and keep the two in sync over several weeks while that work is underway. Tools such as “git-svn” can help in some ways, but none of them do exactly what I was looking for. Instead, I’ve successfully used a fairly simple branching pattern to pull code across periodically throughout this multi-week process.
To do this, I use what I once saw described as a “3rd party code versioning pattern” (often called a vendor branch). Someone outside your team modifies the code, and you receive periodic updates that you may have to integrate with your own local changes. The idea is to create a “code import” branch that only receives code from the remote 3rd party, then merge those diffs into your mainstream development.
In our case, the “3rd party” is the other version control system: SVN. During my initial import into Git, I set up a permanent branch that only receives changes copied in from SVN. The diffs produced by each periodic update can then be merged into my ongoing porting work in Git. Consider this diagram:
In the diagram, the developers of the “ES” system are working on version 14.8 of their software. They are using the SVN ES_14.8 product release branch to make their ongoing changes. This likely involves code changes, test changes, and maybe even minor Maven POM build file changes.
On the lower Git ES_14.8 branch, I’m making significant changes to the Maven POM build files and some of the test files. For the initial import, I simply copied the code straight out of SVN and dropped it into Git. However, I can’t do that for subsequent updates, or I would overwrite any local modifications I had made. I don’t want my POM and test configuration changes to be overwritten and effectively erased.
The solution is the middle “Git svn-import” branch. This branch runs parallel to SVN and only receives changes that come from SVN. Every time I want to sync with SVN, I export the code and drop it on top of the svn-import branch. Git calculates the change set for me, so I can see exactly what has changed in SVN. (In practice, there are a few tricks that make this easier than a bulk copy; more on that later.) Once that change set is committed on svn-import, I can merge those changes into my mainline Git branch for ES_14.8.
Every once in a while, I run into conflicts between what I’ve been changing in Git and what the developers have been changing in SVN. For example, a developer might update a test configuration file or a build script that I’ve also been modifying. In these cases, I sometimes create a short-lived integration branch to resolve the differences between the svn-import branch and my Git ES_14.8 branch.
The Mechanics
The steps are pretty simple.
- Run an SVN checkout into some directory. This produces a directory with your code and a ‘.svn’ metadata directory for SVN.
- Run ‘git init’ in that directory. This creates a ‘.git’ metadata directory alongside it.
- Set up your .gitignore file to match your SVN ignore settings. At a minimum, add “.svn” to .gitignore so you don’t version-control all of your SVN metadata.
- NORMALIZE YOUR FILES (see File Normalization below).
- Commit your initial baseline revision.
- Create a new branch, “svn-import”, from that commit. It will be used for all future imports.
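The initial import steps can be scripted. The following is a minimal Python sketch, not the exact procedure used here: it assumes the `svn` and `git` command-line clients are on the PATH, and the function names (`ensure_gitignore`, `initial_import`, `run_cmd`) are hypothetical. Normalization is left as a placeholder comment, and the `run` parameter lets you pass a function that only records the commands, for a dry run.

```python
import shlex
import subprocess
from pathlib import Path


def run_cmd(cmd):
    """Echo a command, then run it; raises CalledProcessError on failure."""
    print("$", shlex.join(cmd))
    subprocess.run(cmd, check=True)


def ensure_gitignore(workdir, patterns=(".svn",)):
    """Add any missing ignore patterns to workdir/.gitignore, creating it if needed."""
    path = Path(workdir) / ".gitignore"
    lines = path.read_text().splitlines() if path.exists() else []
    for pattern in patterns:
        if pattern not in lines:
            lines.append(pattern)
    path.write_text("\n".join(lines) + "\n")
    return lines


def initial_import(svn_url, workdir, run=run_cmd, import_branch="svn-import"):
    """Check out from SVN, make the working copy a Git repo, commit, and branch."""
    workdir = str(workdir)
    run(["svn", "checkout", svn_url, workdir])        # code plus .svn metadata
    run(["git", "init", workdir])                     # adds .git metadata alongside it
    Path(workdir).mkdir(parents=True, exist_ok=True)  # no-op after a real checkout
    ensure_gitignore(workdir)                         # keep .svn out of Git
    # Normalize files (EOL, permissions) here, before the first commit.
    run(["git", "-C", workdir, "add", "-A"])
    run(["git", "-C", workdir, "commit", "-m", "Initial import from SVN"])
    run(["git", "-C", workdir, "branch", import_branch])  # future imports land here
```

For a real run, call something like `initial_import("https://svn.example.com/repos/es/branches/ES_14.8", "es-import")` (a hypothetical URL) without a `run` argument; each command is echoed and the script stops at the first failure.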
Start development in Git on whatever branches you want (master, develop, or whatever). Include any changes you need in both code and build scripting (e.g., Maven POM files). Expect the main SVN branches to keep changing during this time as well.
When you are ready to re-sync changes from SVN into Git, copy the SVN changes onto the ‘svn-import’ branch in Git, then merge them into your main line of Git development.
There are two options here: (1) you kept the original import directory, which has both .git and .svn metadata directories, or (2) you didn’t keep it and need to plow the code into place.
Option 1: You still have the common directory, containing both .git and .svn metadata, in which you did the original import. In this case, it’s pretty easy.
- Go to that directory and make sure you have a clean working tree.
- Use ‘git checkout’ to make sure you are on the svn-import branch.
- Use ‘svn status’ and ‘git status’ to confirm the directory is in the same clean, synced state you left it in.
- Run svn update, svn switch, or whatever commands you need to pull down the latest changes from SVN. This modifies files in place, adds new files, and deletes files that have been removed from SVN.
- NORMALIZE YOUR FILES AGAIN (see File Normalization below).
- Use git status, git add, and git rm to stage the updated file tree for check-in to Git.
- Commit/push your changes into Git.
- Merge the changes from the svn-import branch into your mainline development branch(es).
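The Option 1 steps can likewise be written down as a command plan. This is a hypothetical helper, not a definitive workflow: it assumes a Git remote named `origin`, a mainline branch named `ES_14.8`, and that you merge in the same shared directory. The final checkout returns that directory to svn-import so its files again match the SVN working copy before the next `svn update`. `git add -A` stages new, modified, and deleted files in one step, covering both the git add and git rm parts.

```python
import shlex
import subprocess


def option1_sync_plan(workdir, import_branch="svn-import",
                      main_branch="ES_14.8", remote="origin"):
    """Commands for one SVN -> Git sync in a directory holding both .svn and .git."""
    git = ["git", "-C", workdir]
    return [
        git + ["checkout", import_branch],
        git + ["status", "--short"],        # inspect: should print nothing
        ["svn", "status", workdir],         # inspect: expect only unversioned Git files
        ["svn", "update", workdir],         # pull the latest SVN changes in place
        # Normalize files here (see File Normalization).
        git + ["add", "-A"],                # stage additions, edits and deletions
        git + ["commit", "-m", "Sync from SVN"],
        git + ["push", remote, import_branch],
        git + ["checkout", main_branch],
        git + ["merge", import_branch],     # exits non-zero if there are conflicts
        git + ["checkout", import_branch],  # leave the directory matching SVN again
    ]


def run_plan(plan, dry_run=True):
    """Print each command; also execute it (stopping on failure) unless dry_run."""
    for cmd in plan:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Run it with `dry_run=True` first and read the two status outputs yourself. Note that `git commit` also exits non-zero when SVN had no new changes, so the real run stops there in that case.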
Option 2: Set up a new directory to do the sync
- Run git clone and git checkout to get a copy of the svn-import branch checked out into directory1.
- REMOVE ALL THE FILES from directory1 except the .git metadata directory. This step is important: if you do not remove the files, the files you copy in the next step will simply be laid on top of your existing Git files. If files have been deleted in SVN, there will be no way to detect that they should no longer be there.
- Run svn checkout to get the copy of the code you want into directory2.
- Move all the files from directory2 (including the .svn directory) into directory1.
- NORMALIZE YOUR FILES AGAIN (see File Normalization below).
- Use git status, git add, and git rm to stage the updated file tree for check-in to Git.
- Commit/push your changes into Git.
- Merge the changes from the svn-import branch into your mainline development branch(es).
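The two file-shuffling steps of Option 2 are easy to get wrong by hand. Here is a minimal Python sketch with hypothetical function names: `clear_worktree` empties directory1 while keeping `.git`, and `move_tree` moves everything, including `.svn`, from directory2 into directory1. As an assumption beyond the steps above, the sketch also keeps `.gitignore` and `.gitattributes`. Those Git-only files are not part of the SVN checkout and would otherwise show up as deletions.

```python
import shutil
from pathlib import Path

# Assumption: Git-only files that are not in the SVN checkout and must survive.
KEEP = frozenset({".git", ".gitignore", ".gitattributes"})


def clear_worktree(directory, keep=KEEP):
    """Delete every top-level entry in directory except the names in keep."""
    removed = []
    for entry in list(Path(directory).iterdir()):
        if entry.name in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()  # regular files and symlinks
        removed.append(entry.name)
    return sorted(removed)


def move_tree(src, dst):
    """Move every top-level entry of src (including .svn) into dst."""
    moved = []
    for entry in list(Path(src).iterdir()):
        target = Path(dst) / entry.name
        if target.exists():
            raise FileExistsError(f"{target} already exists; clear it first")
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    return sorted(moved)
```

After `clear_worktree("directory1")` and `move_tree("directory2", "directory1")`, running `git status` in directory1 should show the changes made in SVN since the last import, including files SVN deleted.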
The net result is that you can periodically pull over new changesets from SVN and merge them cleanly onto your Git branches while integrating them with any “fixes” that you’ve had to make in the Git branches.
File Normalization
Importing and migrating code presents an opportunity to clean up the files you are bringing across. When moving into Git, two of the most important aspects of files are line endings (Windows CRLF vs. Unix LF) and file permissions (typically the “execute” bit for scripts). Both are easy to fix when you move from SVN to Git. Another aspect you might be tempted to address is whitespace and formatting. Resist the urge to fix formatting during a continuous import: it makes subsequent merges very difficult. We’ll leave whitespace as a separate topic.
When normalizing EOL markers, I’ve used two approaches. The first is to write a custom script that traverses all the files you care about (e.g., *.java, *.xml) and runs dos2unix on them. This works, but it is fairly brute force. The smoother approach is to let Git do it for you: Git has built-in EOL transformation controlled by the “.gitattributes” file. I found that using the “auto” mode with a minimal configuration generally does what you want. Google can help you find the right syntax for setting it up, so I’ll leave it as an exercise for the reader.
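For the brute-force approach, here is a small Python stand-in for running dos2unix over selected file types. It is a sketch: the suffix list is an assumption taken from the examples above, so adjust it to your code base. It skips symlinks and the .svn and .git metadata directories. For the Git-native approach, the usual minimal “auto” configuration is a `.gitattributes` file containing the single line `* text=auto`. On Git 2.16 or later, `git add --renormalize .` re-applies it to files that are already tracked.

```python
from pathlib import Path

EOL_SUFFIXES = (".java", ".xml")  # assumption: the example types; extend as needed


def normalize_eol(root, suffixes=EOL_SUFFIXES):
    """Convert CRLF to LF (like dos2unix) in files with a matching suffix.
    Returns the relative paths of the files that were rewritten."""
    root = Path(root)
    changed = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".svn" in rel.parts or ".git" in rel.parts:
            continue  # never rewrite VCS metadata
        if path.is_symlink() or not path.is_file() or path.suffix not in suffixes:
            continue
        data = path.read_bytes()
        fixed = data.replace(b"\r\n", b"\n")
        if fixed != data:
            path.write_bytes(fixed)
            changed.append(rel.as_posix())
    return changed
```

Because the function only rewrites files that actually contain CRLF, running it again on an already normalized tree changes nothing, which keeps later imports from producing spurious diffs.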
When normalizing file permissions, I chose a custom script. We found that our files were a chaotic mixture: content files that had the “+x” execute bit set (which they should never have) alongside executable shell scripts and programs that did NOT have the execute bit they SHOULD have. In my case, I wrote a simple “find” script that REMOVED the execute bit from all files (chmod -x), then added it back to the specific file types that needed it.
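The text describes a `find` script; the following Python sketch reaches the same end state in a single pass under stated assumptions. Only `.sh` files are treated as executable (a hypothetical list; extend it for your programs), symlinks and the .svn/.git metadata are skipped, and it is meant for POSIX systems. Git itself records only whether a file is executable (mode 100644 vs. 100755), which is one reason this normalization merges cleanly.

```python
import os
import stat
from pathlib import Path

EXEC_SUFFIXES = (".sh",)  # assumption: file types that should be executable
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def normalize_permissions(root, exec_suffixes=EXEC_SUFFIXES):
    """Clear execute bits on every regular file, then set them on files whose
    suffix is in exec_suffixes. Returns relative paths whose mode changed."""
    root = Path(root)
    changed = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".svn" in rel.parts or ".git" in rel.parts:
            continue  # leave VCS metadata alone
        if path.is_symlink() or not path.is_file():
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        if path.suffix in exec_suffixes:
            new_mode = mode | EXEC_BITS   # roughly chmod +x
        else:
            new_mode = mode & ~EXEC_BITS  # chmod -x
        if new_mode != mode:
            os.chmod(path, new_mode)
            changed.append(rel.as_posix())
    return changed
```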
Both of these normalizations (EOL and permissions) work smoothly with subsequent merging of changes from one system to the other.
Pulling all these techniques together should allow you to periodically import changes from SVN to Git over a few days or even weeks before you flip the switch. Even if you don’t use the ongoing svn-import branch pattern, the file normalization process is quite useful when simply migrating from SVN to Git.
Key points to remember:
- Strategic branching can help with smooth conversion between SVN and Git
- Treat SVN-imported code like a 3rd party source when importing to Git
- Take the opportunity to fix/normalize EOL and file permissions during SVN to Git conversion
- Use temporary integration branches for tricky merges during SVN to Git conversion
- Keeping the .git and .svn metadata directories together simplifies updates during SVN to Git migration