CVS to git migration



Introduction

After using three different distributed revision control systems for the DSS project, the team settled on git. After the project had ended, the SDD decided to migrate all of the CVS repositories to git. This wiki page describes that process.

Migration Tools

First I will say a bit about the tools for converting a repository from CVS to git. The migration can be tricky because of fundamental differences between the two systems. For example, CVS commits are recorded per file, while a git commit is a change set that can span multiple files, so the migration tool must decide how to combine per-file commits into change sets. Also, git branches are very different from CVS branches, and not all tools migrate them in the same way. Finally, unlike CVS, git is a distributed revision control system with no central repository. Surprisingly, however, this matters little during a migration.
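As a rough illustration of the change-set problem, the sketch below groups per-file revisions that share an author and log message and fall within a time window. The tab-separated input format, the function name group_changesets, and the 300-second default window are assumptions made for this example only; cvs2git's real algorithm is considerably more careful.

```shell
# Hypothetical sketch: turn per-file CVS revisions into change sets.
# stdin: one revision per line as  epoch<TAB>author<TAB>log message<TAB>file
# $1:    optional time window in seconds (assumed default: 300)
group_changesets() {
    window=${1:-300}
    tab=$(printf '\t')
    # Sort by author, then log message, then time, so that candidate
    # members of one change set end up on adjacent lines.
    LC_ALL=C sort -t "$tab" -k2,2 -k3,3 -k1,1n |
    awk -F '\t' -v w="$window" '
        {
            key = $2 FS $3
            # Start a new change set when the author or message changes,
            # or when the gap since the previous revision exceeds the window.
            if (key != prev || $1 - last > w) {
                n++
                printf "changeset %d: %s \"%s\"\n", n, $2, $3
            }
            printf "  %s\n", $4
            prev = key
            last = $1
        }'
}
```

Feeding it two revisions by the same author with the same message ten seconds apart yields one change set containing both files; a third revision an hour later starts a new one.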

The two migration tools that were considered for this migration were git-cvsimport and cvs2git. git-cvsimport is a tool that is distributed with git. At first it seemed to work quite well; however, after reading the man page we found that there are well-known issues related to branches. For example, the man page carries an explicit warning at the top saying that developers should never modify the branches migrated over using git-cvsimport. The following issues are listed at the end of the man page.

ISSUES
       Problems related to timestamps:
       •   If timestamps of commits in the CVS repository are not stable enough to be used for ordering commits changes may show up in the wrong order.
       •   If any files were ever "cvs import"ed more than once (e.g., import of more than one vendor release) the HEAD contains the wrong content.
       •   If the timestamp order of different files cross the revision order within the commit matching time window the order of commits may be wrong.
       Problems related to branches:
       •   Branches on which no commits have been made are not imported.
       •   All files from the branching point are added to a branch even if never added in CVS.
       •   This applies to files added to the source branch after a daughter branch was created: if previously no commit was made on the daughter branch they will erroneously be added to the daughter branch in git.
       Problems related to tags:
       •   Multiple tags on the same revision are not imported.
       If you suspect that any of these issues may apply to the repository you want to import consider using these alternative tools which proved to be more stable in practice:
       •   cvs2git (part of cvs2svn), http://cvs2svn.tigris.org
       •   parsecvs, http://cgit.freedesktop.org/~keithp/parsecvs

We chose to follow the advice of the git-cvsimport man page and used cvs2git instead. So far this seems to have been a good choice.

How to migrate a CVS repository using cvs2git

cvs2git

  • Download and untar cvs2svn, http://cvs2svn.tigris.org
  • cvs2git can be run out of the extracted directory so cd into the directory once you have extracted it.
  • Run something like the following
     ./cvs2git --blobfile=/home/sandboxes/mmccarty/git_migration/cvs2git/git-sparrow-blob.dat --dumpfile=/home/sandboxes/mmccarty/git_migration/cvs2git/git-sparrow-dump.dat /home/gbt2/repository/sparrow --username=cvs2git --retain-conflicting-attic-files
  • This tool works by examining the CVS repository and creating two files (a git dump file and a git blob file), which can be imported into git using the git fast-import utility.
  • If anything goes wrong during the migration, it will most likely happen at this point. Some of the problems I have seen are the following.
    • Invalid svn filenames - Huh, svn? It turns out that cvs2git is based on a well-tested tool called cvs2svn (you can guess its purpose) and reuses a lot of its code. We encountered this error in our CVS repositories when a couple of files had binary characters (e.g. \001) in their names. This was corrected by renaming the files in the CVS repository.
    • Files in the attic and module - use the --retain-conflicting-attic-files flag (as seen in the above example).
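The conversion step can be wrapped in a small script, sketched here under some assumptions. The function names find_bad_names and run_cvs2git are hypothetical, and the output file names simply follow the pattern of the sparrow example above. The first function looks for file names containing control characters before cvs2git gets a chance to reject them.

```shell
# Hypothetical helpers; paths are passed in rather than hard-coded.

# List paths under a CVS repository whose names contain control
# characters (such as \001), which caused the "invalid svn filenames" error.
# $1 = path to the CVS repository or module
find_bad_names() {
    LC_ALL=C find "$1" -name '*[[:cntrl:]]*' -print
}

# Run cvs2git from the extracted cvs2svn directory with the same options
# as the sparrow example.
# $1 = CVS module path, $2 = output directory, $3 = short name for the output files
run_cvs2git() {
    ./cvs2git \
        --blobfile="$2/git-$3-blob.dat" \
        --dumpfile="$2/git-$3-dump.dat" \
        --username=cvs2git \
        --retain-conflicting-attic-files \
        "$1"
}
```

For example, find_bad_names /home/gbt2/repository/sparrow should print nothing for a clean module; any path it does print is a candidate for renaming before run_cvs2git is called.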

Load dump and blob files into a new git repository

Now that the dump and blob files have been created, cd into a sandbox to initialize a brand new shiny git repo! Follow the example below using the sparrow repository. Note, here I'm showing the full paths for clarity.
  • cd /home/sandboxes/mmccarty/git_migration/cvs2git
  • mkdir sparrow
  • cd sparrow
  • git init
  • git fast-import --export-marks=/home/sandboxes/mmccarty/git_migration/cvs2git/git-sparrow-marks.dat < /home/sandboxes/mmccarty/git_migration/cvs2git/git-sparrow-blob.dat
  • git fast-import --import-marks=/home/sandboxes/mmccarty/git_migration/cvs2git/git-sparrow-marks.dat < /home/sandboxes/mmccarty/git_migration/cvs2git/git-sparrow-dump.dat
  • git checkout master
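The steps above can be collected into one function, shown here as a minimal sketch. The name import_into_git and its argument order are assumptions for illustration; all paths should be absolute because the function changes directory. The body runs in a subshell so the caller's working directory is left alone.

```shell
# Hypothetical wrapper around the import steps listed above.
# $1 = directory for the new git repository
# $2 = blob file written by cvs2git
# $3 = dump file written by cvs2git
# $4 = marks file (created by the first fast-import, read by the second)
import_into_git() (
    mkdir -p "$1" && cd "$1" &&
    git init &&
    # Load file contents first and record the mark-to-object mapping.
    git fast-import --export-marks="$4" < "$2" &&
    # Load the commits, resolving blob references through the marks file.
    git fast-import --import-marks="$4" < "$3" &&
    git checkout master
)
```

A call for the sparrow example would pass the new repository directory followed by the git-sparrow-blob.dat, git-sparrow-dump.dat, and git-sparrow-marks.dat paths.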

Create a bare shared repository

Now you have a new git repository with revisions imported from the CVS repository. Next, we will create a bare shared repository. For this, I have set up an area in /home/gbt1/git/integration-bare. You will also want to make sure your selected group ID is correct (monctrl in this case).
  • newgrp monctrl
  • cd /home/gbt1/git/integration-bare
  • mkdir sparrow; cd sparrow
  • git --bare init --shared
  • git --bare fetch /home/sandboxes/mmccarty/git_migration/cvs2git/sparrow '*:*'
    • This command will fetch all the branches and tags from the git repository you just created from the CVS repository. You use fetch instead of pull because pull will fetch and merge. Here we do not want to merge because a bare repository has no working directory, so there is no place to merge.
  • At this point you are done. It is recommended that you now go back to your sandbox and clone a new repository from the bare shared repository for development.
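A minimal sketch of the bare-repository steps follows. The function name publish_bare is hypothetical, and the newgrp step is left out because newgrp starts a new shell and so does not fit inside a function; set the group before calling it. The refspec is quoted so the shell cannot expand it as a glob.

```shell
# Hypothetical wrapper around the bare-repository steps listed above.
# $1 = imported git repository (absolute path)
# $2 = directory for the new bare shared repository
publish_bare() (
    mkdir -p "$2" && cd "$2" &&
    git --bare init --shared &&
    # Fetch every branch and tag; fetch (unlike pull) never merges,
    # which suits a repository with no working directory.
    git --bare fetch "$1" '*:*'
)
```

Afterwards, git clone of the bare directory into a sandbox gives a working repository for development.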

Repositories to Migrate (and Status)

These are the repositories in /home/gbt2/repository

Repository           Status
43m
45ft.old
bignell
casper
dss                  Deprecated
EMS
firmware
gbi
GBTdocumentation
GBTObsTool.deleteme
IRCv6
mclark               Deprecated
mjd
oapi
oui
parParser
poof                 Migrated
reservations         Deprecated
rxlab
Starlink
themis
ygor                 Migrated
45ft
atlas                Migrated
bos                  Deprecated
contrib
doc
DSS                  Deprecated
fg_util
gb                   Migrated
gbt                  Migrated
gbtidl               Migrated
GBTServo
MarkIII
metrology
modules
oofholography
PennArray
pro
rfi
sparrow              Migrated
temp
VibMonitor
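A list like this could be partly regenerated with a short script, sketched here under assumptions. The function name migration_status is hypothetical, and it assumes each migrated repository has a directory of the same name in the bare repository area, as sparrow does. It cannot tell which repositories are deprecated, so that status would still be recorded by hand.

```shell
# Hypothetical status report.
# $1 = directory holding the CVS repositories
# $2 = directory holding the bare git repositories
migration_status() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        # A same-named directory in the git area is taken to mean "migrated".
        if [ -d "$2/$name" ]; then
            printf '%s Migrated\n' "$name"
        else
            printf '%s\n' "$name"
        fi
    done
}
```

For this site the call would be migration_status /home/gbt2/repository /home/gbt1/git/integration-bare.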

Topic revision: 2011-12-09, BobGarwood