Cookbook for Modifying CasaCore on github

Note: latest revision 1 September 2015

The cookbook is intended to help new users make a modification to CasaCore and get it into the main CasaCore codebase. This codebase is located on github.com so the use of git is required; git is a version control system created by Linus Torvalds while github.com is a commercial site that provides a cloud-based version control system based on git. The cookbook assumes that the user has made the changes and is happy with them using the CASA build and test system. Then they will move those changes into a separate, local CasaCore code base (a clone of a github repo); this might be as simple as copying the modified files from the CASA code tree to the git repo or it might require merging the changes in. Once we get more experience with git and github we might be able to streamline the process.

Once your changes are merged into CasaCore on github they will eventually migrate back to CASA and be accessible through our SVN system. Again, this process will improve in the future but might be somewhat manual and request-driven for a while.

Setting Up github

The first step is for you to set up and configure an account on github.com.

Getting github Account

NOTE: if you are using an older browser, such as Firefox on NRAO's standard RHEL 5 install, you may need to switch to a newer one. I switched to Safari on OS X 10 and that seemed to work for me.
  1. Open the URL "https://github.com".
  2. Push the "Sign up for github" button.
  3. Provide a username. I used JimNrao but feel free to be as creative as you like.
  4. Provide an email address. I used my NRAO address.
  5. Give a password. Don't use your NRAO password here.
  6. You will receive an email asking you to confirm your registration. Do that.
  7. Select a free plan when it asks in the next step.
  8. The other steps ought to be fairly straightforward.

Generating and Installing SSH Keys

From Linux the consensus seems to be that accessing github using the ssh protocol is easier than using the github-recommended https protocol. Using the ssh protocol is easier if you make use of ssh keys to provide identification when you attempt to perform an operation involving a repo on github. There are two required steps: creating an SSH key on your computer and then installing it on your github account. If you don't want to enter the ssh key passphrase each time you issue a git command involving the github repo, you can configure ssh-agent to allow you to enter the passphrase once per login session.

Creating SSH Keys on your computer

If you've already created an SSH key on your computer you can skip this step. The keys have already been created if you have id_rsa and id_rsa.pub in ~/.ssh/ . You will need to know the passphrase for this key; if you have forgotten it, you will need to create a new key.
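
As a minimal sketch of that check (the have_keypair function name is hypothetical, introduced only for this example), the following reports whether the default key pair is present:

```shell
# Hypothetical helper: succeeds if both halves of an RSA key pair exist
# in the directory given as the first argument.
have_keypair() {
    [ -f "$1/id_rsa" ] && [ -f "$1/id_rsa.pub" ]
}

if have_keypair "$HOME/.ssh"; then
    echo "Existing key pair found; you can skip key generation."
else
    echo "No key pair found; generate one with ssh-keygen."
fi
```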

From the shell issue the command below to generate a key.

ssh-keygen -t rsa -b 4096 -C "your_email@nrao.edu"

The program will ask where to save the key. The default, ~/.ssh/id_rsa, should be fine so press enter.

Next you'll be asked for a passphrase to protect this key. Pick a good one and save it somewhere safe since if you forget it you'll have to discard the key. You typically have to provide the passphrase anytime you use the key, although using ssh-agent (see below) you'll only need to enter it once per session.

When you're done ssh-keygen will output something like this:

Your identification has been saved in /Users/you/.ssh/id_rsa.
Your public key has been saved in /Users/you/.ssh/id_rsa.pub.
The key fingerprint is:
01:0f:f4:3b:ca:85:d6:17:a1:7d:f0:68:9d:f0:a2:db your_email@example.com

Installing Keys Into github

  1. Go to https://github.com and sign in.
  2. View your profile (click on the funny icon marking a dropdown in the upper right corner of the screen).
  3. Click the "Edit Profile" button (has a pencil icon) located near the upper right of the screen.
  4. From the left column titled "Personal settings", click the "SSH keys" item.
  5. The main panel of the screen will be titled "SSH keys" and show you any keys associated with the account.
  6. Click the "Add SSH key" button at the upper right of the "SSH Keys" panel.
  7. Give it a title.
  8. On your computer open the file ~/.ssh/id_rsa.pub using the One True Text Editor (although I suppose a lesser one like vi could be used ;-). EDITOR NOTE: Goodness, some of us are very editorist, aren't we? I prefer my editor-fluid existence. Embrace the differences!
  9. Select the entire text in the file (including any whitespace, line breaks, etc.) and copy it into the clip board.
  10. Paste the text you just copied into the box labelled "Key" on the github "SSH Keys" page you opened above.
  11. Press "Add Key".

Voilà

Forking CasaCore

The next step is to create a fork of CasaCore in your personal github repo.

  1. Open the URL github.com/casacore/casacore (the first casacore is the account name and the second casacore is the name of a repo on that account).
  2. Fork the repo by clicking on the button labelled "Fork" ("y" shaped icon on button as well). This will ask you where you want to fork it to; select your github account.
  3. If you go to your account, you will now see a repo named "casacore".

Configuring your computer

Now that your account on github.com is configured and you have a fork of casacore located on your github account it's time to set up your local computer.

Making a Repo on your computer

You now need to get a copy of your github repo onto your computer and also do a bit of plumbing. First off, create a directory to hold it. I created one called /home/orion/casa/github (when you clone the repo it will create a casacore subdirectory below this one) but put it wherever you like, although I would keep it separate from your main source tree.

Now clone the repo by doing:

git clone ssh://git@github.com/YourAccountName/casacore

If the ssh route does not work for you (as it does not work for me), you can try git clone https://github.com/YourAccountName/casacore.git

You should now have a local repo; in my case it lives at /home/orion/casa/github/casacore. So now you are involved with three git repos:

  • The one on your computer.
  • The one on your github.com account
  • The one on github.com/casacore.

A pitfall and a solution

github suggests using https as the protocol for connecting to repos located there; however, it turns out to be more convenient for Linux users to use ssh instead. If you clone your repo (maybe do other things?) using github's suggested https protocol, your future git commands might return "fatal: HTTP request failed"; in this case, you might need to do some surgery. From the top directory of your local repo (it will contain the directory ".git") change into ".git". Then issue the shell command sed -i 's^https://^ssh://git@^' config (this edits the file "config" in place, replacing "https://" with "ssh://git@"). This will make the git commands work (probably breaks access if using a GUI-based tool?).
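
An alternative sketch, assuming you would rather not edit .git/config by hand: git remote set-url makes the same change for one remote at a time. The example runs in a scratch repository, and YourAccountName is a placeholder, so nothing here touches a real clone.

```shell
# Scratch repository so the example does not modify a real clone.
scratch=$(mktemp -d)
git init -q "$scratch"
cd "$scratch"

# Stand-in for a clone made over https (YourAccountName is a placeholder).
git remote add origin https://github.com/YourAccountName/casacore

# Build the ssh form of the URL and store it with git itself.
old_url=$(git config --get remote.origin.url)
new_url=$(printf '%s\n' "$old_url" | sed 's|^https://|ssh://git@|')
git remote set-url origin "$new_url"

# Show the result.
git remote -v
```

In a real clone only the set-url step is needed, with the name of the remote as shown by git remote -v.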

Some plumbing

The relationship between your local repo and any others can be seen by doing, within your repository directory, git remote -v. At this time it will show you a connection to the repo located on your github.com account; git names this remote "origin" by default, while this cookbook calls it "MyGithubRepo" (git remote rename origin MyGithubRepo gives it that name). For future work we'll add another read-only connection to the casacore repo:

git remote add GithubCasacore git://github.com/casacore/casacore

Note: git commands have to be issued somewhere within the directory tree where you cloned the repo. Otherwise you'll get an error message stating that you are not in a git repository.

This can be useful if you need to update from casacore prior to submitting your changes to casacore. Do the git remote -v command again to see the new "GithubCasacore" connection.

The remote is read-only to help avoid accidental updates of the main repository. Directly pushing to the GithubCasacore remote is discouraged; changes should go through a pull request so they are tested by the CI system and to give other people a chance to review them. For trivial changes that can be directly pushed to GithubCasacore it is recommended to type out the read/write url instead of using a named remote.
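
As a sketch of this plumbing, the commands below set up both remotes in a scratch repository. git names the remote created by a clone "origin" by default, so the rename step is an assumption about how the "MyGithubRepo" name comes about; YourAccountName is a placeholder and no network access takes place.

```shell
# Scratch repository standing in for your local clone.
scratch=$(mktemp -d)
git init -q "$scratch"
cd "$scratch"

# What a fresh clone of your fork would record (placeholder account name).
git remote add origin ssh://git@github.com/YourAccountName/casacore

# Give the fork the name used in this cookbook.
git remote rename origin MyGithubRepo

# Add the read-only connection to the main casacore repo.
git remote add GithubCasacore git://github.com/casacore/casacore

# List both remotes with their fetch and push URLs.
git remote -v
```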

Branching

The main branch of a repo is usually called "master" (equivalent to our SVN "trunk" branch). You should not directly modify that one. Instead you should create a feature branch from the GithubCasacore master branch:

git fetch GithubCasacore # update the GithubCasacore remote

git checkout -b MyBranch GithubCasacore/master

Name your branch something short but relevant to your modifications. The above command actually does two things: it creates a new branch and checks it out. Checking it out means that the source files are now a snapshot of the branch; thus any mods will be committed to the branch. If you type git branch it will list all the branches in the repo and indicate the currently checked out branch with an asterisk. At this point you should have two branches: master and the one you just created. You can switch between branches by using git checkout A_Branch without using the "-b" option.
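
To try the branching step without touching github, here is a sketch that uses a local directory as a stand-in for the main casacore repo. The directory layout, file name and user identity are assumptions made only for this example.

```shell
top=$(mktemp -d)

# Local stand-in for github.com/casacore/casacore; no network access is needed.
git init -q "$top/main"
cd "$top/main"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
echo "casacore stand-in" > README
git add README
git commit -q -m "Initial commit"

# Scratch working repo with the stand-in registered as GithubCasacore.
git init -q "$top/work"
cd "$top/work"
git remote add GithubCasacore "$top/main"
git fetch -q GithubCasacore

# Create the feature branch from the remote's master and check it out.
git checkout -q -b MyBranch GithubCasacore/master

# The asterisk marks the branch that is checked out.
git branch
```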

Making and Committing the Mods

Finally, you're ready to start modifying the code.

Modify the Code

You can now modify the code in your repo as you desire. Whenever you like you can commit your changes to the local repo. Do this as often as you like since the changes do not leave your local repo until you explicitly cause them to leave.

Commit the Changes Locally

The current status of the file modifications can be seen by doing git status. This will show you what branch you are on, what files have been modified (under "Changed but not updated", or "Changes not staged for commit" in newer git versions) and what files have been marked as being ready to commit (those below "Changes to be committed:"); in git parlance the latter files are said to be "staged". Only files that are staged will take part in the commit operation. Files can be staged in several ways:

  • git add theFile: stages the named file; you can use wild cards, multiple files, paths, etc.
  • git add -u: stages all tracked files in the repo that were updated since the last commit
  • git add -p: patch mode add; it goes through every changed block of code and asks whether it should become part of the commit
  • git reset theFile: removes a file or files from the set of files to be committed

Now that the appropriate files are ready to be committed, it's time to pull the trigger:

git commit

This will pop up a text editor containing some information about the commit: the branch the changes are being committed to and the files that make up the commit. At the top of the buffer, add any commit comment you desire; lines starting with a "#" will be stripped out of the saved comment. You can abort the commit by exiting the text editor without saving the changes. When you save the text editor contents and exit the editor the commit to the current branch is completed. Remember, because of the distributed nature of git you can commit as often as you want; this will allow you to keep personal versions of your modifications as you progress so that you can back up if you make a major mistake.

The commit message convention for git is a 72 character subject line followed by an empty line, then a longer description aligned to 80 characters. github issues can be linked with gh-<issue-number> or #<issue-number>. The number will be replaced with a link to the issue automatically in the web gui. If prepended with "closes", the associated issue will be closed automatically when the commit is merged.
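
The staging and commit steps can be sketched in a scratch repository as follows. The file names, user identity, message text and issue number 123 are all hypothetical; the message is passed on standard input here only so that the example runs without opening an editor.

```shell
scratch=$(mktemp -d)
git init -q "$scratch"
cd "$scratch"
git config user.name "Example User"
git config user.email "user@example.com"

echo "first" > notes.txt
echo "second" > other.txt

# Stage one file only; other.txt stays out of the commit.
git add notes.txt
git status --short

# Subject line, empty line, then the longer description.
git commit -q -F - <<'EOF'
Fix typo in table column description

Longer description of what changed and why, wrapped to fit the line
length convention. The issue number is hypothetical: closes #123
EOF

# Show the subject line of the new commit.
git log -1 --format=%s
```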

Two other commands can be useful, though they are a little bit more advanced:

  • git commit --amend which can be used to update the previous commit and could be helpful to clean up before pushing the results to a github repo.
  • git rebase -i which can be used to combine several commits into a new one; this could be used to clean up a messy sequence of commits into a single, cleaner commit thus keeping the github repo and history as simple as possible.

Push Changes to your github repo

When all of the desired changes have been committed it's time to get them off your machine and back to github. This is called "pushing" the changes to another repo. But first one should check that the created branch contains the correct changes by looking at the changelog from the branch point (usually GithubCasacore/master) to the HEAD of the currently checked out branch:

git log -p GithubCasacore/master..

If there are unexpected commits they can be cleaned up via git rebase -i GithubCasacore/master.

If ssh did not work for you when you did git clone above, but https did, you will want to do git remote set-url MyGithubRepo https://YourAccountName@github.com/YourAccountName/casacore.git

When everything is in order, push it to the fork via:

git push MyGithubRepo MyBranch

This command pushes the changes made on MyBranch to the "MyGithubRepo" remote (use the git remote -v command to see the URL associated with "MyGithubRepo"). On successful completion of the push, your changes now reside on your github repo. If you open (or refresh) your github account page (URL github.com/YourAccountName/casacore) you can now see the changes. You can select your new branch using the "branch" dropdown located just below the big horizontal line. Once your branch is selected you can click on a link starting with "latest commit" (line immediately above the directory listing of the repo, on the right side) to see what you added to the branch.

Making pull request

At this point your changes are sitting on the github repo where they are publicly visible (your repo is termed "public" by github but you have to explicitly grant others the right to make changes to the repo). The next step is to migrate the changes to CasaCore proper. This is done by generating a "pull" request. You "push" to your own repo but you have to ask an owner of another repo to "pull" your changes into their repo.

  1. Open the casacore page in your browser (github.com/casacore/casacore).
  2. Click on "Pull requests" in the far right column. This shows the currently open pull requests (a filter setting can allow seeing other subsets of the pulls).
  3. Click on the "New Pull Request" button. A page titled "Compare Changes" appears. Click on the link "compare across forks" in the text immediately below "Compare Changes". This will add some additional buttons below the text.
  4. Select the head repo for the compare using the third dropdown from the left. In the dropdown select the entry marked "YourAccountName/casacore".
  5. Select the appropriate branch in your repo using the fourth dropdown.
  6. Now push the "create pull request" button.
  7. Places to add a title for the request and a detailed description appear. Fill them out.
  8. Press the "create pull request" button and it's created.

Once the pull request is created one of casacore's "owners" (github parlance for people having modify privileges) will need to merge the pull request into casacore's master branch. You should assign the pull request to the appropriate "coordinator" for the relevant code modified. Go back to the original pull requests page (maybe refreshing it) and you'll see your pull request. Click on a pull request and a page similar to the "comments" section of a JIRA ticket will appear. This provides a way for people to comment on the pull request; the automated continuous integration system will also make comments (the NSA normally keeps their opinions to themselves ;-). Sometimes the pull request may require modifications to address issues from people or the automaton (e.g., mod fails to build with current casacore); the next section looks at modifying your pull request.

When code is added or modified in a pull request, the travis continuous integration process builds the code and runs the CasaCore unit tests on it. As currently configured the committer will not get an email if travis has problems with the commit. Thus, the user should check back shortly after making a commit to the pull request to see if problems have occurred.

Revising code on pull request.

Modifying the pull request is fairly simple:

  1. Make the mods in your local repo (be sure you're on the correct branch).
  2. Commit your changes to your repo.
  3. Push the changed branch up to your personal github repo. If you get some kind of "out of sync" message, first check that your code tree is up-to-date (git diff MyGithubRepo/MyBranch); then you may have to use -f to force the push.
  4. The changes are automatically used to update the pull request on casacore (pretty cool). The notice of the update to the pull request shows up in fairly small print on the pull request page.
  5. Make a comment on the pull request (at github.com/casacore/casacore) since no email notifications are sent with the update. Otherwise your pull request will just sit there.

If modifications to the code have to be made before the pull request is merged, consider using git commit --amend or git rebase -i to simplify the commit as seen by the pull request. If you have used either of these commands, you will need to use git push -f when pushing the local commit up to your github repo.
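
A sketch of the amend step in a scratch repository (file name, message and user identity are hypothetical). The follow-up fix is folded into the previous commit, so the history still contains a single commit; the forced push is shown only as a comment because the scratch repository has no remote.

```shell
scratch=$(mktemp -d)
git init -q "$scratch"
cd "$scratch"
git config user.name "Example User"
git config user.email "user@example.com"

# The commit as first submitted.
echo "draft" > change.txt
git add change.txt
git commit -q -m "Add change"

# A fix requested in review: fold it into the previous commit.
echo "final" > change.txt
git add change.txt
git commit -q --amend --no-edit

# Still one commit, now containing the fixed file.
git log --oneline

# In a real clone the rewritten branch then needs a forced push:
#   git push -f MyGithubRepo MyBranch
```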

Updating your repo

When you make your fork of casacore onto your github repo it is current. However, the codebase can change and require you to update your fork.

git fetch GithubCasacore

This will update your local repo. Most likely the only relevant changes will be on the "master" branch. After the repo is updated, you need to get the changes into your working code tree. Make sure you're on the desired branch (see above) by issuing

git checkout YourBranchNameHere

Now rebase the current work so that it is based on the HEAD of the casacore master:

git rebase GithubCasacore/master
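
The fetch and rebase sequence can be rehearsed locally. This is a sketch under stated assumptions: a local directory stands in for the main casacore repo, and the file names and user identity are made up for the example.

```shell
top=$(mktemp -d)

# Local stand-in for the main casacore repo.
git init -q "$top/main"
cd "$top/main"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Working repo with a feature branch started from the main repo's master.
git clone -q -o GithubCasacore "$top/main" "$top/work"
cd "$top/work"
git config user.name "Example User"
git config user.email "user@example.com"
git checkout -q -b MyBranch GithubCasacore/master
echo "my change" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile the main repo moves ahead.
cd "$top/main"
echo "v2" > base.txt
git commit -q -a -m "Upstream change"

# Update the working repo and rebase the feature branch onto the new master.
cd "$top/work"
git fetch -q GithubCasacore
git rebase -q GithubCasacore/master

# The feature commit now sits on top of the upstream change.
git log --oneline
```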

Useful git tips:

Display working copy status and current branch in shell prompt

Source this file in shell profile: https://github.com/git/git/blob/master/contrib/completion/git-prompt.sh and add __git_ps1 " (%s)" to the PS1 shell variable. E.g:

GIT_PS1_SHOWDIRTYSTATE=true
PS1='\w$(__git_ps1 " (%s)")$ '

This will show the current branch and its state in the prompt. * means unstaged changes. + are staged changes. E.g.

~/casacore (preadwrite *) $ # on branch preadwrite with unstaged changes which can be viewed with git diff or git status

Tab completions

For tab completion source https://github.com/git/git/blob/master/contrib/completion/git-completion.bash in the shell startup script. Variants for shells other than bash are also available. Then you can tab complete all git commands, their options and, most notably, also branches and tags.

Track main repo with master branch

Usually in git workflows one never modifies the master branch; all changes are done on feature branches. This means that, as described in this page, the master branch which is cloned from the fork will often be outdated and not match the master branch of the main repository. Thus it can be useful to switch the master branch to track the main repository instead of the fork:

git checkout HEAD^ # go to a detached head so we can delete master

git branch -d master

git checkout -b master --track GithubCasacore/master

Now git pull on master will update from the main repository and can be used for comparisons and rebases against feature branches. Which branch is tracking what can be displayed with git branch -vv

For this it is recommended that GithubCasacore is a read-only remote to avoid accidental pushes. If not, take care to always specify exactly what you want to push where (git push <remote> <branch>). Disabling the default behaviour of git push to push the current branch to the tracking remote also helps avoid mistakes (git config --global push.default nothing).

Useful configuration

The global git configuration lies in $HOME/.gitconfig by default; it can be overridden by local repository configuration in .git/config. See man git-config for details. It can be used to create aliases for commonly used commands, e.g.:

[alias]
ci = commit
co = checkout
b = branch
st = status
ust = status --untracked-files=no
flog = log -p --stat --color
nlog = log -p --name-status --color

llog = log ORIG_HEAD..
cdiff = diff --cached
cos = "!f() { git checkout $1 && git submodule update --recursive; }; f"
[color]
diff = auto
status = auto
branch = auto

Most useful commands and their options

See their very good manpages for more details.

git add -p: interactive staging of changes

git log -p --color: Show log and the associated patches, no piping to less necessary

git log --name-status: Show log and the changed files

git diff revision1..revision2: show difference between two revisions/branches/working copy; --cached shows staged differences

git clean: remove untracked files from working copy, -xfd removes all ignored and untracked files

git mergetool: When a merge needs manual resolving this will open a three-way merge editor

git gui blame: Gui variant of git blame -C, allows easy navigation of history

git bisect: binary search through a commit range for the commit that causes problems.

git stash: temporarily stash away unstaged and staged changes

git rebase -i: merge/split/reword commits, useful to cleanup feature branches, should not be used on the master branch

Special reference variables

Similar to svn, git also has symbolic names for certain revisions/references that can be used by all commands (e.g. diff, log, checkout, ...)

HEAD: latest revision of the current branch, (~svn's CUR)

HEAD^: first parent of the latest revision, (~svn's PREV)

HEAD~3: third-generation ancestor of the latest revision, following first parents only

ORIG_HEAD: previous state of HEAD, so e.g. after a pull/merge git log ORIG_HEAD..HEAD shows the log for all merged revisions

FETCH_HEAD: HEAD of the last fetched branch, via git fetch (note git pull is a git fetch followed by a git merge FETCH_HEAD)
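
A small sketch of how HEAD, HEAD^ and HEAD~n resolve, using a scratch repository with three commits (file name, commit messages and user identity are hypothetical):

```shell
scratch=$(mktemp -d)
git init -q "$scratch"
cd "$scratch"
git config user.name "Example User"
git config user.email "user@example.com"

# Three commits whose messages are "first", "second" and "third".
for n in first second third; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "$n"
done

git log -1 --format=%s HEAD     # prints "third"
git log -1 --format=%s HEAD^    # prints "second"
git log -1 --format=%s HEAD~2   # prints "first"
```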

Letting SSH-Agent Manage Keys

By default, you will have to enter the passphrase for your ssh key each time you do a git command. You can avoid this by using ssh-agent. If you put the code below in the directory where you keep your scripts, you can simply source it from your .bashrc file. Once the agent is started, at some point in the future you can do an ssh-add command. This will ask for the passphrase for the key(s) you want the agent to unlock. Once a key is added to the agent, you can issue commands on github without having to enter a passphrase.
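#!/bin/bash

# Script that starts a new ssh-agent process if one does not already exist.  It
# then sources the output of the agent start so the appropriate environment
# variables are set.  User needs to do ssh-add to add IDs to the agent.

# PIDs of our ssh-agent processes (empty if there are none).  The pattern is
# quoted so the shell cannot expand it as a filename glob.
sshAgentPids=$(ps -ef -U "$USER" | grep '[^/]ssh-agent' | grep -v grep | awk '{print $2}' | xargs)

agentCommandFile=~/.sshAgentCmds

if [ -z "$sshAgentPids" ]; then

    # No existing agent belonging to us, so remove old file and start a new agent.

    rm -f "$agentCommandFile"

    # Save only stdout, which holds the environment-setting commands.
    ssh-agent -s > "$agentCommandFile"

fi

# Pick up the agent's environment variables if the command file is available.
if [ -r "$agentCommandFile" ]; then source "$agentCommandFile"; fi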

Glossary

  • clone: create a local copy of a repo
  • fetch: retrieval of changes from one repo into another; this does not in itself change the files in the working directories
  • fork: similar to a clone but occurs between two repos on github; thus it's a github rather than a git term.
  • merge: the merge of files from a repo and branch into a working directory
  • master: the name given the main branch of a repo. Equivalent to CASA's SVN "trunk" branch
  • MyGithubRepo: the name this cookbook gives to your personal github repo (your fork of casacore); git's conventional name for the repo from which another repo was cloned is "origin".
  • pull request: a request to a repo's owners to merge a set of changes into their repo
  • push: insertion of changes from one repo to another; privileges are required to perform this operation
  • repo: short for a git repository; a repo can be located on computers and/or web sites.
  • stage: a modified file is said to be staged if it has been added to the set of files that will be committed; git add ... is used to stage files.
  • unstage: removing a staged file from the set of files to be committed; use git reset theFile.
  • GithubCasacore: the name this cookbook gives to the main casacore repo on github; the conventional name for the repo that was the source of a fork is "upstream".
  • working directories: the files in the git file tree that you actually edit. This is analogous to your SVN working directories.

Revision Notes

1 September 2015

In an earlier revision I referred to two github repos as "origin" and "upstream". I got the idea from some github examples, but they don't seem to be particularly helpful in understanding the roles of the two repos. In this revision I have renamed them "MyGithubRepo" (for the modifier's personal github repo) and "GithubCasacore" (the place where the main Casacore repo lives). If you set up the names before, you can change them by doing git remote rename (e.g., git remote rename upstream GithubCasacore).

-- JimJacobs - 2015-06-19
Topic revision: r21 - 2017-02-08, PamFord