Version control with Git

Venkatesh Choppella

2020-07-17 Fri 16:00

Created: 2020-08-07 Fri 12:59


Outline

List of Exercises

Setting up git and creating a project

Basic project lifecycle

Tagging, Branching and Merging

Motivation

The nature of software development

software artefacts are documents
programs, test cases, documents, makefiles, etc.
software documents are in plain text
S/W developers rarely write their programs in Excel or Word or pdf.
a typical software project has many files
Large projects have hundreds of files; many have thousands.
and multiple people
many users, distributed but working on the same code base.
and multiple versions
production version, bug-fix version, new-feature version, etc.

Example

versions of file A.c
A1.c, A2.c, A3.c
versions of file B.c
B1.c, B2.c, B3.c
only some combinations make sense
  • v1: A1.c, B1.c
  • v2: A1.c, B2.c
  • v3: A2.c, B3.c
tracking gets worse with more files
tracking working combinations is too error-prone to do manually. Combinatorial explosion with increasing number of files.
gets impossible with multiple users
Hard to do by hand when users are working locally and asynchronously. The problem is much more acute when the documents are programs, because programs need to work correctly and are brittle.

What is version control?

creation and management of project versions
Each version corresponds to a checkpoint or milestone of your project.
independent evolution of multiple versions
Each version evolves along a branch.
automatic merging of different versions
Combine two versions intelligently and flag conflicts if any.
access to all past versions
Any past version may be accessed.
sharing of versions by multiple users
protocols to share versions between multiple users.

Are git and gitlab for version control difficult to use?

quite simple for personal use
If you plan to use it for just yourself, git can be very easy to use.
could get tricky with multiple users
With multiple users, you need to manage permissions, merges, merge requests, and occasional conflicts.
demands expertise when managing a large project
You will need to understand issue tracking and continuous integration, which are typical in large software or enterprise projects.

Do I need git version control?

You will find git version control invaluable when

programming and writing software
Programming involves multiple files, source code, build files, configuration files, test files. Version control is designed for programmers.
dealing with lots of text files
e.g., s/w development, scientific paper writing in \(\LaTeX\)
working in a team
e.g., writing a joint paper or executing a large software project.
driving a complex workflow
Git platforms like Gitlab and Github support workflows built around merge requests, issue tracking and continuous integration.
backing up
At the very least, Gitlab and Github are good backup systems that store all past versions of your project.

You might not need version control

You will not need version control if

You have shared documents in google docs or Office 365
Google docs (and similar services) work well for collaboratively editing a small set of documents.

What is git?

Commands
A set of commands that manage the versioning of a project.
Architecture
A collection of entities and artefacts, and their relationships, that together achieve versioning.
Workflows
A collection of patterns normally employed by users collaborating on a project.

What are these other names?

Gitlab
An online platform and repository for projects versioned using git. Manages your projects, groups, permissions, etc. We will be using Gitlab for this tutorial.
Github
Like Gitlab, but hosts many more open source projects. We will not be using Github for this tutorial.
Github classroom
A platform for managing homeworks in a class. We will not be using it for this tutorial. With enough simple scripts, gitlab is sufficient for managing homeworks.

Getting Started with Git

Install a git client

OS          Command line client (installation)        Other clients
Linux       apt-get install git-all                   magit (for Emacs users); Github Desktop fork for Linux
Windows     GitForWindows                             Github Desktop; TortoiseGit
MacOS       git --version                             Github Desktop
            (prompts installation if git is missing)
Other docs  Atlassian docs for git installation;      Using Github Desktop with Gitlab
            Installation docs from the Git manual;
            List of GUI clients

Exercise: create an account on Gitlab

Gitlab
Go to https://gitlab.com.

Exercise: install git clients

Read installation options
See git installation page.
Install command line tools
This depends on your machine's OS.
Install Github Desktop
This is a graphical alternative to the command line interface to git.

Learn git

Atlassian git tutorials
Atlassian is a company offering git hosting services. Start with the tutorial "What is version control?"
Git tutorial from Pro Git
A short tutorial on git. You'll find it more handy as a quick reference.
Git Cheatsheet (pdf)
A 2-page cheatsheet with a summary of git commands and a nice diagram. Here is an interactive cheat sheet.
Pro Git book
The standard online reference book. Comprehensive. Complements the git reference manual.

Git: Basic Concepts

Project and Versions

A project is a set of files
A project is a logical collection of files existing on a repository (like gitlab.com or your own system).
A project evolves
Files are created, modified or deleted during the lifetime of a project.
A Version is a snapshot of your project
A snapshot can be taken at any time during the evolution of your project.
How is versioning done?
Using a set of git commands described presently.

Lifecycle of a git project

Creation
A project is created
Cloning
A project is cloned into a directory on your machine, called the project workspace.
Editing
Files in your workspace are created and deleted. While they are there, they can also be edited multiple times.
Adding (Staging)
The current state of the specified file is captured in preparation for a new version.
Committing
The state of all your added files, as of their most recent add, is captured as a single version and assigned an identifier (a checksum).
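
The lifecycle stages above can be tried end to end without a Gitlab account. This is only a sketch: it assumes a local bare repository (the hypothetical server.git) standing in for the server, and a placeholder user name and email.

```shell
set -eu
work=$(mktemp -d)                       # scratch area; the location is arbitrary
git init -q --bare "$work/server.git"   # creation: a bare repository plays the server
git clone -q "$work/server.git" "$work/workspace"   # cloning (git warns that the clone is empty)
cd "$work/workspace"
git config user.name "Demo User"        # placeholder identity, needed for committing
git config user.email "demo@example.com"
echo "hello" > a.txt                    # editing
git add a.txt                           # adding (staging)
git commit -q -m "Add a.txt"            # committing
git log --pretty=oneline                # one line per version: checksum and message
```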

Git project lifecycle diagram

[Figure: git project lifecycle diagram (git-wf-cropped-w1000.png)]

Exercise: Create a project on Gitlab

1. Login to gitlab
Go to https://gitlab.com and sign in with your username and password.
2. Choose "New Project"
click the green colored "New Project" button on the top right.
3. Name your project
choose a name for your project
4. Add project description
write a short blurb about your project.
5. Pick visibility level
choose from "Private" or "Public".
6. Initialize with README
click this option to have a README.md file automatically added.
7. Create project
click the green button at the bottom to create the project.
Result
Your project is now created on gitlab.

Cloning a project

What is cloning?
Cloning is the process of getting a copy of your project on the gitlab server to your local machine.
Where does the clone reside on my machine?
The clone will be a directory on your machine.
What is the command to clone?
git clone <url-of-project>

Exercise: Clone your project

1. Navigate to your project
You can search for your project on your gitlab landing page after you login.
2a. Locate project url
If you're on your project's page, it's the url in your browser's address bar, OR
2b. or use the Clone button
click on the blue button on the right and choose HTTPS and copy the url.
3. Over to your terminal

Type the command

git clone <project url>

Result
Your project is now cloned as a directory on your machine. You may edit it by creating new files and modifying existing ones.

Exercise: Editing and status

Assumption

(a) Your current directory is your workspace.

(b) You have created and edited some files in the directory: a.txt and tmp.txt.

Check status

run git status

This lists the files that are untracked or modified but yet to be staged (added) for a commit.

Exercise: Adding (staging) files

Add (stage) files

After some editing, stage the file a.txt:

git add a.txt

Check status
check the status again using git status. The status indicates changes to be committed.
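
A small sketch of the two status checks, assuming a fresh scratch repository created with git init instead of a clone. The --short flag prints a compact status: ?? marks an untracked file and A marks a newly staged one.

```shell
set -eu
work=$(mktemp -d)              # scratch area; the location is arbitrary
git init -q "$work/demo"
cd "$work/demo"
echo "first draft" > a.txt
echo "scratch" > tmp.txt
git status --short             # both files are listed as ?? (untracked)
git add a.txt
git status --short             # a.txt is now listed as A (staged); tmp.txt is still ??
```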

Exercise: Committing and commit logs

Commit command
git commit -m "<nice commit message>"
git log

To see a list of all your commits (optionally in a pretty-printed short format):

git log [--pretty=short]

git status
After the commit, git status should no longer list the committed changes. If nothing else is pending, it reports that there is nothing to commit and the working tree is clean.
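
The commit exercise might look like this in a scratch repository. The commit messages and the identity are placeholders chosen for illustration.

```shell
set -eu
work=$(mktemp -d)              # scratch area; the location is arbitrary
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "first draft" > a.txt
git add a.txt
git commit -q -m "Add first draft of a.txt"
echo "second draft" > a.txt
git add a.txt
git commit -q -m "Revise a.txt"
git log --pretty=short         # checksum, author and message of each commit
git status --short             # prints nothing: no pending changes
```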

Exercise: Pushing commits

Commit version in workspace
A git commit records the new version only in your local copy of the project; the server does not have it yet.
Push commits to the server
git push
Result
The gitlab server now has the new version of your project.
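
To see a push without touching Gitlab, one option is a local bare repository acting as the server. The sketch below assumes that setup; server.git and the identity are hypothetical names. It names the remote and branch explicitly, which is what a plain git push does once the branch tracks a remote branch.

```shell
set -eu
work=$(mktemp -d)                       # scratch area; the location is arbitrary
git init -q --bare "$work/server.git"   # stand-in for the gitlab server
git clone -q "$work/server.git" "$work/workspace"
cd "$work/workspace"
git config user.name "Demo User"        # placeholder identity
git config user.email "demo@example.com"
echo "hello" > a.txt
git add a.txt
git commit -q -m "Add a.txt"            # so far the commit exists only locally
branch=$(git symbolic-ref --short HEAD) # name of the current branch
git push -q origin "$branch"            # send the commit to the server
git --git-dir="$work/server.git" log --pretty=oneline "$branch"   # the server now lists it
```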

Checksums and Tagging

Commits have clumsy names
These are system generated names (aka checksums). Try git log --pretty=oneline
Tags identify milestones
Tagging is a way of giving human-readable names to your commits.
Tagging
git tag <tag-name>
Annotated tagging
git tag -a <tag> -m "<msg>"
Listing tags
git tag
Examining a particular tag
git show <tag>
Sharing tags
git push origin --tags or git push origin <tag>
Tagging an old commit
git tag -a <tag> <checksum-or-part-of-it>
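
A short sketch of the tagging commands in a scratch repository. The tag names v0.1 and v0.2 and the messages are made up for the example.

```shell
set -eu
work=$(mktemp -d)              # scratch area; the location is arbitrary
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "one" > a.txt
git add a.txt
git commit -q -m "First version"
first=$(git rev-parse --short HEAD)       # abbreviated checksum of the first commit
echo "two" > a.txt
git add a.txt
git commit -q -m "Second version"
git tag v0.2                                    # lightweight tag on the current commit
git tag -a v0.1 -m "First milestone" "$first"   # annotated tag on an older commit
git tag                                         # lists v0.1 and v0.2
git --no-pager show -s v0.1                     # tag message, then the tagged commit
```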

Exercise: Tag the initial commit

Find all commits
git log --pretty=oneline
Locate the initial one
The log message is probably "Initial commit".
Tag the initial commit
git tag -a v-init-0.0 <checksum-prefix>
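
Instead of reading the log by eye, the initial commit can also be located by asking git for commits that have no parent. A sketch in a scratch repository, with placeholder messages and identity:

```shell
set -eu
work=$(mktemp -d)              # scratch area; the location is arbitrary
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "one" > a.txt
git add a.txt
git commit -q -m "Initial commit"
echo "two" >> a.txt
git commit -q -a -m "Second commit"
root=$(git rev-list --max-parents=0 HEAD)   # commits without a parent: the initial one
git tag -a v-init-0.0 -m "Initial commit" "$root"
git log --pretty=oneline --decorate         # the tag appears next to the initial commit
```

Passing -m avoids the editor that git tag -a would otherwise open for the tag message.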

Git Branching and Merging

What is a branch?

Branch = evolutionary path
The path is dotted with commits.
Commits happen on branches
A commit is always done within the context of a branch.
Master branch
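This is the default branch, on which commits happen until you switch to another branch. Newer Git installations and hosting services may name it main instead.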
Multiple branches
A project may evolve along multiple branches.
Motivation
You have a working version of your code on branch master. But you suddenly want to explore a separate idea. You create a new branch, say experimental. Later on you can merge the two branches.

Exercise: Commit to a different branch

List existing branches
git branch
Create a new branch
git branch <new-branch>
Switch to the new branch
git checkout <newly-created-branch>
Start working on this branch
Create, edit and commit on this branch.
Switch back to master
Run git checkout master. If you created a file in the new branch, you won't see it on master.
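
The branching exercise, sketched in a scratch repository. The branch name experimental and the file idea.txt are illustrative; the symbolic-ref line only makes sure the first branch is called master, as in the text.

```shell
set -eu
work=$(mktemp -d)              # scratch area; the location is arbitrary
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
echo "stable" > a.txt
git add a.txt
git commit -q -m "Initial commit"
git branch experimental        # create a new branch
git checkout -q experimental   # switch to it
echo "idea" > idea.txt
git add idea.txt
git commit -q -m "Try an idea"
git branch                     # lists both branches; * marks experimental
git checkout -q master         # switch back
ls                             # only a.txt: idea.txt exists on experimental alone
```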

Exercise: Switch to a tag on a new branch

List tags
git tag
Initial commit
Its tag is v-init-0.0.
Switch to this tag (on a new branch)
git checkout tags/v-init-0.0 -b <new-branch-name>
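
A sketch of returning to a tagged version on a new branch. The branch name from-init is hypothetical; the repository is a scratch one.

```shell
set -eu
work=$(mktemp -d)              # scratch area; the location is arbitrary
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "one" > a.txt
git add a.txt
git commit -q -m "Initial commit"
git tag -a v-init-0.0 -m "Initial commit"   # tag the commit just made
echo "two" > a.txt
git commit -q -a -m "Later change"
git tag                                      # lists v-init-0.0
git checkout -q tags/v-init-0.0 -b from-init # new branch starting at the tagged commit
cat a.txt                                    # shows the file as it was at the tag
```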

Merging

What is merging
It is the process of absorbing the changes of another branch into the current branch.
Motivation
Once your experiment is successful, you want to incorporate it into production, so you need to merge the experiment branch to production.
Diffuse changes across branches
Merging allows for changes to diffuse into other branches in a controlled way.

Exercise: Merge a branch with another

Switch to master branch
Use git checkout master if you're on a different branch.
Note files in master
Examine the files in master.
Merge the experimental branch into master
git merge exp (Here exp is the experimental branch.)
Note files in master
The changes introduced by exp are now incorporated into master.
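
The merge exercise could be rehearsed as follows in a scratch repository. Both branches get a commit of their own here, so git records a merge commit; the file names are made up, and --no-edit accepts the default merge message.

```shell
set -eu
work=$(mktemp -d)              # scratch area; the location is arbitrary
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
echo "stable" > a.txt
git add a.txt
git commit -q -m "Initial commit"
git checkout -q -b exp         # create and switch to the experimental branch
echo "idea" > idea.txt
git add idea.txt
git commit -q -m "Add idea.txt on exp"
git checkout -q master
echo "more" > b.txt
git add b.txt
git commit -q -m "Add b.txt on master"
ls                             # a.txt and b.txt
git merge -q --no-edit exp     # absorb the changes of exp into master
ls                             # a.txt, b.txt and idea.txt
```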

Conflicts

What is a conflict?
A situation when two incompatible changes have been made to the same file. Normally, git is smart enough to merge changes to the same file in two different branches, but occasionally it is stumped.
Locating a conflict

A conflict in a file looks like this:

<<<<<<< HEAD
content in your current branch
=======
content in the other branch
>>>>>>> other_branch_checksum

Resolving a conflict

Edit the offending file
Choose (which parts of) which versions should stay or go. You can also completely edit the file, adding whatever you like. Just make sure the conflict markers are gone.
Home (our) branch overrides
git checkout --ours <conflicted-file>
Incoming (their) branch overrides
git checkout --theirs <conflicted-file>

Exercise: introduce and resolve a conflict

Switch to master
Create a new file conflict.txt with a line "This is the master branch." Add and commit.
Switch to exp branch
Create a new file conflict.txt with a line "This is exp.". Add and commit.
Merge with master
git merge master
Identify the conflict
Examine conflict.txt. Fix conflict (in any of the different ways) and commit.
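
One possible run of this exercise, sketched in a scratch repository. It resolves the conflict by keeping the version from the current branch (exp); editing the file by hand would work just as well. The README file and the identity are placeholders.

```shell
set -eu
work=$(mktemp -d)              # scratch area; the location is arbitrary
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
echo "base" > README
git add README
git commit -q -m "Initial commit"
git branch exp                            # exp starts from the initial commit
echo "This is the master branch." > conflict.txt
git add conflict.txt
git commit -q -m "Add conflict.txt on master"
git checkout -q exp
echo "This is exp." > conflict.txt
git add conflict.txt
git commit -q -m "Add conflict.txt on exp"
git merge master || true                  # the merge stops and reports a conflict
cat conflict.txt                          # both versions, separated by conflict markers
git checkout --ours conflict.txt          # keep the version from exp, the current branch
git add conflict.txt                      # mark the conflict as resolved
git commit -q -m "Merge master into exp, keeping exp's conflict.txt"
cat conflict.txt
```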

Git for Collaboration

Remotes

git project
A git project is identified by a url (or a directory if it's on your own machine).
git is peer-to-peer
Git allows branches to be shared between any two projects.
what is a remote?
A remote in a project is a handle to any other git project. That project may exist anywhere: on the internet, your local network, or even on your machine. A project may be connected to any number of remotes, or act as a remote for any number of other projects.
How are remotes used?
Once a remote identifies a project, branches from that remote may be fetched into your local git project. They may then be merged with your local branches.
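
A sketch of fetching and merging between two projects on the same machine. The directory names alice and bob are hypothetical; bob is a clone of alice, so its remote is called origin.

```shell
set -eu
work=$(mktemp -d)              # scratch area; the location is arbitrary
git init -q "$work/alice"
cd "$work/alice"
git config user.name "Alice"              # placeholder identity
git config user.email "alice@example.com"
echo "line 1" > a.txt
git add a.txt
git commit -q -m "Alice's first commit"
branch=$(git symbolic-ref --short HEAD)   # name of alice's branch
git clone -q "$work/alice" "$work/bob"    # bob's remote origin points at alice
echo "line 2" >> a.txt
git commit -q -a -m "Alice's second commit"
cd "$work/bob"
git fetch -q origin                       # bring alice's new commit into bob's project
git merge -q "origin/$branch"             # merge it into bob's local branch
cat a.txt                                 # now contains both lines
```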

Listing Remotes

git remote -v lists all the remotes of a project

$ git remote -v
origin	https://gitlab.com/vxcg/teach/git.git (fetch)
origin	https://gitlab.com/vxcg/teach/git.git (push)

The type of a remote
In the listing, each remote appears with a fetch url and a push url. Branches are fetched from the fetch url, and local branches are pushed to the push url. The two are usually the same, as in the example above.
Url of a remote
Each remote is identified by a url. In the above example, the remote's url is https://gitlab.com/vxcg/teach/git.git
Local name of a remote
A remote also has a local name which makes it easy to refer to it. In the above example, the remote is called origin.
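
A remote can be added and inspected by hand. The sketch below assumes a local bare repository (the hypothetical server.git) as the other project, so the url is simply a directory path.

```shell
set -eu
work=$(mktemp -d)                       # scratch area; the location is arbitrary
git init -q --bare "$work/server.git"   # the other project
git init -q "$work/demo"
cd "$work/demo"
git remote add origin "$work/server.git"   # local name origin, url is the path
git remote -v                              # origin is listed twice: (fetch) and (push)
git remote get-url origin                  # prints the url behind the local name
```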

Exercise: Setting up a remote

Fetching from a remote

Pulling from a remote

Controlling access to your project on gitlab

Creating a merge request

Approving a merge request on gitlab

Git Best Practices

Initialize and protect project

Keep repositories private
You can always open it up to others later.
Add a README file
If you plan to share your repository with others, add a README.md or README.txt file, describing the project.

Commit frequency and etiquette

Do frequent commits!
Commit often, so your versions are close together and you can get back to the past version most closely resembling what you want.
Commit only "consistent" states
In a programming project, only commit code when your programs compile, and perhaps pass some of the test cases. When working in a team, you'll quickly lose friends if you commit code that doesn't compile.
Write good commit messages
Good messages let you and others know what changed and why.

Liberally use Tags and Branches

Tag often!
Tagging helps you identify important checkpoints that you can get back to.
Create branches
Experiment fearlessly in new branches, and commit and tag them as well. You or your team may want to work on multiple features of your project at the same time. Branches are indispensable for that.

Thank You

Advanced topics
Many advanced topics have been skipped.
More sessions
If there is interest, we could do more sessions.
Meanwhile
Complete the exercises, start using git and refer to the vast online documentation, tutorials and Stack Overflow discussions!