Git

Topics related to Git:

Getting started with Git

Git is a free, distributed version control system that lets programmers track code changes through "snapshots" (commits) of the project's current state. Commits let programmers test, debug, and build new features collaboratively. All commits are kept in what is known as a "Git repository", which can be hosted on your computer, on private servers, or on code-hosting websites such as GitHub.

Git also allows users to create new "branches" of the code, so that different versions of the code can live alongside each other. This enables scenarios where one branch contains the most recent stable version, a different branch contains a set of new features being developed, and yet another branch contains a different set of features. Git makes the process of creating these branches, and then merging them back together, nearly painless.

Git has 3 different "areas" for your code:

  • Working directory: The area that you will be doing all of your work in (creating, editing, deleting, and organizing files)
  • Staging area: The area where you will list the changes that you have made to the working directory
  • Repository: Where Git permanently stores the changes you have made as different versions of the project
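
The three areas can be observed with git status as a change moves through them. The following is a minimal sketch rather than a prescribed workflow; the temporary directory, the file name notes.txt, and the user identity are assumptions made up for the example.

```shell
set -e
# Throwaway repository; file name and identity are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Working directory: the file exists on disk, but Git does not track it yet.
echo "hello" > notes.txt
in_worktree=$(git status --porcelain)     # untracked entries are marked "??"

# Staging area: the change is now listed for the next commit.
git add notes.txt
in_index=$(git status --porcelain)        # staged additions are marked "A"

# Repository: the staged change is stored as a commit.
git commit -q -m "Add notes"
after_commit=$(git status --porcelain)    # empty: nothing left to stage or commit
commit_count=$(git rev-list --count HEAD)

echo "$in_worktree"
echo "$in_index"
echo "commits: $commit_count"
```

Once the commit is made, git status reports nothing, because the working directory, the staging area, and the latest commit all agree.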

Git was originally created for managing the Linux kernel source. It makes small commits, forking of projects, merging between forks, and having lots of short-lived branches easy, and in doing so encourages all of them.

The biggest change for people who are used to CVS or Subversion is that every checkout contains not only the source tree, but also the whole history of the project. Common operations like diffing of revisions, checking out older revisions, committing (to your local history), creating a branch, checking out a different branch, merging branches or patch files can all be done locally without having to communicate with a central server. Thus the biggest source of latency and unreliability is removed. Communicating with the "upstream" repository is only needed to get the latest changes, and to publish your local changes to other developers. This turns what was previously a technical constraint (whoever has the repository owns the project) into an organisational choice (your "upstream" is whomever you choose to sync with).

Browsing the history

Working with Remotes

Staging

It's worth noting that staging has little to do with files themselves and everything to do with the changes within each file. We stage files that contain changes, and Git records those changes as commits (even when the changes in a commit span several files).
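
One way to see that staging captures changes rather than files is to stage a file and then edit it again. In this sketch (the file name app.txt and its contents are assumptions for illustration), the same file ends up with one staged change and one unstaged change.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line one\n' > app.txt
git add app.txt
git commit -q -m "Initial version"

printf 'line two\n' >> app.txt
git add app.txt                       # stages the "line two" change
printf 'line three\n' >> app.txt      # a later change to the same file, not staged

# --cached compares the staging area with the last commit;
# plain git diff compares the working directory with the staging area.
staged=$(git diff --cached | grep '^+line')
unstaged=$(git diff | grep '^+line')

echo "staged:   $staged"
echo "unstaged: $unstaged"
```

Committing at this point would record only the staged change; the later edit stays in the working directory until it is staged as well.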

The distinction between files and commits may seem minor, but understanding this difference is fundamental to understanding essential functions like cherry-pick and diff. (See the frustration in comments regarding the complexity of an accepted answer that proposes cherry-pick as a file management tool.)

Key concepts:

A file is the more common metaphor of the two in information technology. Best practice dictates that a filename not change as its contents change (with a few recognized exceptions).

A commit is a metaphor that is unique to source code management. Commits are changes related to a specific effort, like a bug fix. Commits often involve several files. A single, minor bug fix may involve tweaks to templates and CSS in separate files. As the change is described, developed, documented, reviewed and deployed, the changes across the separate files can be annotated and handled as a single unit. The single unit in this case is the commit. Equally important, focusing just on the commit during a review allows the unchanged lines of code in the various affected files to be ignored safely.

Ignoring Files and Folders

Git Diff

Undoing

Merging

Submodules

Committing

Aliases

Rebasing

Please keep in mind that rebase effectively rewrites the repository history.

Rebasing commits that already exist in the remote repository can rewrite commits that other developers are using as the base for their own work. Unless you really know what you are doing, it is best practice to rebase only before pushing your changes.
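
The effect of a rebase on history can be checked by comparing commit ids before and after. This is a sketch under assumed names (branches main and feature, and the file names are hypothetical): the rebased commit carries the same change but gets a new id, which is why commits that others already have should not be rebased.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the initial branch "main"
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "Base"

git checkout -q -b feature
echo feature > feature.txt && git add feature.txt && git commit -q -m "Feature work"
before=$(git rev-parse HEAD)

git checkout -q main
echo more > main.txt && git add main.txt && git commit -q -m "Main moves on"
main_tip=$(git rev-parse HEAD)

# Replay the local, unpushed feature commit on top of the new main.
git checkout -q feature
git rebase -q main
after=$(git rev-parse HEAD)
parent=$(git rev-parse "HEAD^")

echo "before rebase: $before"
echo "after rebase:  $after"
```

The commit on feature now has main's latest commit as its parent, and its id differs from the one it had before the rebase.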

Configuration

Branching

Every Git repository has one or more branches. A branch is a named reference to the head (the most recent commit) of a sequence of commits.

A Git repo has a current branch (indicated by a * in the list of branch names printed by the git branch command). Whenever you create a new commit with the git commit command, your new commit becomes the head of the current branch, and the previous head becomes the parent of the new commit.

A new branch will have the same head as the branch from which it was created until something is committed to the new branch.
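
These statements can be verified with git rev-parse, which prints the commit a name refers to. The sketch below assumes hypothetical branch names main and feature and a hypothetical file file.txt.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the initial branch "main"
git config user.name "Example User"
git config user.email "user@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "First"

# A freshly created branch points at the same commit as its origin branch.
git branch feature
main_tip=$(git rev-parse main)
feature_tip=$(git rev-parse feature)
current=$(git rev-parse --abbrev-ref HEAD)    # still "main": git branch does not switch

# Committing on feature moves only feature; the old tip becomes the parent.
git checkout -q feature
echo two >> file.txt && git commit -q -am "Second"
new_tip=$(git rev-parse feature)
parent_of_new=$(git rev-parse "feature^")
main_after=$(git rev-parse main)

echo "current branch before checkout: $current"
```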

Rev-List

Squashing

What is squashing?

Squashing is the process of taking multiple commits and combining them into a single commit encapsulating all the changes from the initial commits.
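
There are several ways to squash; an interactive rebase is a common one. The sketch below uses a different, non-interactive route (git reset --soft followed by a new commit) so that it can run unattended. File names and commit messages are assumptions for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "Base"
base=$(git rev-parse HEAD)

# Three small local commits that will be combined.
for name in a b c; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "Add $name"
done
tree_before=$(git rev-parse 'HEAD^{tree}')

# Move the branch back to the base commit but keep every change staged,
# then record all of them as one commit.
git reset -q --soft "$base"
git commit -q -m "Add a, b and c"

count=$(git rev-list --count "$base"..HEAD)
tree_after=$(git rev-parse 'HEAD^{tree}')
echo "commits after base: $count"
```

The resulting commit has the same tree as the last of the three original commits, so no content is lost; only the intermediate history is.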

Squashing and Remote Branches

Pay special attention when squashing commits on a branch that is tracking a remote branch; if you squash a commit that has already been pushed to a remote branch, the two branches will have diverged, and you will have to use git push -f to force those changes onto the remote branch. Be aware that this can cause issues for others tracking that remote branch, so use caution when force-pushing squashed commits onto public or shared repositories.

If the project is hosted on GitHub, you can enable "force push protection" on some branches, like master, by adding it to Settings - Branches - Protected Branches.

Cherry Picking

Recovering

Git Clean

Using a .gitattributes file

.mailmap file: Associating contributor and email aliases

Analyzing types of workflows

Using version control software like Git may be a little scary at first, but its design, which is built around cheap branching, makes a number of different workflows possible. Pick the one that is right for your own development team.

Pulling

git pull runs git fetch with the given parameters and calls git merge to merge the retrieved branch heads into the current branch.
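
The two steps can be run separately to see what each one does. This sketch uses two local repositories in a temporary directory to stand in for a remote and a clone; the directory names upstream and local, the branch name main, and the file contents are all assumptions for the example.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main     # name the initial branch "main"
git config user.name "Example User"
git config user.email "user@example.com"
echo v1 > file.txt && git add file.txt && git commit -q -m "v1"

git clone -q "$work/upstream" "$work/local"   # "origin" in local points at upstream

# A new commit appears upstream.
echo v2 > file.txt && git commit -q -am "v2"
upstream_v2=$(git rev-parse HEAD)

cd "$work/local"
git fetch -q origin                       # step 1: updates origin/main only
fetched=$(git rev-parse origin/main)
local_before=$(git rev-parse HEAD)        # the current branch has not moved yet
git merge -q origin/main                  # step 2: merge (a fast-forward here)
local_after=$(git rev-parse HEAD)

# The same two steps in one command.
cd "$work/upstream"
echo v3 > file.txt && git commit -q -am "v3"
upstream_v3=$(git rev-parse HEAD)
cd "$work/local"
git pull -q origin main
pulled=$(git rev-parse HEAD)
```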

Hooks

Cloning Repositories

Stashing

Stashing allows us to have a clean working directory without losing any information. Then, it's possible to start working on something different and/or to switch branches.
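
A short sketch of that round trip, assuming a hypothetical file app.txt with an unfinished edit: the edit is stashed, the working directory becomes clean, and the edit is restored afterwards.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo stable > app.txt && git add app.txt && git commit -q -m "Stable"

echo "half-finished" >> app.txt           # work in progress, not ready to commit
git stash push -q                         # shelve it
clean=$(git status --porcelain)           # empty: the working directory is clean
stashed=$(git stash list | grep -c .)     # number of stash entries

# ... switch branches or work on something else here ...

git stash pop -q                          # bring the shelved work back
restored=$(tail -n 1 app.txt)
echo "stash entries while shelved: $stashed"
echo "last line after pop: $restored"
```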

Subtrees

Renaming

Pushing

Upstream & Downstream

In terms of source control, you're "downstream" when you copy (clone, checkout, etc) from a repository. Information flowed "downstream" to you.

When you make changes, you usually want to send them back "upstream" so they make it into that repository so that everyone pulling from the same source is working with all the same changes. This is mostly a social issue of how everyone can coordinate their work rather than a technical requirement of source control. You want to get your changes into the main project so you're not tracking divergent lines of development.

Sometimes you'll read about package or release managers (the people, not the tool) talking about submitting changes to "upstream". That usually means they had to adjust the original sources so they could create a package for their system. They don't want to keep making those changes, so if they send them "upstream" to the original source, they shouldn't have to deal with the same issue in the next release.

Internals

git-tfs

Git-tfs is a third-party tool to connect a Git repository to a Team Foundation Server (“TFS”) repository.

Most remote TFVC instances will request your credentials on every interaction, and installing Git-Credential-Manager-for-Windows may not help. This can be overcome by adding your user name and password to your .git/config:

[tfs-remote "default"]
  url = http://tfs.mycompany.co.uk:8080/tfs/DefaultCollection/
  repository = $/My.Project.Name/
  username = me.name
  password = My733TPwd

Empty directories in Git

git-svn

Cloning really big SVN repositories

If your SVN repo history is really big, this operation could take hours, as git-svn needs to rebuild the complete history of the SVN repo. Fortunately you only need to clone the SVN repo once; as with any other Git repository, you can just copy the repo folder to other collaborators. Copying the folder to multiple computers will be quicker than cloning big SVN repos from scratch on each of them.

About commits and SHA1

Your local Git commits will be rewritten when you use the command git svn dcommit. This command adds text to the Git commit's message referencing the SVN revision created on the SVN server, which is very useful. However, adding text means modifying an existing commit's message, which can't actually be done: Git commits are immutable. The solution is to create a new commit with the same contents and the new message, but it is technically a new commit (i.e. the Git commit's SHA1 will change).

As Git commits created for git-svn are local, the SHA1 ids for Git commits differ between Git repositories! This means that you can't use a SHA1 to reference a commit from another person, because the same commit will have a different SHA1 in each local Git repository. If you want to reference a commit across different copies of the repository, you need to rely on the SVN revision number appended to the commit message when you push to the SVN server.

You can still use the SHA1 for local operations, though (showing or diffing a specific commit, cherry-picks, resets, etc.).

Troubleshooting

git svn rebase command issues a checksum mismatch error

The command git svn rebase throws an error similar to this:

  Checksum mismatch: <path_to_file> <some_kind_of_sha1>
  expected: <checksum_number_1>
    got: <checksum_number_2>

The solution to this problem is to reset SVN to the revision in which the troubled file was last modified, and then run git svn fetch so that the SVN history is restored. The commands to perform the SVN reset are:

  • git log -1 -- <path_to_file> (copy the SVN revision number that appears in the commit message)
  • git svn reset <revision_number>
  • git svn fetch

You should be able to push/pull data from SVN again.

File was not found in commit

When you try to fetch or pull from SVN, you get an error similar to this:

<file_path> was not found in commit <hash>

This means that a revision in SVN is trying to modify a file that for some reason doesn't exist in your local copy. The best way to get rid of this error is to force a fetch that ignores the path of that file; the file will then be updated to its state in the latest SVN revision:

  • git svn fetch --ignore-paths <file_path>

Archive

Rewriting history with filter-branch

Migrating to Git

Show

Shows various Git objects.

  • For commits, shows the commit message and diff
  • For tags, shows the tag message and referenced object
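
Both cases can be tried on a throwaway repository. In this sketch the file readme.txt, the tag name v1.0, and the messages are assumptions chosen for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo hello > readme.txt && git add readme.txt && git commit -q -m "Add readme"
git tag -a v1.0 -m "First release"        # annotated tag pointing at that commit

commit_view=$(git show HEAD)              # commit message followed by the diff
tag_view=$(git show v1.0)                 # tag message, then the tagged commit

echo "$commit_view"
echo "-----"
echo "$tag_view"
```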

Resolving merge conflicts

Bundles

The key to making bundles work is to begin by cloning a bundle that starts from the beginning of the repo history. Create that initial bundle on the source machine:

 git bundle create initial.bundle master
 git tag -f some_previous_tag master  # marks this point, so later bundles need not contain the whole repo

Then get that initial bundle to the remote machine and clone from it:

 git clone -b master initial.bundle remote_repo_name

Display commit history graphically with Gitk

Bisecting/Finding faulty commits

Blaming

The git blame command is very useful for finding out who last changed each line of a file.
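
As a small illustration, the sketch below has two hypothetical authors (Alice and Bob) each add one line to an assumed file poem.txt, then asks git blame for the author of each line. The --porcelain output is used only because it is easy to filter in a script; plain git blame poem.txt prints a human-readable listing.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "user@example.com"

git config user.name "Alice"
printf 'first line\n' > poem.txt
git add poem.txt && git commit -q -m "Alice writes line one"

git config user.name "Bob"
printf 'second line\n' >> poem.txt
git commit -q -am "Bob adds line two"

# -L restricts blame to a line range; the "author" header names who last changed it.
line1_author=$(git blame -L 1,1 --porcelain poem.txt | sed -n 's/^author //p')
line2_author=$(git blame -L 2,2 --porcelain poem.txt | sed -n 's/^author //p')

echo "line 1: $line1_author"
echo "line 2: $line2_author"
```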

Git revisions syntax

Many Git commands take revision parameters as arguments. Depending on the command, they denote a specific commit or, for commands which walk the revision graph (such as git-log(1)), all commits which can be reached from that commit. They are usually denoted as <commit>, or <rev>, or <revision> in the syntax description.

The reference documentation for Git revisions syntax is the gitrevisions(7) manpage.
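
A few of the simplest forms can be tried with git rev-parse, which resolves a revision to a single commit, and git rev-list, which walks the graph from it. This sketch assumes a hypothetical linear history of three commits on a branch named main.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the initial branch "main"
git config user.name "Example User"
git config user.email "user@example.com"

echo 1 > f.txt && git add f.txt && git commit -q -m "first commit"
c1=$(git rev-parse HEAD)
echo 2 > f.txt && git commit -q -am "second commit"
c2=$(git rev-parse HEAD)
echo 3 > f.txt && git commit -q -am "third commit"
c3=$(git rev-parse HEAD)

by_branch=$(git rev-parse main)           # a branch name denotes its tip commit
parent=$(git rev-parse "HEAD^")           # first parent of HEAD
grandparent=$(git rev-parse "HEAD~2")     # two first-parent steps back from HEAD

# A graph-walking command treats the same kind of argument as
# "every commit reachable from here".
reachable=$(git rev-list --count "HEAD~1")
echo "commits reachable from HEAD~1: $reachable"
```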

Still missing from this page:

  • [_] Output from git describe, e.g. v1.7.4.2-679-g3bee7fb
  • [_] @ alone as a shortcut for HEAD
  • [_] @{-<n>}, e.g. @{-1}, and - meaning @{-1}
  • [_] <branchname>@{push}
  • [_] <rev>^@, for all parents of <rev>

Needs separate documentation:

  • [_] Referring to blobs and trees in the repository and in the index: <rev>:<path> and :<n>:<path> syntax
  • [_] Revision ranges like A..B, A...B, B ^A, A^1, and revision limiting like -<n>, --since

Worktrees

Git Remote

Git Large File Storage (LFS)

Git Large File Storage (LFS) aims to avoid a limitation of the Git version control system, that it performs poorly when versioning large files, especially binaries. LFS solves this problem by storing the contents of such files on an external server, then instead committing just a text pointer to the path of those assets in the git object database.

Common file types that are stored via LFS tend to be compiled source; graphical assets, like PSDs and JPEGs; or 3D assets. This way resources used by projects can be managed in the same repository, rather than having to maintain a separate management system externally.

LFS was originally developed by GitHub (https://github.com/blog/1986-announcing-git-large-file-storage-lfs); however, Atlassian had been working on a similar project, called git-lob, at nearly the same time. The two efforts were soon merged to avoid fragmentation in the industry.

Git Patch

Git statistics

git send-email

Git GUI Clients

Reflog - Restoring commits not shown in git log

Git's reflog records the position of HEAD (the ref for the current state of the repository) every time that it is changed. Generally, every operation that might be destructive involves moving the HEAD pointer (since if anything is changed, including in the past, the tip commit's hash will change), so it is always possible to return to an older state, from before a dangerous operation, by finding the right line in the reflog.

Objects that are not referenced by any ref are usually garbage collected in ~30 days, however, so the reflog may not always be able to help.
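
A minimal sketch of such a recovery, assuming a hypothetical two-commit history in which the second commit is discarded with git reset --hard and then found again through the reflog:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > f.txt && git add f.txt && git commit -q -m "First"
echo two >> f.txt && git commit -q -am "Second"
lost=$(git rev-parse HEAD)

# A destructive operation: "Second" no longer appears in git log.
git reset -q --hard "HEAD~1"
visible=$(git rev-list --count HEAD)

# The reflog still records where HEAD pointed before the reset.
# HEAD@{1} means "the previous position of HEAD".
previous=$(git rev-parse 'HEAD@{1}')
git reset -q --hard "$previous"
recovered=$(git rev-parse HEAD)

echo "commits visible after the reset: $visible"
git reflog
```

In practice you would read the output of git reflog to pick the right entry rather than assume it is HEAD@{1}.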

TortoiseGit

External merge and difftools

Update Object Name in Reference

Git Branch Name on Bash Ubuntu

Git Client-Side Hooks

Git rerere

Change git repository name

Git Tagging

Tidying up your local and remote repository

diff-tree