git bisect - debugging regressions

12 October 2016

The regression nightmare

The QA Team comes with our worst nightmare: a regression bug. Since the last release, the development team had made more than 100 commits, adding new features to the product.

Developer fighting a bug

How to find the regression bug in a faster manner?

You may be questioning yourself. What if we had a tool that helps us to find regressions in Git?

The good news: there is! It’s called git bisect, and we are going to take a brief look on it!

Binary-search theory (heavy text)

To understand what git bisect does, we first must understand what is Binary Search. Binary-search is a way to iterate a collection of sorted items to find a item that matches a defined criteria. It takes advantage of the sorted nature of the collection, allowing it to split the domains in half, check if the criteria is met, and then reapplying the binary search algorithm (classic Divide and Conquer application).

To demonstrate how effective it is applied to commit trees, let’s apply the algorithm to a small git tree:

Git Bisect, steps 1, 2 and 3

Step 1: we have our sandbox small tree to apply the binary search algorithm. We know that the bug was not present in Release 2.3 tag (e17074e), so we choose it to be our Good Commit. We also know that the regression bug is contained in HEAD commit (007f62e), so we choose this as our initial Bad Commit. We defined that the bug is contained in commit 46e285d, but in a real scenario we wouldn’t know where the regression bug was introduced.

Step 2: after choosing Good Commit and Bad Commit, we can start applying the binary search to find in which commit the regression bug was introduced. We take the range of commits between Bad Commit and Good Commit and select the median (midpoint) commit, in our case d2ab62b.

Step 3: now our working directory is at d2ab62b, we must check if the bug is present. After testing the working directory, we found out that d2ab62b doesn’t contain the regression bug, so this is also a Good Commit. All the commits after d2ab62 are also marked as Good Commit.

We finished the first iteration of the Binary-Search algorithm, now we have a smaller range of commits that we know the regression bug was introduced. We will reapply the Binary-Search to the smaller range of commits (from 007f6e2 to d2ab62b).

Git Bisect, steps 1, 2 and 3

Step 4: now we do exactly how we did in Step 2, we have a range of commits, we know our regression bug was introduced in this smaller range, so we select median commit, now 9d3d2b6.

Step 5: We test if 9d3d2b6 contains the regression bug, and we found out it contains! So, we know the regression was introduced in a commit before 9d3d2b6, we move forward and mark all the commits after 9d3d2b6 as Bad Commit and restart the Binary Search with the new commit range.

Step 6: now our range goes from 9d3d2b6 to d2ab62b commits, there’s only one commit left in the middle, we must check if the commit in the middle contains the regression bug, if it contains, that one is our target. Otherwise, 9d3d2b6 will be our target.

Playing with Git Bisect

We will use a automatic git tree generator, which generates a tree with one hidden “regression” bug. After generating our git tree, we will start the bisect and play with it.

Generating a sandbox to play Git Bisect

To generate a random Git Tree, with 100 commits 3 files and several appends in each commit, we will be using a Python3 tool called Bisect Sandbox.

You can fetch this script at Bisect Sandbox GitHub repository.

# Generate a Git Tree with 100 commits,
# using my Bisect Sandbox described above
> python3 play.py
# Several outputs from several commits :)

The generator give us 100 commits, the text !BUG! will be randomly appended and our goal is: to find in when the generator committed the text !BUG!.

Starting git bisect

Now our sand-box is ready, we can start the bisect process. We have two methods to start bisect:

Using a single line command:

> git bisect start bad good    # Start the bisecting process

Using a verbose three lines command sequence:

> git bisect start             # Start the bisecting process
> git bisect good GOOD_COMMIT  # Set the initial good commit
> git bisect bad BAD_COMMIT    # Set the initial bad commit

Choosing Good and Bad commits

The sandbox generator gave us 100 commits, we will use the range [HEAD .. HEAD-100] as initial Good Commit and Bad Commit:

> git bisect start          # Start the bisecting process
> git bisect good @~100     # Set the initial good commit
> git bisect bad @          # Set the initial bad commit
Bisecting: 49 revisions left to test after this (roughly 6 steps)
[a645b201713dd4b5b1f828f35ba23f67f40817f4] 259053644

We are using git rev parse shortcuts:
@ is HEAD commit,
@~100 is HEAD minus 100 commits

In the real world, you could use tags or commit hashes

Iterating git bisect candidates

We must have a way to verify if the bug exists, this can be made manually or even automatically (passing scripts to git bisect). We choose to do it manually, making easier to understand git bisect:

# grep
#  -i          -> ignore case
#  -c          -> count matches per file
#  -R          -> recursive, check all files
#  --exclude   -> ignore file
> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:1

In this case, we found that file 593135436 contains the criteria one time. To continue the git bisect, we mark the commit as good or bad:

> git bisect bad   # when the commit contains the regression bug
> git bisect good  # when the commit DOES NOT contains the regression bug
Bisecting: 24 revisions left to test after this (roughly 5 steps)
[007f6e2eb294d21849f8afcfa2901a1008fec9c4] 279015154
# Note: in the first iteration we had 49 revisions, now only 24.

After marking the commit as good or bad, git bisect will step forward:
* checking out the next candidate commit * reporting the result

When you found the criminal, the last candidate

After few iterations, git bisect will eliminate all the suspects and return the success report:

> git bisect bad
e7effd2bb7f6a0776659854b6282d39db6680d59 is the first bad commit
commit e7effd2bb7f6a0776659854b6282d39db6680d59
Author: fernando baroni <fernando@baroni.tech>
Date:   Mon Oct 10 16:26:19 2016 -0300

    random commit message

We can git show the given commit (e7effd2bb7f6a0776659854b6282d39db6680d59) and see what introduced the bug.

Restoring the workspace

After we found the cause of the regression, we have to reset our working directory to continue working:

> git bisect reset  

Now you know the criminal commit, you can save the world and continue adding new features to your product!

My entire session

Here is my entire bisect session:

> git bisect start
> git bisect bad @
> git bisect good @~100
Bisecting: 49 revisions left to test after this (roughly 6 steps)
[a645b201713dd4b5b1f828f35ba23f67f40817f4] 259053644

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:1

> git bisect bad
Bisecting: 24 revisions left to test after this (roughly 5 steps)
[007f6e2eb294d21849f8afcfa2901a1008fec9c4] 279015154

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:0

> git bisect good
Bisecting: 12 revisions left to test after this (roughly 4 steps)
[be9997c8ec6ad6150900a4cb335579f69d33c3ee] 201228274

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:0

> git bisect good
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[793844038dfec1cd6ae9dfbea457a3f78b8eac92] 744392540

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:0

> git bisect good
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[f7537d30dbc59127752fcd82b2842cd9b072cc7d] 996103973

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:0

> git bisect good
Bisecting: 1 revision left to test after this (roughly 1 step)
[d05d610c2356b0161fcb83d32fb726a5c8dab95e] 944908823

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:1

> git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[e7effd2bb7f6a0776659854b6282d39db6680d59] 280896075

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:1

> git bisect bad
e7effd2bb7f6a0776659854b6282d39db6680d59 is the first bad commit
commit e7effd2bb7f6a0776659854b6282d39db6680d59
Author: fernando baroni <fernando@baroni.tech>
Date:   Mon Oct 10 16:26:19 2016 -0300

    the criminal commit

Tip: custom good/bad terms

Don’t like using bad and good terms? git bisect allow us to set custom terms with --term-good=? and --term-bad=.

For example:

git bisect start --term-good=ohhyeahhh --term-bad=nooo GOOD_COMMIT BAD_COMMIT
git bisect ohhyeahhh # it works!
git bisect nooo      # it doesn't work =[
git bisect nooo
git bisect ohhyeahhh
git bisect reset
comments powered by Disqus