git bisect - debugging regressions

The regression nightmare

The QA Team comes with our worst nightmare: a regression bug. Since the last release, the development team had made more than 100 commits, adding new features to the product.

Developer fighting a bug

How to find the regression bug in a faster manner?

You may be questioning yourself. What if we had a tool that helps us to find regressions in Git?

The good news: there is! It’s called git bisect, and we are going to take a brief look on it!

Binary-search demonstration

To understand what git bisect does, we first must understand what is Binary Search. Binary-search is a way to iterate a collection of sorted items to find a item that matches a defined criteria. It takes advantage of the sorted nature of the collection, allowing it to split the domains in half, check if the criteria is met, and then reapplying the binary search algorithm (classic Divide and Conquer application).

To demonstrate how effective it is applied to commit trees, let’s apply the algorithm to a small git tree:

Git Bisect applied

Playing with Git Bisect

We will use a automatic git tree generator, which generates a tree with one hidden “regression” bug. After generating our git tree, we will start the bisect and play with it.

Generating a sandbox to play Git Bisect

To generate a random Git Tree, with 100 commits 3 files and several appends in each commit, we will be using a Python3 tool called Bisect Sandbox.

You can fetch this script at Bisect Sandbox GitHub repository.

# Generate a Git Tree with 100 commits,
# using my Bisect Sandbox described above
> python3 play.py
# Several outputs from several commits :)

The generator give us 100 commits, the text !BUG! will be randomly appended and our goal is: to find in when the generator committed the text !BUG!.

Starting git bisect

Now our sand-box is ready, we can start the bisect process. We have two methods to start bisect:

Using a single line command:

> git bisect start bad good    # Start the bisecting process

Using a verbose three lines command sequence:

> git bisect start             # Start the bisecting process
> git bisect good GOOD_COMMIT  # Set the initial good commit
> git bisect bad BAD_COMMIT    # Set the initial bad commit

Choosing Good and Bad commits

The sandbox generator gave us 100 commits, we will use the range [HEAD .. HEAD-100] as initial Good Commit and Bad Commit:

> git bisect start          # Start the bisecting process
> git bisect good @~100     # Set the initial good commit
> git bisect bad @          # Set the initial bad commit
Bisecting: 49 revisions left to test after this (roughly 6 steps)
[a645b201713dd4b5b1f828f35ba23f67f40817f4] 259053644

We are using git rev parse shortcuts:
@ is HEAD commit,
@~100 is HEAD minus 100 commits

In the real world, you could use tags or commit hashes

Iterating git bisect candidates

We must have a way to verify if the bug exists, this can be made manually or even automatically (passing scripts to git bisect). We choose to do it manually, making easier to understand git bisect:

# grep
#  -i          -> ignore case
#  -c          -> count matches per file
#  -R          -> recursive, check all files
#  --exclude   -> ignore file
> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:1

In this case, we found that file 593135436 contains the criteria one time. To continue the git bisect, we mark the commit as good or bad:

> git bisect bad   # when the commit contains the regression bug
> git bisect good  # when the commit DOES NOT contains the regression bug
Bisecting: 24 revisions left to test after this (roughly 5 steps)
[007f6e2eb294d21849f8afcfa2901a1008fec9c4] 279015154
# Note: in the first iteration we had 49 revisions, now only 24.

After marking the commit as good or bad, git bisect will step forward:

When you found the criminal, the last candidate

After few iterations, git bisect will eliminate all the suspects and return the success report:

> git bisect bad
e7effd2bb7f6a0776659854b6282d39db6680d59 is the first bad commit
commit e7effd2bb7f6a0776659854b6282d39db6680d59
Author: fernando baroni <[email protected]>
Date:   Mon Oct 10 16:26:19 2016 -0300

    random commit message

We can git show the given commit (e7effd2bb7f6a0776659854b6282d39db6680d59) and see what introduced the bug.

Restoring the workspace

After we found the cause of the regression, we have to reset our working directory to continue working:

> git bisect reset  

Now you know the criminal commit, you can save the world and continue adding new features to your product!

My entire session

Here is my entire bisect session:

> git bisect start
> git bisect bad @
> git bisect good @~100
Bisecting: 49 revisions left to test after this (roughly 6 steps)
[a645b201713dd4b5b1f828f35ba23f67f40817f4] 259053644

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:1

> git bisect bad
Bisecting: 24 revisions left to test after this (roughly 5 steps)
[007f6e2eb294d21849f8afcfa2901a1008fec9c4] 279015154

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:0

> git bisect good
Bisecting: 12 revisions left to test after this (roughly 4 steps)
[be9997c8ec6ad6150900a4cb335579f69d33c3ee] 201228274

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:0

> git bisect good
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[793844038dfec1cd6ae9dfbea457a3f78b8eac92] 744392540

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:0

> git bisect good
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[f7537d30dbc59127752fcd82b2842cd9b072cc7d] 996103973

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:0

> git bisect good
Bisecting: 1 revision left to test after this (roughly 1 step)
[d05d610c2356b0161fcb83d32fb726a5c8dab95e] 944908823

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:1

> git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[e7effd2bb7f6a0776659854b6282d39db6680d59] 280896075

> grep -icR --exclude=play.py --exclude=LICENSE --exclude README.md \!BUG\!
320869619:0
682058646:0
593135436:1

> git bisect bad
e7effd2bb7f6a0776659854b6282d39db6680d59 is the first bad commit
commit e7effd2bb7f6a0776659854b6282d39db6680d59
Author: fernando baroni <[email protected]>
Date:   Mon Oct 10 16:26:19 2016 -0300

    the criminal commit

Tip: custom good/bad terms

Don’t like using bad and good terms? git bisect allow us to set custom terms with --term-good=? and --term-bad=.

For example:

git bisect start --term-good=ohhyeahhh --term-bad=nooo GOOD_COMMIT BAD_COMMIT
git bisect ohhyeahhh # it works!
git bisect nooo      # it doesn't work =[
git bisect nooo
git bisect ohhyeahhh
git bisect reset

Comments

comments powered by Disqus