23 Oct 2013, 23:19

Git tip: binary diffing

When working with binary files in Git repositories, we often want to diff them and see what actually changed. However, git diff is not very helpful with non-textual files: by default it only reports that the binary files differ.

Fortunately, it's relatively simple to help git show us meaningful diffs for these files. We only need to add some lines to our configuration file and have the right tools installed.

For basic diffing with hexdump, we can add the following to ~/.gitconfig:

[diff "hex"]
	textconv = hexdump -C
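
If editing the file by hand feels error-prone, the same driver can be registered from the command line. This is only a sketch of the equivalent git config calls; it assumes hexdump is on the PATH, and dropping --global would limit the setting to the current repository.

```shell
# Register a diff driver named "hex" whose textconv filter is "hexdump -C".
# This writes the same [diff "hex"] section shown above into ~/.gitconfig.
git config --global diff.hex.textconv "hexdump -C"

# Print the stored value to confirm it was written.
git config --global --get diff.hex.textconv
```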

Then, in .gitattributes or $REPO/.git/info/attributes, we can use something like:

*.so		diff=hex
*/*/bin/*	diff=hex
*/*/sbin/*	diff=hex
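
To check that a pattern really matches the files we care about, git check-attr reports which diff driver Git would pick for a path. The sketch below uses a throwaway repository created with mktemp, and blah.so is just an example file name; the path does not need to exist for the check to work.

```shell
# Work in a scratch repository so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Same rule as above, written to the repository's .gitattributes.
printf '*.so\tdiff=hex\n' > .gitattributes

# Ask Git which diff driver applies to a given path.
git check-attr diff -- blah.so
# prints: blah.so: diff: hex
```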

This gives us output similar to the following when running git diff:

diff --git a/blah.so b/blah.so
index 8891af2..5c2e1f5 100644
--- a/blah.so
+++ b/blah.so
@@ -339,7 +339,7 @@
 00001520  4d 61 73 6b 00 5f 5f 61  65 61 62 69 5f 75 6e 77  |Mask.__aeabi_unw|
 00001530  69 6e 64 5f 63 70 70 5f  70 72 30 00 67 73 6c 5f  |ind_cpp_pr0.gsl_|
 00001540  6c 69 6e 6b 65 64 6c 69  73 74 5f 61 6c 6c 6f 63  |linkedlist_alloc|
-00001550  6e 6f 64 65 00 6f 73 5f  6d 61 6c 6c 6f 63 00 67  |node.os_malloc.g|
+00001550  6e 6f 64 65 00 6f 73 5f  63 61 6c 6c 6f 63 00 67  |node.os_calloc.g|
 00001560  73 6c 5f 6c 69 6e 6b 65  64 6c 69 73 74 5f 67 65  |sl_linkedlist_ge|
 00001570  74 6e 6f 64 65 62 79 69  64 00 67 73 6c 5f 6c 69  |tnodebyid.gsl_li|
 00001580  6e 6b 65 64 6c 69 73 74  5f 66 72 65 65 6e 6f 64  |nkedlist_freenod|

We can define as many conversion rules as we wish, and then apply them to specific files to get meaningful diffs. This way, we can diff ODF or MS Word documents, image metadata and even Eagle layout files. The possibilities are endless.
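
As a rough sketch of two more rules, the commands below set up drivers for image metadata and ODF text. The driver names "exif" and "odf" are arbitrary labels chosen here, and the example assumes the third-party tools exiftool and odt2txt are installed; any program that takes a file name and prints text to standard output could be used instead. Run it from the root of the repository so the patterns land in its .gitattributes.

```shell
# Image metadata: exiftool prints a file's metadata as plain text.
git config --global diff.exif.textconv exiftool
# Cache the converted text so repeated diffs do not rerun the tool.
git config --global diff.exif.cachetextconv true

# ODF documents: odt2txt extracts the document text.
git config --global diff.odf.textconv odt2txt

# Map file patterns to the new drivers (appends to ./.gitattributes).
printf '*.png\tdiff=exif\n*.odt\tdiff=odf\n' >> .gitattributes
```

With this in place, git diff on a changed .png shows metadata changes (dimensions, timestamps and so on) rather than a bare "binary files differ" notice.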