Using diff

diff is a *nix command that takes two files and compares them, reporting on the differences between them. For example:

$ diff foo.bar foo.bar
$

When the files are identical, no output is generated (as in this case, comparing a file to itself).

The problem with diff is that its output may not be immediately obvious to the ordinary user. For example, consider the two files:

FIRST LINE = 'foo.bar'
this is a simple 3-line file
this is the third line

and

FIRST LINE = 'foo.baz'
this is a simple 3-line file
this is the third line
AND SURPRISE! a fourth line


Run diff like this:

$ diff foo.baz foo.bar
1c1
< FIRST LINE = 'foo.baz'
---
> FIRST LINE = 'foo.bar'
4d3
< AND SURPRISE! a fourth line

The output is very clear--if you are a computer. diff is particularly useful in scripts that compare two files and then pass the differences along to some other program, which can then act on that information automatically.
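As a minimal sketch of that kind of script use, the snippet below relies on diff's exit status rather than its text output: diff exits with 0 when the files are identical, 1 when they differ, and a larger value when something went wrong (a missing file, for example). The function name files_match and the scratch directory are my own, made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: classify two files using only diff's exit status.
# 0 = identical, 1 = different, anything else = trouble (e.g. missing file).
files_match() {
    status=0
    diff "$1" "$2" > /dev/null 2>&1 || status=$?
    case $status in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error" ;;
    esac
}

# Recreate the two sample files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' "FIRST LINE = 'foo.bar'" \
    "this is a simple 3-line file" \
    "this is the third line" > "$workdir/foo.bar"
printf '%s\n' "FIRST LINE = 'foo.baz'" \
    "this is a simple 3-line file" \
    "this is the third line" \
    "AND SURPRISE! a fourth line" > "$workdir/foo.baz"

files_match "$workdir/foo.bar" "$workdir/foo.bar"   # prints "identical"
files_match "$workdir/foo.baz" "$workdir/foo.bar"   # prints "different"
files_match "$workdir/foo.bar" "$workdir/missing"   # prints "error"
```

A script that only needs a yes/no answer can branch on the exit status this way and never has to read the output at all.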

As a human being, it's not immediately clear--until I understand how it works: the output indicates which lines in the first file have to be changed to make it identical to the second file. Each block of output starts with a line number (or a range of line numbers) in the first file, which indicates where the difference is. Next comes a letter that tells me what needs to happen to that line or those lines ("a" for add, "c" for change, and "d" for delete), and finally the corresponding line number(s) in the second file.

Following the code that specifies line numbers and action, there will be one (or more) lines underneath, each starting with either "<" or ">". "<" marks a line from the first file and means "take this out"; ">" marks a line from the second file and means "put this in". In a change, the two groups are separated by a line containing "---". The first four lines of the diff output, starting with 1c1, tell me that I need to change the first line in foo.baz from "FIRST LINE = 'foo.baz'" (delete the first line) to "FIRST LINE = 'foo.bar'" (add that as the first line).

The last two lines (4d3) tell me that I need to delete the fourth line of foo.baz; the 3 means that, once it is gone, the two files are back in step at line 3 of foo.bar.
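The reading rules above can be sketched as a small shell function that turns each change command into a sentence. This is only an illustration: explain_diff is a name I made up, and it assumes a diff that prints the default format shown here (some minimal systems ship a diff that prints a different format by default).

```shell
#!/bin/sh
# Recreate the two sample files in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' "FIRST LINE = 'foo.bar'" \
    "this is a simple 3-line file" \
    "this is the third line" > foo.bar
printf '%s\n' "FIRST LINE = 'foo.baz'" \
    "this is a simple 3-line file" \
    "this is the third line" \
    "AND SURPRISE! a fourth line" > foo.baz

# Hypothetical helper: translate change commands such as 1c1 or 4d3.
# Only lines that start with a digit are commands; the "<", ">" and "---"
# lines carry the file contents and are skipped here.
explain_diff() {
    diff "$1" "$2" | grep -E '^[0-9]' | while IFS= read -r cmd; do
        left=${cmd%%[acd]*}      # line number(s) in the first file
        right=${cmd##*[acd]}     # line number(s) in the second file
        action=$(printf '%s' "$cmd" | tr -d '0-9,')
        case $action in
            a) echo "after line $left of $1, add line(s) $right of $2" ;;
            c) echo "change line(s) $left of $1 to line(s) $right of $2" ;;
            d) echo "delete line(s) $left of $1" ;;
        esac
    done
}

explain_diff foo.baz foo.bar
```

For the two sample files this should print one sentence for 1c1 (change line 1) and one for 4d3 (delete line 4).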

Comparing the two files the “other” way gives this result:

$ diff foo.bar foo.baz
1c1
< FIRST LINE = 'foo.bar'
---
> FIRST LINE = 'foo.baz'
3a4
> AND SURPRISE! a fourth line

This time, the first line still needs to be changed, but instead of deleting a line at the end of the file, 3a4 says that an extra line (“AND SURPRISE! a fourth line”) needs to be added after line 3 of foo.bar, where it becomes line 4, to make the two files identical.
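One quick way to see how the two directions mirror each other is to strip the output down to just the change commands. The sketch below does that with grep; change_commands is a made-up name, and the same assumption about diff's default output format applies.

```shell
#!/bin/sh
# Recreate the two sample files in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' "FIRST LINE = 'foo.bar'" \
    "this is a simple 3-line file" \
    "this is the third line" > foo.bar
printf '%s\n' "FIRST LINE = 'foo.baz'" \
    "this is a simple 3-line file" \
    "this is the third line" \
    "AND SURPRISE! a fourth line" > foo.baz

# Hypothetical helper: keep only the lines that begin with a line number,
# which are the change commands (1c1, 4d3, 3a4, ...).
change_commands() {
    diff "$1" "$2" | grep -E '^[0-9]'
}

echo "foo.baz -> foo.bar:"
change_commands foo.baz foo.bar    # 1c1 and 4d3, as in the first run
echo "foo.bar -> foo.baz:"
change_commands foo.bar foo.baz    # 1c1 and 3a4, as in the second run
```

Swapping the arguments leaves the "c" command alone, while the delete (4d3) turns into the matching add (3a4) with its line numbers swapped.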

It makes more sense the more I experiment with files that are almost identical and use diff to see how it works.
