
Using grep to count the number of instances of a text string within a given file

In this video I am going to show you how to use the Linux grep command to quickly count the number of instances of a text string found within a given file.

The example will demonstrate how to quickly check a web server's access log file for certain activity. Last month I released the TIFF splitter utility, which is a free piece of software to split multi-page TIFF files into their individual pages. It’s worth pointing out that this software will run on Linux when used in conjunction with Wine. Anyway, I wanted a quick and easy way to check how many downloads of the software ZIP file there have been.

Here I have a copy of the access log file for the last week's activity. If I use the cat command to output the contents of the file to screen, you can see it contains a lot of information. So we need to use the grep command to filter out the lines we are interested in. In my case, I want to find all the lines which have a dot zip reference in them, as this will give me the actual requests for the software download.

grep .zip < access.log

The command we need to type is grep, space, followed by the criteria you're searching for, in this case dot zip. We then need to redirect the file we are going to process into grep using the less-than sign, for example access dot log. The grep output is now restricted, showing just the lines within the log file which have the dot zip reference, highlighting your search term for convenience.
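One detail worth knowing: grep treats a dot in the search criteria as a wildcard that matches any single character, so the pattern dot zip would also match a word such as "unzip". Below is a minimal sketch of matching a literal dot, assuming a made-up three-line file called sample.log (the file name and its contents are invented for illustration and are not taken from the real log).

```shell
# Build a small, made-up log file to experiment with (not real log data).
printf '%s\n' \
  'GET /downloads/tiff-splitter.zip HTTP/1.1' \
  'GET /articles/how-to-unzip-files HTTP/1.1' \
  'GET /index.html HTTP/1.1' > sample.log

# The dot matches any character, so the "unzip" line is shown as well.
grep .zip < sample.log

# -F treats the pattern as a fixed string, so only a literal ".zip" matches.
grep -F .zip < sample.log

# Alternative: escape the dot, quoting the pattern so the shell keeps the backslash.
grep '\.zip' < sample.log
```

For a log where nothing but the download link contains "zip", the simpler form gives the same answer, so the stricter pattern only matters when other lines could match by accident.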

grep -c .zip < access.log

To get the actual count, we just need to add the dash c switch to the grep command, which reports the number of matching lines. That gives us a count of seventy-two, so I now know my software has been downloaded seventy-two times in the last week.
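Because the dash c switch counts lines rather than individual occurrences, it suits an access log, where each request sits on its own line. The two numbers differ when a term appears more than once on a line. Here is a small sketch of the difference, using an invented file called notes.txt; the dash o switch, which prints each match on its own line, is available in GNU grep.

```shell
# Made-up two-line file; the first line contains the term twice.
printf '%s\n' 'zip zip hooray' 'one more zip' > notes.txt

# -c counts the lines that contain at least one match.
lines=$(grep -c zip < notes.txt)

# -o prints every match on its own line, so wc -l counts occurrences.
# tr strips the padding some versions of wc add around the number.
occurrences=$(grep -o zip < notes.txt | wc -l | tr -d ' ')

echo "matching lines: $lines, occurrences: $occurrences"
```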

To make life even easier, I have assigned this command sequence to an alias using the letter k. Now all I need to type is k and hit return to get the current download count. I think you’ll agree that’s pretty convenient for a webmaster who wants to quickly check certain activity on their web site.
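The alias definition itself is not shown here, so the following is only a sketch of how such an alias might be set up in bash. The file access-sample.log and its contents are made up so the snippet can be tried safely; on a real system you would point the alias at your own log file.

```shell
# Made-up sample log so the alias has something to count.
printf '%s\n' \
  'GET /downloads/tiff-splitter.zip HTTP/1.1' \
  'GET /index.html HTTP/1.1' \
  'GET /downloads/tiff-splitter.zip HTTP/1.1' > access-sample.log

# Scripts need this for aliases to expand; interactive bash enables it by default.
shopt -s expand_aliases

# Assign the whole command sequence to the letter k.
alias k='grep -c -F .zip < access-sample.log'

# Typing k now runs the full command.
k
```

An alias typed at the prompt only lasts for the current session. Adding the alias line to ~/.bashrc is the usual way to keep it available in new terminals.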

grep -c Googlebot < access.log

Likewise, if you wanted to know how many times Googlebot has visited your web site, then just enter the term Googlebot as the search criteria. In my case, you can see it has visited just over a hundred times.
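The same approach extends to any search term. As a rough sketch, the loop below reports a count for several terms in one go; the file bots.log and its contents are invented for illustration, and real access logs carry many more fields.

```shell
# Made-up log lines standing in for a real access log.
printf '%s\n' \
  '"GET /index.html HTTP/1.1" 200 "Mozilla/5.0 (compatible; Googlebot/2.1)"' \
  '"GET /downloads/tiff-splitter.zip HTTP/1.1" 200 "Mozilla/5.0"' \
  '"GET /about.html HTTP/1.1" 200 "Mozilla/5.0 (compatible; Googlebot/2.1)"' > bots.log

# Count the matching lines for each term in turn.
for term in Googlebot .zip; do
  # -F treats the term as a fixed string rather than a regular expression.
  printf '%s: %s\n' "$term" "$(grep -c -F "$term" < bots.log)"
done
```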

If you would like more information on the grep command, then I would recommend the grep Pocket Reference, published by O’Reilly.

About the author

Paul Bradley is a full-time software developer, and while he has to use Windows for work, he chooses to use Linux and open source software on all his personal computers. He has been using Linux since 2008 and is currently using Xubuntu and CrunchBang Linux.

This article was first published on 14.10.2010.


© copyright 2004–2012