Cvsplot collects statistics from CVS-controlled files, such as how the total number of files and lines of code change over time. It runs under any flavour of UNIX, and under Windows (assuming Perl from http://www.activestate.com is installed). As an example, the following graph was generated using cvsplot, and plots the history of the OpenBSD project, spanning over five years:
The latest version of cvsplot can be obtained from: here.
Note: this project used to be known as cvsstat. However, another unrelated script by that name already exists, so the project was renamed cvsplot.
A simple invocation would be:
cvsplot.pl -cvsdir :ext:cvsbox:/usr/local/cvsroot -rlog product \
    -linedata linedata.txt -filedata filedata.txt
Note: if you are using Perl 5.8, the DateManip module contains characters that are not valid UTF-8, so all invocations of cvsplot.pl should be done with the LANG environment variable set to "C". With the bash shell, this would be:
LANG=C cvsplot.pl ...
I have been told this is no longer necessary with DateManip version 5.42 and above.
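When cvsplot.pl is launched from another program rather than from bash, the same effect can be had by setting LANG in the child process's environment. The following is a minimal Python sketch, assuming cvsplot.pl is executable and on the PATH. The helper names c_locale_env and run_cvsplot are illustrative and are not part of cvsplot:

```python
import os
import subprocess


def c_locale_env(base=None):
    """Return a copy of the environment with LANG forced to "C"."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = "C"
    return env


def run_cvsplot(args):
    """Run cvsplot.pl with the given argument list under LANG=C.

    Assumption: cvsplot.pl is executable and can be found on the PATH.
    """
    return subprocess.run(["cvsplot.pl", *args], env=c_locale_env(), check=True)
```

A call such as run_cvsplot(["-cvsdir", "/usr/local/cvsroot", "-rlog", "product", "-linedata", "linedata.txt", "-filedata", "filedata.txt"]) would then be equivalent to prefixing the command with LANG=C in bash.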
The above command retrieves CVS history information for all CVS-controlled files in the "product" module of the CVS repository. The -cvsdir argument is the same as the CVSROOT environment variable. The results are stored in the linedata.txt and filedata.txt files in a simple text format. Each line is a data point (corresponding to a CVS commit) giving the date of the commit and the corresponding number.
For linedata.txt, this number represents the total number of lines for active files that exist in the repository up until that date. For filedata.txt, this represents the total number of files that exist in the repository up until that date. Note that files marked as binary in CVS are ignored.
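How per-commit changes turn into the totals recorded in these files can be illustrated as a running sum. The sketch below is written in Python rather than cvsplot's own Perl, and is not taken from the cvsplot source. The commit dates and line deltas are made-up sample values, not output from a real repository:

```python
from datetime import date
from itertools import accumulate

# Hypothetical commits: (commit date, net change in line count).
commits = [
    (date(2001, 4, 2), -15),
    (date(2001, 3, 28), 120),
    (date(2001, 5, 2), 40),
]


def running_totals(commits):
    """Return one (date, total) data point per commit, in date order."""
    ordered = sorted(commits)
    totals = accumulate(delta for _, delta in ordered)
    return [(when, total) for (when, _), total in zip(ordered, totals)]


for when, total in running_totals(commits):
    print(when.isoformat(), total)
```

Each printed pair plays the role of one data point: a commit date and the total up until that date. The exact layout of cvsplot's own data files may differ.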
If the period of interest is well defined, then it is possible to trim the statistics reported by optionally specifying start and/or end dates. For example, if we are only interested in statistics starting from the 28th March, 2001, then the following can be entered:
cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
    -filedata filedata.txt -start "28th March, 2001"
The -end option is used to specify the final date, for example:
cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
    -filedata filedata.txt -start "28th March, 2001" \
    -end "2nd May, 2001"
The supported date formats are very flexible, as the Date::Manip Perl module is used for date parsing and manipulation. The above command could also have been expressed as:
cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
    -filedata filedata.txt -start "2001/03/28" \
    -end "2001/05/02"
It is possible to specify filesets in order to restrict which statistics are generated. For example, assuming we are only interested in C files, header files and Java files, the following command could be specified:
cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
    -filedata filedata.txt -include '\.java$' -include '\.c$' \
    -include '\.h$'
The argument given to the -include option uses Perl regular expression syntax. To avoid shell expansion, single quotes must be used. It is also possible to specify files that should not be included. Assuming we are interested in Java, C and header files, but don't want to gather statistics for the "kernel" sub-directory, then the following command could be issued:
cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
    -filedata filedata.txt -exclude '^kernel' \
    -include '\.java$' -include '\.c$' -include '\.h$'
The order of the -exclude and -include options is important. Whenever cvsplot examines a file, it runs through the list of -exclude and -include options in the order specified on the command line, and the first option that matches decides the outcome: if the filename matches a -exclude option, the file is skipped; if it matches a -include option, the file is included when collecting statistics. If no -include or -exclude options have been specified, the default behaviour is to include all files. If -include or -exclude options have been specified and a file matches none of the patterns, it is not included when collecting statistics.
The -include and -exclude options and semantics were based on the --include and --exclude options from rsync.
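The matching rules described above can be sketched as a small function. This is an illustration in Python, not cvsplot's actual Perl code; it assumes the first matching option wins (as with rsync), and the name is_included and the (kind, pattern) rule representation are invented for the example. The simple patterns used here behave the same in Python and Perl regular expressions:

```python
import re


def is_included(filename, rules):
    """Decide whether a file takes part in the statistics.

    rules is a list of (kind, pattern) pairs in command-line order,
    where kind is "include" or "exclude".
    """
    if not rules:
        # No -include or -exclude options at all: every file is included.
        return True
    for kind, pattern in rules:
        if re.search(pattern, filename):
            # The first matching option decides the outcome.
            return kind == "include"
    # Options were given, but none of them matched this file.
    return False


# Equivalent of:
#   -exclude '^kernel' -include '\.java$' -include '\.c$' -include '\.h$'
rules = [
    ("exclude", r"^kernel"),
    ("include", r"\.java$"),
    ("include", r"\.c$"),
    ("include", r"\.h$"),
]

for name in ["kernel/sched.c", "src/Main.java", "README"]:
    print(name, is_included(name, rules))
```

With these rules, kernel/sched.c is skipped because the -exclude pattern is reached first. If the -include '\.c$' option were listed before the -exclude option, the same file would be included, which is why the order matters.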
It is also possible to provide a specific CVS branch in which to gather statistics. The above command can be run against the RELEASE1 branch as follows:
cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
    -filedata filedata.txt -exclude '^kernel' \
    -include '\.java$' -include '\.c$' -include '\.h$' \
    -branch RELEASE1
In addition to generating statistics into a text file, it is also possible to generate plots as a png file, assuming gnuplot is installed on your system. I run cvsplot.pl in a cron job so that my team's statistics are updated nightly on our internal web-server. Gnuplot supports many other file formats, such as jpg, gif and postscript.
To generate png files which will plot the statistics, the -gnuplot options are specified:
cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
    -filedata filedata.txt -gnuplotfiledata filedata.png \
    -gnuplotlinedata linedata.png
The filedata.png file presents the statistics in filedata.txt as a png file generated by gnuplot. Similarly for linedata.png.
It is also possible to generate plots which combine both the line and file data into a single plot, using the -gnuplotlinefiledata switch.
cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
    -filedata filedata.txt -gnuplotfiledata filedata.png \
    -gnuplotlinedata linedata.png \
    -gnuplotlinefiledata linefiledata.png
For gnuplot users, it is possible to specify the "terminal parameters" sent to gnuplot when generating the plots. For example, to generate the plots as a postscript eps using Times-Roman font, the following could be specified:
cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
    -filedata filedata.txt -gnuplotfiledata filedata.eps \
    -gnuplotlinedata linedata.eps \
    -gnuplotsetterm "post eps 'Times-Roman'"
It is also possible to specify general gnuplot commands (separated by semi-colons) which will be executed before the final "plot" commands that generate the graphs. One possibility might be to change the formatting of the x values from their default %m/%y (month/year) format to another format such as %d/%m (day/month), and to set the title of the graph:
cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
    -filedata filedata.txt -gnuplotfiledata filedata.eps \
    -gnuplotlinedata linedata.eps \
    -gnuplotsetterm "post eps 'Times-Roman'" \
    -gnuplotcommand "set format x '%d/%m'; set title 'CVS History'"
The above commands use the "cvs rlog" command to retrieve the relevant information. This command is properly implemented in CVS versions >= 1.11.1. If you have an older version of CVS, you can still use cvsplot, but you need a checked out copy of the module you want to gather statistics from (to run "cvs log" against). Make sure the checkout is done without the -P flag, otherwise the directories which are pruned will be excluded from the statistics, which is not what you want. The only difference in command syntax is that the argument to -cvsdir refers to your checked out sandbox, and the -rlog option is omitted. For example, the previous command would be the following, assuming the product module was checked out in the ~/product directory:
cvsplot.pl -cvsdir ~/product -linedata linedata.txt \
    -filedata filedata.txt -gnuplotfiledata filedata.eps \
    -gnuplotlinedata linedata.eps \
    -gnuplotsetterm "post eps 'Times-Roman'" \
    -gnuplotcommand "set format x '%d/%m'; set title 'CVS History'"
Finally, for large plots, it is definitely worth trying the -linestyle option, as this can dramatically improve readability.
For platforms (such as Windows) where gnuplot may not be in the standard PATH, and/or has a different name, the -gnuplot option can be used to specify the full path to the gnuplot binary.
Since version 1.7.0, it is now possible to retrieve per-user statistics as well. The -userdata option specifies the file which will store user commit information. This file is used as input for gnuplot, when plotting per-user information. An example invocation is:
cvsplot.pl -cvsdir ~/product -linedata linedata.txt \
    -filedata filedata.txt -userdata userdata.txt \
    -gnuplotfiledata filedata.png \
    -gnuplotlinedata linedata.png \
    -gnuplotuserdata userdata.png \
    -userlist fred,joe,peter,paul
This command will create userdata.png, which will contain CVS line counts for contributions made by usernames fred, joe, peter and paul. If there is no -userlist argument, this will default to all users found in the CVS logs, which for some installations can be a very long list and produce unreadable graphs.
It is also possible to specify groups of users, as another way of reducing graph clutter. The userdata graph will have a line displayed per-group rather than per-user.
cvsplot.pl -cvsdir ~/product -linedata linedata.txt \
    -filedata filedata.txt -userdata userdata.txt \
    -gnuplotfiledata filedata.png \
    -gnuplotlinedata linedata.png \
    -gnuplotuserdata userdata.png \
    -userlist group1=fred,joe \
    -userlist group2=peter,paul
It is also possible to specify a "default" group, so that any users not explicitly listed will automatically become members of this group. This is achieved via the -defaultusergroup option, as shown below, where the default group is named "group3".
cvsplot.pl -cvsdir ~/product -linedata linedata.txt \
    -filedata filedata.txt -userdata userdata.txt \
    -gnuplotfiledata filedata.png \
    -gnuplotlinedata linedata.png \
    -gnuplotuserdata userdata.png \
    -userlist group1=fred,joe \
    -userlist group2=peter,paul \
    -defaultusergroup group3
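The way -userlist and -defaultusergroup map a CVS username onto a line in the user graph can be sketched as follows. This is a Python illustration of the behaviour described above, not code from cvsplot; the function name graph_label is invented, and treating an unlisted user with no default group as "not plotted" (None) is an assumption:

```python
def graph_label(user, userlist_args, default_group=None):
    """Return the graph line a user's commits are counted under.

    userlist_args holds the values of each -userlist option, for example
    ["fred,joe,peter,paul"] or ["group1=fred,joe", "group2=peter,paul"].
    Returns None when the user would not appear on the graph.
    """
    if not userlist_args:
        # No -userlist at all: every user found in the logs gets a line.
        return user
    labels = {}
    for arg in userlist_args:
        if "=" in arg:
            # "group=user1,user2": the listed users share one line.
            group, _, members = arg.partition("=")
            for member in members.split(","):
                labels[member] = group
        else:
            # "user1,user2": each listed user gets a line of their own.
            for member in arg.split(","):
                labels[member] = member
    # Unlisted users fall into the default group, if one was given.
    return labels.get(user, default_group)


groups = ["group1=fred,joe", "group2=peter,paul"]
print(graph_label("joe", groups, "group3"))
print(graph_label("mary", groups, "group3"))
```

Here joe is counted under group1, while mary, who is not listed anywhere, ends up in the default group group3.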
One option worth mentioning, which affects all line count statistics (-linedata and -userdata options), is -countchangedlines. If this option is specified, then the line counts reported are the number of lines _changed_, not lines _added_. For example, if a commit involved removing two lines and adding three lines, with -countchangedlines this would be recorded as an addition of five lines. Without -countchangedlines, this would be just one line. Some users requested -countchangedlines, as it can be used as a very rough form of productivity measurement.
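The arithmetic in this example can be written out as a small function. It is a Python sketch of the rule as described, with an invented name (recorded_line_delta), and is not taken from cvsplot itself:

```python
def recorded_line_delta(added, removed, count_changed_lines=False):
    """Return the line count recorded for a single commit."""
    if count_changed_lines:
        # With -countchangedlines: every touched line counts.
        return added + removed
    # Default: net change in the number of lines.
    return added - removed


# A commit that removes two lines and adds three:
print(recorded_line_delta(3, 2, count_changed_lines=True))  # 5
print(recorded_line_delta(3, 2))                            # 1
```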
CVS global arguments can be set for all CVS commands executed by cvsplot.pl, via the -cvs-global-args switch. For example, the command below:
cvsplot.pl -cvs-global-args "-f -q" ...
Will ensure that ~/.cvsrc will not be read for default settings, and that all CVS commands will output the minimal amount of information to stderr. Execute cvs --help-options for the complete list of global arguments available.
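CVS expects global options to come before the subcommand name, in the form cvs [global options] command [arguments]. The Python sketch below shows how a string such as "-f -q" might be spliced into a command line. How cvsplot.pl assembles its CVS commands internally is an assumption here, and cvs_command is a hypothetical helper:

```python
import shlex


def cvs_command(global_args, subcommand, *args):
    """Build a cvs argument list with global options placed first."""
    # shlex.split turns "-f -q" into ["-f", "-q"], honouring any quoting.
    return ["cvs", *shlex.split(global_args), subcommand, *args]


print(cvs_command("-f -q", "rlog", "product"))
```

The resulting list corresponds to the shell command cvs -f -q rlog product.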
Gnuplot comes with most Linux distributions, and can be found at http://www.gnuplot.org. If the -gnuplot options are not used, then it is not necessary to install gnuplot. Gnuplot is also available for Windows.