Today, I discovered a website called Ohloh, a service that evaluates open source projects against a set of criteria. I decided to try it out and registered both TinyUML and ZMPP there.
After some time, an information screen appears that lists the data Ohloh determined for each project. The summary for both looks like this:
- mostly written in Java
- well commented source code
- short source code history
- only a single active developer
Interestingly, whether code is rated as well documented depends on the average for its programming language. A value of 30% would be extremely high for C, for example. The website reports the following average comment-to-code ratios:
- Java: 35%
- C#: 24%
- PHP: 24%
- C/C++: 20%
- Ruby: 16%
- Perl: 14% (!)
- Python: 12%
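To make the idea of a language-relative rating concrete, here is a minimal sketch. It assumes the ratio is the share of comment lines among comment plus code lines, and it uses a made-up tolerance band of 3 percentage points; the function names and the rating labels are hypothetical, not Ohloh's actual rules.

```python
# Language averages as quoted in the list above (percent).
LANGUAGE_AVERAGES = {
    "Java": 35, "C#": 24, "PHP": 24, "C/C++": 20,
    "Ruby": 16, "Perl": 14, "Python": 12,
}


def comment_ratio(comment_lines, code_lines):
    """Percentage of comment lines among comment + code lines (assumed definition)."""
    total = comment_lines + code_lines
    if total == 0:
        return 0.0
    return 100.0 * comment_lines / total


def rate_comments(language, comment_lines, code_lines, tolerance=3.0):
    """Rate a project relative to its language average.

    `tolerance` is a hypothetical band in percentage points, chosen only
    for illustration.
    """
    ratio = comment_ratio(comment_lines, code_lines)
    average = LANGUAGE_AVERAGES[language]
    if ratio > average + tolerance:
        return "above average"
    if ratio < average - tolerance:
        return "below average"
    return "about average"


if __name__ == "__main__":
    # The same 30% ratio is rated differently depending on the language.
    print("C/C++:", rate_comments("C/C++", 300, 700))
    print("Java: ", rate_comments("Java", 300, 700))
```

With these assumptions, a 30% ratio counts as above average for C/C++ but below average for Java, which is the effect described above.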
For Java projects, this average is apparently very high, much higher than for other languages. As I expected, the value is much lower for Ruby and Python, probably because they are easier to read (I could not get the value for Groovy, but I guess it would be in the same ballpark). The big surprises for me are Perl and C/C++: for any non-trivial software project, 14% and 20%, respectively, look way too low.
The reason for the higher average in Java projects is probably the wide availability and acceptance of automatic code review tools (such as Checkstyle and PMD) among Java developers, but that is only a guess.
My personal opinion is that larger C/C++ or Perl projects could in fact benefit from more code documentation; that is what POD and Doxygen are for.
Another interesting feature is the calculation of project costs, which seems to depend entirely on the lines of code (LOC) metric. For TinyUML it determines 14415 LOC and an effort of 3 (!!!) person years. While the measured 14415 lines of code might be correct, I am really surprised by the estimated effort (the assumed average salary of 55000 USD/year is another matter; I think salaries for software engineers here in Washington are way higher).
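One well-known model that derives effort from LOC alone is Basic COCOMO. Whether Ohloh uses exactly this model is an assumption here, as are the "organic" coefficients 2.4 and 1.05 used below; the sketch only shows that such a model lands in the same range as the 3 person years reported for TinyUML.

```python
def cocomo_basic_effort(loc, a=2.4, b=1.05):
    """Basic COCOMO effort in person-months.

    a and b are the textbook coefficients for "organic" projects; that
    Ohloh uses these values is an assumption, not taken from the site.
    """
    kloc = loc / 1000.0
    return a * kloc ** b


def estimate(loc, salary_per_year=55000):
    """Return (person_years, cost_in_usd) for a given LOC count."""
    person_years = cocomo_basic_effort(loc) / 12.0
    return person_years, person_years * salary_per_year


if __name__ == "__main__":
    years, cost = estimate(14415)  # LOC reported for TinyUML
    print(f"{years:.1f} person years, about {cost:,.0f} USD")
```

For 14415 LOC this gives roughly 3.3 person years, close to the figure shown on the site. Note that the model is slightly superlinear in LOC, so larger projects come out with a lower implied productivity per person.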
According to this, the average productivity of a programmer paid 55000 USD per year is less than 5000 LOC per year. While LOC says nothing about the usefulness or quality of the software, this number seems terribly low for average productivity, assuming people work full-time on the project.
On the other hand, the estimate does not account for different programming languages: it usually takes much more effort to write an equivalent program in C than in Java, and projects written in scripting languages tend to be even smaller.
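The arithmetic behind that productivity figure is simple enough to spell out. The numbers are the ones reported for TinyUML; the 220 working days per year are my own assumption, added only to get a per-day figure.

```python
LOC = 14415              # lines of code reported for TinyUML
PERSON_YEARS = 3         # estimated effort
SALARY_USD = 55000       # assumed average salary per year
WORKING_DAYS = 220       # assumption, not a figure from the site


def implied_productivity(loc, person_years):
    """Lines of code produced per person year."""
    return loc / person_years


def cost_per_line(loc, person_years, salary):
    """Estimated cost in USD for a single line of code."""
    return person_years * salary / loc


if __name__ == "__main__":
    per_year = implied_productivity(LOC, PERSON_YEARS)
    print(f"{per_year:.0f} LOC per person year")
    print(f"{per_year / WORKING_DAYS:.1f} LOC per working day")
    print(f"{cost_per_line(LOC, PERSON_YEARS, SALARY_USD):.2f} USD per line")
```

That works out to 4805 LOC per person year, or a little over 20 lines per working day under the assumption above.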
Take a comparison of KFrotz and ZMPP, which are both implementations of the Z-machine with a similar feature set: KFrotz has 35204 LOC and ZMPP 18035. That is still not the whole truth, because Ohloh counts comment lines and test code, which account for two thirds of the overall ZMPP code, so it is actually only 8840 NCSS (non-commenting source statements).
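To illustrate how much a raw line count can differ from a count without comments, here is a rough line classifier for Java or C style sources. It is a simplified sketch and not how Ohloh or an NCSS tool actually counts: it works on lines rather than statements, ignores comment markers inside string literals, and treats a line that mixes code and a comment as code.

```python
def count_lines(source):
    """Return (code, comment, blank) line counts for Java/C-style source.

    Simplifications: a block comment is only detected when it starts at
    the beginning of a line, and a line with code plus a trailing comment
    counts as code.
    """
    code = comment = blank = 0
    in_block = False
    for raw in source.splitlines():
        line = raw.strip()
        if in_block:
            comment += 1
            if "*/" in line:
                in_block = False
            continue
        if not line:
            blank += 1
        elif line.startswith("//"):
            comment += 1
        elif line.startswith("/*"):
            comment += 1
            # Stay in block mode unless the comment closes on the same line.
            if "*/" not in line[2:]:
                in_block = True
        else:
            code += 1
    return code, comment, blank


if __name__ == "__main__":
    sample = (
        "/**\n"
        " * Doc.\n"
        " */\n"
        "public class A {\n"
        "\n"
        "  // note\n"
        "  int x = 1; // trailing\n"
        "}\n"
    )
    print(count_lines(sample))
```

In this tiny sample, more than half of the non-blank lines are comments, so the raw count is more than double the code-only count.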
Test code is a factor that might make projects difficult to compare on Ohloh. Of course it contributes to the overall cost and effort of a project, but not directly to its functionality. Reporting it separately would be an interesting additional feature for the future, since more and more projects adopt automated testing.
Still, I think Ohloh is an interesting way to look at a software project, and there will surely be some improvements in the future.