Do Metrics Suck? Really? Even Crap4j? Let’s Find Out!
There is a severe lack of perspective around metrics, maybe even around Crap4j. Sure, it has a funny name, and it was created by two really cool guys, but is it actually helping projects recognize problems and pointing the way to action that improves the success of those projects? Maybe if we all shared our metric scores and categorized our projects in common ways, we could get the data needed to decide whether metrics are useless or valuable. That would require some additional measures, such as bug counts and development maintenance effort, to compare the metric scores against. Until then, we can at least see how we benchmark against our peers on a metric. We won’t know which competitor sucks more (or less), but we’ll know where we stand in our field!
Tags! You’re It!
Starting with the 1.1.6 release of Crap4j, we have added tagging to our benchmarking site, Crap-o-rama. This means you can compare your CRAP scores against those of similar, like-tagged projects! And you can still do it anonymously!
This feature set is just getting started and will become more valuable as the number of projects grows. Contributing scores is easy and anonymous, so please share. We hope to find out which metrics are meaningful for which categories of projects, and which metrics are categorically useless. We are starting with the CRAP metric, since that is the metric we currently produce, but we hope to add more in the future by opening the site up as a web service.
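To make “compare against like-tagged projects” a bit more concrete, here is a minimal Java sketch of the idea: keep only the projects that carry every tag you care about, then report what fraction of those peers have a lower (better) CRAP score than yours. The Project class, its fields, and fractionBetterThanMe are hypothetical illustrations, not Crap-o-rama’s actual data model or API, and reducing each project to a single numeric score is an assumption made for the example.

```java
import java.util.List;
import java.util.Set;

public class TagBenchmark {
    // Hypothetical stand-in for an uploaded project; NOT Crap-o-rama's real data model.
    static final class Project {
        final String name;
        final Set<String> tags;
        final double crapScore; // assumed: one number per project, lower is better

        Project(String name, Set<String> tags, double crapScore) {
            this.name = name;
            this.tags = tags;
            this.crapScore = crapScore;
        }
    }

    // Fraction of peers (other projects carrying ALL required tags) whose score is
    // strictly lower than ours. Returns NaN when no peers share the tags.
    static double fractionBetterThanMe(Project mine, List<Project> all, Set<String> requiredTags) {
        int peers = 0;
        int better = 0;
        for (Project p : all) {
            if (p == mine || !p.tags.containsAll(requiredTags)) {
                continue; // skip ourselves and projects outside the tag group
            }
            peers++;
            if (p.crapScore < mine.crapScore) {
                better++;
            }
        }
        return peers == 0 ? Double.NaN : (double) better / peers;
    }

    public static void main(String[] args) {
        Project mine = new Project("my-tool", Set.of("java", "developer tool"), 4.0);
        List<Project> all = List.of(
                mine,
                new Project("a", Set.of("java", "developer tool"), 2.5),
                new Project("b", Set.of("java", "developer tool", "library"), 6.0),
                new Project("c", Set.of("java", "game"), 1.0)); // different vertical, excluded
        System.out.println(fractionBetterThanMe(mine, all, Set.of("java", "developer tool")));
    }
}
```

With the sample data, two other projects share both tags and one of them scores lower, so the sketch prints 0.5. Requiring every tag (containsAll) rather than any tag keeps the peer group narrow; whether “all” or “any” matching is more useful would depend on how the community’s tags evolve.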
What’s in a Name?
We chose tags as a mechanism for categorizing projects because tagging is completely free-form. The community can create whatever tags it finds useful. It will be interesting to see how far we can take tagging, and what other classification mechanisms, if any, we will need.
I can see some obvious problems when it comes to updating a project’s CRAP score across releases. If I tagged a project “v1.0” and later changed the tag to “v2.0” when uploading a new CRAP score, I would lose the comparison against other version 1.0 projects.
Conceivably, I could write something that digs into our historical project data (used to present historical trending of CRAP scores) and finds tags that were applied to a previous version of a project (like “v1.0”). That probably won’t scale well in the current implementation. Another option might be to create a new project for the next version, but share a tag on both uploads that is unique to the two of them, perhaps the project name, like “crap4j”. Then you could compare your version 1 project to your version 2 project as two separate projects. Of course, the historical trending charts wouldn’t span the two projects. Hmm. I’ll have to think about this. Ideas?
Temporal tags aside, there are a lot of other useful comparisons that can be captured with tags. I’d like to discuss some of them to seed ideas for people who want to share and benchmark projects with their community. Maybe each community will come up with its own specialized set of tags that lets its members compare projects. Anyway, here are some tags that seem useful. Please suggest your own and explain why you think they would be useful.
Examples
“java” — OK, this one is redundant right now, since Crap4j is the only tool that uploads projects to Crap-o-rama and it only looks at Java projects. But I know there are ports out there, and they should be able to compare too. The CRAP metric itself is language-neutral, so we should apply this tag now in preparation for the future. It might even show how similar apps in different languages compare (let the flame wars begin!).
“developer tool”, “financial app”, “game” — If you are writing tools for developers, it would be cool to compare against other developer tools. This example is really meant to introduce the notion of vertical categorization tags. I happen to know that developer tools exhibit certain common types of complexity, and other types of applications may not show the same range of CRAP scores. For example, dev tools often make heavy use of the visitor pattern and switch statements, which produce characteristic CRAP scores. I expect other verticals have characteristic patterns of their own.
“small team”, “5 developers”, “2 QA” — A team description might be useful for comparing your CRAP score against other teams’ output. Again, this one is problematic because it may change over time. It leads me to think that searching historical project data, including the set of tags in effect at the time, might really be necessary. Any ideas?
“6 months’ time”, “1 weekend’s time” — These capture, roughly, the amount of work involved. Again, this applies to a particular version, so it is transient.
“Waterfall methodology”, “XP method”, “Death march method” — The idea here is to find out whether one methodology produces CRAPpier code than another.
“library”, “application”, “tool”, “service” — Another categorization for projects that probably have different architectural styles. For that matter, architectural styles could be tags as well. Then developers could ask, “How do I compare against other pipes-and-filters systems, or enterprise bus systems?”
We might even start to understand how different types of applications compare to each other.
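Why would switch-heavy developer tools land in a particular range of CRAP scores? The CRAP score of a method m combines its cyclomatic complexity comp(m) with its test coverage cov(m), expressed as a fraction from 0 to 1: CRAP(m) = comp(m)^2 × (1 − cov(m))^3 + comp(m). Each case label in a switch adds a decision point, so a large switch has high complexity, and the squared term makes untested complexity dominate the score. The small Java sketch below just evaluates that formula; the class name and the example complexity of 11 (roughly a method built around a ten-case switch) are illustrative assumptions, not code taken from Crap4j.

```java
public class CrapScore {
    // CRAP(m) = comp(m)^2 * (1 - cov(m))^3 + comp(m)
    //   complexity: cyclomatic complexity of the method
    //   coverage:   fraction of the method exercised by tests, 0.0 to 1.0
    static double crap(int complexity, double coverage) {
        if (coverage < 0.0 || coverage > 1.0) {
            throw new IllegalArgumentException("coverage must be in [0, 1]");
        }
        double uncovered = 1.0 - coverage;
        return Math.pow(complexity, 2) * Math.pow(uncovered, 3) + complexity;
    }

    public static void main(String[] args) {
        // Hypothetical method: a switch with ten case labels, complexity of about 11.
        int switchComplexity = 11;
        for (double cov : new double[] {0.0, 0.5, 1.0}) {
            System.out.println("coverage " + cov + " -> CRAP " + crap(switchComplexity, cov));
        }
    }
}
```

For that hypothetical method the score falls from 132 with no tests to 26.125 at half coverage and bottoms out at 11 with full coverage, since CRAP can never drop below the method’s complexity. That floor is one reason a vertical full of big switches and visitors may sit in a different range than one made of small, simple methods.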
These are just a few ideas; I am sure that you, dear reader, have great ideas I haven’t even imagined. Please share.