Archive for January, 2008

Response To My Post About Crappy Code in Crap4J - Agile Software Process Improvement

Monday, January 14th, 2008

Jason Gorman has a thoughtful reply on his blog about using Crap4j.

He also brings up the potential for self-interest, since I work at Agitar.

“I think it’s worth highlighting the fact that Bob works at a company that sells a tool designed to help teams improve their test coverage, so I can’t help wondering if there isn’t a little bias here towards metrics that highlight lack of coverage. Which is fine, because Bob’s absolutely correct in saying that complex methods with low test assurance are pretty much universally crappy. It is a problem, and you should address it.”

I’d like to respond to that part first. Causality is always tricky. It turns out that I work at Agitar because I think these things are important. Agitar happens to be a great place to work on the code quality and testing issues that are near to my heart. We invented the CRAP metric and Crap4j as ways to point out particular code quality problems. Naturally, and not out of cynical profit-seeking, we also offer tools that try to address these problems. Being an engineer, I don’t think I could bring myself to espouse any methodology or idea just for my material gain. I have to believe in it. Of course, if anyone wants to offer me millions of dollars to espouse some helpful technology, I will be glad to contemplate the dilemma :-)

Now, on to his point.

“Getting developers started with quantitative code analysis - metrics - is one of the things I do for a living, and I’ve always found the opposite. You need to try to establish balance right from the start. Focusing in on just one area of quality, no matter how valuable by itself, can lead to people ignoring other important areas.”

This is a point we discussed when we were working out the metric. It is a valid point, and I am glad there are people like Jason working to help teams get a balanced picture of their systems. I hope that eventually the CRAP metric, or at least the tool, will perform a kind of triage of code quality issues. It would present the most egregious metric failures first, and then offer more refined levels of CRAPpiness for developers to ponder as they clear away the more immediate issues. Imagine a video game where you have to beat the level 1 boss to move on to level 2.

This raises an issue that I haven’t spent time analyzing yet, but that I know intuitively will present itself. There is no strict hierarchy among all the problems that can exist in a project, so at some point you have to give developers multiple, equally crappy problems in one report. Perhaps this is like reaching a point in a video game where you can take multiple paths, but ultimately have to finish all of them to move on to the next level?

This is in tension with our other goal of giving teams a measure that is easy to act on. It should be obvious what problem is being presented and what can be done to fix it. Otherwise, I imagine there could be paralysis and, consequently, no action taken to improve quality.
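The triage idea can be sketched in a few lines of Java. This is only an illustration and not code from Crap4j: the `CrapTriage` and `MethodScore` names are hypothetical, and so is the rule of keeping only methods above a threshold. It orders the offenders worst first and leaves equally crappy methods side by side, in the order they were reported.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CrapTriage {
    /** A method name paired with its CRAP score (hypothetical type). */
    static final class MethodScore {
        final String name;
        final double crap;

        MethodScore(String name, double crap) {
            this.name = name;
            this.crap = crap;
        }

        double crap() {
            return crap;
        }
    }

    /** Keeps methods scoring above the threshold and orders them worst first. */
    static List<MethodScore> worstFirst(List<MethodScore> methods, double threshold) {
        List<MethodScore> offenders = new ArrayList<>();
        for (MethodScore m : methods) {
            if (m.crap() > threshold) {
                offenders.add(m);
            }
        }
        // List.sort is stable, so equally crappy methods keep their input order.
        offenders.sort(Comparator.comparingDouble(MethodScore::crap).reversed());
        return offenders;
    }

    public static void main(String[] args) {
        List<MethodScore> scores = List.of(
            new MethodScore("parse", 12.0),
            new MethodScore("toXml", 110.0),
            new MethodScore("visit", 45.0),
            new MethodScore("render", 45.0));
        for (MethodScore m : worstFirst(scores, 30.0)) {
            System.out.println(m.name + " " + m.crap);
        }
    }
}
```

A flat, sorted list like this keeps the report easy to act on; the open question of what to show when many methods tie is left exactly as open as it is in the text.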

CRAP is not all forms of crap

Thursday, January 10th, 2008

Jason Gorman posted a critique on his blog about some code in the Crap4j codebase. He mentions one particularly long method, SystemCrapStats.toXml(). It was indeed quite a long method, responsible for persisting the gathered statistics into the report.xml file. He asked the question:

“Can I take it that Crap4J doesn’t test for long methods, then? Is this further evidence to support the theory that you get what you measure (or, in this case, you don’t get what you don’t measure)?”

I answer, “Certainly.” Alberto expounded on our aims in his original blog posts over at Artima, but some of that bears repeating.

First, though, I would like to say that I fixed this method :-). It already delegated the project-specific information and the method CRAP scores to their respective objects’ toXml methods, but it was still doing quite a bit itself. The new method is a composition of methods that I think is much more comprehensible. It could certainly be even better, but we have other fish to fry and it is good enough. It looks like this:

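  public String toXml() {
    MyStringBuilder s = new MyStringBuilder();
    s.start("");          // open the enclosing element
    crapProject.toXml(s); // project-specific information
    writeStats(s);        // gathered statistics
    writeMethods(s);      // per-method CRAP scores
    s.end("");            // close the enclosing element
    return s.toString();
  }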

Back to our goal with Crap4j and a discussion of Jason’s question. We were looking for an unequivocal characterization of crappy code. That is, if the tool said the project is CRAP, then it had better be crappy. We believe, based on decades of experience, a lot of hands-on examination of code, interviews with developers, and study of and reasoning over piles of research papers, that two simple measures can be pretty successful at indicating a whole class of crappy projects.

The first property is cyclomatic complexity. Really complicated methods are hard to understand, and thus hard to implement correctly and, probably more importantly, hard to maintain and modify correctly.

The second property is the existence of automated unit tests. Tests serve to verify correctness and to catch regressions that might occur in maintenance. Without tests, the developer is left to reason through each change, and all of its implications for the rest of the code base. Tests extend the developer’s attention in a mechanical way and they show (when done reasonably well) a certain attention to correctness initially that may generally be relied on to increase confidence in the code.
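For readers who want to see how the two properties combine into a single number, here is a minimal sketch in Java. It assumes the formula as published in Alberto’s Artima posts, CRAP(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m), and a threshold of 30 for calling a method CRAPpy. Neither is spelled out in this post, and the class and method names are made up for illustration; this is not the Crap4j source.

```java
final class CrapScore {
    /** Assumed threshold above which a method is reported as CRAPpy. */
    static final double THRESHOLD = 30.0;

    /**
     * CRAP score under the assumed formula:
     * comp^2 * (1 - cov/100)^3 + comp.
     *
     * @param complexity      cyclomatic complexity of the method (>= 1)
     * @param coveragePercent unit test coverage of the method, 0 to 100
     */
    static double crap(int complexity, double coveragePercent) {
        double uncovered = 1.0 - coveragePercent / 100.0;
        return complexity * complexity * Math.pow(uncovered, 3) + complexity;
    }

    /** True when the score exceeds the assumed threshold. */
    static boolean isCrappy(int complexity, double coveragePercent) {
        return crap(complexity, coveragePercent) > THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(crap(5, 100.0));    // fully tested, simple method
        System.out.println(crap(10, 0.0));     // untested, fairly complex method
        System.out.println(isCrappy(10, 0.0));
    }
}
```

Under these assumptions a fully covered method scores just its complexity, while an untested method with complexity 10 scores 110, which shows why tests matter most on the complicated methods.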

Nowhere do we look at long methods. However, long methods can indeed be problematic. A straight-line method may still be hard to understand by virtue of the numerous responsibilities it has. If variables are defined and assigned far from where they are used, that creates more opportunities for bugs, because there are more things the developer must keep in their head while working in that method. But some long straight-line methods can arguably be pretty simple to understand. So long methods are not as convincingly crappy as methods with many paths and no tests.

I would say that once you have reduced the CRAPpy method count to a tolerable level, then by all means pursue these other aspects of code quality. Look at things like long methods or the Chidamber and Kemerer metrics, such as cohesion and coupling. However, keep in mind that they are much more subjective, because their interpretation depends on context. We don’t want CRAP to be subjective. If it says a method is crappy, then everyone should agree.

That said, we expect to evolve the metric as everyone gets more experience with it. So, here is a question. Has anyone found disagreement with a particular method that Crap4j has identified as crappy? We would really appreciate knowing about it.

Do Metrics Suck? Really? Even Crap4j? Let’s Find Out!

Tuesday, January 8th, 2008

There is a severe lack of perspective around metrics, maybe even Crap4j. Sure, it has a funny name, and it was created by two really cool guys, but is it actually helping projects recognize problems and pointing the way to action that improves the success of those projects? Maybe if we all shared our metric scores and categorized our projects in common ways, we could get the necessary data to decide whether metrics are useless or valuable. This would require some extra measures, like bug counts and development and maintenance effort, to compare metric scores against. Until that happens, at the very least, we will be able to see how we benchmark against our peers on a metric. We won’t know which competitor sucks more (or less), but we’ll know where we stand in our field!

Tags! You’re It!

Starting with the 1.1.6 release of Crap4j, we have added tagging to our benchmarking site, Crap-o-rama. This means you can compare your CRAP scores against similar projects by comparing against like-tagged projects! And it can still be done anonymously!

This feature set is just getting started and will become valuable as the number of projects grows. But it is easy and anonymous to contribute scores, so, please, share. We hope we will find out which metrics are meaningful for which categories of projects, and which metrics are categorically useless. We are starting with the CRAP metric, since that is the metric we currently produce, but we hope to add more in the future by opening it up as a web service.

What’s in a Name?

We chose tags as a mechanism for categorizing projects because it is completely free-form. The community can create the tags they find useful. It will be interesting to see how far we can take tagging, and what, if any, other classification mechanisms we will need.

I can see some obvious problems when it comes to updating a project’s CRAP score across releases. If I had a tag “v1.0” and then changed it to “v2.0” with a new CRAP score, I would lose the comparison against other version 1.0 projects.

Conceivably, I could write something that digs into our historical data for projects (used in presenting historical trending of CRAP scores) and finds tags that were applied to a previous version of a project (like “v1.0”). That probably won’t scale well in the current implementation. Another option might be to create a new project for the next version, but share a tag on both uploads that is unique to the two of them, perhaps the project name, like “crap4j”. Then you could compare your version 1 project to your version 2 project as two separate projects. Of course, the historical trending charts wouldn’t exist across the two projects. Hmm. Have to think about this. Ideas?

Barring temporal tags, there are a lot of other useful comparisons that can be captured with tags. I’d like to discuss some of them to seed ideas for people who want to share and benchmark projects with their community. Maybe each community will come up with its own specialized set of tags that allow them to compare projects. Anyway, here are some tags that seem useful. Please provide your own and why you think they would be useful.

Examples

“java” — OK, this one is redundant right now, since Crap4j is the only tool that uploads projects to Crap-o-rama and it only looks at Java projects. But I know there are ports out there, and they should be able to compare too. CRAP is a language-neutral metric, so we should apply this tag in preparation for the future. It might even show how similar apps in different languages compare (let the flame wars begin!)

“developer tool”, “financial app”, “game” — If you are writing tools for developers, it would be cool to compare against other developer tools. This is really to introduce the notion of vertical categorization tags. I happen to know that developer tools exhibit certain common types of complexity and other types of application may not exhibit the same range of CRAP scores. For example, dev tools usually do a lot of visitor patterns and switch statements, which produce particular CRAP scores. I expect other verticals may experience similar patterns.

“small team”, “5 developers”, “2 QA” — A team description might be useful for comparing your CRAP score against other teams’ output. Again, this one is problematic because it may change over time. This leads me to think that searching historical project data, including the then-current set of tags, might really be necessary. Any ideas?

“6 months time”, “1 weekends time” — Capture, vaguely, the amount of work involved. Again, this one applies to a particular version, so it is transient.

“Waterfall methodology”, “XP method”, “Death march method” — The idea here is to find out if a methodology produces CRAPpier code than another.

“library”, “application”, “tool”, “service” — Another categorization for projects that probably have different architectural styles. For that matter, architectural styles could be labels as well. Then developers could ask, “How do I compare against other pipes-and-filters systems, or enterprise bus systems?”

We might even start to understand how different types of applications compare to each other.

These are a few ideas. I am sure that you, dear reader, have great ideas I haven’t even imagined. Please share.

Version 1.1.6 of Crap4j and the Crap-o-rama benchmark!!

Tuesday, January 8th, 2008

New!

Happy New Year!
Version 1.1.6 is out!!

We’re really excited to announce this version. There are a lot of little fixes in the Eclipse client and in the Ant client. But the big news is the stats page, where you can benchmark your CRAP score against other projects anonymously or publicly: it now allows you to tag your projects with labels and to see historical trends for your score.

The tag system will allow you to compare your project against other projects that share the same tags. For example, you could tag your application’s CRAP scores with ‘finance’, ‘v1.0’, ‘java’, ‘agile_methodology’, or whatever else. Then you can compare against other projects tagged ‘finance’, ‘v1.0’, and so on. Even without sharing your project, you can view projects by collections of tags at http://www.crap4j.org/benchmark/stats/.
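To make the idea of comparing against like-tagged projects concrete, here is a small Java sketch. It is not how the benchmark site is implemented: the `TagBenchmark` class, the in-memory map, and the rule that a project matches only when it carries every requested tag are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TagBenchmark {
    /**
     * Returns the names of projects that carry every wanted tag,
     * sorted alphabetically so the result is deterministic.
     */
    static List<String> projectsWithAllTags(Map<String, Set<String>> tagsByProject,
                                            Set<String> wanted) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : tagsByProject.entrySet()) {
            if (entry.getValue().containsAll(wanted)) {
                matches.add(entry.getKey());
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical projects and tags, for illustration only.
        Map<String, Set<String>> projects = Map.of(
            "ledger", Set.of("finance", "v1.0", "java"),
            "trader", Set.of("finance", "v2.0", "java"),
            "linter", Set.of("developer tool", "java"));
        System.out.println(projectsWithAllTags(projects, Set.of("finance", "java")));
    }
}
```

Requiring every tag to match keeps the comparison group narrow; a looser any-tag match would give larger but less comparable groups.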

We hope this will bring new meaning to the CRAP score, and eventually all metrics. If you previously uploaded a project, please re-upload it and add some tags to it.

We’re really excited to hear how people like this new feature. Please contact us, post on the forum, file bugs and feature requests, or, if you really like it, just send cash (small and large bills accepted).

Bob and Alberto