Validation of an Approach for Improving Existing Measurement Frameworks

February 13th, 2008

The paper Validation of an Approach for Improving Existing Measurement Frameworks by Manoel G. Mendonça and Victor R. Basili reports on their experience validating and improving the measurement frameworks that companies already use.

In particular, they mine the data gathered by these frameworks for interesting latent information using attribute focusing, and they use the Goal Question Metric (GQM) approach to structure the measurement activity.

The paper also cites several good background papers on software measurement techniques and their validation.

Response To My Post About Crappy Code in Crap4J - Agile Software Process Improvement

January 14th, 2008

Jason Gorman has a thoughtful reply on his blog about using Crap4j.

He also raises the possibility of bias, given that I work at Agitar.

“I think it’s worth highlighting the fact that Bob works at a company that sells a tool designed to help teams improve their test coverage, so I can’t help wondering if there isn’t a little bias here towards metrics that highlight lack of coverage. Which is fine, because Bob’s absolutely correct in saying that complex methods with low test assurance are pretty much universally crappy. It is a problem, and you should address it.”

I’d like to respond to that part first. Causality is always tricky. It turns out that I work at Agitar because I think these things are important, not the other way around. Agitar happens to be a great place to work on code quality and testing issues, which are near to my heart. We invented the CRAP metric and Crap4j as ways to point out particular code quality problems. Naturally, we also offer tools that try to address those problems, but that follows from our convictions rather than from cynical profit-seeking. Being an engineer, I don’t think I could bring myself to espouse any methodology or idea just for material gain. I have to believe in it. Of course, if anyone wants to offer me millions of dollars to espouse some helpful technology, I will be glad to contemplate the dilemma :-)

Now, on to his point.

“Getting developers started with quantitative code analysis - metrics - is one of the things I do for a living, and I’ve always found the opposite. You need to try to establish balance right from the start. Focusing in on just one area of quality, no matter how valuable by itself, can lead to people ignoring other important areas.”

We discussed this point when we were defining the metric. It is a valid point, and I am glad there are people like Jason working to help teams get a balanced picture of their systems. I hope that eventually the CRAP metric, or at least the tool, will perform a kind of triage of code quality issues: it will present the most egregious metric failures first, and then offer more refined levels of CRAPpiness for developers to ponder as they clear away the more immediate issues. Imagine a video game where you have to beat the level 1 boss to move on to level 2.

This raises an issue that I haven’t spent time analyzing yet, but that I intuitively know will come up. There is no strict hierarchy among all the problems that can exist in a project, so at some point the report has to present multiple, equally crappy problems. Perhaps this is like reaching a point in a video game where you can take multiple paths, but ultimately you have to finish all of those paths before moving on to the next level?
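To make the video game analogy a little more concrete, here is a minimal sketch of how such a triage report might be organized, assuming each issue carries a numeric level. Nothing like this exists in Crap4j; the TriageReport class and its Issue record are hypothetical. Issues on the same level are unordered peers (the multiple paths), and the next level only appears once every issue on the current one is resolved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tiered "triage" report; not part of Crap4j.
public class TriageReport {

    // A quality issue with a severity level; level 1 is the "first boss".
    public record Issue(int level, String description) {}

    // TreeMap keeps levels sorted, so the lowest unfinished level comes first.
    private final TreeMap<Integer, List<Issue>> byLevel = new TreeMap<>();

    public void add(Issue issue) {
        // Issues on the same level have no order among them: they are parallel paths.
        byLevel.computeIfAbsent(issue.level(), k -> new ArrayList<>()).add(issue);
    }

    public void resolve(Issue issue) {
        List<Issue> peers = byLevel.get(issue.level());
        if (peers != null) {
            peers.remove(issue);
            if (peers.isEmpty()) {
                byLevel.remove(issue.level()); // level cleared; the next one unlocks
            }
        }
    }

    // Only the lowest unfinished level is shown, and all of its issues must be cleared.
    public List<Issue> currentLevel() {
        Map.Entry<Integer, List<Issue>> first = byLevel.firstEntry();
        return first == null ? List.of() : List.copyOf(first.getValue());
    }
}
```

Keeping peers in an unordered bucket is what lets the report admit "multiple, equally crappy problems" without inventing a ranking between them.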

This is in tension with our other goal: giving teams a measure that is easy to act on. It should be obvious what problem is being presented and what can be done to fix it. Otherwise, I imagine teams could be paralyzed, and consequently take no action to improve quality.

CRAP is not all forms of crap

January 10th, 2008

Jason Gorman posted a critique on his blog of some code in the Crap4j codebase. He mentions one particularly long method, SystemCrapStats.toXml(). It was indeed quite a long method; it was responsible for persisting the gathered statistics into the report.xml file. He asked the question:

Can I take it that Crap4J doesn’t test for long methods, then? Is this further evidence to support the theory that you get what you measure (or, in this case, you don’t get what you don’t measure)?

I answer, “Certainly.” Alberto expounded on our aims in his original blog posts over at Artima, but some of that bears repeating.

First, though, I would like to say that I fixed this method :-). It already delegated the project-specific information and the per-method CRAP scores to those objects’ own toXml methods, but it was still doing quite a bit itself. The new method is a composition of smaller methods, which I think is much more comprehensible. It could certainly be even better, but we have other fish to fry, and it is good enough. It looks like this:

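  public String toXml() {
    MyStringBuilder s = new MyStringBuilder();
    s.start("<...>");      // open the report's root element (tag name not preserved in this post)
    crapProject.toXml(s);  // project-specific information
    writeStats(s);         // aggregate CRAP statistics
    writeMethods(s);       // per-method CRAP scores
    s.end("</...>");       // close the root element
    return s.toString();
  }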

Back to our goal with Crap4j and a discussion of Jason’s question. We were looking for an unequivocal characterization of crappy code. That is, if the tool said the project is CRAP, then it had better be crappy. Based on decades of experience, a lot of hands-on examination of code, interviews with developers, and study of and reasoning over piles of research papers, we believe that two simple measures can be quite successful at indicating a whole class of crappy projects.

The first property is cyclomatic complexity. Really complicated methods are hard to understand, and thus hard to implement correctly and, probably more importantly, hard to maintain and modify correctly.
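For readers who haven’t met it, McCabe’s cyclomatic complexity counts the independent paths through a method: one for the straight-line path, plus one for each decision point. The sketch below is a deliberately crude, hypothetical estimate that scans a method body’s source text for branching keywords and short-circuit operators. It ignores comments, string literals, and the ternary operator; a real tool would work from a parse tree or bytecode instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Crude, hypothetical approximation of McCabe cyclomatic complexity for one method body:
// complexity = number of decision points + 1.
public class CyclomaticEstimate {

    // Keywords and operators that each add one independent path.
    private static final Pattern DECISION = Pattern.compile(
            "\\b(if|for|while|case|catch)\\b|&&|\\|\\|");

    public static int estimate(String methodBody) {
        Matcher m = DECISION.matcher(methodBody);
        int decisions = 0;
        while (m.find()) {
            decisions++;
        }
        return decisions + 1; // a straight-line method has exactly one path
    }

    public static void main(String[] args) {
        String body = "if (x > 0 && y > 0) { for (int i = 0; i < n; i++) { sum += i; } } return sum;";
        System.out.println(estimate(body)); // if, &&, for -> 3 decisions -> 4
    }
}
```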

The second property is the existence of automated unit tests. Tests serve to verify correctness and to catch regressions that might occur during maintenance. Without tests, the developer is left to reason through each change and all of its implications for the rest of the code base. Tests extend the developer’s attention mechanically, and (when done reasonably well) they show an initial attention to correctness that can generally be relied on to increase confidence in the code.
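The two properties come together in the CRAP formula: CRAP(m) = comp(m)² × (1 − cov(m)/100)³ + comp(m), where comp(m) is the method’s cyclomatic complexity and cov(m) is the percentage of the method covered by automated tests. Here is a minimal Java sketch of that formula. The CrapScore class is illustrative rather than Crap4j’s source, and the threshold of 30 is an assumption based on the value commonly cited for Crap4j.

```java
// Illustrative sketch of the CRAP formula; not Crap4j's actual source.
public class CrapScore {

    // Assumed threshold: methods scoring above this are reported as crappy.
    static final double CRAPPY_THRESHOLD = 30.0;

    /**
     * @param complexity      cyclomatic complexity of the method (>= 1)
     * @param coveragePercent percentage of the method exercised by automated tests, 0..100
     */
    public static double crap(int complexity, double coveragePercent) {
        double uncovered = 1.0 - coveragePercent / 100.0;
        return complexity * complexity * Math.pow(uncovered, 3) + complexity;
    }

    public static boolean isCrappy(int complexity, double coveragePercent) {
        return crap(complexity, coveragePercent) > CRAPPY_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(crap(10, 0.0));   // 10*10*1 + 10 = 110.0
        System.out.println(crap(10, 50.0));  // 100 * 0.125 + 10 = 22.5
        System.out.println(crap(10, 100.0)); // fully tested: 0 + 10 = 10.0
    }
}
```

Because full coverage only drives the first term to zero, even a perfectly tested method scores at least its complexity; under the assumed threshold, a method with complexity above 30 can only escape by being refactored into simpler pieces.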

Nowhere do we look at long methods. However, long methods can indeed be problematic. A straight-line method may still be hard to understand because of the numerous responsibilities it has. If variables are defined and assigned far from where they are used, that creates more opportunities for bugs, because there are more things the developer must keep in their head while working in that method. But some long straight-line methods are arguably pretty simple to understand. So long methods are not as convincingly crappy as methods with many paths and no tests.

Once you have reduced the CRAPpy method count to a tolerable level, then by all means pursue these other aspects of code quality. Look at things like long methods, or the Chidamber-Kemerer metrics such as coupling and cohesion. However, keep in mind that they are much more subjective, because their interpretation depends on context. We don’t want CRAP to be subjective. If it says something is crappy, then everyone should agree.
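A long-method check is easy to write, and writing one shows where the subjectivity comes in: the counting is mechanical, but the limit is not. This hypothetical LongMethodCheck counts non-blank lines that are not // comments (block comments are not handled) and leaves the cutoff to the caller, because any particular number is a judgment call.

```java
// Hypothetical long-method check, separate from CRAP and not part of Crap4j.
public class LongMethodCheck {

    // Counts non-blank lines that are not single-line // comments.
    // Block comments (/* ... */) are deliberately not handled in this sketch.
    public static int logicalLines(String methodSource) {
        int count = 0;
        for (String line : methodSource.split("\\R")) {
            String trimmed = line.strip();
            if (!trimmed.isEmpty() && !trimmed.startsWith("//")) {
                count++;
            }
        }
        return count;
    }

    // The limit is context-dependent, which is what makes this metric subjective.
    public static boolean isLong(String methodSource, int maxLines) {
        return logicalLines(methodSource) > maxLines;
    }
}
```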

That said, we expect to evolve the metric as everyone gains more experience with it. So here is a question: has anyone disagreed with Crap4j’s verdict on a particular method it identified as crappy? We would really appreciate knowing about it.

Do Metrics Suck? Really? Even Crap4j? Let’s Find Out!

January 8th, 2008

There is a severe lack of perspective around metrics, maybe even around Crap4j. Sure, it has a funny name, and it was created by two really cool guys, but is it actually helping projects recognize problems and pointing the way to actions that improve their success? Maybe if we all shared our metric scores and categorized our projects in common ways, we could gather the data needed to decide whether metrics are useless or valuable. That would require additional measures, like bug counts and maintenance effort, to compare metric scores against. Until that happens, at the very least, we will be able to see how we compare with our peers on a metric. We won’t know which competitor sucks more (or less), but we’ll know where we stand in our field!

Tags! You’re It!

Starting with the 1.1.6 release of Crap4j, we have added tagging to our benchmarking site, Crap-o-rama. This means you can compare your CRAP scores against similar projects by comparing against like-tagged projects! And it can still be done anonymously!

This feature set is just getting started and will become more valuable as the number of projects grows. It is easy and anonymous to contribute scores, so please share. We hope to find out which metrics are meaningful for which categories of projects, and which metrics are categorically useless. We are starting with the CRAP metric, since that is the metric we currently produce, but we hope to add more in the future by opening the site up as a web service.

What’s in a Name?

We chose tags as the mechanism for categorizing projects because they are completely free-form: the community can create whatever tags it finds useful. It will be interesting to see how far we can take tagging, and what other classification mechanisms, if any, we will need.

I can see some obvious problems when it comes to updating a project’s CRAP score across releases. If I had a tag “v1.0” and then changed it to “v2.0” along with a new CRAP score, I would lose the comparison against other version 1.0 projects.

Conceivably, I could write something that digs into our historical project data (the data used to present historical trends in CRAP scores) and finds tags that were applied to a previous version of a project (like “v1.0”). That probably won’t scale well in the current implementation. Another option might be to create a new project for the next version, but put a tag on both uploads that is unique to the two of them; perhaps the project name, like “crap4j”. Then you could compare your version 1 project to your version 2 project as two separate projects. Of course, the historical trend charts wouldn’t span the two projects. Hmm. I have to think about this. Ideas?
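The first option, keeping tags with each historical upload rather than only with the project, might be modeled roughly like this. TagHistory and Snapshot are hypothetical names, not Crap-o-rama’s actual schema, and this in-memory list sidesteps the scaling concern entirely. Because tags travel with each snapshot, retagging a release as “v2.0” leaves the earlier “v1.0” upload comparable, while the latest snapshot still drives the current view.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical model: tags belong to each uploaded snapshot, not to the project,
// so tagging a new release does not erase what earlier releases were tagged with.
public class TagHistory {

    public record Snapshot(String project, double crapPercent, Set<String> tags) {}

    private final List<Snapshot> uploads = new ArrayList<>(); // kept in upload order

    public void upload(String project, double crapPercent, String... tags) {
        uploads.add(new Snapshot(project, crapPercent, Set.copyOf(Arrays.asList(tags))));
    }

    // Every snapshot, from any project and any release, that carried all the given tags.
    public List<Snapshot> everTagged(String... wanted) {
        List<Snapshot> result = new ArrayList<>();
        for (Snapshot s : uploads) {
            if (s.tags().containsAll(Arrays.asList(wanted))) {
                result.add(s);
            }
        }
        return result;
    }

    // The most recent snapshot of a project, or null if it was never uploaded.
    public Snapshot latest(String project) {
        Snapshot last = null;
        for (Snapshot s : uploads) {
            if (s.project().equals(project)) {
                last = s;
            }
        }
        return last;
    }
}
```

The same structure would also cover transient tags like team size, since each snapshot remembers the tags that applied at the time.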

Setting temporal tags aside, there are a lot of other useful comparisons that tags can capture. I’d like to discuss some of them to seed ideas for people who want to share and benchmark projects with their community. Maybe each community will come up with its own specialized set of tags for comparing projects. Anyway, here are some tags that seem useful. Please suggest your own and explain why you think they would be useful.

Examples

“java” — OK, this one is redundant right now, since we are the only tool that uploads projects to Crap-o-rama and we only look at Java projects. But I know there are ports out there, and they should be able to compare too. The CRAP metric itself is language-neutral, so we should apply this tag in preparation for the future. It might even show how similar apps in different languages compare (let the flame wars begin!)

“developer tool”, “financial app”, “game” — If you are writing tools for developers, it would be cool to compare against other developer tools. This is really to introduce the notion of vertical categorization tags. I happen to know that developer tools exhibit certain common types of complexity, and other types of applications may not exhibit the same range of CRAP scores. For example, dev tools usually use a lot of visitor patterns and switch statements, which produce particular CRAP scores. I expect other verticals show characteristic patterns of their own.

“small team”, “5 developers”, “2 QA” — A team description might be useful for comparing your CRAP score against other teams’ output. Again, this one is problematic because it may change over time. It leads me to think that searching historical project data, including the set of tags that applied at the time, might really be necessary. Any ideas?

“6 months’ time”, “1 weekend’s time” — These roughly capture the amount of work involved. Again, this applies to a particular version, so it is transient.

“Waterfall methodology”, “XP method”, “Death march method” — The idea here is to find out whether one methodology produces CRAPpier code than another.

“library”, “application”, “tool”, “service” — Another categorization, for projects that probably have different architectural styles. For that matter, architectural styles could be tags as well. Then developers could ask, “How do I compare against other pipes-and-filters systems, or enterprise bus systems?”

We might even start to understand how different types of applications compare to each other.

These are just a few ideas; I am sure that you, dear reader, have great ideas I haven’t even imagined. Please share.

Version 1.1.6 of Crap4j and the Crap-o-rama benchmark!!

January 8th, 2008

Happy New Year!
Version 1.1.6 is out!!

We’re really excited to announce this version. There are a lot of little fixes in the Eclipse client and in the Ant client. But the big news is that the stats page, where you can benchmark your CRAP score against other projects anonymously or publicly, now lets you tag your projects with labels and see historical trends for your score.

The tag system lets you compare your project against other projects that share the same tags. For example, you could tag your application’s CRAP scores with ‘finance’, ‘v1.0’, ‘java’, ‘agile_methodology’, or whatever else, and then compare against other projects tagged ‘finance’, ‘v1.0’, and so on. Even without sharing your project, you can view projects by collections of tags at http://www.crap4j.org/benchmark/stats/.
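Conceptually, comparing against like-tagged projects is a filter followed by an aggregate. The sketch below is hypothetical and is not the benchmark site’s code: it averages the ‘% CRAP’ of every project carrying all of the requested tags, and returns an empty result when no project matches rather than a misleading zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;
import java.util.Set;

// Hypothetical tag-based benchmark: average % CRAP over projects sharing a set of tags.
public class TagBenchmark {

    public record Project(String name, double crapPercent, Set<String> tags) {}

    private final List<Project> projects = new ArrayList<>();

    public void add(String name, double crapPercent, String... tags) {
        projects.add(new Project(name, crapPercent, Set.copyOf(Arrays.asList(tags))));
    }

    // Projects must carry every requested tag; with no tags, all projects match.
    public OptionalDouble averageCrap(String... tags) {
        List<String> wanted = Arrays.asList(tags);
        return projects.stream()
                .filter(p -> p.tags().containsAll(wanted))
                .mapToDouble(Project::crapPercent)
                .average();
    }
}
```

Requiring every tag (rather than any tag) narrows the peer group as you add tags, which matches the idea of comparing against projects that share the same tags.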

We hope this will bring new meaning to the CRAP score, and eventually all metrics. If you previously uploaded a project, please re-upload it and add some tags to it.

We’re really excited to hear how people like this new feature. Please contact us, post on the forum, or file bugs and feature requests. If you really like it, just send cash (small and large bills accepted).

Bob and Alberto

Season’s greetings

December 17th, 2007

Crap4j will be kind of quiet over the next few weeks as we head off for the holidays.

We have a lot of fixes in the latest version, and a bunch of cool new features for the Crap-o-rama statistics website, but they’ll have to wait for the new year. We want them well-tested and we want to be around to support anybody who tries the new features, so we’re going to hold the release off until we get back from holiday.

In the meantime, please keep using Crap4j and letting us know how it goes. Some people will still be around checking the forums occasionally, and I’ll read all the wiki updates and tickets when I get back.

See you in the new year, with lots of nifty new features and fixes.

Happy holidays. Peace and good will to all.

Crap4j Developers Wiki

December 5th, 2007

The wiki has been updated with several new pages. There are pages on:

  • how to download the crap4j sources
  • how to build crap4j
  • crude descriptions of how the tool is organized
  • feature story proposal pages
  • meeting/organizing/collaborating on porting activity

Get a login and tell us about the features or make some new pages. (We wanted to require no login, but the comment and wiki spammers already found us.)

Apple VM and Eclipse 3.3.1

November 15th, 2007

If you get the ‘Java started on first thread’ AWT exception running crap4j in Eclipse, here is a fix.

Inside <eclipse install dir>/Eclipse.app/Contents/MacOS/eclipse.ini, add the following two VM switches after the -vmargs line.

  -Dapple.awt.usingSWT=false
  -Djava.awt.headless=true

I ran into this for the first time tonight, and I’m not sure what changed in my Eclipse 3.3.1 install. Everything still works fine in Eclipse 3.2.2. Anyway, those two switches solved the problem. Someone else mentioned removing the -XstartOnFirstThread flag, but only the combination of the two switches above actually worked for me.

Why could this happen? It is a perennial problem with Apple’s Java and Eclipse SWT. I had long been afraid it would happen, since Crap4j renders the bar graph for ‘% CRAP’ using Java 2D, which loads AWT. However, it hadn’t blown up on the Mac until today. I hope no one else runs into the problem.
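For the curious, this is roughly the kind of Java 2D work involved. The sketch below is hypothetical, not Crap4j’s rendering code: it draws a ‘% CRAP’ bar into an off-screen BufferedImage and encodes it as PNG, which works with java.awt.headless=true because no window or display is needed. Setting the property in code only helps if it happens before AWT initializes; inside Eclipse, the eclipse.ini switches above remain the reliable route.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical sketch: render a "% CRAP" bar off-screen with Java 2D, no display required.
public class HeadlessBar {

    public static byte[] renderBar(double crapPercent, int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);           // background
            int crapWidth = (int) Math.round(width * crapPercent / 100.0);
            g.setColor(Color.RED);
            g.fillRect(0, 0, crapWidth, height);       // share of crappy methods
        } finally {
            g.dispose();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(img, "png", out);            // encode in memory as PNG
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Same effect as -Djava.awt.headless=true, but only if set before AWT initializes.
        System.setProperty("java.awt.headless", "true");
        System.out.println("headless: " + GraphicsEnvironment.isHeadless());
        System.out.println("png bytes: " + renderBar(12.5, 200, 20).length);
    }
}
```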

Version 1.1.5 w/ Benchmarking!

November 15th, 2007

Version 1.1.5 of Crap4j is available at http://www.crap4j.org/downloads/.

It has some bug fixes, and a really exciting new feature or two.

The big feature is that you can share your CRAP results anonymously (or publicly) and see how you fare against all the other projects uploaded. It even puts the global average in the report graphic to make it more useful. (You can turn this off if it’s just too painful :-)

You can see stats on other projects on the benchmark site.

We hope that this will help us understand the CRAP metric better in the wild, and that it will provide users an opportunity to check themselves against fellow developers.

Enjoy, and please let us know how you like it.

Crap4J - The Movie

November 7th, 2007

Well, not quite a movie, but I’ve created a 6-minute video showing the basic usage of Crap4J:

1) Run Crap4J

2) Identify crappy methods

3) Add tests or refactor crappy methods until you have eliminated them

The video quality is not very good due to YouTube’s compression, but it should help people get the basic idea of how to use Crap4j.

Here’s the link to YouTube: Crap4J Basic Demo.

Alberto