Neuma White Paper:
Metrics and Process Maturity
In keeping with the season, I'll try to keep this month's article on the Light side (both Chanukah and Christmas are Festivals of Light). Not easy to do when talking about metrics. If you're serious about attaining SEI CMM Level 5 certification, or about improving your processes in an effective manner, metrics are critical. Changing processes based on gut feel, or even based on some other organization's best practices, can lead you backwards. Metrics not only permit you to detect this, but also give you the basic data you need to improve your processes.
Why do I Like Metrics?
I enjoy looking at metrics. Even more, I enjoy devising new metrics to deal with specific issues. So, why? Why would I want to spend so much time on non-core activities?
First of all, they're interesting. They teach me things I didn't know. They confirm or reject my suspicions. Sure, Dept. A is delivering more lines of code than Dept. B. But Dept. B is delivering more features, with less code to maintain too.
Secondly, I agree with the philosophy: What you measure (publicly), improves. Testers not productive enough? Put up a chart of the weekly number of problems uncovered by each tester or test group. People spending too much time surfing the net at work? Put up a chart each week of how much surf time occurs in each dept. during nine-to-five (if you can measure it!).
Thirdly, the need to tune processes. The biggest part of tuning a process, for me, is dealing with the most frequent cases and ensuring the process handles these well. Metrics point me in the right direction. The low frequency cases may have as big an effect, but I can deal with low frequency without automation - not so for high frequency.
Why else? Forecasting. Mostly, I need to be able to predict, accurately, when resources will be required and when product will be ready.
How about identifying change - my metrics tell me that something different is happening - that makes me want to isolate the cause. On the flip side, when I change my processes, I can identify the impact of the changes.
It's about process, it's about productivity, it's about accuracy. Metrics are important.
Making Metrics Work
What can you do to make metrics work well? First of all, compare apples to apples. Don't use a Java line count metric to compare to a Perl line count metric (unless you're studying the virtues of different languages).
Secondly, prime the pump. You're starting to pump out metrics regularly - you'll likely make some adjustments to the metrics and then the data will come in. Don't read too much into the first set of measurements. There will be bumps and anomalies along the way. Get samples across a significant part of your axis (usually time). Identify some of the bumps so that you know what to look out for in the future. For example, if 50 new problems were raised in the first week, don't push the panic button and assume the same number will occur in the second and third. If it does, then you have a reasonable measure, and perhaps cause for concern. But if the numbers settle down to 20, identify what caused the 50 and watch out for it in the future.
Thirdly, post your metrics. Make them visible. If there are anomalies, you may find others speaking up to tell you about them before you have the chance to see them yourself. But be careful what you post - don't post a metric you don't want people spending time improving. And use political sense - posting a metric that will create competition is good, but one that will create division is not so good.
Finally, be careful about what you're measuring. For example, don't measure activity/feature checkpoints which vary wildly in size. If 50 of your 53 features are under 2 weeks effort and 3 are over 2 years effort, your metrics may give some false indications. Set some guidelines for feature sizes. For example, state that they must be of a size between 1 week and 2 months, and if not, they should be either combined so that they are at least a week, or split out into sub-features which fit the sizing guidelines. Then when you hear that 5 of 50 features are incomplete, you know roughly where you stand. Another example: a design problem can manifest itself in dozens of ways. Don't count the dozens of ways as problems, just the root problem. Otherwise it will look like you're fixing a whole host of problems with one simple change. Good testers should know how to recognize problems stemming from the same root. If not, they should be consulting directly with design, or else the problems they raise should be screened before entering the "design" problem domain.
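The sizing guideline above lends itself to a simple automated check. The following is a minimal Python sketch, not taken from any particular tool: the feature names and effort figures are made up, and "2 months" is assumed to mean 8 weeks.

```python
from dataclasses import dataclass

# Assumed guideline from the text: 1 week to 2 months (taken here as 8 weeks).
MIN_WEEKS = 1.0
MAX_WEEKS = 8.0


@dataclass
class Feature:
    name: str            # hypothetical feature identifier
    effort_weeks: float  # estimated effort in weeks


def check_sizing(features):
    """Return names of features to combine (too small) and to split (too large)."""
    too_small = [f.name for f in features if f.effort_weeks < MIN_WEEKS]
    too_large = [f.name for f in features if f.effort_weeks > MAX_WEEKS]
    return too_small, too_large


# Made-up sample data for illustration only.
features = [
    Feature("login-page", 1.5),
    Feature("tooltip-fix", 0.4),
    Feature("new-billing-engine", 104.0),
]
too_small, too_large = check_sizing(features)
print("combine:", too_small)
print("split:", too_large)
```

Running a check like this before counting checkpoints keeps a "5 of 50 incomplete" figure meaningful, since every feature counted falls within the same size range.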
My Favorite CM Metrics
I like to look at trends in my CM world. Good tools help. Every so often I'll spend a few minutes issuing a bunch of queries just to get a feel for the landscape, to get my finger on the pulse. It's usually quite interesting. Some of the queries are useful to look at regularly (i.e. over time) or across subsystems. These are the metrics - they show a trend over time or across some other dimension such as subsystem or build. Make sure you have tools that allow you to do this. If you don't, you won't do much learning and your processes won't improve adequately.
I've grouped some of my favorite CM metrics below by functional area.
Problem Arrival and Fix Rates - These are good to see if we're staying ahead of the game. Usually they are reported separately for each development stream. Far more significant to me is the stream to stream comparison. For example, we see the peaks and valleys in problem arrival rates in one stream and map them to events such as verification test runs, field trials, etc. When the next stream comes along, we see a similar pattern - but this time we're more confident in predicting when each peak will settle back down.
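As a rough illustration of how arrival and fix rates can be tallied per stream, here is a minimal Python sketch. The record layout, stream names, and dates are all hypothetical; a real CM repository would supply these through its own query facility.

```python
from collections import Counter
from datetime import date

# Hypothetical problem records: (stream, date raised, date fixed or None).
problems = [
    ("rel4", date(2024, 1, 2), date(2024, 1, 9)),
    ("rel4", date(2024, 1, 3), None),
    ("rel4", date(2024, 1, 10), date(2024, 1, 11)),
    ("rel5", date(2024, 1, 4), date(2024, 1, 5)),
]


def iso_week(d):
    """Return (ISO year, ISO week number) for a date."""
    year, week, _weekday = d.isocalendar()
    return (year, week)


def weekly_rates(records, stream):
    """Count problems raised and problems fixed per ISO week for one stream."""
    arrivals, fixes = Counter(), Counter()
    for rec_stream, raised, fixed in records:
        if rec_stream != stream:
            continue
        arrivals[iso_week(raised)] += 1
        if fixed is not None:
            fixes[iso_week(fixed)] += 1
    return arrivals, fixes


arrivals, fixes = weekly_rates(problems, "rel4")
for week in sorted(set(arrivals) | set(fixes)):
    print(week, "raised:", arrivals[week], "fixed:", fixes[week])
```

Calling `weekly_rates` once per stream and lining the results up by weeks since the start of each stream would give the stream to stream comparison described above.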
# Problems Fixed per Build - Typically we see a fairly constant failure rate for problem fixes (until we change our process to improve that rate). This number tells us how bumpy the build is expected to be until the fixes for the failed fixes are turned around. It is also a good indicator for overall stability of a release stream.
# Problems Fixed and # Features implemented per Release - This one, again, needs to be compared on a stream to stream basis. It tells us how complex a release is likely to be - how much training, documentation and other resources will be required.
Outstanding Problem Duration by Priority - This is a process monitoring metric. Are problems being solved within the periods specified by their priority? (At least, that's how we use internal priorities.)
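A check of this kind can be sketched in a few lines of Python. The target periods below are assumptions chosen for illustration (the article does not give values), as are the problem identifiers and dates.

```python
from datetime import date

# Hypothetical resolution targets in days per priority (not from the article).
TARGET_DAYS = {"high": 7, "medium": 30, "low": 90}

# Hypothetical open problems: (id, priority, date raised).
open_problems = [
    ("P101", "high", date(2024, 3, 1)),
    ("P102", "high", date(2024, 3, 18)),
    ("P103", "low", date(2024, 1, 15)),
]


def overdue(problems, today):
    """Return (id, priority, days open) for problems open longer than their target."""
    late = []
    for pid, priority, raised in problems:
        days_open = (today - raised).days
        if days_open > TARGET_DAYS[priority]:
            late.append((pid, priority, days_open))
    return late


late = overdue(open_problems, today=date(2024, 3, 20))
for pid, priority, days_open in late:
    print(pid, priority, days_open, "days open")
```

Passing `today` explicitly, rather than reading the clock inside the function, makes it possible to recompute the metric for any past date from the repository history.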
Problem Identification by Phase Found - This one lets us know if too many bugs are getting out the door. It also lets us know how effective things like verification and beta testing are.
Duplicate Problem Frequency - If this gets too high, it's too hard for the reporting teams to identify a duplicate problem before raising a new one. Perhaps they just need training.
CM Tool Usage
Multi-site Bandwidth Requirements - The requirements for Multi-site bandwidth are measured by looking at the transactions to be sent across sites, or the transactions sent over the past while. There are two types of bandwidth - average (MB/day) and peak (peak MB/hour). By looking at what the behaviour has been, we're prepared to predict what will happen when someone decides to load in 20,000 new files to be distributed to all sites.
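The two bandwidth figures can be derived from a log of past transfers. The sketch below is a minimal Python illustration with a made-up log format; it averages only over days that had traffic, which is an assumption, and it measures the peak per clock hour rather than over a sliding window.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transfer log: (timestamp, megabytes sent to remote sites).
transfers = [
    (datetime(2024, 5, 1, 9, 15), 40.0),
    (datetime(2024, 5, 1, 9, 50), 60.0),
    (datetime(2024, 5, 1, 14, 5), 20.0),
    (datetime(2024, 5, 2, 10, 30), 60.0),
]


def bandwidth_summary(log):
    """Return (average MB per active day, peak MB in any single clock hour)."""
    per_day = defaultdict(float)
    per_hour = defaultdict(float)
    for stamp, megabytes in log:
        per_day[stamp.date()] += megabytes
        per_hour[(stamp.date(), stamp.hour)] += megabytes
    average_per_day = sum(per_day.values()) / len(per_day)
    peak_per_hour = max(per_hour.values())
    return average_per_day, peak_per_hour


average_mb_day, peak_mb_hour = bandwidth_summary(transfers)
print(f"average: {average_mb_day:.1f} MB/day, peak: {peak_mb_hour:.1f} MB/hour")
```

To estimate the effect of a bulk load such as the 20,000 new files mentioned above, the expected transfer could be appended to the log as an extra entry and the summary recomputed.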
License checkout usage - This allows us to know when we will need more licenses. It also helps determine our ratio of floating licenses to users and average tool usage times.
CM Repository growth rate - This affects both performance and disk space. Track it both to verify the vendor's performance claims and to ensure our overall data growth has been adequately planned for.
Application data growth rate - This is a finer measure of growth rate on a per application basis. If the overall growth rate is too fast, this will help pinpoint and react to the areas of concern.
Administrator Effort to Support CM Tool Operation - A CM tool has many cost components. If Administration is a key component, track the costs. This will give you a cost-savings objective for the next time you need to upgrade your CM tool suite. (There are very good CM tools out there that require almost no administration.)
Build Preparation Time for a Single Build - This metric is a useful one to help identify how well your CM tool is working. After product development work, how much effort does it take you to prepare for a build: promotion, merging, defining the build contents, producing build notes, retrieving source code, launching the build processes? If your effort is significant, you may want to split this into different metrics, and that's good. But I would also recommend you have the overall number. You want to drive this cost down to near-zero.
Changes per Designer - You might want to break this one out further. If a designer has a lot of changes, is it because a lot of rework is required, or because you have Super-Designer on your team?
Files per Change (Bug fixes, Features) - Here's a good one to look at. How many files change for the average bug fix? I'll bet it's very close to one. But if it's the same for feature implementation, you can probably reduce file contention by breaking these changes into a series of changes.
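A minimal Python sketch of this metric follows. The change records, their identifiers, and the file names are hypothetical; the point is only to show the average being split by kind of change.

```python
from statistics import mean

# Hypothetical change records: (change id, kind, files touched).
change_records = [
    ("C1", "bugfix", ["parser.c"]),
    ("C2", "bugfix", ["lexer.c"]),
    ("C3", "feature", ["parser.c", "parser.h", "ui.c", "ui.h"]),
    ("C4", "feature", ["report.c", "report.h"]),
]


def files_per_change(records):
    """Average number of files touched per change, grouped by kind of change."""
    sizes = {}
    for _change_id, kind, files in records:
        sizes.setdefault(kind, []).append(len(files))
    return {kind: mean(counts) for kind, counts in sizes.items()}


averages = files_per_change(change_records)
print(averages)
```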
Lines of Code per Change - How many lines of code are added/modified/deleted per change? Check this out over time. If it's growing, then your modularity plan may not be working.
Header File Changes per Build - You'll want to look at this across an entire stream. Ideally, this is high at the start of a stream, low afterwards, and almost non-existent once verification testing has started. This is a good measure of your product stability.
Most frequently Changed Files - How many times do your files change? I looked at this today and I thought I had some interesting results. But when I zoomed in, it was pretty much what I should have expected. Still, I was able to identify files which are candidates for architectural re-engineering. If a file is changing a lot, there will likely be contention on it. So the high runners are a good place to look and ask why. You may find that there are architectural solutions which can simplify your product design considerably.
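Finding the high runners is essentially a frequency count. Here is a minimal Python sketch, assuming each change can be reduced to the list of files it touched; the file names are invented.

```python
from collections import Counter

# Hypothetical data: each inner list holds the files touched by one change.
change_files = [
    ["msg_types.h", "router.c"],
    ["msg_types.h"],
    ["router.c", "ui.c"],
    ["msg_types.h", "ui.c"],
    ["router.c", "msg_types.h"],
]


def high_runners(changes, top=2):
    """Return the `top` most frequently changed files with their change counts."""
    counts = Counter(name for files in changes for name in files)
    return counts.most_common(top)


runners = high_runners(change_files)
for name, count in runners:
    print(f"{name}: changed {count} times")
```

Filtering the input to interface or header files before counting would give the companion metric for frequently changed interface files.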
Most frequently Changed Interface/Header Files - When interface files are changing frequently, there's a problem in their initial design. The high runners should be the target of a design review. It may be simply that there's a symbolic range that keeps growing (e.g. for each new message type, or each new command or GUI button). That's not a serious problem, but you still might want to consider a dynamic allocation of these range elements.
Files and Directories
Average fan-out of a "directory" - Large directory fan-outs are hard to look at. They require real estate and/or scrolling. It's harder to classify their contents. You may want to set some reasonable guidelines here, especially considering that a suggested maximum of only 20 is enough to give you 8,000 files at the third level of fanning out. It's a lot easier to navigate 8,000 files with a fanout of 20 than it is when you have a couple of directories with hundreds of files. I'm sure that with some CM tools, this could make a performance difference as well.
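The arithmetic behind the 8,000 figure, and a way to measure actual fan-out from a list of file paths, can be sketched in Python as follows. The paths are made up, and entries at the very top of the tree are not counted because the sketch has no root directory to attribute them to.

```python
from collections import defaultdict
from pathlib import PurePosixPath

MAX_FANOUT = 20  # suggested maximum from the text


def capacity(fanout, levels):
    """Number of entries reachable at the given depth with a uniform fan-out."""
    return fanout ** levels


def directory_fanout(paths):
    """Map each directory to the number of distinct entries directly inside it."""
    children = defaultdict(set)
    for path in paths:
        parts = PurePosixPath(path).parts
        for depth in range(1, len(parts)):
            parent = "/".join(parts[:depth])
            children[parent].add(parts[depth])
    return {directory: len(names) for directory, names in children.items()}


# Hypothetical file tree, given as relative paths.
paths = ["src/ui/menu.c", "src/ui/dialog.c", "src/db/store.c", "docs/readme.txt"]
fanout = directory_fanout(paths)
over_limit = [d for d, n in fanout.items() if n > MAX_FANOUT]

print(capacity(MAX_FANOUT, 3))
print(fanout)
print("over guideline:", over_limit)
```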
Number of Files per Subsystem, per Product - How big and complex are the subsystems and the product overall? More important, perhaps, is how this is changing over time. You might want to relate this to the lines-of-code metrics to get some meaningful trends.
Lines of Code (by file, by subsystem, by file type, etc.) - Why is this such an interesting metric? Certainly it's useful to compute coding/design error rates. But usually it's just useful to compare the size of projects. Unfortunately, I've seen the same, or even better, functionality often produced with just a tenth the amount of code - so perhaps it's a good way to measure the effectiveness of your designers.
Revisions of a File by Stream - This is not much different than the "most frequently changed files" metric above. However, breaking it down by stream allows you to look at maintenance and support as opposed to the initial introduction of the file and the related functionality.
Branches per File - This is interesting in a stream-based development environment (where only one branch is created per stream) to get a good idea of what percentage of the files are modified for each release. In a more arbitrary branching environment, it can reflect any number of characteristics, depending on the branching strategy.
Delta compression level per file type - This metric, which gives the compression ratio for each file type, is useful for estimating the disk space requirements of your repository.
Other Functional Areas
There are plenty of other areas I haven't covered. Each one is important. As you go through the list below, think about what other metrics are important to you and how they can help you improve your processes. You may want to consider, as part of your process improvement process, putting together a list of metrics for each process area - those you currently use, and those which you need.
Test Suite Management
- Test case coverage (by Problem, by Feature, by Requirement)
- Test case failure rates
- Project checkpoint completion rates (S-Curve and predictions)
- Effort per Activity/Feature
- Actual vs Planned Effort per Activity/Feature (by Dept)
- New Documents Per Week
- Changed Documents Per Week
- Average Time to Review
- Average Approval Time
- Customer Requests Raised/Implemented per Release
- Number of Communications per Customer
Metrics on Metrics - How Well are They Working
When I want to know how well metrics are working, I look at a number of factors. These include some other metrics which I would tend to measure more on a quarterly basis.
Changes per Process Area per Quarter - More specifically, how many process changes are occurring as a result of other metrics?
Missing data fields per record [field values per 100 records] per Quarter - When data fields are not being set, metrics are going to be suspect. Processes need to change to ensure they are set. This will track whether or not proper attention is being paid to data entry forms. As this number goes down, my data is more complete.
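Counting missing field values per 100 records is straightforward. The following minimal Python sketch assumes records are available as dictionaries and treats both `None` and the empty string as "not set"; the field names and sample records are hypothetical.

```python
# Hypothetical problem records; None or "" marks a field that was never filled in.
records = [
    {"title": "Crash on save", "priority": "high", "phase_found": None},
    {"title": "Typo in menu", "priority": "", "phase_found": "beta"},
    {"title": "Slow report", "priority": "low", "phase_found": "verification"},
    {"title": "Bad merge", "priority": "high", "phase_found": ""},
]

# Hypothetical list of fields the process requires to be set.
REQUIRED_FIELDS = ("title", "priority", "phase_found")


def missing_per_100(records, fields):
    """Number of missing required field values per 100 records."""
    missing = sum(
        1 for rec in records for name in fields if rec.get(name) in (None, "")
    )
    return 100.0 * missing / len(records)


rate = missing_per_100(records, REQUIRED_FIELDS)
print(f"{rate:.1f} missing field values per 100 records")
```

Computing this once per quarter over the records created in that quarter gives the trend described above.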
Actual vs Planned Effort per Activity/Feature (by Dept) per Quarter - This metric will show whether or not my effort estimations are improving over time. The first few times someone makes an estimate, it's not unusual to see a difference of a factor of 2 or 3. But if the estimator is not learning from his/her mistakes, it's time to review the process improvement agenda.
CM Tool Performance Queries - Is the CM tool working?
While we're at it, how about some metrics to assess how well your CM tool is working? I might challenge the CM vendors out there to post their own results for these metrics. These are common use cases for CM users and managers. Assume a client platform of about 2.5 GHz, with whatever configuration server you recommend to your clients.
Basic CM Tool Metrics
- Time to query full file history (all change meta data) [sec. per 100 revisions]
- Time to query full file delta history (all code changes) [sec. per 100 revisions]
- Time to retrieve files [sec. per 1000 files - (avg. file size 50K)]
- Time to perform build comparison (code changes) [sec. per 100 revisions]
- Time to search problem reports (by title only, by full description only) [sec. per 1000 problem reports]
- Time to bulk load files [sec. per 1000 files]
- Time to change context view [sec]
- Time to generate sorted Report for Problems - single line, full descriptions [sec. per 1000 problems]
- Time to start CM Tool - i.e. when client has control (command line client, GUI client) [sec]
- Time to create a baseline (based on a set of marked files/changes) [sec per 1000 members]
- Time to freeze a baseline [sec]
Advanced CM Tool Metrics
- Time to perform build traceability (all meta data - problems fixed, features, change descriptions, etc.) [sec. per build]
- Time to produce traceable source file (each line traced to file revision) [sec. per 1000 lines]
- Time to identify which revisions of a file contain a Function name [sec. per 100 revisions, avg. 1000 lines per file]
- Time to automatically identify and check-in changes to a file tree [sec. per 1000 files]
- Time to generate Requirements Tree document [sec. per 100 requirements]
- Time to generate an Activity WBS (Work Breakdown Structure) document [sec. per 100 requirements]
- Time to distribute a basic data change to all sites (MultiSite normal operation) [sec.]
- Time to distribute file change to all sites (MultiSite normal operation) [sec. per 100KB file]
- Time to generate MakeFile [sec per 100 dependants]
- Time to identify Dependencies on a header file [sec per 1000 files in scope]
Metrics vs Limits for a CM Tool
Another related area for CM tools, as for any system, is the set of quantifiable limitations on the tool. These are really a different form of metric - one that will change much more slowly over time. How about the following easy ones - I would hope they're all sufficiently high:
- Maximum number of directories/files supported
- Maximum file size
- Maximum revisions per file
- Maximum files per change
- Maximum number of problems
- Maximum number of changes
- Maximum number of branches
How easy is it for you to make a new metric visible?
You'll need a tool that makes it easy to track metrics. If it doesn't, you'll find you're not using metrics as much as you should be. Ideally, your CM tool suite has suitable metrics functionality. There are a number of things to consider here.
First, you want it to be easy to obtain your metrics. Maybe you have an overnight job generating them. Or perhaps you can just go into the tool and query them easily as you need. Perhaps they come out as numbers, perhaps as graphs as well. Great. Even better if you can export your data to a graphical slicer and dicer (e.g. an Excel spreadsheet).
Next you need to be able to dig down into the metrics - why is this number what it is? Click on the extra long or extra short bar on your bar graph and identify the specific items. Or zoom in on a cell of your tabular summary. This is more than a problem-solving exercise - it's a learning exercise. You want to be able to understand the metrics, not just see them. You want to investigate anomalies too. Unfortunately, metrics generated overnight (as opposed to interactive ones) will require a bit more work to dig into.
Finally, you need to be able to specify what you want to look at. Perhaps your queries are grouped by related areas. For example, metrics which are shown across development streams, or metrics dealing with the administration of your CM tool (or your environment). Here's what I would recommend.
- Specify the domain of your metric - what set of objects are you looking at.
- Specify the attribute you want to measure.
- Specify the granularity of your measurement.
- Group your metrics - tag them if possible so that metrics may appear in more than one grouping.
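The four recommendations above map naturally onto a small data structure. The following Python sketch is one hypothetical way to record a metric's domain, attribute, granularity, and tags; the class name, field names, and sample metrics are illustrative and not drawn from any specific CM tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetricSpec:
    name: str
    domain: str       # set of objects examined, e.g. "problems"
    attribute: str    # what is measured, e.g. "count raised"
    granularity: str  # how results are bucketed, e.g. "per week"
    tags: frozenset = field(default_factory=frozenset)  # groupings it belongs to


def group_by_tag(specs):
    """Map each tag to the names of the metrics carrying it, in definition order."""
    groups = {}
    for spec in specs:
        for tag in spec.tags:
            groups.setdefault(tag, []).append(spec.name)
    return groups


# Hypothetical metric definitions.
specs = [
    MetricSpec("arrival-rate", "problems", "count raised", "per week",
               frozenset({"streams", "quality"})),
    MetricSpec("repo-growth", "repository", "size in MB", "per month",
               frozenset({"administration"})),
    MetricSpec("fix-rate", "problems", "count fixed", "per week",
               frozenset({"streams"})),
]
groups = group_by_tag(specs)
for tag in sorted(groups):
    print(tag, "->", groups[tag])
```

Because tags are a set rather than a single category, a metric such as the arrival rate can appear under both the stream grouping and the quality grouping.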
The good thing about a CM repository is that it has all of the history in it (I hope!). You shouldn't have to worry about saving your metrics because you can reproduce them at any time. OK, some metrics may take longer to produce than others - for example if you have tens of thousands of files and you're computing function point metrics, these are not going to pop up in a point-and-click response time - at least not until we go through another 10 years of technology advances. So you may want to save some of your metrics.
Ideally, you want to be able to set an arbitrary context view and compute many of your metrics based on that. You want to be able to point to a specific stream and compute metrics for it. To a specific build and compute metrics for it. To a part of the organization and compute metrics for it. This is data mining, but in a revision-rich repository containing, hopefully, everything you've ever wanted to know about metrics but were afraid to ask.
Too much data? The next level of optimization is a tool's ability to look through your metrics and tell you the concerns. This doesn't have to be a 22nd century artificial intelligence capability. It can be a simple set of checks, hard-coded for each metric, looking for large deviations from the norm and giving metric-specific interpretations for these deviations. For example, department xyz has an actual to projected effort ratio which is 3 times that for the rest of the project. If your metrics are easy to specify, you won't mind spending a bit of effort to automatically interpret and highlight the results. If your metrics are hard to specify and implement, you'll likely not get past the specification step. If most of your metrics are easy, and a few really difficult, focus on the easy ones first. Again, the value of a good tool cannot be overestimated. I generally have to specify a single line to add a new metric to my list - granted, not something like computing function points, but more like the above cited metrics. If I had to write a program (or even worse, steal somebody's time to do it), I'd be thinking twice before looking at extending my metrics.
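A simple deviation check of the kind described above might look like the following Python sketch. The department names and ratios are invented, and using the median of the other departments as "the norm", with a threshold factor of 2, is an assumption made for illustration.

```python
from statistics import median

# Hypothetical actual/planned effort ratios per department.
effort_ratio = {"abc": 1.1, "def": 0.9, "ghi": 1.2, "xyz": 3.3}


def flag_deviations(values, factor=2.0):
    """Flag entries at least `factor` times the median of all the other entries.

    Returns a dict mapping each flagged name to its multiple of that norm.
    """
    flagged = {}
    for name, value in values.items():
        others = [v for key, v in values.items() if key != name]
        norm = median(others)
        if value >= factor * norm:
            flagged[name] = round(value / norm, 2)
    return flagged


flagged = flag_deviations(effort_ratio)
for dept, multiple in flagged.items():
    # Metric-specific interpretation of the deviation.
    print(f"department {dept}: actual/planned effort is {multiple}x the project norm")
```

Comparing each department against the others, rather than against an overall figure that includes it, keeps one large outlier from hiding itself by pulling the norm upward.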
Anyone know what my average article length is? Or the number of points I make in each? How does that compare to the other articles in this journal?
Who cares? Well, as a writer I may need to pay some attention to the article complexity. On the other hand, my audience is dealing with one of the more complex management applications.
It's easy to look at all sorts of numbers. It's interesting. But there is a need to focus. Go through the metrics above. Go through the guidelines. Select the ones that are most important and try them on for size. Select 3 or 4 metrics to go public with - post them on your bulletin board, in your war room, on your internal web site - and see what happens. It's a great way to get the process improvement focus geared up. While you're at it, add some of these points to your CM tool requirements.
I didn't really want to give you a detailed Metrics road map in this article. Just wanted to get you thinking so that when you're stuffed with turkey and ham, your mind has something to work on. Better yet, have a Merry Christmas or Happy Chanukah!
Joe Farah is the President and CEO of Neuma Technology. Prior to co-founding Neuma in 1990, Joe was Director of Software Architecture and Technology at Mitel, and in the 1970s a Development Manager at Nortel (Bell-Northern Research) where he developed the Program Library System (PLS) still heavily in use by Nortel's largest projects. A software developer since the late 1960s, Joe holds a B.A.Sc. degree in Engineering Science from the University of Toronto.
You can contact Joe at email@example.com