Too Many Bugs
There is a time or two in every software engineer’s life when he experiences a deluge of open bug reports. I’ve seen times where there are ten new defect reports per day for two weeks straight against one module. At that point checking your “My Bugs” list in Bugzilla is a daunting task. You don’t want to see how far down the page scrolls. Worse, you don’t want to triage all those bugs; each one will take at least five minutes.
Right off the bat you have to read and comprehend the defect report. Not all reports are clear and concise. Sometimes the problem still isn’t obvious just from reading the report and you’ve got to wait on clarification. Worse, sometimes the bug report is perfectly clear and you can’t determine whether the issue in question is a feature or a defect. You can expect to spend a lot of time going back and forth with dev. management, QE, customer support and product management discussing whether or not the functionality is correct, especially if you work at a large company. So let’s say you know that some functionality is not correct or at least not desired. Now you’ve got to think about the steps to reproduce it and what the root cause could be. Still, you’re just thinking about it; you haven’t yet dug into the unit tests searching for a missed failure condition. Despite not knowing the root cause for certain there has to be a business decision about this bug: is it worth our time to fix it? Is there a business case that says that this functionality must be corrected before it is released into the wild? Sometimes yes, sometimes no. More than five minutes have gone by in either case and you’re still not done. Someone in management is going to want to know the delivery timetable. Now repeat this process for 74 more bugs. If you’re triaging by yourself and you work at near-100% efficiency you’re looking forward to at least one full work day of triaging and the bug reports are still incoming.
Realistically you’re not working by yourself and you don’t set aside an entire day to triage your bug list. In real life the situation is a lot worse because a long time ago somebody in management invented the bug triage meeting. Bug triage meetings can last for hours or even days where usually too many people get into a room, close the door and stay caged in there until they triage some percentage of the product or service’s entire bug list. Instead of spending one person-day on bug triage, these meetings make it possible to burn entire person-weeks in the blink of an eye by turning lots of little technical problems into one large social problem. The technical problems don’t matter here because, easy or hard, there is a fix for (almost) every bug. The social problem is a lot bigger: getting people to agree on the priority and severity of each bug in the list.
Remember the process to triage each bug? Now everyone in the room has to go through that process for each and every bug that comes up in the meeting. Unless your company is the Borg Collective (no, not Microsoft. I actually mean the Borg from ST:TNG) there are going to be questions that require in-depth answers and those answers will foster discussion to get everyone up to speed on the history of the bug, why someone thinks it’s high priority, why someone else thinks it’s low priority, until everyone agrees and the meeting moves on to the next bug. The bug triage meeting also introduces the notion that you’re not going to be dealing with only your own bugs. Other developers are going to be in the meeting and their bugs have to be triaged too. This means that a significant portion of the meeting time is going to be spent NOT talking about your own bugs which is a recipe for boredom even if you have the best intentions.
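To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 75 bugs and five minutes per bug come from the example above; the eight attendees, the fifteen minutes of discussion per bug and the eight-hour work day are assumptions I picked for illustration, not measurements.

```python
def triage_cost_hours(bugs, minutes_per_bug, people=1):
    """Person-hours spent when `people` each sit through every bug."""
    return bugs * minutes_per_bug * people / 60


HOURS_PER_DAY = 8  # assumed length of a work day

# Solo triage: 75 bugs at five minutes each (figures from the text).
solo = triage_cost_hours(bugs=75, minutes_per_bug=5)

# Triage meeting: the same 75 bugs, but with eight attendees (assumed)
# and fifteen minutes of discussion per bug (also assumed).
meeting = triage_cost_hours(bugs=75, minutes_per_bug=15, people=8)

print(f"solo:    {solo:.2f} person-hours ({solo / HOURS_PER_DAY:.2f} person-days)")
print(f"meeting: {meeting:.2f} person-hours ({meeting / HOURS_PER_DAY:.2f} person-days)")
```

With those assumptions the solo pass costs 6.25 person-hours while the meeting costs 150, which is almost nineteen person-days. Change the assumed numbers to match your own team and the gap will move, but it rarely closes.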
It would seem that I don’t have a very high opinion of bug triage meetings but that’s actually not the case. Bug triage meetings are vitally important to a project at the right time, with the right people and the right material to work with. Unless you’re into fooling yourself no product or service ever ships with zero bugs. Sure you can get all kinds of management involved to do requirements-lawyering and invalidate bugs in bulk but then the zero bug metric loses its meaning. I don’t know about you but I’d rather have a list of known risks going into release instead of sweeping them under the rug because company policy dictates that everything must ship with zero bugs and is completely perfect, absolutely flawless. One of them seems just a bit more realistic than the other too.
Anyway, because there are always bugs we need the triage meeting to decide which ones to fix right now. These are usually the bugs which will cause the most damage in the wild either to the customer or the company’s reputation. The rest should be put in the parking lot for consideration in the next release cycle. If you’ve done your engineering cycle right then a vast majority of the bugs will be put in the parking lot. If development is complete and you’re working on driving your bug list down for the release then 75% of the bugs in the list should not be classified as “must be fixed”. Maybe development isn’t complete after all. So if you know that most of the meeting discussion is about bugs that don’t need to be fixed right away then why spend so much time talking about them? It’s because people lack initiative. Do you want to be a rock star team player adored by your peers and management? There are a few things you can do to make bug triage more valuable to everyone involved and give your career a big boost by addressing the social issues before they become social problems.
The reason we have bug triage sessions in the first place is no one has bothered to talk about these bugs before dealing with the entire list became a high priority. Everyone pays the high price of brutally long bug triage sessions because the low price of communicating on a consistent basis hasn’t been a priority right along. In order to make triage sessions more valuable you have to make it your top priority to get to the meeting with the fewest, highest quality bugs possible. This doesn’t mean close all the small bugs before the meeting and hope no one notices or to assign them all to someone else. It means effectively dealing with bugs while they’re incoming. In order to do this right you’ll need to build a relationship with someone in QE. Yes I know QE and developers are supposed to be like cats and dogs or water and oil. In this case the relationship needs to be more like peanut butter and jelly. Or peanut butter and chocolate. Or peanut butter and whatever else you like since it goes well with just about everything. You need to be like peanut butter.
Part of the beauty in doing this is that your chosen QE co-conspirator doesn’t even need to know they are a co-conspirator, though eventually they will realize their job has become much, much easier to do effectively. One of your goals is to stop bugs from coming in by stopping them at the source, which means you need to get QE to reduce the number of bugs filed on your software. There are a few ways to go about this and not all of them are good. Some can even get you in trouble with HR. If you want to reduce the number of incoming bugs you have to first find out what software QE is testing. If you’re late to the game then you should start just after the code is released to QE, though starting any time is better than not starting at all. In order to find out what’s going on in QE you have to, gasp, talk to someone in QE! It’s true, they know how to communicate using spoken language, not just Bugzilla. You can ask your QE person, “hey, how’s it going?” and they will tell you everything is going fine, testing is going fine, day to day tasks are going fine and then move on to filing more bug reports on your software. Or you could ask, “hey <QE person>, have you had a chance to look at <my software module in question>? If you have a chance I’d like to go over the work I’ve done with you in order to answer any questions you have on it.” The second way always gets better results than the first and it’s surprising how very few software engineers are capable of doing this. Asking the question in this way sets up a discussion by showing the QE person that you are open to questions and feedback. It gives them assurance that their input is valuable because it is! If they’re testing software and they don’t understand what it’s doing they’re not going to be responsive to the first question of “how’s it going” unless they have an ax to grind. 
You’re not going to get anything useful out of that question because it gives no indication that questions or constructive criticism are welcome even if they are. Do you really want to reduce your bug count? Spend a half hour going over your work with them. Answer every question they have and go over the implementation. Draw diagrams, play charades, do everything you can to help them understand what you’ve unleashed. All of those bugs that are really just setup and configuration questions will be vaporized if you do this. Ditto for the “what does this software actually do” bugs that would have been filed. You have just taken the first step to reducing the number of low quality bugs you have to deal with.
I remember an iteration where I had a fairly high profile deliverable. I felt like I nailed the business requirement. All the features were complete and robust. The implementation was impeccable. This deliverable was a feature that had been lacking for a long time and I had a feeling that I’d be heralded as a hero for delivering just what everyone was begging to have for such a long time. I didn’t think anyone would be able to open even a single bug on my project. How could they? They would be too busy drooling over the amazing quality of what I just delivered.
Imagine my surprise when QE started opening up bug after bug after bug on what most people in development considered perfection. I watched in panic all day long as I saw my bug list get longer and longer. And almost every one of the bugs was shallow. The QE person didn’t dig very far at all for a reason to file a bug. If they didn’t understand what was supposed to happen they filed a bug. If they thought the software was supposed to do something and it didn’t then there was a bug. If they thought the software should have additional features they considered nice to have there was a bug. Needless to say that iteration’s acceptance process was very stressful for me. That’s when I realized the value in stopping bugs at the source.
Almost all of the bugs were in the class that would have gone away if I just communicated what was going to happen before the release. Fast forward to the next iteration. Once again I was working on high profile features that should be a slam dunk as far as quality goes. The week before it was released to QE I asked around to find out who would be testing my features. I scheduled a meeting with them to talk about my deliverable and give them a demo. In my meeting invite I sent a brief description of what I wanted to talk about and encouraged them to show up with questions or relevant topics to discuss. During the meeting we went over just about every facet of every feature in great detail. We went over how to configure the operating system environment, the topology and the application. I wanted to do everything in my power to make sure they had all the information they needed to meaningfully test my software. Meaningless bugs are a waste of everyone’s time. Once the software was officially released to QE I didn’t see a single bug filed on the first day of testing. One was filed on the second day of testing. None on the third. I had only a handful of bugs for that iteration’s release because I made sure to discuss my concerns with QE before they started testing it. I also had more time to concentrate on the meaningful bugs.
By eliminating the setup/config/curiosity class of bugs you should notice an immediate drop-off in the number of incoming bugs. It’s easy to file bugs on things that aren’t quite clear at first. By helping others get on board with what you’ve produced you’ve started setting expectations on where to test to get the most value out of the process. This isn’t to say that you as a software developer tell QE what to test and how to test it. That would defeat the purpose of having QE. They know how to test software and they’re good at it. Rather, during your ongoing dialogue with QE (it is ongoing, right?) there should be a discussion where you provide them with a list of the implementation topics that you’re most concerned about. You should address both business concerns and technical concerns.
The business concern is directly traceable to the business requirement. This is really just asking the question, “does this software fulfill this requirement?” The answer should be fairly obvious but sometimes the requirement a software deliverable fulfills is not immediately apparent. Sometimes the software addresses multiple business requirements and there should be a discussion as to how they depend on each other and just as importantly how they do not depend on each other. Functionality of one requirement should not interfere with the functionality of a different requirement. It’s important that QE have some idea about the desired dependent and independent behavior in order to write tests to prove the software is correct.
The technical concerns you discuss arise from how you feel about the way you implemented the business requirement. Pointing out areas where the software is very flexible is always a good idea. This gives QE the hint that the software accepts a wide range of input and they should spend a little more time in validating what constitutes “correct input” and that the software only operates on correct input. If there is a point of extensibility in the software QE should take that as a directive to test that part as a high-risk to software security.
You shouldn’t limit yourself to talking about just flexibility or extensibility; these are just examples of the topics you should bring to QE. Remember, you’re taking the initiative to put a stop to brutal bug triage meetings and if the quality of your software improves as a result then just accept it as a positive side effect. Discussing these concerns with QE is a sure way to get some high-quality bugs out of them by allowing them to focus on areas of high risk and core features. If you ever want to stop dealing with junk bugs then you have to do this. If there is ever a question on whether or not to talk to QE then always err on the side of over-communicating. Too much is never enough.
There’s the old saying “when you’re in a hole stop digging”. So far we’ve talked about how to stop digging but we haven’t talked about how to get out of the hole yet. A bug list is a bit like Pandora’s box in that once the bugs are open they’re like all the sorrows let out of the box. You just can’t put the sorrows back in the box and you just can’t get rid of all the bugs. What happens to you when you’ve got a bug list a mile long? You try to avoid looking at it, for one. But there are always people in management that subscribe to the manage-by-numbers theory who take bug metrics a little too seriously. Eventually you’re going to make it to the top of the naughty list when they’re looking at who has the most open bugs. This can be interpreted in a few ways and none of them are good. They could think you’re a terrible developer if you write so many bugs. They could think you’re lazy if you’ve got such a huge backlog. They could think you can’t prioritize well if they see something related to their pet topic is on your bug list and you’re working on something different. They could think any number of different things and whatever they come up with definitely will not benefit you one bit. You need to massage your bug list to stay off the slacker radar.
Start by picking off some bugs that are related somehow. They can be related on feature, topology specificity, platform specificity. Pick a few that you don’t understand well. Pick a few that you think are configuration error, user error, documentation error. Make up a list with these bugs and a summary of each one or print out each bug report. I hope you can guess what’s coming next. That’s right – go get your QE person or people and ask when they will have time to sit down with you to address your concerns. What you’re doing right here is proving to QE that they are valuable and their work is not in vain. You’re validating their presence in the organization by asking them to talk about their work. They need someone to listen to what they’re saying and it’s your job as a developer to do it. You’re making them feel good by making them feel important. This is a great way to start off a meeting or meeting request.
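If your bug tracker can export your list, picking a related batch takes a few lines of Python. This is only a sketch: the Bug fields and the sample entries are made up for illustration and are not Bugzilla’s actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Bug:
    # Hypothetical fields; a real tracker export will look different.
    bug_id: int
    feature: str
    platform: str
    summary: str


def group_by(bugs, key):
    """Bucket bugs by one attribute, e.g. 'feature' or 'platform'."""
    groups = defaultdict(list)
    for bug in bugs:
        groups[getattr(bug, key)].append(bug)
    return dict(groups)


def largest_batch(bugs, key):
    """Return (value, bugs) for the biggest group of related bugs."""
    groups = group_by(bugs, key)
    value = max(groups, key=lambda v: len(groups[v]))
    return value, groups[value]


# Made-up sample data.
my_bugs = [
    Bug(101, "installer", "linux", "Setup fails with no error message"),
    Bug(102, "reports", "linux", "Totals column is blank"),
    Bug(103, "installer", "solaris", "Config file not found"),
    Bug(104, "installer", "linux", "Unclear which Perl version is needed"),
]

feature, batch = largest_batch(my_bugs, "feature")
print(f"Agenda for the {feature} session:")
for bug in batch:
    print(f"  {bug.bug_id}: {bug.summary}")
```

Grouping on a single attribute keeps the session focused on one topic. Swap "feature" for "platform" when the platform-specific bugs are the ones you understand least.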
Let them know what bugs you want to discuss before the meeting starts to give them time to prepare too. This meeting is going to go very smoothly if everyone comes prepared. Your goal here is to find out enough information to resolve all of the bugs on your list you prepared. Every one of them must have some resolution before you leave this meeting. This is not a bug triage meeting because you know going into it that all the bugs must be resolved. This is a technical meeting that results in action items from all parties.
Start the meeting by saying “I understand you have some concerns about… <fill in the software module blank>. I was wondering if you could give me some insight into what you’re looking for from me”. Begin the discussion on each bug by asking them how they encountered it even if it’s already logged in the bug report. Ask questions about their process. A lot of extra information that wasn’t captured in Bugzilla will come out of this conversation.
Sometimes enough information will come out that you will all see that the bug is clearly due to incorrect expectations or incomplete documentation. These are easy bugs to resolve. Sometimes you’ll get that a bug is clearly a bug and through your conversation you will find out exactly what you need to do to fix it. These are also pretty easy to deal with. Another kind of bug that can end up here is one that turns out to be an enhancement request. I’ll talk in-depth about how to deal with enhancement requests another time, but if they come up in the meeting it’s best to just move on to the next bug. If you don’t you’ll end up at a mini-triage meeting trying to decide the priority, whether or not it’s an enhancement request and so on.
At the end of the meeting there should be a list of developer action items for you to take care of and a list of QE action items for them to take care of. Have these meetings as often as you can in order to drive down your list of bugs. Remember you’re trying to show up at the bug triage meeting with the shortest bug list possible and this will drive down your list very quickly.
I can derive another relevant example from the first iteration I wrote about in part two. Remember that QE had opened a huge number of bugs in a very short time period on my deliverable. I was overwhelmed by the incoming load and I couldn’t just mark all the bugs “won’t fix”. If QE wasn’t riled up already that would be the thing to do it. Most of the bugs came from two people in QE so I had to get them into a room with me, if not to talk to them then just to keep them away from their keyboards so they couldn’t open any more bugs for me.
I went into the meeting with a printout of every bug I wanted to discuss. Each one was related to the feature that I had just delivered. I started the meeting by saying, “John, I understand you have some concerns about this feature. It seems like there are some things that are preventing you from testing further into the feature and I would like to know what I can do to help you get further along”. Notice how I didn’t say anything about “Please stop opening bugs or I’m going to quit my job” or “I hate you QE guys so much that I’ve written a thousand more bugs that you’ll never discover” or even something nicer like “There are an awful lot of bugs against this feature and I’m not sure some of them are valid”. Everything I said was phrased in a way that conveyed my desire to help QE. And I really did want to help them. Helping them helps me. There is no rule that says developers can’t help QE and if there is a rule that says that, it’s wrong. Dev./QE is a two-way street.
After that we went through each bug. We discovered that some of them were duplicates of each other. That gave QE the action item to close each duplicate bug. Some of the bugs were discovered to be due to documentation that wasn’t explicit or detailed enough. Those bugs came into my action item list to fix the documentation. I asked the QE guys what documentation details would have helped them navigate through the steps safely so that I could use it in my resolution.
Some of the bugs were because the user tried to set up the software on an unsupported platform. You can’t use SuSE 7 and Perl 4 when the supported platform is Red Hat ES 4 and Perl 5. Those bugs become action items for QE; retest those bugs on supported platforms.
The most interesting bugs were real bugs caused by unexpected QE needs, but needs vital to their testing. Getting a firm grasp of what is causing these bugs is important and hopefully you can gain enough information that you can propose a fix in a relatively short period of time. Those bugs became action items for me to resolve.
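The sorting we did in that meeting boils down to a small lookup table. Here is a sketch in Python; the category names and the owner labels are my own shorthand for the cases described above, not fields from any bug tracker.

```python
from collections import defaultdict

# Outcome of discussing a bug -> (who owns the follow-up, what they do).
# The categories mirror the cases described in the text; the names are
# hypothetical labels chosen for this sketch.
ACTIONS = {
    "duplicate": ("QE", "close the duplicate"),
    "documentation": ("Dev", "fix the documentation"),
    "unsupported_platform": ("QE", "retest on a supported platform"),
    "real_bug": ("Dev", "fix the defect"),
}


def action_items(outcomes):
    """Turn {bug_id: category} into per-owner lists of action items."""
    items = defaultdict(list)
    for bug_id, category in sorted(outcomes.items()):
        owner, task = ACTIONS[category]  # unknown category raises KeyError
        items[owner].append(f"bug {bug_id}: {task}")
    return dict(items)


# Made-up meeting results.
meeting_outcomes = {
    201: "duplicate",
    202: "documentation",
    203: "unsupported_platform",
    204: "real_bug",
}

for owner, tasks in action_items(meeting_outcomes).items():
    print(owner)
    for task in tasks:
        print(f"  {task}")
```

A category that isn’t in the table raises an error rather than being silently skipped. That is on purpose: every bug on the list needs an owner and a next step before anyone leaves the room.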
In creating these action items it’s also important to provide a timetable as to when QE can expect to see the resolution. Talking about the bug is one thing, but the goal is to drive down the bug list and the only way to do that is to actually fix the bugs. For my action items I told the QE guys that they could expect to see a fix in the next patch release they get from development and I made sure to deliver. Setting joint expectations and then driving toward them is a recurring pattern here and your consistency in doing so is important to your success.
To some people it might seem that a lot of what I’ve written about here is common sense. The condensed version of this text might be “go talk to QE”. But surprisingly there are exceedingly few people that actually do it. To most developers QE, or any other group for that matter, is a mysterious team with bad intentions that lives on the other side of the fence we throw our software over. Working with QE over Bugzilla and email is like a 2400 bps modem. You just don’t get what you need over it. Software developers are typically even more prone to this reclusive behavior due to the kind of people attracted to the field. I struggled with the same social problems for years, usually with less than satisfactory results. Eventually I found a way to work with others that works well for everyone. I had to struggle against my strongly introspective personality and force myself to communicate verbally, in person, with people around me and affected by my work. Why? Because it didn’t seem like anyone else was doing it and there had to be a better way to deliver high quality software that people wanted to use.
By paying attention to these things the quality of my technical work improved and my visibility to people both on and outside of my teams skyrocketed. I’ve made an effort to improve how I work with others and in doing so I’ve made a name for myself. Possessing superior technical skill is of little use if no one recognizes it. Think about that for a minute. You can be the greatest programmer of your generation and you’re not going anywhere if you can’t convince others you’re worthwhile.
Oh yeah, what effect does this have on the bug triage meeting? If you’ve read this far then you know you’re showing up loaded with ammo. Every bug you talk about is going to be important because you’ve eliminated all the small ones along the way. Imagine being in a room with your peers and a cross functional team where every word you say is inherently more valuable because you’ve done your homework. You’ve put in the effort to make this meeting more valuable for everyone else that attends. You’re not wasting their time by even looking at bug 34987 that only had a minor impact on documentation. You’re in front of an audience that wants to discuss things that matter, business or technical.
No one wants to get up in the morning and think, “I want to sit in a bug triage session all afternoon today and again tomorrow”. In this meeting you’re able to talk about every bug as it comes up. You know the issue, you know the root cause and you know what kind of scope it takes to fix the problem. That’s the kind of information that’s needed at bug triage sessions. That information helps to determine the severity and priority. It lets people decide whether or not the bug needs to be fixed right now and that is exactly the point of bug triage meetings. People will notice that you’ve done your homework and they will appreciate it. It will get you noticed. People will realize you have in-depth technical skill and you can work well with others, which almost guarantees a position as a successful technical lead, and it will open up a lot of other doors in your career. What, you just wanted to make the bug triage meeting shorter?
Well, like software this paper isn’t perfect. I’m certain there are missing topics, incorrect statements and gaping holes in logic and arguments. There are entire books written on this subject and I don’t have any delusion that I’ve covered even one one-hundredth of 1% of the social issues that hold our industry back from consistently releasing high quality software, on time, on budget. But, like I do for my software, I keep a list of known risks for this paper and I’ve decided that it’s finally time to ship it, bugs and all. Congratulations on reading the entire thing and bigger congratulations for just skipping to the end if that’s how you got here. If you’ve found this to be worthwhile please submit it to Digg, Slashdot, StumbleUpon, del.icio.us or whatever other social network you use. Also please leave your comments and discuss what I’ve written. Feel free to add topics you think would be interesting.