Migrating Legacy Systems
A Lesson in Requirements Archeology
The most difficult part of software development is figuring out what the program should do.
This is a widely acknowledged fact among programmers but is misunderstood by clients. After all, doesn’t the client say what the program should do when they hire the programmer? The problem is that “we want you to automate the generation of these reports” or “we want you to put our products online so people can buy them” are perfectly reasonable requests, but they’re not the sort of requirements that can be formalized into effective computer software.
An Illustration
I’d like to show how software requirements get collected by introducing you to Jerry.
He’s a logistics clerk for the Acme Products Corporation. One of his many duties is to produce a report on lead times from each supplier. Which is to say, he has to look at all the orders the purchasing department makes, find the warehouse receipt showing when each one actually got delivered, and stick all that in a spreadsheet to work out how long it takes to get the stuff they buy.
He’s overworked and getting more so every day. Beth, his supervisor, notices that the lead time report is really taking a ton of time. She reasons that it is something that ought to be automated.
For a fraction of the salary of another logistics clerk, they can put the data in a computer and have it just do the report for her. What’s more, she’ll be able to get the report whenever she wants instead of waiting for Jerry to get around to tabulating it up.
So she hires a programmer, let’s call him Jordan. Jordan comes in and talks to Beth. She describes what she wants and gives him a copy of the last report Jerry made. He speaks with Jerry, and goes through the report to learn how every part is calculated.
Jerry also tells him that the data comes from Mary, the purchasing manager, and Bill, the warehouse supervisor. They both put sales and delivery information into spreadsheets and email them to Jerry. Except, there are a lot of differences between the two sheets that he has to figure out.
For instance, when they order something from Acme Asia, Mary writes that down as “AA” because it’s short and Jerry knows what she means. Bill writes “ACME‐AS” because that’s what’s printed on the shipping label. But sometimes he writes “ACME WORLD” or sometimes just “WORLD” because that’s what it was called when he started working in the warehouse. Besides, Jerry knows what he means!
So Jordan takes the report Beth gave him and all the notes he got from Jerry and writes up a proposal. He also checks in with Bill and Mary to make sure he gets the same story. He figures they need a way to link up orders with invoices that’s more robust than “Jerry knows what we mean” so he comes up with a utility that lets Mary write in the information for a company in the database, and have a little auto‐completing search tool to make it easy for Bill to select the same company when he puts in the receipt.
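Jordan’s fix boils down to one list of canonical company records, with every spelling people actually use pointing at the right record. Here is a minimal sketch of that idea in Python. The names (Company, CompanyDirectory, resolve, suggest) and the in-memory storage are my own assumptions for illustration, not a description of what Jordan built, and “Acme Europe” is an invented second supplier.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Company:
    """One canonical supplier record (hypothetical structure)."""
    name: str
    aliases: set = field(default_factory=set)


class CompanyDirectory:
    """Maps every spelling people use to a single canonical company."""

    def __init__(self) -> None:
        self._by_key = {}

    @staticmethod
    def _key(text: str) -> str:
        # Compare case-insensitively and ignore spaces, hyphens, punctuation.
        return "".join(ch for ch in text.upper() if ch.isalnum())

    def add(self, name: str, aliases: list) -> Company:
        company = Company(name, set(aliases))
        for label in [name, *aliases]:
            self._by_key[self._key(label)] = company
        return company

    def resolve(self, text: str) -> Optional[Company]:
        """Return the company for an exact (normalized) name or alias."""
        return self._by_key.get(self._key(text))

    def suggest(self, prefix: str) -> list:
        """Canonical names whose name or alias starts with the typed prefix."""
        key = self._key(prefix)
        return sorted({c.name for k, c in self._by_key.items() if k.startswith(key)})


directory = CompanyDirectory()
directory.add("Acme Asia", ["AA", "ACME-AS", "ACME WORLD", "WORLD"])
directory.add("Acme Europe", ["AE"])  # invented second supplier

print(directory.resolve("acme world").name)  # Acme Asia
print(directory.suggest("wor"))              # ['Acme Asia']
```

Mary maintains the records through add, and Bill’s search box would call something like suggest as he types. Whatever he picks resolves to the same record Mary used, so nobody has to rely on “Jerry knows what we mean.”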
He sends off the proposal to Beth. She circulates it between Jerry and Mary and Bill and they all like it too, so after all the lines are signed, Jordan sits down and writes the software. The little search utility works great and all Beth has to do to run a new up‐to‐the‐minute report is click a link on their LAN website.
She’s happy because she gets the report more quickly and has saved a ton of money. Bill and Mary are happy because the system’s just as easy to use as the old spreadsheet system, and Jerry’s happy because his job got a great deal less stressful.
Except, a few weeks later, Jordan gets a phone call from Jerry. “Hi Jordan,” he says. “This lead time system you wrote for us, I need to cancel some orders. How do I do that?”
“Cancel them?” Jordan asks.
“Yeah, there are orders in there that got canceled so they won’t ever get delivered.”
Jordan rubs his eyes. “There isn’t a way to do that currently, because it wasn’t in the requirements we all put together. I wish you’d mentioned that then.”
“Well I didn’t think of it,” Jerry responds. “It doesn’t happen that often.”
While this is a wrinkle, Jordan isn’t the least bit surprised to have received the call. This happens on every project. No matter how much you talk to clients, you will never get all the requirements out of them. They just don’t know how to think like a computer programmer, so little details like the need to cancel orders sometimes slip by them.
Dealing with changing requirements is just part of the job. Jordan will have to come up with a way to cancel orders, get Beth to approve the new work, and figure out how to properly integrate the new functionality so as to not compromise the design of the system.
To avoid another iteration, though, he decides to ask Jerry: “Now that you’ve been using it, is there anything else the system doesn’t do?”
“Oh no!” Jerry says. “Other than this canceled order thing, it’s working great.”
“Except...”
“Except what?” Jordan asks.
“Well, is there a way to say when the warehouse is closed?” Jerry asks.
“It doesn’t count weekends or holidays when calculating lead time, like you said,” Jordan responds.
“Right, I see that,” Jerry says. “The thing is, the warehouse closes on the second Thursday and Friday of the month to do inventory. Nothing that arrives then is unpacked until the next Monday, so we count those as being days off.”
“Ok, so the second Thursday and Friday of each month aren’t counted as business days for the lead time. Is that ever not the case?”
“No, that’s it,” Jerry says. “Pretty much.”
“What do you mean by ‘pretty much?’” Jordan asks.
“That’s it,” Jerry says. “It’s always like that... Except during December, we don’t do inventory because the warehouse is too busy... And sometimes November too.”
So the system will need a way to manage “warehouse closed days” that Jerry or Bill can use to tell the computer which days shouldn’t be counted as business days.
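To make that concrete, here is a rough sketch of a lead time calculation that takes an editable set of closed days. The function names are mine, and I am assuming that “the second Thursday and Friday” means the second Thursday of the month plus the Friday right after it. That is exactly the kind of detail I would have to go back and confirm with Jerry. The December rule is baked in as a default, while “sometimes November” is the reason the closed days have to be data that Jerry or Bill can edit and not a rule in the code.

```python
from datetime import date, timedelta


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th given weekday (Monday=0) of a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def inventory_days(year: int, skip_months=(12,)) -> set:
    """Second Thursday of each month plus the following Friday.

    Assumption: months listed in skip_months (December by default) have
    no inventory closure.
    """
    closed = set()
    for month in range(1, 13):
        if month in skip_months:
            continue
        thursday = nth_weekday(year, month, 3, 2)
        closed.add(thursday)
        closed.add(thursday + timedelta(days=1))
    return closed


def lead_time_days(ordered: date, received: date, closed: set) -> int:
    """Count business days after the order date, up to and including receipt."""
    days = 0
    current = ordered
    while current < received:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in closed:
            days += 1
    return days


closed_2024 = inventory_days(2024)
# Ordered Monday 11 March 2024, received Monday 18 March 2024.
print(lead_time_days(date(2024, 3, 11), date(2024, 3, 18), closed_2024))  # 3
print(lead_time_days(date(2024, 3, 11), date(2024, 3, 18), set()))        # 5
```

Holidays fit the same mechanism: they are just more dates added to the closed set. In a year when November is too busy for inventory, someone removes those two dates, and no programmer needs to be called.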
I could keep on going with this little hypothetical but I’m sure you get the idea. The key point is that despite these problems, this is how software development is supposed to happen. This is the ideal. We have a brand new piece of software. We have everybody on board with it. Jerry’s glad it’s happening and is going to help us out. He may not be able to describe the process perfectly (I wouldn’t expect him to) but he’s doing his best. And he’s always there to answer any question.
The Legacy System
Now let’s paint a slightly different picture. This is where a lot of organizations find themselves these days. There isn’t some horribly inefficient human process that they’re trying to get rid of, where someone is still trying to battle mighty datasets with their bare hands. That got eliminated years ago.
In this version, Beth figured out in the late 1990s that Jerry’s report work is counterproductive. She went out looking for programmers to fix it, but between the internet boom and Y2K, good help was difficult to find. Besides, even having PCs on every desk was still a novelty for many people. The idea that web‐based reporting tools required serious development effort was too much of a stretch.
So Beth ended up going to a web job board and finding a guy in Yugoslavia. He didn’t spend any time talking with Jerry or Bill because... he was in Yugoslavia. To even do a phone interview he would have had to get up at three in the morning, and his spoken English wasn’t so solid anyway.
So Beth did all the requirement work. She meant well and she’s pretty up on technology but she’s no programmer. Plus she had her plate full doing her real job. So there wasn’t any serious requirements gathering. Instead there was a long stream of email back and forth where Beth mis‐described the problem and the programmer misunderstood her mis‐description.
And eventually something came out of that process that sorta worked.
They never really resolved the problem with company names, so instead Mary and Bill share a printed out sheet of names. They also never got the issue with canceled sales resolved, so they just stick in dummy “receipts” to make them go away. And the warehouse closed day problem... well, everyone quit thinking about that. The numbers still come out more‐or‐less plausibly even when that’s not properly handled.
And this all sounds really horrible but it is a little less work than the old way. And after ten years all the weird little workarounds become part of the routine.
But the list of things that the software should do but doesn’t keeps getting longer. Plus, every time Mary or Bill makes a typo in the company name, they get these junk lines in the report that an intern has to edit out. So while everyone has gotten used to what they have now and they are nervous about changing anything, Beth’s thinking that it might be time to fix all the long‐standing issues.
Requirements Archeology
So now Beth hires our hypothetical programmer named ‘Jordan.’ The situation is the same as before. They have an archaic system that isn’t working anymore. What’s needed is to interview the stakeholders and figure out what they need. There’s just one problem, though.
Now there is no Jerry.
Jerry retired some years back, and his replacement has never run a report manually. He doesn’t deal with the software system either, because that’s handled by Mary and Bill and sometimes the intern Beth tasks with fixing the reports.
Mary and Bill don’t know what the report does either. I mean, they can go on all day long about all the little tricks they do to make the system work, but neither of them really knows how the reports get calculated.
What’s happened is that the institutional knowledge of how the reports really work got coded into the computer system and then forgotten by the humans involved. That’s a completely expected outcome from automation, and would be perfectly fine if the computer system were well‐written and documented.
But in this case, the code is a train‐wreck. All the variable names are in Serbian. And the programmer hasn’t been heard from since Milosevic was in power.
So where do we go from here? I find that the best first step is to look at the data. Even very old web systems tend to work against a database system like MySQL or Microsoft SQL Server, and the schema can be extracted from that.
A schema is basically a structural description of the data, like “the data consists of customers and orders; customers have addresses and phone numbers; orders have a date for when they got placed and when they got fulfilled,” et cetera.
You might not think that a schema would be particularly useful, but it’s surprising how often you can figure out what a program does (or at least, make a solid guess) simply by looking at the structure of the data. The great Fred Brooks once wrote:
Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious. 1
We can forgive Dr. Brooks for the archaic language; he wrote that in 1975. His point remains true today, however, and all the more so now that we have modern database systems that require programmers to put a schema together and give subsequent programmers the means to get access to it.
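Pulling the schema out is usually a matter of a few queries. MySQL and SQL Server both expose it through INFORMATION_SCHEMA views. The sketch below uses SQLite instead, purely so that it runs anywhere without a server, and the Serbian‐flavored table and column names are invented stand‐ins for what a legacy database like Beth’s might contain.

```python
import sqlite3

# Hypothetical stand-in for the legacy database; the table and column
# names are invented for illustration.
LEGACY_DDL = """
CREATE TABLE dobavljac (
    id INTEGER PRIMARY KEY,
    naziv TEXT NOT NULL
);
CREATE TABLE narudzbina (
    id INTEGER PRIMARY KEY,
    dobavljac_id INTEGER NOT NULL REFERENCES dobavljac(id),
    datum_narudzbine TEXT NOT NULL,
    datum_prijema TEXT
);
"""


def describe_schema(conn):
    """Return {table: [(column, declared_type, references), ...]}."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        refs = {fk[3]: f"{fk[2]}.{fk[4]}"
                for fk in conn.execute(f"PRAGMA foreign_key_list({table})")}
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        schema[table] = [(col[1], col[2], refs.get(col[1]))
                         for col in conn.execute(f"PRAGMA table_info({table})")]
    return schema


conn = sqlite3.connect(":memory:")
conn.executescript(LEGACY_DDL)
for table, columns in describe_schema(conn).items():
    print(table)
    for name, col_type, ref in columns:
        print(f"  {name}: {col_type}" + (f" -> {ref}" if ref else ""))
```

Even without reading Serbian, the shape tells a story: one table refers to another, and it carries two dates, one of which is allowed to be empty. That is already a strong hint about orders, suppliers, and deliveries that have not happened yet.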
I find that the best thing to do after examining the data and answering as many questions as possible is to produce a provisional requirements document, distribute that at the client organization, and then write a prototype system.
I can then import the data into the prototype software and perform some of the same actions that the old software provided to make sure that the output is the same between them. If there is any discrepancy, it is often obvious what change needs to be made to make the new system produce the same output as the old. If not, at the very least I have a good indicator to use to dig through the code of the old system to discover why the discrepancy exists.
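The comparison itself does not need to be fancy. A sketch of the kind of check I mean, assuming both systems can export their report as rows keyed by supplier; the function name, column names, and sample figures are all invented for illustration:

```python
def compare_reports(legacy_rows, prototype_rows, key="supplier"):
    """Return a list of human-readable discrepancies between two reports.

    Each report is a list of dicts; `key` names the column identifying a row.
    """
    legacy = {row[key]: row for row in legacy_rows}
    prototype = {row[key]: row for row in prototype_rows}
    problems = []
    for k in sorted(legacy.keys() | prototype.keys()):
        if k not in prototype:
            problems.append(f"{k}: missing from prototype")
        elif k not in legacy:
            problems.append(f"{k}: missing from legacy")
        else:
            for column in sorted(legacy[k].keys() | prototype[k].keys()):
                old, new = legacy[k].get(column), prototype[k].get(column)
                if old != new:
                    problems.append(
                        f"{k}.{column}: legacy={old!r} prototype={new!r}")
    return problems


# Invented sample rows, purely for illustration.
legacy_report = [
    {"supplier": "Acme Asia", "orders": 12, "avg_lead_days": 9.5},
    {"supplier": "ACME WORLD", "orders": 1, "avg_lead_days": 4.0},
]
prototype_report = [
    {"supplier": "Acme Asia", "orders": 13, "avg_lead_days": 9.1},
]
for line in compare_reports(legacy_report, prototype_report):
    print(line)
```

In this made‐up case the output points straight at the cause: the old system treats a misspelled company name as a separate supplier, so one order landed on its own junk line. Each discrepancy like that is a question I can take either to the users or, failing that, to the old code.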
At any rate, it is my experience that the source code of poorly written legacy software is at best a tertiary source of answers to questions I need answered to write a replacement.
I should add a note of clarification here. If the software is well‐written, then the source code is a wonderful resource.
Good software is written using techniques that break all the complexity up into little manageable chunks. So, for instance, if you have a program that stores addresses, there may be a little piece of the code that requests an address from the user. If the programmer is tasked with now making that software accept international addresses, he’ll go to that little piece of code and modify it to accept a country, and maybe call the zip code a postal code.
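As a toy illustration of what such a chunk might look like, here is a sketch in which everything about reading an address lives in one place. The names and the default country are my own assumptions, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str
    postal_code: str     # called "zip_code" before the international change
    country: str = "US"  # new field; the default is an assumption


def parse_address(form: dict) -> Address:
    """The single piece of code that turns user input into an Address."""
    return Address(
        street=form["street"].strip(),
        city=form["city"].strip(),
        # Accept the old field name so existing forms keep working.
        postal_code=(form.get("postal_code") or form.get("zip_code", "")).strip(),
        country=form.get("country", "US").strip().upper(),
    )


print(parse_address({"street": "1 Main St", "city": "Springfield",
                     "zip_code": "12345"}))
```

Going international touched exactly one class and one function. Anyone reading the code later knows where to look to find out what an address is.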
If the programmer isn’t careful or professional, though, he won’t keep everything bundled up in their chunks. In our address example, he may have a piece of code that requests the address from the user, but when he finds that the user also needs to provide a country, he’ll stick that in some completely different part of the code, perhaps in a different file, because he finds it easier at that instant than going back and doing it properly. After a few iterations of this, the software becomes a mess and the source code becomes nearly impossible to read.
It is software in that state that needs replacement. So we are left with the unhappy circumstance where the systems most in need of replacement are the ones for which the source code is a poor reference. But once we have examined the data schema and built our test system, if there are still unanswered questions, the only resource we have left may be the source code.
In such circumstances, I approach the code as obliquely as possible. It’s necessary to have a very clear idea of the question that needs to be answered, and a detailed understanding of how results differ from what is expected beforehand. Arguably, this part is the real archeology. One must address the code as methodically as an archeologist sifting for artifacts.
The Human Factor
I would like to close with a word on human issues surrounding legacy system replacement. I admit the above narrative is unrealistically sunny in this regard.
I mentioned Beth deciding that the system needs to be replaced. It is rarely the case that a person in her position would make that decision on their own, for three reasons: the initial cost of development is seen as a sunk cost that “shouldn’t be wasted”; they may fear being questioned on why the original system was ordered if it was such poor quality that it has to be replaced; and the cost differential between replacing the old system and trying to continue using it is often exaggerated.
It’s important to keep in mind that the rapid speed in the development of the web this decade has led many systems to become obsolete relatively quickly. Many of the poor‐quality systems were developed in the early days of the web simply because it wasn’t yet recognized that the web was a software platform where traditional notions of development apply.
It is very common for organizations to find themselves with technology stacks that need replacing. This is not a sign of bad decision making, but a sign that the organization is a progressive one that adopts new solutions as they become available.
The question of “extend versus replace” is interesting. Companies will spend large amounts of money maintaining old systems which, by today’s standards, don’t do very much and would be very cheap to replace using modern technologies.
I once replaced a PHP system that was highly insecure and had cost the client thousands of dollars a year in maintenance costs. The entire functionality of the old system took me a week of work to duplicate using Django, and the replacement system had neither the security problems nor the ongoing maintenance requirements of the old system. Yet it took them several years of self‐imposed suffering to finally authorize the replacement.
I think one problem is that when software development goes badly, customers get a false impression of all development. In our above example, Beth might have gone away from her arduous experience with the outsourced programmer thinking that the process is always a nightmare. The last thing she would want to do is go through it again.
I also find that people in Mary’s and Bill’s situation often are not entirely happy seeing even poor systems get replaced. While the old software may not be doing the job, at least it is familiar and they have learned to work around its deficiencies.
Most people dislike computers. They resent the time it takes to learn new technology. When geeks come promising new and wonderful features, all they see is a disruption of their routine and time in their busy schedules being consumed by computer training.
I’d like to say that I have an answer to this problem. Unfortunately the best I’ve managed to do is approach every user with humility and empathy, and make it clear as best I can that my job is to make their life—at least in the long term—just a little bit easier.