Wednesday, July 28, 2010

On the evolution of software projects

One of the things that frustrates a junior developer who browses through a large software project's codebase is the variety of approaches to solving the same problem. At first glance this seems to be a bad thing - we don't like confusion, and each of these approaches needs separate treatment whenever we have to change something. Could it be different, however? After many years of working on different projects I have a feeling that this is simply a fact of life, and the best we can do is learn to live with it.

Why does it happen? Why should there be more than one way to handle the same thing in the same project? There are several reasons for this. First, there is a lack of communication. When a project becomes large, people split into teams and talk less to each other. Even within the same team, we don't ask our neighbor about every small question we have. On the contrary, we love to solve problems and we take pride in solving them ourselves. This is the NIH syndrome - not invented here.

Second, we all have our own opinions. Even when we know there is a solution, we don't always like it. We think we know better. Sometimes it is even true. This is how competing solutions are born. If both are valid, they start to recruit followers. Often the division is drawn along team boundaries. Two teams each have their own "methodology" and argue that their approach is better than the one the neighbors use. I like to view this silent "competition" as a battle for survival between species. It promotes mutation and evolution. If one of the approaches turns out to be superior to its competitor, the team boundary stops protecting the weaker one, and "dissidents" start to appear who adopt the "neighbors'" views. In the end one of the approaches goes extinct.

Finally, there is a third and most interesting reason for the "disorder": a planned transition. The problem with large projects is that it is practically impossible to replace an existing approach in one giant leap. Even when everyone agrees that a certain technology (or simply a way of doing things) should be replaced by a better one, the switch cannot be made overnight, especially when there are persistent artifacts that have to be taken care of, like files in a certain format or a database structure. So we enter a transition period in which the two technologies co-exist.

There are several interesting points about such transitions. First, they take time. And when I say "time" I mean a lot of time - sometimes years. This in itself causes an interesting phenomenon. Sometimes, if the transition is not pushed hard enough, a second wave starts before the first one has finished - a third approach is born, and then the three of them co-exist. Since I tend to stay on projects for a long time, I sometimes find myself in the position of a historian who teaches the newcomers why there are so many ways of doing the same thing. Unfortunately, many times the transition is never finished. Either its initiator leaves the project and no one picks up the gauntlet, or the ratio of cost to gain is too high. No one wants to do the dirty job of cleaning up.

The second thing about transitions - and I shall again resort to a biological analogy - is the self-preservation of the technique being eradicated. Let me explain. Most of us program "by example". We look for another place in the code that does a similar thing and mimic the solution implemented there. Now, if there are enough places using the old approach, it keeps replicating itself like a virus. I sometimes find myself wishing there were a way to magically mark all existing places with a comment "don't copy me!"
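There is no magic, but a rough approximation of the "don't copy me!" label is possible in many languages. Below is a minimal sketch in Python, assuming the old approach lives in a function that can be wrapped. All the names here (`dont_copy_me`, `save_report`, `save_report_v2`) are hypothetical and exist only for illustration; the only real library call is the standard `warnings.warn`.

```python
import functools
import warnings


def dont_copy_me(replacement):
    """Mark a function as legacy; `replacement` names the preferred approach."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Remind whoever calls (or is about to copy) this code that it is on its way out.
            warnings.warn(
                f"{func.__name__} is legacy, don't copy it; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        # A plain attribute that a script or lint rule could use to list what is left to migrate.
        wrapper.legacy = True
        return wrapper
    return decorator


@dont_copy_me(replacement="save_report_v2")  # hypothetical name of the new approach
def save_report(data):
    # Old approach, kept only until the transition is finished.
    return ",".join(str(item) for item in data)
```

The old code keeps working, so the transition can still take its time, but anyone who looks at it for an example to mimic sees the label first. It does not stop the virus, it only makes it harder to catch by accident.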

So what should we do? How do we cope with this? As I said at the beginning, the most important thing is acceptance. Unlike single-person university projects, real projects are diverse. They are written by a bunch of people with different views, they evolve over time, and they do not transform with a wave of a wand. That's one of the reasons I like them!