Large Programming Projects and their Organization

Sierpinski cube on a camel

My programs need to grow in order to become useful. It's clear now that this doesn't happen by accident. Unless you give your programs the ability to scale up, they will accumulate hoops and complications until they become impossible to maintain. Their growth peaks before they become really useful.

The last time I saw someone write about large-scale software, he presented a directory tree listing of his project as if that mattered. The project's directory hierarchy is relatively unimportant. We could also ignore files, process boundaries, and all the other operating-system-level hogwash.

Only the internal organization of the program matters. Specifically, if you want it to be able to scale up, your program should consist of subprograms in such a way that:

After understanding the core part of the program, you can study just the subprogram you're interested in, disregarding the remaining subprograms the program consists of.
You can experiment with the subprograms without having to restart the whole program.
The subprograms may consist of subsubprograms, and so on, like a fractal.
The subprograms interact with each other through well-defined mechanisms.
Introduction of a new subprogram doesn't break others.
It is easy to create new subprograms and modify old ones.

To be able to scale up, a program commonly collects structures that are typical of interpreters and operating systems. You get toolbars and command lists that evaluate the user's input. Every subprogram fills in its own entry in the appropriate list. This pushes the design towards data-driven programming.
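As a rough illustration of subprograms filling entries into a shared list, here is a minimal Python sketch. Everything in it is hypothetical and invented for the example: the `COMMANDS` table, the `command` decorator, the `evaluate` function and the three sample commands. It assumes the core only knows about the table and never about the individual subprograms.

```python
from typing import Callable, Dict, List

# Hypothetical core: a command table that subprograms fill in.
COMMANDS: Dict[str, Callable[[List[str]], str]] = {}


def command(name: str):
    """Return a decorator that registers a handler under the given name."""
    def register(fn: Callable[[List[str]], str]):
        if name in COMMANDS:
            # A new subprogram must not silently break an existing one.
            raise ValueError(f"command {name!r} already registered")
        COMMANDS[name] = fn
        return fn
    return register


def evaluate(line: str) -> str:
    """Look up the first word of the input in the table and run its handler."""
    parts = line.split()
    if not parts:
        return ""
    handler = COMMANDS.get(parts[0])
    if handler is None:
        return f"unknown command: {parts[0]}"
    return handler(parts[1:])


# Subprogram 1: knows nothing about the other subprograms.
@command("echo")
def echo(args: List[str]) -> str:
    return " ".join(args)


# Subprogram 2: added without touching the core or subprogram 1.
@command("add")
def add(args: List[str]) -> str:
    return str(sum(int(a) for a in args))


# Subprogram 3: reads the table as data to list what is available.
@command("help")
def help_(args: List[str]) -> str:
    return " ".join(sorted(COMMANDS))
```

With this layout, `evaluate("add 1 2 3")` dispatches to the `add` handler, and a new command only needs one more decorated function. The same table could just as well feed a toolbar or a menu, since it is plain data.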

You might object that these ideas are too generic to be useful. You can check whether these rules hold for the programs you know, and study how difficult those programs are to improve or modify.

You might dismiss these ideas as arbitrary. The world is littered with large programs that either more or less have these properties or suffer from the lack of them. Any Linux system consists of small programs that work and can be taken apart, studied and tried individually. Likewise Emacs and Blender have some of that. GIMP and Krita seem to lack some of these qualities and suffer for it. If you'd like to study this further yourself, I'd propose looking into the aosabook (The Architecture of Open Source Applications).

You might argue I don't have enough experience with large programs. The programs I've been writing myself have been becoming larger and harder to maintain. It no longer matters that each of them is small. I still need to document what they do in order to remember it later. In a way my repository collection is one mega-program. Part of the reason I keep this blog is to improve my writing and documentation skills.

Filter-based programs form an exception, but they are themselves a kind of subprogram inside an operating system. You often combine them to get the desired results. Internally they consist of smaller units that can be tested individually, without necessarily having to run the filter.
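To make the point about filters a bit more concrete, here is a small Python sketch of a filter built from smaller units. The names `strip_comments`, `number` and `run_filter` are made up for this example, and the comment-stripping rule (everything after `#`) is just an assumption to have something to filter.

```python
from typing import Iterable, Iterator


def strip_comments(lines: Iterable[str]) -> Iterator[str]:
    """Drop '#' comments and lines that end up empty."""
    for line in lines:
        text = line.split("#", 1)[0].rstrip()
        if text:
            yield text


def number(lines: Iterable[str]) -> Iterator[str]:
    """Prefix each line with its 1-based position in the stream."""
    for index, line in enumerate(lines, start=1):
        yield f"{index}: {line}"


def run_filter(lines: Iterable[str]) -> Iterator[str]:
    """The whole filter is just the composition of the smaller units."""
    return number(strip_comments(lines))
```

Each unit takes and returns a stream of lines, so `strip_comments` can be tried on a plain list without running the filter as a process. Hooking `run_filter` up to `sys.stdin` and printing the results would turn it into an ordinary command-line filter.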

I don't know how one would do it, but it might be possible for an operating system to decompose large interactive programs into filter-like subprograms. I'm aware of dynamic linking. You need to contort it quite a lot to get it to do this.

I don't know if this is the only way. I'd be interested to know if someone has solved program scalability in some other way. How do you know that you've solved it? If your program can scale up, you can write a kitchen sink into it and it won't sink.
