Thursday, January 21, 2010

Why COBOL Is Bad For Your Health

Before you think that I'm going to continue one of the eternal developer discussions like Windows vs. Linux, C# vs. Java or even OGL vs. DX, I'm not. COBOL is a useful language and will remain that way for a very long time. It has served and keeps serving its purpose, which is to be a language targeted at non-programmers, mostly business analysts, with very little or no programming knowledge whatsoever. What I'm about to state here are the deficiencies of COBOL: being business oriented has its cost and COBOL pays dearly for it. Also appreciate that I have good knowledge of COBOL, mainframe, and batch architecture. However, I was groomed in C/C++ and specialize in distributed systems, so I have a reasonable understanding of both worlds.

Recently a fellow on my team asked me why I hated COBOL so much. To keep things short, my answer was that I did not hate COBOL at all, but I thought that there were better languages which could do COBOL's work better; I stated that COBOL syntax might be easy and simple, but COBOL programs are semantically obscure and can often lead to very bad algorithms. I'm now going to explain why I think that.
Remember that I'm not a doom-sayer. COBOL isn't dead, nor is it going to die. It has its purpose, and it does it well enough. People will keep learning COBOL for a long time, and many enterprises will continue to grow their mainframe platforms.
So, without further ado, let's give you my reasons why I believe that COBOL is bad for your health:


I. All Variables Are Global
It probably goes without saying (at least to any seasoned non-COBOL programmer) that you shouldn't use global variables in your programs. Globals are bad for your health because it's hard to predict their value, the reason being that every single instruction inside the program might modify them.

If your program has fewer than two hundred lines, its variables follow good naming rules, and you're not using REDEFINES, maybe you can find out where a variable is accessed and predict its value. But in a world where the average COBOL program has way more than ten times that number of lines, you are in for a very hard "mind compiling" experience.

Moreover, there's a side effect: variable cramming. Whenever a programmer needs to extend or fix the source of a COBOL program, instead of using the variables that are already there (since he can't know where a given variable is accessed), he declares a new one. The effect is that the source ends up having more variables than necessary and gets even harder to read. Add to that the fact that COBOL doesn't allow variables to be declared within procedure code (like old C), and now you have this programmer hell: many variables whose declarations are very far from the spot where they are used, which means a lot of scrolling up and down the source.
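
To make the problem concrete, here is a minimal sketch written in C rather than COBOL. The names (ws_total, add_invoice, start_new_page, run_report) are made up for illustration; the file-scope variable stands in for a WORKING-STORAGE item that every paragraph of a program can touch.

```c
/* Hypothetical working-storage item: visible to every routine below. */
static int ws_total = 0;

/* Adds an invoice amount to the running total. */
static void add_invoice(int amount) { ws_total += amount; }

/* A "helper" that quietly reuses ws_total as scratch space. */
static void start_new_page(void) { ws_total = 0; }

/* The caller expects 150, but the helper reset the total in between. */
int run_report(void) {
    add_invoice(100);
    start_new_page();   /* side effect that is invisible at the call site */
    add_invoice(50);
    return ws_total;
}
```

Nothing at the call site of start_new_page hints that the total is about to change; you only find out by reading every routine that can reach the variable, which in a large program means all of them.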

II. Variables Aren't Type Safe
Type safety is a very complex subject. Many languages that are perceived as type safe actually aren't -- in C/C++ any pointer can be cast to void*, and a void* can be cast back to anything -- but COBOL goes way beyond that when it gives programmers REDEFINES. REDEFINES allows any storage area to be seen as a different type at compile time, and is, in many ways, a cast. One can argue that C/C++ presents us with a similar structure in unions. However, for some reason, C/C++ programs hardly ever use unions, preferring to have a byte stream that is then copied to a new instance of a certain type.
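
For readers who have never met REDEFINES, the closest C analogue is the union mentioned above. The sketch below is only an illustration: the union date_area and the function first_half_as_number are hypothetical names, and the caller is assumed to pass at least eight characters.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record: the same 8 bytes viewed as text or as two integers,
   roughly what a REDEFINES clause does for a COBOL data item. */
union date_area {
    char     text[8];     /* e.g. "20100121" stored as characters */
    uint32_t halves[2];   /* the same storage reinterpreted as numbers */
};

/* Stores text through one view and reads it back through the other.
   Assumes yyyymmdd points to at least 8 characters. */
uint32_t first_half_as_number(const char *yyyymmdd) {
    union date_area d;
    memcpy(d.text, yyyymmdd, sizeof d.text);
    return d.halves[0];   /* raw character bytes, not the number 2010 */
}
```

The compiler accepts this without complaint, and the returned value is whatever the four character bytes happen to mean as an integer on the machine at hand. Keeping the two views consistent is entirely up to the programmer.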

Besides, COBOL also has "untyped" variables, called group items. Group items in COBOL are similar to C structs, being a definition of a group of variables that are laid out together in memory. However, in COBOL, those group items don't have a defined type, and the compiler allows any data to be moved to or from such a group. There is no runtime bounds checking either, so it is easy to move more data than the area can hold; what COBOL does for you instead is silently truncate it. The responsibility of knowing that the types fit is left completely to the programmer.*
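
The truncation is easier to see in code. This is a rough C model of how an alphanumeric move into a fixed-width area behaves (left-justified, space-padded, cut off on the right); move_alnum is a hypothetical helper written for this post, not part of any COBOL runtime.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into a fixed-width field the way an alphanumeric MOVE behaves:
   left-justified, padded with spaces, silently truncated on the right.
   The field is not NUL-terminated, just like a PIC X item. */
void move_alnum(char *field, size_t width, const char *src) {
    size_t n = strlen(src);
    if (n > width) n = width;            /* excess characters are dropped */
    memcpy(field, src, n);
    memset(field + n, ' ', width - n);   /* remainder is space-filled */
}
```

Moving "COBOL-85" into a five-character field leaves "COBOL" behind, with no error and no warning at run time.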

II. Variables Aren't Really Typed
This one will probably be the most contentious point here. COBOL uses a typing system that includes mainly two types of variables, numeric and text. Numeric variables can be of a COMPUTATIONAL type, which means that they hold numeric data but store it in a different way -- compacted. The first criticism here is that, for a language that is called a 3GL, COBOL exposes a lot of the underlying implementation to its programmer, who has to know the differences between compacted COMPUTATIONAL data and "common" data. Of course, there are historical factors that led to this implementation, namely the fact that storage was way more expensive when COBOL was conceived. But using this as an excuse only proves that COBOL is obsolete and should be dumped.
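
To show what "compacted" means in practice, here is a sketch in C of one common representation, packed decimal (what IBM compilers call COMP-3): two digits per byte, with the sign in the last nibble. The exact COMPUTATIONAL formats depend on the compiler, so take pack_decimal as a hypothetical helper that illustrates the idea, not as any vendor's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Packs value into out[0..len-1] as packed decimal: two digits per byte,
   with the sign in the final nibble (0xC = positive, 0xD = negative).
   Digits that do not fit are dropped from the left. */
void pack_decimal(long value, uint8_t *out, size_t len) {
    unsigned long mag = value < 0 ? 0UL - (unsigned long)value
                                  : (unsigned long)value;
    memset(out, 0, len);
    if (len == 0) return;

    /* Last byte: lowest digit in the high nibble, sign in the low nibble. */
    out[len - 1] = (uint8_t)(((mag % 10) << 4) | (value < 0 ? 0x0D : 0x0C));
    mag /= 10;

    /* Remaining bytes, right to left, two digits each. */
    for (size_t i = len - 1; i-- > 0; ) {
        uint8_t lo = (uint8_t)(mag % 10); mag /= 10;
        uint8_t hi = (uint8_t)(mag % 10); mag /= 10;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
}
```

Packing 12345 into three bytes gives 0x12 0x34 0x5C, against five bytes when the same digits are stored as characters. That saving is the whole point of the format, and it is exactly the detail the programmer is forced to think about.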

SECTION B - Code Structure

I. Where You Write Your Code Matters
COBOL still inherits a lot from punched-card days. In COBOL, code can only be contained between columns 8 and 72, and column 7 is reserved for "indicators", which tell the compiler that the line is a comment or a continuation of the previous line. Add to that the fact that some statements need to start in what COBOL calls Area B, which starts at column 12. This means that you have only 61 characters to write statements, which are very long in nature already (you need at least 11 characters to write an assignment, for example). And remember that variable names tend to have lots of prefixes and suffixes, because the COBOL qualification operator OF is never used.
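
The column rules can be summed up in a few lines of C. This is a simplified sketch with hypothetical names (classify_line, area_b_length); it only recognizes the two indicators mentioned above, '*' for a comment and '-' for a continuation, and treats every other line as code.

```c
#include <stddef.h>
#include <string.h>

enum line_kind { LINE_CODE, LINE_COMMENT, LINE_CONTINUATION };

/* Classifies a fixed-format source line by its indicator in column 7
   (index 6, since C strings are zero-based). */
enum line_kind classify_line(const char *line) {
    if (strlen(line) < 7) return LINE_CODE;   /* too short to carry an indicator */
    switch (line[6]) {
        case '*': return LINE_COMMENT;
        case '-': return LINE_CONTINUATION;
        default:  return LINE_CODE;
    }
}

/* Returns how many characters of the line fall inside Area B (columns 12-72). */
size_t area_b_length(const char *line) {
    size_t n = strlen(line);
    if (n > 72) n = 72;           /* columns 73-80 are not part of the code */
    return n > 11 ? n - 11 : 0;   /* the first 11 columns come before Area B */
}
```

For a full 80-column line, area_b_length returns 61, which is all the room a statement gets.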

II. Periods Are Both Scope And Statement Terminators
Here is another of COBOL's strange behaviors that will make you shiver. In COBOL, you can finish statements with a period ("."). You can, because most of the time you don't need to. Most of the time, because sometimes they are necessary. Already confused? Well, it gets worse. In COBOL, you also close scope with a period. So, if you begin an IF construct and stick a period just after the first statement, the scope is terminated and whatever comes afterwards is considered outside of the IF.

So, if you decide to stick periods after all statements, you can't. So you decide to abandon periods and be on the safe side, never ending a loop or scope accidentally... but, just like we said, you can't.

III. Idiosyncrasies
COBOL is a champion when it comes to idiosyncrasies. For example, assignments in COBOL are written as MOVE variable TO variable. Moving is usually conceived as taking something from one place and putting it somewhere else, but assignment works by copying the value of a certain variable to another one, leaving the original untouched, and that's exactly what the MOVE statement does in COBOL.

In sum...
There are a lot of reasons why COBOL should be avoided at your enterprise. Sure, you can have a person trained in COBOL in less than a week, but how long will you take to remove bugs from his code? How many bugs will appear in the future? COBOL is a counter-productive language, that encourages bad developers to write bad code.

* COBOL has evolved over the years, and so have the compilers. I wouldn't be surprised if there were a compiler directive that allowed such checking to be made, but I must say that I have never seen anyone using it.


  1. "Section A
    I. All Variables Are Global
    It probably goes without saying (at least to any weathered, non-COBOL programmer), that you shouldn't use global variables in your programs...

    This simply is not true for modern COBOL dialects. It is an issue with older versions of COBOL and there are workarounds even then.

    "Besides, COBOL also have "untyped" variables, called group items. Group items in COBOL are similar ..."

    This is doubly misleading. First, C is not type safe either. C structures are just lumps of memory which the compiler helps you understand by vaguely aligning variables into it. The simple fact that C arrays are actually pointers breaks any notion of type safety. Indeed, one could strongly argue that the requirement to use pointers in C makes it less type safe than COBOL, where they are an option if you need them.

    Secondly, Managed COBOL (for example, Micro Focus COBOL for .net) is fully type safe with generics and type inference. For example, I wrote a post which covers the use of generics and type inference for interoperation between COBOL and F#.

    "II. Variables Aren't Really Typed"
    This is true for native COBOL. It is also true - yet again - for C and other similar languages. In C I can, if I put my mind to it, store character data in an integer variable. This really is not a COBOL issue, it is an unmanaged language issue.

    Again, variables are typed (if you want them to be) in managed COBOL. Here I posted on using typed tuples created from generic object factories - in COBOL - for example.

    Section B
    "II. Periods Are Both Scope And Statement Terminators"

    Yeh - kind of a pain isn't it (nothing is perfect). MF COBOL has pretty much fixed this though. Now, if you really want a pain – how about forgetting to put a semi-colon at the end of a class definition in a C++ header file? It will work sometimes and not others and the compiler will never give you a useful error message!

    "III. Idiosyncrasies"

    Why is 'move' especially idiosyncratic? Why should equality imply assignment for example? If you really want idiosyncratic how about a language in which * means both 'pointer to' and 'de-reference pointer'!

    char* x;

    Coming from FORTRAN to C, this sort of thing really freaked me out. Equally (IMHO) C# can be a bit odd at times: < can mean 'start generic definition' or 'less than'. And => creates a lambda whilst >= is a comparison. All languages seem to be idiosyncratic. Once you learn enough of them, you come to realise that it is only the first language you learn which does not seem idiosyncratic.

    Final thoughts:
    In general, you seem to have learned one language (C) and tried to understand another (COBOL) using the paradigm you used for the first. What you have then done is not keep up with where COBOL is moving. I feel your pain! I would just suggest that blaming your pain on COBOL itself is incorrect; your issues are with the sort of COBOL you are using and the level of training you have had.

    By analogy, you learned to ride a motorcycle, and then used that knowledge to work out how to drive a model T Ford. Your experiences in this regard lead you to believe that all cars are hateful. If you had learned to drive a car as a car and then started driving a top end Ford Mondeo you would have a balanced view of both cars and bikes.

    Best wishes - AJ

  2. @Alex Turner

    About topic A.I, you're completely right. I forgot about the OF operator, which enables the programmer to define scope. I don't really like its syntax, but that is beside the point.

    About type safety, I never said that C is type safe and it isn't my intention to compare COBOL to C; they are different languages with different purposes. However, many languages attach types to structures. In fact, object orientation accomplishes that through the use of classes. Sure, classes can be cast to compatible types, but that still means they have a type.

    About Managed COBOL, the language extension is indeed very superior to common COBOL. I have used it very little in the past (even though I've used Micro Focus extensively with pure COBOL).

    "This really is not a COBOL issue, it is an unmanaged language issue." My point in this particular topic was that in COBOL you define a variable by defining its storage method. This is true for most languages, but that doesn't mean it's good. High-level languages like Python don't let the programmer worry about such technicalities -- the type only informs what kind of data can be there, not its size or formatting. I prefer to say to the compiler "hey, this is an integer" and let it worry about how that is stored than to say "Compiler, here I want a numeric variable, which can hold at most 10 digits, and will be compressed using the packed decimal format". That's way too much information for a programmer to know when you are effectively talking about a business oriented language. Again, this is historical: when COBOL was conceived, every nibble counted. But doesn't that mean COBOL is obsolete, then?

    About moves, I find them to be especially idiosyncratic because you are writing, in plain English, to move one variable to another. But that's not what's happening; the values are being copied. Other languages use a special assignment operator (e.g., Pascal's ":=") to make sure that it isn't mistaken for equality. As an aside, using "*" both for de-referencing and pointer definition is also horrible.

    All that aside, I liked your summation, mainly the analogy. You are partially right: I've trained most of my life with other languages (not exclusively C/C++, though) and sometimes I look at COBOL and say, "heck, this was way easier in that other language". But I was trying to explain why I think COBOL has some big flaws. All languages do; nothing is perfect. But what is worse, I don't really see an advantage in using COBOL. One argues that COBOL requires less training than any other language. However, every COBOL programmer that I have met had formal programming education. You take more time to write programs in COBOL than in any other language. It's hard to tell what a program is doing. And reuse is accomplished more through copy & paste than through modularization. So I'm not sure why you would do Enterprise Modernization with Managed COBOL for .Net if you could, instead, use C#, which was designed with the managed paradigm in mind.

  3. Usually the weird criticisms of COBOL come from people who claim no knowledge of COBOL. Here you claim knowledge, although not so somewhere else at about the same time.

    This is so much nonsense, from the idea that Business Analysts write COBOL programs onwards, through contradictions and errors.

    Is it that you can't write COBOL successfully, so you have to blame COBOL?