|
Brad: When I was finishing my Master's here at CMU, we were using a PDP-11/45 that was showing incipient senility. One week before the final demo, the RT-11 monitor stopped powering up properly and instead took to halting the machine at some incredibly non-obvious spot. This was not acceptable performance, so we scratched our heads faster and faster for about two days trying to fix it. Finally, in desperation, we single-stepped the RT-11 boot sequence, and found that it was doing a memory check that it believed was failing. It then tried to jump to a "memory check failed" diagnostic that it expected to find in memory, which of course was not there. What was there, however, was a random collection of bits that just happened to look like a jump to the original totally bogus location that we could see on the lights of the front panel. (Incidentally, we could read and write the supposedly bad memory location using the front panel). The solution? We powered up the machine with the halt switch asserted. Then we loaded in a "Return from Interrupt" instruction where the random bit collection was. Presto. By the way, until this problem occurred, we were competing for use of the 11/45 with two other groups of students. Since they all gave up when this difficulty hit, we had sole use of the machine until it got officially fixed. Bob ----Message 12 (993 chrs) is---- Mail-From: ARPANET host USC-ISIB received by CMU-10A at 27-Oct-82 20:08:14-EDT Date: 27 Oct 1982 1708-PDT From: Dave Dyer <DDYER at USC-ISIB> Subject: horrors To: allen at CMU-10A On a tops-10 system I was responsible for, I made a typo installing a bug fix to the monitor's file system code. The result was that for several days (until the file system began seriously degrading) a randomly selected physical block of the disk was written with a copy of the retrieval information for the system's accounting files. 
Another time, we had installed a new memory box, which unknown to us was responding with the wrong word once in 10^8 or so operations. We ran with this flake for about a month before the bit decay was tracked down to the culprit. At that point, EVERYTHING that had been done during the bad time was "possibly" damaged, and quite a few were in fact damaged. It took about a year before the last artifacts of that episode were filtered out. ------- ----Message 13 (857 chrs) is---- Mail-From: ARPANET host MIT-ML received by CMU-10A at 27-Oct-82 20:37:13-EDT Date: 27 October 1982 20:40-EDT From: Peter Szolovits <PSZ at MIT-ML> Subject: Hacking horror stories To: Brad.Allen at CMU-10A cc: PSZ at MIT-ML My first paying programming job was to convert some FORTRAN programs from the 7094 to an IBM 360 in 1966 at UCLA. Some of these were unbelievably hairy (doing memory management within Fortran, character manipulation before there were characters in Fortran, etc.) and obscure (some of the code was in fact Fortran II code that first needed conversion to Fortran IV). The real horror was that my predecessor had been taken away by the men in the white coats, and lived in a mental hospital; so there really was no way to get any additional info on much of this code, and I had a graphic example of where my job led. ----Message 14 (2082 chrs) is---- Mail-From: CMUFTP host CMU-CS-VLSI received by CMU-10A at 27-Oct-82 20:44:03-EDT Date: 27 Oct 1982 20:30-EDT From: James.Gosling at CMU-CS-VLSI at CMU-10A Subject: Re: Hacking horror stories To: Brad.Allen at CMU-10A Message-Id: <82/10/27 2030.262@CMU-CS-VLSI> Several years ago I was doing some development work on a compiler for a language like Pascal. And like most Pascal implementations, the compiler was written in the same language and was used to compile itself. It was broken into many modules. To make a change to the compiler I would just recompile the affected module and link it back in with the rest of the modules. 
At some point, I took one of these test versions of the compiler and replaced the production compiler with it -- it seemed to be just fine. In fact, it was fine for quite a while. So long that this new version got onto the backups and all of the backups of the production compiler were lost. There was also the problem that the old production compiler couldn't have compiled the new compiler anyway, since the language had changed quite a lot. Well... In one of the modules that had never been through the new compiler was a piece of code that tickled a bug in the code generator. The bug was a cooperative one between one of the new pieces of code and one of the old ones. What I ended up with was a compiler which I couldn't recompile because fixing the bug involved compiling a module that tickled the bug. Because of the circularity in the compiler (that it compiled itself) I was up the proverbial creek without a paddle. There was no way that I could recompile or shuffle anything to fix the beast. All backups were either of the broken compiler or had been overwritten. The solution was incredibly messy: I spent a long time doing intensive octal surgery on the object modules that I had. This was made very difficult because there was essentially no information left around to correlate program text to compiled code and because the bug caused bad code to be generated in many places. James. ----Message 15 (1169 chrs) is---- Mail-From: ARPANET host MIT-XX received by CMU-10A at 27-Oct-82 22:29:30-EDT Date: 27 Oct 1982 2231-EDT From: Larry Seiler <SEILER at MIT-XX> Subject: Bug fix horror story To: Allen at CMU-10A cc: Seiler at MIT-XX Maybe this is not quite what you have in mind, but in case it is... My most painful bug was a simple uninitialized variable (I had moved the initialization statement to a position after the first reference). This variable was a pointer, and its position in the call stack just happened to contain an address in code space. 
So running the program caused certain instructions in a different procedure to be changed into noops, with bizarre results. Loading the debugger caused the program to work correctly, by transferring the target of the modification into an unused part of the debugger (I think). Even after I discarded my innocent assumption that the code I wrote was the code that was being executed, I still had to guess what routine was writing to code space (and by what mechanism). Total time required to fix the bug: 8 hours. How embarrassing. Why am I telling you this? Well, why not? Larry Seiler ------- ----Message 17 (1759 chrs) is---- Mail-From: ARPANET host Utah-20 received by CMU-10A at 28-Oct-82 02:11:44-EDT Date: 28 Oct 1982 0012-MDT From: JW-Peterson at UTAH-20 (John W. Peterson) Subject: Re: Hacking horror stories To: Brad.Allen at CMU-10A cc: JW-Peterson at UTAH-20 In-Reply-To: Your message of 27-Oct-82 1516-MDT In trying to learn the graphics/animation biz, I've run into a few. In making some films this summer I wound up working strictly at night, to help prevent any light from entering the room. The filming had to be completed entirely over the weekend, so it wouldn't interfere with normal business activity (like turning the lights on...). Worse yet the old Bolex I was using had no way for the computer to trip its shutter, so I had to manually press the cable release every time the computer rang the terminal bell; for several hours at a stretch. Some other animation stories: Before color graphics CRT's & framebuffers were invented, the poor filmmaker had to sleep next to the camera. When the bell rang, he would wake up, change the color filter wheel to the next primary color, backwind the film all the way, and go back to sleep... Perhaps best of all is Jim Blinn's "Korean Janitor" movie. During the creation of the DNA sequences for "Cosmos", they decided to let the camera run overnight, with the computer tripping it every several seconds. 
So they locked up the room and put a big "Filming in process: Do Not Enter" sign on the door. Unfortunately, the Korean janitor could not read the English sign but DID have a pass key. The resulting film shows a DNA molecule twisting in space, a flood of light, and then a jerky sequence of the janitor cleaning the room at 200mph, seen as a reflection in the screen. jp ------- ----Message 19 (1595 chrs) is---- Mail-From: ARPANET host MIT-XX received by CMU-10A at 28-Oct-82 10:50:59-EDT Date: 28 Oct 1982 1054-EDT From: Geoffrey H. Cooper <GEOF at MIT-XX> Subject: Re: Hacking horror stories To: Brad.Allen at CMU-10A cc: geof at MIT-XX In-Reply-To: Your message of 27-Oct-82 1716-EDT This is our favorite "what happens when people are taught higher level models before the lower level ones" story. I get this second hand, so some of the details might be a little off. It may not be of the sort you had in mind, but it's amusing enough to bear repeating anyway. Around here, we teach a course in software engineering in which the students are taught and write programs in CLU (a language which lets user defined abstractions work the same way that the language defined ones do). One common final project for the course involved writing an assembler in CLU. The problem statement required that numbers be input and output in octal, rather than decimal. Most of the students, I am told, defined an OCTAL abstraction, with all the normal integer arithmetic operations, and with Parse and Unparse operations that converted strings into OCTAL's and back again. This was implemented by representing an OCTAL as an array of integers, each of which represented an octal digit. The arithmetic operations simulated octal arithmetic on this representation. None of the students was apparently aware that the normal integer data abstraction that they had been using was really just stored as bits, which were more easily converted to octal than decimal. 
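For readers who want to see the point of the octal story concretely, here is a minimal sketch in Python (not CLU, and not anything from the course). It assumes non-negative integers only; the names `to_octal` and `from_octal` are made up for illustration. Because an ordinary integer is already stored as bits, each octal digit is just the next group of three bits, so no separate OCTAL arithmetic is needed.

```python
def to_octal(n: int) -> str:
    """Render a non-negative integer in octal, three bits per digit."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if n == 0:
        return "0"
    digits = []
    while n:
        digits.append(str(n & 0o7))  # the low three bits form one octal digit
        n >>= 3                      # drop those three bits
    return "".join(reversed(digits))


def from_octal(text: str) -> int:
    """Parse a string of octal digits back into an ordinary integer."""
    value = 0
    for ch in text:
        if ch not in "01234567":
            raise ValueError(f"not an octal digit: {ch!r}")
        value = (value << 3) | int(ch)  # make room for three bits, add the digit
    return value
```

All arithmetic then happens on plain integers; only Parse and Unparse (here `from_octal` and `to_octal`) need to know about base eight.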
-Geof Cooper ------- ----Message 20 (1069 chrs) is---- Mail-From: ARPANET host CMU-20C received by CMU-10A at 28-Oct-82 10:57:26-EDT Date: Thursday, 28 October 1982 10:57-EDT From: Jon Webb <Webb at Cmu-20c> To: Brad.Allen at CMU-10A Cc: webb at CMU-20C Subject: Hacking horror stories Well, here it is: I was working as an undergraduate programmer at my undergraduate university, and I basically had the run of the time-sharing user interface (it was TSO, on an IBM 360/65). I decided it would be nice if you could edit lines you'd typed, like the facility in the C-shell on unix except more primitive. Well, it was a pretty trivial change to allow this, but unfortunately to be effective the change had to be installed in the system; I couldn't test it in advance. So I installed it one night, and TSO wouldn't work anymore. Very embarrassing, especially as the backup method I thought would work didn't. In fact one of the systems programmers had to be called in to fix the system, in the middle of the night. I gave up on editing in TSO. This is an argument for personal computers. Jon ----Message 21 (910 chrs) is---- Mail-From: ARPANET host UCB-C70 received by CMU-10A at 28-Oct-82 11:57:13-EDT Date: 28 Oct 1982 08:55:57-PDT From: CSVAX.bitar@Berkeley To: Brad.Allen@CMU-10A Subject: Hacking horror story I was working late one night developing a file under the Unix operating system. I was in a hurry at one point, and wanting to rename the file, I executed the unix move cmd. A moment later Unix complained of indigestion, and I noticed that instead of typing 'mv oldname newname', which is Unix's way of renaming a file, I had typed 'rm oldname newname'. So Unix had executed 'rm oldname', then run into newname and vomited. I nearly did the same. Fortunately I did have a backup copy of the file, which I subsequently re-edited, bringing it up to date. 
After that incident, though, I was very careful about slight cognitive mistakes, such as thinking 'move' (mv) and typing 'rm' (remove) instead. ----Message 22 (1801 chrs) is---- Mail-From: local user C410RF60 at 28-Oct-82 12:06:03-EDT Date: 28 October 1982 1155-EDT From: Robert Frederking at CMU-10A Subject: Re: Hacking horror stories To: Brad.Allen@CMU-10A Yourdon's book on software engineering has a few of these. Most of my really horrible experiences happened due to politics or manufacturer's screw-ups. (Example of first): CWRU was building a network, and had to pick between DEC and Harris computers (Harris won because one of their VPs was a trustee at CWRU - they were clearly inferior machines). Besides teaching their staff how to program, we had to constantly show them that feature X was broken, and how to fix it. The project finally collapsed due to their crufty machines. The operating system was *not* virtual memory (altho user space was), and while adding networking software to their OS, they ran out of room. "Sorry". (Example of second): in trying to microprogram Intel's hack-of-a-bit-slice-machine, you had to fit your instructions into a 2-dimensional address space! Some instructions could only branch in rows, others only in columns, yet others only to specific clusters of locations. It was clearly a hack to cover running out of instruction bits. They even had to sell a program designed to find a fit for your microcode to the available space (I think the problem is NP-complete - 2d bin packing). The best example is the interrupt disable instruction on the 6800. If the least significant bit of the *preceding* instruction is 1, the whole processor hangs when you try to disable the interrupt. Also, some of the illegal opcodes (which aren't masked out) will cause the processor to hang so badly, it can't be reset. You have to turn it off, and wait for the dynamic RAM register to fade out! 
Bob ----Message 24 (1536 chrs) is---- Mail-From: CMUFTP host CMU-CS-Speech received by CMU-10A at 28-Oct-82 14:51:46-EDT Date: 28 Oct 1982 14:47:27-EDT From: David.Cunnius at CMU-CS-SPEECH at CMU-10A To: Allen@CMU-10A Subject: Hacking horrors The old 15-311, Software Engineering Methods, will probably be one of the more fertile sources of horror stories. The semester I took this course, Spring '80, one of the tasks was a database implementation for a science- fiction wargame. Looking back now, I think our project group was doomed from the start. Of the original five-man team, one dropped the course before anyone else even met him, one had to take some time off to deal with a family crisis around mid-term, and one simply disappeared for a period of three weeks, coming back without even a memory of where he'd been. Despite all that, we did get something together for the final demo. We were using a modular design and had divided the task into thirteen subtasks. At the demo, four of the thirteen modules worked properly, two that had tested out perfectly the previous day didn't work at all at the demo, and most of the other seven hadn't even been coded yet. Of the four modules that worked, the most impressive one was the display package; unfortunately, that was also the only module which was optional in the original specification. Two of the members of the group somehow managed to pull 'D's as our final grade; to this day I haven't had the nerve to ask the other two what their grades were. Dave Cunnius (dac@CMU-CS-Speech) ----Message 25 (2873 chrs) is---- Mail-From: ARPANET host Washington received by CMU-10A at 28-Oct-82 16:18:31-EDT Date: 28 Oct 1982 1318-PDT From: Bob Bandes <JUGGLE at WASHINGTON> Subject: Re: Hacking horror stories To: Brad.Allen at CMU-10A In-Reply-To: Your message of 27-Oct-82 1416-PDT As a senior project when I was going to school at UC Santa Cruz, I put together a real-time voice controlled operating system. 
The entire thing was written in assembly language on a PDP-11/32 running RT11. Since this was a single user system with a fixed disk, it was necessary to make a tape backup at the end of every session. Well, after one particularly furious day of hacking, I decided to write my backup tape and go home for the day. My normal procedure was to mount my backup tape and use ROLLIN to copy an entire disk-image to the tape. Unbeknownst to me, the procedure that I used had the effect of first initializing the tape before making the backup. This had always worked just fine. But on this particular day, I had been working on my disk I/O routines and apparently had somehow managed to write garbage on some unknown portion of the disk. I had no idea that anything was wrong as I went to make my backup tape. As usual, first the tape was initialized, then, as ROLLIN began to write the disk image, the program hung! There I was with no backup tape and having major problems making a backup. My next move was to panic. After settling down somewhat, I tried rebooting the operating system and making the backup again. Still the same problem. Then I remembered about the DECtape drive on the machine. If I could only find a DECtape and manage to individually transfer the files that I needed I would be home free. I ran over to the cabinets and began frantically looking for DECtapes. AHA! I found one! As I ran back over to the computer, I took a bounding step and landed on the side of my ankle. I proceeded to lie on the floor writhing and screaming in agony for the next fifteen minutes. "This just isn't my day," I was saying to myself. When the pain began to subside I tried to get up. I couldn't walk on the ankle since it hurt so much. So I hopped over to the DECtape drive and mounted the DECtape. Then I hopped over to the console and sat down. At least something went right that day, as the machine allowed me (without hanging) to individually transfer all my files to DECtape. 
I then read a clean version of the operating system onto the disk and proceeded to transfer all of my files from DECtape back onto the disk. This time all went normally with the magtape backup and the world was safe again for future hacking. Fortunately my ankle wasn't broken. It was only severely sprained. For the next few weeks I was forced to do my hacking with an ace-bandage wrapped around my ankle. --Bob Bandes ------- ----Message 29 (721 chrs) is---- Mail-From: ARPANET host UCB-C70 received by CMU-10A at 28-Oct-82 23:30:51-EDT Date: 28 Oct 1982 20:26:51-PDT From: Kim.norvig@Berkeley To: brad.allen@cmu-10a Subject: Re: Hacking horror stories Lucky for me, most of the stories I remember are happy ones, not horror stories. My favorite story about someone else is when Jim Meehan was writing TALESPIN, his AI program that generated stories, mostly about birds and bears running around the forest. One story started off fine, then started to slow down, and finally ended with the line Joe Bear thinks that FREE STORAGE IS EXHAUSTED Oh well, @b(I) thought it was cute. Can I be put on the mailing list to see your collection of anecdotes? ----Message 33 (1413 chrs) is---- Mail-From: ARPANET host MIT-MC received by CMU-10A at 30-Oct-82 16:38:45-EDT Date: 30 Oct 1982 1635-EDT From: RG.JMTURN at MIT-OZ at MIT-MC Subject: Re: Hacking horror stories To: Brad.Allen at CMU-10A In-Reply-To: Your message of 27-Oct-82 1832-EDT The experience that still makes my skin crawl is the time I was debugging some Lisp Machine board at the MIT AI lab. I had spent several hours trying to isolate a noisy signal which seemed to be tied to another one, but I could not find a common wire and I had replaced all the common chips. In desperation, I pulled out the board and yanked the extender, about to give up hope. As I stared down at the extender, I muttered some curse to the designers of the machine...and noticed a solder splash on the extender shorting two lines! 
For ghu's sake, if you can't trust your tools, what can you trust. On the other hand, for an example of the other extreme, this week, I was in Montreal doing an installation for Lisp Machine, Inc. A crufty Bus Interface seemed to be making the machine go 1/2 speed, and sometimes fail entirely. The person I was working with and I decided to call it a day around 5, and go to our hotel. When we came back the next morning, the machine worked perfectly. The best we can figure it, the machine wanted us to be able to have a night in Montreal, and the afternoon the next day... JAmes ------- ----Message 38 (1003 chrs) is---- Mail-From: ARPANET host UCB-C70 received by CMU-10A at 1-Nov-82 23:02:54-EST Date: 30 Oct 1982 03:44:28-PDT From: CSVAX.fishkin@Berkeley To: Allen@CMU-10A Subject: painful hacks Hi there, My name is Ken Fishkin, and I'm a grad at Berkeley. My most painful hack occurred while hacking a 6K line C database program at the University of Wisconsin-Madison as an undergrad. My program worked perfectly, with all debug prints on. When I set my 'const' debug to false, however, the program would crash! To make things even more fun, if I deleted 1 debug print the program would still run correctly, but if I deleted another instead it wouldn't! I wound up doing a sort of tree traversal, individually deleting some 200! debug prints individually, finding the proper sequence of delete-compile-delete that would keep my program intact. To this day, I still have no idea what was wrong with the program. If possible, could you mail me your final collection of horrible hacks? Ken ----Message 40 (1981 chrs) is---- Mail-From: ARPANET host CMU-20C received by CMU-10A at 2-Nov-82 11:29:15-EST Date: 2 Nov 1982 1128-EST From: MASON at CMU-20C Subject: horror stories To: brad.allen at CMU-10A Many roboticists have reported the following demo problem: when filming or demonstrating, we often raise venetian blinds, turn on the lights, or bring in floods. 
The increase in ambient light may cause optical-interrupt type sensors on the robot to stop functioning, and the heat from floods may affect other components of the system. Thus a system which has functioned flawlessly for months begins to malfunction the very minute the generals arrive. Real-time programming has its special frustrations, but the most difficult bugs arise from difficulties in the timing of process interactions. Most of these are too complicated to make good stories. One of the most confusing PDP11 bugs I had may be worth telling. When a byte is pushed onto the stack, the stack pointer is first decremented by a full word to keep the pointer at word boundaries. Hence the odd byte is garbage, left over from no-longer-active stack frames. I had a program which pushed a byte, but popped a word, thus accessing this garbage. Even careful inspection of the code didn't turn up this violation of stack discipline. The worst part is that the manifestation of the bug would vary depending on which process last used the stack. In particular, the bug became invisible when single-stepping with our symbolic debugger---the debugger (im)providentially cleared the relevant byte in the act of saving some registers. This reminds me of another PDP11 bug. Our 11/40 had a micro-code error. The SOB instruction (subtract one and branch, used for simple loops) didn't test the TRAP bit, which is used by debuggers for single-stepping. Hence, when single-stepping, the programmer was not shown the instruction following the SOB. It was executed "in secret", with very confusing results. 
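The push-a-byte, pop-a-word bug described above can be reproduced in a toy model. This is only a sketch in Python, not PDP-11 code: the class `ToyStack` and its method names are hypothetical, and it assumes a downward-growing stack with the low byte at the even address, where a byte push moves the stack pointer a full word but writes only one byte.

```python
class ToyStack:
    """Toy model of a word-aligned, downward-growing stack (an assumption,
    not a faithful PDP-11 emulation)."""

    def __init__(self, size: int = 16):
        self.mem = bytearray(size)  # memory starts out zeroed
        self.sp = size              # stack pointer starts just past the end

    def push_word(self, value: int) -> None:
        self.sp -= 2
        self.mem[self.sp] = value & 0xFF             # low byte, even address
        self.mem[self.sp + 1] = (value >> 8) & 0xFF  # high byte, odd address

    def push_byte(self, value: int) -> None:
        self.sp -= 2                      # pointer still moves a full word
        self.mem[self.sp] = value & 0xFF  # but only the low byte is written

    def pop_word(self) -> int:
        value = self.mem[self.sp] | (self.mem[self.sp + 1] << 8)
        self.sp += 2
        return value


fresh = ToyStack()
fresh.push_byte(0x12)
clean = fresh.pop_word()   # odd byte happens to be zero: the bug is invisible

used = ToyStack()
used.push_word(0xABCD)     # an earlier, no-longer-active use of the stack
used.pop_word()
used.push_byte(0x12)
dirty = used.pop_word()    # odd byte still holds 0xAB from the earlier push
```

In this model `clean` is 0x0012 while `dirty` is 0xAB12, which matches the story: the result depends on whoever used that stack slot last, and anything that happens to zero the odd byte (such as a debugger saving registers) hides the bug.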
------- ----Message 32 (621 chrs) is---- Mail-From: ARPANET host MIT-XX received by CMU-10A at 3-Nov-82 15:20:23-EST Date: 2 Nov 1982 17:19:35-EST From: jfw at mit-vax at mit-xx To: allen@cmu-10a Subject: Programming horror stories Two summers ago, while I was working on an improvement to our UNIX at LL-ASG, I fired up a test version a little too fast, and watched with puzzlement as the filesystem check program started printing out random things. I wound up killing a 100Mb filesystem full of useful things. After 2 weeks of poring over the code I wrote which did that, I found the bug: " = " instead of " |= ". One character did all that... ----Message 37 (1934 chrs) is---- Mail-From: local user C410MS40 at 4-Nov-82 00:37:41-EST Date: 4 November 1982 0036-EST (Thursday) From: Mark.Sherman at CMU-10A To: Brad.Allen at CMU-10A Subject: Re: Hacking Horror Stories Message-Id: <04Nov82 003626 MS40@CMU-10A> As an undergrad I worked as a systems staff on a time sharing system that resembled Multics (called DSL/TSS - think of it as Unix on HP21 series machines). On such systems, the login program is like any other program; when a user sits down he "calls" this program from a predefined file system path to gain access to the system. For some unrememberable reason, I had to make some modifications to this program, did so, and installed the new version. The only real way to try this program out was to log out and then log back in. Having logged out, I tried to log back in. To my chagrin, I had accidentally set the protection on the new login program to read instead of its normal read-execute. Thus the system refused to run the login program. By S.O.P., this would not be a problem - when doing such a drastic change, we always made sure that at least one other systems programmer was logged in so that he could patch anything that was necessary, like changing access control on the login program. 
Before my attempt to change the login program, there were two other systems programmers logged in. After my mistake, I walked over to the two other staff people only to find that they had both logged out - after all each knew that the other was logged in and so saw no reason to stay on as the "protection". Thus there was no way to log into the system and no way to patch it while it ran. We had to move the system to a spare disk, boot a backup system, bring up the extra disk with the file system containing the bogus protection as a "raw" disk and use a special disk utility to set the one necessary bit giving execute access to the login program. Mark ----Message 38 (3657 chrs) is---- Mail-From: ARPANET host CMU-20C received by CMU-10A at 4-Nov-82 01:40:45-EST Date: Thursday, 4 November 1982 01:39-EST From: Skef Wholey <Wholey at CMU-20C> To: Brad.Allen at CMU-10A Subject: Horrorful horrors CMU's 15-311 is indeed a source of horrors, and I experienced a rather horrible in that class last year. There were five of us in our group, which we called "SPAM", each of us competent hackers. Our project was a 68000 simulator and debugger, which would run 68000 machine code and let you look at registers and memory and so forth. Our work progressed on schedule (with the aid of many all-nighters), and we were able to run simple assembly language programs just about a week before the demo. Being a rather noisy bunch, wanting our demo to be as slick as possible, we decided that we'd run a backgammon program written in C compiled with cc68. We had used small programs compiled with cc68 to test the simulator. The programs were small enough to compile and assemble on a Vax, print the hex object code, and type it into file which we would load into our simulator. The backgammon program was too large for this, obviously, so the object code was FTP'ed to another machine, put on tape, and brought to the Computation Center, where we pulled it off of tape and loaded it into our simulator. 
The program didn't work. It didn't work the day before the demo. We found a few bugs in our simulator, but worse yet we found bugs in the cc68 compiler, now N machines away. Fixing these we found bugs in the game playing program itself. Compiling the program on the Vax and transporting the object code was out of the question at this point -- too little time left before the demo (we had all announced that we'd appear in coat and tie). So we ever so carefully patched the hex files, and voila! The program ran beautifully. That year Comp Center gave each undergrad who needed a computer account an account on each undergrad machine (TOPS-D and TOPS-E). These machines were on Comp Center's DECnet: not a reliable network at that time. We had the current version of our system and the patched hex files on TOPS-D, because the load was lower there that night, but were scheduled to demo on TOPS-E terminals. DECnet was, of course, down for quite a while, but finally came up. We quickly transferred the current system to the E and ran back to our rooms or homes to shower and dress. We marched triumphantly into the terminal room and sat at our terminals while our SPAMmascots fed cookies to the waiting crowd and our professor. The system came up fine, and we demonstrated how to deposit into and read from memory and registers before moving onto the demo programs. We loaded the hex files, set breakpoints at our test locations, and lo! IT DIDN'T WORK. We were all somewhat bummed and embarrassed, and managed to muddle through at the mercy of this mysterious adversary that had destroyed a system that worked an hour before. The professor suggested that we get our act a little more together and have a somewhat less flashy demo in his office a few days hence. The problem: we had neglected to copy the patched hex files from the D to the E. We were demoing buggy 68000 code. The second demo went a bit better. We now laugh about the first. 
Comp Center no longer gives out accounts to one student on more than one machine. Good idea. --Skef [What be your motive for knowin' this stuff, eh? Doo ye like to feed on stories o' suffrin'? Are ye writin' a book? I enjoyed reading those sent to you so far and enjoyed sending you this one. Good topic.] ----Message 39 (1236 chrs) is---- Mail-From: CMUFTP host CMU-CS-VLSI received by CMU-10A at 4-Nov-82 09:40:16-EST Date: 4 Nov 1982 8:36-EST From: Ed.Frank at CMU-CS-VLSI at CMU-10A Subject: Hacking horror stories To: Brad.Allen@cmua Message-Id: <82/11/04 0836.841@CMU-CS-VLSI> While working on the software for a Graphics terminal we built at Stanford, I ran into the following problem. The software was written in assembly language, and was burnt into EPROMS. For a long time the software easily fit in four 2708 (1K x 8) EPROMS. Well, one week after adding the graphics support code to the terminal, I simply could not get it to work. I spent literally dozens of hours going over at most 500 assembly language statements, to no avail. Things were so bad in fact that I seriously began to question my abilities as a programmer. One evening while I was checking the output of the assembler (for at this point I was convinced it was an assembler bug) I noticed that one of the target addresses of a jump was greater than FFF (hex). I didn't think anything of it, until a few seconds later when it occurred to me that addresses > 4K required 5 proms. I quickly went back to work, burned the extra eprom, and the program worked perfectly! Ed ----Message 40 (731 chrs) is---- Mail-From: local user C410RK40 at 4-Nov-82 09:58:20-EST Date: 4 November 1982 0955-EST (Thursday) From: Richard.Korf at CMU-10A (C410RK40) To: Brad.Allen at CMU-10A Subject: hacking horror story Message-Id: <04Nov82 095535 RK40@CMU-10A> Brad, My favorite bug of all time concerned an ASR35 Teletype. 
I was trying to format some output and found that directly after printing a long line, the second line was indented by one space. Naturally, the bug went away when I ran the debugger. It finally turned out that the printing head was physically bouncing off the left hand stop. If it didn't have to print again immediately, it would have a chance to settle back to the beginning of the line. -rich ----Message 41 (1799 chrs) is---- Mail-From: local user C410SS40 at 4-Nov-82 11:42:32-EST Date: 4 November 1982 1134-EST (Thursday) From: Steven.Shafer at CMU-10A (C410SS40) To: brad.allen at CMU-10A Subject: Horrors! Message-Id: <04Nov82 113429 SS40@CMU-10A> Brad -- I had a nasty experience with an old PDP-11/40E running UNIX. I had written a program which juggled several processes, one of which was the largest core-image of any program in existence on the machine (<64K, of course). One day, it died a sudden death. I started tracking it down with print statements. At first, the problem looked like something being set to 0; then, as I added more debugging code, the 0's jumped around. I never knew which routines they would crop up in, or whether global data structures were affected, or even if code itself was being overwritten. Sometimes, the program would die even though the debugging code showed nothing extraordinary. I eventually gave up and rewrote the program from scratch, using smaller processes and succeeding. Several months later, a paging bug was fixed: it was responsible for writing 0's on pages when the core-image of a process was beyond a certain length. What makes this a horror story is a UNIX vagary tickled by the bug: within the code being executed, there was a statement to close a file. The file, like all UNIX files, was indexed by a small integer. When the zeroes struck this variable, the effect was to close file 0, i.e. disconnect the keyboard! 
So, not only did the program die, but it refused to talk to me long before the actual moment of death, leaving me to watch helplessly as it writhed in agony, unable to talk to it, unable to interrupt it, and never knowing where the Flying Fickle Finger of Fate would strike next! -- Steve ----Message 43 (390 chrs) is---- Mail-From: local user C410BL50 at 4-Nov-82 12:30:02-EST Date: 4 November 1982 1214-EST (Thursday) From: Bruce.Lucas at CMU-10A (C410BL50) To: brad.allen at CMU-10A Subject: horrors Message-Id: <04Nov82 121457 BL50@CMU-10A> On Unix, I once meant to type "rm *.BAK" but instead typed "rm * .BAK". Fortunately, I hadn't made too many changes since the last backup to tape. Bruce ----Message 46 (1054 chrs) is---- Mail-From: local user C410EL80 at 4-Nov-82 14:26:58-EST Date: 4 November 1982 1411-EST From: Ellen Lowenfeld at CMU-10A Subject: Re: Hacking Horror Stories To: Brad Allen This one's kind of embarrassing, looking back on it... When I was a sophomore at Brown, I took a course which had a big project, I guess like 311 here, except that the groups were pairs. So that I and my partner could test pre-compiled code separately (IBM 370, batch mode) we each had a dummy main routine. Mine printed its name, and then called whatever routine(s) I wanted to test. Unfortunately, I left out the quotes around its name, and sent it into infinite recursion. IBM's great error message once I found it after looking in 3 manuals, and poring over pages of IEFH01X (or something like that), was "user error". Not until I had spent most of a day looking for a wizard did I go back and just look at the code I had written. Was my face red when all the people I had talked to while trying to find out the problem asked what it turned out to be! 
----Message 47 (1310 chrs) is---- Mail-From: CMUFTP host CMU-RI-FAS received by CMU-10A at 4-Nov-82 14:38:21-EST Date: 4 Nov 1982 13:09:55-EST From: Neil.Swartz at CMU-RI-FAS at CMU-10A To: ba0c@cmua Subject: Horror stories Several stories come to mind. At Princeton, they had WATFIV on a 360/91. You got 2 seconds of computer time and 600 lines of output. One job came out in WATFIV that printed a line of characters and then overstruck the characters again and again. The computer counted this as one line so it would do this forever. The print heads tore through the paper, the ribbon and started in on the carriage. The system was down for more than 12 hours. Another good one which I have heard about- (If anybody knows more about this I would like to hear about it) The Phantom Teletype Program. The way it worked was this: At a random time interval the program would start up and pick a teletype on the system. It would print "The Phantom Teletype Strikes Again!!" and then it would copy itself somewhere else on disk, set up the parameters for its re-execution, and delete the old copy. System programmers could find out where it had been, but not where it was currently. Because it was too difficult to track, they left it on the system. There are lots of good(bad) stories running around. Neil ----Message 49 (2598 chrs) is---- Mail-From: ARPANET host UTexas-20 received by CMU-10A at 4-Nov-82 16:41:21-EST Date: 4 Nov 1982 1538-CST From: CMP.LSMITH at UTEXAS-20 Subject: some horror stories To: brad.allen at CMU-10A My first hacking horror story goes back to my very first programming course. My program kept exceeding its time limit and aborting. I checked my code carefully and decided it was correct, but only needed a little more time to finish. So I confidently upped my limit from 7 seconds to a CPU minute of CDC 6600 time. I was really horrified when it timed out again, blowing my entire semester's allotment. A sharp consultant found my bug. 
I made the FORTRAN equivalent of "FOR X = 1.0 BY 0.1 TO 10.0," with my final test an equal. Since 0.1 is a repeating fraction in binary, it never equaled 10, so it went past and on to infinity. Years later I was working on a PDP11/45 Unix system. The system began crashing some time after we retrieved something from the backup tapes, using Unix's raw mode access to the tape. In cooked mode, things worked right, so we knew it couldn't be a hardware problem. After some months of trying to debug the problem, we modified the tape device handler so that it spun and monitored its registers until the transfer completed. One of the high bits in the address register was sticking off. In cooked mode, Unix read into its system buffers in low core and everything worked because that bit stayed off anyway. In raw mode, it read into user space directly. Whenever the address register was incremented past that bit boundary, the DMA transfer would drop down and wipe out some random locations and the system would slowly collapse. The worst horror stories are when you spend days hacking at a program, only to discover that you've invoked a compiler bug. We are extremely fortunate to have the ELISP system. I had a problem with a lengthy computation sometimes returning NIL from compiled code. Between the (RETURN RESULT) in the called function and (SETQ X (CALLED ...)) in the caller, the value was being lost. Interpreted, it worked. If I traced the function, it worked. If I traced any function in a chain below it, it worked. It turns out that if you have a chain of calls about 10 deep, then a MAPCAR over a list of at least 3 values, then about three more calls down, and all the functions are compiled, then the time bomb NIL is stuck up on the stack. If any function in the chain is interpreted, for example by tracing it, then the behavior goes away. As far as I know, this bug still hasn't been found. 
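The floating-point loop in the first story of this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python uses IEEE 754 doubles rather than the CDC 6600's 60-bit floating point, the helper names (`exact_equality_loop`, `counted_loop`) are made up here, and the `max_steps` guard is added so the sketch terminates whether or not the exact test is ever satisfied. The second helper shows the usual remedy: drive the loop with an integer count and derive each value from it.

```python
import math


def exact_equality_loop(start, step, stop, max_steps=1000):
    """Step from start by step, stopping only when x == stop exactly.

    The max_steps guard is an addition for this sketch; the loop in the
    story had no such guard and ran until its CPU limit expired.
    Returns (steps taken, last value of x, whether the test was hit).
    """
    x = start
    for n in range(max_steps):
        if x == stop:
            return n, x, True
        x += step
    return max_steps, x, False


def counted_loop(start, step, stop):
    """Compute the trip count once as an integer, then derive each value."""
    count = round((stop - start) / step) + 1
    return [start + i * step for i in range(count)]


# Ten additions of 0.1 do not produce exactly 1.0 in binary floating point.
tenth_sum = 0.0
for _ in range(10):
    tenth_sum += 0.1

values = counted_loop(1.0, 0.1, 10.0)
steps, last, hit = exact_equality_loop(1.0, 0.1, 10.0)

print("ten tenths equal 1.0 exactly?", tenth_sum == 1.0)
print("counted loop:", len(values), "values, last is", values[-1])
print("exact-equality loop:", steps, "steps, last x =", last, "hit =", hit)
```

Testing with "less than or equal" (plus a small tolerance) instead of "equal" is the other common fix; the counted form has the advantage that rounding error does not accumulate from one iteration to the next.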
------- ----Message 50 (1130 chrs) is---- Mail-From: CMUFTP host CMU-CS-IUS received by CMU-10A at 4-Nov-82 21:16:47-EST Date: 4 Nov 1982 20:08-EST From: Victor.Milenkovic at CMU-CS-IUS at CMU-10A Subject: Re: Hacking Horror Stories To: Brad.Allen at CMU-10A Message-Id: <82/11/04 2008.913@CMU-CS-IUS> One version of the PL/I debugger at Yorktown had no provision for displaying the hex values of pointer variables. However, it would, on request, display the hex address of any other type of variable, as well as its value. And so, in my program, I would create records, containing a single float variable, based at the pointer I wanted to see, and recompile. By requesting the address of these records, I could determine the value of the pointer. In PL/I one can allocate an area of memory and declare offset variables into it. One can freely assign offset variables into pointer variables and back again -- or so I thought. If a pointer to offset assignment results in a negative offset, nothing complains (although it should), but if one assigns the offset back to the pointer, it gets garbage. This peculiarity caused a very tenacious bug. ----Message 51 (304 chrs) is---- Mail-From: local user C410BL03 at 4-Nov-82 21:52:38-EST Date: 4 November 1982 2151-EST (Thursday) From: Bruce.Leverett at CMU-10A To: Brad.Allen at CMU-10A Subject: Re: hacking horror stories In-Reply-To: <04Nov82 210911 BA0C@CMU-10A> Message-Id: <04Nov82 215100 BL03@CMU-10A> Don't remember. ----Message 52 (2968 chrs) is---- Mail-From: local user C425EC0F at 4-Nov-82 22:12:20-EST Date: 4 November 1982 2210-EST From: eddie caplan at CMU-10A To: brad allen at CMU-10A Subject: hacking horror stories i was doing research in the computer music lab. i was trying to generate emotional responses in subjects by producing sympathetic vibrations from the 64 loudspeakers surrounding the listening room. normally, we would add sub- and ultrasonic frequencies to classical "standards", and then play them to the subjects. 
now, usually we just use frequency modulation to synthesize the instruments of the classic orchestras. but one day as i was making an undergraduate volunteer retch to beethoven's seventh symphony, a thought struck me. if i changed to additive synthesis for the instruments, i could elicit REALLY BIG responses! i mean, i had been having pretty good results up 'til then, and i wasn't complaining. but, with FM there was lots of data lost. additive synthesis would make the music itself generate an emotional response. full fidelity beethoven combined with me could convert hasidic jews to catholicism! so, i spent the next week redoing the beethoven. i finished at 2:30am, and the only other person around was my officemate, dana. i asked her if she had heard beethoven's seventh recently. i told her that i had a recording of boston symphony conducted by klaus tennstedt. i still remember her eyes lighting up at the prospect. i hated to lie to her, but she couldn't be told the truth or the data would be tainted. i had to expose her to it without her suspecting. i put dana into the listening room and turned on the music with my sub- and ultrasonic frequencies added. i watched through the soundproof glass from the observation room. during the first movement, dana cried uncontrollably. she curled up in the chair and whimpered. dana laughed insanely, and had what appeared to be several orgasms. "i've done it!", i cried. but then, the second movement began. i shudder still when i think of it. i looked in at dana. she was sitting upright in the chair, staring straight ahead, her hands gripping her knees. there was blood starting to drip from her fingernails. she was becoming catatonic and starting to shake. i had to halt the processor before permanent damage was done. but before i was able to stand, dana let out an excruciating scream. she shook violently and fell to the floor. then, dana began to float into the air. i pulled open the door and rushed into the listening room. 
dana was screaming far above my head. beethoven was screaming from the 64 speakers. then, i called her name. it was too much. dana dissolved. i think that the added sound of me yelling to her exceeded the threshold. i know now that i am to blame for her dissolving, and that i'm responsible for bringing her back. perhaps it can be done with bartok. dana always liked bartok. eddie ----Message 53 (2694 chrs) is---- Mail-From: CMUFTP host CMU-CS-Spice received by CMU-10A at 4-Nov-82 22:58:54-EST Date: 4 Nov 1982 22:08-EST From: Rob.MacLachlan at CMU-CS-SPICE at CMU-10A Subject: Hacking Horrors To: Brad.Allen@cmua Message-Id: <82/11/04 2208.881@CMU-CS-SPICE> I ran into my most obscure bug last summer when I was working on a boot image builder for Accent to run under Accent. What I had to do was convert the original program, which had POS filesystem calls that read and wrote random things scattered throughout it to use the Accent primitives, which are read and write an entire file. After factoring this code out into a separate module I found that the program died the same way about one time out of five. Since the debugger was virtually non-existent I proceeded to put in debugging code. First I put in a check where it was dying for the fatal condition, which would print various information. I found that when the error occurred the cause was that the Pascal Get intrinsic was returning a random value instead of the correct one, but no particular pattern was observable. I then put in code to dump the contents of the Pascal file object after every value read from the file to see if it was getting clobbered; with this code in place the program died with an illegal memory reference inside the system print routine inside of one of the debugging WriteLn's. 
At this point it was obvious that something earlier in the program was damaging the environment somehow, so I tried successively commenting out earlier parts of the program to find the offending code, and I found that if I did not read an earlier file, then the problem did not occur. This caused me to suspect my file handling module, so I put debugging code in it to check that all of the pointers it was returning were valid. When this debugging code was inserted the program then died earlier in the program, but this time consistently during the reading of the third microcode file. Insertion of debugging code at this point revealed that to a point the buffer contained the correct data, but the rest was zero. At this point I felt reasonably sure that I had found a bug in Accent, so I called in the wizards, who looked at the address of the buffer and said: 'Oh that crosses a 64k boundary'. Evidently it was a "Known" bug that a Pascal object could not cross a 64k boundary, because the address calculations wrap around, and the ReadFile routine I was calling read the file into a place in memory such that it crossed a 64k boundary. The execution of the debugging code I put in caused storage to be allocated, thus causing the heap to cross a 64k boundary earlier in the program. ----Message 54 (1784 chrs) is---- Mail-From: local user C410TL19 at 5-Nov-82 01:22:19-EST Date: 5 November 1982 0122-EST (Friday) From: Tom.Lane at CMU-10A To: Brad.Allen at CMU-10A Subject: Re: Hacking Horror Stories Message-Id: <05Nov82 012212 TL19@CMU-10A> Well, after reading your accumulated file I felt like I should contribute one of my own. I have spent too many years of my life hacking systems which tried to enlarge a processor's address space by using software-controlled bank switching (C.mmp/Hydra & Cm* locally, Hewlett-Packard 9845 out in the real world; personal computing CP/M systems seem to be going down the same garden path). 
These machines extend a processor with (say) a 64K address space to handle megabytes, by dividing the processor address space into two to 16 blocks. Each block is mapped to a block of physical memory by means of an associated processor register. Accessing a particular memory location requires loading up one of the map registers with the block number of the location, then accessing the processor-visible address "register number * block size + location's offset within block". This scheme is a LOSER. The majority of bugs found in each system I have worked with have been directly related to bank switching; it's just too easy to forget to load or restore a map register. This leads to reading or clobbering semi-random locations in blocks other than the one wanted. Worse, the bugs are often very difficult to duplicate, since they only show up when two data structures being manipulated at once happen to reside in different physical blocks. HP's testing records showed that 75% of the bugs discovered during system testing were of this ilk; many of them required an unreasonable amount of effort to track down. tom lane ======================== END OF FILE ============================