Your hardest/longest to catch bug

 
Ranch Hand
Posts: 310
18
MS IE Linux
As in the thread subject. It can be any language, maybe even a non-IT life situation.

One of mine dates from the time when I was still coding mainly in PHP. The company used custom software (written by me) that performed complex analysis of web traffic. For some reason, after a while, some parts of the analysis were obviously incorrect. We checked all the input data and everything was correct, so the problem definitely had its source in the software.

It took quite a long time to check every module responsible for collecting, processing and presenting data. I debugged almost every variable, array and object in the system. After drinking enormous amounts of coffee, I finally found the problem.

PHP is a loosely typed language, which means that a variable can hold any type of data. You can assign a string to a variable and then assign an int to it, and that is perfectly valid.

The problem lay in how one of our external libraries handled data. One of its methods returned the string "false" or "true" instead of a boolean value. That was the entire source of the problem. Every time the method returned the string "false", I was checking whether the result was a boolean true. In PHP, a non-empty string (other than "0") evaluates to boolean true, so the string "false" was always treated as true.

I finally figured out it was a string when I noticed the double quotes around "false" in the debugger. Boolean values are displayed without quotes. Spotting this simple difference cost me a long and miserable time.
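
The same trap can be sketched in Python, which (like PHP) treats a non-empty string as true. This is only an illustration, not the original PHP code or the library involved; `is_true_naive` and `parse_bool_strict` are hypothetical names.

```python
def is_true_naive(value):
    """Truthiness check: any non-empty string counts as true."""
    return bool(value)


def parse_bool_strict(value):
    """Accept real booleans or the exact strings 'true'/'false'; reject the rest."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text == "true":
            return True
        if text == "false":
            return False
    raise ValueError(f"not a boolean value: {value!r}")


if __name__ == "__main__":
    result = "false"  # what the library handed back: a string, not a boolean
    print(is_true_naive(result))      # the string is non-empty, so this is truthy
    print(parse_bool_strict(result))  # the strict parser recovers the intended value
```

Normalising the value once, at the boundary with the external library, keeps the rest of the code free to use plain boolean checks.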
 
lowercase baba
Posts: 12871
62
Chrome Java Linux
Two bugs come to mind...both in C

One was an overly complicated printf statement. The format string had leading zeros, decimals, strings, and on and on and on... Somewhere, buried in the middle of it, there was a transposed "." and "%". Since you never REALLY look at the format string, it took me a while to see it.

The other was a report some facility ran. They said it printed garbage. They had been using it for years, but now it didn't work. I got the ticket a day later, signed on, and ran the report with no problem. I tried for 2-3 days and could not reproduce the bug, so I closed it out as unreproducible.

The next week it came back. Again, I spent a few days trying to reproduce it, with no luck. I told them that AS SOON as they saw it again, they should call me directly.

Long story short: there was a string defined to hold a formatted date/time. After we made our Y2K fixes to print a four-digit year, that string was ONE character too short, but only if the report was run on a Wednesday in September, on or after the 10th of the month, between 10:00a and 12:59p.
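
Why only Wednesdays in September? Those are the longest English day and month names, and a two-digit day and hour add the last characters. A small Python sketch can confirm this by brute force. The timestamp layout in `format_stamp` is an assumption (the real report's format is not given), and both function names are hypothetical.

```python
from datetime import datetime, timedelta

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def format_stamp(t):
    """Assumed layout: full day name, full month name, unpadded day and 12-hour clock."""
    hour12 = t.hour % 12 or 12
    suffix = "a" if t.hour < 12 else "p"
    return (f"{DAYS[t.weekday()]} {MONTHS[t.month - 1]} {t.day}, "
            f"{t.year} {hour12}:{t.minute:02d}{suffix}")


def longest_stamps(year):
    """Check every hour of the year; return the maximum length and when it occurs."""
    t = datetime(year, 1, 1)
    best, hits = 0, []
    while t.year == year:
        length = len(format_stamp(t))
        if length > best:
            best, hits = length, [t]
        elif length == best:
            hits.append(t)
        t += timedelta(hours=1)
    return best, hits


if __name__ == "__main__":
    best, hits = longest_stamps(2000)
    print(best, format_stamp(hits[0]), len(hits))
```

Under this assumed layout the two-digit hours late in the evening and at midnight qualify as well, so a buffer sized from a typical sample would overflow only a handful of hours per year. Sizing the buffer from the worst case rather than from an example is the safer habit.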

 
Bartender
Posts: 9615
16
Mac OS X Linux Windows
When I was a lowly programming intern, I heard it said that the MS C compiler the company used was garbage because they always had to use debug settings for their code. If one tried to use the compiler options to make code more "compact", the code would crash. Putting the debug settings back would make the issue impossible to reproduce; therefore, it was the compiler's fault. Since this was before the dawn of the Internet, if knowledge wasn't in your employees' experience or in the reference manuals on your desk, it was unobtainable. It wasn't until I finished my internship and Java came out that I realized they likely had buffer overruns in their code and the debug code inserted by the compiler was insulating them from the catastrophic consequences.
Right now we're dealing with an issue going on two weeks or so. Parsing XML produces an exception between Xerces classes in an external JAR and classes of the same name that were incorporated in the JDK in Java 5. Removing the redundant external JAR requires a cascade of other JAR upgrades or removals because of hard-coded dependencies between libraries. No idea why this is cropping up now, when we've been on Java 6 for years...
 
Java Cowboy
Posts: 16084
88
Android Scala IntelliJ IDE Spring Java
  • Likes 1
There are different kinds of bugs:

1) Bugs that take a long time to find the cause of, but once you've found the cause, the fix is quick and easy.

2) Bugs that take a short time to find the cause of, but for which the fix is very hard or practically impossible.

The first kind of bugs are usually implementation bugs, so just a simple mistake somewhere hidden in the code, like the ones that the people above mentioned.

The second kind of bugs are usually bugs in the architecture or design. They are most of the time very hard to fix, because the whole idea behind how the software works is wrong. You'd have to redesign and rewrite (part of) the software to really fix it, which is often not practically possible. The final "solution" then sometimes consists of a workaround to avoid the problem.

The hardest bugs I've seen are also of the second type. One that I remember was on a system where multiple processes were writing data to a single database in a number of steps. Under rare circumstances, if the timing was wrong, you could get a deadlock - both processes were waiting for the other one to continue.
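
For illustration only: one common way to avoid this class of deadlock is to make every writer acquire shared resources in the same global order. The Python sketch below uses threads and in-memory "tables" as stand-ins; it is not the system described above, and `Table`, `write_both` and `run_writers` are hypothetical names.

```python
import threading


class Table:
    """Stand-in for a shared resource guarded by its own lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.rows = []


def write_both(first, second, value):
    """Write to two tables, always locking them in name order."""
    a, b = sorted((first, second), key=lambda table: table.name)
    with a.lock:
        with b.lock:
            first.rows.append(value)
            second.rows.append(value)


def run_writers(rounds=1000):
    """Run two writers that name the tables in opposite order."""
    orders, stock = Table("orders"), Table("stock")

    def writer(x, y):
        for i in range(rounds):
            write_both(x, y, i)

    workers = [
        threading.Thread(target=writer, args=(orders, stock), daemon=True),
        threading.Thread(target=writer, args=(stock, orders), daemon=True),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join(timeout=10)
    finished = not any(w.is_alive() for w in workers)
    return finished, len(orders.rows), len(stock.rows)


if __name__ == "__main__":
    print(run_writers())
```

If each writer locked the tables in the order it was given instead, the two threads could each hold one lock and wait forever for the other. As the post says, retrofitting a rule like this onto an existing multi-process design is often the hard part.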
 
Ranch Hand
Posts: 789
Python C++ Linux
  • Likes 1
One was in a piece of firmware: there was a clock chip I had to read and write to. Writing a new time to the clock was fine, but if you read the clock, changed the value and wrote it back, it would be off by a small amount. This would accumulate into a bad error. It was a bug in the chip.

Another was an IDE I wrote in Turbo C to run on DOS. Every now and then it would just crash. I went over every line many times but never found the problem. It might have been the compiler.

Also, I was working for a guy who needed something done, and he had a bunch of bare boards for a Z80. I put one or two together and they would do erratic things. It was exacerbated because I was adapting a C compiler that I had the source to so that it would emit Z80 assembly. After a week or two of this the guy said so-and-so could never get them working either. At that point I knew something was up with the boards. I gave one a really good scrubbing to get rid of possible microshorts and that improved it 90%. I eventually changed the platform, though. I wasn't going to fight that.

Another thing, same guy: he was modifying a piece of equipment in a factory. It used three stepper motors to move a heavy thing upward by some critical amount. The motors, the controller, and the software the controller ran were all from the same company. There was feedback from the motors giving the amount they actually traveled, so you could theoretically be very accurate. He would say it was very good today, only off by .002". Next time, he would say it was pretty bad now, off by .002". Come to find out there was no spec: the spec was only "as good as possible". And he actually didn't know what good enough was. I think temperature was having an influence too. Expansion and contraction. That project eventually got abandoned. He was too disorganized. It does feel good now to have made most of my mistakes already.


 
Guillermo Ishi
Ranch Hand
Posts: 789
Python C++ Linux
Another time I was doing firmware at a start-up. This was a highly financed place full of ex-IBMers. All of a sudden a new batch of boards would act up sometimes. After a couple of days the head of engineering came in, intending to fix it himself, and started going on about a code problem. I said I had looked over the code and there was nothing in it that would cause what they were seeing. It was assembly language, so I had good confidence in that. He was getting red hot by now, so we went into the lab together, the two of us and about four technicians, and he started probing the board and soldering on test wires. It was a little board, about 3"x3". All of a sudden something happened on the scope and he said "See, there it is!" This happened a couple more times. "Fix the code," he said. I noticed it was happening only when he held the board a certain way. I took it and started twisting it and could make it happen at will. Manufacturing problem.
 
Sheriff
Posts: 3837
66
Netbeans IDE Oracle Firefox Browser
At the start of my career (nearly twenty years ago) I was working on a project which ran iterations over a third-party program that did some simulations, and we collected statistics from the output values. We'd typically run 100 iterations, which took approximately 12 hours to complete. At some point it started to crash on the 100th iteration. Of course, the first thing I tried was to reduce the number of iterations, but the bug didn't reproduce with one, two, three, four or ten iterations. Then I tried skipping the actual simulations by the external program and dry-running the 100 iterations, but again the bug didn't appear. I then resorted to debugging the thing, which meant that in the morning I started the debugger, waited till the evening, stepped through the code, found out that it wasn't what I suspected, formed a new theory about the bug, started it another time and went to sleep. Next morning I stepped through the code a little bit, found out that it wasn't what I suspected, formed a new theory about the bug... It took me several days (the better part of a week) to figure it out.

Once I had identified the bug, it was easy to fix (confirming Jesper's bug classification), and it was also easy to figure out how to reproduce the bug in fewer than 100 iterations (which at least allowed me to test the fix in less than 12 hours). I must admit that the details are a bit hazy after all those years, but it was one of the most intriguing bugs I've experienced.

I believe that today I'd be able to do better - I no longer remember why I hadn't put extensive logging in instead of debugging interactively, for example.

There's one more memory connected to the project. Our program prepared the simulation data and wrote it to files, ran the external program, read the output files (in text format), and collected the statistics. I used a profiler on it and found out that a full 30% of our program's run time was spent in C's scanf routine. Since the external program produced all its output values in a specific, unchanging format, I was able to replace the scanf call with my own procedure tailored to that specific format. It reduced the time it took to parse the data by some 90%. Since then I've never managed to achieve such a dramatic improvement.
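
The idea of a parser tailored to a rigid format can be sketched as follows. This is a Python illustration, not the original C routine, and the fixed-width layout (three values, 12 characters each) is an assumption, since the real output format is not described. `parse_fixed_row` is a hypothetical name.

```python
def parse_fixed_row(line, width=12, count=3):
    """Read `count` floats from fixed-width columns of `width` characters each.

    Because the column positions never change, there is no format string
    to interpret and no searching for separators: each field is sliced
    straight out of the line.
    """
    return [float(line[i * width:(i + 1) * width]) for i in range(count)]


if __name__ == "__main__":
    # Build a sample row in the assumed layout, then parse it back.
    row = f"{1.5:12.6f}{-2.25:12.6f}{3.0:12.6f}"
    print(parse_fixed_row(row))
```

The speed-up reported above came from C, where a hand-written routine avoids scanf's run-time format interpretation; the Python version only shows the shape of the approach and says nothing about performance.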
 
Martin Vashko
Sheriff
Posts: 3837
66
Netbeans IDE Oracle Firefox Browser

Joe Ess wrote:When I was a lowly programming intern, I heard it said that the MS C compiler the company used was garbage because they always had to use debug settings for their code. If one tried to use the complier options to make code more "compact", the code would crash. Putting the debug settings back would make the issue impossible to reproduce, therefore, it was the compiler's fault. Since this was before the dawn of the Internet, if knowledge wasn't in one of your employee's experience or in the reference manuals on your desk, it was unobtainable. It wasn't until I finished my internship and Java came out that I realized they likely had buffer overruns in their code and the debug code inserted by the compiler was insulating them from the catastrophic consequences.


Yeah, I know that one too. There was one more difference between "debug" and "release": in a debug build, global variables were initialized to zero. In a release build, global variables were essentially random.

I remember discovering a library in MS C/C++ (in late nineties) that guarded memory allocations against overruns and helped identify memory leaks. It was a tremendous help. Actually, it is still here: https://msdn.microsoft.com/en-us/library/x98tx3cf.aspx
 
Ranch Hand
Posts: 574
VI Editor Chrome Linux
Our system used to crash every few hours on average. It was completely random, sometimes it would run for a few minutes, other times it would run overnight. We even had a magic laptop that seemed to make the system crash more often. The code was rock-solid, it had been in use for years. It was just running on a new system.

The crash was the watchdog timer firing, so memory was still intact. I figured out which initialization routine configured the memory management system, then found out which process was running when it died. I then created a circular array and put checkpoints in it as the code executed. When it crashed I'd look at my array to see where the code went, then add more instrumentation. Eventually I narrowed it down to a single register: once in a while, a write to it would hang.
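
The circular checkpoint array can be sketched like this. The original was firmware inspected over JTAG; this Python version only shows the data structure, and `TraceRing` with its methods is a hypothetical name.

```python
from collections import deque


class TraceRing:
    """Fixed-size ring of checkpoints: new entries overwrite the oldest."""

    def __init__(self, capacity=8):
        # A deque with maxlen drops the oldest item automatically when full.
        self._ring = deque(maxlen=capacity)

    def checkpoint(self, label):
        """Record that execution reached `label`."""
        self._ring.append(label)

    def dump(self):
        """Return the surviving checkpoints, oldest first."""
        return list(self._ring)


if __name__ == "__main__":
    trace = TraceRing(capacity=3)
    for step in ["init_mmu", "start_task", "read_sensor", "write_reg"]:
        trace.checkpoint(step)
    print(trace.dump())  # only the last three checkpoints remain
```

The fixed capacity is the point: the trace costs a constant amount of memory no matter how long the system runs before it hangs, and the last few entries are exactly what you want to read after the watchdog fires.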

Total time to find/fix? Maybe 4 hours. Total elapsed time? About 2 weeks, most of it twiddling my thumbs waiting for a crash. The real problem was I only had 1 JTAG debugger, and couldn't borrow another one, so when I say I spent 2 weeks twiddling my thumbs I mean that literally.
 
Greenhorn
Posts: 12
1
There are two that stand out the most.

Many years ago I was writing a game in C++ and DirectX, and the final boss would follow a path, but once he got to a certain position in that path, he went straight for a corner of the screen. It took me three months to hunt down, and it was due to a completely unrelated part of the game using the wrong variable (a "j" instead of an "i") to index an array, which was somehow overwriting the path logic. Moral of the story? Write clean code!

The second bug (well, "issue" is more precise), which I identified recently, was due to my lack of knowledge of power supplies. In a nutshell, I enjoy software rendering with Java (Graphics and BufferedImages, you get the idea), and for the life of me I could not figure out why neither the Raspberry Pi 1 nor the 2 could handle software rendering. For the last year I had been pulling every trick in the book to speed up my rendering code, and yet my demos were horribly slow. Here's what it was: the official manual for the Raspberry Pi recommends a 5 V, 700 mA supply, so I thought I'd be clever and use a 1000 mA adapter for peace of mind... until last month, when I came across the "Official" Raspberry Pi power adapter: 2000 mA? Looking at the adapter for the model 3 Pi, it was 2500 mA! With that adapter, my collection of Pis was suddenly able to handle resolutions of 800*600. The Pi model 1, sadly, does not hold up because it is single-core, so just moving the mouse around or pressing keys on the keyboard will cause lag, but it was incredible to discover what a difference a good power supply can make. Also, to improve things further, I read up on how to create an event loop that sustains 30 fps, which was the icing on the cake. On one hand I feel gutted that it took me a year to discover the root of the problem, but on the other the journey taught me so much about graphics programming that it was totally worth it. It at least allows me to say confidently that the Raspberry Pi model 3 is a BEAST: it performs significantly better than the model 2 and totally destroys the model 1. If only they had managed to beef up the memory to 2GB, it would have been perfection.
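
A loop that holds a steady frame rate can be sketched as below. This is a generic fixed-rate loop in Python, not the poster's Java code; `run_frames` and its parameters are hypothetical, and the clock and sleep functions are passed in so the loop can be exercised without real delays.

```python
import time


def run_frames(render, frames, fps=30, clock=time.monotonic, sleep=time.sleep):
    """Call render(n) for each frame, sleeping so frames start 1/fps apart."""
    frame_time = 1.0 / fps
    next_deadline = clock()
    for n in range(frames):
        render(n)
        next_deadline += frame_time
        remaining = next_deadline - clock()
        if remaining > 0:
            # Finished early: wait out the rest of this frame's time slot.
            sleep(remaining)
        else:
            # Rendering overran the slot: restart the schedule from now
            # instead of rushing frames to catch up.
            next_deadline = clock()


if __name__ == "__main__":
    run_frames(lambda n: print("frame", n), frames=5, fps=30)
```

Scheduling against a running deadline, rather than sleeping a fixed amount after each frame, keeps the rate steady even when the render time varies from frame to frame.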

Otherwise, I haven't come across many bugs of late, as I'm writing much cleaner code. ^_^
 