Originally, I intended to give this post the title “Anything that can go wrong…”, but I decided that Smrender and my talk are more important than Murphy’s law. But stay tuned! You will read what happened.
Yesterday I gave a talk about Smrender at the Linuxwochen 2012 in Vienna. Three things happened for the first time. First, it was my first visit to the FH Technikum Wien. Second, it was the first public talk about Smrender. And third, I ran into trouble with Smrender during the talk, something that had never happened before.
Regarding the university, in my opinion the student cafeteria is much better arranged than the one at the FH St. Pölten. It feels much more student-like, and they even sell draft beer 🙂 But there’s a big disadvantage: they do not sell Club Mate!
Outline of the Talk
I started the talk¹ by introducing myself and explaining my motivation for creating this new OSM rendering engine. I am a software developer, a yacht sailing trainer and instructor, and a contributor to several open source projects, so my interest in computer-aided marine mapping comes as no surprise.
I briefly presented the OpenSeaMap project, its goals, and its features.
The main advantage of Smrender is its ability to produce charts suitable for printing. Furthermore, it supports a flexible ruleset and several specific match operations. Additionally, Smrender provides a set of special features such as auto-rotation of captions and images, area-dependent caption sizes, closing of open polylines, polyline refinement, and many more.
Finally, I described some internal details such as memory structures, iterative rule processing, and so on.
Directly after the talk I intended to give a quick live demonstration of Smrender. As a practiced speaker, I had prepared everything in advance. I explained some of the command line arguments and ran the program. Everything looked fine, and then suddenly the program got stuck with the CPU running at 100%. This had never happened before. I’m serious. I terminated and restarted it, and it finished its run successfully.
Of course, I could not simply ignore the fact that my program got stuck in an endless loop. But how do you find such a bug? The debug output during the lecture showed that it got stuck somewhere in a function of libsmfilter. I tried to reproduce it by running the program over and over again. After about 15 runs it got stuck again at exactly the same position. I attached gdb to the running process and retrieved a stack trace 😉 The following is a snippet of it.
#0 __memcpy_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:1081
#1 0x00007f24366bc78f in vsector (o=0x30a46c0) at libsmfilter.c:273
#2 0x000000000040d9e5 in traverse (p=0x2dc5c70, rd=0x6154a0, dhandler=0x40d550 <apply_rules0>, idx=<optimized out>, d=16, nt=0x30a6d20) at smrender.c:353
It seemed to loop in __memcpy_ssse3(), which is a function inside libc. Stack frame #1 shows that it was called from line 273 of libsmfilter.c, which is a call to memcpy(). I had a closer look, and it turned out that I had misused the function: I copied an overlapping memory area. The man page is completely clear on that point: “The memory areas must not overlap.” Actually, I knew that, of course. Nevertheless, I wrongly used memcpy() instead of memmove(), and I do not know why. Probably I partially rewrote that part of the code, and the memory areas did not overlap in an earlier version. After replacing the function, the failure did not occur anymore.
But one question is still left: why had this never happened before?
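For readers who have not stumbled over this yet: memcpy() may copy the bytes in any order and in any chunk size, so its result is undefined if source and destination overlap, whereas memmove() behaves as if the data were first copied into a temporary buffer. The following C sketch illustrates the difference. It is not the actual code from libsmfilter.c, which is not shown here, and the helper names remove_at(), regions_overlap(), and checked_memcpy() are hypothetical. remove_at() deletes an array element by shifting the tail to the left, a typical overlapping copy, and checked_memcpy() is a debugging wrapper that catches overlapping arguments early instead of letting them end in undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns non-zero if the byte ranges [a, a+n) and [b, b+n) overlap.
 * The addresses are compared as integers because relational comparison
 * of pointers into unrelated objects is not portable C. */
static int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t) a, pb = (uintptr_t) b;
    return n > 0 && pa < pb + n && pb < pa + n;
}

/* Hypothetical debugging wrapper: behaves like memcpy() but fails the
 * assertion if the regions overlap. Note that assert() is compiled out
 * when NDEBUG is defined, so this only checks in debug builds. */
static void *checked_memcpy(void *dst, const void *src, size_t n)
{
    assert(!regions_overlap(dst, src, n) && "overlap: use memmove() instead");
    return memcpy(dst, src, n);
}

/* Removes element idx (idx < *len) from an int array by shifting the
 * tail one slot to the left. Source and destination overlap, so
 * memmove() is required; memcpy() here would be undefined behavior. */
static void remove_at(int *arr, size_t *len, size_t idx)
{
    memmove(&arr[idx], &arr[idx + 1], (*len - idx - 1) * sizeof *arr);
    (*len)--;
}
```

Besides such a wrapper, Valgrind's memcheck tool also reports overlapping memcpy() arguments ("Source and destination overlap in memcpy"), which can find this kind of bug without waiting for it to show up by chance.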
Some time ago I upgraded my system from Debian Squeeze (stable) to Wheezy (sid). The biggest differences with respect to development are the new Linux kernel (2.6.32 -> 3.2.0), a new version of libc (2.11 -> 2.13), and a much newer gcc (4.4 -> 4.6). Most probably there was an essential change in libc.
Another question is: why did it happen only sometimes and not always?
I do not know, but I guess it is a result of changing stack positions: in contrast to, for example, the program code itself, the absolute address of the stack changes every time you rerun a program (address space layout randomization).
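When comparing systems like this, it helps to know which libc a binary was built against and which one it actually runs with. The following is a minimal sketch, assuming glibc: gnu_get_libc_version() and the __GLIBC__/__GLIBC_MINOR__ macros are glibc-specific, and describe_libc() is a hypothetical helper name.

```c
#include <assert.h>
#include <gnu/libc-version.h> /* glibc-specific header */
#include <stdio.h>
#include <string.h>

/* Writes a short description into buf: the compile-time glibc version
 * (from the __GLIBC__ and __GLIBC_MINOR__ macros) and the run-time
 * version (from gnu_get_libc_version()). Both can differ if a binary is
 * built on one system and run on another. Returns snprintf()'s result. */
static int describe_libc(char *buf, size_t size)
{
    return snprintf(buf, size,
                    "compiled against glibc %d.%d, running on glibc %s",
                    __GLIBC__, __GLIBC_MINOR__, gnu_get_libc_version());
}
```

Printing this string from main() on the old and the new system makes a libc upgrade visible at a glance.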
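This is easy to observe. The following sketch (format_stack_address() is a hypothetical helper) formats the address of a local variable. Printed from main() in several runs, the value usually differs on a Linux system with address space layout randomization (ASLR) enabled. One plausible mechanism, which is an assumption and was not verified for this bug: optimized memcpy() variants such as __memcpy_ssse3() pick different code paths depending on the alignment of source and destination, so a different address layout can change how an overlapping copy behaves.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats the address of a local variable, i.e. a location on the
 * stack, into buf. With ASLR enabled, this address usually differs
 * between runs of the same binary. Returns snprintf()'s result. */
static int format_stack_address(char *buf, size_t size)
{
    int local = 0;
    return snprintf(buf, size, "stack variable at %p", (void *) &local);
}
```

For comparison, starting the program as setarch "$(uname -m)" -R ./prog disables randomization for that run, and the printed address then typically stays the same between runs.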
Lessons learned: even the best programmer may fail 😉