“I coulda had a V8!”

Valgrind [1].

Why didn't I think of that sooner [2]?

Anyway, I installed valgrind on the virtual server and started up a test run [3] to see if I could locate the current bug.

Not two minutes in, and I got the following:

==27967== Use of uninitialised value of size 4
==27967== at 0x80508DF: crc32 (crc32.c:83)
==27967== by 0x804A6C1: send_packet (main.c:566)
==27967== by 0x80496FC: mainloop (main.c:151)
==27967== by 0x80493A5: main (main.c:68)
==27967==
==27967== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==27967== at 0x40EDFD1: sendto (in /lib/tls/libc-2.3.4.so)
==27967== by 0x80496FC: mainloop (main.c:151)
==27967== by 0x80493A5: main (main.c:68)
==27967== Address 0xBEEDB2EA is on thread 1's stack
==27967==
==27967== Source and destination overlap in memcpy(0x4251028, 0x4251028, 256)
==27967== at 0x401D236: memcpy (mac_replace_strmem.c:394)
==27967== by 0x804F00D: tuple_expire (tuple.c:231)
==27967== by 0x804BC3D: handle_sigalrm (signals.c:278)
==27967== by 0x804B6BF: check_signals (signals.c:50)
==27967== by 0x80493BD: mainloop (main.c:83)
==27967== by 0x80493A5: main (main.c:68)

I was mystified by that first warning, as the function in question isn't all that big:

CRC32 crc32(CRC32 crc,const void *data,size_t size)
{
const byte *p = data;
while(size--)
{
crc = ((crc >> 8) & 0x00FFFFFF) ^ m_crc32_table[ (crc ^ *p++) & 0xFF ];
}
return(crc);
}

What could it possibly be complaining about? Only when investigating the second warning did I realize what the problem was—while I wasn't calculating the CRC (Cyclic Redundancy Check) over the entire structure, valgrind didn't know that. That explains the first two warnings. The third one deals with this bit of code:

for (i = j = 0 ; i < g_poolnum ; i++)
{
/* other code */
if (difftime(Tao,g_pool[i].atime) < c_timeout_gray)
{
g_pool[j++] = g_pool[i];
continue;
}
/* rest of loop */
}

g_pool is an array of structures, and I'm using the C89 feature of structure assignment to avoid having to write memcpy(&g_pool[j++], &g_pool[i], sizeof(struct tuple)). But memcpy() (which is what the compiler changes the code to internally) has undefined semantics when the memory blocks overlap, and they do when i == j. This doesn't seem to cause a problem, but who knows?

I also found out that valgrind stops reporting errors after 100,000 are found:

==27967== More than 100000 total errors detected. I'm not reporting any more.
==27967== Final error counts will be inaccurate. Go fix your program!
==27967== Rerun with --error-limit=no to disable this cutoff. Note
==27967== that errors may occur in your program without prior warning from
==27967== Valgrind, because errors are no longer being displayed.

Most of these deal with the child process created to save the state. I don't bother with cleaning up too much since upon calling _exit() all memory is reclaimed anyway, so I'm not terribly concerned about it. But I did lose two hours of testing to this.

Update on Tuesday, October 16^16, 2007, 6 hours, 20 minutes and 9 seconds after starting the current test run of the greylist daemon [4]

Bad news—it's still running.

You might think this is good news, but it's not. It means that the debugging conditions have changed the conditions the code runs under, so something very subtle is happening.

I did, however, make one small change to the code, clearing up two of the warnings mentioned above. I would hate to think that making that one small change fixed the program. If it's still running in the morning, then I'll run it again without valgrind and see if it works.

Sigh.

=> [1] http://valgrind.org/ | [2] /boston/2007/10/13.1 | [3] /boston/2007/09/24.3 | [4] /boston/2007/08/16.1

=> Gemini Mention this post | Contact the author

Proxy Information
Original URL
gemini://gemini.conman.org/boston/2007/10/15.1
Status Code
Success (20)
Meta
text/gemini
Capsule Response Time
772.936275 milliseconds
Gemini-to-HTML Time
2.974378 milliseconds

This content has been proxied by September (ba2dc).