PSA: Potential libstdc++ hang in std::filesystem::symlink_status


Something I caught while running TLGS's crawler. Because of how drogon works, it must always be able to open a new socket for a TCP connection; otherwise the entire process is killed. But Gemini does not support multiple requests on the same connection, so it's easy to end up with a lot of open sockets (though most are waiting to be closed). My hack to resolve this conflict is to periodically walk /proc/self/fd/, count how many sockets are open, and rest if there are too many. The socket counter looks like this:

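#include <cstddef>
#include <filesystem>

// Count this process's open file descriptors that refer to sockets.
// Each entry in /proc/self/fd/ is a symlink; for a socket its target
// looks like "socket:[<inode>]".
size_t countOpenSockets()
{
    namespace fs = std::filesystem;
    size_t count = 0;
    for(const auto& fd : fs::directory_iterator(fs::path("/proc/self/fd/"))) {
        // Throwing overloads; std::string::starts_with needs C++20.
        if(fs::is_symlink(fd)
            && fs::read_symlink(fd).generic_string().starts_with("socket:["))
            count++;
    }

    return count;
}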
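One caveat with the counter above: is_symlink() and read_symlink() are the throwing overloads, so if another thread closes an fd after is_symlink() succeeds but before read_symlink() runs, std::filesystem::filesystem_error is thrown out of the crawl loop. Below is a minimal sketch of a non-throwing variant, assuming Linux and C++17; the name countOpenSocketsNoThrow is hypothetical. It reports every failure through std::error_code and skips the entry. It is not a fix for the hang described below, only for the exception path.

```cpp
#include <cassert>       // assert(): only for the usage check described below
#include <cstddef>
#include <filesystem>
#include <string>
#include <system_error>
#include <sys/socket.h>  // socket(): only for the usage check
#include <unistd.h>      // close(): only for the usage check

// Hypothetical non-throwing variant of countOpenSockets(). Errors (the
// directory failing to open, an fd vanishing mid-scan) are reported through
// std::error_code and skipped instead of being thrown.
std::size_t countOpenSocketsNoThrow()
{
    namespace fs = std::filesystem;
    std::size_t count = 0;

    std::error_code ec;
    fs::directory_iterator it(fs::path("/proc/self/fd/"), ec);
    const fs::directory_iterator end;
    // increment(ec) sets ec instead of throwing; the loop stops on the first error.
    for(; !ec && it != end; it.increment(ec)) {
        std::error_code linkEc;
        // Entries in /proc/self/fd are always symlinks, so read the target
        // directly; a socket's target starts with "socket:[".
        const fs::path target = fs::read_symlink(it->path(), linkEc);
        if(!linkEc && target.native().rfind("socket:[", 0) == 0)
            ++count;
    }
    return count;
}
```

As a quick check, creating one socket with socket(AF_UNIX, SOCK_STREAM, 0) should raise the count by exactly one, and close() should bring it back down.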

This kind of works. But sometimes, for unknown reasons, the crawler hangs completely and consumes ~25% CPU (mostly in kernel space). Attaching a debugger to the process and dumping the stack trace shows the following:

Thread 2 (Thread 0x7fef8d775640 (LWP 1439864) "DrogonIoLoop"):
#0  __GI___fstatat64 (fd=-100, file=0x7fef88ebeb10 "/proc/self/fd/647", buf=0x7fef8d772290, flag=256) at ../sysdeps/unix/sysv/linux/fstatat64.c:166
#1  0x00007fef8e464a82 in std::filesystem::symlink_status(std::filesystem::__cxx11::path const&, std::error_code&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007fef8e464deb in std::filesystem::symlink_status(std::filesystem::__cxx11::path const&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00005628fe869fd3 in countOpenSockets() ()
#4  0x00005628fe876c6d in GeminiCrawler::dispatchCrawl()::{lambda()#1}::operator()(GeminiCrawler::dispatchCrawl()::{lambda()#1}::operator()()::_ZZN13GeminiCrawler13dispatchCrawlEvENUlvE_clEv.Frame*) [clone .actor] ()
#5  0x00005628fe996aa8 in trantor::TimerQueue::handleRead() ()
#6  0x00005628fe9832e0 in trantor::Channel::handleEventSafely() ()
#7  0x00005628fe9789b0 in trantor::EventLoop::loop() ()
#8  0x00005628fe979ba8 in trantor::EventLoopThread::loopFuncs() ()
#9  0x00007fef8e3c22c3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007fef8e130b43 in start_thread (arg=) at ./nptl/pthread_create.c:442
#11 0x00007fef8e1c2a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

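At first glance this doesn't look right: fstatat is being called with a negative fd (-100). On Linux, though, -100 is AT_FDCWD (resolve relative paths against the current directory), which is ignored for an absolute path like /proc/self/fd/647, and flag=256 is AT_SYMLINK_NOFOLLOW, which is what symlink_status needs; so the arguments themselves are probably fine. Either way, a stat call should return, at worst with an error, rather than hang. So I assume this is not a bug in the Linux kernel itself. Instead, I'm guessing libstdc++ is not handling something correctly here.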

To work around the issue, I replaced the check based on C++'s filesystem module with one using C's stat function. Unfortunately, my crawler still hangs, but this time the culprit seems to be my homebrew lock-free algorithm.
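The stat-based replacement isn't shown here, so what follows is only a sketch of what such a check could look like, assuming Linux and a POSIX libc; the name countOpenSocketsStat is hypothetical. Instead of reading the link text, it calls stat() on each /proc/self/fd entry. stat() follows the link to the open file itself, and S_ISSOCK on the resulting mode tells whether that file is a socket. A failing stat() (for example, an fd closed mid-scan) just skips the entry.

```cpp
#include <cassert>       // assert(): only for the usage check described below
#include <cstddef>
#include <dirent.h>
#include <string>
#include <sys/socket.h>  // socket(): only for the usage check
#include <sys/stat.h>
#include <unistd.h>      // close(): only for the usage check

// Hypothetical stat()-based socket counter (not the author's actual code).
std::size_t countOpenSocketsStat()
{
    DIR* dir = opendir("/proc/self/fd");
    if(dir == nullptr)
        return 0;

    std::size_t count = 0;
    while(const dirent* entry = readdir(dir)) {
        // Skip "." and ".."; every other name is an fd number.
        if(entry->d_name[0] == '.')
            continue;
        const std::string path = std::string("/proc/self/fd/") + entry->d_name;
        struct stat st;
        // stat() (not lstat()) follows the link to the open file, so a socket
        // fd reports S_IFSOCK. If the fd vanished, stat() fails and we skip it.
        if(stat(path.c_str(), &st) == 0 && S_ISSOCK(st.st_mode))
            ++count;
    }
    closedir(dir);
    return count;
}
```

The directory fd opened by opendir() also shows up in the listing, but it is a directory, so it is never counted. Opening one socket with socket(AF_UNIX, SOCK_STREAM, 0) should raise the count by exactly one, and close() should bring it back down.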

Original URL
gemini://gemini.clehaxze.tw/gemlog/2022/05-15-psa-potential-stdlibcpp-hang-filesystem.gmi