Description of problem:
Fedora builds cURL against NSS, and the integration has a bug: libcurl asks NSS to receive data with no timeout, regardless of the timeout option set in libcurl.
Version-Release number of selected component (if applicable):
All (including upstream's master branch)
I'm leaving out reproduction data because we're already looking at a fix upstream:
http://comments.gmane.org/gmane.comp.web.curl.library/39357
I'll post back once it's in libcurl's master branch.
I believe this is already fixed in rawhide and upstream:
http://github.com/bagder/curl/commit/9d0af301
How are you confirming that the recv/send functions provided by NSS can block?
I think you're right. I'm using the F17 curl package, which predates the fix for properly setting the non-blocking status for NSS.
I just looked at the relevant NSS source. It seems to always assume the sockets it uses could be non-blocking, and it simulates blocking behavior by looping internally until the timeout is hit if fd->secret->nonblocking is false for the connection. In F18 and earlier, I'm guessing it thinks that nonblocking variable is false right now, based on behavior in my traces. But the timeout is set to a PRIntervalTime of PR_INTERVAL_NO_TIMEOUT. So, in F18 and earlier, with libcurl (1) setting the actual socket to non-blocking and (2) *not* setting the non-blocking status properly for NSS, NSS just polls for at least eight hours.
In any case, I'll update upstream and still push for using a timeout of PR_INTERVAL_NO_WAIT. It will help unmask future regressions around non-blocking status in NSS and increase code clarity around how the send and receive functions behave in potentially blocking scenarios.
Would you be amenable to back-porting the fix in setting the non-blocking status to F17 and F18 packages? It would be very helpful to have HTTPS timeouts work properly in these existing releases.
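The behavior described in this comment can be illustrated with a small Python sketch. This is not NSS or libcurl code; it only mimics the pattern of a library layer that emulates a blocking receive on top of a non-blocking socket. The names `emulated_recv`, `NO_WAIT`, `NO_TIMEOUT`, `layer_nonblocking`, `poll_slice`, and `demo_cap` are all hypothetical: the first constants stand in for PR_INTERVAL_NO_WAIT and PR_INTERVAL_NO_TIMEOUT, `layer_nonblocking` stands in for fd->secret->nonblocking, and `demo_cap` exists only so the "no timeout" case terminates in a demo.

```python
import select
import socket
import time

NO_WAIT = 0.0      # hypothetical stand-in for PR_INTERVAL_NO_WAIT
NO_TIMEOUT = None  # hypothetical stand-in for PR_INTERVAL_NO_TIMEOUT


def emulated_recv(sock, nbytes, timeout, layer_nonblocking,
                  poll_slice=0.05, demo_cap=0.3):
    """Receive from a non-blocking socket the way a wrapping layer might.

    If the layer knows the socket is non-blocking (layer_nonblocking=True),
    or the caller passes NO_WAIT, "would block" is reported immediately.
    Otherwise the layer emulates a blocking call by polling until data
    arrives or the timeout expires. With NO_TIMEOUT that loop never ends on
    its own; demo_cap (an assumption of this sketch) cuts it short.
    """
    start = time.monotonic()
    while True:
        try:
            return sock.recv(nbytes)
        except BlockingIOError:
            if layer_nonblocking or timeout == NO_WAIT:
                raise  # the caller can apply its own timeout logic
            elapsed = time.monotonic() - start
            if timeout is not NO_TIMEOUT and elapsed >= timeout:
                raise TimeoutError("receive timed out")
            if elapsed >= demo_cap:
                raise RuntimeError("still polling (demo cap reached)")
            # Wait a short slice for readability, then try again.
            select.select([sock], [], [], poll_slice)


if __name__ == "__main__":
    reader, writer = socket.socketpair()
    reader.setblocking(False)  # what the caller does to the real socket

    for label, timeout, flag in [
        ("layer unaware, NO_TIMEOUT", NO_TIMEOUT, False),
        ("layer unaware, NO_WAIT", NO_WAIT, False),
        ("layer aware, NO_TIMEOUT", NO_TIMEOUT, True),
    ]:
        try:
            emulated_recv(reader, 16, timeout, flag)
        except Exception as exc:
            print(label, "->", type(exc).__name__)
```

In the first case the layer keeps polling on the caller's behalf, so any timeout the caller wanted to enforce never gets a chance to run. In the other two cases control returns right away, which is why either fixing the non-blocking flag or passing a zero wait avoids the long poll loop.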
Setting version to F19. In all cases where I mention F17 and F18, I believe I actually mean "F19 and earlier."
Also, let me directly respond to your question.
> How are you confirming that the recv/send functions provided by NSS can block?
I never saw them blocking, just looping around a poll() call for hours even with a three-minute timeout set in libcurl. I think this is because libcurl sets up the socket as non-blocking, but NSS loops around a poll() unless it's also set properly to do non-blocking.
I'm trying to test out this change on F17, but I keep running into pycurl issues. After rebuilding curl, libcurl-devel, libcurl, and curl-debuginfo and then installing them, pycurl ceases to work with this error:
ImportError: build/lib.linux-x86_64-2.7/pycurl.so: undefined symbol: CRYPTO_num_locks
I also can't rebuild pycurl with my rebuilds of curl in place. It's funny because pycurl seems to expect OpenSSL resources (like the CRYPTO_num_locks symbol), but curl builds with NSS in the Fedora packages. It's not clear how to make pycurl happy. And, of course, a broken pycurl means a broken Yum.
> I also can't rebuild pycurl with my rebuilds of curl in place. It's funny because pycurl seems to expect OpenSSL resources (like the CRYPTO_num_locks symbol), but curl builds with NSS in the Fedora packages. It's not clear how to make pycurl happy.
Maybe it's because I'm disabling libssh2 in my curl build? Keeping libssh2 breaks the curl build with a Valgrind test failure, though.
Okay, things work now with libssh2 enabled in the build and test582 disabled. I can't get the build to work with test582 enabled, even for the stock SRPM.
(In reply to comment #4)
> Also, let me directly respond to your question.
>
> > How are you confirming that the recv/send functions provided by NSS can block?
>
> I never saw them blocking, just looping around a poll() call for hours even
> with a three-minute timeout set in libcurl. I think this is because libcurl
> sets up the socket as non-blocking, but NSS loops around a poll() unless
> it's also set properly to do non-blocking.
If you saw poll() being called repeatedly by strace, it was most likely happening at the NSS level, which means the PR_Recv/PR_Send() calls were seen as blocking by libcurl. This should not happen in rawhide and will be fixed in stable Fedora releases.
(In reply to comment #7)
> Okay, things work now with libssh2 enabled in the build and test582
> disabled. I can't get the build to work with test582 enabled, even for the
> stock SRPM.
I suspect you are hitting bug #821440 -- you can either upgrade nss-softokn to a version that contains the fix, or create a valgrind suppression for that memory leak.
already fixed in curl-7.29.0-3.fc19
curl-7.24.0-9.fc17 has been submitted as an update for Fedora 17. http://admin.fedoraproject.org/updates/curl-7.24.0-9.fc17
curl-7.27.0-10.fc18 has been submitted as an update for Fedora 18. http://admin.fedoraproject.org/updates/curl-7.27.0-10.fc18
> If you saw poll() being called repeatedly by strace, it was most likely happening at the NSS level, which means the PR_Recv/PR_Send() calls were seen as blocking by libcurl.
That is correct. It's the effect of giving NSS a non-blocking socket fd but not telling it to treat it as non-blocking.
> This should not happen in rawhide and will be fixed in stable Fedora releases.
Thank you. You're awesome. This is why I love free, open-source software.
Package curl-7.24.0-9.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing curl-7.24.0-9.fc17'
as soon as you are able to.
Please go to the following url:
http://admin.fedoraproject.org/updates/FEDORA-2013-7797/curl-7.24.0-9.fc17
then log in and leave karma (feedback).
curl-7.27.0-10.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.
curl-7.24.0-9.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.