htslib seek/libcurl_seek() doesn't fail on errors #604

@ramyala

The following Python code calls into htslib's seek() and hts_itr_next() to read alignments. When libcurl_seek fails, the code still completes without error and produces erroneous output (zero reads in the logs below). I have seen this happen when connections fail with SSL issues or 503 Service Unavailable errors. Ideally, libcurl_seek would return a failure code instead of success, which would let pysam raise an exception so the user can either re-establish the connection or handle the issue gracefully.

num_reads = 0
# bam: an already opened pysam.AlignmentFile for the remote BAM
for read in bam.fetch(chrom, start, end):
    process(read)
    num_reads += 1
print("my_process:Total reads:", num_reads)
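
To illustrate the requested behavior, here is a minimal sketch of how a caller could recover if a failed seek surfaced as an exception. `SeekError` and `count_reads_with_retry` are hypothetical names introduced only for this example and are not part of pysam or htslib; the sketch assumes the binding would raise some exception of that kind, and that `fetch` is a zero-argument callable returning a fresh iterator of reads.

```python
import time


class SeekError(IOError):
    """Hypothetical error a binding could raise when the underlying seek fails."""


def count_reads_with_retry(fetch, process, retries=3, delay=1.0):
    """Count reads from fetch(), restarting the fetch if a seek failure is raised.

    fetch   -- zero-argument callable returning a fresh iterator of reads
    process -- callable applied to each read; it may see a read more than
               once if a failure happens partway through an attempt
    """
    for attempt in range(1, retries + 1):
        num_reads = 0
        try:
            for read in fetch():
                process(read)
                num_reads += 1
            return num_reads
        except SeekError:
            if attempt == retries:
                raise  # out of attempts: let the caller see the failure
            time.sleep(delay)  # brief pause before reconnecting
```

With this shape, a result of zero reads would mean the region is genuinely empty, because a failed connection would end in an exception rather than a silent empty iteration.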

Error 1:

[I::hseek] fp: 0xc6d37e8520, mode: SET, seek to: 5895168049 
* Found bundle for host gdc-api.nci.nih.gov: 0xc6d3816310 [can pipeline] 
* Hostname gdc-api.nci.nih.gov was found in DNS cache 
*   Trying 192.170.230.228... 
* TCP_NODELAY set 
* Connected to gdc-api.nci.nih.gov (192.170.230.228) port 443 (#71) 
* found 664 certificates in /etc/ssl/certs 
* ALPN, offering http/1.1 
* gnutls_handshake() failed: Error in the pull function. 
* Curl_http_done: called premature == 1 
* stopped the pause stream! 
* Closing connection 71 
INFO:my_process:Total reads: 0 

Error 2:

[I::hts_itr_query] idx: 0x6525a40da0, tid: 6, beg: 51396407, end: 51396425 
[I::hts_itr_next] GRAB MORE: bgzf_fp: 0x65249c7a10, iter: finished: 0, read_rest: 0, curr: (0:0-0@0x0), inst: (6:51396407-51396425@1), i: -1, n_off: 1, offset: 0x65249642d0 
[I::hseek] fp: 0x65239704d0, mode: SET, seek to: 4518809043 
* Found bundle for host gdc-api.nci.nih.gov: 0x652399b310 [can pipeline] 
* Hostname gdc-api.nci.nih.gov was found in DNS cache 
*   Trying 192.170.230.228... 
* TCP_NODELAY set 
* Connected to gdc-api.nci.nih.gov (192.170.230.228) port 443 (#49) 
* found 664 certificates in /etc/ssl/certs 
* ALPN, offering http/1.1 
* SSL connection using TLS1.2 / ECDHE_RSA_AES_256_GCM_SHA384 
* 	 server certificate verification OK 
* 	 server certificate status verification SKIPPED 
* 	 common name: gdc-api.nci.nih.gov (matched) 
* 	 server certificate expiration date OK 
* 	 server certificate activation date OK 
* 	 certificate public key: RSA 
* 	 certificate version: #3 
* 	 subject: C=US,postalCode=60305,ST=IL,L=River Forest,street=400 Lathrop Avenue,O=Open Cloud Consortium,OU=PremiumSSL,CN=gdc-api.nci.nih.gov 
* 	 start date: Thu, 18 Jun 2015 00:00:00 GMT 
* 	 expire date: Sun, 17 Jun 2018 23:59:59 GMT 
* 	 issuer: C=GB,ST=Greater Manchester,L=Salford,O=COMODO CA Limited,CN=COMODO RSA Organization Validation Secure Server CA 
* 	 compression: NULL 
* ALPN, server did not agree to a protocol 
> GET /data/a8fe0928-3d0c-4d16-be11-a5dd077bdb2c HTTP/1.1  
Host: gdc-api.nci.nih.gov  
Range: bytes=4518809043-  
User-Agent: htslib/1.4.1-86-g4a68964 libcurl/7.52.1  
Accept: */*  
X-Auth-Token:  REMOVED
* The requested URL returned error: 503 SERVICE UNAVAILABLE 
* Curl_http_done: called premature == 1 
* stopped the pause stream! 
* Closing connection 49 
INFO:my_process:Total reads: 0 
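
Until the seek reports failures itself, one possible stopgap is to scan captured verbose output for the failure lines seen in the two logs above. This is only a sketch: `find_transfer_failures` and `FAILURE_PATTERNS` are hypothetical names, the patterns are copied from these two logs and are not exhaustive, and it assumes the verbose output has already been captured into a string.

```python
import re

# Failure markers copied from the two logs in this report (illustrative only).
FAILURE_PATTERNS = [
    re.compile(r"handshake\(\) failed"),
    re.compile(r"The requested URL returned error: \d{3}"),
]


def find_transfer_failures(log_text):
    """Return the log lines that match one of the known failure markers."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS)
    ]
```

A non-empty result alongside "Total reads: 0" would suggest the empty output came from a failed transfer rather than an empty region.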
