-
Notifications
You must be signed in to change notification settings - Fork 459
change bgzf_*open to set binary mode in windows #283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Not all We would be more likely to accept a PR that just implements the existing TODO re Set binary mode (for Windows). This will fix the problem for stdin/stdout, and in other cases of pre-specified fds I think it's reasonable to expect the calling code that opens the fd to open it appropriately. |
|
Thanks for clarifying the I understand that all On the other hand, the |
|
The general policy in htslib is to open everything in binary mode, and accept both LF and CR-LF as line terminators when reading files as text. (See for example kseq.h which is currently where this happens for SAM and VCF files.) Implications are:
|
|
On Mon, Oct 05, 2015 at 04:16:43AM -0700, John Marshall wrote:
This is quite easy to add. You can already see this sort of thing in O_BINARY for open in htslib #ifdef _WIN32 or O_BINARY on an explicit set_mode in cram/mFILE.c: #ifdef _WIN32 James James Bonfield ([email protected]) | Hora aderat briligi. Nunc et Slythia Tova The Wellcome Trust Sanger Institute is operated by Genome Research |
|
There is a reason why the general policy in htslib is to open everything in binary mode. The point is that when a data file is opened, we don't know whether it will be a binary file or a text file. This is not just a question of adding a flag to Since the majority of users are on platforms where all this is immaterial, the claim is that the always-binary approach is a simple maintainable implementation that is ideal for most and workable for the platforms that distinguish text and binary (i.e., Windows). Perhaps @ctsa can comment on whether the |
|
Thanks for clarifying the intent -- 'always binary' is a very easy policy to follow. I don't have any insight on the ctrl-Z issue and am not familiar with any cases where this would come up. Will submit revised PR based on this policy once I have a moment to test. |
|
John: your TODO comment does indeed look like a good starting point for many cases. Sorry I didn't notice it was a link to some actual source earlier, I was reading in plain old mutt. Although it may be better to do the actual setmode in the neighbouring hdopen as some functions call hdopen direct and don't go via the stdin/stdout version. An example of this is bgzf_dopen(), which is called by the bgzf_fdopen macro, in turn called in bgzf.c's main function directly on fileno(stdin). Chris: the control-Z issue is one of how Windows handles reading a binary file when opened in non-binary mode (the default for stdin/stdout). If it sees a control-Z it interpets it as EOF and promptly ceases reading, despite potentially megabases of data on the stream after it. Basically the same things that solve cr/lf typically also solve control-Z. Hence using _setmode(_fileno(fp), _O_BINARY) or an equivalent solves it too. |
This edit, together with the existing O_BINARY setting in hfile_oflags, ensures that nearly all hFILE open calls are set to binary mode on windows.
|
Ok, revised the PR down to this minimal edit -- this solves all of my motivating failure cases on windows (things like "sortVcf.py | bgzip.exe -c > file.vcf.gz", and a few similar uses). A more general change to make bgzip & tabix uniformly apply the 'always binary' policy should probably be the subject of another PR. |
|
All that stuff with (The and is unable to terminate giving input. However it turns out (at least on Windows 7) that even with stdin in binary mode typing I would prefer not to call
bzgip.c will need its own overhaul to work fully on Windows anyway. |
|
Please give current develop a go. b8204c1 avoids Bgzip still opens its non-BGZF files, i.e., the file to be compressed or the output from decompression, in non-binary mode. However as bgzip is probably mostly used with text files, that's perhaps the right thing to do. No change from previous versions, in any case. |
|
Thanks John! I haven't been on a windows iteration in some time. I see this made it into the 1.3 release so I will test when we do the htslib bump. |
|
Should have been fixed by fe8b210. Please reopen if there is still a problem here. |
Always set binary mode on the BGZF* file descriptor for
all bgzf_*open calls on windows. Note the occurance of
'b' in the mode string is ignored, because this file
type must be binary mode to work correctly on windows.
This design will correct all bgzf file pointers (stdin/stdout,
other cases of pre-specified fd...) instead of just
those using bgzf_open with the current O_BINARY define
in hfile_oflags.