- 
                Notifications
    You must be signed in to change notification settings 
- Fork 459
Windows compilation. #531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows compilation. #531
Conversation
-stop git from converting line endings on the reference fasta files. -make the test scripts handle CRLF line endings, too. Perl should take care of it, according to the docs, but it doesn't. Added a fail-fast flag to test/test.pl. When the flag is given, the test stops on the first error it encounters. Improve usage instructions for test_view. Also replaced magic numbers with constants to make the code more readable. Minor code documentation.
- Added an AppVeyor config (travis equivalent for Windows). - The test harness now copes with windows style pathnames under msys by using 'cygpath'. You will likely already also need to set MSYS2_ARG_CONV_EXCL=* environment variable prior to "make check". Thanks to Anders Kaplan for hints at pathname handling. anderskaplan@0c52ae6 It also now handles nl vs cr-nl and the extra digits that appear in printf %g output. - Fixed the code looking for connect/recv calls. The previous -lsocket check was incomplete but we also now work with -lws2_32 on windows. - Setting of _POSIX_C_SOURCE=600 define so that snprintf return values work and to enable %lld and %z formatting options. - Fixed various shared library types in the Makefile. The existing cygwin code there is unmodified, but also untested. - Added hts_os.[ch] for operating system specific tweaks. This only affects windows/mingw at the moment and includes various drand/random implementations and a tweak to handle 1 less argument in mkdir. win/rand.[ch] contains a BSD implementation of the drand48 code, which produces identical random numbers to linux (needed for the tests to work). - Added fseeko and fsync checks to autoconf with appropriate ifdefs in the code. - Cope with seek on pipes returning EINVAL instead of ESPIPE. - Cope with the lack of SIGPIPE. We spot this is fd_write and raise a SIGTERM instead in lieu of anything better to do. - Additional enabling of O_BINARY mode in various places. - Avoidance for enum / define clases in windows (cram block method ERROR and MF_APPEND). - Support for windows C:/path as a full pathname. - Work around the gcc __format__ attribute requiring a different type (printf vs gnu_printf).
| Based on investigations to samtools/bcftools#610 I've realised I accidentally set the wrong define in the configure.ac - it should be _XOPEN_SOURCE=600. Brain fade! I'll postpone pushing that update until I get more checking done on other OSes, specifically an ancient linux (if I can get any of them to work at all!). | 
| struct cram_slice; | ||
|  | ||
| enum cram_block_method { | ||
| ERROR = -1, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change and its parallel MF_APPEND are caused by a clash with Windows.h. The workaround used here certainly works, but I'd recommend a less intrusive one instead: define NOUSER and NOGDI when compiling libhts. That shouldn't interfere with anything on any other platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noted that the affected header files aren't part of the public API. So perhaps less of an issue than I originally thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I figured it's probably a good move anyway to minimally name-space that ERROR enum as the potential for clashes on other systems will be reduced.
| #endif | ||
|  | ||
| // On mingw the "printf" format type doesn't work. It needs "gnu_printf" | ||
| // in order to check %lld and %z, otherwise it defaults to checking against | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that %lld should be avoided anyway, use the standard & portable PRIxx constants instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, where we have data types that are int32_t or int64_t we should (and do) use PRId64 etc. The HTS_FORMAT macro though is generic and we don't have any way to enforce programs that use htslib stick to only int64_t and not "long long", so the change is simply to make it as robust as it can be.
We probably ought to fix such cases in our own code though - eg samtools/bam_stat.c has long long all over the place and hence (correct, given the type) %lld too.  It should use uint64_t for most of those.
        
          
                Makefile
              
                Outdated
          
        
      |  | ||
|  | ||
| LIBHTS_OBJS = \ | ||
| hts_os.o\ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other entries here are clearly sorted by klib, then alphabetically, then subdirectories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
| echo '#define HAVE_LIBBZ2 1' >> $@ | ||
| echo '#define HAVE_LIBLZMA 1' >> $@ | ||
| echo '#define HAVE_FSEEKO 1' >> $@ | ||
| echo '#define HAVE_DRAND48 1' >> $@ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than adding infrastructure for these functions, wouldn't it be better to see where they are used: fseeko (it isn't); drand48 (just in ks_shuffle, which could use native alternatives)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fseeko is used and drand48 is used by more than ks_shuffle. Possibly we could rewrite the code that uses both, but as these are POSIX functions the intent is simply to put OS-specific deficiencies in one single place so we can maintain our position of desiring a POSIX compatible compilation. It also makes the PR much simpler, and safer, as it doesn't contain a lot of rewriting of code just to make it run on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only place that needs to use fseeko() is zfseeko(), which is itself unused.  Moreover for zfio.c in general: if someone is in a tidying mood, see #552.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in zfopen() which is used, so the requirement for fseeko in current code is real.  However your point about it being legacy from io_lib is valid.    I agree we ought to rewrite it using official htslib functions and then the problem goes away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one in zfopen() could be changed as is to fseek or rewind, hence I wrote “that needs to use”.  James, there's no need to find something to disagree with in everything I say! 😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not, but please say what you mean then. You started by saying it's unused, then by saying it's used by code which isn't used, and now by saying it's used in code that could be rewritten. The conversation would be shorter if you just said to start with that it could be avoided by replacing zfio with calls to bgzf - which I fully agree with.
However this just wasted all our time instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO I was perfectly clear for people willing to read between the lines at all; i.e., approaching the conversation with a view of ”what is the reason behind that particular statement” rather than “that statement is wrong”. Anyway: PR #553 fixes a minor bug too. Win/win.
| buffer = malloc(WINDOW_SIZE); | ||
| #ifdef _WIN32 | ||
| _setmode(f_src, O_BINARY); | ||
| #endif | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the decision here is to have bgzip open everything as binary (cf second paragraph of this comment), wouldn't it be better to just use hopen() rather than sprinkling _WIN32 platform crap all over the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we decided that hopen musn't do the O_BINARY setmode thing itself, indeed I recall you squashing a PR on that very reason.  Plus again we'd be changing bgzip on Unix in order to make it run on Windows, which comes with potential bugs and risks.  I'm happier with a windows-only change personally.
The change was made for a reason (it fixes bugs). If your view is that bgzip must NOT open everything as binary then we need to change the tool further still, adding a command line argument to tell it whether or not to treat stdin as text or binary. Yet more complications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reread my comment.  It's talking about changing bgzip.c to use hopen instead of open.  It does not propose altering the implementation of hopen itself.
I changed bgzip's BGZF files from open to (effectively) hopen in b8204c1 during the PR you mention, which is linked in my previous comment.  I would have rather liked to change its other files too as it simplifies and regularises the code, but I had not analysed the effects that would have on Windows, as I noted at the time.
The change was made for a reason (it fixes bugs).
Apparently you have now analysed the effects, though that wasn't apparent from the commit messages.  Grand; and I guess it gives bgzip -i a chance to have the right offsets.
So the maintenance question is about the risks of a change that affects all platforms while regularising and simplifying the code, versus one that adds the maintenance burden of #ifdef spaghetti.
| else return NULL; | ||
|  | ||
| if (i == 0 || i >= sizeof scheme) return NULL; | ||
| // 1 byte schemes are likely windows C:/foo pathnames | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/byte/character/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's char*, not wchar*, so byte vs character is splitting hairs here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a string provided by the user. A readable string, not binary data.
| #else | ||
| #elif defined(HAVE_FSYNC) | ||
| hFILE_fd *fp = (hFILE_fd *) fpv; | ||
| ret = fsync(fp->fd); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less invasive to add a #elif defined HAVE_FLUSHFILEBUFFERS-based version (or add a FlushFileBuffers()-based fsync function elsewhere for the existing fd_flush code to use).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get what you mean.  If we define HAVE_FLUSHFILEBUFFERS then we need our own custom autoconf code to generate that define, instead of using the easy existing macros for HAVE_func style checking, which sounds more invasive rather than less.
Could you give an example of what you mean please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean code like jmarshall/htslib@39146e2 to ensure fsync() is always available (but probably moved to hts_os.h or so).
Alternatively that code as a function in hfile.c or just inline in fd_flush(), likely controlled by an explicit platform HAVE_ macro like these ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A windows implementation of fsync makes sense, but it probably ought to be surrounded by a #ifndef HAVE_FSYNC check and appropriate autoconf magic.  Indeed hts_os.h sounds like the right spot for it too.
I'm not such a fan of #define HAVE_STRUCT_STAT_ST_BLKSIZE appearing mid-way through a file as the use of it can lead people to wrong conclusions (that it's in a header file somewhere and can be used elsewhere), given the uses of these are several pages below their defines.
Edit: Arguably here we just need fdatasync implementation and get that to call fsync if undefined, or with an OS-dependent implementation if that isn't defined.  Our preference is for fdatasync so that looks like the right one to be implementing, which also removes the other ifdef in fd_flush too.
        
          
                hfile.c
              
                Outdated
          
        
      | if (strncmp(url, "file://localhost/", 17) == 0) url += 16; | ||
| #ifdef _WIN32 | ||
| else if (strncmp(url, "file:///", 8) == 0) url += 8; | ||
| #else | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely a little obscure, and doesn't handle file://localhost/C:/foo/bar.txt.  Clearer alternative would be adding something like this below:
#ifdef _WIN32
if (url[0] == '/' && url[2] == ':' && url[3] == '/') url += 1;
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
| Incidentally, from the commit message: 
 The Cygwin code was of course tested when it was added (in 2cddfb3). | 
| Ok point taken, but untested by me then since these changes. I could easily have broken cygwin in this process, but haven't tested that. | 
        
          
                configure.ac
              
                Outdated
          
        
      | # This also sets __USE_MINGW_ANSI_STDIO which in turn makes PRId64, | ||
| # %lld and %z printf formats work. It also enforces the snprintf to | ||
| # be C99 compliant so it returns the correct values (in kstring.c). | ||
| CPPFLAGS="$CPPCFLAGS -D_POSIX_C_SOURCE=600" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change -D_POSIX_C_SOURCE=600 to -D_XOPEN_SOURCE=600.
- Fixed POSIX vs XOPEN macro mistake. - Sort order of Makefile. - Clearer logic of C:/path handling. NOT fixed in this update: - Removal of cram/os.h (to do later on and merge into hts_os.h). - Removal of fseeko (this branch still needs it - fix this when we merge with develop and remove the requirement for it). - Bgzip binary vs non-binary modes. We need clearer understanding on this, and command line extensions if we wish to support both. - fdatasync reimplementation for systems that don't have it.
Given this removes cram/zfio.c, I've also now removed fseeko from configure.ac and Makefile.
| In lieu of having a local working windows box of my own at the moment, I prodded AppVeyor on this latest commit and sadly it fails everywhere: https://ci.appveyor.com/project/jkbonfield/htslib/build/vers.44 I'm not sure what caused it, whether it was something that has moved on in develop since the PR was initially made or whether it was a requirement on HAVE_FSEEKO being defined (there are lots of related warnings, maybe cull the bit from cram/os.h?) I am unsure. It may be best to just use the original commit (ace007f) and apply the changes one by one to see what broke it. It's hard to tell without being in front of an actual windows setup. Edit: Correct - it's the removal of HAVE_FSEEKO that broke it, and also culling the use of the macro in cram/os.h fixes it as expected. Phew. | 
Does this fix AppVeyor builds? Regardless it's wrong to check this macro now we've removed it again from configure.ac, but I don't (yet) know if we need to remove it here too or whether we need to keep this as-is and add it back to configure.ac again. Hopefully the former, especially given we have no support for MSVC anyway.
| Squashed two most recent commits, rebased on fresh develop and merged. | 
Implemented building on Windows with Mingw64 & msys.
Also contains an appveyor configuration for Windows CI. See https://ci.appveyor.com/project/jkbonfield/htslib/build/vers.31 for an example of this.