Skip to content

Conversation

bnoordhuis
Copy link
Member

Call uc_init() after u_setDataDirectory() to find out if the data
directory is actually valid.

This commit removes parallel/test-intl-no-icu-data, added in commit
46345b9 ("src: make --icu-data-dir= switch testable"). It no longer
works now that an invalid --icu-data-dir= argument is rejected.
Coverage is now provided by abort/test-abort-icu-data-dir.

Note that a bad path is now a fatal error, not a silent error. I could make node print a warning and fall back to the builtin i18n data (if #ifdef NODE_HAVE_SMALL_ICU) but I don't know if ICU supports initializing twice. It seems to works for me locally with the patch below, /cc @srl295.

diff --git a/src/node_i18n.cc b/src/node_i18n.cc
index 30394f3..e85e115 100644
--- a/src/node_i18n.cc
+++ b/src/node_i18n.cc
@@ -430,6 +430,12 @@ bool InitializeICUDirectory(const std::string& path) {
   } else {
     u_setDataDirectory(path.c_str());
     u_init(&status);
+#ifdef NODE_HAVE_SMALL_ICU
+    if (status != U_ZERO_ERROR) {
+      status = U_ZERO_ERROR;
+      udata_setCommonData(&SMALL_ICUDATA_ENTRY_POINT, &status);
+    }
+#endif  // !NODE_HAVE_SMALL_ICU
   }
   return status == U_ZERO_ERROR;
 }

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. i18n-api Issues and PRs related to the i18n implementation. labels May 16, 2017
@refack
Copy link
Contributor

refack commented May 16, 2017

Note that a bad path is now a fatal error, not a silent error. I could make node print a warning and fall back to the builtin i18n data

IMHO it should fail, like any other bad argument. Fallback might be unexpected by user (or even not available).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add case for NODE_ICU_DATA via env?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@srl295
Copy link
Member

srl295 commented May 16, 2017

I don't know if ICU supports initializing twice

No, it doesn't support that. It will already be initialized with fallback data. (It may try again to load other data, but it's not really in a great state, cache wise, at that point.) The failure from u_init doesn't mean it wasn't initialized, technically, but that when it initialized, it couldn't find even a minimal data file.

You could think of u_init() as "Try to load a bunch of stuff so we can fail here instead of down in some specific request". ICU long ago stopped having any situations that required explicit initialization.

Edit: Actually, there is a u_cleanup(). If you called u_cleanup() and THEN reset the data path, called u_init then you would be good. This resets all caches, invalidates any data objects ICU has created for you already- but at this part of bootstrap that's where we should be.

Please note also, that calling u_init() will hit memory, disk, etc. which users might not otherwise expect… it will allocate heap, mmap() that data file, etc., which the other ICU calls in this file won’t do. The advantages are great, because you're not just checking for a valid dir, but checking that the ICU data is valid. Just should be noted that if you provide a path, ICU will get initialized.

@gibfahn
Copy link
Member

gibfahn commented May 16, 2017

Worth adding a:

Fixes: https://github.com/nodejs/node/issues/13043

to the commit.

@refack
Copy link
Contributor

refack commented May 16, 2017

No, it doesn't support that. It will already be initialized with fallback data. (It may try again to load other data, but it's not really in a great state, cache wise, at that point.) The failure from u_init doesn't mean it wasn't initialized, technically, but that when it initialized, it couldn't find even a minimal data file.

You could think of u_init() as "Try to load a bunch of stuff so we can fail here instead of down in some specific request". ICU long ago stopped having any situations that required explicit initialization.

@srl295
Like I said I prefer the fast fail, that's similar to other invalid arguments, but I have a question:
Are udata_setCommonData & u_setDataDirectory "compatible" with each other?
That is; if we limit the if to just the else case (if path not empty set u_setDataDirectory), will we end up with the "fallback" situation?
And the bigger question, will it make sense from the user's perspective?

  bool InitializeICUDirectory(const std::string& path) {
+   UErrorCode status = U_ZERO_ERROR;
-   if (path.empty()) {
-     UErrorCode status = U_ZERO_ERROR;
      // install the 'small' data.
      udata_setCommonData(&SMALL_ICUDATA_ENTRY_POINT, &status);
-     return (status == U_ZERO_ERROR);
-   } else {
+   if (!path.empty()) {
      u_setDataDirectory(path.c_str());
    }
+   u_init(&status);
+   return (status == U_ZERO_ERROR);
  }

P.S. there is a type in the first line of http://icu-project.org/apiref/icu4c/udata_8h.html#a406559067e309c05fb90b2d532f11835 (I would do the SVN+Trac dance, but... I'm lazy 😞)

@refack
Copy link
Contributor

refack commented May 16, 2017

Worth adding a:

Fixes: https://github.com/nodejs/node/issues/13043

to the commit.

Maybe also

Ref: https://github.com/nodejs/node-gyp/issues/1199

@bnoordhuis
Copy link
Member Author

Updated with suggestions, CI: https://ci.nodejs.org/job/node-test-pull-request/8114/

I turned it from an abort into an exit(9) in order that the test can live in test/parallel and be exercised regularly (unlike tests in test/abort.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows fails probably since argv[0] === node.exe or even absolute path.

@bnoordhuis
Copy link
Member Author

Fixed test, new CI: https://ci.nodejs.org/job/node-test-pull-request/8119/

Call uc_init() after u_setDataDirectory() to find out if the data
directory is actually valid.

This commit removes parallel/test-intl-no-icu-data, added in commit
46345b9 ("src: make --icu-data-dir= switch testable").  It no longer
works now that an invalid --icu-data-dir= argument is rejected.
Coverage is now provided by parallel/test-icu-data-dir.

Fixes: nodejs#13043
Refs: nodejs/node-gyp#1199
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Steven R Loomis <[email protected]>
@bnoordhuis bnoordhuis closed this May 19, 2017
@bnoordhuis bnoordhuis deleted the fix13043 branch May 19, 2017 18:27
@bnoordhuis bnoordhuis added the semver-major PRs that contain breaking changes and should be released in the next major version. label May 19, 2017
@bnoordhuis
Copy link
Member Author

Thanks for the reviews, landed in 46e773c and tagged as semver-major.

@bnoordhuis bnoordhuis merged commit 46e773c into nodejs:master May 19, 2017
@srl295
Copy link
Member

srl295 commented May 22, 2017

@refack

Are udata_setCommonData & u_setDataDirectory "compatible" with each other?

If udata_setCommonData is called, u_setDataDirectory will only be used for override content.

typo

=> http://bugs.icu-project.org/trac/ticket/13213#comment:1

@refack
Copy link
Contributor

refack commented May 22, 2017

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ Issues and PRs that require attention from people who are familiar with C++. i18n-api Issues and PRs related to the i18n implementation. semver-major PRs that contain breaking changes and should be released in the next major version.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants