Add hyper-parameter-optimization notebook with Hyperband #1
Conversation
Currently depends on dask/dask-ml#701. This could be improved by using an estimator that benefits from large amounts of data.
@stsievert if you have any time, do you have any thoughts on how this example might be improved?
Thanks! This is nice to see.
There are some improvements to make, I think. I've left some comments below to give users a better idea of why they're using Dask, along with some nits. It would also help to add text below each title describing what the cell does and why Dask is required. I'd probably point to Dask-ML's hyperparameter optimization docs too.
If you'd like, I might be able to modify this example.
hyper-parameter-optimization.ipynb (Outdated)
```
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model"
```
I think I would split this into two sections: "Define model and hyperparameter search space" and "Find the best hyperparameters."
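For instance, the two sections could boil down to something like this (a minimal sketch; the parameter distributions and values are illustrative, not the notebook's actual ones):

```python
from scipy.stats import loguniform, uniform
from sklearn.linear_model import SGDClassifier
from dask_ml.model_selection import HyperbandSearchCV

# Define model and hyperparameter search space
model = SGDClassifier(tol=1e-3, penalty="elasticnet")
params = {
    "alpha": loguniform(1e-5, 1e-1),  # regularization strength
    "l1_ratio": uniform(0, 1),        # elastic-net mixing parameter
}

# Find the best hyperparameters
search = HyperbandSearchCV(model, params, max_iter=81, random_state=0)
search.fit(X_train, y_train, classes=[0, 1])  # X_train/y_train from the notebook
print(search.best_params_)
```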
hyper-parameter-optimization.ipynb (Outdated)
```
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import SGDClassifier\n",
```
I'm not sure SGDClassifier is relevant according to the four categories in https://ml.dask.org/hyper-parameter-search.html. It's a linear model with 6 features; I don't know if I'd label that as "compute constrained."
I think there are a couple of options:
- Use a more computationally constrained model (e.g., MLPClassifier or PyTorch). I might use an MLPClassifier, then say "Realistically, a PyTorch model might be used. To do that, ... (skorch) ...." (See the sketch after this list.)
- Use IncrementalSearchCV. I think this is the appropriate search for the example as written: it's memory-constrained, not compute-constrained.
- Search over more hyperparameters. This would make the example more computationally constrained; it'd require a higher max_iter in Hyperband.
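A sketch of the first option (hedged: the architecture grid and distributions below are made up for illustration):

```python
from scipy.stats import loguniform
from sklearn.neural_network import MLPClassifier
from dask_ml.model_selection import HyperbandSearchCV

# A more compute-constrained model: searching over architecture plus
# optimization hyperparameters is where Hyperband's early stopping helps.
model = MLPClassifier()
params = {
    "hidden_layer_sizes": [(32,), (64,), (32, 32), (64, 32)],
    "alpha": loguniform(1e-6, 1e-2),
    "batch_size": [32, 64, 128],
}
search = HyperbandSearchCV(model, params, max_iter=243)
# search.fit(X_train, y_train, classes=[0, 1])
```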
I'd also point to the docs to make sure the users know why they're using Dask: https://ml.dask.org/hyper-parameter-search.html
Yeah, I think I chose SGDClassifier just because it was simple. These are all black boxes to me, so I chose the simplest black box about which I could find the most examples :)
hyper-parameter-optimization.ipynb (Outdated)
```
" \"store_and_fwd_flag\": \"category\",\n",
" \"PULocationID\": \"UInt16\",\n",
" \"DOLocationID\": \"UInt16\", \n",
" \"payment_type\": \"UInt8\",\n",
```
Nit: some columns included here are never seen again, like PULocationID.
Yeah, this was a copy-paste job from another notebook. I should probably drop some of these columns with usecols=, I guess.
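That might look like this (a sketch; the file path is hypothetical, and the column list is just the subset the example actually uses later):

```python
import dask.dataframe as dd

cols = ["passenger_count", "trip_distance", "RatecodeID",
        "payment_type", "fare_amount", "tip_amount"]
df = dd.read_csv(
    "s3://nyc-tlc/trip data/yellow_tripdata_2019-*.csv",  # hypothetical path
    usecols=cols,                 # only read the columns we actually use
    blocksize="16 MiB",
)
```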
hyper-parameter-optimization.ipynb (Outdated)
```
" blocksize=\"16 MiB\",\n",
")\n",
"\n",
"data = df[[\"passenger_count\", \"trip_distance\", \"RatecodeID\", \"payment_type\", \"fare_amount\"]]\n",
```
Nit: RatecodeID is categorical with 5 categories according to the column description docs. Maybe OneHotEncoder should be used on that column?

```python
from dask_ml.preprocessing import OneHotEncoder

# Dask-ML's OneHotEncoder wants a 2-D, categorical-dtype input
rate_indicators = OneHotEncoder().fit_transform(df[["RatecodeID"]].categorize())
# put rate_indicators back into df
```
hyper-parameter-optimization.ipynb (Outdated)
```
"data = df[[\"passenger_count\", \"trip_distance\", \"RatecodeID\", \"payment_type\", \"fare_amount\"]]\n",
"data = data.fillna(0)\n",
"\n",
"labels = (df.tip_amount / df.fare_amount) > 0.25\n",
```
Nit: I might predict taxi trip duration to mirror https://www.kaggle.com/c/nyc-taxi-trip-duration/. That would imply a regression problem, not a classification problem.
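If the example went that route, the regression labels might be built like this (a sketch; the datetime column names are assumptions based on the standard yellow-taxi schema, and assume those columns were parsed with parse_dates=):

```python
# Target: trip duration in seconds, mirroring the Kaggle problem
duration = df.tpep_dropoff_datetime - df.tpep_pickup_datetime
labels = duration.dt.total_seconds()
```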
Oh cool. It would be nice to reflect an existing Kaggle problem.
And maybe at the end show how many minutes you are off:

```python
import numpy as np
import pandas as pd

pred_time = model.predict(X_test)    # predicted trip durations
err = np.abs(pred_time - real_time)  # absolute error per trip
pd.Series(err).plot.hist()
```
hyper-parameter-optimization.ipynb (Outdated)
```
"metadata": {},
"outputs": [],
"source": [
"search.score(X_test.sample(frac=0.1, random_state=123), y_test.sample(frac=0.1, random_state=123))"
```
Maybe add a comment on why frac=0.1 is used?
Yeah, this was interesting, and something that we might want to think about in Dask-ML.
My current understanding is that search.score calls a scikit-learn scorer on the inputs, and so these are brought into local memory. I imagine that this is because we haven't made dask-compatible scorers for everything. Is that correct?
Is ParallelPostFit relevant? It takes a trained model and maps the score/predict functions to each chunk.
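Something like this, maybe (a sketch; `search` is the fitted HyperbandSearchCV from earlier):

```python
from dask_ml.wrappers import ParallelPostFit

# Wrap the already-trained best model. predict/score are then mapped
# over each chunk on the workers instead of pulling the test set into
# local memory.
clf = ParallelPostFit(search.best_estimator_)
y_hat = clf.predict(X_test)  # lazy dask collection
```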
Maybe? (cc @TomAugspurger)
My sense is that scorers will likely need to be handled one at a time, and that there isn't an obvious way to map them all automatically. It looks like there is a mapping in dask_ml/model_selection/scorer.py. Maybe ParallelPostFit uses that. If so, IncrementalSearchCV and friends (Hyperband) should maybe use the same tricks?
> If so, IncrementalSearchCV and friends (Hyperband) should maybe use the same tricks?
Wrapping it in ParallelPostFit should do the trick, but every key in the hyperparameter dict params would need to be prepended with `estimator__`.
I'll think more about doing this automatically. My initial reaction is "no", since the default is to fall back to the estimator's default scoring, and I wouldn't want to complicate that.
One thing we should be doing is making something like Hyperband(..., scoring="accuracy") work. Right now we use sklearn.metrics.check_scoring; if that used dask_ml.metrics.check_scoring instead, things would work. I'll open an issue.
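For illustration, the key-prefixing amounts to this (a hypothetical sketch using scikit-learn's nested-parameter convention):

```python
params = {"alpha": 0.1, "l1_ratio": 0.5}

# The wrapped model lives under the `estimator` attribute, so every
# search key needs the `estimator__` prefix:
wrapped_params = {f"estimator__{k}": v for k, v in params.items()}
# {'estimator__alpha': 0.1, 'estimator__l1_ratio': 0.5}
```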
> Is ParallelPostFit relevant?
I meant this: ParallelPostFit(search.best_estimator_).score(X_test, y_test).
That would be very, very welcome :)
I've made some edits. A summary of the changes:
This is rough; it's far from a polished draft. @mrocklin, let me know what questions you have.
Oh, cool. This is fun to play with.
Thoughts on using PyTorch/skorch here instead? Would that make things much more complex?
I think that pointing to docs for a lot of this is good. I like the idea of using Hyperband here, but I don't like the idea of explaining all of Hyperband's knobs in a first-exposure example like this. I'm curious: are the defaults bad in this case? Would it be OK to omit the extra parameters here, or do we need to expose them for things to make sense? I ran into an issue with the …
You're talking about … I'd still link to the rule of thumb (probably the one in the example; the one in the docstring is hard to link to). I'd also add a note, something like: "If you want to sample more parameters or train your models for longer, look at HyperbandSearchCV's rule of thumb. Luckily, it's simple and only requires knowing how many hyperparameters to sample and how long to train the model."
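For reference, the rule of thumb in the Dask-ML docs amounts to roughly this (a sketch; the 50-passes figure is an illustrative stand-in for "how long to train the model"):

```python
n_params = 81                    # how many hyperparameters to sample
n_examples = 50 * len(X_train)   # how long to train the best model

max_iter = n_params
chunk_size = n_examples // n_params  # rechunk the training data to this size
search = HyperbandSearchCV(model, params, max_iter=max_iter)
```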
👍 I think it'd be nice to have PyTorch; we don't have a PyTorch + Hyperband example in dask-examples yet. I suspect your users don't want to be tied to Scikit-Learn; a PyTorch example would allow users more freedom. Looking at skorch's getting started guide, it'd amount to this much code:

```python
from skorch import NeuralNetRegressor
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class HiddenLayerNet(nn.Module):
    def __init__(self, n_features=10, n_outputs=1, n_hidden=100):
        super().__init__()
        self.fc1 = nn.Linear(n_features, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_outputs)

    def forward(self, X, **kwargs):
        return self.fc2(F.relu(self.fc1(X)))

net = NeuralNetRegressor(
    module=HiddenLayerNet,
    module__n_hidden=200,
    optimizer=optim.SGD,
    optimizer__lr=0.1,
    max_epochs=10,
    # Shuffle training data on each epoch
    iterator_train__shuffle=True,
)
```

PyTorch modules require float32 input. I'd convert the dataset first.
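For example:

```python
# skorch/PyTorch expect float32 inputs; convert before fitting
X_train = X_train.astype("float32")
y_train = y_train.astype("float32")
```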
FWIW, while I suspect many researchers find those questions simple to answer, I suspect that many practitioners don't have good answers. I think one of the reasons Scikit-Learn became popular is that many things worked out of the box with sensible defaults. I wonder if there is a good default solution in this case (that's probably a problem to solve later, though).
If you're interested in writing this up I'd be in favor. (I'm really just trying to get as much free labor as I can out of you :) )
I've integrated PyTorch. I didn't have time to debug an issue I ran into: the output of ParallelPostFit(search.best_estimator).predict(X_test) is reported by Dask to be (100,), but when I compute it, it's actually (100, 50).
I'll take a look!
I think ParallelPostFit.predict is just incorrect at https://github.com/dask/dask-ml/blob/5c3179eb7eaa6bf830e0b6df162902f805a9b3c0/dask_ml/wrappers.py#L275: it doesn't handle multi-dimensional output.
I'd hoped that we could pass an empty array to `.predict()` to find the output shape, but at least some scikit-learn estimators validate that the array is non-empty.
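A sketch of that empty-array idea (hypothetical; as noted above, some estimators would reject this input):

```python
import numpy as np

# Probe the fitted estimator with a zero-row batch to discover the
# output shape for Dask's metadata. This fails on estimators that
# validate the array is non-empty.
probe = np.empty((0, X_test.shape[1]), dtype="float32")
output_shape = search.best_estimator_.predict(probe).shape[1:]
```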
Thanks for your work on this @mrocklin @stsievert @TomAugspurger! I pushed a few small updates. Namely:
Traceback:

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-31-9db7871d3c39> in <module>
1 y_train2 = y_train.reshape(-1, 1).persist()
----> 2 search.fit(X_train, y_train2)
~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in fit(self, X, y, **fit_params)
700 client = default_client()
701 if not client.asynchronous:
--> 702 return client.sync(self._fit, X, y, **fit_params)
703 return self._fit(X, y, **fit_params)
704
~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
831 else:
832 return sync(
--> 833 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
834 )
835
~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
337 if error[0]:
338 typ, exc, tb = error[0]
--> 339 raise exc.with_traceback(tb)
340 else:
341 return result[0]
~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/distributed/utils.py in f()
321 if callback_timeout is not None:
322 future = asyncio.wait_for(future, callback_timeout)
--> 323 result[0] = yield future
324 except Exception as exc:
325 error[0] = sys.exc_info()
~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/tornado/gen.py in run(self)
733
734 try:
--> 735 value = future.result()
736 except Exception:
737 exc_info = sys.exc_info()
~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_hyperband.py in _fit(self, X, y, **fit_params)
400
401 _SHAs = await asyncio.gather(
--> 402 *[SHAs[b]._fit(X, y, **fit_params) for b in _brackets_ids]
403 )
404 SHAs = {b: SHA for b, SHA in zip(_brackets_ids, _SHAs)}
~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in _fit(self, X, y, **fit_params)
658 random_state=self.random_state,
659 verbose=self.verbose,
--> 660 prefix=self.prefix,
661 )
662 results = self._process_results(results)
~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in fit(model, params, X_train, y_train, X_test, y_test, additional_calls, fit_params, scorer, random_state, verbose, prefix)
476 random_state=random_state,
477 verbose=verbose,
--> 478 prefix=prefix,
479 )
480
~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in _fit(model, params, X_train, y_train, X_test, y_test, additional_calls, fit_params, scorer, random_state, verbose, prefix)
260 # async for future, result in seq:
261 for _i in itertools.count():
--> 262 metas = await client.gather(new_scores)
263
264 if log_delay and _i % int(log_delay) == 0:
~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
1845 exc = CancelledError(key)
1846 else:
-> 1847 raise exception.with_traceback(traceback)
1848 raise exc
1849 if errors == "skip":
/opt/conda/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in _score()
/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py in _passthrough_scorer()
/opt/conda/lib/python3.7/site-packages/sklearn/base.py in score()
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f()
/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_regression.py in r2_score()
/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_regression.py in _check_reg_targets()
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f()
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array()
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite()
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
```
Some thoughts!
When I reduce the partition size I get:

Traceback:

```
[CV, bracket=4] creating 81 models
[CV, bracket=3] creating 34 models
[CV, bracket=2] creating 15 models
[CV, bracket=1] creating 8 models
[CV, bracket=0] creating 5 models
[CV, bracket=0] For training there are between 46291 and 98195 examples in each chunk
[CV, bracket=2] For training there are between 46291 and 98195 examples in each chunk
[CV, bracket=3] For training there are between 46291 and 98195 examples in each chunk
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-30-9db7871d3c39> in <module>
1 y_train2 = y_train.reshape(-1, 1).persist()
----> 2 search.fit(X_train, y_train2)
~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in fit(self, X, y, **fit_params)
700 client = default_client()
701 if not client.asynchronous:
--> 702 return client.sync(self._fit, X, y, **fit_params)
703 return self._fit(X, y, **fit_params)
704
~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
831 else:
832 return sync(
--> 833 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
834 )
835
~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
337 if error[0]:
338 typ, exc, tb = error[0]
--> 339 raise exc.with_traceback(tb)
340 else:
341 return result[0]
~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/distributed/utils.py in f()
321 if callback_timeout is not None:
322 future = asyncio.wait_for(future, callback_timeout)
--> 323 result[0] = yield future
324 except Exception as exc:
325 error[0] = sys.exc_info()
~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/tornado/gen.py in run(self)
733
734 try:
--> 735 value = future.result()
736 except Exception:
737 exc_info = sys.exc_info()
~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_hyperband.py in _fit(self, X, y, **fit_params)
400
401 _SHAs = await asyncio.gather(
--> 402 *[SHAs[b]._fit(X, y, **fit_params) for b in _brackets_ids]
403 )
404 SHAs = {b: SHA for b, SHA in zip(_brackets_ids, _SHAs)}
~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in _fit(self, X, y, **fit_params)
658 random_state=self.random_state,
659 verbose=self.verbose,
--> 660 prefix=self.prefix,
661 )
662 results = self._process_results(results)
~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in fit(model, params, X_train, y_train, X_test, y_test, additional_calls, fit_params, scorer, random_state, verbose, prefix)
476 random_state=random_state,
477 verbose=verbose,
--> 478 prefix=prefix,
479 )
480
~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in _fit(model, params, X_train, y_train, X_test, y_test, additional_calls, fit_params, scorer, random_state, verbose, prefix)
260 # async for future, result in seq:
261 for _i in itertools.count():
--> 262 metas = await client.gather(new_scores)
263
264 if log_delay and _i % int(log_delay) == 0:
~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
1845 exc = CancelledError(key)
1846 else:
-> 1847 raise exception.with_traceback(traceback)
1848 raise exc
1849 if errors == "skip":
/opt/conda/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in _score()
/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py in _passthrough_scorer()
/opt/conda/lib/python3.7/site-packages/sklearn/base.py in score()
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f()
/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_regression.py in r2_score()
/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_regression.py in _check_reg_targets()
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f()
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array()
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite()
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
[CV, bracket=1] For training there are between 46291 and 98195 examples in each chunk
```
I think the NaNs are happening because poor hyperparameters are chosen and the loss is climbing to infinity. I got the same error when my step size was too large (a classic result in optimization). Returning a dummy loss when the loss gets large avoids the NaNs for me:

```python
import torch
from skorch import NeuralNetRegressor

class NoNaNs(NeuralNetRegressor):
    def get_loss(self, y_pred, y_true, X=None, training=False):
        # If the model has diverged, return a constant loss (with zero
        # gradients) instead of letting the loss blow up to NaN/inf.
        if (y_true - y_pred).abs().mean() > 1e6:
            return torch.tensor([0.0], requires_grad=True)
        return super().get_loss(y_pred, y_true, X=X, training=training)

model = NoNaNs(module=HiddenLayerNet, ..., **niceties)
```

I think this issue should be reported upstream to skorch.
(edit) I haven't tested it, but it also might work to have …
Thanks for the NoNaNs fix @stsievert! That solved the issue for me too.
FYI I removed the "Visualization" and "Why not simply sampling instead?" sections as, while I found them to be informative, they take several minutes to execute. This also lets us avoid any issues with ParallelPostFit.predict.
Scott, should we make HyperbandCV robust to NaNs? Is there an obvious way to do this? Treat them as bad results that should be dropped?
I think that's a good idea. I might add infinite losses too. The obvious way to catch the NaN error is to use a try/except block around the … HyperbandSearchCV doesn't get an opportunity to see the output of …
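A hypothetical sketch of that try/except idea at the estimator level (the class name and fallback score are made up, and this is untested):

```python
import numpy as np
from skorch import NeuralNetRegressor

class RobustNet(NeuralNetRegressor):
    # Swallow NaN/inf scoring errors and report a very bad (but finite)
    # score so the search simply drops this model.
    def score(self, X, y):
        try:
            s = super().score(X, y)
        except ValueError:  # "Input contains NaN, infinity..."
            return -1e12
        return s if np.isfinite(s) else -1e12
```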
Thanks all for your work on this example! I'm going to merge this PR and we can fine-tune with follow-up PRs. Thanks again!