This repository was archived by the owner on Apr 24, 2021. It is now read-only.

Conversation

@mrocklin
Member

Currently depends on dask/dask-ml#701

This could be improved by using an estimator that benefits from large
amounts of data.

mrocklin added 3 commits July 18, 2020 16:29
@mrocklin
Member Author

@stsievert, if you have time, do you have any thoughts on how this example might be improved?

@stsievert left a comment

Thanks! This is nice to see.

There are some improvements to make I think. I've left some comments below to give users a better idea about why they're using Dask, and also some nits. I also think it'd help to provide some text below each title describing what the cell does and why it's required to use Dask. I'd probably point to Dask-ML's hyperparameter optimization docs too.

If you'd like, I might be able to modify this example.

"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model"


I think I would split this into two sections: "Define model and hyperparameters search space" and "Find the best hyperparameters."

"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import SGDClassifier\n",


I'm not sure SGDClassifier is relevant according to the 4 categories in https://ml.dask.org/hyper-parameter-search.html. It's a linear model with 6 features; I don't know if I'd label that as "compute constrained."

I think there are a couple options:

  1. Have a more computationally constrained model (e.g., MLPClassifier or PyTorch). (I might use an MLPClassifier then say "Realistically, a PyTorch model might be used. To do that, ... (skorch) ....").
  2. Use IncrementalSearchCV. I think this is the appropriate classifier for the example as written: it's memory-constrained, not compute-constrained.
  3. Search over more hyperparameters. This would make it more computationally constrained; it'd require a higher max_iter in Hyperband.


I'd also point to the docs to make sure the users know why they're using Dask: https://ml.dask.org/hyper-parameter-search.html

Member Author

Yeah, I think I chose SGDClassifier just because it was simple. These are all black boxes to me, so I chose the simplest black box about which I could find the most examples :)

" \"store_and_fwd_flag\": \"category\",\n",
" \"PULocationID\": \"UInt16\",\n",
" \"DOLocationID\": \"UInt16\", \n",
" \"payment_type\": \"UInt8\",\n",


Nit: Some columns included here are never seen again, like PULocationID.

Member Author

Yeah, this was a copy-paste job from another notebook. I should probably remove some of these columns with usecols= I guess.

" blocksize=\"16 MiB\",\n",
")\n",
"\n",
"data = df[[\"passenger_count\", \"trip_distance\", \"RatecodeID\", \"payment_type\", \"fare_amount\"]]\n",


Nit: RatecodeID is categorical with 5 categories according to the column descriptions docs. Maybe OneHotEncoder should be used on that column?

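from dask_ml.preprocessing import OneHotEncoder
# OneHotEncoder expects 2-D input, so select a one-column DataFrame rather than a Series;
# for Dask DataFrames the column should be categorical with known categories
rate_indicators = OneHotEncoder().fit_transform(df[["RatecodeID"]])
# put rate_indicators back into df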

"data = df[[\"passenger_count\", \"trip_distance\", \"RatecodeID\", \"payment_type\", \"fare_amount\"]]\n",
"data = data.fillna(0)\n",
"\n",
"labels = (df.tip_amount / df.fare_amount) > 0.25\n",


Nit: I might predict taxi trip duration to mirror https://www.kaggle.com/c/nyc-taxi-trip-duration/. That would imply a regression problem, not a classification problem.

Member Author

Oh cool. It would be nice to reflect an existing Kaggle problem.


And maybe at the end show how many minutes you are off:

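import numpy as np
import pandas as pd
pred_time = model.predict(X_test)  # predict, not score: score returns a single number
err = np.abs(pred_time - real_time)  # real_time: the true trip durations
pd.Series(err).plot.hist()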

"metadata": {},
"outputs": [],
"source": [
"search.score(X_test.sample(frac=0.1, random_state=123), y_test.sample(frac=0.1, random_state=123))"


Maybe add a comment on why frac=0.1 is used?

Member Author

Yeah, this was interesting, and something that we might want to think about in Dask-ML.

My current understanding is that search.score calls a scikit-learn scorer on the inputs, and so these are brought into local memory. I imagine that this is because we haven't made dask-compatible scorers for everything. Is that correct?


Is ParallelPostFit relevant? It takes a trained model and maps the score/predict functions to each chunk.

Member Author

Maybe? (cc @TomAugspurger)

My sense is that scorers will likely need to be handled one at a time, and that there isn't an obvious way to map them all automatically. It looks like there is a mapping in dask_ml/model_selection/scorer.py. Maybe ParallelPostFit uses that. If so, IncrementalSearchCV and friends (Hyperband) should maybe use the same tricks?


If so, IncrementalSearchCV and friends (Hyperband) should maybe use the same tricks?

Wrapping it in ParallelPostFit should do the trick, but every key in the hyperparameter dict params would need to be prepended with estimator__.
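The key renaming mentioned above can be sketched in plain Python. This is only an illustration: the helper name `prefix_params` and the example search space are hypothetical and not taken from the notebook; the only thing assumed from Dask-ML is that ParallelPostFit stores the wrapped model under the parameter name `estimator`.

```python
def prefix_params(params, prefix="estimator__"):
    """Return a copy of `params` with `prefix` prepended to every key."""
    return {f"{prefix}{name}": values for name, values in params.items()}


# Hypothetical search space, for illustration only.
params = {"alpha": [1e-4, 1e-3, 1e-2], "penalty": ["l1", "l2"]}

# Keys now address the estimator nested inside the wrapper.
wrapped_params = prefix_params(params)
print(wrapped_params)
```

The prefixed dictionary would then be passed as the search space when the wrapped model, rather than the bare estimator, is handed to the search.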

I'll think more about doing this automatically. My initial reaction is "no", since the default is to fall back to the estimator's default. I wouldn't want to complicate that.

One thing we should be doing is to make something like Hyperband(..., scoring="accuracy") work. Right now we use sklearn.metrics.check_scoring. But if that used dask_ml.metrics.check_scoring things would work. I'll open an issue.


Is ParallelPostFit relevant?

I meant this: ParallelPostFit(search.best_estimator_).score(X_test, y_test).

@mrocklin
Member Author

If you'd like, I might be able to modify this example.

That would be very very welcome :)

@stsievert

stsievert commented Jul 28, 2020

I've made some edits. A summary of the changes:

  • I changed the model to MLPRegressor. I say this is a simple model standing in for a more complicated model that could use GPUs. I point to PyTorch/skorch models and say they can be used.
  • Clarified when to use Hyperband and when to use IncrementalSearchCV. This pointed to the docs: https://ml.dask.org/hyper-parameter-search.html
  • I point to Hyperband's rule of thumb for determining Hyperband's max_iter and the Dask Array chunk size.
  • Added some text explaining each cell/section.
  • Added an error visualization.

This is rough; this is far from a polished draft. @mrocklin let me know what questions you have.

@mrocklin
Member Author

Oh, cool. This is fun to play with.

I changed the model to MLPRegressor. I say this is a simple model standing in for a more complicated model that could use GPUs. I point to PyTorch/skorch models and say they can be used.

Thoughts on using PyTorch/Skorch here instead? Would that make things much more complex?

Clarified when to use Hyperband and when to use IncrementalSearchCV. This pointed to the docs: https://ml.dask.org/hyper-parameter-search.html

I think that pointing to docs for a lot of this is good. I like the idea of using Hyperband here, but I don't like the idea of explaining all of the knobs behind Hyperband in a first-exposure example like this. I'm curious, are the defaults bad in this case? Would it be OK to omit extra parameters here, or do we need to expose those to have things make sense?

I ran into an issue with the classes= keyword not being accepted. Did you run into this too? (Might not be an issue if we get pytorch properly running).

@stsievert

I'm curious, are the defaults bad in this case?

You're talking about max_iter and chunk_size? I don't think that's critical here; I added it to be complete. The defaults are reasonable: max_iter=81 trains models for 81 calls to partial_fit(X_chunk, y_chunk) and samples 143 hyperparameters. I think that's sufficient for a lot of use cases, though tuned values might do a little better.
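The figure of 143 sampled hyperparameters can be reproduced with a short calculation. The sketch below uses the bracket formula from the Hyperband paper and assumes an aggressiveness (eta) of 3; it illustrates the arithmetic and is not Dask-ML's own code. The function name `hyperband_brackets` is hypothetical.

```python
def hyperband_brackets(max_iter, eta=3):
    """Map each Hyperband bracket to the number of models it starts with."""
    # s_max is the largest s with eta**s <= max_iter; an integer loop
    # avoids floating-point error from computing a logarithm.
    s_max = 0
    while eta ** (s_max + 1) <= max_iter:
        s_max += 1

    brackets = {}
    for s in range(s_max, -1, -1):
        # n = ceil((s_max + 1) * eta**s / (s + 1)), in integer arithmetic
        numerator = (s_max + 1) * eta ** s
        brackets[s] = -(-numerator // (s + 1))
    return brackets


brackets = hyperband_brackets(max_iter=81)
total_models = sum(brackets.values())
print(brackets, total_models)
```

With max_iter=81 this gives 81, 34, 15, 8, and 5 models for brackets 4 through 0, which sum to 143 and agree with the "creating N models" log lines posted later in this conversation.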

I'd still link to the rule of thumb (probably the one in the example; the one in the docstring is hard to link to). I'd also add a note something like "if you want to sample more parameters or train your models for longer, look at HyperbandSearchCV's rule of thumb. Luckily, it's simple and only requires knowing how many hyperparameters to sample and how long to train the model."
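As I understand the rule of thumb referred to above, it needs only those two inputs: the number of hyperparameters to sample and the number of examples the longest-trained model should see. A minimal sketch follows; the function name and the numbers are hypothetical, and the Dask-ML docs remain the reference for the actual rule.

```python
def hyperband_rule_of_thumb(n_examples, n_params):
    """Derive Hyperband inputs from two easy-to-state quantities.

    n_examples: number of examples the longest-trained model should see
    n_params:   (approximate) number of hyperparameter sets to sample
    """
    max_iter = n_params                  # passed to HyperbandSearchCV
    chunk_size = n_examples // n_params  # rows per Dask Array chunk
    return max_iter, chunk_size


# Hypothetical sizes: one million training rows, about 50 passes
# through the data for the best model, roughly 100 parameter sets.
n_train = 1_000_000
max_iter, chunk_size = hyperband_rule_of_thumb(
    n_examples=50 * n_train, n_params=100
)
print(max_iter, chunk_size)
```

The training array would then be rechunked to `chunk_size` rows per block so that each partial_fit call sees one chunk.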

Thoughts on using PyTorch/Skorch here instead? Would that make things much more complex?

👍 I think it'd be nice to have PyTorch; we don't have a PyTorch + Hyperband example in dask-examples yet. I suspect your users don't want to be tied to Scikit-Learn. Having a PyTorch example would allow users more freedom.

Looking at skorch's getting started guide, it'd amount to this much code:

from skorch import NeuralNetRegressor
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F

class HiddenLayerNet(nn.Module):
    def __init__(self, n_features=10, n_outputs=1, n_hidden=100):
        super().__init__()
        self.fc1 = nn.Linear(n_features, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_outputs)

    def forward(self, X, **kwargs):
        # One hidden layer with a ReLU activation
        return self.fc2(F.relu(self.fc1(X)))

net = NeuralNetRegressor(
    module=HiddenLayerNet,
    # module__<name> is forwarded to HiddenLayerNet.__init__
    module__n_hidden=200,
    optimizer=optim.SGD,
    optimizer__lr=0.1,
    max_epochs=10,
    # Shuffle training data on each epoch
    iterator_train__shuffle=True,
)

PyTorch modules require float32 input. I'd convert the dataset first.

@mrocklin
Member Author

I'd still link to the rule of thumb (probably the one in the example; the one in the docstring is hard to link to). I'd also add a note something like "if you want to sample more parameters or train your models for longer, look at HyperbandSearchCV's rule of thumb. Luckily, it's simple and only requires knowing how many hyperparameters to sample and how long to train the model."

FWIW, while many researchers find those questions simple to answer, I suspect that many practitioners don't have good answers. I think that one of the reasons why Scikit-Learn was popular was that many things worked out of the box with sensible defaults. I wonder if there is a good default solution in this case. (That's probably a problem to solve later, though.)

+1 I think it'd be nice to have PyTorch; we don't have a PyTorch + Hyperband example in dask-examples yet. I suspect your users don't want to be tied to Scikit-Learn. Having a PyTorch example would allow users more freedom.

If you're interested in writing this up I'd be in favor. (I'm really just trying to get as much free labor as I can out of you :) )

@stsievert

I've integrated PyTorch.

I didn't have time to debug an issue I ran into: the shape of the output of ParallelPostFit(search.best_estimator_).predict(X_test) is reported by Dask to be (100, ), but when I compute it, the shape is actually (100, 50).

@mrocklin
Member Author

mrocklin commented Jul 30, 2020 via email

@TomAugspurger

TomAugspurger commented Jul 31, 2020 via email

@jrbourbeau
Contributor

Thanks for your work on this @mrocklin @stsievert @TomAugspurger! I pushed a few small updates. Namely:

  • Created a coiled-examples/pytorch cluster configuration to be used in this example
  • Added df = df.categorize(categorical_features) just after the initial read_csv to avoid NotImplementedError: get_dummies with unknown categories is not supported.
  • Commented out several hyperparameters in the search grid and reduced max_iter for HyperbandSearchCV, as I found that both of these led to ValueError: Input contains NaN, infinity or a value too large for dtype('float32') errors being raised while calculating the model score during the search (full traceback below). I'm not sure what is causing large or non-finite values to appear in the model score. Perhaps an extreme value in model weights? @stsievert have you run into this before? Any thoughts on what might be causing this?
  • Commented out the "Why not simply sampling instead?" code cells as I found them to take a very long time to run.
Traceback:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-9db7871d3c39> in <module>
      1 y_train2 = y_train.reshape(-1, 1).persist()
----> 2 search.fit(X_train, y_train2)

~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in fit(self, X, y, **fit_params)
    700         client = default_client()
    701         if not client.asynchronous:
--> 702             return client.sync(self._fit, X, y, **fit_params)
    703         return self._fit(X, y, **fit_params)
    704 

~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    831         else:
    832             return sync(
--> 833                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    834             )
    835 

~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    337     if error[0]:
    338         typ, exc, tb = error[0]
--> 339         raise exc.with_traceback(tb)
    340     else:
    341         return result[0]

~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/distributed/utils.py in f()
    321             if callback_timeout is not None:
    322                 future = asyncio.wait_for(future, callback_timeout)
--> 323             result[0] = yield future
    324         except Exception as exc:
    325             error[0] = sys.exc_info()

~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/tornado/gen.py in run(self)
    733 
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_hyperband.py in _fit(self, X, y, **fit_params)
    400 
    401         _SHAs = await asyncio.gather(
--> 402             *[SHAs[b]._fit(X, y, **fit_params) for b in _brackets_ids]
    403         )
    404         SHAs = {b: SHA for b, SHA in zip(_brackets_ids, _SHAs)}

~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in _fit(self, X, y, **fit_params)
    658                 random_state=self.random_state,
    659                 verbose=self.verbose,
--> 660                 prefix=self.prefix,
    661             )
    662         results = self._process_results(results)

~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in fit(model, params, X_train, y_train, X_test, y_test, additional_calls, fit_params, scorer, random_state, verbose, prefix)
    476         random_state=random_state,
    477         verbose=verbose,
--> 478         prefix=prefix,
    479     )
    480 

~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in _fit(model, params, X_train, y_train, X_test, y_test, additional_calls, fit_params, scorer, random_state, verbose, prefix)
    260     # async for future, result in seq:
    261     for _i in itertools.count():
--> 262         metas = await client.gather(new_scores)
    263 
    264         if log_delay and _i % int(log_delay) == 0:

~/miniforge3/envs/coiled-jrbourbeau-pytorch/lib/python3.7/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1845                             exc = CancelledError(key)
   1846                         else:
-> 1847                             raise exception.with_traceback(traceback)
   1848                         raise exc
   1849                     if errors == "skip":

/opt/conda/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in _score()

/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py in _passthrough_scorer()

/opt/conda/lib/python3.7/site-packages/sklearn/base.py in score()

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f()

/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_regression.py in r2_score()

/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_regression.py in _check_reg_targets()

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f()

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array()

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite()

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

@mrocklin
Member Author

mrocklin commented Aug 5, 2020

Some thoughts!

  1. First, it's great that we have an environment set up. It was surprisingly hard for me to construct a conda environment with torch and skorch in it for some reason.
  2. I think that we should persist after the read_csv to avoid multiple passes over the data during categorize.
  3. I think that we need a few more hyperparameters for this to look interesting. I share the desire to have this complete in a reasonable time. My thinking is 1-3 minutes.
  4. I notice that the partial_fit calls are pretty slow, around 40s each. We might want to repartition by size ahead of time, something like df = df.repartition(partition_size="10 MiB"). (@stsievert probably knows more.)

@mrocklin
Member Author

mrocklin commented Aug 5, 2020

When I reduce the partition size I get

Traceback
[CV, bracket=4] creating 81 models
[CV, bracket=3] creating 34 models
[CV, bracket=2] creating 15 models
[CV, bracket=1] creating 8 models
[CV, bracket=0] creating 5 models
[CV, bracket=0] For training there are between 46291 and 98195 examples in each chunk
[CV, bracket=2] For training there are between 46291 and 98195 examples in each chunk
[CV, bracket=3] For training there are between 46291 and 98195 examples in each chunk
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-9db7871d3c39> in <module>
      1 y_train2 = y_train.reshape(-1, 1).persist()
----> 2 search.fit(X_train, y_train2)

~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in fit(self, X, y, **fit_params)
    700         client = default_client()
    701         if not client.asynchronous:
--> 702             return client.sync(self._fit, X, y, **fit_params)
    703         return self._fit(X, y, **fit_params)
    704 

~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    831         else:
    832             return sync(
--> 833                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    834             )
    835 

~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    337     if error[0]:
    338         typ, exc, tb = error[0]
--> 339         raise exc.with_traceback(tb)
    340     else:
    341         return result[0]

~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/distributed/utils.py in f()
    321             if callback_timeout is not None:
    322                 future = asyncio.wait_for(future, callback_timeout)
--> 323             result[0] = yield future
    324         except Exception as exc:
    325             error[0] = sys.exc_info()

~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/tornado/gen.py in run(self)
    733 
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_hyperband.py in _fit(self, X, y, **fit_params)
    400 
    401         _SHAs = await asyncio.gather(
--> 402             *[SHAs[b]._fit(X, y, **fit_params) for b in _brackets_ids]
    403         )
    404         SHAs = {b: SHA for b, SHA in zip(_brackets_ids, _SHAs)}

~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in _fit(self, X, y, **fit_params)
    658                 random_state=self.random_state,
    659                 verbose=self.verbose,
--> 660                 prefix=self.prefix,
    661             )
    662         results = self._process_results(results)

~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in fit(model, params, X_train, y_train, X_test, y_test, additional_calls, fit_params, scorer, random_state, verbose, prefix)
    476         random_state=random_state,
    477         verbose=verbose,
--> 478         prefix=prefix,
    479     )
    480 

~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in _fit(model, params, X_train, y_train, X_test, y_test, additional_calls, fit_params, scorer, random_state, verbose, prefix)
    260     # async for future, result in seq:
    261     for _i in itertools.count():
--> 262         metas = await client.gather(new_scores)
    263 
    264         if log_delay and _i % int(log_delay) == 0:

~/miniconda/envs/coiled-coiled-examples-pytorch/lib/python3.7/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1845                             exc = CancelledError(key)
   1846                         else:
-> 1847                             raise exception.with_traceback(traceback)
   1848                         raise exc
   1849                     if errors == "skip":

/opt/conda/lib/python3.7/site-packages/dask_ml/model_selection/_incremental.py in _score()

/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_scorer.py in _passthrough_scorer()

/opt/conda/lib/python3.7/site-packages/sklearn/base.py in score()

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f()

/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_regression.py in r2_score()

/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_regression.py in _check_reg_targets()

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f()

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array()

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite()

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
[CV, bracket=1] For training there are between 46291 and 98195 examples in each chunk

@stsievert

stsievert commented Aug 6, 2020

I think NaNs are happening because poor hyperparameters are chosen and the loss is climbing to infinity. I got the same error when my step size was too large (a classic result in optimization). Returning a zero loss once the error gets very large avoids the NaNs and solves the issue for me:

import torch
from skorch import NeuralNetRegressor

class NoNaNs(NeuralNetRegressor):
    def get_loss(self, y_pred, y_true, X=None, training=False):
        # If the mean absolute error has blown up, return a zero loss
        # so a diverging model does not go on to produce NaNs
        if torch.abs(y_true - y_pred).mean() > 1e6:
            return torch.tensor([0.0], requires_grad=True)
        return super().get_loss(y_pred, y_true, X=X, training=training)

model = NoNaNs(module=HiddenLayerNet, ..., **niceties)

I think this issue should be reported upstream to Skorch.

(edit) I haven't tested it, but it also might work to have torch.isnan(y_pred).any() in the if-statement.
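The step-size effect described above can be reproduced without any ML library. The sketch below is a toy illustration, not the notebook's model: it runs plain gradient descent on f(x) = x**2, where each update multiplies x by (1 - 2 * step_size), so any step size above 1 makes the iterates grow until they overflow and turn into NaN.

```python
import math


def gradient_descent(step_size, steps, x0=1.0):
    """Minimize f(x) = x**2 (gradient 2*x) with a fixed step size."""
    x = x0
    for _ in range(steps):
        x = x - step_size * (2 * x)
    return x


# Small step: each update shrinks x by a factor of 0.8.
small = gradient_descent(step_size=0.1, steps=200)

# Large step: each update multiplies x by -2, so |x| doubles until it
# overflows to infinity, after which inf - inf produces NaN.
large = gradient_descent(step_size=1.5, steps=2000)

print(small, math.isnan(large))
```

A model whose weights diverge like this will eventually produce non-finite predictions, which is the condition the scorer rejects in the tracebacks above.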

@jrbourbeau
Contributor

Thanks for the NoNaNs fix @stsievert! That solved the issue for me too.

FYI I removed the "Visualization" and "Why not simply sampling instead?" sections as, while I found them to be informative, they take several minutes to execute. This also lets us avoid any issues with ParallelPostFit.predict.

@mrocklin
Member Author

mrocklin commented Aug 12, 2020 via email

@stsievert

Scott, should we make HyperbandCV robust to NaN's? Is there an obvious way to do this? Treat them as bad results that should be dropped?

I think that's a good idea. I might add infinite losses too. The obvious way to catch the NaN error is to use a try/except block around the score/partial_fit functions and check for the string "NaN" in the error string (I think).
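A minimal sketch of that try/except idea, in plain Python. The helper `safe_score` and the stand-in scoring functions are hypothetical and are not part of Dask-ML; the sketch only shows how NaN-related failures and non-finite scores could be mapped to the worst possible score so that a search would drop those models.

```python
import math


def safe_score(score_fn, *args, **kwargs):
    """Call score_fn; treat NaN-related errors and non-finite scores as -inf."""
    try:
        score = score_fn(*args, **kwargs)
    except ValueError as exc:
        # Only swallow the NaN/infinity validation error; re-raise the rest.
        if "NaN" not in str(exc):
            raise
        return -math.inf
    return score if math.isfinite(score) else -math.inf


def diverged_model_score():
    # Stand-in for a scorer that hits the validation error seen in this thread.
    raise ValueError(
        "Input contains NaN, infinity or a value too large for dtype('float32')."
    )


print(safe_score(diverged_model_score))  # worst score instead of an exception
print(safe_score(lambda: 0.9))           # finite scores pass through unchanged
```

Matching on the error string is fragile, since the message can change between library versions; checking predictions for non-finite values before scoring would be a sturdier variant of the same idea.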

HyperbandSearchCV doesn't get an opportunity to see the output of get_loss for classification problems; it's only used in fitting. I think for Scikit-Learn regression estimators it's also not exposed; I think they (largely) use the coefficient of determination (R²) as the score for regression.

@jrbourbeau
Contributor

Thanks all for your work on this example! I'm going to merge this PR and we can fine tune with follow-up PRs. Thanks again!

@jrbourbeau merged commit e8c27db into master Aug 18, 2020
@jrbourbeau deleted the hpo branch August 18, 2020 00:09