By default, when creating a table from a Dask DataFrame, the create_table function persists the Dask DataFrame in memory.
This was surprising (partly because I didn't see it mentioned in the doc examples) while working on a large ETL job. I suspect most users, especially those running larger jobs on systems where the data is bigger than memory, will find it surprising as well. In my case, I spent time trying to debug which compute tasks might be using more memory than expected, only to eventually realize that it was the persist call itself that was failing. Once I re-created the tables with persist=False, my workflow completed successfully.
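
For reference, here is a minimal sketch of the workaround I ended up with, assuming a dask_sql Context; the Parquet path is just a hypothetical stand-in for a dataset larger than cluster memory:

```python
import dask.dataframe as dd
from dask_sql import Context

# Hypothetical path; stands in for a dataset larger than memory
df = dd.read_parquet("s3://my-bucket/large-dataset/")

c = Context()

# Default behavior: create_table persists the DataFrame,
# pulling the whole thing into memory up front
# c.create_table("my_table", df)

# Workaround: register the table lazily, so nothing is
# materialized until a query is actually computed
c.create_table("my_table", df, persist=False)

result = c.sql("SELECT COUNT(*) FROM my_table").compute()
```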
Is there a rationale for persisting by default?