[SPARK-40667][SQL] Refactor File Data Source Options
### What changes were proposed in this pull request?
Code refactor on all File data source options:
- `TextOptions`
- `CSVOptions`
- `JSONOptions`
- `AvroOptions`
- `ParquetOptions`
- `OrcOptions`
- `FileIndex` related options
Change semantics:
- First, we introduce a new trait `DataSourceOptions`, which defines the following functions:
  - `newOption(name)`: register a new option
  - `newOption(name, alternative)`: register a new option together with an alternative name
  - `getAllValidOptions`: retrieve all valid options
  - `isValidOption(name)`: check whether a given option name is valid
  - `getAlternativeOption(name)`: get the alternative option name, if any
- Then, for each class above:
  - Create/update its companion object to extend the trait above and register all valid options within it.
  - Replace the places where name strings are used directly to fetch option values with the registered option constants.
  - Add a unit test for each file data source's options.
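
The registration pattern described above can be sketched roughly as follows. This is a minimal illustration and not the code in this patch: the method names come from the list above, but the method bodies, the backing map, and the `ExampleOptions` object with its option names are assumptions made for the example.

```scala
import scala.collection.mutable

// Sketch of the trait described above; the bodies are assumptions, not the patch itself.
trait DataSourceOptions {
  // Maps each registered option name to its alternative name, if it has one.
  private val validOptions = mutable.LinkedHashMap.empty[String, Option[String]]

  // Register a new option and return its name so it can be kept as a constant.
  protected def newOption(name: String): String = {
    validOptions += (name -> None)
    name
  }

  // Register an option together with an alternative name; both names become valid.
  protected def newOption(name: String, alternative: String): Unit = {
    validOptions += (name -> Some(alternative))
    validOptions += (alternative -> Some(name))
  }

  def getAllValidOptions: Set[String] = validOptions.keySet.toSet

  def isValidOption(name: String): Boolean = validOptions.contains(name)

  def getAlternativeOption(name: String): Option[String] =
    validOptions.get(name).flatten
}

// Hypothetical options object; the option names are chosen for illustration only.
object ExampleOptions extends DataSourceOptions {
  val LINE_SEP: String = newOption("lineSep")
  val ENCODING = "encoding"
  val CHARSET = "charset"
  newOption(ENCODING, CHARSET)
}
```

With such an object in place, call sites can refer to `ExampleOptions.LINE_SEP` instead of the literal `"lineSep"`, and a unit test can compare `ExampleOptions.getAllValidOptions` against the expected set of names.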
### Why are the changes needed?
Currently, for each file data source, the options are scattered throughout the options class and there is no clear list of all supported options. As more and more options are added, readability gets worse. Thus, we want to refactor this code so that
- we can easily get a list of supported options for each data source
- we enforce a better practice for adding new options going forward.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Closes apache#38113 from xiaonanyang-db/SPARK-40667.
Authored-by: xiaonanyang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>