[SPARK-23786][SQL] Checking column names of csv headers #20894
Closed
62 commits
112ce2d Checks column names are compatible to provided schema (MaxGekk)
a85ccce Checking header is matched to schema in per-line mode (MaxGekk)
75e1534 Extract header and check that it is matched to schema (MaxGekk)
8eb45b8 Checking column names in header in multiLine mode (MaxGekk)
9b1a986 Adding the checkHeader option with true by default (MaxGekk)
6442633 Fix csv test by changing headers or disabling header checking (MaxGekk)
9440d8a Adding comment for the checkHeader option (MaxGekk)
9f91ce7 Added comments (MaxGekk)
0878f7a Adding a space between column names (MaxGekk)
a341dd7 Fix a test: checking name duplication in schemas (MaxGekk)
98c27ea Fixing the test and adding ticket number to test's title (MaxGekk)
811df6f Refactoring - removing unneeded parameter (MaxGekk)
691cfbc Output filename in the exception (MaxGekk)
efb0105 PySpark: adding a test and checkHeader parameter (MaxGekk)
c9f5e14 Removing unneeded parameter - fileName (MaxGekk)
e195838 Fix for pycodestyle checks (MaxGekk)
d6d370d Adding description of the checkHeader option (MaxGekk)
acd6d2e Improving error messages and handling the case when header size is no… (MaxGekk)
13892fd Refactoring: check header by calling an uniVocity method (MaxGekk)
476b517 Refactoring: convert val to def (MaxGekk)
f8167e4 Parse header only if the checkHeader option is true (MaxGekk)
d068f6c Moving header checks to CSVDataSource (MaxGekk)
08cfcf4 Making uniVocity wrapper unaware of header (MaxGekk)
f6a1694 Fix the test: error mesage was changed (MaxGekk)
adbedf3 Revert CSV tests as it was before the option was introduced (MaxGekk)
0904daf Renaming checkHeader to enforceSchema (MaxGekk)
191b415 Pass required parameter (MaxGekk)
718f7ca Merge branch 'master' of github.com:apache/spark into check-column-names (MaxGekk)
75c1ce6 Merge remote-tracking branch 'origin/master' into check-column-names (MaxGekk)
ab9c514 Addressing Xiao Li's review comments (MaxGekk)
0405863 Making header validation case sensitive (MaxGekk)
714c66d Describing enforceSchema in PySpark's csv method (MaxGekk)
78d9f66 Respect to caseSensitive parameter (MaxGekk)
b43a7c7 Check header on csv parsing from dataset of strings (MaxGekk)
a5f2916 Merge branch 'master' of github.com:apache/spark into check-column-names (MaxGekk)
9b2d403 Make Scala style checker happy (MaxGekk)
1fffc16 Merge remote-tracking branch 'origin/master' into check-column-names (MaxGekk)
ad6cda4 Merge remote-tracking branch 'origin/master' into check-column-names (MaxGekk)
4bdabe2 Merge branch 'master' of github.com:apache/spark into check-column-names (MaxGekk)
2bd2713 Merge branch 'master' into check-column-names (MaxGekk)
b4bfd1d Merge branch 'check-column-names' of github.com:MaxGekk/spark-1 into … (MaxGekk)
21f8b10 Removing a space to make Scala style checker happy. (MaxGekk)
aca4db9 Merge branch 'master' of github.com:apache/spark into check-column-names (MaxGekk)
e3b4275 Addressing review comments (MaxGekk)
d704766 Removing unnecessary empty checks (MaxGekk)
04199e0 Addressing review comments (MaxGekk)
d5fde52 Merge remote-tracking branch 'origin/master' into check-column-names (MaxGekk)
795a878 Addressing Hyukjin Kwon's review comments (MaxGekk)
05fc7cd Improving description of the option (MaxGekk)
9606711 Merge remote-tracking branch 'origin/master' into check-column-names (MaxGekk)
11c7591 Addressing Wenchen Fan's review comment (MaxGekk)
7dce1e7 Merge remote-tracking branch 'origin/master' into check-column-names (MaxGekk)
c008328 Output warnings when enforceSchema is enabled and the schema is not c… (MaxGekk)
9f7c440 Added tests for inferSchema is true and enforceSchema is false (MaxGekk)
e83ad60 Rename dropFirstRecord to shouldDropHeader (MaxGekk)
26ae4f9 Merge remote-tracking branch 'origin/master' into check-column-names (MaxGekk)
4b6495b Merge remote-tracking branch 'origin/master' into check-column-names (MaxGekk)
c5ee207 Renaming of 'is not conform' to 'does not conform' (MaxGekk)
a2cbb7b Fix Scala coding style (MaxGekk)
70e2b75 Added description of checkHeaderColumnNames's arguments (MaxGekk)
e7c3ace Test checks a warning presents in logs (MaxGekk)
3b37712 fix python tests (MaxGekk)
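
The commit messages above describe comparing the column names in a CSV header with a user-provided schema, honoring case sensitivity, and either warning or failing depending on the enforceSchema option. A minimal sketch of that kind of check in plain Python follows. It is not Spark's implementation: the function name `check_header`, its parameters, its return convention, and the message wording are all hypothetical, and treating a column-count mismatch as non-conforming is an assumption.

```python
from typing import Optional, Sequence


def check_header(
    schema_names: Sequence[str],
    header_names: Sequence[str],
    enforce_schema: bool = True,
    case_sensitive: bool = False,
) -> Optional[str]:
    """Compare CSV header names with schema field names, position by position.

    Returns None when the header conforms. On a mismatch, returns a warning
    message if enforce_schema is True (the schema wins and the caller may log
    the message), or raises ValueError if enforce_schema is False.
    """
    problem = None
    if len(header_names) != len(schema_names):
        # Assumption: a different number of columns counts as a mismatch.
        problem = (
            f"header has {len(header_names)} columns, "
            f"schema has {len(schema_names)}"
        )
    else:
        for i, (header, field) in enumerate(zip(header_names, schema_names)):
            # Fold case on both sides unless the comparison is case sensitive.
            h, f = (header, field) if case_sensitive else (header.lower(), field.lower())
            if h != f:
                problem = (
                    f"column {i}: header '{header}' does not conform "
                    f"to schema field '{field}'"
                )
                break

    if problem is None:
        return None
    if enforce_schema:
        return problem  # caller decides how to log the warning
    raise ValueError(problem)
```

With the defaults, `check_header(["id", "name"], ["ID", "name"])` reports no problem because names are compared case-insensitively. Passing `enforce_schema=False` turns any mismatch into an exception instead of a warning message.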
Can you add this option to the streaming reader and writer?
Added.