[SPARK-16126] [SQL] Better Error Message When using DataFrameReader without path #13837
        
Conversation
Test build #61016 has finished for PR 13837 at commit

Test build #61044 has started for PR 13837 at commit

Weird? How to stop this test case run?

retest this please

Test build #61074 has finished for PR 13837 at commit
  val availableCodecs = shortParquetCompressionCodecNames.keys.map(_.toLowerCase)
  throw new IllegalArgumentException(s"Codec [$codecName] " +
-   s"is not available. Available codecs are ${availableCodecs.mkString(", ")}.")
+   s"is not available. Known codecs are ${availableCodecs.mkString(", ")}.")
why this change?
Just to make it consistent with the output of the other cases. See the code:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CompressionCodecs.scala
Lines 49 to 51 in d6dc12e
case e: ClassNotFoundException =>
  throw new IllegalArgumentException(s"Codec [$codecName] " +
    s"is not available. Known codecs are ${shortCompressionCodecNames.keys.mkString(", ")}.")
"Available" was intentionally used because Parquet only supports snappy, gzip, or lzo, whereas "Known" was used for the text-based sources (please see #10805 (comment)): those sources accept other compression codecs as well, but the message lists only the known ones.
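To illustrate the message format being discussed, here is a self-contained sketch of the Parquet codec lookup. The map contents and the `resolveCodec` helper are simplified stand-ins for illustration, not Spark's actual `ParquetOptions` code:

```scala
// Hypothetical short-name map for Parquet codecs (simplified; the real one
// lives in Spark's ParquetOptions).
val shortParquetCompressionCodecNames: Map[String, String] = Map(
  "none"   -> "UNCOMPRESSED",
  "snappy" -> "SNAPPY",
  "gzip"   -> "GZIP",
  "lzo"    -> "LZO")

// Resolve a user-supplied codec name, failing with the error message
// format shown in the diff above when the codec is unknown.
def resolveCodec(codecName: String): String = {
  shortParquetCompressionCodecNames.getOrElse(codecName.toLowerCase, {
    val availableCodecs = shortParquetCompressionCodecNames.keys.map(_.toLowerCase)
    throw new IllegalArgumentException(s"Codec [$codecName] " +
      s"is not available. Available codecs are ${availableCodecs.mkString(", ")}.")
  })
}
```

For example, `resolveCodec("snappy")` returns `"SNAPPY"`, while an unknown name raises the `IllegalArgumentException` with the codec list embedded in the message.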
Test build #68556 has finished for PR 13837 at commit

Test build #68567 has finished for PR 13837 at commit

Test build #68579 has finished for PR 13837 at commit
  expect_error(read.df(source = "json"),
-              paste("Error in loadDF : analysis error - Unable to infer schema for JSON at .",
-                    "It must be specified manually"))
+              paste("Error in loadDF : illegal argument - 'path' is not specified"))
I recall this test is intentionally testing without the path argument?
cc @HyukjinKwon
Thanks for cc'ing me. Yes, I did. The changes seem reasonable, as this check applies only to the data sources that need a path.
  val equality = sparkSession.sessionState.conf.resolver
  StructType(schema.filterNot(f => partitionColumns.exists(equality(_, f.name))))
}.orElse {
+   if (allPaths.isEmpty && !format.isInstanceOf[TextFileFormat]) {
Hi @gatorsmile, would it be better if we explain here that the text data source is excluded because it always uses a schema consisting of a single string field when the schema is not explicitly given?
BTW, should we maybe change text.TextFileFormat to TextFileFormat https://github.com/gatorsmile/spark/blob/45110370fb1889f244a6750ef2a18dbc9f1ba9c2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L139 ?
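To make the exclusion concrete, here is a minimal sketch of why the text data source can skip the empty-paths check: it can always fall back to a fixed single-string-column schema, while sources like JSON must infer a schema from actual files. The types below are simplified stand-ins, not Spark's actual `StructType`/`FileFormat` API:

```scala
// Simplified model types (stand-ins for Spark's org.apache.spark.sql.types).
sealed trait SimpleType
case object StringType extends SimpleType
final case class StructField(name: String, dataType: SimpleType)
final case class StructType(fields: Seq[StructField])

trait FileFormat {
  // Formats that can produce a schema without reading any data return Some(...).
  def defaultSchema: Option[StructType] = None
}

// Text always yields a schema with one string field, so no path is needed
// to determine its schema.
object TextFileFormat extends FileFormat {
  override def defaultSchema: Option[StructType] =
    Some(StructType(Seq(StructField("value", StringType))))
}

// JSON has no default schema: it must be inferred from files (or given).
object JsonFileFormat extends FileFormat
```

Under this model, `allPaths.isEmpty` is only a problem when `defaultSchema` is `None`, which matches the `!format.isInstanceOf[TextFileFormat]` condition in the diff above.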
hi - where are we on this one?

(gentle ping)
What changes were proposed in this pull request?
When users do not specify the path in the DataFrameReader APIs, they can get a confusing error message.

Error message:
After the fix, the error message will be like:
Another major goal of this PR is to add test cases for the latest changes in #13727.
Another major goal of this PR is to add test cases for the latest changes in #13727 (prevent all column partitioning).

How was this patch tested?
Test cases are added.
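The overall behavior change can be sketched with a simplified, self-contained model of the reader-level check (the class below is a hypothetical stand-in, not Spark's `DataFrameReader`): sources that require a path now fail fast with an `IllegalArgumentException` carrying the "'path' is not specified" message seen in the SparkR test above, instead of a confusing schema-inference error.

```scala
// Hypothetical, simplified model of the path check this PR describes.
class DataFrameReaderSketch(source: String) {
  def load(paths: String*): String = {
    // Text is exempt because it never needs paths to determine a schema;
    // every other source must be given at least one path.
    if (paths.isEmpty && source != "text") {
      throw new IllegalArgumentException("'path' is not specified")
    }
    s"DataFrame from $source: ${paths.mkString(", ")}"
  }
}
```

For example, `new DataFrameReaderSketch("json").load()` throws the new, explicit error, while passing a path (or using the `"text"` source) succeeds.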