[SPARK-3007][SQL] Fixes dynamic partitioning support for lower Hadoop versions #2663
This is a follow-up to #2226 and #2616 that fixes the Jenkins master SBT build failures for lower Hadoop versions (1.0.x and 2.0.x).
The root cause is the difference in the semantics of `FileSystem.globStatus()` between Hadoop versions, as illustrated by the following test code:

Target directory structure:
Hadoop 2.4.1 result:
Hadoop 1.0.4 result:
In #2226 and #2616, we call `FileOutputCommitter.commitJob()` at the end of the job, which writes the `_SUCCESS` marker file. With lower Hadoop versions, because of the `globStatus()` semantics issue, `Hive.loadDynamicPartitions()` treats `_SUCCESS` as a separate partition data file, and it fails partition spec checking. The fix introduced in this PR is kind of a hack: when inserting data with dynamic partitioning, we intentionally avoid writing the `_SUCCESS` marker to work around this issue.

Hive doesn't suffer from this issue because `FileSinkOperator` doesn't call `FileOutputCommitter.commitJob()`. Instead, it calls `Utilities.mvFileToFinalPath()` to clean up the output directory and then loads it into the Hive warehouse with `loadDynamicPartitions()`/`loadPartition()`/`loadTable()`. This approach is better because it handles failed jobs and speculative tasks properly. We should add this step to `InsertIntoHiveTable` in another PR.
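
To make the failure mode concrete, here is a rough, standard-library-only sketch of why a stray `_SUCCESS` entry cannot pass a partition spec check. It is not Hive's actual implementation: the class `PartitionSpecSketch`, its methods, and the example paths (`p1=a/p2=b` etc.) are all hypothetical, and the check is simplified to "every path component looks like `key=value` and the depth equals the number of dynamic partition columns".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a simplified stand-in for a partition spec check, not Hive's code. */
final class PartitionSpecSketch {

    /**
     * Returns true if relativePath has exactly numDynamicCols components
     * and each component has the form key=value with non-empty key and value.
     */
    static boolean matchesSpec(String relativePath, int numDynamicCols) {
        String[] parts = relativePath.split("/");
        if (parts.length != numDynamicCols) {
            return false;
        }
        for (String part : parts) {
            int eq = part.indexOf('=');
            if (eq <= 0 || eq == part.length() - 1) {
                return false;
            }
        }
        return true;
    }

    /** Returns the candidate paths that would be rejected by matchesSpec. */
    static List<String> rejected(List<String> candidates, int numDynamicCols) {
        List<String> bad = new ArrayList<>();
        for (String candidate : candidates) {
            if (!matchesSpec(candidate, numDynamicCols)) {
                bad.add(candidate);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Hypothetical listing of an output directory with two dynamic
        // partition columns, where the glob also picked up the marker file.
        List<String> candidates = Arrays.asList("p1=a/p2=b", "p1=c/p2=d", "_SUCCESS");
        System.out.println("Rejected: " + rejected(candidates, 2));
    }
}
```

Under these assumptions, the two partition directories pass and only `_SUCCESS` is rejected, which is why not writing the marker at all sidesteps the problem for dynamic partition inserts.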