Skip to content

Conversation

@steveloughran
Copy link
Contributor

@steveloughran steveloughran commented Nov 10, 2021

Cut modtime-based rename recovery as it doesn't work.
Applications will have to use etag API of HADOOP-17979
and implement it themselves.

Why not do the HEAD and etag recovery in ABFS client?
Cuts the IO capacity in half so kills job commit performance.
The manifest committer of MAPREDUCE-7341 will do this recovery
and act as the reference implementation of the algorithm.

Change-Id: I071fb31967ae63e0247e2f328d9cfd0e2423b2bf

How was this patch tested?

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

Cut modtime-based rename recovery as it doesn't work.
Applications will have to use etag API of HADOOP-17979
and implement it themselves.

Why not do the HEAD and etag recovery in ABFS client?
Cuts the IO capacity in half so kills job commit performance.
The manifest committer of MAPREDUCE-7341 will do this recovery
and act as the reference implementation of the algorithm.

Change-Id: I071fb31967ae63e0247e2f328d9cfd0e2423b2bf
@steveloughran
Copy link
Contributor Author

@snvijaya @joshelser @mehakmeet @mukund-thakur

modified tests all happy; running full suite against azure cardiff

@steveloughran
Copy link
Contributor Author

full test run ok, overloaded system triggered

ITestAzureBlobFileSystemFileStatus.testLastModifiedTime:144->Assert.assertTrue:42->Assert.fail:89 lastModifiedTime should be before createEndTime

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 4s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 33m 11s trunk passed
+1 💚 compile 0m 41s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 compile 0m 34s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 checkstyle 0m 28s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 javadoc 0m 34s trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 1m 2s trunk passed
+1 💚 shadedclient 20m 12s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 32s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 javac 0m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 18s the patch passed
+1 💚 mvnsite 0m 30s the patch passed
+1 💚 javadoc 0m 23s the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
+1 💚 javadoc 0m 23s the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
+1 💚 spotbugs 1m 8s the patch passed
+1 💚 shadedclient 20m 29s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 5s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 32s The patch does not generate ASF License warnings.
87m 32s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3641/1/artifact/out/Dockerfile
GITHUB PR #3641
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 09e503fa080c 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b0e403d
Default Java Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3641/1/testReport/
Max. process+thread count 572 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3641/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@mukund-thakur
Copy link
Contributor

Why can't we just revert the commit rather than deleting it? Also there is delete idempotency checks as well not sure if they also use same LMT check.

@steveloughran
Copy link
Contributor Author

good idea..let me see. delete idempotency is easier...if the file isn't there, delete worked

@steveloughran
Copy link
Contributor Author

@mukund-thakur way too many changes since the original change went in -and that did rename and delete

@steveloughran
Copy link
Contributor Author

this patch is complete. I have nothing further to do in it, all it needs is review.

Copy link
Contributor

@mukund-thakur mukund-thakur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@mukund-thakur mukund-thakur merged commit 3391b69 into apache:trunk Nov 16, 2021
asfgit pushed a commit that referenced this pull request Nov 17, 2021
Cut modtime-based rename recovery as object modification time
is not updated during rename operation.
Applications will have to use etag API of HADOOP-17979
and implement it themselves.

Why not do the HEAD and etag recovery in ABFS client?
Cuts the IO capacity in half so kills job commit performance.
The manifest committer of MAPREDUCE-7341 will do this recovery
and act as the reference implementation of the algorithm.

Contributed by: Steve Loughran

Change-Id: I810054c9fd05041dac552f13d31fb15d7524721b
sunchao pushed a commit that referenced this pull request Jan 4, 2022
Cut modtime-based rename recovery as object modification time
is not updated during rename operation.
Applications will have to use etag API of HADOOP-17979
and implement it themselves.

Why not do the HEAD and etag recovery in ABFS client?
Cuts the IO capacity in half so kills job commit performance.
The manifest committer of MAPREDUCE-7341 will do this recovery
and act as the reference implementation of the algorithm.

Contributed by: Steve Loughran

Change-Id: I810054c9fd05041dac552f13d31fb15d7524721b
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
…#3641)


Cut modtime-based rename recovery as object modification time
is not updated during rename operation.
Applications will have to use etag API of HADOOP-17979
and implement it themselves.

Why not do the HEAD and etag recovery in ABFS client?
Cuts the IO capacity in half so kills job commit performance.
The manifest committer of MAPREDUCE-7341 will do this recovery
and act as the reference implementation of the algorithm.

Contributed by: Steve Loughran
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants