[WIP][SPARK-20628][CORE][K8S] Keep track of nodes (/ spot instances) which are going to be shutdown #19045
Closed: holdenk wants to merge 213 commits into apache:master from holdenk:SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r2
Commits
81fff20 Start of work on adventures (holdenk)
e470bac Mini progresss (holdenk)
a00c707 Go down the path of handling as lost but urgh lets just blacklist ins… (holdenk)
74ade44 Plumb through executor loss to the scheduables (holdenk)
a880177 AppClient suite works! yay (holdenk)
b970403 Decomissioning now works in the coarse grained scheduler, yay.... (holdenk)
ded6bbc Remove sketchy println debugging (holdenk)
16c855a Add a worker decommissioning suite (holdenk)
c79a06d Merge in latest master (holdenk)
e3798d0 Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-… (holdenk)
4f70706 Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-… (holdenk)
07c3e3e Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-… (holdenk)
c2a0ad8 Add decommissioning script for whatever process is running locally on… (holdenk)
672c3b6 Leave polling mechanism up to the cloud vendors (holdenk)
9cfdb7f Remove legacy comment and remove some unecessary blank lines (holdenk)
65a29c1 Remove manually debugging printlns (oops) (holdenk)
9f08b7e Merge in master (holdenk)
258a116 Update and add blocking for K8s (holdenk)
c40fac5 Add workerDecomissioning to K8s conf (holdenk)
0ba0ca5 Merge in master (holdenk)
5877c16 Tidy up small things. (holdenk)
fdb3598 Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-… (holdenk)
4e6572f Fix missing endLifecycle (holdenk)
42a29ab Add a WIP Decom suite work (holdenk)
745f206 Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-… (holdenk)
c7eaaf6 Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-… (holdenk)
05941da Attempt at making decomissioning integration test for Spark on K8s co… (holdenk)
cb61f45 Add initial decomissioning_water helper script (holdenk)
963a289 We don't use the JavaConverters in this test suite. (holdenk)
1cc1436 1.to(10) is scala code, use range since we're in Python. (holdenk)
9036b44 Fix style issue with blank line at end of file. (holdenk)
d58f2a6 Remove unneeded appArgs (holdenk)
5ae1bd7 Add missing sys import (holdenk)
8f6cff2 Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-… (holdenk)
be38dab Add back appArgs since despite Ilan's comments during the stream they… (holdenk)
bbeceb9 Extend DecommissionSuite so the tests are triggered. (holdenk)
e5fb644 Check all containers (holdenk)
164fa2a Wait for the pod to become ready for before we kill it. (holdenk)
151d5d8 Merge in master (holdenk)
ca448d1 import the test tag idk why it won't run (holdenk)
8d504b2 Remove import (holdenk)
c2b3e6e Merge in master (holdenk)
7b0023a Maybe we don't need to explicitly set the docker image since the Pyth… (holdenk)
cca9948 We don't use () anymore on the properties in kubeconf (holdenk)
1bbb69b Remove unused imports (holdenk)
00ae5e9 Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-… (holdenk)
af048f5 Maybe the class loading issue is from two traits with the same test n… (holdenk)
43abf98 Configure the image for Python. (holdenk)
abb0609 Mispelled decommissioning in the python test file. (holdenk)
9d4fc23 Change ref (holdenk)
7f3fd5f The spark context on the session object is stored as _sc (holdenk)
8306827 Speed up running the kubernetes integration tests locally by allowing… (holdenk)
154c8b9 Log exec decom for test (holdenk)
3020ef8 Fix log msg check (holdenk)
27b4edd 30 seconds why not (holdenk)
00310f9 Some temporary printlns for debugging in Jenkins (holdenk)
c3c0e3a Just run the decom suite. (holdenk)
0bf027a Try and debug the tests some more. (holdenk)
be18e52 Re-enable basic test suite. (holdenk)
2914581 More debugging (holdenk)
953094a Hey did we not run the Python tests? (holdenk)
5d173bd Python tests aren't registering the executors, lets avoid that noise … (holdenk)
705fd58 Fix using SparkPI for decom test. (holdenk)
ca60dbf Enable all the tests... (holdenk)
044f8c5 more ... (holdenk)
c09867b [SPARK-26193][SQL][FOLLOW UP] Read metrics rename and display text ch… (xuanyuanking)
afe463c [SPARK-19827][R][FOLLOWUP] spark.ml R API for PIC (srowen)
c76d70a [SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - add… (imatiach-msft)
d639627 [SPARK-25877][K8S] Move all feature logic to feature classes.
3ec9e3b [SPARK-25277][YARN] YARN applicationMaster metrics should not registe… (LucaCanali)
d9bccb5 [SPARK-26322][SS] Add spark.kafka.sasl.token.mechanism to ease delega… (gaborgsomogyi)
b650414 [SPARK-26297][SQL] improve the doc of Distribution/Partitioning (cloud-fan)
73a373b [SPARK-26348][SQL][TEST] make sure expression is resolved during test (cloud-fan)
4a5acc7 [SPARK-26355][PYSPARK] Add a workaround for PyArrow 0.11. (ueshin)
239b8ec [MINOR][R] Fix indents of sparkR welcome message to be consistent wit… (AzureQ)
bda9e84 [MINOR][DOC] Fix comments of ConvertToLocalRelation rule (seancxmao)
d2a58a1 [MINOR][DOC] update the condition description of BypassMergeSortShuffle (lcqzte10192193)
6a1cdf4 [SPARK-26340][CORE] Ensure cores per executor is greater than cpu per…
9b127e1 [SPARK-26313][SQL] move `newScanBuilder` from Table to read related m… (cloud-fan)
14b4978 [SPARK-26098][WEBUI] Show associated SQL query in Job page (gengliangwang)
dde56e4 [SPARK-23886][SS] Update query status for ContinuousExecution (gaborgsomogyi)
3fac7d4 [SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf* (icexelloss)
611a13b [SPARK-26360] remove redundant validateQuery call (JasonWayne)
41b0107 [SPARK-26337][SQL][TEST] Add benchmark for LongToUnsafeRowMap (viirya)
2ba82a7 [SPARK-26368][SQL] Make it clear that getOrInferFileFormatSchema does… (rxin)
e3b1790 [SPARK-26370][SQL] Fix resolution of higher-order function for the sa… (ueshin)
736df41 [MINOR][SQL] Some errors in the notes.
ff2a8b8 [SPARK-26265][CORE][FOLLOWUP] Put freePage into a finally block (viirya)
aa58472 [SPARK-26362][CORE] Remove 'spark.driver.allowMultipleContexts' to di… (HyukjinKwon)
6fce1af [SPARK-26315][PYSPARK] auto cast threshold from Integer to Float in a…
ecd5aa1 [SPARK-26243][SQL] Use java.time API for parsing timestamps and dates… (MaxGekk)
ffbc6b1 [SPARK-26078][SQL] Dedup self-join attributes on IN subqueries (mgaido91)
4c2af74 [SPARK-26372][SQL] Don't reuse value from previous row when parsing b… (bersprockets)
f2a56a6 [SPARK-26248][SQL] Infer date type from CSV (MaxGekk)
51a1cbb [MINOR][DOCS] Fix the "not found: value Row" error on the "programmat… (kjmrknsn)
c26df2b Revert "[SPARK-26248][SQL] Infer date type from CSV" (HyukjinKwon)
90c9bd5 [SPARK-26352][SQL] join reorder should not change the order of output… (rednaxelafx)
33de7df [SPARK-26327][SQL][FOLLOW-UP] Refactor the code and restore the metri… (gatorsmile)
a1c97b5 [SPARK-20636] Add the rule TransposeWindow to the optimization batch (gatorsmile)
0ed3f6a [SPARK-26243][SQL][FOLLOWUP] fix code style issues in TimestampFormat… (cloud-fan)
62a8466 [SPARK-20351][ML] Add trait hasTrainingSummary to replace the duplica… (YY-OnCall)
97a1d0d [SPARK-26255][YARN] Apply user provided UI filters to SQL tab in yar… (chakravarthiT)
53e05ac [SPARK-26371][SS] Increase kafka ConfigUpdater test coverage. (gaborgsomogyi)
9bbc1f9 [SPARK-24933][SS] Report numOutputRows in SinkProgress (vackosar)
a97ca7a [SPARK-25922][K8] Spark Driver/Executor "spark-app-selector" label mi… (suxingfate)
3ee251f [SPARK-24561][SQL][PYTHON] User-defined window aggregation functions … (icexelloss)
435392e [SPARK-26246][SQL] Inferring TimestampType from JSON (MaxGekk)
d33bf4b [SPARK-26081][SQL][FOLLOW-UP] Use foreach instead of misuse of map (f… (HyukjinKwon)
77d78b8 [SPARK-24680][DEPLOY] Support spark.executorEnv.JAVA_HOME in Standalo… (stanzhai)
4af7980 [SPARK-26384][SQL] Propagate SQL configs for CSV schema inferring (MaxGekk)
0702b70 [SPARK-26382][CORE] prefix comparator should handle -0.0 (cloud-fan)
c446c9e [SPARK-26394][CORE] Fix annotation error for Utils.timeStringAsMs
4578d12 [SPARK-25815][K8S] Support kerberos in client mode, keytab-based toke…
93089b5 [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False (mgaido91)
c3a7a52 [SPARK-26390][SQL] ColumnPruning rule should only do column pruning (cloud-fan)
2867beb [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed config sets: WHOLE… (maropu)
4b72301 [SPARK-25271][SQL] Hive ctas commands should use data source if it is… (viirya)
2aa1022 [SPARK-26318][SQL] Deprecate Row.merge (KyleLi1985)
029933c [SPARK-26308][SQL] Avoid cast of decimals for ScalaUDF (mgaido91)
a647251 [SPARK-24687][CORE] Avoid job hanging when generate task binary cause… (caneGuy)
b011de3 [SPARK-26324][DOCS] Add Spark docs for Running in Mesos with SSL (jomach)
f4e4c1a [SPARK-26409][SQL][TESTS] SQLConf should be serializable in test sess… (gengliangwang)
ec539d1 [SPARK-26392][YARN] Cancel pending allocate requests by taking locali… (Ngone51)
cc07eae [SPARK-25970][ML] Add Instrumentation to PrefixSpan (zhengruifeng)
07d111d [MINOR][SQL] Locality does not need to be implemented (10110346)
75ff5d9 [SPARK-26422][R] Support to disable Hive support in SparkR even for H… (HyukjinKwon)
65de564 [SPARK-26267][SS] Retry when detecting incorrect offsets from Kafka (zsxwing)
def767a [SPARK-26269][YARN] Yarnallocator should have same blacklist behaviou… (Ngone51)
38930f0 [SPARK-25642][YARN] Adding two new metrics to record the number of re…
ac33584 [SPARK-26216][SQL][FOLLOWUP] use abstract class instead of trait for … (cloud-fan)
ffd2ef6 [SPARK-26427][BUILD] Upgrade Apache ORC to 1.5.4 (dongjoon-hyun)
2b50381 [SPARK-26428][SS][TEST] Minimize deprecated `ProcessingTime` usage (dongjoon-hyun)
3892c5d [SPARK-26430][BUILD][TEST-MAVEN] Upgrade Surefire plugin to 3.0.0-M2 (dongjoon-hyun)
c19093a [SPARK-26285][CORE] accumulator metrics sources for LongAccumulator a…
ce19610 [SPARK-25245][DOCS][SS] Explain regarding limiting modification on "s… (HeartSaVioR)
c03498e [SPARK-26402][SQL] Accessing nested fields with different cases in ca… (dbtsai)
4536d53 [SPARK-26178][SPARK-26243][SQL][FOLLOWUP] Replacing SimpleDateFormat … (MaxGekk)
e73d73e [SPARK-14023][CORE][SQL] Don't reference 'field' in StructField error… (srowen)
35c680e [SPARK-26426][SQL] fix ExpresionInfo assert error in windows operatio…
210550c [SPARK-26424][SQL] Use java.time API in date/timestamp expressions (MaxGekk)
46913ce [SPARK-26435][SQL] Support creating partitioned table using Hive CTAS… (viirya)
a72b963 [SPARK-26191][SQL] Control truncation of Spark plans via maxFields pa… (MaxGekk)
b3032c9 [SPARK-25892][SQL] Change AttributeReference.withMetadata's return ty… (kevinyu98)
e6d6eaf [SPARK-26451][SQL] Change lead/lag argument name from count to offset (deepyaman)
11f1c8d [SPARK-26446][CORE] Add cachedExecutorIdleTimeout docs at ExecutorAll…
fcbec31 [SPARK-26444][WEBUI] Stage color doesn't change with it's status (seancxmao)
1bb70d9 [SPARK-26424][SQL][FOLLOWUP] Fix DateFormatClass/UnixTime codegen (dongjoon-hyun)
bb07fe9 Maybe we have a race condition with the watcher? idk why we aren't ge… (holdenk)
ea77b23 wtf is going on with this eventually block (holdenk)
c120713 Rewrite the eventually's to use should be which I had accidently remo… (holdenk)
fecd0cf Change how we handle decom tests to actually decom workers and check … (holdenk)
b668feb Revert "Fix using SparkPI for decom test." (holdenk)
2905c8a Revert "Python tests aren't registering the executors, lets avoid tha… (holdenk)
b77eb9e Match the wait logic (TODO refactor) (holdenk)
9f069a4 Debug pods not becoming ready. (holdenk)
789bbdd special format string. (holdenk)
348e2f8 Merge in master (holdenk)
bf834e2 If all the pods are done we don't need the pods to be ready. (holdenk)
1ed5c3d Sleep 100 before waiting. (holdenk)
7acc255 Give the pod 5 minutes to become ready. (holdenk)
b2c58b6 I think we might have had a race condition where the top thread delet… (holdenk)
92f4289 Use POD_RUNNING_TIMEOUT (holdenk)
4774b79 Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-… (holdenk)
739cea0 Merge in master (holdenk)
0b33e1a Fix appclient suite (holdenk)
2dc806e Try and debug the waiting on killing the exec pod logic. (holdenk)
1963bf3 Maybe we were not actually hitting the k8s end point which is why the… (holdenk)
5771b05 re-order tests to fail faster. (holdenk)
a39cd85 Refactor the pod ready status check to be shared in the two places we… (holdenk)
3a16ee8 More debugging and use shouldBe rather than a direct assert. (holdenk)
8044441 Name isn't a label, just get the pod by name directly. (holdenk)
25dc907 Get namespace as well since we are not finding the pod whuich is odd. (holdenk)
46b5725 Fix namespace for pod exec check (holdenk)
4154eef Fix pod check (holdenk)
8a2f5a7 For now skip scala style println checks in KubernetesSuite while we'r… (holdenk)
2b45f9a Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-… (holdenk)
7d4f264 Fix long line comment in KubernetesSuite (holdenk)
367a666 Print out when we are running decom suite. (holdenk)
e54dbaa Print out the namespace as well. (holdenk)
06bbbed Debugging is easier when I see what failed (holdenk)
e8081cb Without specifiying a spark release tarball the setup env script will… (holdenk)
908b204 I don't know why the ready check isn't doing what I expected, lets br… (holdenk)
158838c Fix filter (holdenk)
a6ad1ff Remove unrelated subquery suite change (holdenk)
d85c229 Revert "Speed up running the kubernetes integration tests locally by … (holdenk)
e770092 Take out set -e because I _think_ in integration env this fails with … (holdenk)
0cad83a Temporary commit to support running tests locally, should be part of … (holdenk)
ad3474d Tests are running locally, pod is created and deleted but we don't ge… (holdenk)
23decb9 Move the config variable for decom into worker, add a bit more loggin… (holdenk)
78d02d3 Fix instances of decomi , register SIGPWR in CoarseGrainedExecutorBac… (holdenk)
fa6db32 Fixed compile errors from last (holdenk)
6d31986 s/SIGPWR/PWR/ in the Scala code. (holdenk)
ba755de Add lifecycle change (holdenk)
777da86 Print out when we're getting ready to stop Spark and increase sleep (holdenk)
a61186b Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-… (holdenk)
209bf18 Try and debug whats going on with our container lifecycle. Currently … (holdenk)
0faf47c Try and check that lifecycle is being added with more logging and a l… (holdenk)
40b75f5 Print out our scala version and set it for downstream. (holdenk)
0ee3ae0 Disable R build because it's not working and also set SPARK_SCALA_VER… (holdenk)
cd08175 Add scala version as a param but I don't think we need this (holdenk)
387035b Revert "Add scala version as a param but I don't think we need this" (holdenk)
f11d9f5 Lifecycle now running ok take out the bad lifecycle stage (holdenk)
727f76e decom script should be executable (holdenk)
b37892d Decom script is in the pod yaml, not seeing SIGTERM anymore in the lo… (holdenk)
c1917b5 Try and log our exit process (holdenk)
fb559fe Fix printing the worker pid (holdenk)
0982c11 Wait that awk statement wasn't doing anything for me (holdenk)
6c41552 Attempt to merge in master (holdenk)
09a01cf Fix minor style issues after merge (holdenk)
9a5000d Add license header to decom script (holdenk)
e271a1d waitpid is the syscall wait is the shell command (holdenk)
7400792 Start cleaning up the decom script, todo fix the PID extraction (holdenk)
55fa260 Print out the termination log at the end as well (holdenk)
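Several commits outline a signal-driven decommission flow: a decom script that signals the worker process, SIGPWR registered in CoarseGrainedExecutorBackend (spelled PWR in the Scala code, presumably because the JVM's sun.misc.Signal API names signals without the SIG prefix), and the note that waitpid is the syscall while wait is the shell command. The following is a minimal Python sketch of that flow, not the PR's code. It assumes SIGPWR is the decommission signal and runs on Linux (SIGPWR is not defined on every platform). `DecommissionState`, `is_alive`, `wait_for_exit`, and `decommission` are hypothetical names chosen for illustration.

```python
import os
import signal
import threading
import time


class DecommissionState:
    """Hypothetical process-wide flag that flips once SIGPWR arrives."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def install(self) -> None:
        # signal.signal() may only be called from the main thread, and
        # signal.SIGPWR exists only where the platform defines it (e.g. Linux).
        signal.signal(signal.SIGPWR, self._on_pwr)

    def _on_pwr(self, signum, frame) -> None:
        # Keep the handler minimal: record that decommissioning started and
        # let ordinary code paths react to the flag.
        self._event.set()

    @property
    def decommissioning(self) -> bool:
        return self._event.is_set()

    def wait(self, timeout_s: float) -> bool:
        """Block until SIGPWR has been seen or the timeout expires."""
        return self._event.wait(timeout_s)


def is_alive(pid: int) -> bool:
    """Signal 0 only performs the existence/permission check."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def wait_for_exit(pid: int, timeout_s: float, poll_s: float = 0.1) -> bool:
    """Poll until `pid` is gone; return False if the timeout expires first.

    Unlike os.waitpid (or the shell's `wait`), this works for processes that
    are not children of the caller. An exited but unreaped child (a zombie)
    still counts as alive here, so a parent should reap with os.waitpid.
    """
    deadline = time.monotonic() + timeout_s
    while is_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True


def decommission(pid: int, timeout_s: float = 60.0) -> bool:
    """Ask `pid` to decommission via SIGPWR, then wait for it to exit."""
    os.kill(pid, signal.SIGPWR)
    return wait_for_exit(pid, timeout_s)
```

The polling loop is the main design choice: a decom script is generally not the parent of the worker it signals, so neither os.waitpid nor the shell's `wait` can block on it, while `kill(pid, 0)` works for any PID the caller may inspect.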
Is this used? I only saw the use of WorkerDecommission, which is a DeployMessage.
Look at Master.scala ( https://github.com/apache/spark/pull/19045/files#diff-29dffdccd5a7f4c8b496c293e87c8668R243 )