Code enhancement: Replaced explicit synchronized access to a hashmap with a concurrent map. #392

varunkatta · 2017-07-26T18:40:21Z

What changes were proposed in this pull request?

Replaced access to a HashMap guarded by an explicit external lock with a concurrent map that synchronizes internally. Thread-safe access to the map is maintained.
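
For illustration, a minimal sketch of the before and after shapes of this change. The field names `EXECUTOR_PODS_BY_IPS_LOCK` and `executorPodsByIPs` come from the diff in this PR; the `Pod` case class, the two wrapper objects, and the `put`/`get` helpers are hypothetical stand-ins and not the actual scheduler backend code.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable

// Hypothetical stand-in for the real Pod class, which is not shown in this thread.
final case class Pod(name: String)

// Before: a plain HashMap, with every access wrapped in an explicit external lock.
object LockedRegistry {
  private val EXECUTOR_PODS_BY_IPS_LOCK = new Object
  private val executorPodsByIPs = new mutable.HashMap[String, Pod]

  def put(ip: String, pod: Pod): Unit =
    EXECUTOR_PODS_BY_IPS_LOCK.synchronized { executorPodsByIPs.put(ip, pod); () }

  def get(ip: String): Option[Pod] =
    EXECUTOR_PODS_BY_IPS_LOCK.synchronized { executorPodsByIPs.get(ip) }
}

// After: the map handles synchronization internally, so callers take no lock.
object ConcurrentRegistry {
  private val executorPodsByIPs = new ConcurrentHashMap[String, Pod]()

  def put(ip: String, pod: Pod): Unit = { executorPodsByIPs.put(ip, pod); () }

  // ConcurrentHashMap.get returns null for a missing key; Option(...) maps that to None.
  def get(ip: String): Option[Pod] = Option(executorPodsByIPs.get(ip))
}
```

Both objects expose the same per-key operations; the difference only becomes visible for compound operations such as iterating over the whole map.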

How was this patch tested?

Ran unit tests.

liyinan926 · 2017-07-26T18:48:53Z

LGTM.

kimoonkim

LGTM. Thanks for the cleanup.

foxish · 2017-07-26T18:55:03Z

Thanks! This is not a bug fix, correct? If so, please hold off on merging until we cut 0.3.0 for the 2.2 branch (just to keep 0.3.0 in sync with the 2.1 branch).

mccheah

I think this leaves room for a synchronization bug, but feel free to mention if I'm mistaken here.

mccheah · 2017-07-26T19:01:52Z

.../scala/org/apache/spark/scheduler/cluster/kubernetes/KubernetesClusterSchedulerBackend.scala

        hostToLocalTaskCount
      }
-    for (pod <- executorPodsWithIPs) {
+    for ((_, pod) <- executorPodsByIPs) {


The problem we need to be careful of here is modifications between iterations of the for loop. Recall that a concurrent hash map only protects concurrent access to one key at a time. But here we care about the whole state; that is, we want the entire map to be consistent throughout the iteration, and we lose that guarantee if we don't lock around the entire iteration cycle.
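
To make the concern concrete, here is a minimal sketch of what "locking around the entire iteration cycle" looks like. The object, the `lock` and `podsByIp` fields, and both methods are hypothetical names used only for illustration, and pod values are plain strings to keep the example self-contained.

```scala
import scala.collection.mutable

object ConsistentIteration {
  private val lock = new Object                              // hypothetical lock
  private val podsByIp = new mutable.HashMap[String, String] // ip -> pod name

  // Writers take the same lock as the iterating reader below.
  def register(ip: String, podName: String): Unit =
    lock.synchronized { podsByIp.put(ip, podName); () }

  // The lock is held for the whole loop, so no writer can slip in
  // between two iterations; the loop sees one consistent state of the map.
  def consistentPodNames(): List[String] = lock.synchronized {
    val names = List.newBuilder[String]
    for ((_, podName) <- podsByIp) names += podName
    names.result()
  }
}
```

With a concurrent map and no outer lock, each individual read is still safe, but entries may be added or removed between two steps of the loop.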

Per the Javadoc at https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentHashMap.html

The following is my understanding.
ConcurrentHashMap is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details. When you iterate, you are guaranteed access to all the entries in a thread-safe manner. Effectively, you are iterating on a snapshot of the map's contents (and what you end up reading depends on concurrent modifications to the map by other threads). Multiple threads can iterate on the map in a thread-safe manner too, as long as each thread has its own copy of the iterator.

The method here only seems to require thread-safe access to the map, and it seems to me the change should be safe.

I think the concern from @mccheah is valid if we care about the consistency of the map throughout the entire iteration. Iterating through a ConcurrentHashMap may or may not reflect changes made to the map after the iterator is created, although iteration is thread-safe and won't throw ConcurrentModificationException. In this particular case, it means we may or may not miss changes made while the map is being iterated through. Also, the Javadoc mentions that "iterators are designed to be used by only one thread at a time."
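
A small single-threaded sketch of that difference, assuming string keys and values for simplicity. `IterationDemo` and `mutateWhileIterating` are hypothetical names. A `java.util.HashMap` iterator fails fast when the map is structurally modified mid-iteration, while a `ConcurrentHashMap` iterator keeps going and may or may not visit the new entry.

```scala
import java.util.ConcurrentModificationException
import java.util.concurrent.ConcurrentHashMap

object IterationDemo {
  // Iterates over the map's keys while inserting a new key during the loop.
  // Returns true if iteration completed, false if the iterator failed fast.
  def mutateWhileIterating(map: java.util.Map[String, String]): Boolean = {
    map.put("10.0.0.1", "pod-a")
    map.put("10.0.0.2", "pod-b")
    try {
      val it = map.keySet().iterator()
      while (it.hasNext) {
        it.next()
        map.put("10.0.0.3", "pod-c") // structural change during iteration
      }
      true
    } catch {
      case _: ConcurrentModificationException => false
    }
  }
}
```

Calling it with `new java.util.HashMap[String, String]()` returns `false`, and calling it with `new ConcurrentHashMap[String, String]()` returns `true`.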

My change is based on the understanding that we don't need the whole map to be consistent during iteration of the map. The previous lock was there for thread safety, not to lock down the map for exclusive access.

"iterators are designed to be used by only one thread at a time."

==> means a single iterator cannot be shared across multiple threads safely. Multiple threads can safely iterate as long as each thread has its own copy of an iterator.
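
As a rough sketch of that reading, the following has two threads walk the same map, each with an iterator it created itself. `PerThreadIterators` and `countFromTwoThreads` are hypothetical names, and the example assumes no writer is active while the threads run.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

object PerThreadIterators {
  // Each thread walks the shared map with its own iterator and counts the
  // keys it visits; the combined count is returned once both have finished.
  def countFromTwoThreads(map: ConcurrentHashMap[String, String]): Int = {
    val visited = new AtomicInteger(0)
    val worker = new Runnable {
      override def run(): Unit = {
        val it = map.keySet().iterator() // created and used by this thread only
        while (it.hasNext) {
          it.next()
          visited.incrementAndGet()
        }
      }
    }
    val threads = List(new Thread(worker), new Thread(worker))
    threads.foreach(_.start())
    threads.foreach(_.join())
    visited.get()
  }
}
```

The `Runnable` is shared, but `run()` creates a fresh iterator on each thread, which is the usage the Javadoc sentence permits.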

...
But here we care about the whole state; that is, we want the entire map to be consistent throughout the iteration, but we lose that guarantee if we don't lock around the entire iteration cycle.
....

@mccheah - I don't think this is necessary. I confirmed it with @kimoonkim, who is the original author of this method.

Yes. Discovering executor pods is done only as a best effort. No need for consistency.

varunkatta · 2017-07-26T19:04:24Z

This is not a bug fix. Addressed a TODO for a general code enhancement. This code is not directly related to any specific feature work.

mccheah

Changing to approved but would like @kimoonkim to sign off.

satybald · 2017-08-02T21:40:12Z

.../scala/org/apache/spark/scheduler/cluster/kubernetes/KubernetesClusterSchedulerBackend.scala

-  private val EXECUTOR_PODS_BY_IPS_LOCK = new Object
-  // Indexed by executor IP addrs and guarded by EXECUTOR_PODS_BY_IPS_LOCK
-  private val executorPodsByIPs = new mutable.HashMap[String, Pod]
+  private val executorPodsByIPs: concurrent.Map[String, Pod] = new


Why specify the type of the variable explicitly? Wouldn't it be cleaner without it? For example:

val executorPodsByIPs = new ConcurrentHashMap[String, Pod]().asScala

The trait concurrent.Map[K, V] is the correct interface for the variable type, but it is abstract and can't be instantiated directly, so in this case typing the variable makes sense. Using Scala's concurrent.TrieMap is another option, although the choice is probably arbitrary.

val executorPodsByIPs: concurrent.Map[String, Pod] = concurrent.TrieMap.empty[String, Pod]

https://www.scala-lang.org/api/2.11.8/index.html#scala.collection.concurrent.TrieMap$

@erikerlandson thank you for the explanation. That makes sense to me, though when I typed the line in the Scala console (2.11.8), the compiler was able to infer the concurrent.Map[K, V] type.

scala> val executorPodsByIPs = new ConcurrentHashMap[String, String]().asScala
executorPodsByIPs: scala.collection.concurrent.Map[String,String] = Map()

I looked into the docs and found that the .asScala method converts a java.util.concurrent.ConcurrentMap[K, V] (such as ConcurrentHashMap) to a scala.collection.concurrent.Map[K, V] via:

mapAsScalaConcurrentMapConverter
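
A small sketch of the three declarations discussed here, assuming the `scala.collection.JavaConverters` import that was current for Scala 2.11 (it is deprecated in later Scala versions). `TypeAnnotationDemo` and its field names are hypothetical, and values are plain strings rather than `Pod`.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.concurrent
import scala.collection.JavaConverters._

object TypeAnnotationDemo {
  // No annotation: asScala on a ConcurrentHashMap is inferred as
  // scala.collection.concurrent.Map[String, String].
  val inferred = new ConcurrentHashMap[String, String]().asScala

  // Explicit annotation: same value, but the intended interface is spelled out.
  val annotated: concurrent.Map[String, String] =
    new ConcurrentHashMap[String, String]().asScala

  // TrieMap alternative: here the annotation widens TrieMap to the trait.
  val trie: concurrent.Map[String, String] =
    concurrent.TrieMap.empty[String, String]
}
```

The annotation changes nothing for the `asScala` case; it only matters when the right-hand side has a more specific type, as with `TrieMap`.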

I've also looked at the Databricks Scala style guide; it recommends avoiding concurrent.Map and using j.u.c.ConcurrentHashMap instead.

Prefer java.util.concurrent.ConcurrentHashMap over scala.collection.concurrent.Map
https://github.com/databricks/scala-style-guide#concurrency
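
A minimal sketch of what following that recommendation can look like from Scala, assuming Java 8 for `computeIfAbsent`. `DirectJavaMap`, `lookup`, and `getOrRegister` are hypothetical names, and values are plain strings rather than `Pod`.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

object DirectJavaMap {
  private val executorPodsByIPs = new ConcurrentHashMap[String, String]()

  // The Java API returns null for a missing key; Option(...) turns that into None.
  def lookup(ip: String): Option[String] = Option(executorPodsByIPs.get(ip))

  // computeIfAbsent is a single atomic operation on the Java map.
  // The explicit anonymous Function keeps this compiling on Scala 2.11,
  // which has no lambda-to-SAM conversion for Java interfaces.
  def getOrRegister(ip: String, podName: String): String =
    executorPodsByIPs.computeIfAbsent(ip, new JFunction[String, String] {
      override def apply(key: String): String = podName
    })
}
```

The cost of this style is the null handling and the Java functional interface; the benefit is that the atomicity of each call is documented directly by the Java class being used.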

@satybald good point re: style recommendations and SI-7943

@satybald Thanks for bringing this up. Please note that SI-7943 doesn't really affect the thread correctness of our code. It is specific to the TrieMap implementation. We are already using java.util.concurrent.ConcurrentHashMap underneath, with the scala.collection.concurrent.Map trait on top for access.

Nevertheless, we should not use the trait, per the community recommendation, to avoid future maintenance issues or problems when merging upstream.

ash211 · 2017-08-21T22:57:50Z

It seems this PR has stalled a bit since the last activity a few weeks ago. @varunkatta what's next here?

varunkatta · 2017-08-23T17:16:03Z

Just getting back to this PR. I thought it had been merged long ago; apparently not. I will address the last few comments today.

varunkatta · 2017-08-24T16:23:52Z

@satybald I addressed your comments. Do you want to take a quick look at the diff?

@ash211 The next step, if reviewers request no more changes, is to merge this PR.

ash211 · 2017-09-07T22:18:45Z

Going to bring in after #459

ash211 · 2017-09-08T03:53:28Z

@varunkatta this is ready for rebase -- please fix merge conflicts

…with a concurrent map. (apache-spark-on-k8s#392) * Replaced explicit synchronized access to hashmap with a concurrent map * Removed usages of scala.collection.concurrent.Map

Replaced explicit synchronized access to hashmap with a concurrent map

7aadcd5

kimoonkim approved these changes Jul 26, 2017

mccheah suggested changes Jul 26, 2017

mccheah approved these changes Jul 26, 2017

kimoonkim approved these changes Jul 26, 2017

satybald reviewed Aug 2, 2017

varunkatta added 2 commits August 23, 2017 10:25

Merge branch 'branch-2.2-kubernetes' into code-enhancement

d3de5b9

Removed usages of scala.collection.concurrent.Map

850cf2d

varunkatta and others added 4 commits August 30, 2017 10:20

Merge branch 'branch-2.2-kubernetes' into code-enhancement

1ff44c6

Merge branch 'branch-2.2-kubernetes' into code-enhancement

14eed56

Merge branch 'branch-2.2-kubernetes' into code-enhancement

0c54683

Merge branch 'code-enhancement' of github.com:varunkatta/spark into c…

8d1df47

…ode-enhancement

varunkatta mentioned this pull request Sep 6, 2017

Unit Tests for KubernetesClusterSchedulerBackend #459

Merged

Merge branch 'branch-2.2-kubernetes' into code-enhancement

41c2b98

ash211 merged commit e5838c1 into apache-spark-on-k8s:branch-2.2-kubernetes Sep 15, 2017

ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 26, 2019

Merge pull request apache-spark-on-k8s#392 from palantir/rk/upstream

c4f20b9

8 participants