
Conversation

@kimoonkim
Member

The prototype referred to by #1. See README.md for usage.

Cc @foxish @ssuchter @ash211

@foxish
Member

foxish commented Mar 24, 2017

/cc @kow3ns

@foxish
Member

foxish commented Mar 27, 2017

cc/ @prb also for thoughts on this.

daemon.

```
$ kubectl label nodes YOUR-HOST hdfs-namenode-selector=hdfs-namenode-0
```

Member

If we use a PV/PVC for the namenode, we could probably skip this step. Even if we don't do that here and continue to use hostPath, can we add a comment here to clarify why we're doing this?

Member Author

Yes, that's my thinking too. I'll add a comment.
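
For reference, a minimal sketch of what the PV/PVC alternative could look like, assuming a dynamically provisioned volume; the claim name and size below are hypothetical and not part of this chart:

```
# Hypothetical PVC the namenode pod could mount in place of a hostPath volume.
# With a network-backed volume, the namenode no longer needs to be pinned to
# one host with a node label/selector.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hdfs-namenode-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```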


2. Find the IP of your `kube-dns` name server that resolves pod and service
host names in your k8s cluster. Default is 10.96.0.10. It will be supplied
below as the `clusterDnsIP` parameter. Try this command and find the IP
Member

Do we need this configuration at all for `clusterDnsIP`? If we do, we should be using that of the service fronting kube-dns. You can get that IP address through `kubectl get svc --all-namespaces | grep dns`.

The individual kube-dns pods can get evicted and change their IP addresses.

Member Author

This is needed to fill out /etc/resolv.conf for datanodes. I meant the service IP, not the pod IP, of kube-dns. I think the command line example below is equivalent to what you're suggesting. I'll clarify.
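
For anyone reading along, a quick way to look up that service IP (the address and age shown are illustrative and vary per cluster):

```
# The ClusterIP of the service fronting kube-dns is the value to supply as
# clusterDnsIP. Output columns vary by kubectl version.
$ kubectl get svc --namespace=kube-system | grep kube-dns
kube-dns   10.96.0.10   <none>   53/UDP,53/TCP   7d
```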

```
labels:
  name: hdfs-datanode
annotations:
  scheduler.alpha.kubernetes.io/tolerations: |
```

Member

Do we want it to schedule on the master node? It's not typical to run any user pods on the master.

Member Author

Honestly, I don't know what this annotation does. I don't want it on the master node either. Maybe dropping this annotation is the solution?

Member Author

Hmm. So it seems this annotation specifically allows the master node to run a member daemon. I dropped it because that's not what we wanted.

But a daemon is still scheduled on the master node. I think I've read that this behavior is a bug. Anyway, not having this annotation is better, so I'll update the patch.
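
For context, this is roughly the shape of the alpha annotation being dropped; before Kubernetes 1.6, tolerations were expressed as a JSON list inside an annotation rather than as a pod-spec field. The key/value below assume the conventional master taint and are purely illustrative:

```
# Illustrative pre-1.6 toleration annotation; removing it means the DaemonSet
# pods no longer tolerate the master node's taint.
annotations:
  scheduler.alpha.kubernetes.io/tolerations: |
    [{"key": "dedicated", "value": "master", "effect": "NoSchedule"}]
```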

```
- name: datanode
  image: uhopper/hadoop-datanode:2.7.2
  env:
  # This works only with /etc/resolv.conf mounted from the config map.
```

Member

Is this issue also due to the docker version (1.12+) you're running?

Member Author

I could be wrong, but I don't think it's the docker version. We were using the much older version 1.10.x when I found this issue. Anyway, I'm looking forward to trying out kubernetes 1.6 and getting rid of this part.
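
For readers following the thread: because the datanode pods run with hostNetwork and the host's own resolver does not know cluster service names, a resolv.conf pointing at kube-dns is mounted in from a config map. A rough sketch of that workaround is below; the ConfigMap name, search domains, and mount details are assumptions, not necessarily what the chart does:

```
# Illustrative only: serve a resolv.conf that points at the kube-dns service
# IP so a hostNetwork pod can still resolve cluster-internal names.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hdfs-dns-config
data:
  resolv.conf: |
    search default.svc.cluster.local svc.cluster.local cluster.local
    nameserver 10.96.0.10
---
# Hypothetical pod template fragment mounting it over /etc/resolv.conf:
#   volumes:
#     - name: dns-config
#       configMap:
#         name: hdfs-dns-config
#   containers:
#     - name: datanode
#       volumeMounts:
#         - name: dns-config
#           mountPath: /etc/resolv.conf
#           subPath: resolv.conf
```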

@foxish
Member

foxish commented Mar 30, 2017

Thanks for the PR @kimoonkim. I've left a few comments that we can discuss.

@kimoonkim kimoonkim changed the title Add a k8s helm chart that runs HDFS daemons in Kubernetes Add k8s helm charts that run HDFS daemons in Kubernetes Mar 30, 2017
Member Author

@kimoonkim kimoonkim left a comment


Thanks for the review @foxish. Added some suggested comments in the new diff and answered some questions in-line. PTAL.

Please notice I committed 6373749 in between, which separates the files into two charts, one for the namenode and the other for datanodes. I was fighting two subtle but important bugs and this commit fixes them:

  1. Datanodes get stuck at startup if the statefulset DNS name of the namenode does not resolve yet. In practice, this means you want to start the namenode first and start the datanodes only afterward. Having two charts makes that possible (see the install sketch after this comment).

  2. The overlay network (weave) that we are using gets in the way if the namenode is not using hostNetwork. It makes connections from datanodes to the namenode go through its virtual NICs, which leads the namenode to believe the datanode IPs come from those virtual NICs. The fix is switching the namenode to hostNetwork as well.

There are no other changes in 6373749.

The good news is that I was able to run a Spark DFSReadWriteTest job successfully against this HDFS after fixing those bugs.
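
A minimal sketch of the resulting install order with the two charts; the chart directory names, release names, and the label used in the readiness check are placeholders, not necessarily what this PR ships:

```
# Install the namenode chart first so its statefulset DNS name exists.
$ helm install hdfs-namenode-k8s --name my-hdfs-namenode

# Wait until the namenode pod is Running before installing the datanode chart.
$ kubectl get pods -l app=hdfs-namenode

# Then bring up the datanode DaemonSet.
$ helm install hdfs-datanode-k8s --name my-hdfs-datanode
```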

@kimoonkim
Member Author

@foxish Addressed all your comments. Maybe ready for another look before merge?

@foxish
Member

foxish commented Apr 3, 2017

Thanks for addressing comments @kimoonkim. One last item I want to address is the DNS server IP address being supplied. Instead of targeting an individual kube-dns pod, we can use the kube-dns service which has a cluster-ip of 10.0.0.10 by default. Do you think that would help here?

@foxish
Member

foxish commented Apr 3, 2017

I see. On your deployment, you're ending up with a service IP of 10.96.0.10. Maybe we should leave the default out of the README and other places and just point to the way people can resolve it in their own clusters, using `kubectl get svc --namespace=kube-system | grep kube-dns`.

@kimoonkim
Member Author

That suggestion makes sense. Addressed in the latest diff. Thanks!

@foxish
Member

foxish commented Apr 3, 2017

Thanks! This looks like a good beginning. Merging.

@foxish foxish merged commit b40fc37 into apache-spark-on-k8s:master Apr 3, 2017
@kimoonkim
Member Author

Great. Thanks for the review, @foxish!

@kimoonkim kimoonkim deleted the add-hdfs-chart branch April 3, 2017 19:45
unai-ttxu added a commit to unai-ttxu/kubernetes-HDFS that referenced this pull request Aug 20, 2021