From 6973d35d057a4ce6a3251083a43e0ec7f264b5cb Mon Sep 17 00:00:00 2001
From: dav-ell
Date: Tue, 18 Feb 2020 10:48:56 -0500
Subject: [PATCH 1/4] Add HDFS, Thrift documentation

---
 README.md | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 06a3485..b48f459 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@ https://docs.docker.com/terms/image/
 
 ## How to use this image?
 
-Note: currently this image has only been tested in local mode, using local file system.
+Note: currently this image has only been tested in local mode, using the local file system. For HDFS support, see "Using with HDFS" below.
 
 ### Data storage
 This image is configured (in `hbase-site.xml`) to store the HBase data at `file:///data/hbase`.
@@ -82,3 +82,65 @@ EOF
 ### Accessing the web interface
 Open your browser at the URL `http://docker-host:16010/`, where `docker-host` is the name / IP of the host running the docker daemon. If using Linux, this is the IP of your linux box. If using OSX or Windows (via Boot2docker), you can find out your docker host by typing `boot2docker ip`.
 On my machine, the web UI runs at `http://192.168.59.103:16010/`
+
+### Thrift
+Running with Thrift is as simple as:
+```bash
+docker exec -d hbase-master hbase thrift start
+```
+
+### Using with HDFS
+We'll be using [harisekhon's](https://hub.docker.com/r/harisekhon/hadoop/) Hadoop image, which can be downloaded using `docker pull harisekhon/hadoop`. That image writes to `/tmp` by default, which we'd like to change. Create a new file called `hdfs-site.xml` in your home directory with the following contents:
+```xml
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<configuration>
+    <property>
+        <name>dfs.replication</name>
+        <value>1</value>
+    </property>
+    <property>
+        <name>dfs.datanode.data.dir</name>
+        <value>file:///data</value>
+    </property>
+</configuration>
+```
+Setting the `dfs.datanode.data.dir` property changes the directory that HDFS writes data to, which we've set to `/data`. We can now mount a host directory there to persist data across restarts. Create an empty directory at `$HOME/hdfs-data` for this purpose.
+
+Create a Hadoop container with the following command:
+```bash
+docker run -d --name hdfs -p 8042:8042 -p 8088:8088 -p 19888:19888 -p 50070:50070 -p 50075:50075 -v $HOME/hdfs-data:/data -v $HOME/hdfs-site.xml:/hadoop/etc/hadoop/hdfs-site.xml harisekhon/hadoop
+```
+
+Now HDFS is running. Run `docker inspect hdfs` and look for `IPAddress` near the bottom. Note the value; ours was `172.17.0.2`.
+
+Next, we need to point HBase at HDFS. We can do this by rewriting `hbase-site.xml`. Create a file at `$HOME/hbase-site.xml` with the following contents:
+```xml
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+
+<configuration>
+    <property>
+        <name>hbase.zookeeper.quorum</name>
+        <value>hbase-master</value>
+    </property>
+    <property>
+        <name>hbase.rootdir</name>
+        <value>hdfs://172.17.0.2:8020/hbase/</value>
+    </property>
+    <property>
+        <name>hbase.zookeeper.property.dataDir</name>
+        <value>/data/hbase/zookeeper</value>
+    </property>
+</configuration>
+```
+
+Then run HBase with the following:
+```bash
+docker run -d --name hbase-master -h hbase-master -p 16010:16010 \
+    -v $HOME/hbase-site.xml:/usr/local/hbase/conf/hbase-site.xml \
+    gelog/hbase hbase master start && \
+docker logs -f hbase-master
+```
+
+You can now browse to `http://localhost:50070/explorer.html#/` to see the contents of HDFS. You should see an `hbase` folder in the top-level directory.
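+
+As a quick end-to-end check, you can create a table from the HBase shell and then confirm that its files appear in HDFS. This is a minimal sketch; the `smoke_test` table and `cf` column family are arbitrary names, and it assumes the `hbase` and `hdfs` binaries are on the containers' `PATH`:
+```bash
+# Create a table with one column family and write a single cell.
+echo -e "create 'smoke_test', 'cf'\nput 'smoke_test', 'row1', 'cf:greeting', 'hello'" | \
+    docker exec -i hbase-master hbase shell
+# The table's files should now be visible under the HBase root directory in HDFS.
+docker exec hdfs hdfs dfs -ls -R /hbase
+```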
From 9314ad34e9bd2eefdf5117afec36263588d5bfa3 Mon Sep 17 00:00:00 2001
From: dav-ell
Date: Tue, 18 Feb 2020 10:53:12 -0500
Subject: [PATCH 2/4] Update Thrift documentation

---
 README.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b48f459..f949c3d 100644
--- a/README.md
+++ b/README.md
@@ -86,8 +86,13 @@ Open your browser at the URL `http://docker-host:16010/`, where `docker-host` is
 ### Thrift
 Running with Thrift is as simple as:
 ```bash
-docker exec -d hbase-master hbase thrift start
+docker run -d --name hbase-master -h hbase-master -p 16010:16010 -p 9090:9090 \
+    -v $HOME/data/hbase:/data \
+    gelog/hbase hbase master start && \
+docker exec -d hbase-master hbase thrift start && \
+docker logs -f hbase-master
 ```
+The Thrift server can then be reached on port 9090. If additional ports are needed, rerun the container with an extra `-p [port]:[port]` mapping.
 
 ### Using with HDFS
 We'll be using [harisekhon's](https://hub.docker.com/r/harisekhon/hadoop/) Hadoop image, which can be downloaded using `docker pull harisekhon/hadoop`. That image writes to `/tmp` by default, which we'd like to change. Create a new file called `hdfs-site.xml` in your home directory with the following contents:

From e6dd783ff8bf8b96bf62d9f71ebdba23607dac7b Mon Sep 17 00:00:00 2001
From: dav-ell
Date: Tue, 18 Feb 2020 12:45:31 -0500
Subject: [PATCH 3/4] Add HDFS support for multiple hard drives

---
 README.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/README.md b/README.md
index f949c3d..2c46f11 100644
--- a/README.md
+++ b/README.md
@@ -149,3 +149,18 @@ docker logs -f hbase-master
 ```
 
 You can now browse to `http://localhost:50070/explorer.html#/` to see the contents of HDFS. You should see an `hbase` folder in the top-level directory.
+
+### Using HDFS with multiple hard drives
+If you have multiple hard drives, you can change the value of `dfs.datanode.data.dir` to a comma-separated list of directories, like the following:
+```xml
+<property>
+    <name>dfs.datanode.data.dir</name>
+    <value>file:///data1,file:///data2,file:///data3,file:///data4</value>
+</property>
+```
+Any directories that don't exist will be ignored.
+
+Then, you can run the container with a volume mount for each drive:
+```
+docker run -d --name hdfs -p 8042:8042 -p 8088:8088 -p 19888:19888 -p 50070:50070 -p 50075:50075 -v /mnt/disk1/hdfs:/data1 -v /mnt/disk2/hdfs:/data2 -v $HOME/hdfs-site.xml:/hadoop/etc/hadoop/hdfs-site.xml harisekhon/hadoop
+```

From 1b036da865eea66b7b33a2c6735b89ec70b5eaed Mon Sep 17 00:00:00 2001
From: dav-ell
Date: Tue, 18 Feb 2020 12:49:27 -0500
Subject: [PATCH 4/4] Update formatting

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2c46f11..19df323 100644
--- a/README.md
+++ b/README.md
@@ -161,6 +161,6 @@ If you have multiple hard drives, you can change the value of `dfs.datanode
 Any directories that don't exist will be ignored.
 
 Then, you can run the container with a volume mount for each drive:
-```
+```bash
 docker run -d --name hdfs -p 8042:8042 -p 8088:8088 -p 19888:19888 -p 50070:50070 -p 50075:50075 -v /mnt/disk1/hdfs:/data1 -v /mnt/disk2/hdfs:/data2 -v $HOME/hdfs-site.xml:/hadoop/etc/hadoop/hdfs-site.xml harisekhon/hadoop
 ```
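+
+To confirm that the datanode picked up every drive, you can check that a `current` subdirectory was created under each mounted volume and ask HDFS for a capacity report. This is a minimal sketch, assuming the `hdfs` CLI is available inside the container:
+```bash
+# Each data dir in dfs.datanode.data.dir gets a `current` subdirectory once the datanode uses it.
+docker exec hdfs ls /data1/current /data2/current
+# The configured capacity reported here should roughly match the sum of the mounted drives.
+docker exec hdfs hdfs dfsadmin -report
+```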