@@ -128,7 +128,7 @@ feature parity with a HiveContext.
128128
129129</div >
130130
131- The specific variant of SQL that is used to parse queries can also be selected using the
131+ The specific variant of SQL that is used to parse queries can also be selected using the
132132`spark.sql.dialect` option. This parameter can be changed using either the `setConf` method on
133133a SQLContext or by using a `SET key=value` command in SQL. For a SQLContext, the only dialect
134134available is "sql" which uses a simple SQL parser provided by Spark SQL. In a HiveContext, the
@@ -139,7 +139,7 @@ default is "hiveql", though "sql" is also available. Since the HiveQL parser is
139139
140140Spark SQL supports operating on a variety of data sources through the `SchemaRDD` interface.
141141A SchemaRDD can be operated on as normal RDDs and can also be registered as a temporary table.
142- Registering a SchemaRDD as a table allows you to run SQL queries over its data. This section
142+ Registering a SchemaRDD as a table allows you to run SQL queries over its data. This section
143143describes the various methods for loading data into a SchemaRDD.
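As a rough illustration of registering and querying a SchemaRDD, here is a minimal sketch (the names `people` and `sqlContext` are assumed to refer to an existing SchemaRDD and SQLContext; they are not defined in this excerpt):

{% highlight scala %}
// Register the SchemaRDD under a table name, then query it with ordinary SQL.
people.registerTempTable("people")
val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18")
adults.map(row => "Name: " + row(0)).collect().foreach(println)
{% endhighlight %}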
144144
145145## RDDs
@@ -152,7 +152,7 @@ while writing your Spark application.
152152The second method for creating SchemaRDDs is through a programmatic interface that allows you to
153153construct a schema and then apply it to an existing RDD. While this method is more verbose, it allows
154154you to construct SchemaRDDs when the columns and their types are not known until runtime.
155-
155+
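Before turning to the reflection-based approach, here is a rough sketch of what the programmatic method looks like (illustrative only; it assumes an existing SQLContext named `sqlContext` and an RDD[String] named `lines` containing records such as "Michael,29"):

{% highlight scala %}
import org.apache.spark.sql._

// Describe the schema explicitly, since it is not known until runtime.
val schema = StructType(Seq(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)))

// Convert each text record into a Row matching that schema, then apply it.
val rowRDD = lines.map(_.split(",")).map(p => Row(p(0), p(1).trim.toInt))
val peopleSchemaRDD = sqlContext.applySchema(rowRDD, schema)
{% endhighlight %}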
156156### Inferring the Schema Using Reflection
157157<div class="codetabs">
158158
@@ -193,7 +193,7 @@ teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
194194<div data-lang="java" markdown="1">
194194
195195Spark SQL supports automatically converting an RDD of [JavaBeans](http://stackoverflow.com/questions/3295496/what-is-a-javabean-exactly)
196- into a Schema RDD. The BeanInfo, obtained using reflection, defines the schema of the table.
196+ into a Schema RDD. The BeanInfo, obtained using reflection, defines the schema of the table.
197197Currently, Spark SQL does not support JavaBeans that contain
198198nested or complex types such as Lists or Arrays. You can create a JavaBean by creating a
199199class that implements Serializable and has getters and setters for all of its fields.
@@ -480,7 +480,7 @@ for name in names.collect():
480480
481481[Parquet](http://parquet.io) is a columnar format that is supported by many other data processing systems.
482482Spark SQL provides support for both reading and writing Parquet files that automatically preserves the schema
483- of the original data.
483+ of the original data.
484484
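As a brief sketch of the read/write round trip (illustrative only; it assumes an existing SchemaRDD named `people` and a SQLContext named `sqlContext`):

{% highlight scala %}
// Write the SchemaRDD out as a Parquet file; the schema travels with the data.
people.saveAsParquetFile("people.parquet")

// Read it back. The result is also a SchemaRDD, with the original schema restored.
val parquetFile = sqlContext.parquetFile("people.parquet")
parquetFile.registerTempTable("parquetFile")
{% endhighlight %}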
485485### Loading Data Programmatically
486486
@@ -562,7 +562,7 @@ for teenName in teenNames.collect():
562562
563563</div >
564564
565- </div >
565+ </div >
566566
567567### Configuration
568568
@@ -808,7 +808,7 @@ memory usage and GC pressure. You can call `uncacheTable("tableName")` to remove
808808Note that if you call `cache` rather than `cacheTable`, tables will _not_ be cached using
809809the in-memory columnar format, and therefore `cacheTable` is strongly recommended for this use case.
810810
811- Configuration of in-memory caching can be done using the `setConf` method on SQLContext or by running
811+ Configuration of in-memory caching can be done using the `setConf` method on SQLContext or by running
812812`SET key=value` commands using SQL.
813813
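For example, a minimal sketch (assuming an existing SQLContext named `sqlContext` and a table already registered as "tableName"; the property shown is only meant to illustrate the two configuration styles):

{% highlight scala %}
// Cache the table using the in-memory columnar format ...
sqlContext.cacheTable("tableName")

// ... adjust caching behaviour either programmatically or through SQL ...
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")
sqlContext.sql("SET spark.sql.inMemoryColumnarStorage.compressed=true")

// ... and release the memory once the table is no longer needed.
sqlContext.uncacheTable("tableName")
{% endhighlight %}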
814814<table class="table">
@@ -881,10 +881,32 @@ To start the JDBC server, run the following in the Spark directory:
881881
882882 ./sbin/start-thriftserver.sh
883883
884- The default port the server listens on is 10000. To listen on customized host and port, please set
885- the `HIVE_SERVER2_THRIFT_PORT` and `HIVE_SERVER2_THRIFT_BIND_HOST` environment variables. You may
886- run `./sbin/start-thriftserver.sh --help` for a complete list of all available options. Now you can
887- use beeline to test the Thrift JDBC server:
884+ This script accepts all `bin/spark-submit` command line options, plus a `--hiveconf` option to
885+ specify Hive properties. You may run `./sbin/start-thriftserver.sh --help` for a complete list of
886+ all available options. By default, the server listens on localhost:10000. You may override this
887+ behaviour via either environment variables, i.e.:
888+
889+ {% highlight bash %}
890+ export HIVE_SERVER2_THRIFT_PORT=<listening-port>
891+ export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
892+ ./sbin/start-thriftserver.sh \
893+   --master <master-uri> \
894+   ...
896+ {% endhighlight %}
897+
898+ or system properties:
899+
900+ {% highlight bash %}
901+ ./sbin/start-thriftserver.sh \
902+   --hiveconf hive.server2.thrift.port=<listening-port> \
903+   --hiveconf hive.server2.thrift.bind.host=<listening-host> \
904+   --master <master-uri> \
905+   ...
907+ {% endhighlight %}
908+
909+ Now you can use beeline to test the Thrift JDBC server:
888910
889911 ./bin/beeline
890912
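Beeline is only one option; any client that speaks the Hive JDBC protocol can connect. A rough sketch from a JVM application (illustrative only; it assumes the server is running on the default localhost:10000, that the Hive JDBC driver is on the classpath, and the user name below is a placeholder):

{% highlight scala %}
import java.sql.DriverManager

// Load the Hive JDBC driver and open a connection to the Thrift JDBC server.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
val stmt = conn.createStatement()

// Run a simple query and print the results.
val rs = stmt.executeQuery("SHOW TABLES")
while (rs.next()) {
  println(rs.getString(1))
}
conn.close()
{% endhighlight %}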
@@ -930,7 +952,7 @@ SQL deprecates this property in favor of `spark.sql.shuffle.partitions`, whose d
930952is 200. Users may customize this property via `SET`:
931953
932954 SET spark.sql.shuffle.partitions=10;
933- SELECT page, count(*) c
955+ SELECT page, count(*) c
934956 FROM logs_last_month_cached
935957 GROUP BY page ORDER BY c DESC LIMIT 10;
936958
@@ -1139,7 +1161,7 @@ evaluated by the SQL execution engine. A full list of the functions supported c
11391161<div data-lang="scala" markdown="1">
11401162
11411163All data types of Spark SQL are located in the package `org.apache.spark.sql`.
1142- You can access them by doing
1164+ You can access them by doing
11431165{% highlight scala %}
11441166import org.apache.spark.sql._
11451167{% endhighlight %}
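As a rough illustration of how these classes compose (illustrative only; the field names are made up, and an existing SQLContext could apply such a schema with `applySchema`):

{% highlight scala %}
import org.apache.spark.sql._

// A schema combining several of the data types described in this section,
// including the complex ArrayType and MapType constructors.
val schema = StructType(Seq(
  StructField("name", StringType, true),
  StructField("scores", ArrayType(DoubleType), true),
  StructField("properties", MapType(StringType, StringType), true)))
{% endhighlight %}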
@@ -1245,7 +1267,7 @@ import org.apache.spark.sql._
12451267<tr >
12461268 <td > <b >StructType</b > </td >
12471269 <td > org.apache.spark.sql.Row </td >
1248- <td >
1270+ <td >
12491271 StructType(<i >fields</i >)<br />
12501272 <b >Note:</b > <i >fields</i > is a Seq of StructFields. Also, two fields with the same
12511273 name are not allowed.
@@ -1267,7 +1289,7 @@ import org.apache.spark.sql._
12671289
12681290All data types of Spark SQL are located in the package of
12691291`org.apache.spark.sql.api.java`. To access or create a data type,
1270- please use factory methods provided in
1292+ please use factory methods provided in
12711293`org.apache.spark.sql.api.java.DataType`.
12721294
12731295<table class =" table " >
@@ -1373,7 +1395,7 @@ please use factory methods provided in
13731395<tr >
13741396 <td > <b >StructType</b > </td >
13751397 <td > org.apache.spark.sql.api.java </td >
1376- <td >
1398+ <td >
13771399 DataType.createStructType(<i >fields</i >)<br />
13781400 <b >Note:</b > <i >fields</i > is a List or an array of StructFields.
13791401 Also, two fields with the same name are not allowed.
@@ -1394,7 +1416,7 @@ please use factory methods provided in
13941416<div data-lang =" python " markdown =" 1 " >
13951417
13961418All data types of Spark SQL are located in the package of `pyspark.sql`.
1397- You can access them by doing
1419+ You can access them by doing
13981420{% highlight python %}
13991421from pyspark.sql import *
14001422{% endhighlight %}
@@ -1518,7 +1540,7 @@ from pyspark.sql import *
15181540<tr >
15191541 <td > <b >StructType</b > </td >
15201542 <td > list or tuple </td >
1521- <td >
1543+ <td >
15221544 StructType(<i >fields</i >)<br />
15231545 <b>Note:</b> <i>fields</i> is a list of StructFields. Also, two fields with the same
15241546 name are not allowed.