22 changes: 16 additions & 6 deletions docs/sql-programming-guide.md
@@ -146,7 +146,7 @@ describes the various methods for loading data into a SchemaRDD.

Spark SQL supports two different methods for converting existing RDDs into SchemaRDDs. The first
method uses reflection to infer the schema of an RDD that contains specific types of objects. This
reflection-based approach leads to more concise code and works well when you already know the schema
while writing your Spark application.
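
As a minimal sketch of the reflection-based approach, assuming an existing `SparkContext` named `sc` (the file path and `Person` case class below are illustrative, not part of the change being documented):

```scala
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Brings in the implicit conversion from an RDD of case classes to a SchemaRDD.
import sqlContext.createSchemaRDD

// The case class defines the schema that reflection infers.
case class Person(name: String, age: Int)

// Parse "name,age" lines into Person objects.
val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))

// Register the resulting SchemaRDD as a table so it can be queried with SQL.
people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
```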

The second method for creating SchemaRDDs is through a programmatic interface that allows you to
@@ -566,7 +566,7 @@ for teenName in teenNames.collect():

### Configuration

Configuration of Parquet can be done using the `setConf` method on SQLContext or by running
`SET key=value` commands using SQL.
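
For example, a sketch assuming an existing `SQLContext` named `sqlContext` (the chosen values are illustrative; the keys appear in the table below):

```scala
// Programmatic configuration via setConf on the SQLContext.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

// Equivalent SQL form: the same kind of key/value pair can be set with a SET command.
sqlContext.sql("SET spark.sql.parquet.binaryAsString=true")
```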

<table class="table">
@@ -575,8 +575,8 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
<td><code>spark.sql.parquet.binaryAsString</code></td>
<td>false</td>
<td>
    Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do
    not differentiate between binary data and strings when writing out the Parquet schema. This
flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
</td>
</tr>
@@ -591,10 +591,20 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
<td><code>spark.sql.parquet.compression.codec</code></td>
<td>gzip</td>
<td>
    Sets the compression codec to use when writing Parquet files. Acceptable values include:
uncompressed, snappy, gzip, lzo.
</td>
</tr>
<tr>
<td><code>spark.sql.parquet.filterPushdown</code></td>
<td>false</td>
<td>
    Turns on Parquet filter pushdown optimization. This feature is turned off by default because of a known
    bug in Parquet 1.6.0rc3 (<a href="https://issues.apache.org/jira/browse/PARQUET-136">PARQUET-136</a>).
However, if your table doesn't contain any nullable string or binary columns, it's still safe to turn
this feature on.
</td>
</tr>
<tr>
<td><code>spark.sql.hive.convertMetastoreParquet</code></td>
<td>true</td>
@@ -947,7 +957,7 @@ options.

## Migration Guide for Shark Users

### Scheduling
To set a [Fair Scheduler](job-scheduling.html#fair-scheduler-pools) pool for a JDBC client session,
users can set the `spark.sql.thriftserver.scheduler.pool` variable:
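
A sketch of doing this from a JDBC client session, assuming the Thrift server is listening on the default port 10000 and a fair scheduler pool named `accounting` has been configured (connection details, credentials, and pool name are illustrative):

```scala
import java.sql.DriverManager

// Connect to the Spark SQL Thrift server using the Hive JDBC driver.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
val stmt = conn.createStatement()

// Assign this session's queries to the "accounting" fair scheduler pool.
stmt.execute("SET spark.sql.thriftserver.scheduler.pool=accounting")

stmt.close()
conn.close()
```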
