@@ -387,6 +387,15 @@ class JavaSparkContext(val sc: SparkContext)
387387 * other necessary info (e.g. file name for a filesystem-based dataset, table name for HyperTable,
388388 * etc).
389389 *
390+ * @param conf JobConf for setting up the dataset. Note: This will be put into a Broadcast.
391+ * Therefore, if you plan to reuse this conf to create multiple RDDs, make sure you
392+ * do not modify it. A safe approach is to always create a new conf for each
393+ * new RDD.
394+ * @param inputFormatClass Class of the InputFormat
395+ * @param keyClass Class of the keys
396+ * @param valueClass Class of the values
397+ * @param minPartitions Minimum number of Hadoop splits to generate.
398+ *
390399 * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
391400 * record, directly caching the returned RDD will create many references to the same object.
392401 * If you plan to directly cache Hadoop writable objects, you should first copy them using
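To illustrate how this overload and the new conf caveat fit together, here is a spark-shell style sketch in Scala; the master URL, input path, and choice of TextInputFormat are illustrative assumptions, not part of the patch:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
import org.apache.spark.api.java.JavaSparkContext

val jsc = new JavaSparkContext("local[2]", "hadoopRDDSketch")

// Build a fresh JobConf for this one RDD: the conf is wrapped in a Broadcast,
// so mutating a shared conf afterwards is unsafe (the caveat added above).
val jobConf = new JobConf()
FileInputFormat.setInputPaths(jobConf, "/tmp/hadoop-rdd-input")

val lines = jsc.hadoopRDD(
  jobConf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], 2)

// Per the RecordReader note, copy the reused Writable values before caching.
val cached = lines.rdd.map { case (k, v) => (k.get(), v.toString) }.cache()
```

Building a fresh JobConf right before the call keeps the broadcast copy effectively immutable, which is exactly the reuse caveat the new @param text warns about.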
@@ -409,6 +418,14 @@ class JavaSparkContext(val sc: SparkContext)
409418 * Get an RDD for a Hadoop-readable dataset from a Hadoop JobConf giving its InputFormat and any
410419 * other necessary info (e.g. file name for a filesystem-based dataset, table name for HyperTable, etc).
411420 *
421+ * @param conf JobConf for setting up the dataset. Note: This will be put into a Broadcast.
422+ * Therefore, if you plan to reuse this conf to create multiple RDDs, make sure you
423+ * do not modify it. A safe approach is to always create a new conf for each
424+ * new RDD.
425+ * @param inputFormatClass Class of the InputFormat
426+ * @param keyClass Class of the keys
427+ * @param valueClass Class of the values
428+ *
412429 * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
413430 * record, directly caching the returned RDD will create many references to the same object.
414431 * If you plan to directly cache Hadoop writable objects, you should first copy them using
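The overload documented in this hunk omits minPartitions, letting Spark choose the default. Continuing the sketch above (same illustrative path and classes):

```scala
// Same call without minPartitions; a fresh conf is still created per RDD.
val freshConf = new JobConf()
FileInputFormat.setInputPaths(freshConf, "/tmp/hadoop-rdd-input")
val linesDefault = jsc.hadoopRDD(
  freshConf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
```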
@@ -490,6 +507,14 @@ class JavaSparkContext(val sc: SparkContext)
490507 * Get an RDD for a given Hadoop file with an arbitrary new API InputFormat
491508 * and extra configuration options to pass to the input format.
492509 *
510+ * @param conf Configuration for setting up the dataset. Note: This will be put into a Broadcast.
511+ * Therefore, if you plan to reuse this conf to create multiple RDDs, make sure you
512+ * do not modify it. A safe approach is to always create a new conf for each
513+ * new RDD.
514+ * @param fClass Class of the InputFormat
515+ * @param kClass Class of the keys
516+ * @param vClass Class of the values
517+ *
493518 * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
494519 * record, directly caching the returned RDD will create many references to the same object.
495520 * If you plan to directly cache Hadoop writable objects, you should first copy them using
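The same pattern applies to the new-API method documented here. A sketch continuing the jsc from the earlier example, again with an assumed input path:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.{TextInputFormat => NewTextInputFormat}

// A fresh Configuration per RDD, mirroring the old-API advice.
val newConf = new Configuration()
val records = jsc.newAPIHadoopFile(
  "/tmp/new-api-input", classOf[NewTextInputFormat], classOf[LongWritable], classOf[Text], newConf)

// As with the old API, copy the reused Writables before caching.
val safeRecords = records.rdd.map { case (k, v) => (k.get(), v.toString) }.cache()
```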
@@ -689,6 +714,9 @@ class JavaSparkContext(val sc: SparkContext)
689714
690715 /**
691716 * Returns the Hadoop configuration used for the Hadoop code (e.g. file systems) we reuse.
717+ *
718+ * '''Note:''' Since it will be reused in all Hadoop RDDs, it's better not to modify it unless
719+ * you plan to set some global configuration for all Hadoop RDDs.
692720 */
693721 def hadoopConfiguration (): Configuration = {
694722 sc.hadoopConfiguration
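A small sketch of the usage the new note has in mind: touch the shared configuration only for settings that should apply to every Hadoop RDD. The property name is just an illustrative Hadoop option:

```scala
// Set a global Hadoop option once, before any Hadoop RDDs are created.
val hadoopConf = jsc.hadoopConfiguration()
hadoopConf.set("io.file.buffer.size", "65536")
```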