Commit 2a37bfb

rxin authored and jeanlyn committed
[minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API
Author: Reynold Xin <[email protected]>

Closes apache#6569 from rxin/freqItemsWarning and squashes the following commits:

7eec145 [Reynold Xin] [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API.
1 parent da5a57e commit 2a37bfb

2 files changed: +15 -0 lines changed

python/pyspark/sql/dataframe.py

Lines changed: 3 additions & 0 deletions
@@ -1170,6 +1170,9 @@ def freqItems(self, cols, support=None):
  "http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou".
  :func:`DataFrame.freqItems` and :func:`DataFrameStatFunctions.freqItems` are aliases.

+ This function is meant for exploratory data analysis, as we make no guarantee about the
+ backward compatibility of the schema of the resulting DataFrame.
+
  :param cols: Names of the columns to calculate frequent items for as a list or tuple of
  strings.
  :param support: The frequency with which to consider an item 'frequent'. Default is 1%.
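
Since the added docstring says the result schema carries no backward-compatibility guarantee, calling code is probably safer discovering the result's column names at run time than hard-coding them. The following is a minimal sketch under two assumptions: the result is a small, single-row DataFrame (consistent with the "Local DataFrame" wording in the Scala docs), and each cell holds the array of frequent items for one input column. The helper name `freq_items_as_dict` is hypothetical and not part of PySpark; it relies only on `DataFrame.collect()` and `Row.asDict()`.

```python
def freq_items_as_dict(result_df):
    """Return {result_column_name: [items, ...]} for a freqItems() result.

    Hypothetical helper, not part of PySpark. Column names are discovered at
    run time because the result schema is not guaranteed to stay stable.
    """
    rows = result_df.collect()  # the result is small, so collecting is cheap
    if len(rows) != 1:
        # Assumption: one row holding one array per input column.
        raise ValueError("expected exactly one row, got %d" % len(rows))
    return {name: list(items) for name, items in rows[0].asDict().items()}
```

A call might look like `freq_items_as_dict(df.freqItems(["a", "b"], 0.4))`, where `df` is an existing DataFrame. Failing loudly on an unexpected row count is a deliberate choice here: a schema change then surfaces as an error instead of silently producing wrong results.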

sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala

Lines changed: 12 additions & 0 deletions
@@ -97,6 +97,9 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
  * [[http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou]].
  * The `support` should be greater than 1e-4.
  *
+ * This function is meant for exploratory data analysis, as we make no guarantee about the
+ * backward compatibility of the schema of the resulting [[DataFrame]].
+ *
  * @param cols the names of the columns to search frequent items in.
  * @param support The minimum frequency for an item to be considered `frequent`. Should be greater
  * than 1e-4.
@@ -114,6 +117,9 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
  * [[http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou]].
  * Uses a `default` support of 1%.
  *
+ * This function is meant for exploratory data analysis, as we make no guarantee about the
+ * backward compatibility of the schema of the resulting [[DataFrame]].
+ *
  * @param cols the names of the columns to search frequent items in.
  * @return A Local DataFrame with the Array of frequent items for each column.
  *
@@ -128,6 +134,9 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
  * frequent element count algorithm described in
  * [[http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou]].
  *
+ * This function is meant for exploratory data analysis, as we make no guarantee about the
+ * backward compatibility of the schema of the resulting [[DataFrame]].
+ *
  * @param cols the names of the columns to search frequent items in.
  * @return A Local DataFrame with the Array of frequent items for each column.
  *
@@ -143,6 +152,9 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
  * [[http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou]].
  * Uses a `default` support of 1%.
  *
+ * This function is meant for exploratory data analysis, as we make no guarantee about the
+ * backward compatibility of the schema of the resulting [[DataFrame]].
+ *
  * @param cols the names of the columns to search frequent items in.
  * @return A Local DataFrame with the Array of frequent items for each column.
  *
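
For readers unfamiliar with the frequent element count algorithm these docs cite, the idea behind the `support` parameter can be sketched in a few lines of plain Python. This is an illustrative single-pass, counter-based sketch in the spirit of the Karp, Schenker, and Papadimitriou paper; it is not Spark's implementation, and the function name `frequent_item_candidates` and the choice of `ceil(1 / support)` counters are assumptions made for this example.

```python
import math


def frequent_item_candidates(items, support=0.01):
    """Return a set containing every item whose relative frequency exceeds `support`.

    Illustrative sketch only (hypothetical name, not Spark code). The result
    may also contain false positives, i.e. items below the threshold.
    """
    if not 0.0 < support <= 1.0:
        raise ValueError("support must be in (0, 1]")
    # Assumption for this sketch: keep at most ceil(1 / support) counters.
    max_counters = int(math.ceil(1.0 / support))
    counters = {}
    for item in items:
        if item in counters:
            counters[item] += 1
        elif len(counters) < max_counters:
            counters[item] = 1
        else:
            # Table is full: decrement every counter, drop those that reach
            # zero, and discard the incoming item.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)
```

With `m` counters, each "table full" step cancels `m + 1` occurrences, so an item seen in more than a `support` fraction of the input cannot be cancelled out entirely and always survives. Rare items can survive too, which is why the output is best treated as a candidate set and why the API suits exploratory analysis.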
