[SPARK-40922][PYTHON] Document multiple path support in pyspark.pandas.read_csv
#38399
Can one of the admins verify this patch?
python/pyspark/pandas/namespace.py
-    path : str
-        The path string storing the CSV file to be read.
+    path : str or list
+        path(s) of the CSV file(s) to be read.
Can we add this example to the docstring?
Yeah, let's add at least one example to the docstring.
cc @itholic
Looks good except #38399 (comment)
I added an example.
FYI: while looking around in the code, I suspect the feature of supporting multiple paths is also present in other
Merged to master.
…as.read_csv`

### What changes were proposed in this pull request?

As discussed in https://issues.apache.org/jira/browse/SPARK-40922:

> The path argument of `pyspark.pandas.read_csv(path, ...)` currently has type annotation `str` and is documented as
>
>     path : str
>         The path string storing the CSV file to be read.
>
> The implementation however uses `pyspark.sql.DataFrameReader.csv(path, ...)` which does support multiple paths:
>
>     path : str or list
>         string, or list of strings, for input path(s),
>         or RDD of Strings storing CSV rows.

This PR updates the type annotation and documentation of the `path` argument of `pyspark.pandas.read_csv`.

### Why are the changes needed?

Loading multiple CSV files at once is a useful feature to have and should be documented.

### Does this PR introduce _any_ user-facing change?

It documents an existing feature.

### How was this patch tested?

No need for tests (so far): only type annotations and docblocks were changed.

Closes apache#38399 from soxofaan/SPARK-40922-pyspark-pandas-read-csv-multiple-paths.

Authored-by: Stefaan Lippens <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
What changes were proposed in this pull request?
as discussed in https://issues.apache.org/jira/browse/SPARK-40922:
This PR updates the type annotation and documentation of the `path` argument of `pyspark.pandas.read_csv`.
Why are the changes needed?
Loading multiple CSV files at once is a useful feature to have and should be documented
Does this PR introduce any user-facing change?
It documents an existing feature.
How was this patch tested?
No need for tests (so far): only type annotations and docblocks were changed
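For illustration, here is a minimal sketch of how the documented behaviour might be exercised. The helper names `write_sample_csvs` and `read_all`, the file names, and the `id`/`label` columns are hypothetical and not taken from the patch. The only thing assumed about PySpark is what this PR documents: that `pyspark.pandas.read_csv` accepts a list of path strings.

```python
import csv
import os
from typing import List


def write_sample_csvs(directory: str) -> List[str]:
    """Write two small CSV files sharing one header; return their paths as strings."""
    # Hypothetical sample data, used only to have something to read back.
    parts = {
        "part-1.csv": [(1, "a"), (2, "b")],
        "part-2.csv": [(3, "c")],
    }
    paths = []
    for name, rows in parts.items():
        path = os.path.join(directory, name)
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "label"])
            writer.writerows(rows)
        paths.append(path)
    return paths


def read_all(paths: List[str]):
    """Read every file in `paths` into a single pandas-on-Spark DataFrame."""
    # Imported lazily so the helper above works without a Spark installation.
    import pyspark.pandas as ps

    # A list of path strings is passed as the `path` argument, as documented by this PR.
    return ps.read_csv(paths)
```

With PySpark installed and a session available, `read_all(write_sample_csvs(some_dir))` should return one DataFrame holding the rows of both files. Row order across files is not guaranteed, so sort by `id` before comparing results.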