Skip to content

Conversation

@AlexKirko
Copy link
Member

@AlexKirko AlexKirko commented Sep 17, 2022

Motivation

This PR touches only one docstring, so no other checks are necessary. As outlined in #48487, when we call pd.read_csv, we eventually go to _libs/ops.pyx, where there are defaults for true_values and false_values:

    # the defaults
    true_vals = {'True', 'TRUE', 'true'}
    false_vals = {'False', 'FALSE', 'false'}

    if true_values is not None:
        true_vals = true_vals | set(true_values)

    if false_values is not None:
        false_vals = false_vals | set(false_values)

In addition, there appears to be a str.lower() or str.upper() call somewhere, although I can't find it, because trUe is treated as true-like too. We should outline this in the docstring, so that it's not a surprise to our users.

Definition of done

  • mention the default values for true_values and false_values in the docstring for pd.read_csv

@AlexKirko
Copy link
Member Author

I don't think a whatsnew entry or anything else is needed for a doc-only PR, but please tell me if I've missed anything.

@AlexKirko
Copy link
Member Author

There are a bunch of errors, but it doesn't look like they have anything to do with the docstring change, and they seem to be cropping up over all the current PRs.

@mroeschke mroeschke added Docs IO CSV read_csv, to_csv labels Sep 19, 2022
be integers or column labels.
true_values : list, optional
Values to consider as True.
Values to consider as True in addition to case-insensitive variants of "True".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if there are tests for case-insensitive variants of True/False?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great question, it will tell us whether the case insensitivity is intended or unnoticed behavior. I'll take a look today or tomorrow.

Copy link
Member Author

@AlexKirko AlexKirko Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke I've taken a look, and the closest that I can find is test_boolean_dtype here. There are also some conversion tests in test_na_values.py. But those have a different purpose.

What would your suggestion be?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking. I think test_boolean_dtype should cover the additional docstring note you added here.

@mroeschke mroeschke added this to the 1.6 milestone Sep 23, 2022
@mroeschke mroeschke merged commit 9ad36f2 into pandas-dev:main Sep 23, 2022
@mroeschke
Copy link
Member

Thanks @AlexKirko

@mroeschke mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Docs IO CSV read_csv, to_csv

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DOC: pandas.read_csv

2 participants