[Python] Reading empty CSV file in parallel hangs #38676

@jorisvandenbossche

Description

Describe the bug, including details regarding any error messages, version, and platform.

With the following script:

from pyarrow import csv
from io import BytesIO
from concurrent.futures import ThreadPoolExecutor

data = "x,y,z"

def read_csv_pyarrow(i):
    try:
        csv.read_csv(BytesIO(data.encode()))
    except Exception:
        # The parse error itself is expected; only the hang is of interest.
        pass
    print(i)
    return i

with ThreadPoolExecutor(4) as e:
    list(e.map(read_csv_pyarrow, range(20)))

this occasionally hangs.
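
Because the hang is intermittent, it can help to make a reproducer report stuck calls instead of blocking forever. The following is a minimal sketch using only the standard library, not part of the original report: the helper name find_stuck_tasks and its workers and timeout parameters are hypothetical, and cancel_futures assumes Python 3.9 or later.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def find_stuck_tasks(task, inputs, workers=4, timeout=10.0):
    """Run task(i) for every input on a thread pool.

    Returns the sorted inputs whose call had not finished within
    `timeout` seconds. A call that raises counts as finished.
    (Hypothetical helper, for illustration only.)
    """
    executor = ThreadPoolExecutor(workers)
    futures = {executor.submit(task, i): i for i in inputs}
    # wait() returns after `timeout` seconds even if some calls never complete.
    done, not_done = wait(futures, timeout=timeout)
    # Do not join the workers here: a hung worker would block this call too.
    # Calls that are still queued get cancelled.
    executor.shutdown(wait=False, cancel_futures=True)
    return sorted(futures[f] for f in not_done)
```

With the script above, find_stuck_tasks(read_csv_pyarrow, range(20)) would return an empty list on a run that does not hang. Note that a worker thread that is truly stuck is still joined at interpreter exit, so the process may need to be killed after the result has been printed.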

Reading the file itself gives the "ArrowInvalid: CSV parse error: Empty CSV file or block: cannot infer number of columns" error:

return ParseError("Empty CSV file or block: cannot infer number of columns");

We discovered this in the pandas test suite (pandas-dev/pandas#55687).

Component(s)

C++
