
Commit d148e9b

[SPARK-52877][PYTHON][FOLLOW-UP] Use columns instead of itercolumns in RecordBatch
### What changes were proposed in this pull request?

This PR proposes to use `columns` instead of `itercolumns` in `RecordBatch`, since `itercolumns` does not exist in older versions of PyArrow.

### Why are the changes needed?

To recover the build: https://github.com/apache/spark/actions/runs/16507806777/job/46682838114

This is just a temporary workaround.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51661 from HyukjinKwon/SPARK-52877.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
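For context, a minimal sketch of the two APIs, assuming a recent local PyArrow where both exist. `RecordBatch.columns` returns a list of Arrays and is available on older PyArrow releases as well; `RecordBatch.itercolumns()` yields the same Arrays lazily but is missing from older versions, which is what broke the build:

```python
import pyarrow as pa

# A three-row batch with two columns.
batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Works on old and new PyArrow: materializes a list of Arrays.
for column in batch.columns:
    print(column.to_pylist())  # [1, 2, 3] then ['x', 'y', 'z']

# Only available on newer PyArrow: yields the same Arrays lazily.
for column in batch.itercolumns():
    print(column.to_pylist())
```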
1 parent 8c53892 commit d148e9b

File tree

1 file changed: +1 −1 lines changed

python/pyspark/sql/pandas/serializers.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -793,7 +793,7 @@ def load_stream(self, stream):
         for batch in super().load_stream(stream):
             columns = [
                 [conv(v) for v in column.to_pylist()] if conv is not None else column.to_pylist()
-                for column, conv in zip(batch.itercolumns(), converters)
+                for column, conv in zip(batch.columns, converters)
             ]
             if len(columns) == 0:
                 yield [[pyspark._NoValue] * batch.num_rows]
```
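Below is a self-contained sketch of the patched comprehension, with hypothetical converters (`str` and `None`) standing in for the per-column converters PySpark builds internally:

```python
import pyarrow as pa

# Hypothetical stand-ins for PySpark's internal converters: convert
# column "a" to strings, leave column "b" untouched.
converters = [str, None]

batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# The patched comprehension: zip the batch's Arrays with the converters,
# materialize each Array as a Python list, and apply the converter per
# value when one is defined for that column.
columns = [
    [conv(v) for v in column.to_pylist()] if conv is not None else column.to_pylist()
    for column, conv in zip(batch.columns, converters)
]
print(columns)  # [['1', '2', '3'], [4.0, 5.0, 6.0]]
```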
