
How to speed up object conversion? #404

@bluesmoon

Description


I'm using PyCall (1.7.2) to run an SQL query against a database and then pull the results into Julia. Converting the Python list of tuples to a Julia array of tuples appears to be very slow, and all of the slowness seems to be in iterating through the list elements, i.e., conversion time scales as O(n) in the number of items in the list.

Here's some example code. The SQL statement returns exactly 100,000 rows:

Using automatic type conversion

@time rows = cs.cursor[:fetchall]()     # PyCall automatically converts to a Julia array of tuples
157.480336 seconds (24.49 M allocations: 649.669 MB, 0.86% gc time)

@time rowarray = map(collect, rows)     # Convert the tuples to arrays
  0.338664 seconds (1.86 M allocations: 65.706 MB, 37.67% gc time)

length(rowarray)
100000

Getting a PyObject and then converting with map

@time rows = pycall(cs.cursor[:fetchall], PyObject)
  7.685769 seconds (73 allocations: 4.031 KB)

@time rowarray = map(collect, rows)
119.437264 seconds (27.05 M allocations: 745.472 MB, 1.29% gc time)

length(rowarray)
100000

As you can see, calling fetchall() with automatic conversion takes 157 seconds and the subsequent map is very fast, whereas calling pycall(fetchall, PyObject) takes only 7 seconds but then the map is very slow.

So, wise PyCall devs, is there a way for me to combine the fastest parts of the two approaches? I'm not averse to going as low-level as necessary, since this level of database slowness is causing us a lot of grief.
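One workaround to sketch (my own idea, not something confirmed by the PyCall devs): do the per-element work on the Python side by packing the rows into a NumPy array. PyCall converts NumPy arrays to Julia arrays through the buffer protocol in bulk, rather than converting one Python object at a time. This assumes the columns are homogeneous and numeric; the helper name `rows_to_array` is hypothetical:

```python
import numpy as np

def rows_to_array(rows):
    # Pack a list of homogeneous numeric tuples into one contiguous
    # NumPy array. NumPy does the per-element work in C, and PyCall
    # can then hand the whole buffer to Julia in a single bulk copy
    # instead of converting each tuple element individually.
    return np.asarray(rows, dtype=np.float64)

# Stand-in for cursor.fetchall() output:
rows = [(1, 2.5), (3, 4.5)]
arr = rows_to_array(rows)   # shape (2, 2), dtype float64
```

From Julia this could be driven via something like `numpy[:asarray](pycall(cs.cursor[:fetchall], PyObject))`, so the row objects never round-trip through generic element-by-element conversion. Non-numeric or mixed-type columns would need to be split out and handled separately.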

PS: I've tried parallelising this with pmap and other low-level Julia parallel functions, but that involves copying the object, which runs into the same issue.
