pandas provides high-performance, easy-to-use data structures and data analysis tools for Python. However, using pandas with multiprocessing can be a challenge. In his Stack Overflow post, Mike McKerns nicely summarizes why. He says:
You are asking multiprocessing (or other python parallel modules) to output to a data structure that they don't directly output to.
This tutorial demonstrates a straightforward workaround: have each worker return a list of lists, then convert those results into a pandas data frame. While you don't get a pandas data frame directly from your worker processes, you still end up with one at the end. Hooray!
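The workaround above can be sketched roughly as follows. This is a minimal illustration under assumed conditions, not the tutorial's own code. It uses a toy workload (squares and cubes of integers), and the names `process_chunk`, `chunked`, `results_to_dataframe`, `parallel_dataframe` and `COLUMNS` are all hypothetical. Each worker returns plain Python rows. The parent process flattens them and builds the single data frame with the standard `pd.DataFrame(rows, columns=...)` constructor.

```python
import multiprocessing as mp

import pandas as pd

# Hypothetical column names for the rows each worker produces.
COLUMNS = ["n", "square", "cube"]


def process_chunk(numbers):
    """Worker (toy example): return a list of lists, not a DataFrame."""
    return [[n, n ** 2, n ** 3] for n in numbers]


def chunked(seq, size):
    """Split seq into consecutive slices of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def results_to_dataframe(results, columns):
    """Flatten the per-worker lists of rows and build one DataFrame."""
    rows = [row for chunk_rows in results for row in chunk_rows]
    return pd.DataFrame(rows, columns=columns)


def parallel_dataframe(numbers, n_workers=4, chunk_size=10):
    """Run process_chunk across a process pool, then assemble the frame."""
    chunks = chunked(list(numbers), chunk_size)
    with mp.Pool(processes=n_workers) as pool:
        # Pool.map returns one result per chunk, in input order.
        results = pool.map(process_chunk, chunks)
    return results_to_dataframe(results, COLUMNS)


if __name__ == "__main__":
    # The main-module guard is required where workers are started with
    # "spawn" (e.g. Windows and macOS), so children don't re-run this block.
    df = parallel_dataframe(range(100))
    print(df.head())
    print(df.shape)  # (100, 3)
```

The worker function is defined at module top level so that `multiprocessing` can pickle a reference to it and send it to the child processes. Building the data frame only once, in the parent, also avoids transferring many small pandas objects between processes.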