Apache Arrow: The Great Library Unifier
- Track: Low-level AI Engineering and Hacking
- Room: UB2.252A (Lameere)
- Day: Sunday
- Start: 11:20
- End: 11:50
- Video only: ub2252a
Several low-level libraries are used for GPU-based AI development, such as PyTorch, libcudf, and TensorFlow. Each has its own pros and cons and offers different algorithms and functions, so how do you pick which one to use? Instead of paying the cost of copying data back and forth between GPU and CPU, you can pass data between these libraries while leaving it on the GPU and sharing pointers to device data!
This talk will cover how to leverage the Apache Arrow data format and its C Device Interface, in conjunction with DLPack, to connect these libraries together for building low-level AI pipelines. We'll go over examples of handing off data between libraries without forcing extraneous copies from GPU to CPU and back, using Hugging Face's Arrow-formatted caches for training, and converting efficiently between the Arrow and DLPack interfaces to unify multiple libraries for customized processing.
Speakers
Matthew Topol