FOSDEM 2025

Apache Arrow: The Great Library Unifier

Track: Low-level AI Engineering and Hacking
Room: UB2.252A (Lameere)
Day: Sunday
Start: 11:20
End: 11:50
Video only: ub2252a

Several low-level libraries are used for AI development on GPUs, such as PyTorch, libcudf, and TensorFlow. Each has its own pros and cons and offers different algorithms and functions, so how do you pick which one to use? Instead of paying the cost of copying data back and forth between GPU and CPU, you can pass data between these libraries while leaving it on the GPU and sharing pointers to the device data!

This talk will cover how to leverage the Apache Arrow data format and its C Device Data Interface, in conjunction with DLPack, to connect these libraries together for building low-level AI pipelines. We'll go over examples of handing off data between libraries without forcing extraneous copies from GPU to CPU and back, using Hugging Face's Arrow-formatted caches for training, and converting efficiently between the Arrow and DLPack interfaces to unify multiple libraries for customized processing.
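To make the idea of handing off a pointer instead of a copy concrete, here is a minimal sketch that is not taken from the talk. It mirrors the ArrowArray, ArrowDeviceArray, and DLTensor C structs with Python's ctypes and wraps one int64 buffer in both. The helper names (make_int64_device_array, to_dltensor, read_int64) are hypothetical, and ordinary CPU memory stands in for GPU memory so the example runs anywhere. A real pipeline would let the libraries export and import these structs themselves.

```python
import ctypes

# Arrow's device type values are chosen to match DLPack's.
ARROW_DEVICE_CPU = 1   # same value as DLPack kDLCPU
ARROW_DEVICE_CUDA = 2  # same value as DLPack kDLCUDA (unused here, shown for context)
KDL_INT = 0            # DLPack type code for signed integers

class ArrowArray(ctypes.Structure):
    pass

ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.c_void_p),  # release callback, left NULL in this sketch
    ("private_data", ctypes.c_void_p),
]

class ArrowDeviceArray(ctypes.Structure):
    _fields_ = [
        ("array", ArrowArray),
        ("device_id", ctypes.c_int64),
        ("device_type", ctypes.c_int32),
        ("sync_event", ctypes.c_void_p),
        ("reserved", ctypes.c_int64 * 3),
    ]

class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int32), ("device_id", ctypes.c_int32)]

class DLDataType(ctypes.Structure):
    _fields_ = [("code", ctypes.c_uint8), ("bits", ctypes.c_uint8),
                ("lanes", ctypes.c_uint16)]

class DLTensor(ctypes.Structure):
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("device", DLDevice),
        ("ndim", ctypes.c_int32),
        ("dtype", DLDataType),
        ("shape", ctypes.POINTER(ctypes.c_int64)),
        ("strides", ctypes.POINTER(ctypes.c_int64)),
        ("byte_offset", ctypes.c_uint64),
    ]

def make_int64_device_array(values):
    """Producer side (hypothetical helper): describe an int64 buffer as an ArrowDeviceArray."""
    data = (ctypes.c_int64 * len(values))(*values)
    # Primitive layout: buffers[0] is the validity bitmap (NULL, no nulls), buffers[1] the values.
    buffers = (ctypes.c_void_p * 2)(None, ctypes.addressof(data))
    dev = ArrowDeviceArray()
    dev.array.length = len(values)
    dev.array.n_buffers = 2
    dev.array.buffers = ctypes.cast(buffers, ctypes.POINTER(ctypes.c_void_p))
    dev.device_type = ARROW_DEVICE_CPU
    dev.device_id = -1  # convention for devices without an intrinsic id, such as the CPU
    return dev, (data, buffers)  # second item keeps the memory alive

def to_dltensor(dev):
    """Describe the same buffer as a 1-D DLTensor; only metadata is created, no data is copied."""
    arr = dev.array
    shape = (ctypes.c_int64 * 1)(arr.length)
    strides = (ctypes.c_int64 * 1)(1)  # strides are counted in elements
    t = DLTensor()
    t.data = arr.buffers[1]  # same address as the Arrow values buffer
    t.device = DLDevice(dev.device_type, max(dev.device_id, 0))
    t.ndim = 1
    t.dtype = DLDataType(KDL_INT, 64, 1)
    t.shape = ctypes.cast(shape, ctypes.POINTER(ctypes.c_int64))
    t.strides = ctypes.cast(strides, ctypes.POINTER(ctypes.c_int64))
    t.byte_offset = arr.offset * ctypes.sizeof(ctypes.c_int64)
    return t, (shape, strides)

def read_int64(t):
    """Consumer side: dereference the pointer. Only valid here because the memory is on the CPU."""
    assert t.device.device_type == ARROW_DEVICE_CPU
    view = (ctypes.c_int64 * t.shape[0]).from_address(t.data + t.byte_offset)
    return list(view)

dev, keepalive = make_int64_device_array([10, 20, 30])
tensor, meta_keepalive = to_dltensor(dev)
same_pointer = tensor.data == dev.array.buffers[1]

# Mutate the producer's buffer; the consumer sees the change because nothing was copied.
keepalive[0][1] = 99
values = read_int64(tensor)
```

With a CUDA device type the structs would look the same, but the data pointer would be a device address that only GPU code may dereference, and sync_event would matter for ordering. That is why the sketch restricts reads to the CPU case.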

Speakers

Matthew Topol

Attachments

Slides

Brussels / 1 & 2 February 2025
