Online / 6 & 7 February 2021


Make life easier for big data users on Arm platform

ARM-based datacenter hardware is increasingly available on the market, and its performance keeps improving. As a result, more users and customers are starting to consider this hardware for their business, and big data is one of the most important areas.

However, the open source ecosystem for big data on ARM is far from mature. Most big data projects have paid little attention to running on ARM: they have not officially tested their code on ARM, and many problems remain unsolved. To make this software run on ARM, users have to search through and read large numbers of articles, apply many patches, and build numerous dependencies on their own. And whenever upstream changes or upgrades, new problems may appear, because upstream does not test on ARM. All of this has made users wary of adopting ARM for their business.

To change this situation and make the big data open source ecosystem friendlier to the ARM platform and its users, our team started by proposing that these open source projects add ARM CI. With ARM CI in place, a project is tested on ARM, and all future changes are tested on ARM as well. We also fixed many problems directly upstream, which benefits all users. We then began running performance comparison tests between ARM and x86 to give users an overview of the current status. A large number of TODOs remain for the future.

In this session, you can learn about the current status of ARM CI in big data ecosystem projects such as Hadoop, Spark, HBase, Flink, Storm, Kudu, and Impala, and about our efforts to fix ARM-related problems. We will also introduce our future plans.


Zhenyu Zheng