Brussels / 1 & 2 February 2020

How to write a scikit-learn compatible estimator/transformer

Tips and tricks, testing your estimator, and related developments to watch


This is a short hands-on tutorial on how to write your own estimator or transformer that can be used in a scikit-learn pipeline and works seamlessly with the library's meta-estimators.

It also covers how to test them conveniently with a small set of tests.

In many data science tasks, use-case-specific requirements force us to slightly change the behavior of some of the estimators or transformers that ship with scikit-learn. Some of the relevant tips and requirements are not well documented by the library, and finding those details can be cumbersome.

In this short tutorial, we go through an example of writing our own estimator, testing it against scikit-learn's common tests, and seeing how it behaves inside a pipeline and a grid search.
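To give a flavor of that workflow, here is a minimal sketch, not the tutorial's actual material. The `QuantileClipper` transformer, its `lower`/`upper` parameters, and the toy data are hypothetical and exist only for illustration. The sketch stores constructor arguments unchanged, learns trailing-underscore attributes in `fit`, runs scikit-learn's `check_estimator`, and then tunes the transformer inside a pipeline with `GridSearchCV`. The set of common checks differs between scikit-learn releases, so whether every check passes depends on the installed version.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.utils.estimator_checks import check_estimator
from sklearn.utils.validation import check_array, check_is_fitted


class QuantileClipper(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: clip each feature to quantiles learned in fit."""

    def __init__(self, lower=0.05, upper=0.95):
        # Only store the arguments here, under the same names, so that
        # get_params/set_params and clone work as meta-estimators expect.
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = check_array(X)  # validates shape, dtype, NaN/inf
        self.n_features_in_ = X.shape[1]
        # Learned state lives in attributes ending with an underscore.
        self.low_ = np.quantile(X, self.lower, axis=0)
        self.high_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        check_is_fitted(self, ["low_", "high_"])
        X = check_array(X)
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                f"X has {X.shape[1]} features, but QuantileClipper is "
                f"expecting {self.n_features_in_} features as input."
            )
        return np.clip(X, self.low_, self.high_)


# Run the common tests; report rather than assume the outcome, because
# the checks vary between scikit-learn versions.
try:
    check_estimator(QuantileClipper())
    print("common checks passed")
except Exception as exc:
    print("a common check failed:", exc)

# Toy data (an assumption for this sketch): two informative features.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# make_pipeline names each step after its lowercased class name.
pipe = make_pipeline(QuantileClipper(), LogisticRegression())
param_grid = {"quantileclipper__upper": [0.9, 0.95, 1.0]}
search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(search.best_params_)
```

Because the constructor arguments are stored as-is, `GridSearchCV` can clone the pipeline and set `quantileclipper__upper` on each candidate without any extra code in the transformer.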

There have also been recent developments in the general estimator API which require slight modifications from third-party developers. I will cover these changes and point you to the activities to watch, as well as some of the private utilities you can use to make developing an estimator easier.

The materials for the talk will be available on GitHub as a Jupyter notebook.

Speakers

Adrin Jalali

Links