Heroes Of Tech — Vladimir Iglovikov
Original Title: A Very, Very Surface-Level Look At Vladimir Iglovikov
I have recently become interested in data science. It is a hard field to break into because EVERYONE IS INTERESTED IN IT, and I mean EVERYONE. The applications are enormous, including but not limited to gaining customer insights to shape a marketing campaign, detecting credit card fraud, and predicting future market trends. Vladimir Iglovikov seemed like a good starting point because he represents a kind of upper bound for data scientists: he wins Kaggle competitions, shares his source code, and makes his research open to the public.
A Brief Background
Vladimir Iglovikov is a “Kaggle grandmaster,” someone who has placed in the top 10 in 11 Kaggle competitions. He obtained a Ph.D. in physics from the University of California, Davis (I wonder if we ever passed by him at Kemper?), then used Kaggle to transition into machine learning. It paid off, and he got a job at Lyft working on deep learning techniques for autonomous driving problems.
Iglovikov believes that physics was a beneficial background because the coding, math, and statistics it requires transfer directly to data science. He also believes the most interesting data science and machine learning problems lie away from the mainstream: unsolved problems, not oversaturated spaces like credit scoring.
His outline for Kaggle beginners? Take online courses to learn the basics of machine learning and Python programming. Pick a competition and write an end-to-end pipeline that maps raw data to a submission (a rough sketch of such a pipeline follows below)…if you have to, start by just copying and pasting a kernel someone has shared. At some point, you will realize you are competing with hundreds of people who simply “adjust knobs” on their data without understanding what is happening. That is when you really have to study.
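To make “an end-to-end pipeline that maps data to a submission” concrete, here is a minimal sketch of what that might look like for a tabular binary-classification competition. The file names, the “id” and “target” columns, and the choice of model are placeholders I picked for illustration, not details from any specific competition.

```python
# Minimal end-to-end Kaggle-style pipeline: read data, train a model,
# write a submission file. File and column names are illustrative only.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

train = pd.read_csv("train.csv")   # features plus a "target" column
test = pd.read_csv("test.csv")     # same features, no "target"

features = [c for c in train.columns if c not in ("id", "target")]

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(train[features], train["target"])

submission = pd.DataFrame({
    "id": test["id"],
    "target": model.predict_proba(test[features])[:, 1],
})
submission.to_csv("submission.csv", index=False)
```

The point is not the model (a baseline like this will not win anything); it is having the full data-to-submission loop working so you can start iterating.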
Useful courses at this point:
As previously stated, he makes his papers open to the public as well. Here is one such work: Deep Convolutional Neural Networks For Breast Cancer Histology Image Analysis.
His Philosophy
“Many people undervalue their work.” When they solve problems, they do not document their findings. Somewhere out there is someone who can benefit.
Let your work be known. Make your GitHub repos public. Improve readability, add configuration files, and write a good README. Make it easy for potential users to run your trained model. Turn your code into a library to lower the barrier to entry, list its dependencies in requirements.txt, and upload the library to PyPI.
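As a hypothetical illustration of that packaging step, a minimal setup.py might look like the sketch below. The library name, version, and dependency list are placeholders, not an actual project of his.

```python
# Minimal setup.py for a hypothetical library "my_segmentation_lib".
# Name, version, and dependencies are illustrative placeholders.
from setuptools import setup, find_packages

setup(
    name="my_segmentation_lib",
    version="0.1.0",
    description="Utilities extracted from a Kaggle competition solution",
    packages=find_packages(),
    install_requires=[
        "numpy",
        "torch",
    ],
    python_requires=">=3.7",
)
```

Building and uploading then comes down to a couple of commands (python -m build, then twine upload dist/*), and listing the same dependencies in requirements.txt lets others recreate your environment with pip install -r requirements.txt.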
I am leaving out many of the steps outlined in the post above, but the overarching philosophy is to share your knowledge. After training a model and following the other steps to make your work open source, clearly organized, and well documented, spend four hours writing a blog post. Iglovikov was able to get a job in data science because of the knowledge he shared via blogging and at Meetups.