KDnuggets Home » News » 2017 » Mar » Opinions, Interviews » 7 Types of Data Scientist Job Profiles ( 17:n11 )

7 Types of Data Scientist Job Profiles


 
 

There is no one profile for the Data Scientist, but I tried to make a few generic job profiles that can somewhat fit job descriptions of different companies. I think there is way too much variety, but I had to narrow down on a set of profiles. Check out the list.



By Muktabh Mayank, ParallelDots.

So yes, this post might look somewhat like clickbait, but I promise you it’s not exactly that (well, somewhat).

I recently got a question on Quora asking something along the lines of "What exact skills do companies look for when they are recruiting a Data Scientist?" and "Is there a definition of the Data Scientist profile?" As is pretty obvious, there is no one profile, as every company is solving its own set of problems. But I tried to make a few generic job profiles that can somewhat fit the JDs of different companies. I think there is far more variety than this, but I had to narrow it down to a set of profiles, so here is the list:

  1. The R-using number-cruncher. Can run quick Group Bys and Counts on numbers in R/Python. This profile is the coding version of the Data Analyst of earlier days. It is most commonly found doing automated report generation in a more analyst-y organization.
    Tools Used: R (dataframes), SQL


  2. The Modeller. A deeply mathematical mind who can apply Bayesian/Frequentist inference or hierarchical models. I am probably lumping too many people together here: people analyzing drug trials, scientists modelling complex phenomena and people running autoregressive models on stocks all land in this one group. The common theme is that Mathematics forms the base of the work.
    Tools Used: R (very popular), Fortran, C++ and sometimes functional languages.


  3. The Data Engineer who is also an occasional Data Scientist. Take a library from here, take some code from there and make something good enough while you manage the data pipeline. A very common profile. Data Science tasks include writing programs to automate report generation in Pandas, trying out simple Machine Learning models and (nowadays) running a pretrained Neural Network on the data.
    Tools: Python toolchain, Pandas, nltk, Keras.


  4. The tabular ML’er (or the XGBoost specialist). Ardent Kaggler who can train multiple algorithms, stack models and optimize the heck out of them. These guys have deep expertise in running and optimizing standard algorithms like XGBoost, Ridge Regression and (nowadays) Keras models.
    Tools: Python or R, uses XGBoost and Keras a lot.


  5. The old-style ML’er. Close to 4, but not limited to categorical models only. Very good at feature engineering. This was the only Machine Learning expertise until the newer Deep Learning profile came up.
    Tools: C++ / Python with scikit-learn.


  6. Deep Learning Guy. Needs a GPU system and a well-tagged dataset. Will spend a lot of time trying out architectures and minimal time on feature engineering, but the accuracy will be insane.
    Tools: Python, Theano, TensorFlow and high-level libraries like Keras.


  7. The domain specialist. Knows a lot about the domain and something about linear models. Encodes the domain information and trains a linear algorithm on top. Includes mechanical engineers, analysts at different firms and scientists in pure/applied sciences.
    Tools: Different specializations use different things: Matlab among engineers, C++/Fortran and sometimes R/Python.


  8. The newbie. The intern. Will evolve into whichever of the 7 categories his/her mentor belongs to.

    At ParallelDots, we have people of types 2, 3, 4, 5 and 6 (and 8 if you want to join us full-time).
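
To make profile 1 a little more concrete, here is a minimal sketch of the kind of quick Group By and Count that profile runs. It sticks to the Python standard library so that it runs anywhere; in practice this would more likely be a dataframe operation in R or Pandas, or a SQL GROUP BY. The `orders` records and the `group_count_sum` helper are made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical records; in a real report these would come from a database or CSV.
orders = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 30.0},
    {"region": "east", "amount": 55.5},
    {"region": "south", "amount": 20.0},
]


def group_count_sum(rows, key, value):
    """Return {group: (count, total)}, roughly what
    SELECT key, COUNT(*), SUM(value) ... GROUP BY key would give in SQL."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in rows:
        counts[row[key]] += 1
        totals[row[key]] += row[value]
    return {group: (counts[group], totals[group]) for group in counts}


summary = group_count_sum(orders, "region", "amount")

# Print one report line per region, sorted by region name.
for region, (count, total) in sorted(summary.items()):
    print(f"{region}: {count} orders, total {total:.2f}")
```

The automated reports mentioned in profile 1 are mostly this pattern repeated over many keys and time windows, which is why dataframes and SQL are the usual tools for the job.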

Muktabh Mayank is a Data Scientist & Entrepreneur, and Co-founder of ParallelDots.

ParallelDots helps enterprises make sense of their unstructured data by providing them with custom-built Deep Learning solutions. Its new product, Karna AI, automatically generates reports on any topic from thousands of news and social media sources using AI. The company is also working on a couple of exciting, to-be-launched AI-first products.

Original. Reposted with permission.
