Interview: Lei Shi, ChinaHR.com on Unraveling Insights from Unstructured Data
We discuss challenges in leveraging Big Data, important attributes while profiling employers and job seekers, competitive landscape, desired skills in data scientists and more.
Lei Shi is currently the CTO of ChinaHR, one of China's top online recruitment websites. Lei joined Microsoft Research Asia in 2005 as a researcher in the area of Natural Language Processing. Later he joined Yahoo!, in charge of Yahoo!'s search product development in Beijing. He has extensive experience in leveraging big data and machine learning techniques in solving many complex problems.
First part of the interview
Here is the second and last part of my interview with him:
Anmol Rajpurohit: Q6. What are the major challenges in leveraging Big Data and Machine Learning for improved decision-making during the hiring process?
Another major challenge is data sparseness. We frequently see CVs and job postings with missing or incomplete fields.
AR: Q7. What are the important attributes while profiling job seekers and employers? What are the major steps in the process of statistical profiling?
LS: In general, anything that characterizes job seekers or employers, or distinguishes them from others for the purpose of employment matching, can be taken as an attribute in profiling.
However, with Big Data and data analytics techniques, attributes can also be generated and selected automatically. To represent a profile statistically, we convert it into a vector of attributes and values. We first define a set of attributes that characterize the job or job seeker. Then, to assign each attribute a value, we can either apply predefined rules or use statistical learning models, such as running topic models on job descriptions to capture their topic distributions.
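As a rough illustration of the profiling steps described above, here is a minimal Python sketch. It is not ChinaHR's method: the attribute names, the keyword lists, and the use of cosine similarity for matching are all assumptions made for this example, and simple keyword counting stands in for a real topic model.

```python
import math
import re
from collections import Counter

# Hypothetical attributes, each defined by a small keyword set.
# A production system might learn these with a topic model instead.
TOPIC_KEYWORDS = {
    "data_engineering": {"hadoop", "hive", "mongodb", "etl"},
    "machine_learning": {"classification", "regression", "model", "learning"},
    "sales": {"customer", "quota", "negotiation"},
}
ATTRIBUTES = sorted(TOPIC_KEYWORDS)  # fixed attribute order for every vector


def profile_vector(text):
    """Convert free text (a CV or job posting) into a vector of attribute values."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    counts = [sum(tokens[word] for word in TOPIC_KEYWORDS[attr]) for attr in ATTRIBUTES]
    total = sum(counts)
    # Sparse or empty text yields an all-zero vector instead of an error.
    return [count / total if total else 0.0 for count in counts]


def cosine(u, v):
    """Cosine similarity between two profile vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


job = profile_vector("Build ETL pipelines on Hadoop and Hive")
cv_a = profile_vector("Five years of Hadoop and MongoDB experience")
cv_b = profile_vector("Exceeded quota through customer negotiation")

# Rank candidates by similarity to the job profile.
ranked = sorted([("cv_a", cosine(job, cv_a)), ("cv_b", cosine(job, cv_b))],
                key=lambda pair: pair[1], reverse=True)
print(ranked)
```

The all-zero fallback in `profile_vector` is one simple way to cope with the data sparseness mentioned earlier: a sparsely filled CV gets a similarity of zero rather than breaking the comparison.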
AR: Q8. How do you distinguish ChinaHR.com from its competition?
LS: ChinaHR is a top online recruitment brand in China. Founded in 1997, ChinaHR is like at
AR: Q9. Which of the current trends in the Big Data arena are of great interest to you?
AR: Q10. What key qualities do you look for when interviewing for Data Science related positions on your team?
LS: He should be very familiar with machine learning algorithms. He should have a creative mind, because all the work to be done is very new and innovative. Since the volume of data we process is really huge, he should have hands-on experience with many large-scale data processing tools and database systems, such as Hadoop, Hive, and MongoDB.
AR: Q11. On a personal note, are there any good books that you have been reading lately and would like to recommend?
LS: Personally, I do not read books. But I read lots of papers from technical conferences and journals. Most of the latest developments in big data and analytics are published there.