Pew Research: Data Scientist
Location: Washington, DC
Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping America and the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research in the areas of U.S. politics and policy views; media and journalism; internet and technology; religion and public life; Hispanic trends; global attitudes; and U.S. social and demographic trends. The Center does not take policy positions. It is a subsidiary of The Pew Charitable Trusts. Pew Research Center's work is carried out by a staff of 130.
Pew Research Center Labs is a new effort by the Center to use techniques from data science to conduct rigorous social science research. Labs researchers experiment with new data sets and seek out new research methods and opportunities that complement and expand our traditional research agenda. Labs researchers use scraping, social and traditional media data, machine learning, crowdsourcing, network analysis, and other emerging computational methodologies alongside conventional survey methodologies to contribute to the Center's key topics of ongoing research.
The Data Scientist should be eager to work with a team using cutting-edge methods in creative ways to enrich the public dialogue and support sound decision-making. He or she will contribute to all aspects of a wide range of data analytics research projects, including development, original research and writing. He or she should have experience designing social science research and a strong computational background, and should be adaptable and comfortable trying out new approaches and languages.
Responsibilities
- Project design and development
- Design and conduct original research
- Determine the best research methods and tools to answer the questions at hand
- Obtain data from external sources
- Data management and analysis
- Write in-depth analysis for research reports
- Write short-form posts related to project work
- Stay abreast of trends in data science, new kinds of data sources and methodologies
Education and Experience
- Advanced degree required, PhD preferred
- PhD-level research design experience required
- Training and experience with machine learning (e.g., SVM, Random Forests, GBRT/GBDT, ensemble methods, etc.) required
- Basic computer science training required
Knowledge and Skills Requirements
- Experience interacting with web APIs, working with JSON data, and utilizing regex
- Experience working with very large data sets (data too large to fit into memory)
- Proficiency in R (including ggplot2, dplyr/plyr, stringr, lubridate, tidyr/reshape2)
- Proficiency in Python (including Pandas, Scikit-learn, SciPy + NumPy)
- Proficiency with SQL, Mongo (or other NoSQL DB), Hadoop/Hive/Spark/Pig
- Familiarity with Natural Language Processing (preprocessing, term-document matrix representation, named entity recognition, POS taggers/parsers)
- Familiarity working with image data, OpenCV, and/or convolutional neural networks for machine learning (e.g., Caffe)
- Experience using crowdsourcing (e.g., Mechanical Turk) to gather or create data preferred
- Experience scraping unstructured data from the web
- Familiarity with the Vowpal Wabbit (VW) framework
- Experience analyzing large-scale network data
FLSA Status: Exempt
Applicants should send a resume and cover letter (indicating where they learned of the opening), with salary expectations, to email@example.com.
Responses can also be mailed to:
Human Resources Department
Pew Research Center
1615 L Street, NW Suite 700
Washington, DC 20036
We are an equal opportunity employer.