Brookhaven National Laboratory: Postdoc in Materials Informatics [Upton, NY]

Seeking candidates to develop and apply information retrieval, information extraction, and various Natural Language Processing (NLP) techniques to the scientific literature in materials science and crystallography for the purpose of building prototype computational data systems.

At: Brookhaven National Laboratory
Location: Upton, NY
Position: Postdoc in Materials Informatics

Apply here.

About Brookhaven National Laboratory:
Brookhaven National Laboratory is a multipurpose research institution funded primarily by the U.S. Department of Energy’s Office of Science. Located in the center of Long Island, New York, Brookhaven Lab brings world-class facilities and expertise to the most exciting and important questions in basic and applied science, from the birth of our universe to the sustainable energy technology of tomorrow. We operate cutting-edge large-scale facilities for studies in physics, chemistry, biology, medicine, applied science, and a wide range of advanced technologies. The Laboratory's almost 3,000 scientists, engineers, and support staff are joined each year by more than 4,000 visiting researchers from around the world. Our award-winning history, including seven Nobel Prizes, stretches back to 1947, and we continue to unravel mysteries from the nanoscale to the cosmic scale, and everything in between. Brookhaven is operated and managed by Brookhaven Science Associates, which was founded by the Research Foundation for the State University of New York on behalf of Stony Brook University, and Battelle, a nonprofit applied science and technology organization.

Organizational Overview:
Brookhaven National Laboratory (BNL) is the scientific, extreme-scale Data Laboratory in the US. With over 133 PB of archived data, we host the largest scientific data archive in the US and the 3rd largest worldwide. Unlike many others, this is an active archive in which we continuously reuse all the data. In 2017 we analyzed 500 PB of data on site. This data comes largely from the many scientific user facilities that we support at BNL, such as the unique nuclear physics experiment RHIC, the brightest synchrotron in the world NSLS-II, the largest Tier 1 center for the LHC ATLAS experiment, the Japanese particle physics experiment Belle II, the Center for Functional Nanomaterials, and the Atmospheric Radiation Measurement program. BNL is currently building a new 60,000 sq. ft. Data Center to accommodate its growing operations.

As a result, we have a lively, fast-growing data science research program at BNL, with a specific focus on the challenges presented by the analysis, interpretation, and use of data at extreme scales and in real time. The data science program is accompanied by a significant computational modeling research effort in support of the design, planning, analysis, and interpretation of experiments and their results. The Computational Science Initiative (CSI) provides the laboratory-wide umbrella for these activities, bringing together computer scientists, applied mathematicians, and domain scientists to carry out leading-edge research, convert research results into practical solutions that advance domain science, and provide the necessary computing infrastructure services and training to support efficient operation.

Position Description:
A postdoctoral position is available immediately in the Center for Data-driven Discovery (C3D) in the Computational Science Initiative (CSI) at Brookhaven National Laboratory. The successful candidate will conduct research in the context of a collaborative project. He or she will develop and apply information retrieval, information extraction, and various Natural Language Processing (NLP) techniques to the scientific literature in materials science and crystallography for the purpose of building prototype computational data systems.

This position is at the interface of natural language processing and materials science.

Essential Duties and Responsibilities:
Research, design, and implement techniques for capturing and using entities from the applicable scientific literature
Create a pipeline to automate the extraction of useful data from the literature, including from figures, captions, and tables if needed
Adapt and optimize Natural Language Processing algorithms to solve the problem at hand
Design and implement strategies to validate findings
Design and implement strategies to relate findings with existing materials databases
Develop a web interface for interacting with the extracted data

Position Requirements:

Required Knowledge, Skills, and Abilities:
Ph.D. in computer science, computer engineering, computational linguistics, or a related discipline is required.
Programming skills are essential.
Experience with the main open-source tools available in Natural Language Processing.
Proven ability to disseminate research results by writing manuscripts and giving academic presentations.
Candidates must be willing to travel to disseminate results and communicate with other scientists.
Written and oral communication skills.
Must be able to work closely and communicate effectively with colleagues from other scientific backgrounds.

Preferred Knowledge, Skills, and Abilities:
Ability to interact effectively with cross-discipline scientists and technical staff
Familiarity with Natural Language Processing libraries (NLP Toolkit, Gensim, etc.)
Extensive coding experience, preferably applied to an area of materials science
Ability to work on GPUs
Knowledge of X-ray, neutron, or electron diffraction/scattering and synchrotron radiation experiments
Experience with software engineering methods and Python programming

Other Information:
Moderate domestic and foreign travel possible.
Review of applications begins immediately. Applications will be accepted until the position is filled.
Candidates should identify the two publications that best represent their scientific accomplishments and provide them as part of their application.