Regeneron: Spark R&D Developer

Seeking an R&D Spark Developer to join the Genome Informatics team to expand the RGC’s big data infrastructure and develop new algorithms/tools to support various workflows/analyses throughout the RGC and Regeneron.

Company: Regeneron
Location: Tarrytown, NY
Position: Spark R&D Developer

Apply online.

Company Summary:

The Regeneron Genetics Center is a wholly owned subsidiary of the Company organized to collaborate with health systems and research groups to elucidate, on a large scale, genetic factors that cause or influence a range of human diseases. Building upon Regeneron's strengths in mouse genetics and genetics-driven drug discovery and development, the Center will specialize in ultra-high-throughput exome sequencing and computational biology; discovery of genotype-phenotype associations through linkage to well-annotated de-identified patient electronic medical records; and validation of discoveries using Regeneron’s VelociGene® technology. Our interests encompass a range of areas such as Mendelian and family frameworks, large-scale population genetics (both common and rare variants), and gene-gene interactions. Program goals include target discovery, indication discovery, and patient-disease stratification. Objectives include advancing basic science around the world through public sharing of discoveries, providing clinically valuable insights to physicians and patients of collaborating health-care systems, and identifying novel targets for drug development.

Position Summary:

We are looking for an R&D Spark Developer to join the Genome Informatics team to expand the RGC’s big data infrastructure and develop new algorithms/tools to support various workflows/analyses throughout the RGC and Regeneron. Specifically, the candidate will implement solutions within our Databricks Apache Spark ecosystem, collaborating closely with various team members at the RGC to (i) establish efficient data representations for genotypes, phenotypes and association results, (ii) implement scalable production workflows, and (iii) develop novel machine learning approaches to uncover new relationships between genotypes and phenotypes.
The ideal candidate will have a strong background in computer science, specializing in distributed systems and/or machine learning; experience analyzing large datasets; and strong communication skills, as this job requires collaboration among multiple cross-functional teams.

This position will provide exciting opportunities to work on the bleeding edge of genome informatics and genomic medicine. The RGC hosts a vast amount of data encompassing thousands of phenotypes derived from electronic medical records, integrated with genomic data. Together, these represent a landmark collection of information that will move precision medicine and novel therapeutic discovery forward as a new data-driven paradigm in healthcare.

Responsibilities:

  • Build out a big data distributed architecture capable of efficiently processing terabytes of genomic and clinical data
  • Develop algorithms and tools to analyze large data sets consisting of billions of rows
  • Develop and deploy machine learning algorithms
  • Develop new web applications used by Regeneron scientists to analyze genomic and clinical datasets
  • Build automation around various components of the system
  • Interact and collaborate with other scientists to clearly define and iterate on requirements
  • Keep abreast of new state-of-the-art software technologies and best practices, including Spark, Hadoop, various NoSQL databases, AWS, React, and functional programming


This position requires an MS (Ph.D. preferred) with 3 or more years of experience in computer science specializing in distributed systems and/or machine learning.

Additional requirements include:

  • Expertise in large distributed systems, such as Spark, Hadoop, or related frameworks/databases, is essential
  • 3+ years of software engineering experience in a modern object-oriented or functional language (e.g., Scala)
  • Experience in developing and applying machine learning algorithms
  • Experience with client-side software development (e.g., HTML, JavaScript, CSS, D3)
  • Excellent communication and presentation skills required
  • Working knowledge of SQL
  • Experience with cloud computing (AWS preferred)
  • Familiarity with genomics and bioinformatics is preferred, but not required

Salary level is commensurate with experience.

This is an opportunity to join our select team that is already leading the way in the Pharmaceutical/Biotech industry. Apply today and learn more about Regeneron Genetics Center’s unwavering commitment to combining good science & good business.

To all agencies: Please, no phone calls or emails to any employee of Regeneron or the Regeneron Genetics Center about this opening. All resumes submitted by search firms/employment agencies to any employee at Regeneron or the Regeneron Genetics Center via email, the internet, or in any form and/or method will be deemed the sole property of the Regeneron Genetics Center, unless such search firms/employment agencies were engaged by Regeneron or the Regeneron Genetics Center for this position and a valid agreement with either Regeneron or the Regeneron Genetics Center is in place. In the event a candidate who was submitted outside of the Regeneron or Regeneron Genetics Center agency engagement process is hired, no fee or payment of any kind will be paid.

RGC is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability status, protected veteran status, or any other characteristic protected by law.