
Celgene: Sr. Manager, Data Lake


Seeking a Data Lake Manager. The role will work closely with the Data Science Teams, the Business Data Stewards and the team responsible for data ingestion/integration into the platform.



Company: Celgene
Location: Summit, NJ
Web: www.celgene.com
Position: Sr. Manager, Data Lake

Contact:
Apply online.

Description

Celgene is a global biopharmaceutical company leading the way in medical innovation to help patients live longer, better lives. Our purpose as a company is to discover and develop therapies that will change the course of human health.  We value our passion for patients, quest for innovation, spirit of independence and love of challenge. With a presence in more than 70 countries - and growing - we look for talented people to grow our business, advance our science and contribute to our unique culture.

Summary

Celgene has established a Big Data capability that provides actionable insights, informs decisions throughout the product life cycle and helps improve patients' lives.

Celgene’s modern Big Data Platform includes a Data Lake that stores and provides easy, secured access to data needed by various functions for reporting and analytics. The data in the data lake is organized to support a variety of use cases, with strong usage governance and security controls.

The Data Lake Manager is accountable for keeping the data in the data lake well-organized, refreshed and of acceptable quality to support various analytics activities. The role is also responsible for enforcing established policies and guidelines and for assisting data scientists/analysts in navigating and finding the data when the provided self-service tools are not sufficient.

The role will work closely with the Data Science Teams, the Business Data Stewards and the team responsible for data ingestion/integration into the platform.

Responsibilities include, but are not limited to, the following:

  • Implement repeatable operating procedures for ongoing data ingestion along with key metadata and metrics that should be captured during the process
  • Monitor data pipeline workloads and ensure issues are resolved properly and in a timely fashion
  • Perform ad-hoc / one-time data ingestions when end-users need assistance
  • Ensure appropriate metadata is created, maintained and accessible to expedite data analysis.  This includes coordinating creation of data dictionaries, data profiling and data quality report generation.
  • Make minor changes to data ingestion when vendors make slight changes to the data format/dictionary. Coordinate proactively with the data ingestion/integration team in the case of major changes.
  • Coordinate data format/data dictionary changes with business data stewards and external data vendors when possible
  • Keep the work area used for data exploration organized and perform housekeeping on behalf of the data science team
  • Enforce appropriate naming standards, access controls and data cleanup policies in the various zones of the data lake. Reduce the accumulation of redundant, obsolete and duplicative data in the data lake.
  • Provide metrics on data utilization and data quality

Scope

  • Span of Control – US, cross-functional, Big Data Analytics platform
  • Direct Reports – None
  • Indirect Reports – Will direct the work of implementation partners and a matrixed group within IT of approximately 1-5
  • Budgetary Responsibility – None
  • Interacts with – Data Scientists, Business Data Stewards, Data Architect, Big Data Developers, External Data Vendors

Qualifications

  • Bachelor's degree in computer science, system analysis or a related study, or equivalent experience
  • Minimum of 5-7 years of hands-on experience with Information Management and Big Data technologies, e.g. Hadoop, Spark, Hive; robust experience with Cloudera is a plus
  • Minimum 3-5 years of experience in Cloud environments, preferably AWS
  • Excellent interpersonal skills in areas such as teamwork, influence, facilitation and negotiation
  • Problem Solver

Skills/Knowledge Required

  • Bachelor's degree in computer science, system analysis or a related study, or equivalent experience
  • Minimum of 5-7 years working in Information Management, Big Data service delivery (or equivalent) roles, preferably in a global biotech/pharmaceutical organization, focusing on the following disciplines:
    • Data Management
    • Data Architecture
    • Database Administration
    • Data Engineering (Profiling, Preparation)
    • Data Warehousing
  • Minimum of 3-5 years of combined hands-on experience with the following technologies:
    • Big Data Tools: Hadoop, Hive, Impala, Spark, Pig, Sqoop, Cloudera Navigator, or similar
    • DBMS / SQL
    • NoSQL and Graph databases
    • ETL/ELT Tools (e.g. Talend, Informatica BDM)
    • AWS services, in particular S3 and use of the CLI
    • Implementation/maintenance of complex data pipelines
    • Programming languages such as Java, Python
    • XML/JSON file formats
    • Metadata Management
    • Industry-standard data models such as OMOP, CDISC, E2B, etc.
  • Cloudera Certification is a plus
  • Experience working in an onshore/offshore model as well as a DevOps model
  • Ability to quickly learn new technologies
  • Excellent interpersonal skills in areas such as teamwork, influence, facilitation and negotiation
  • Problem solver with a demonstrated ability to “think out of the box”
  • Strong written and verbal communication skills; able to explain complex technical issues in simple, business-friendly language
  • Excellent planning and organizational skills
  • Demonstrated ability to work well with others and be respected as a leader
  • Thoughtful, extroverted and collaborative
  • Motivation focused on long-term results