SanDisk: Senior Big Data Engineer/Hadoop Developer

Planning and designing next-generation Big Data System architectures, managing the development and deployment of Hadoop applications.

At: SanDisk
Location: Milpitas, CA


Responsible for planning and designing next-generation Big Data system architectures and for managing the development and deployment of Hadoop applications. Requires subject-matter expertise and demonstrable hands-on delivery experience on popular Hadoop distribution platforms such as Hortonworks, Cloudera, and MapR, along with knowledge and experience designing the entire Big Data stack and platform.

Skills required: extensive knowledge of Hadoop architectures and HDFS; Java/C++, MapReduce, HBase, Hive, Pig, Oozie, Mahout, ZooKeeper, Flume, Solr, Elasticsearch, Storm/Spark.
  • Lead the understanding of very complex semiconductor data, leveraging existing analytics tools and bleeding-edge market tools and technologies to discover new insights that reduce cost and increase performance, reliability, and/or endurance in semiconductor products.
  • Work through the full flow of very challenging data problems, from data cleaning to building a model concept, extracting the right features, and testing the model, driving better decision making through data and statistical modeling techniques and algorithm development on real-world noisy data.
  • Quickly understand challenging business problems and find patterns and insights within structured and unstructured data.
  • Perform data analysis, prediction, and discovery, distilling complex science into digestible insight and unlocking repeatable patterns that become actionable foresight.
  • Access, analyze, and transform large product-lifecycle and process data in the semiconductor/fab manufacturing industry.
  • Think critically about the relationships among the metrics measured and the process steps in order to select the right features for a given model.
  • Challenge current best thinking, test theories, evaluate feature concepts, and iterate rapidly.
  • Own the deliverables and manage priorities and timelines.
  • Prototype creative solutions quickly, and lead others in crafting and implementing a superior technical vision.
  • Primary responsibility is building deep domain knowledge of the data, including "data wrangling" and rapid prototyping with new tools, technologies, and insights for semiconductor data.

Skills & Experience
  • Highly motivated team player with an entrepreneurial spirit and strong communication and collaboration skills; a self-starter with a willingness to learn, master new technologies, and clearly communicate results to technical and non-technical audiences.
  • Experience/proficiency in at least one compiled, object-oriented programming language, e.g., Java or C++.
  • Experience with big data technologies such as Hadoop, MapReduce, Mahout, Hive, and Pig, and with parallelization tools in an enterprise Big Data platform stack and ecosystem, is a strong plus.
  • Hands-on experience in data mining and data analysis, with deep critical thinking about the story the data is telling.
  • Experience and expertise in machine learning tools and technologies, including using R for modeling (developing R scripts) and other scripting languages such as Python and Perl.
  • Fluency with, or the ability to learn and fearlessly experiment with (hands-on evaluation), all existing data analytics tools, traditional and advanced.
  • Solid fundamentals: basic knowledge of machine learning algorithms (classification, clustering, regression, Bayesian modeling), probability theory, algorithm design and theory of computation, linear algebra, partial differential equations, Bayesian statistics, and information retrieval.