Bosch: Data Mining Engineer – Big Data Infrastructure

Bring together disparate technologies and use data mining and analytics to solve business problems in Predictive Maintenance, Health Informatics, Vehicle Diagnostics, Manufacturing, and other domains.

Company: Bosch Research and Technology Center
Location: Palo Alto, CA

Apply to:

Job Description:

The Bosch Research and Technology Center in North America is part of Bosch's global Corporate Research organization. As part of Bosch's drive to internationalize research, we connect Corporate Research with leading-edge technology and the innovative environment in North America.

The Data Mining Service Center at Bosch provides Data Mining and Big Data services to Bosch's business units and plants. The center works in collaboration with a large team of researchers, engineers, and software service providers. Our data mining methods and solutions are implemented in a distributed architecture and run on our HPC cluster in order to scale up to Big Data sets.

Data Mining is impacting Bosch's products and services in many domains: Purchasing, Predictive Maintenance, Health Informatics, Vehicle Diagnostics, Manufacturing, Large-Scale Simulations, etc. This is a technical position for someone who is skilled at bringing together disparate technologies to solve business problems, and it represents a unique opportunity for you to grow with us. Our team of data scientists and software engineers in Palo Alto will grow rapidly in the next couple of years, so now is the perfect time for you to join and make an impact with your passion to innovate!

Responsibilities:

  • Develop and implement algorithms for distributed and parallel predictive analytics.
  • Stay up to date with research and innovative third-party products that address the storage and analysis of large datasets from real-world problems.
  • Develop distributed and parallel solutions for predictive analytics and visualization of structured and unstructured data sets.
  • Design test cases to evaluate the run-time and predictive performance of parallel and distributed algorithms.
  • Improve the scalability and performance of existing storage and analytics solutions.

Qualifications:

  • Ph.D. or M.S. in Computer Science, Statistics, Computer Engineering, or a related field, with at least 3 years of relevant work experience.
  • Practical experience in developing algorithms and applications using MapReduce, MPI, or similar frameworks.
  • Experience parallelizing algorithms in MPI, MapReduce, OpenMP, or a similar parallel environment.
  • Experience with distributed file systems and working knowledge of NoSQL or other distributed database systems.
  • Demonstrated experience with relational database systems and familiarity with SQL.
  • Proven expertise in applying descriptive and inferential statistics to Big Data.
  • Competence in the theory and application of standard machine learning or data mining algorithms.
  • Knowledge of Linux system internals, storage concepts, and networking topologies and protocols is required.
  • Experience identifying performance bottlenecks in network, I/O, OS, and DBMS configuration.
  • Experience with two or more of the following: Java, C++ (STL), Python, Perl, MATLAB, R, SPSS, SAS.
  • Experience with HBase, Hive, Pig, Cassandra, or similar technologies; Mahout is a plus.