Software Development Engineer, Data Mining/Text Analysis/Machine Learning

Develop algorithms and build systems to automatically solve a variety of Information Retrieval and Data Mining problems related to the Amazon Product Catalog.

Company: Amazon
Location: Seattle, WA

Hiring manager is Deept Kumar

The Darwin team at Amazon is looking for exceptional software engineers to develop algorithms and build systems to automatically solve a variety of Information Retrieval and Data Mining problems related to the Amazon Product Catalog - one of the company's biggest assets.

Our principal challenge is to improve the shopping experience by detecting duplicate products for sale in the catalog and merging them. Merchants provide information about the products they want to sell. Amazon attempts to match these product data submissions to items in its catalog so that it can display offers for the same product on a single page. Poorly structured or incomplete data makes this problem very challenging and often results in duplicate products being created in the catalog. These duplicate products are shown in search results and confuse customers, leading to a bad customer experience. The Darwin team detects these duplicate products in the catalog using an innovative mix of Information Retrieval, Data Mining and Text Analysis algorithms and human intelligence harnessed via Amazon Mechanical Turk. We then automatically merge products detected as duplicates, improving customer experience and the quality of the catalog.

We are also responsible for a variety of other Catalog-related projects, such as placing Product Advertisements on pages, automatically extracting important product features from product descriptions to improve the discovery (search and browse) experience on the website, and detecting egregious cases of poor-quality data provided by sellers.

We are a highly motivated, cooperative and fun-loving team that thrives on solving challenging problems with innovation. As part of this team you will analyze data, develop new algorithms, and build large-scale distributed software systems in Java using open source technologies such as Hadoop, Lucene and JBoss, as well as proprietary technologies.

Basic Qualifications

  • Bachelor's Degree in Computer Science or related field with 4+ years relevant work experience
  • Fundamental design and coding skills in Java/C++ on Unix platforms
  • Familiarity with Perl or other scripting languages and an understanding of SQL
  • Computer Science fundamentals in object-oriented design
  • Computer Science fundamentals in data structures
  • Computer Science fundamentals in algorithm design, problem solving, and complexity analysis
  • Proficiency in at least one modern programming language such as C, C++, Java

Preferred Qualifications

  • PhD/Master's degree in Computer Science or Math or related field with 1-3+ years of relevant work experience
  • Experience in Perl, Java, Object Oriented Design and familiarity with application and database programming under UNIX/Linux
  • Past experience in at least one of the following areas: Information Retrieval, Data Mining, Text Analysis or Machine Learning
  • Experience with building high-performance, highly available and scalable distributed systems
  • Experience building complex software systems that have been successfully delivered to customers
  • Experience with large database-driven applications and/or distributed computing
  • Proficiency with HTTP Protocol, REST, XML, J2EE, JavaScript, and AJAX
  • Be highly innovative, flexible and self-directed

Apply online.

Amazon is an Equal Opportunity Employer