Search results for hive

    Found 100 documents, 10397 searched:

  • How Big Data Pieces, Technology, and Animals fit together

    ...gle research paper Interpreting the Data: Parallel Analysis with Sawzall. You generally don't update a single cell in a table when processing it with Hive or Pig. Hive and Pig turned out to be slow because they were built on Hadoop which optimizes for the volume of data moved around, not latency....

    https://www.kdnuggets.com/2015/02/how-big-data-pieces-technology-fit-together.html

  • Which Database is best for an Analyst?

    …the top row compared to the database on the left. Here, a higher number is worse than a lower number. For example, the “20.2” at the intersection of Hive and BigQuery indicates that, among analysts who use both of those databases, the error rate tends to be 20.2% higher for Hive than BigQuery. The…

    https://www.kdnuggets.com/2015/12/database-best-for-analyst.html

  • Interview: Mario Vinasco, Facebook on Advancing Marketing Analytics through Rigorous Experimentation

    ...its goals? How has the experience been to move from relational databases to Hive? MV: We use a combination of tools, technologies such as Presto and Hive (both open sourced), Experimentation Tools (internal), Survey deployment technologies (internal), Ads manager (available to advertisers), MS...

    https://www.kdnuggets.com/2015/04/interview-mario-vinasco-facebook-marketing-analytics.html

  • Meetup/Webcast Apr 23: Shark Data Analytics Stack on a Hadoop Cluster

    ...Shark is a large-scale data warehouse system that runs on top of Spark and is backward-compatible with Apache Hive, allowing users to run unmodified Hive queries on existing Hive warehouses. Shark is able to run Hive queries 100 times faster when the data fits in memory and up to 5-10 times faster...

    https://www.kdnuggets.com/2013/04/meetup-webcast-apr-23-shark-data-analytics-stack-on-a-hadoop-cluster.html

  • Big Data BootCamp Santa Clara: Highlights of talks on Days 1-2

    ...nt and significantly increase the chances of finding a good job. Ashish Dubey, Solution Architect at Qubole delivered a workshop on Hive. Introducing Hive as SQL on Hadoop, he explained Hive as a system for managing and querying unstructured data as if it were structured. It uses Map-Reduce for...

    https://www.kdnuggets.com/2014/05/big-data-bootcamp-santa-clara-talks-day-1-2.html

  • KDnuggets Free Pass to Big Data TechCon How-To Conference, Apr 26-28, Boston

    ...dtechcon and please specify what 2-3 topics or technologies you want to learn more about in 2015, for example Deep Learning, Graph Databases, Hadoop, Hive, IoT Internet of Things, NoSQL Databases, Predictive Analytics, Privacy, Python, Security, Sentiment Analysis, SQL, Spark, R, Text Mining,...

    https://www.kdnuggets.com/2015/03/bigdata-techcon-boston-april-kdnuggets-pass.html

  • The top 5 Big Data courses to help you break into the industry

    ...ion to Big data and Hadoop Ecosystem Lesson 2: HDFS and YARN Lesson 3: MapReduce and Sqoop Lesson 4: Basics of Hive and Impala Lesson 5: Working with Hive and Impala Lesson 6: Types of Data Formats Lesson 7: Advanced Hive concept and data file partitioning Lesson 8: Apache flume and HBase Lesson 9:...

    https://www.kdnuggets.com/2016/08/simplilearn-5-big-data-courses.html

  • Interview: Ranjan Sinha, eBay on Advanced Hadoop Cluster Management through Predictive Modeling

    ...ternational Conference on Big Data. AR: Q4. What are the key capabilities of Apache Kylin? How does it compare against the other alternatives such as Hive? RS: Apache Kylin is an open source Distributed Analytics Engine from eBay Inc. that provides SQL interface and multi-dimensional analysis...

    https://www.kdnuggets.com/2015/06/interview-ranjan-sinha-ebay-hadoop-predictive-modeling.html

  • Hadoop Key Terms, Explained

    ...lable, distributed and non-relational database. It is written in Java and based on Google's Big Table. The underlying storage file system is HDFS. 7. Hive Hive is data warehouse software, which supports reading, writing and managing large volume of data stored in a distributed storage...

    https://www.kdnuggets.com/2016/05/hadoop-key-terms-explained.html

  • 50+ Data Science and Machine Learning Cheat Sheets

    ...s cheatsheet Getting Started Apache Hadoop Reference Card Hadoop Command Line cheatsheet Working with HDFS from the command line - Hadoop Cheat sheet Hive Function cheatsheet SQL to Hive cheatsheet Cheat sheets for Machine learning: We often find ourselves spending time thinking which algorithm is...

    https://www.kdnuggets.com/2015/07/good-data-science-machine-learning-cheat-sheets.html

  • A Beginner’s Guide to Data Engineering – Part II

    ...lly, we also have special operators that Transfers data from one place to another, which often maps to the Load step in ETL. At Airbnb, we use MySqlToHiveTransfer or S3ToHiveTransfer pretty often, but this largely depends on one’s data infrastructure and where the data warehouse lives. A Simple...

    https://www.kdnuggets.com/2018/03/beginners-guide-data-engineering-part-2.html

  • 50+ Data Science, Machine Learning Cheat Sheets, updated

    ...s cheatsheet Getting Started Apache Hadoop Reference Card Hadoop Command Line cheatsheet Working with HDFS from the command line - Hadoop Cheat sheet Hive Function cheatsheet SQL to Hive cheat sheet Cheat sheets for web application framework Django: Django is a free and open source web application...

    https://www.kdnuggets.com/2016/12/data-science-machine-learning-cheat-sheets-updated.html

  • Big Data TechCon, the HOW-TO conference, Boston, April 26-28

    ...s apply to relational databases, NoSQL databases, unstructured data, flat files and data feeds. Come up to speed on Hadoop, Spark, Yarn, HBase, R and Hive. Learn from the smartest, hardest-working faculty in the Big Data universe in a way you never could by reading a book or watching a webinar. The...

    https://www.kdnuggets.com/2015/02/bigdata-techcon-how-to-boston-april.html

  • WibiData releases Kiji Chirashi framework for Big Data Applications

    ...he Hive DDL generation as well as the ability to write back to Kiji tables. With greater Hive support, analysts can access all the data in Kiji using HiveQL, JDBC/ODBC, or other Hive compatible business intelligence tools. For example, ad hoc queries for aggregating statistics of users are useful...

    https://www.kdnuggets.com/2013/10/wibidata-releases-kiji-chirashi-framework-for-big-data-applications.html

  • 75 Big Data Terms to Know to Make your Dad Proud

    ...k with data stored in big data format (i.e. HBase or HDFS). Sorry for being little geeky here. Apache Hive: Know SQL? Then you are in good hands with Hive. Hive facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Apache Pig: Pig is a platform for...

    https://www.kdnuggets.com/2017/06/75-big-data-terms.html

  • Evaluating HTAP Databases for Machine Learning Applications

    …r analytical workloads. It is ACID compliant and handles transactional updates, but these are not optimized for large-scale real-time OLTP workloads. Hive is not typically used to power real-time, concurrent applications. One Hive application can add rows while another reads from the same partition…

    https://www.kdnuggets.com/2016/11/evaluating-htap-databases-machine-learning-applications.html

  • Top KDnuggets tweets, Mar 26-27: Watch “Statistics with R for newbies”; Coursera free #DataScience courses

    ...ree eBOOK: Practical Machine Learning: Innovations in Recommendations - tradeoffs & predicting what user wants buff.ly/1lmTXCd Free ebook: Apache Hive - How to access big data on Hadoop with SQL/HiveQL #DataScience #BigData buff.ly/1mwOKYQ Free eBook on #BigData and #DataScience from Big Data...

    https://www.kdnuggets.com/2014/03/top-tweets-mar26-27.html

  • Introduction to Apache Spark

    ...Spark SQL is Spark’s package for working with structured data. It allows querying data via SQL as well as the Apache Hive variant of SQL — called the Hive Query Language (HQL) — and it supports many sources of data, including Hive tables, Parquet, and JSON. Spark Streaming : Spark Streaming is a...

    https://www.kdnuggets.com/2018/07/introduction-apache-spark.html

  • Things you should know when traveling via the Big Data Engineering hype-train

    ...dulers and what schedulers are available by default? What are queues? What is Hive? - It’s a SQL layer on top of HDFS. What is the connection between hive, hiveserver2, metastore and beeline. What is the difference between hive, pig and Impala? What are UDFs? What is HBase? - It’s a NoSQL database...

    https://www.kdnuggets.com/2018/10/big-data-engineering-hype-train.html

  • A Beginner’s Guide to Data Engineering – Part I

    ...g whereas nowadays they are all written in Scalding, scheduled by Twitter’s own orchestration engine. At Airbnb, data pipelines are mostly written in Hive using Airflow. During my first few years working as a data scientist, I pretty much followed what my organizations picked and take them as...

    https://www.kdnuggets.com/2018/01/beginners-guide-data-engineering-1.html

  • Simplifying Data Pipelines in Hadoop: Overcoming the Growing Pains

    ...an example, in order to execute a Hive query, an ETL engineer would only need to provide the SQL query, rather than writing a shell script containing Hive credentials and Hive commands, in addition to the SQL query that has to be executed. With this approach: There was no need for the same problem...

    https://www.kdnuggets.com/2017/05/simplify-data-pipelines-hadoop.html

  • Hadoop as a Data Warehouse: Cracking the Code with Kudu

    …eable file store that acts much like a relational database in everyday use. Commands like INSERT, UPDATE, and DELETE that are unavailable in HDFS and Hive (or didn’t work like their counterparts in relational databases), now function as expected and scale to previously unimagined sizes of data….

    https://www.kdnuggets.com/2017/06/hadoop-data-warehouse-kudu.html

  • Interview: M.C. Srivas, MapR on Demystifying the Art of Processing Massive Data

    ...ex nested structures like JSON. Hive and Impala implement different, disjointed subsets of what Apache Drill is capable of. Hive and Impala implement HiveQL (Hive Query Language) which is not ANSI SQL, although Impala might be evolving slowly to include ANSI SQL 92. Shark has been discontinued,...

    https://www.kdnuggets.com/2015/02/interview-mc-srivas-mapr-processing-massive-data.html

  • Apache Drill Makes Big Data Analysis Easier for Everyone

    …o use query by Apache Drill to find out the following information: The top months based on gross sales SELECT `month`, SUM(order_total) as sales FROM hive.orders GROUP BY `month` ORDER BY sales desc; The top countries or regions based on gross sales SELECT `month`, `state`, SUM(order_total) as…

    https://www.kdnuggets.com/2015/08/apache-drill-big-data-analysis.html

  • Big Data Developer Conference, Santa Clara: Day 2 Highlights

    ...ct and Consultant delivered a remote session showing how to convert NYSE raw data into dashboard using Hadoop ecosystem with no programming using SQL/Hive Query & Tableau. He formatted and uploaded NYSE data to Hadoop using Hue web interface. He performed processing and staging on data using Hive....

    https://www.kdnuggets.com/2015/04/big-data-developer-conference-highlights-day2.html

  • Hadoop and Big Data: The Top 6 Questions Answered

    ...n top of HDFS or stand-alone. As an in-memory engine, Spark is much faster than the traditional MapReduce approach. Spark can process data from HDFS, Hive, Flume and other data sources extremely fast, allowing Hadoop to be an effective streaming or real-time analytics platform. Spark can replace...

    https://www.kdnuggets.com/2016/01/hadoop-and-big-data-questions.html

  • IBM: Big Data Architect

    ...ques. At least 2 years of experience in a consulting environment. At least 2 years of experience in the following components of the Hadoop ecosystem: Hive, HBase, Spark, Storm, YARN, Flume, and/or Oozie. Preferred Technical and Professional Experience: At least 5 years of experience in the Hadoop...

    https://www.kdnuggets.com/jobs/16/05-27-ibm-big-data-architect.html

  • Dataiku Data Science Studio, now also runs on Apache Spark

    ...ts of Spark, PySpark and Spark R ease and speed up the native capabilities found in DSS and make Spark a viable alternative to the traditional Hadoop/Hive stack. (2) It's not just about Volume, it's also about Collaboration When using R or Python local stacks for interactive analysis with advanced...

    https://www.kdnuggets.com/2015/09/dataiku-data-science-studio-now-also-apache-spark.html

  • Interview: Florian Douetteau, Dataiku Founder, on Empowering Data Scientists

    ...ages does it give over a single language approach? Florian Douetteau: The ability to use different languages in one project (from SQL, R or Python to Hive, Pig, or all things Spark) is great for two main reasons: Different languages are more adapted to different parts of the data science workflow –...

    https://www.kdnuggets.com/2016/07/interview-florian-douetteau-dataiku-empowering-data-scientists.html

  • UnitedHealth Group: Hadoop Big Data Developer

    ...able, row and column level security Hadoop-Map R Folder, Processing and Storage Components Compare data across data storage locations: from files, to Hive, to SQL Server Work as technical leader on agile teams, including supporting story creation and maintenance, create right-sized technical...

    https://www.kdnuggets.com/jobs/16/05-25-uhg-hadoop-big-data-developer.html

  • Top Recent Big Data videos on YouTube

    …43 views in total) This Big Data Hadoop Tutorial playlist takes you through various training videos in Hadoop: What is Hadoop, Hadoop tutorial video, Hive tutorial, HDFS tutorial, HBase tutorial, Pig tutorial, Hadoop architecture, MapReduce tutorial, YARN tutorial, Hadoop use-cases, Hadoop…

    https://www.kdnuggets.com/2017/05/top-recent-big-data-videos-youtube.html

  • Top Languages for analytics, data mining, data science

    …than overall population. Here are the languages more likely to be used with R: Julia, 64% more Lisp/Clojure, 41% more GNU Octave, 27% more Pig Latin/Hive/other Hadoop-based languages, 27% more Unix shell/awk/sed, 23% more Python, 13% more Here are the full results: What programming/statistics…

    https://www.kdnuggets.com/2013/08/languages-for-analytics-data-mining-data-science.html

  • Amazon, Customer Segmentation and Targeting: Machine Learning/Research Scientists (All Levels)

    ...rience developing and implementing machine learning algorithms and/or statistical models. Programming, prototyping and scripting skills (Oracle, SQL, Hive, Pig, SAS, R, Weka, Python in Unix/Linux environments). Writing, communication and presentation skills. Proficiency in at least one modern...

    https://www.kdnuggets.com/jobs/14/05-14-amazon-machine-learning-research-scientist.html

  • Health Integrated: Big Data DevOps Engineer

    ...ce tuning and monitoring Strong understanding of Hadoop eco system such as HDFS, MapReduce, HBase, Zookeeper, Pig, Hadoop streaming, Sqoop, oozie and hive Experience in administering, and supporting Linux operating systems and hardware in an enterprise environment. (CentOS/RHEL) Expertise in...

    https://www.kdnuggets.com/jobs/14/09-18-healthintegrated-big-data-devops-engineer.html

  • Data Scientists (all levels), Amazon Consumer Analytics

    ...rience developing and implementing machine learning algorithms and/or statistical models. Programming, prototyping and scripting skills (Oracle, SQL, Hive, Pig, SAS, R, Weka, Python). Communication and data presentation skills. Preferred Qualifications A strong track record of innovating through...

    https://www.kdnuggets.com/jobs/13/06-18-amazon-data-scientists-all-levels.html

  • Strata + Hadoop World 2015 San Jose – Day 1 Highlights

    ...- 35% and Java MapReduce - 7%. Managing data on multi-tenant platforms poses various challenges such as: Data shared across tools such as MR, Pig and Hive Schema and semantic knowledge across the company Fine-grained access controls(row/column) vs. all or nothing   They talked about Apache...

    https://www.kdnuggets.com/2015/03/strata-hadoop-2015-san-jose-highlights-day1.html

  • Clarity Solution Group: Data engineers

    ...ble, end to end process to consume large volume, complex data from sources such as Hive, Scribe and 3P APIs. Integrating these datasets together into Hive, Vertica and 3P APIs   Work with business stakeholders and data SMEs to elicit requirements and develop real-time business metrics,...

    https://www.kdnuggets.com/jobs/14/11-07-clarity-us-data-engineers.html

  • SQL-like Query Language for Real-time Streaming Analytics

    ...d for all, and reuse them. Best lesson we can learn from Hive and Hadoop, which does exactly that for batch analytics. I have explained Big Data with Hive many times; most get it right away. Hive has become the major programming API for most Big Data use cases. Following is a list of reasons for SQL like...

    https://www.kdnuggets.com/2015/03/sql-query-language-realtime-streaming-analytics.html

  • The Big Data Ecosystem is Too Damn Big

    ...g data conversation to make it all “easier” to use by existing leveraging skillsets, there are too many SQL on big data solutions too. Should you use Hive, or Spark SQL? If you do use Hive, should you use it on MapReduce, or Tez? Plus, don’t forget Impala. Or HAWQ, Apache Drill, Presto and all the...

    https://www.kdnuggets.com/2016/06/big-data-ecosystem-too-damn-big.html

  • 5 Big Data Projects You Can No Longer Overlook

    ...dles dependency resolution, workflow management, visualization etc. Luigi stresses that it does not replace lower-level data-processing tools such as Hive or Pig, but is instead meant to create workflows between numerous tasks. Luigi supports Hadoop out of the box as well, which potentially makes...

    https://www.kdnuggets.com/2016/07/five-big-data-projects-cant-overlook.html

  • Why Not So Hadoop?

    ...r than Hadoop for some applications. For instance, this experiment was conducted at Airbnb comparing a Amazon Redshift 16 node cluster with a 44 node Hive/Hadoop EMR cluster, and the SQL based Redshift outperformed the EMR cluster. The study was done in 2013, and Hadoop has evolved from then with...

    https://www.kdnuggets.com/2016/09/why-not-so-hadoop.html

  • IBM: Data Engineers/Data Scientists

    ...arge data sets using open source technologies such as Programming language (R), Hadoop, Apache Spark, etc. Software development experience with Jaql, Hive, Java, Go, C++ , JSON, Python, XML etc. Cloud-based data engineering experience with PaaS & IaaS CUDAs, FPGAs & HPCs, applied to data science...

    https://www.kdnuggets.com/jobs/16/06-09-ibm-data-engineers-data-scientists.html

  • Microsoft: Principal Data Scientist

    ...do: Maintain and work with our data pipeline that transfers and processes terabytes of data using tools like Spark, Scala, Python, Apache Kafka, Pig/Hive & Impala. Work directly with application teams (such as Xbox, Skype for Business, Microsoft Office 365) to understand their domain and get...

    https://www.kdnuggets.com/jobs/16/10-07-exp-platform-principal-data-scientist.html

  • Intel’s Investments in Cognitive Tech: Impact and New Opportunities

    ...ions could be common in the nearest future. By pursuing local computing, Intel has acquired Saffron, a cognitive computing platform provider, Silicon Hive, a developer of a tool for programming SoC components, and Olaworks, a mobile face recognition company. Indisys, a natural language processing...

    https://www.kdnuggets.com/2016/05/intel-investment-cognitive-tech-impact-new-opportunities.html

  • Top Data Science Courses on Udemy

    By Brendan Martin, LearnDataSci. 2016 has been a great year for both new and older Udemy courses for data science. The instructors have been hard at work keeping their courses updated, and at times even putting in an overhaul of the course material. Udemy courses are great because not only are...

    https://www.kdnuggets.com/2016/04/top-data-science-courses-udemy.html

  • UnitedHealth Group/OptumLabs: Vice President of Optum Data Science Program

    ...nalysis, quantitative analytics, and/or forecasting/predicting analytics 2+ years of applying machine learning using distributed systems like Hadoop, Hive, Spark or similar libraries 2 + years of experience with R or Python 3 years experience leading teams either directly or cross matrixed...

    https://www.kdnuggets.com/jobs/16/03-28-uhg-vice-president-optum-data-science-program.html

  • UnitedHealth Group/OptumLabs: Senior Data Scientist.

    ...nalysis, quantitative analytics, and/or forecasting/predicting analytics 2+ years of applying machine learning using distributed systems like Hadoop, Hive, Spark or similar libraries 2+ years of experience with R or Python Experience working in an Agile Software Development environment Preferred...

    https://www.kdnuggets.com/jobs/16/05-02-uhg-senior-data-scientist.html

  • UnitedHealth Group/OptumLabs: Vice President of Optum Data Science Program.

    ...nalysis, quantitative analytics, and/or forecasting/predicting analytics 2+ years of applying machine learning using distributed systems like Hadoop, Hive, Spark or similar libraries 2 + years of experience with R or Python 3 years experience leading teams either directly or cross matrixed...

    https://www.kdnuggets.com/jobs/16/05-03-uhg-vp-optum-data-science-program.html

  • NEW: 6 Hot Career Prospects in Data Science Industry Today

    …databases and extract data relevant for analysis. Skills Required: Extensive knowledge of data-driven programming languages like R, SAS, Python, SQL, Hive, Spark etc. Proven expertise in distributed computing and statistical modeling. Proficiency in database architecture, process management, data…

    https://www.kdnuggets.com/2016/11/hot-career-prospects-data-science-industry.html

  • R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results

    ...e the Big Data tools and their share in 2016, 2015, and %change. Tool 2016%Share 2015%share % change Hadoop 22.1% 18.4% +20.5% Spark 21.6% 11.3% +91% Hive 12.4% 10.2% +21.3% MLlib 11.6% 3.3% +253% SQL on Hadoop tools 7.3% 7.2% +1.6% H2O 6.7% 2.0% +234% HBase 5.5% 4.6% +18.6% Apache Pig 4.6% 5.4%...

    https://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html

  • Apache Arrow and Apache Parquet: Why We Needed Different Projects for Columnar Data, On Disk and In-Memory

    …er job of organizing the data in a columnar model in HDFS we could significantly improve the performance of Hadoop for analytical jobs, primarily for Hive queries, but for other projects as well. At the time we had around 400M users and were generating over 100TB of compressed data per day, so…

    https://www.kdnuggets.com/2017/02/apache-arrow-parquet-columnar-data.html

  • Ingram Micro: Data Architect

    ...on environments Experience building systems to transition from datasets ranging from gigabytes to terabytes Experience with Big Data tech (e.g., Pig, Hive, Spark, Hbase, Presto, Sqoop, Hadoop, Impala) Implement industry best practices including document principles, policies, standards, and...

    https://www.kdnuggets.com/jobs/17/12-19-ingram-micro-data-infrastructure-architect.html

  • Accenture: Data Science Consultant

    ...inux Bash scripting including sed, awk, cut, uniq, sort, tr SQL Experience of Machine Learning and Big Data technologies such as Hadoop, Mahout, Pig, Hive, etc. Experience in Plotting Graphics (Scatterplots/matrix plots, Line graphs/bar charts, etc.) Experience in Data Analysis Preferred...

    https://www.kdnuggets.com/jobs/17/08-09-accenture-data-science-consultant.html

  • Presto for Data Scientists – SQL on anything

    ...a set of virtual machines or a dedicated cluster on premises. If Hadoop is in the mix, data scientists can rest assured. Presto simply connects to a Hive Metastore allowing users to share the same data with Hive, Spark, and other Hadoop ecosystem tools.An additional benefit of not being dependent...

    https://www.kdnuggets.com/2018/04/presto-data-scientists-sql.html

  • UnitedHealth Group: Data Analytics and Reporting Lead [Minnetonka, MN or Telecommute]

    ...s Work hands-on in SAS (SAS Enterprise Guide) and add new functionality to the tool to meet the needs Work with Big Data technologies like Hadoop and Hive Leverage the proven UnitedHealthcare SAS Fin360 framework and build fully functional innovative solutions while integrating robust...

    https://www.kdnuggets.com/jobs/18/11-16-unitedhealthgroup-data-analytics-reporting-lead.html

  • Virginia Tech: Data Engineer [Blacksburg, VA]

    ...sing Python, Spark, SQL Hands on experience with AWS services – Kinesis, S3, Glue, Lambda, Cloudformation, RDS, EC2, EMR or HDFS, Hadoop Yarn, Hbase, Hive, Pig Hands on experience in ELT/ETL and dimensional data modeling Proficiency in Python and at least one SQL language such as T-SQL or PL/SQL...

    https://www.kdnuggets.com/jobs/19/05-06-virginia-tech-data-engineer.html

  • Celgene: Sr. Manager, Data Lake

    ...ed study, or equivalent experience Minimum of 5-7 years hands-on experience with Information Management and Big Data technologies e.g. Hadoop, Spark, Hive. Robust experience with Cloudera is a plus. Minimum 3-5 years of experience in Cloud environments, preferably AWS Excellent interpersonal skills...

    https://www.kdnuggets.com/jobs/17/07-18-celgene-manager-data-lake.html

  • Unsupervised Investments (II): A Guide to AI Accelerators and Incubators

    ...uity stake’ only if the startup raised funding within 12months from the end of the program. Not sure how this changed for the Global AI+ program; The Hive (Bay area): they define it as a ‘co-creation studio to build and launch startups’ in AI (subdivided in deep learning, blockchain, AR, ‘ambient...

    https://www.kdnuggets.com/2017/05/unsupervised-investments-guide-ai-accelerators-incubators.html

  • A Concise Overview of Recent Advances in the Internet of Things (IoT)

    ...ts are expected to reach $100bn by 2018! 2. Smart homes The Internet of Things is also about all smart home appliances from smart thermostats such as Hive, intelligent fridges to other connected sockets. One product that produced a lot of hype this year is the Amazon Echo. It’s basically a wireless...

    https://www.kdnuggets.com/2017/01/grakn-year-review-iot-internet-things.html

  • UnitedHealth Group/OptumLabs: Senior Data Scientist

    ...nalysis, quantitative analytics, and/or forecasting/predicting analytics 2+ years of applying machine learning using distributed systems like Hadoop, Hive, Spark or similar libraries 2+ years of experience with R or Python Experience working in an Agile Software Development environment Preferred...

    https://www.kdnuggets.com/jobs/16/03-28-uhg-senior-data-scientist.html

  • What Top Firms Ask: 100+ Data Science Interview Questions

    …solve the problem when the list does not fit in memory? Capital One Data Engineer What is Hadoop serialization? Explain a simple Map/Reduce problem. Hive: LinkedIn Data Engineer Write a Hive UDF that returns a sentiment score. For example, if good = 1, bad = -1, and average = 0, then a review of a…

    https://www.kdnuggets.com/2017/03/top-firms-100-data-science-interview-questions.html

  • HDFS vs. HBase : All you need to know

    …h analytics to gain SKU level insights, and involved recursive/sequential calculations. HDFS and MapReduce frameworks were better suited than complex Hive queries on top of Hbase. MapReduce was used for data wrangling and to prepare data for subsequent analytics. Hive was used for custom analytics…

    https://www.kdnuggets.com/2017/05/hdfs-hbase-need-know.html

  • How to Make Your Database 200x Faster Without Having to Pay More

    …ples and an efficient statistical technique for estimating the error. The latest code is a bit outdated and only works for earlier versions of Apache Hive. The project is not currently active. Verdict Verdict is a middleware that sits between your application or BI tool and your backend SQL…

    https://www.kdnuggets.com/2016/11/make-database-200x-faster.html

  • Apple: Senior Data Scientist, Retail – Online

    ...ed econometric procedures. have working knowledge of SAS - Enterprise Guide/Miner. experience with one or more of the following is desirable: Hadoop, Hive, NoSQL, Spark, Hive, Mahout, Impala, Pig, Cascading, Theano Data Visualization with Tableau     To be successful in this position, you...

    https://www.kdnuggets.com/jobs/14/12-07-apple-senior-data-scientist-retail-online.html

  • Top Spark Ecosystem Projects

    ...umns, similar to a relational table Spark SQL - execute SQL queries written using either a basic SQL syntax or HiveQL, and read data from an existing Hive installation Spark Streaming - an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of...

    https://www.kdnuggets.com/2016/03/top-spark-ecosystem-projects.html

  • Big Data TechCon – Great How-To Conference

    ...rs, a full-day crash course on Hadoop, an introduction to Neo4j, and a class on Cassandra, among others. For those in the middle, there were talks on Hive and Pig, HBase, various introductions to NoSQL databases for SQL pros, and so forth. For advanced developers, there were all manner of entrees:...

    https://www.kdnuggets.com/2014/04/big-data-techcon-great-how-to-conference.html

  • Thomson Reuters: Data Scientist (Data Innovation Lab)

    ...ing techniques, classification, clustering and collaborative filtering Work experience with one or more of the following: Big data analytics (Hadoop, Hive, NoSQL, Spark, Shark, Hive, Mahout, Impala, Solr, HBase, Pig, Cascading) Information extraction, data mining, or machine learning  ...

    https://www.kdnuggets.com/jobs/14/06-13-thomsonreuters-data-scientist-data-innovation-lab.html

  • 18 essential Hadoop tools

    ..., the basic framework for splitting data across a cluster underpinning Hadoop. Apache HBase, a table-oriented database built on top of Hadoop. Apache Hive, a data warehouse built on top of Hadoop that makes data accessible through an SQL-like language. Apache Sqoop, a tool for transferring data...

    https://www.kdnuggets.com/2014/08/18-essential-hadoop-tools.html

  • Thomson Reuters: Data Scientist

    ...la, Java, C#, etc) Experience working with scripting languages like Python, Experience with one or more of the following: Big data analytics (Hadoop, Hive, NoSQL, Spark, Shark, Hive, Mahout, Impala, Solr, HBase, Pig, Cascading) Information extraction, data mining, or machine learning Data...

    https://www.kdnuggets.com/jobs/14/10-02-thomsonreuters-data-scientist.html

  • Book Review: Data Just Right

    ...that examples of ETL are not relevant or may be distracting or cumbersome. Chapter 5 is when the book moves into a higher gear by addressing Hadoop, Hive and the relatively newer Shark. There are no examples of Map Reduce, perhaps there being too many examples of Map Reduce already. Fortunately...

    https://www.kdnuggets.com/2014/04/book-review-data-just-right.html

  • KDnuggets Exclusive: Interview with Paco Nathan, Chief Scientist at Mesosphere

    ...ked up the torch and defined this approach even better than their foundation technology, Cascading. Meanwhile, the industry was focused on using Pig, Hive, etc., which represent enormous steps backwards in terms of formalizing data workflow abstractions. Seriously, I’d pay large sums of money to...

    https://www.kdnuggets.com/2014/03/exclusive-paco-nathan-mesosphere-big-data-player.html

  • Machine Learning Engineer

    …velopment. Familiarity working with large-scale datasets (such as Twitter, blogs, Facebook, etc.) and big data techniques (such as Hadoop, MapReduce, Hive, Hbase) would be a plus. PhD or MS degree specializing in a relevant field such as Statistics, Machine Learning, Data Mining. Deep understanding…

    https://www.kdnuggets.com/jobs/13/08-28-adobe-machine-learning-engineer.html

  • Top KDnuggets tweets, May 3-5: Social network analysis of Boston Marathon Bomber; Hadoop Toolbox: When to use what

    ...Boston Marathon Bomber Dzhokhar Tsarnaev and his friends bit.ly/18lBbCM Most Favorited: Hadoop Toolbox: When to use what - a guide to Hadoop, Hbase, Hive, Pig, Sqoop, Oozie, Flume, Avro, ... bit.ly/18khThf Top 10 Tweets What social network analysis says about Boston Marathon Bomber Dzhokhar...

    https://www.kdnuggets.com/2013/05/top-tweets-may03-may05.html

  • Data Scientist

    ...ionate members to our development team. We need your expertise for these tasks: Design and develop end to end innovative data mining platform (hadoop, hive, storm, etc.) from the concept up to operations Use latest data mining technologies such as artificial intelligence / machine learning...

    https://www.kdnuggets.com/jobs/13/05-22-swisscom-senior-software-engineer-big-data.html

  • Machine Learning Engineer/Data Scientist

    ...rogramming. Familiarity working with large-scale datasets (such as Twitter, blogs, Facebook, etc) and big data techniques (such as Hadoop, MapReduce, Hive, Hbase) would be a plus. Responsibilities Develop and implement ad-targeting models that use advanced statistical and machine learning...

    https://www.kdnuggets.com/jobs/13/11-20-adobe-data-scientist.html

  • Cray: Senior Data Scientist

    ...me analytics and big data platforms like Hadoop Experience with data manipulation and analysis using SQL, noSQL, Java, C, SAS EGuide, MapReduce, PIG, HIVE, Python, SAS, SAS HPA, SAS Visual Analytics, SAS EMiner, Salford TreeNet, R, and machine learning approaches such as Mahout Strong theoretical...

    https://www.kdnuggets.com/jobs/14/10-07-cray-senior-data-scientist.html

  • Guide to Data Science Cheat Sheets

    ...s.sourceforge.net/matlab-python-xref.pdf Cheat Sheets for SQL SQL Joins www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins SQL and Hive hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf Additional Cheat Sheets for Java...

    https://www.kdnuggets.com/2014/05/guide-to-data-science-cheat-sheets.html

  • Spark SQL for Real-Time Analytics

    …with Spark’s functional programming API. Spark SQL has been part of Spark Core since version 1.0. It runs HiveQL/SQL alongside or replacing existing hive deployments. It can connect to existing BI Tools. It has bindings in Python, Scala and Java. It makes two vital additions to the framework….

    https://www.kdnuggets.com/2015/09/spark-sql-real-time-analytics.html

  • Interview: Joseph Babcock, Netflix on Genie, Lipstick, and Other In-house Developed Tools

    ...s (AWS), specifically the Simple Storage Service (S3) file system that backs our Hadoop cluster. Production ETL jobs are largely written in Pig, with Hive used for adhoc analyses and Presto for interactive analytics. We also maintain a Teradata cloud instance, which is the backend for many of our...

    https://www.kdnuggets.com/2015/06/interview-joseph-babcock-netflix-in-house-developed-tools.html

  • SanDisk: Senior Staff Hadoop Developer

    ...gning the entire "Big Data" Stack and platform. Skills Required. Extensive knowledge about Hadoop Architectures and HDFS. Java/C++, Map Reduce HBase, Hive, PIG, Oozie, Mahout, Zookeeper, Flume, Solr, ElasticSearch, Storm/Spark Leading the learning/understanding and knowledge of very complex...

    https://www.kdnuggets.com/jobs/16/01-20-sandisk-senior-staff-hadoop-developer.html

  • Spark and the Remorseless Recrystallization of the Open Source Analytics Ecosystem

    …es, which are widely adopted in many big-data analytics initiatives now. What joins Spark to Hadoop is the fact that they both include HDFS, HBase, Hive, Ambari, Mahout, Pig, and Cassandra as key components of their respective ecosystems. How do they differ? Unlike Spark, Hadoop also includes…

    https://www.kdnuggets.com/2016/01/spark-crystallization-open-source-analytics-ecosystem.html

  • The Post-Hadoop World: New Kid On The Block Technologies

    ...ria (CTO & Founder, 6sense) Over the past six years, we’ve seen a rapid evolution in data processing platforms and technologies. While Hadoop and Hive remain core components of the data processing toolkit, a new breed of emerging technologies is changing the way we work with and use data. While...

    https://www.kdnuggets.com/2015/02/post-hadoop-world-new-technologies.html

  • Interview: James Taylor, Salesforce on Phoenix + HBase – The Future of Big Data

    ...ew trends, just top off my head: General adoption of Hadoop by all companies Standardization of using SQL to access big data (Phoenix, Drill, Impala, Hive, etc.) Requirements around being able to access data in a low latency manner (Phoenix, Spark, Storm, HBase) Adoption of Apache Calcite as a...

    https://www.kdnuggets.com/2015/06/interview-james-taylor-salesforce-phoenix-hbase.html

  • SanDisk: Senior Big Data Engineer/Hadoop Developer

    ...gning the entire "Big Data" Stack and platform. Skills Required. Extensive knowledge about Hadoop Architectures and HDFS. Java/C++, Map Reduce HBase, Hive, PIG, Oozie, Mahout, Zookeeper, Flume, Solr, ElasticSearch, Storm/Spark Leading the learning/understanding and knowledge of very complex...

    https://www.kdnuggets.com/jobs/16/02-03-sandisk-big-data-engineer-hadoop.html

  • Interview: Michael Lurye, Time Warner Cable on Key Lessons from Shifting to Hadoop

    ...a or Scala. But we are a BI shop and our developer skills are SQL and ETL tools, not Java. While Hadoop comes with higher-level tools such as Pig and Hive, we do not believe that converting several million lines of SQL code to a similar amount of Pig and Hive code would make sense for us. We...

    https://www.kdnuggets.com/2015/04/interview-michael-lurye-twc-key-lessons-hadoop.html

  • KDnuggets™ News 15:n05, Feb 11: Annual Salary Poll; 10 things statistics teaches about Big Data; Data Science Jargon

    ...logy, and Animals fit together - Feb 5, 2015. How Big Data Pieces and animals fit together: MapReduce, HDFS, Apache Spark, Pregel, Zookeeper, Flume, Hive, Pig, and more, explained by a Quora (and past Facebook) Data Scientist.    Opinions  (see also All Opinions for this month...

    https://www.kdnuggets.com/2015/n05.html

  • 10 things statistics taught us about big data analysis

    ...ms with the data. To do this you need to interact with the data quickly. One way to do this is to analyze the whole data set at once using tools like Hive, Hadoop, or Pig. But an often easier, better, and more cost effective approach is to use random sampling. As Robert Gentleman put it "make big...

    https://www.kdnuggets.com/2015/02/10-things-statistics-big-data-analysis.html

  • What do you want to learn? Big Data TechCon How-To Conference, Apr 26-28, Boston

    ...(40%) Deep learning / Machine Learning (40%) Hadoop and other things in Hadoop stack (30%) Python (20%)   Other topics included Graph Databases, Hive, Internet of Things, NoSQL Databases, R, Testing Analytics Models, and Text Mining. Below is more information about Big Data TechCon. Big Data...

    https://www.kdnuggets.com/2015/04/big-data-techcon-conference-april-boston.html

  • Big Data Developer Conference, Santa Clara: Day 1 Highlights

    ...d latest in Big Data Ecosystem. The conference tutorials and talks covered a wide variety of topics including Hadoop, Lambda Architecture, MapReduce, Hive, Pig, Spark, MongoDB, etc. Highlights from Day 1 (Monday, March 23): The first day of the conference started with introduction to Big Data...

    https://www.kdnuggets.com/2015/04/big-data-developer-conference-highlights-day1.html

  • Karmasphere, Zementis make PMML models available for Hadoop

    ...redictive models, making them highly portable. Karmasphere and Zementis take this one step further by transforming PMML models into industry standard Hive UDF's for Hadoop. Analysts can now use Karmasphere to deploy their statistical models natively on Hadoop, across all their data and dimensions,...

    https://www.kdnuggets.com/2013/06/karmasphere-zementis-make-pmml-models-available-for-hadoop.html

  • KDnuggets Annual Software Poll: RapidMiner and R vie for first place

    ...ned relatively stable, at about 14%, vs 15% in 2012, but only 3% in 2011. The most popular Big Data tools were Big Data Software: Hadoop/ Hbase/ Pig/ Hive, 9.3% MongoDB, 4.3% Other Big Data/Cloud analytics software, 3.2% Other NoSQL Databases, 2.0% Only one vote was received for an interesting...

    https://www.kdnuggets.com/2013/06/kdnuggets-annual-software-poll-rapidminer-r-vie-for-first-place.html

  • Top KDnuggets tweets, Feb 20-21: 10 R packages every data scientist should know; Data Science vs Big Data

    ...t plans for Hadoop, machine learning, HPC, #BigData & analytics on Azure. bit.ly/12NGMkm Great evaluation insights: how MarkedUp chose Cassandra, Hive, & Hadoop, but not MongoDB, Riak #Analytics #BigData bit.ly/YpvuuO Every known meteorite fall on earth mapped - are there any patterns?...

    https://www.kdnuggets.com/2013/02/top-tweets-feb20-21.html

  • Microsoft REEF, new open source big data framework

    …s, Microsoft Hadoop has become a key building block in the new generation of scale-out systems. Early versions of analytic tools over Hadoop, such as Hive and Pig for SQL-like queries, were implemented by translation into Map-Reduce computations. This approach has inherent limitations, and the…

    https://www.kdnuggets.com/2013/08/microsoft-reef-new-open-source-big-data-framework.html

  • Data Mining / Analytic News, Aug 2013

    ...nguages continue to be R (used by 61% of KDnuggets readers), Python (39%), and SQL (37%). SAS is stable at around 20%. The highest growth was for Pig/Hive/Hadoop-based languages, R, and SQL, while Perl, C/C++, and Unix tools declined. We also find a small affinity between R and Python users....

    https://www.kdnuggets.com/2013/08/index.html

  • KDnuggets™ News 13:n12, May 8

    ..., 2013. What social network analysis says about Boston Marathon Bomber Dzhokhar Tsarnaev; Hadoop Toolbox: When to use what - a guide to Hadoop, Hbase, Hive, Pig, Sqoop, Oozie, Flum; TweetMap - a fantastic tool to visualize and map tweets in real-time (goodbye, privacy?); 5 free Excel add-ins to help...

    https://www.kdnuggets.com/2013/n12.html

  • KDnuggets™ News 13:n16, Jul 3

    ...ing. Karmasphere, Zementis make PMML models available for Hadoop - Jun 26, 2013. Karmasphere and Zementis transform PMML models into industry standard Hive UDF's for Hadoop, to allow data analysts easily use existing models, including from SAS, SPSS and R. Datameer 3.0: first ever point-click...

    https://www.kdnuggets.com/2013/n16.html

  • KDnuggets™ News 13:n21, Aug 28

    ...nguages continue to be R (used by 61% of KDnuggets readers), Python (39%), and SQL (37%). SAS is stable at around 20%. The highest growth was for Pig/Hive/Hadoop-based languages, R, and SQL, while Perl, C/C++, and Unix tools declined. We also find a small affinity between R and Python users. Forbes...

    https://www.kdnuggets.com/2013/n21.html

  • KDnuggets™ News 14:n08, Apr 3

    ...”; Coursera free #DataScience courses - Mar 28, 2014. Also free ebooks on Practical Machine Learning: Innovations in Recommendations, and Apache Hive - How to access big data on Hadoop with SQL/HiveQL. Top KDnuggets tweets, Mar 24-25: Is a Data Science Certificate sufficient? Kaggle branches...

    https://www.kdnuggets.com/2014/n08.html

  • KDnuggets™ News 14:n12, May 21

    ...ata, Data Science - May 19 and beyond - May 19, 2014. Data Mining: FTL; Deep Learning with H2O; Purchase history to Customer Projects; Apache Hadoop, Hive, Kafka, Solr; Python for Big Data Analytics, and more. Courses Northwestern Online MS in Predictive Analytics - May 15, 2014. Prepare for...

    https://www.kdnuggets.com/2014/n12.html
