Search results for apache pig

    Found 76 documents, 14200 searched:

  • Will Apache Spark Finally Advance Genomic Data Analysis?

    Spark has been useful in mapping out genetic traits that can be associated with certain diseases and the genetic makeup of microorganisms that live in our bodies.

  • Apache Arrow and Apache Parquet: Why We Needed Different Projects for Columnar Data, On Disk and In-Memory

    Apache Parquet and Apache Arrow both focus on improving the performance and efficiency of data analytics. These two projects optimize performance for on-disk and in-memory processing, respectively.

  • Dataiku Data Science Studio, now also runs on Apache Spark

    Dataiku Data Science Studio version 2.1 has many useful features for Data Scientists, including integration with Apache Spark.

  • Working with Big Data: Tools and Techniques

    Where do you start in a field as vast as big data? Which tools and techniques to use? We explore this and talk about the most common tools in big data.

  • How to Become More Marketable as a Data Scientist

    As a data scientist, you are in high demand. So, how can you increase your marketability even more? Check out these current trends in skills most desired by employers in 2019.

  • Going deeper with recurrent networks: Sequence to Bag of Words Model

    Deep learning makes it possible to convert unstructured text to computable formats, incorporating semantic knowledge to train machine learning models. These digital data troves help us understand people on a new level.

  • 75 Big Data Terms to Know to Make your Dad Proud

    Here is a good list of 75 Big Data terms you can use to impress your father, even if you already bought him a gift.

  • The top 5 Big Data courses to help you break into the industry

    Here is an updated and in-depth review of top 5 providers of Big Data and Data Science courses: Simplilearn, Cloudera, Big Data University, Hortonworks, and Coursera

  • R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results

    R remains the leading tool, with 49% share, but Python grows faster and almost catches up to R. RapidMiner remains the most popular general Data Science platform. Big Data tools used by almost 40%, and Deep Learning usage doubles.

  • Hadoop Key Terms, Explained

    A straightforward overview of 16 core Hadoop ecosystem concepts. No Big Picture discussion, just the facts.

  • Strata + Hadoop World 2015 San Jose – Day 1 Highlights

    Here are the quick takeaways and valuable insights from selected talks at one of the most reputed conferences in Big Data – Strata + Hadoop World 2015, San Jose.

  • 18 essential Hadoop tools

    Hadoop tools develop at a rapid rate, and keeping up with the latest can be difficult. Here we detail 18 of the most essential tools that work well with Hadoop.

  • The 20 Python Packages You Need For Machine Learning and Data Science

    Do you do Python? Do you do data science and machine learning? Then, you need to do these crucial Python libraries that enable nearly all you will want to do.

  • Path to Full Stack Data Science

    Start your journey toward mastering all aspects of the field of Data Science with this focused list of in-depth self-learning resources. Curated with the beginner in mind, these recommendations will help you learn efficiently, and can also offer existing professionals useful highlights for review or help filling in any gaps in skills.

  • Data Analysis Using Scala

    It is very important to choose the right tool for data analysis. On the Kaggle forums, where international Data Science competitions are held, people often ask which tool is better. R and Python are at the top of the list. In this article we will tell you about an alternative stack of data analysis technologies, based on Scala.

  • Containerization of PySpark Using Kubernetes

    This article demonstrates the approach of how to use Spark on Kubernetes. It also includes a brief comparison between various cluster managers available for Spark.

  • Skills to Build for Data Engineering

    This article jumps into the latest skill set observations in the Data Engineering Job Market which could definitely add a boost to your existing career or assist you in starting off your Data Engineering journey.

  • The Most In Demand Tech Skills for Data Scientists

    By the end of this article you’ll know which technologies are becoming more popular with employers and which are becoming less popular.

  • Top 13 Skills To Become a Rockstar Data Scientist

    Education, coding, SQL, big data platforms, storytelling and more. These are the 13 skills you need to master to become a rockstar data scientist.

  • The Data Science Gold Rush: Top Jobs in Data Science and How to Secure Them

    Because big data touches almost every industry across the board, those who aren’t already working in data and analytics will soon be utilizing the technology for its undeniable business benefits. Whichever way you slice it, the future of work is through data.

  • Things you should know when traveling via the Big Data Engineering hype-train

    Maybe you want to join the Big Data world? Or maybe you are already there and want to validate your knowledge? Or maybe you just want to know what Big Data Engineers do and what skills they use? If so, you may find the following article quite useful.

  • UnitedHealth Group: Big Data Engineering Lead (Eden Prairie, MN)

    Seeking strong leaders who are collaborative self-starters, take ownership and accountability, and drive results. The Lead Big Data Engineer role involves managing one or more Scrum teams, providing technical leadership, and working with Product Owners/Functional experts and Senior Management.

  • 9 Must-have skills you need to become a Data Scientist, updated

    Check out this collection of 9 (plus some additional freebies) must-have skills for becoming a data scientist.

  • The Two Sides of Getting a Job as a Data Scientist

    Are you a Data Scientist looking for a job? Are you a Recruiter looking for a Data Scientist? If you answered yes or NO to these questions, you need to read this.

  • A Beginner’s Guide to Data Engineering – Part I

    Data Engineering: The Close Cousin of Data Science.

  • 277 Data Science Key Terms, Explained

    This is a collection of 277 data science key terms, explained with a no-nonsense, concise approach. Read on to find terminology related to Big Data, machine learning, natural language processing, descriptive statistics, and much more.

  • Top Recent Big Data videos on YouTube

    Top viewed videos on Big Data since 2015 include Big Data use cases in psychographics, sports, politics and data monetisation.

  • Grunion, Query Optimization Tool for Data Science and Big Data

    Grunion is a patent-pending query optimization, translation, and federation framework built to help bridge the gap between data science and data engineering teams. Read more to request access.

  • Predictions for Data Science in 2017

    Our predictions include: 2017 will be the year of Deep Learning (DL) technology, Artificial General Intelligence is still far away, Software and Hardware Progress will accelerate, and AI will have unexpected socio-political implications.

  • 5 Career Paths in Big Data and Data Science, Explained

    Sexiest job... massive shortage... blah blah blah. Are you looking to get a real handle on the career paths available in "Data Science" and "Big Data?" Read this article for insight on where to look to sharpen the required entry-level skills.

  • 50+ Data Science, Machine Learning Cheat Sheets, updated

    Gear up to speed and have concepts and commands handy in Data Science, Data Mining, and Machine learning algorithms with these cheat sheets covering R, Python, Django, MySQL, SQL, Hadoop, Apache Spark, Matlab, and Java.

  • Top 12 Interesting Careers to Explore in Big Data

    From data driven strategies to decision making, the true worth of Big Data has been realized, and has led to opening up of amazing career choices. Check out these 12 interesting careers to explore in Big Data.

  • Microsoft: Principal Data Scientist

    Microsoft is seeking intelligent people to dive into data, make sense of it, and leverage data to solve large-scale problems in Microsoft’s products.

  • Dataiku DSS 3.1 – Now with 5 ML Backends & Scala!

    Introducing Dataiku DSS 3.1, with new visual machine learning engines that allow users to create incredibly powerful predictive applications within a code-free interface.

  • 5 Big Data Projects You Can No Longer Overlook

    Check out 5 Big Data projects that you are not likely to have seen before, but which may be useful to you, and perhaps even scratch an itch you didn't know you had.

  • Interview: Florian Douetteau, Dataiku Founder, on Empowering Data Scientists

    Here is an interview with Florian Douetteau, founder of Dataiku, on how their tools empower data scientists, and how data science itself is evolving.

  • Top Data Science Courses on Udemy

    An overview of the very best that Udemy has to offer in data science education. Includes courses covering machine learning, Python, Hadoop, visualization, and more.

  • Top Big Data Processing Frameworks

    A discussion of 5 Big Data processing frameworks: Hadoop, Spark, Flink, Storm, and Samza. An overview of each is given and comparative insights are provided, along with links to external resources on particular related topics.

  • Spark and the Remorseless Recrystallization of the Open Source Analytics Ecosystem

    Apache Spark added robust machine learning, graph, streaming, and in-memory capability to the Hadoop-centric ecosystem. In 2016, we expect adoption in diverse big data, advanced analytics, data science, Internet of Things, and other application domains.

  • Jet: Big Data Engineer

    You'll be responsible for helping to build a world class data platform to collect, process, and manage a vast amount of information generated by Jet's rapidly growing business.

  • Spark + SETI: Amping up Spark SQL with Parquets

    Spark SQL is a great component for data scientists as it simplifies querying large distributed datasets. Learn how to integrate it with Parquets, which we have found to significantly improve the performance of sparse-column queries.

  • Best Data Science Online Courses

    The number of online data science courses has exploded in recent years, and there are courses for any need. Here is an extensive list of free and paid courses from Coursera, DataCamp, Dataquest, edX, Udacity, Udemy, and other major providers.

  • AspenTech: Data Scientist

    If you want a shot at greatness, as a member of the data science team, you will develop and investigate hypotheses, structure experiments and build mathematical models to understand data patterns and relationships and prescribe actions and options.

  • Spark SQL for Real-Time Analytics

    Apache Spark is the hottest topic in Big Data. This tutorial discusses why Spark SQL is becoming the preferred method for Real Time Analytics and for the next frontier, IoT (Internet of Things).

  • NYC Data Science Academy courses & bootcamps in Data Engineering, Data Science, R, Python, and Machine Learning

    Upcoming training from NYC Data Science Academy: 6-Week Intensive Data Engineering Bootcamp, 12-Week Data Science Bootcamp, courses in R, Python, Data Science and Machine Learning, and more.

  • Big Data – yes, that’s what a latest Sensational Rap Music Video is all about

    Music video featuring Big Data and Hadoop (and Map-Reduce and NoSQL) might be all you need to light up your day!

  • 50+ Data Science and Machine Learning Cheat Sheets

    Gear up to speed and have Data Science & Data Mining concepts and commands handy with these cheatsheets covering R, Python, Django, MySQL, SQL, Hadoop, Apache Spark and Machine learning algorithms.

  • R leads RapidMiner, Python catches up, Big Data tools grow, Spark ignites

    R is the most popular overall tool among data miners, although Python usage is growing faster. RapidMiner continues to be the most popular suite for data mining/data science. Hadoop/Big Data tools usage grew to 29%, propelled by 3x growth in Spark. Other tools with strong growth include H2O (0xdata), Actian, MLlib, and Alteryx.

  • Big Data Bootcamp, Austin: Day 3 Highlights

    Highlights from the presentations by Big Data and Analytics leaders/consultants on day 3 of Big Data Bootcamp in Austin.

  • SEIU 775: Data Scientist (Research and Analytics)

    Develop and build an ambitious portfolio of predictive modeling projects including learning outcomes, matching algorithms to connect home care aides with their consumers, and personalizing care plans for individual long-term care consumers.

  • Big Data Developer Conference, Santa Clara: Day 3 Highlights

    Highlights from the presentations/tutorials by Data Science leaders from VISA, Glassbeam, Unravel on day 3 of Big Data Developer Conference, Santa Clara.

  • Big Data Developer Conference, Santa Clara: Day 1 Highlights

    Highlights from the presentations/tutorials by Data Science leaders from ElephantScale, SciSpike, Twitter and Informatica on day 1 of Big Data Developer Conference, Santa Clara

  • Strata + Hadoop World 2015 San Jose – Day 2 Highlights

    Strata + Hadoop World 2015 was a great conference, and here are key insights from some of the best sessions on day 2.

  • HomeUnion: Big Data Technical Project Manager

    Big Data TPM plays a key role in delivering all Big Data related product releases, and is familiar with Big Data technologies and Agile methodologies.

  • KDnuggets™ News 15:n05, Feb 11: Annual Salary Poll; 10 things statistics teaches about Big Data; Data Science Jargon

    KDnuggets Annual Analytics/Data Science salary poll; 10 things statistics taught us about big data analysis; Data Science's Most Used, Confused, and Abused Jargon; Top 30 people in Big Data and Analytics; and more news, software, opinions, interviews, webcasts, courses, jobs, academic, publications, top tweets, and CFP.

  • How Big Data Pieces, Technology, and Animals fit together

    How Big Data Pieces and animals fit together: MapReduce, HDFS, Apache Spark, Pregel, Zookeeper, Flume, Hive, Pig, and more, explained by a Quora (and past Facebook) Data Scientist.

  • Interview: Nandu Jayakumar, Yahoo on What Does One Need for Big Data Success

    We discuss Yahoo’s contributions to Big Data ecosystem, recommendation to Big Data vendors, predictions for Big Data, advice, and more.

  • KDnuggets™ News 14:n35, Dec 29

    Features | Software | Opinions | Interviews | News | Courses | Meetings | Jobs | Academic | Tweets | CFP | Quote Features 2015

  • Top KDnuggets tweets, Dec 24-25: 24 Data Science, Machine Learning Resources; Pregnant women can guess their children sex

    Pregnant women can intuit the sex of their children; #Pig and #Python can't fly but can predict Airline delays; 24 #DataScience, #Statistics, #MachineLearning Resources; The 50 Most Innovative CS Depts in USA.

  • R and Hadoop make Machine Learning Possible for Everyone

    R and Hadoop make machine learning approachable enough for inexperienced users to begin analyzing and visualizing interesting data to start down the path in this lucrative field.

  • Apple: Data Scientist

    The Maps Supply Chain Management team is looking for a skilled, experienced, and motivated Data Scientist to help us discover patterns and quality issues with our local data sets.

  • Big Data and Hadoop, Big Data Boot Camp LA

    Big Data Boot Camp LA provided attendees a comprehensive understanding of Big Data and Hadoop technologies. Sujee Maniyam provided a good technical overview of Hadoop and current trends. We provide key takeaways.

  • ACM Data Science Camp, San Jose, Oct 25

    ACM Data Mining Camps, held since 2009, are renamed this year to Data Science Camp to be more inclusive of Big Data & Data Science. Propose or organize sessions and attend this great meeting.

  • KDnuggets Analytics, Data Mining, Data Science Software Poll – Analyzed

    We analyze the results of KDnuggets Software Poll, including correlations between tools, and relationships between commercial, free, and Hadoop/Big Data tools. We identify a potential capability gap. Download anonymized data and analyze it yourself.

  • Stanford University: Data Analyst

    Work on a wide range of challenges by analyzing unique expenditure datasets, and produce insights to help reduce spending and improve reimbursements, payments, and contractor payments.

  • Big Data BootCamp: Highlights of talks on Day 3

    Highlights from the presentations by big data technology practitioners from Hortonworks, Intel, Rackspace, SciSpike, and Yahoo at Big Data Bootcamp 2014 in Santa Clara.

  • Big Data BootCamp Santa Clara: Highlights of talks on Days 1-2

    Highlights from the presentations by big data technology practitioners from Caspida, Datastax, ElephantScale, Hortonworks, MapR and Qubole at Big Data Bootcamp 2014 in Santa Clara.

  • Book Review: Data Just Right

    An introduction to technology and software at play in the current quest to define the Big Data Analytics computing paradigm, the book Data Just Right is reviewed in detail here.

  • KDnuggets Exclusive: Interview with Paco Nathan, Chief Scientist at Mesosphere

    KDnuggets talks with Paco Nathan, computer scientist, OSS developer, author, and advisor about Apache Mesos, Cascading, his books and Big Data trends.

  • 10 New Year resolutions for CIOs who want to take the Big Data plunge in 2014

    The Big Data hype is everywhere, but many CIOs aren’t sure how to take their first steps toward adopting Big Data. Here are 10 New Year’s resolutions for CIOs who want to take the Big Data plunge in 2014.

  • Data Mining / Analytic Publications News, Jan 2013

    Features (14) | Software (6) | Courses, Events (14) | Jobs | Academic | Competitions (9) | Publications (26) | News Briefs (8) Big Data Myth

  • Top KDnuggets tweets, Jan 21-23: Free BigData education, Coursera “pseudo-degree”; What is Hadoop, MapReduce, HDFS

    Free #BigData education, including Coursera "pseudo-degree" program for Data Science; Free #BigData Education: Technical perspective - Learn what is Hadoop, MapReduce, HDFS, Pig; New Book: R and Data Mining: Examples and Case Studies; How significant is columnar storage for Big data analytics - an explanation

  • Microsoft REEF, new open source big data framework

    REEF (Retainable Evaluator Execution Framework) is a big data framework that sits on top of Hadoop's new YARN resource manager, and is especially well suited for building machine learning jobs.

  • KDnuggets Annual Software Poll: RapidMiner and R vie for first place

    The 2013 KDnuggets Software Poll was marked by a battle between RapidMiner and R for the first place. Surprisingly, commercial and free software maintained parity, with about 30% using each exclusively, and 40% using both. Only 10% used their own code - is analytics software maturing? Real Big Data is still done by a minority - only 1 in 7 used Hadoop or similar tools, same as last year.

  • KDnuggets™ News 13:n02, Jan 30

    Features (10) | Software (4) | Courses, Events (2) | Webcasts (3) | Jobs (12) | Academic (5) | Competitions (4) | Publications (12) | NewsBriefs
