- Vaex: Pandas but 1000x faster - May 17, 2021.
If you are working with big data, especially on your local machine, then learning the basics of Vaex, a Python library that enables the fast processing of large datasets, will provide you with a productive alternative to Pandas.
- How to get started managing data quality with SQL and scale - May 4, 2021.
Silent data quality issues are the biggest problem facing today's data teams, who are flying blind with no systems or processes in place to monitor and detect bad data before it has a downstream impact.
- How Uber manages Machine Learning Experiments with Comet.ml - Apr 21, 2021.
At Uber, where ML is fundamental to most products, a mechanism to manage offline experiments easily is needed to improve developer velocity. To solve this, Uber AI was looking for a solution that could complement and extend its in-house experiment management and collaboration capabilities.
- Computer Vision at Scale With Dask And PyTorch - Nov 23, 2020.
A tutorial on conducting image classification inference using the ResNet50 deep learning model at scale using GPU clusters on Saturn Cloud. The results were: 40x faster computer vision that made a 3+ hour PyTorch model run in just 5 minutes.
- Compute Goes Brrr: Revisiting Sutton’s Bitter Lesson for AI - Nov 19, 2020.
"It's just about having more compute." Wait, is that really all there is to AI? As Richard Sutton's 'bitter lesson' sinks in for more AI researchers, a debate has stirred that considers a potentially more subtle relationship between advancements in AI based on ever-more-clever algorithms and massively scaled computational power.
- Microsoft and Google Open Sourced These Frameworks Based on Their Work Scaling Deep Learning Training - Nov 2, 2020.
Google and Microsoft have recently released new frameworks for distributed deep learning training.
- Deploying Secure and Scalable Streamlit Apps on AWS with Docker Swarm, Traefik and Keycloak - Oct 23, 2020.
If you are a data scientist who just wants to get the work done but doesn’t necessarily want to go down the DevOps rabbit hole, this tutorial offers a relatively straightforward deployment solution leveraging Docker Swarm and Traefik, with an option of adding user authentication with Keycloak.
- 5 Challenges to Scaling Machine Learning Models - Oct 7, 2020.
ML models are hard to translate into active business gains. In order to understand the common pitfalls in productionizing ML models, let’s dive into the top 5 challenges that organizations face.
- LinkedIn’s Pro-ML Architecture Summarizes Best Practices for Building Machine Learning at Scale - Sep 23, 2020.
The reference architecture is powering mission-critical machine learning workflows within LinkedIn.
- Here’s what you need to look for in a model server to build ML-powered services - Sep 15, 2020.
More applications are being infused with machine learning while MLOps processes and best practices are becoming well established. Critical to this software and these systems are the servers that run the models, which should feature key capabilities to drive successful enterprise-scale productionizing of machine learning.
- Showcasing the Benefits of Software Optimizations for AI Workloads on Intel® Xeon® Scalable Platforms - Sep 1, 2020.
The focus of this blog is to bring to light that continued software optimizations can boost performance not only for the latest platforms, but also for the current install base from prior generations. This means customers can continue to extract value from their current platform investments.
- Scaling Computer Vision Models with Dataflow - Jul 31, 2020.
Scaling Machine Learning models is hard and expensive. We will briefly introduce the Google Cloud service Dataflow and show how it can be used to run predictions on millions of images in a serverless way.
- Is depth useful for self-attention? - Jul 27, 2020.
Learn about recent research that is the first to explain a surprising phenomenon where in BERT/Transformer-like architectures, deepening the network does not seem to be better than widening (or, increasing the representation dimension). This empirical observation is in contrast to a fundamental premise in deep learning.
- Scale sensitive data science and analytics with confidence - Jul 16, 2020.
Listen to this on-demand webinar and hear how WorldQuant Predictive derives insights from building models on sensitive data while maximizing value and minimizing risk.
- The Bitter Lesson of Machine Learning - Jul 15, 2020.
Since that renowned conference at Dartmouth College in 1956, AI research has experienced many crests and troughs of progress through the years. Of the many lessons learned during this time, some have needed to be re-learned -- repeatedly -- and the most important of these has also been the most difficult for many researchers to accept.
- Some Things Uber Learned from Running Machine Learning at Scale - Jul 7, 2020.
Uber's machine learning runtime, Michelangelo, has been in operation for a few years. What has the Uber team learned?
- Evaluating Ray: Distributed Python for Massive Scalability - Mar 25, 2020.
If your team has started using Ray and you’re wondering what it is, this post is for you. If you’re wondering if Ray should be part of your technical strategy for Python-based applications, especially ML and AI, this post is for you.
- Scaling Your Data Strategy - Mar 17, 2020.
This article presents a particular vision for a cohesive data strategy for addressing large-scale problems with data-driven solutions, based on prior professional experiences.
- Uber Unveils a New Service for Backtesting Machine Learning Models at Scale - Mar 2, 2020.
The transportation giant built a new service and architecture for backtesting forecasting models.
- Large Scale Adversarial Representation Learning - Feb 7, 2020.
GANs can be used for unsupervised learning where a generator maps latent samples to generate data, but this framework does not include an inverse mapping from data to latent representation. BiGAN adds an encoder E to the standard generator-discriminator GAN architecture — the encoder takes input data x and outputs a latent representation z of the input.
- Uber Has Been Quietly Assembling One of the Most Impressive Open Source Deep Learning Stacks in the Market - Jan 27, 2020.
Many of the technologies used by Uber teams have been open sourced and received accolades from the machine learning community. Let’s look at some of my favorites.
- Accuracy vs Speed – what Data Scientists can learn from Search - Jan 2, 2020.
Delivering accurate insights is the core function of any data scientist. Navigating the development road toward this goal can sometimes be tricky, especially when cross-collaboration is required, and these lessons learned from building a search application will help you negotiate the demands between accuracy and speed.
- Scalable graph machine learning: a mountain we can climb? - Dec 10, 2019.
Graph machine learning is a developing area of research that brings many complexities. One challenge that both fascinates and infuriates those working with graph algorithms is scalability. We take a close look at scalability for graph machine learning methods, covering what it is, what makes it difficult, and an example of a method that tackles it head-on.
- Monitoring Models at Scale - Nov 7, 2019.
Catch this Domino webinar on monitoring models at scale, Dec 11 @ 10am PT, covering how to detect changes in the patterns of real-world data your models are seeing in production, track how model accuracy and other quality metrics are changing over time, and get alerted when health checks fail so that resolution workflows can be triggered.
- Scaling a Massive State-of-the-art Deep Learning Model in Production - Jul 15, 2019.
A new NLP text writing app based on OpenAI's GPT-2 aims to write with you -- whenever you ask. Find out how the developers set up and deployed their model into production from an engineer working on the team.
- KDnuggets™ News 19:n23, Jun 19: Useful Stats for Data Scientists; Python, TensorFlow & R Winners in Latest Job Report - Jun 19, 2019.
This week on KDnuggets: 5 Useful Statistics Data Scientists Need to Know; Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS; How to Learn Python for Data Science the Right Way; The Machine Learning Puzzle, Explained; Scalable Python Code with Pandas UDFs; and much more!
- Why organizations fail in scaling AI and Machine Learning - May 29, 2019.
We explain why AI needs to understand business processes and how the business processes need to be able to change to bring insight from AI into the process.
- Most impactful AI trends of 2018: The rise of ML Engineering - Mar 1, 2019.
As both research and applied teams are doubling down on their engineering and infrastructure needs, the nascent field of ML Engineering will build upon 2018’s foundation and truly blossom in 2019.
- How to Engineer Your Way Out of Slow Models - Nov 27, 2018.
We describe how we handle performance issues with our deep learning models, including how to find subgraphs that take a lot of calculation time and how to extract these into a caching mechanism.
- One-Click Machine Learning Deployments with Anaconda Enterprise - Aug 20, 2018.
With Anaconda Enterprise, your organization can develop, govern, and automate machine learning pipelines, while scaling with ease.
- Deep Learning and Challenges of Scale Webinar - Jul 9, 2018.
Join Nvidia for an on-demand webinar to learn how to tackle the challenges of scaling and building complex deep learning systems.
- ebook: A Guide to Data Science at Scale - Jun 12, 2018.
Read our eBook to learn how easy it is to build and scale ML models with a unified analytics platform, how to collaborate across data teams to uncover insights faster, and more. Free download.
- Deep learning scaling is predictable, empirically - May 10, 2018.
This study starts with a simple question: “how can we improve the state of the art in deep learning?”
- Best Practices for Scaling Data Science Across the Organization - May 7, 2018.
Join Forrester and Anaconda for a webinar on Thursday, May 17, at 2:00 PM CT, to learn best practices for scaling data science across your entire organization. Learn how to tackle five key challenges facing organizations today!
- To SQL or not To SQL: that is the question! - May 7, 2018.
This article looks at the emergence of the NoSQL movement and compares it to a traditional relational database.
- Unlock the Next Era of Analytics – AI and Machine Learning at Scale - Apr 12, 2018.
Join us on Apr 19 for an interactive virtual event to hear from a panel of analytic experts as they dispel the myths and dive into the nitty-gritty of how AI and machine learning will impact analytic teams.
- Operational Best Practices for Enterprise Data Science - Jan 24, 2018.
Join Team Anaconda for a live webinar, Jan 30, 2pm CT, as we tackle the four main concerns we hear from our customers and show you best practices for managing enterprise data science: scalability, security, integration, and governance.
- Analytic Creation to Production: Bridging The Chasm, Webinar, Dec 7 - Dec 1, 2017.
Understand best practices for optimizing the handoff from analytic team to IT across your business as a core competency, how to create scalable peak model performance, and more.
- The Guts and Glory of Data Science - Nov 6, 2017.
Are you a data science leader, or aspiring to be one? Learn how industry leaders manage their data science initiatives as core capabilities that drive their company’s strategic objectives.
- Big Data Architecture: A Complete and Detailed Overview - Sep 19, 2017.
Data scientists may not be as educated or experienced in computer science, programming concepts, devops, site reliability engineering, non-functional requirements, software solution infrastructure, or general software architecture as compared to well-trained or experienced software architects and engineers.
- The Internet of Things in the Cloud - May 11, 2017.
Cloud computing is the next evolutionary step in Internet-based computing, which provides the means for delivering ICT resources as a service. Internet-of-Things can benefit from the scalability, performance and pay-as-you-go nature of cloud computing infrastructures.
- RCloud – DevOps for Data Science - Nov 28, 2016.
After almost two decades of software development, the term DevOps was coined, officially giving importance to collaboration between the development and deployment of software systems. In this early stage of the data science field, the use of standardized and empirical practices like DevOps will definitely speed up its evolution.