Search results for s3

    Found 100 documents, 10623 searched:

  • Data Version Control: iterative machine learning

    …n action: $ mkdir myrepo $ cd myrepo $ mkdir code $ wget -nv -P code/ https://s3-us-west-2.amazonaws.com/dvc-share/so/code/featurization.py \ https://s3-us-west-2.amazonaws.com/dvc-share/so/code/evaluate.py \ https://s3-us-west-2.amazonaws.com/dvc-share/so/code/train_model.py \…

    https://www.kdnuggets.com/2017/05/data-version-control-iterative-machine-learning.html

  • Benchmarking Big Data SQL Platforms in the Cloud

    ...and compute, which adds elasticity and ease of management compared to local disks, as done in the Impala benchmark. In an earlier blog post comparing S3 vs HDFS, we came to the conclusion that S3 has a much lower total cost of ownership, while HDFS might have better performance on a per node basis....

    https://www.kdnuggets.com/2017/09/databricks-benchmarking-big-data-sql-platforms-cloud.html

  • IoT on AWS: Machine Learning Models and Dashboards from Sensor Data

    ...ne from DynamoDB to S3 to be used by QuickSight: You also need to create a JSON file and set up IAM permissions so that QuickSight can read from the S3 bucket: { "fileLocations": [ { "URIs": [ "https://s3.amazonaws.com/your-bucket/2018-05-19-19-41-16/12345-c2712345-12345" ] }, { "URIPrefixes": [...

    https://www.kdnuggets.com/2018/06/zimbres-iot-aws-machine-learning-dashboard.html

  • Training with Keras-MXNet on Amazon SageMaker

    ...name (‘train’) and we make it executable. For more flexibility, we could write a generic launcher that would fetch the actual training script from an S3 location passed as a hyperparameter. This is left as an exercise for the reader ;) the Keras configuration file to /root/.keras/keras.json....

    https://www.kdnuggets.com/2018/09/training-keras-mxnet-amazon-sagemaker.html

  • Schema Evolution in Data Lakes

    ...and schemas. In our case, this data catalog is managed by Glue, which uses a set of predefined crawlers to read through samples of the data stored on S3 to infer a schema for the data. Athena then attempts to use this schema when reading the data stored on S3. In our initial experiments with these...

    https://www.kdnuggets.com/2020/01/schema-evolution-data-lakes.html

  • 10 Python String Processing Tips & Tricks

    ...ections module. from collections import Counter def is_anagram(s1, s2): return Counter(s1) == Counter(s2) s1 = 'listen' s2 = 'silent' s3 = 'runner' s4 = 'neuron' print('\'listen\' is an anagram of \'silent\' -> {}'.format(is_anagram(s1,...

    https://www.kdnuggets.com/2020/01/python-string-processing-primer.html
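
    The excerpt above flattens the Counter-based anagram check into one line; a minimal runnable sketch of the same idea (variable values mirror the excerpt):

    ```python
    from collections import Counter

    def is_anagram(s1, s2):
        # Two strings are anagrams exactly when their character counts match.
        return Counter(s1) == Counter(s2)

    print("'listen' is an anagram of 'silent' -> {}".format(is_anagram('listen', 'silent')))  # True
    print("'runner' is an anagram of 'neuron' -> {}".format(is_anagram('runner', 'neuron')))  # False
    ```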

  • Cookiecutter Data Science: How to Organize Your Data Science Project (Gold Blog)

    ...arns if files are over 50MB and rejects files over 100MB. Some other options for storing/syncing large data include AWS S3 with a syncing tool (e.g., s3cmd), Git Large File Storage, Git Annex, and dat. Currently by default, we ask for an S3 bucket and use AWS CLI to sync data in the data folder...

    https://www.kdnuggets.com/2018/07/cookiecutter-data-science-organize-data-project.html

  • Deploying a pretrained GPT-2 model on AWS

    ...om_pretrained, you need to provide the name of the model you intend to load. gpt2 in our case. Huggingface takes care of downloading the needful from S3. If you want to persist those files (as we do) you have to invoke save_pretrained (lines 78-79) with a path of choice, and the method will do what...

    https://www.kdnuggets.com/2019/12/deploying-pretrained-gpt-2-model-aws.html

  • 7 Super Simple Steps From Idea To Successful Data Science Project

    ...tomate as much as possible. You need to be able to concentrate on further development and not on system operation. Automate uploading data to S3: stop starting the analytics by hand and write an automation script. Start the analysis automatically, no longer by hand. Connect the download...

    https://www.kdnuggets.com/2017/11/7-super-simple-steps-idea-successful-data-science-project.html

  • Deploy your PyTorch model to Production

    ...t Choripan', 'Choripan'] ​ # set your data directory data_dir = 'data' ​ # set the URL where you can download your model weights MODEL_URL = 'https://s3.amazonaws.com/nicolas-dataset/stage1.pth' # example weights ​ # set some deployment settings PORT = 8080 We can now go through the...

    https://www.kdnuggets.com/2019/03/deploy-pytorch-model-production.html

  • Choosing Between Modern Data Warehouses

    ...ens of petabytes in storage seamlessly, without paying the penalty of attaching much more expensive computing resources. Snowflake is built on Amazon S3 cloud storage and its storage layer holds all the diverse data, tables, and query results. Because this storage layer is engineered to scale...

    https://www.kdnuggets.com/2018/06/choosing-between-modern-data-warehouses.html

  • Presto for Data Scientists – SQL on anything

    ...and rows, it is possible to create a Presto connector.   In fact, Presto is available with a large number of existing connectors including HDFS, S3, Cassandra, Accumulo, MongoDB, MySQL, PostgreSQL and other data stores. What’s more, inside a single installation of Presto users can register...

    https://www.kdnuggets.com/2018/04/presto-data-scientists-sql.html

  • Virginia Tech: Data Engineer [Blacksburg, VA]

    ...rience in extracting, processing, curating, integrating, and analyzing data using Python, Spark, SQL Hands on experience with AWS services – Kinesis, S3, Glue, Lambda, Cloudformation, RDS, EC2, EMR or HDFS, Hadoop Yarn, Hbase, Hive, Pig Hands on experience in ELT/ETL and dimensional data modeling...

    https://www.kdnuggets.com/jobs/19/05-06-virginia-tech-data-engineer.html

  • Extracting Knowledge from Knowledge Graphs Using Facebook’s Pytorch-BigGraph

    ...ling to capture homophily and depth-first sampling to capture structural equivalence. As we can see, the node (u) acts as a hub within a group (s1,s2,s3,s4), which is similar to s6 being a hub for (s7,s5,s8,s9). We discover the (s1,s2,s3,s4) community by BFS and (u)<->(s6) similarity by doing...

    https://www.kdnuggets.com/2019/05/extracting-knowledge-graphs-facebook-pytorch-biggraph.html

  • Audio File Processing: ECG Audio Using Python

    ...dclavicular line. The different types of heart sounds are as follows : S1 — onset of the ventricular contraction S2 — closure of the semilunar valves S3 — ventricular gallop S4 — atrial gallop EC — Systolic ejection click MC — Mid-systolic click OS — Diastolic sound or opening snap Murmurs...

    https://www.kdnuggets.com/2020/02/audio-file-processing-ecg-audio-python.html

  • An Overview of Python’s Datatable package

    ...ith pip: pip install datatable On Linux, installation is achieved with a binary distribution as follows: # If you have Python 3.5 pip install https://s3.amazonaws.com/h2o-release/datatable/stable/datatable-0.8.0/datatable-0.8.0-cp35-cp35m-linux_x86_64.whl # If you have Python 3.6 pip install...

    https://www.kdnuggets.com/2019/08/overview-python-datatable-package.html

  • Ingram Micro: Data Architect

    ...services: acquire, cleanse, merge, validate, visualize and data mine Technical experience with Cloud Infrastructure Knowledge of AWS Data Stack using S3, EMR, Data Pipeline Data warehousing management, database optimization, and administration experience Implementation of the SLDC –...

    https://www.kdnuggets.com/jobs/17/12-19-ingram-micro-data-infrastructure-architect.html

  • Comparison of the Most Useful Text Processing APIs (Silver Blog)

    ...in the document are shown. A negative feature here is that if you want to perform topic modeling, you should have all your documents stored in Amazon S3. Free Tier program is available up to 12 months. Here you pay just for those things you are using and only in the amounts required. Thus, Amazon...

    https://www.kdnuggets.com/2018/08/comparison-most-useful-text-processing-apis.html

  • Crushed it! Landing a data science job

    …By Erin Shellman. After two amazing years with the Nordstrom Data Lab, I’ve accepted a research scientist position at Amazon Web Services to work on S3. I’m excited to begin a new chapter of my career, and relieved that the interview process is over because it’s grueling and time-consuming….

    https://www.kdnuggets.com/2015/10/erin-shellman-landing-data-science-job.html

  • Interview: Joseph Babcock, Netflix on Genie, Lipstick, and Other In-house Developed Tools

    ...e, you can find many of our other Big Data tools on Github, such as Inviso (a central dashboard to visualize cluster load and debug job performance), S3mper (an S3 consistency monitor), Lipstick (a visual interface for Pig which we utilize extensively for sharing and analyzing data about these...

    https://www.kdnuggets.com/2015/06/interview-joseph-babcock-netflix-in-house-developed-tools.html

  • Guide to Data Science Cheat Sheets

    ...cheat sheets in comments below. Cheat Sheets for Python Python www.astro.up.pt/~sousasag/Python_For_Astronomers/Python_qr.pdf NumPy, SciPy and Pandas s3.amazonaws.com/quandl-static-content/Documents/Quandl+-+Pandas,+SciPy,+NumPy+Cheat+Sheet.pdf Cheat Sheets for R Short Reference Card...

    https://www.kdnuggets.com/2014/05/guide-to-data-science-cheat-sheets.html

  • Big Data: Main Developments in 2017 and Key Trends in 2018 (Silver Blog)

    ...asticity and management capabilities. For example, per-second billing and serverless computing enables truly elastic computation, while services like S3 Select enable fundamentally new ways of querying data. Neither of these has an equivalent on-premise. I expect to see cloud data management...

    https://www.kdnuggets.com/2017/12/big-data-main-developments-2017-key-trends-2018.html

  • Amazon Machine Learning: Nice and Easy or Overly Simple?

    …ell conceived wizards, creating your first project is a fast and pleasant experience. Once you have your data set in a properly formatted csv file on S3, the whole process is composed of four steps: Creating a datasource: Telling Amazon Machine Learning where your data is and what schema it follows…

    https://www.kdnuggets.com/2016/02/amazon-machine-learning-nice-easy-simple.html

  • Interview: Dave McCrory, Basho on Why Data Gravity Cannot be Ignored in Architecture Design

    ...he Solr. It powers integration with a wider variety of existing software through client query APIs. In Riak CS 1.5, I really like the improved Amazon S3 compatibility. Our expanded storage API compatibility with S3 includes multi-object delete, put object copy and cache control headers which...

    https://www.kdnuggets.com/2015/03/interview-dave-mccrory-basho-data-gravity.html

  • DuPont Pioneer: Data Engineer

    ...n Computer Science, Physics, Electrical Engineering, or a related field. Required Competencies: Practical cloud computing with AWS technologies (EC2, S3, ECS, etc.) in high performance and data intensive architectures for ingesting, computing, and managing spatial and non-spatial datasets. Strong...

    https://www.kdnuggets.com/jobs/17/06-29-dupont-pioneer-data-engineer.html

  • Data Science & Machine Learning Platforms for the Enterprise

    …sures such as firewall rules and audit trails. Fixed vs. Interchangeable Data Sources A data scientist might need to run offline data on a model from S3, while a backend engineer is concurrently running production data on the same model from HDFS. A fixed data-source platform will require the…

    https://www.kdnuggets.com/2017/05/data-science-machine-learning-platforms-enterprise.html

  • How A Data Scientist Can Improve Productivity

    …tutorial): # Install DVC $ pip install dvc # Initialize DVC repository $ dvc init # Download a file and put to data/ directory. $ dvc import https://s3-us-west-2.amazonaws.com/dvc-share/so/25K/Posts.xml.tgz data/ # Extract XML from the archive. $ dvc run tar zxf data/Posts.xml.tgz -C data/ #…

    https://www.kdnuggets.com/2017/05/data-scientist-improve-productivity.html

  • Are Data Lakes Fake News? (Silver Blog, Sep 2017)

    …hnologies are a good fit for a data reservoir. This really depends on the type of your data. For unstructured data, a distributed file system such as S3 or HDFS is a good fit. For small volumes of data, e.g. reference, master data, or application data from operational systems, a relational database…

    https://www.kdnuggets.com/2017/09/data-lakes-fake-news.html

  • Data Version Control in Analytics DevOps Paradigm

    …tomatically building data dependency graph (DAG). Your code and the dependencies could be easily shared by Git, and data — through cloud storage (AWS S3, GCP) in a single DVC environment. Although DVC was created for machine learning developers and data scientists originally, it appeared to be…

    https://www.kdnuggets.com/2017/08/data-version-control-analytics-devops-paradigm.html

  • Celgene: Sr. Manager, Data Lake

    ...g, Scoop, Cloudera Navigator, or similar DBMS / SQL NoSQL and Graph databases ETL/ELT Tools (e.g Talend, Informatica BDM) AWS services, in particular S3 and use of CLI Implementation/maintenance of complex data pipelines Programming languages such Java, python XML/JSON file formats Metadata...

    https://www.kdnuggets.com/jobs/17/07-18-celgene-manager-data-lake.html

  • Best practices of orchestrating Python and R code in ML projects

    …hat are related to our model development. In that phase DVC creates dependencies that will be used in the reproducibility phase: $ dvc import https://s3-us-west-2.amazonaws.com/dvc-share/so/25K/Posts.xml.tgz data/ $ dvc run tar zxf data/Posts.xml.tgz -C data/ $ dvc run Rscript code/parsingxml.R…

    https://www.kdnuggets.com/2017/10/best-practices-python-r-code-ml-projects.html

  • Yet Another Day in the Life of a Data Scientist (Silver Blog)

    ...s a senior data engineer and a member of the core team, I have already worked on creating a scalable and easily accessible data pipeline using Amazon S3, Redis, Python and AWS Lambda (the upstream system mentioned earlier) to make this process smooth and easy. Now I am actively involved in...

    https://www.kdnuggets.com/2017/12/yet-another-day-life-data-scientist.html

  • Retina.AI: Sr. Data Engineer

    ...d maintenance Experience with multiple RDBMS and Columnar/Graph database technologies at scale Experience with cloud technologies (e.g. AWS Redshift, S3, EMR, etc) Experience in dimensional data modeling and schema design BI and Visualization experience a plus Excellent written and verbal...

    https://www.kdnuggets.com/jobs/17/10-19-retina-ai-data-engineer.html

  • Machine Learning in Real Life: Tales from the Trenches to the Cloud – Part 1

    ...with daily backups), which not only stores the final performance metrics, but also stores links to the generated models, which we stored in AWS S3. We create a git tag automatically for the training code each time an experiment is run, with the database experiment id in the git tag text....

    https://www.kdnuggets.com/2017/06/machine-learning-real-life-tales-1.html

  • Graph Analytics Using Big Data

    ...SparkSession session = ...   Let’s now load the airports dataset. Even though this file is stored locally, it could also reside in HDFS or Amazon S3, and Apache Spark is flexible enough to let us pull it from either. Dataset rawDataAirport = session.read().csv("data/flight/airports.dat");   Now let’s...

    https://www.kdnuggets.com/2017/12/graph-analytics-using-big-data.html

  • Why the Data Scientist and Data Engineer Need to Understand Virtualization in the Cloud

    ...ublic cloud extends the flexibility and choice for the data scientist/data engineer. Analysis workloads on the VMware Cloud on AWS may now reach into S3 storage on AWS in a local fashion, within a common data center, thus bringing down the latency of access for data. In this article, we have seen...

    https://www.kdnuggets.com/2017/01/data-scientist-engineer-understand-virtualization-cloud.html

  • Big Data: Main Developments in 2016 and Key Trends in 2017

    ...the share of cloud users grew from 2015 (51% to 61%) while the share of YARN decreased (40% to 36%). One reason is that cloud storage such as Amazon S3 is generally more cost-effective, more reliable and easier to manage than HDFS. 2) Apache Spark 2.0 was released in July, with significant...

    https://www.kdnuggets.com/2016/12/big-data-main-developments-2016-key-trends-2017.html

  • Eight Things an R user Will Find Frustrating When Trying to Learn Python (Silver Blog)

    ...d so on is frustrating. That said, python’s capabilities are a little better than R in this area. Object Orientation. I’ve grown to love R’s flexible S3 classes with lines like: > x <- 5 > class(x) <- "just_made_this_up" > x [1] 5 attr(,"class") [1] "just_made_this_up" In python I am...

    https://www.kdnuggets.com/2016/11/r-user-frustrating-learning-python.html

  • Propensity Score Matching in R

    ...sponds to the ad campaign). References: www.statisticshowto.com/propensity-score-matching/ pareonline.net/getvn.asp?v=19&n=18 rstudio-pubs-static.s3.amazonaws.com/284461_5fabe52157594320921fc9e4d539ebc2.html Research paper on “Propensity Score Matching in Observational Studies”. Inferring...

    https://www.kdnuggets.com/2018/01/propensity-score-matching-r.html

  • Laying the Foundation for a Data Team

    ...ve if you are careless). Support for streaming inserts with basic deduplication functionality. As compared to Redshift, we don’t need to save data on S3 or Google Cloud Storage first, which avoids one unnecessary step. Fully managed, just throw everything you can at it. Data is available in...

    https://www.kdnuggets.com/2016/12/laying-foundation-data-team.html

  • Pandas Cheat Sheet: Data Science and Data Wrangling in Python (Silver Blog)

    ...ith Pandas is how your data gets handled when your indices are not syncing up. In the example that the cheat sheet gives, you see that the indices of s3 aren’t equal to the ones your Series s has. This could happen very often! What Pandas does for you in such cases is introduce NA values in the...

    https://www.kdnuggets.com/2017/01/pandas-cheat-sheet.html
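
    The alignment behavior the excerpt describes can be seen in a short sketch (the names `s` and `s3` follow the cheat sheet; the index labels and values here are made up for illustration):

    ```python
    import pandas as pd

    # Two Series whose indices don't fully sync up.
    s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
    s3 = pd.Series([10, 20], index=['b', 'd'])

    # Pandas aligns on the union of the indices; labels present in only
    # one of the two Series get NA (NaN) values in the result.
    result = s + s3
    print(result)
    # 'b' is 12.0; 'a', 'c' and 'd' come out as NaN because those labels
    # appear in only one of the two Series.
    ```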

  • Models: From the Lab to the Factory

    ...data that gets stored in the app’s online production data repository. This data is later fed to an offline historical data repository (like Hadoop or S3) so that it can be analyzed by data scientists to understand how users are interacting with the app. It can also be used, for example, to build a...

    https://www.kdnuggets.com/2017/04/models-from-lab-factory.html

  • Dask and Pandas and XGBoost: Playing nicely between distributed systems

    ...ust a bunch of Pandas dataframes spread across a cluster) and do a bit of preprocessing: This loaded a few hundred pandas dataframes from CSV data on S3. We then had to downsample because how we are going to use XGBoost in the future seems to require a lot of RAM. I am not an XGBoost expert. Please...

    https://www.kdnuggets.com/2017/04/dask-pandas-xgboost-playing-nicely-distributed-systems.html

  • AI & Machine Learning Black Boxes: The Need for Transparency and Accountability

    ...a; and Barocas, Solon (2014). Data and Discrimination: Collected Essays. New America: Open Technology Institute. Retrieved from https://na-production.s3.amazonaws.com/documents/data-and-discrimination.pdf. Programming and Prejudice: UTAH computer scientists discover how to find bias in algorithms...

    https://www.kdnuggets.com/2017/04/ai-machine-learning-black-boxes-transparency-accountability.html

  • More Effective Transfer Learning for NLP

    ...e naive baseline with as few as 100 labeled training examples. Complete benchmarks on 23 different classification tasks are available for download on s3.   “Finetune”: Scikit-Learn Style Model Finetuning for NLP   In light of this recent development, Indico has open sourced a wrapper for...

    https://www.kdnuggets.com/2018/10/more-effective-transfer-learning-nlp.html

  • Jimdo: Sr Data Scientist [Hamburg, Germany]

    ...Wabbit, Snorkel, scikit-learn, weka, H2O, TensorFlow, Keras, MXNet, etc.) Nice to have Demonstrated ability to work and improve on an AWS stack (EC2, S3, RDS, Lambda and Redshift) Experience in a SAAS / Freemium product environment 4+ years of job experience in Data Science Amanda will be happy to...

    https://www.kdnuggets.com/jobs/18/09-27-jimdo-gmbh-data-scientist.html

  • Free resources to learn Natural Language Processing

    ...sser amount of data. Here are two papers from OpenAI and Ruder, and Howard which deal with these techniques. https://arxiv.org/abs/1801.06146 https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf. Fast.ai has a more friendly...

    https://www.kdnuggets.com/2018/09/free-resources-natural-language-processing.html

  • DynamoDB vs. Cassandra: from “no idea” to “it’s a no-brainer”

    ...the already existing tables, especially if you can’t stop your incoming data traffic. The migration process requires additional tools, such as Amazon S3 and Data Pipeline (or, instead, DynamoDB streams and Lambda function). Or you may even have to temporarily modify your application code (which...

    https://www.kdnuggets.com/2018/08/dynamodb-vs-cassandra.html

  • Predictive Science: Data Scientist

    ...analysis methodology, simulation, scenario analysis, modeling, and neural networks. Experience with web services such as AWS, DigitalOcean, Redshift, S3, and Spark. Also the ability to connect data using web API, REST API, and web crawling techniques. Experience with SQL querying and knowledge of...

    https://www.kdnuggets.com/jobs/16/10-12-predictivescience-data-scientist.html

  • Mindstrong Health: Sr Data Scientist / Machine Learning, Statistics, Coding [Palo Alto, CA]

    ...livering concrete implementations on tight schedules Have extensive experience using Python/R, TensorFlow & Keras/Torch, Spark and the AWS stack (S3, SQL, Mongo, Redshift) PhD (preferred) or MS in statistics, computer science or related mathematical disciplines with an emphasis on both...

    https://www.kdnuggets.com/jobs/18/10-17-mindstrong-health-data-scientist.html

  • Practical Apache Spark in 10 Minutes

    ...es (SQL), advanced analytics (e.g., machine learning) and streaming over large datasets in a wide range of data stores (e.g., HDFS, Cassandra, HBase, S3). Spark supports a variety of popular development languages including Java, Python, and Scala.   Part 1 – Ubuntu installation In this...

    https://www.kdnuggets.com/2019/01/practical-apache-spark-10-minutes.html

  • The Role of the Data Engineer is Changing

    ...it here is that this shift has a tremendous impact on who builds these pipelines . If you’re writing Scalding code to scan terabytes of event data in S3 and aggregating it to a session level so that it can be loaded into Vertica, you’re probably going to need a data engineer to write that job. But...

    https://www.kdnuggets.com/2019/01/role-data-engineer-changing.html

  • Comparison of the Top Speech Processing APIs

    ...on Transcribe   Amazon Transcribe is a part of the Amazon Web Services infrastructure. You can analyze your audio documents stored in the Amazon S3 service and get the text made from the audio. Amazon Transcribe can add punctuation and text-formatting. Another valuable function provided by...

    https://www.kdnuggets.com/2018/12/activewizards-comparison-speech-processing-apis.html

  • Introduction to Apache Spark

    ...ile stored in the Hadoop distributed filesystem (HDFS) or other storage systems supported by the Hadoop APIs (including your local filesystem, Amazon S3, Cassandra, Hive, HBase, etc.). It’s important to remember that Spark does not require Hadoop; it simply has support for storage systems...

    https://www.kdnuggets.com/2018/07/introduction-apache-spark.html

  • Why the Data Lake Matters

    ...easily get clear business value from their streaming data. Bio: Yoni Iny is the CTO of Upsolver, which provides a leading Data Lake Platform for AWS S3, and is a technologist specializing in big data and predictive analytics. Before co-founding Upsolver, he worked in several technology roles,...

    https://www.kdnuggets.com/2018/06/why-data-lake-matters.html

  • Pair Finance: Python Developer

    ...xperience with Linux server administration, IT security, distributed computing and parallelized computation Experience with Amazon Web Services (EC2, S3, RDS, OpsWorks) or other cloud-based infrastructure solutions What do we offer You will have the opportunity to participate in one of the most...

    https://www.kdnuggets.com/jobs/18/04-30-pair-finance-python-developer.html

  • National Grid: Dev Ops – Operations Engineer / Sr Ops Engineer – Advanced Analytics

    ...aho, Kettle, SSIS AWS (Amazon Web Service) – Infrastructure Deployment & Multi-thread Programming, Cloud administration, IAM, VPC, EC2, RDS, EMR, S3, EBS, ELB Distributed Process Management – Elastic MapReduce (EMR), SPARK Analytics Operations Engineering skills (e.g. distributed computing,...

    https://www.kdnuggets.com/jobs/18/03-21-national-grid-dev-ops.html

  • A Beginner’s Guide to Data Engineering – Part II

    ...uery performance. In particular, one common partition key to use is datestamp (ds for short), and for good reason. First, in a data storage system like S3, raw data is often organized by datestamp and stored in time-labeled directories. Furthermore, the unit of work for a batch ETL job is typically...

    https://www.kdnuggets.com/2018/03/beginners-guide-data-engineering-part-2.html

  • Pair Finance: Team Lead Data Scientist

    ...xperience with Linux server administration, IT security, distributed computing and parallelized computation Experience with Amazon Web Services (EC2, S3, RDS, OpsWorks) or other cloud-based infrastructure solutions Experience with A/B testing What do we offer You will have the opportunity to...

    https://www.kdnuggets.com/jobs/18/04-30-pair-finance-team-lead-data-scientist.html

  • Midigator: Sr. Data Engineer

    ...ysis with fast growing and evolving datasets Ability to productize data models within business requirements Amazon Web Services experience (VPC, EC2, S3, SNS/SQS, Lambda, ECS, ECR, ELB, EBS, Route53) 4-5 year of related experience Preferred Qualifications Experience with Spark, Databricks, etc....

    https://www.kdnuggets.com/jobs/18/05-07-midigator-data-engineer.html

  • Agero: Sr. Data Science Engineer

    ...ganizing, and able to prioritize multiple complex assignments. Preferred Qualifications: Experience with AWS technologies including Lambda, DynamoDB, S3, EC2, Redshift. Experience using Git and working on shared code repositories. Experience with Spark / Databricks. Experience implementing and...

    https://www.kdnuggets.com/jobs/18/05-11-agero-data-science-engineer.html

  • 9 Must-have skills you need to become a Data Scientist, updated (Platinum Blog)

    ...it is heavily preferred in many cases. Having experience with Hive or Pig is also a strong selling point. Familiarity with cloud tools such as Amazon S3 can also be beneficial. A study carried out by CrowdFlower on 3490 LinkedIn data science jobs ranked Apache Hadoop as the second most important...

    https://www.kdnuggets.com/2018/05/simplilearn-9-must-have-skills-data-scientist.html

  • Comparing Machine Learning as a Service: Amazon, Microsoft Azure, Google Cloud AI (Gold Blog)

    ...d NoSQL database schemes, which are supported by many established and trusted solutions like Hadoop Distributed File System (HDFS), Cassandra, Amazon S3, and Redshift. For organizations that have used powerful storage systems before embarking on machine learning, this won’t be a barrier. If you...

    https://www.kdnuggets.com/2018/01/mlaas-amazon-microsoft-azure-google-cloud-ai.html

  • Decision Boundaries for Deep Learning and other Machine Learning classifiers

    …on GitHub. In order to install it, you have to pass some extra arguments to the install.packages function. > install.packages("h2o", repos = c("http://s3.amazonaws.com/h2o-release/h2o/master/1542/R", getOption("repos"))) > library("h2o", lib.loc = "C:/Program Files/R/R-3.0.2/library")…

    https://www.kdnuggets.com/2015/06/decision-boundaries-deep-learning-machine-learning-classifiers.html

  • Dataiku Data Science Studio

    ...portant than ever. Load and Prepare First of all, DSS enables direct and fast connection to the most common sources (Hadoop, SQL, Cassandra, MongoDB, S3, …) and formats (CSV, Excel, SAS, JSON, Avro, …) for data today. After connecting to a data source, the first step in any serious modeling job is...

    https://www.kdnuggets.com/2014/08/dataiku-data-science-studio.html

  • Business Intelligence Innovation Summit 2014 Chicago: Day 2 Highlights

    ...for product design, content selection, marketing, customer experience, payments and finance. The data (greater than 7 peta-bytes) is stored on Amazon S3 (Simple Storage Service) and Teradata Cloud, where it observes around 100 billion events/transactions per day. Using cloud-based architecture...

    https://www.kdnuggets.com/2014/07/business-intelligence-innovation-summit-2014-chicago-day2.html

  • Affinio: Sr. Software Engineer, Machine Learning and Big Data

    ...owledge and Experience Requirements Expert knowledge of the Java/Scala programming languages, web service development, and scalable data stores (e.g. S3, DynamoDB, HBase, Cassandra) Expert knowledge of C, C++, SQL, and object-oriented programming languages Minimum of 5 years professional experience...

    https://www.kdnuggets.com/jobs/14/07-22-affinio-sr-software-engineer-machine-learning-big-data.html

  • RTDS: Senior Data Mining Developer

    ...nx, tomcat, Jetty, and Apache.   Nice-to-Have Expertise Knowledge of data mining and big data. Experience with Amazon technology such as EC2 and S3. Working experience with source-revision software such as Git/SVN. Practical knowledge of tools such as R, SPSS, Orange, and Rapid Miner.  ...

    https://www.kdnuggets.com/jobs/14/11-20-rtdsinc-senior-data-mining-developer.html

  • 9 Must-Have Skills You Need to Become a Data Scientist

    ...it is heavily preferred in many cases. Having experience with Hive or Pig is also a strong selling point. Familiarity with cloud tools such as Amazon S3 can also be beneficial. SQL Database/Coding – Even though NoSQL and Hadoop have become a large component of data science, it is still expected...

    https://www.kdnuggets.com/2014/11/9-must-have-skills-data-scientist.html

  • Interview: Arno Candel, H2O.ai on the Basics of Deep Learning to Get You Started

    ...ounder Cliff Click, who is known for his contributions to the fast Java HotSpot compiler. H2O is designed to process large datasets (e.g., from HDFS, S3 or NFS) at FORTRAN speeds using a highly efficient (fine-grain) in-memory implementation of the famous Mapreduce paradigm with built-in lossless...

    https://www.kdnuggets.com/2015/01/interview-arno-candel-0xdata-deep-learning.html

  • 16 NoSQL, NewSQL Databases To Watch

    ...d operational simplicity. Basho-supported Riak Enterprise and Riak CS bring support plus enterprise-grade features and Amazon Web Services-compatible S3 cloud storage, respectively. Splice Machine. There are plenty of SQL-on-Hadoop options out there, but the unique claim of startup Splice Machine...

    https://www.kdnuggets.com/2014/12/16-nosql-newsql-databases-to-watch.html

  • Real Time Data Solutions: Data Analyst

    ...equivalent work experience Proven record of work with very large structured and unstructured data sets Familiarity with AWS technologies such as EC2, S3. Strong background in applied mathematics and statistics. Expert knowledge of tools such as R, SPSS, Orange, or RapidMiner. Ability to use...

    https://www.kdnuggets.com/jobs/14/11-22-rtdsinc-data-analyst.html

  • Don Zereski, VP, Local Search & Discovery, HERE (Nokia) on Location Analytics and Architecture Evolution

    ...the key drivers of the change. In our case, we have many different teams sharing a common data asset. With Amazon, we house the common data asset in S3 buckets and allow teams to independently run analytics jobs in their own EMR clusters. This is much better since it allows teams to scale up their...

    https://www.kdnuggets.com/2014/06/interview-don-zereski-nokia-location-analytics-architecture.html

  • Hadoop: Elephants in the Cloud

    ...e. Hadoop 2.0 clusters could be launched alongside existing Hadoop 1.0 clusters, processing the same data stored in cloud storage like Amazon’s S3, with minimal capacity investment. If found suitable, clusters can be switched to the new version in a phased manner, easing migration. Thus, the...

    https://www.kdnuggets.com/2014/01/hadoop-elephants-in-the-cloud.html

  • Data Mining Programmer

    …ource revision software such as git/svn. Nice to have Expertise Knowledge of Data Mining and big data. Experience with Amazon technology such as EC2 and S3. Keen interest in Data Visualization. Ability to perform back-end tasks such as configuring servers like nginx, tomcat, jetty, apache. C/C++…

    https://www.kdnuggets.com/jobs/13/08-17-realtimedatasolution-data-mining-programmer-b.html

  • Sr. Software Development Engineer – Cloud/ Big Data

    ...achine learning, data mining, artificial intelligence, statistics. Experience with distributed algorithms (Map-Reduce, MPI) Experience with AWS offerings (S3, EMR, SWF) Ability to technically lead small to mid-size teams, mentor junior members. Excellent verbal and written communication skills. Results...

    https://www.kdnuggets.com/jobs/13/05-01-amazon-sr-sde-cloud-big-data.html

  • YPS: Yottamine Predictive Services – SVM, Machine Learning in the Amazon Cloud

    ...ng family of predictive services. YPS is available exclusively to users of Amazon’s AWS Elastic Compute Cloud (EC2) and Simple Storage Service (S3). These Amazon services allow YPS users to “rent” just the amount of computing power and data storage they need to build new...

    https://www.kdnuggets.com/2013/02/yps-yottamine-predictive-services-machine-learning-amazon-cloud.html

  • Real Time Data Mining, Sr. UX designer

    …te illustrations, skin UI, and edit images is a strong plus. Working knowledge of REST applications Experience with Amazon technology such as EC2 and S3 Work experience with source revision software such as git/svn. Ability to perform back-end tasks such as configuring servers like nginx, tomcat, jetty,…

    https://www.kdnuggets.com/jobs/13/07-15-adtheorent-senior-information-architect-ux-designer.html

  • Data Mining Programmer

    …ource revision software such as git/svn. Nice to have Expertise Knowledge of Data Mining and big data. Experience with Amazon technology such as EC2 and S3. Keen interest in Data Visualization. Ability to perform back-end tasks such as configuring servers like nginx, tomcat, jetty, apache. C/C++…

    https://www.kdnuggets.com/jobs/13/07-31-adtheorent-data-mining-programmer-b.html

  • 10 Big Data Startups at Strata

    ...e’s SimpleSearch tool provides indexing and real-time search capabilities for searching semi-structured or mixed-structured data stores in Amazon S3 or the Hadoop File System (HDFS). Stopped.at, Mara Lewis, Co-Founder and CEO Stopped.at is a big data startup that melds analytics with social...

    https://www.kdnuggets.com/2013/03/10-big-data-startups-at-strata.html

  • Strata + Hadoop World 2015 San Jose – report and highlights

    ...itoring dashboards, processing and re-processing data (yes, dataflow graphs can have cycles!) ­ everything goes through Kafka. Third, Netflix uses an S3 bucket in front of an HDFS as they do not believe in being able to reliably pipe event data into HDFS directly. This also allows them to spin...

    https://www.kdnuggets.com/2015/02/strata-hadoop-world-san-jose-report.html

  • Cloud Machine Learning Wars: Amazon vs IBM Watson vs Microsoft Azure

    ...can load your data from anywhere it might live in its vast network of web services. This includes relational data stored in RDS, csv files stored in S3 or data in Amazon’s Redshift data warehouse. Given Amazon’s primacy in virtualized web services, it seems this is likely to appeal to...

    https://www.kdnuggets.com/2015/04/cloud-machine-learning-amazon-ibm-watson-microsoft-azure.html

  • Hadoop Key Terms, Explained

    ...SQL-like query language known as HiveQL (HQL) for querying the dataset. Hive supports storage in HDFS and other compatible file systems such as Amazon S3. 8. Apache Pig   Apache Pig is a high-level platform for large data set analysis. The language used to write Pig scripts is known as...

    https://www.kdnuggets.com/2016/05/hadoop-key-terms-explained.html

  • CRN Top Data Management Technologies Vendors 2016

    ...r that speeds up big data queries from business intelligence tools such as Tableau, Qlik and MicroStrategy from any data source like Hadoop or Amazon S3. New York, NY. MarkLogic – Offers an enterprise NoSQL database built with a flexible data model to store, manage, query and search...

    https://www.kdnuggets.com/2016/05/crn-top-data-management-technologies-vendors-2016.html

  • Spark 2.0 Preview Now on Databricks Community Edition: Easier, Faster, Smarter

    By Reynold Xin, Databricks. For the past few months, we have been busy working on the next major release of the big data open source software we love: Apache Spark 2.0. Since Spark 1.0 came out two years ago, we have heard praises and complaints. Spark 2.0 builds on what we have learned in the...

    https://www.kdnuggets.com/2016/05/spark-2-preview-databricks-community-edition.html

  • Cloud Computing Key Terms, Explained

    …s they use by the hour. An auto-scaling feature allows developers to dynamically adapt to changes in requirements. 10. Amazon Simple Storage Service (S3) This is again a part of AWS that allows for the storage and backup of data on the cloud. It offers highly scalable, unlimited archiving and…

    https://www.kdnuggets.com/2016/06/cloud-computing-key-terms-explained.html

  • Apache Spark Key Terms, Explained

    ...top, Apache Hadoop, Apache Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Apache Cassandra, Apache HBase, and S3. It was originally developed at UC Berkeley in 2009. (Note that Spark’s creator Matei Zaharia has since become CTO at Databricks and faculty...

    https://www.kdnuggets.com/2016/06/spark-key-terms-explained.html

  • Jimdo: Data Scientist

    ...ood knowledge of current developments in Data Science and Big Data. Keen to work within an innovative and flexible analytics infrastructure (AWS EC2, S3, RDS and Redshift). Awesome communication skills, and a high level of initiative and creativity. Always looking at the “bigger picture”,...

    https://www.kdnuggets.com/jobs/16/06-30-jimdo-data-scientist.html

  • Jimdo: Data Engineer

    ...xperience with cloud-based infrastructure like Amazon Web Services with a strong focus on flexible web service and analytics infrastructure (AWS EC2, S3, EMR, RDS, Redshift) Technical skills to process large data volumes Continuous integration and deployment are terms you love to hear An agile...

    https://www.kdnuggets.com/jobs/16/06-30-jimdo-data-engineer.html

  • Top 15 Frameworks for Machine Learning Experts

    …e process of creating machine learning (ML) models without having to learn complex ML algorithms and technology. It connects to data stored in Amazon S3, Redshift, or RDS, and can run binary classification, multiclass categorization, or regression on said data to create a model. Azure ML Studio…

    https://www.kdnuggets.com/2016/04/top-15-frameworks-machine-learning-experts.html

  • Top Spark Ecosystem Projects

    ...Formerly known as Tachyon, Alluxio sits between computation frameworks, such as Apache Spark, and various types of storage systems, including Amazon S3, HDFS, Ceph, and others. Spark jobs can run on Alluxio without any changes, and Alluxio can provide significant performance increases....

    https://www.kdnuggets.com/2016/03/top-spark-ecosystem-projects.html

  • Exclusive Interview: Matei Zaharia, creator of Apache Spark, on Spark, Hadoop, Flink, and Big Data in 2020

    ...eployments that are not on Hadoop, including deployments on NoSQL stores (e.g. Cassandra) and deployments directly against cloud storage (e.g. Amazon S3, Databricks Cloud). In this sense Spark is reaching a broader audience than Hadoop users. Most of the development activity in Apache Spark is now...

    https://www.kdnuggets.com/2015/05/interview-matei-zaharia-creator-apache-spark.html

  • WebDataCommons – the Data and Framework for Web-scale Mining

    ...has changed in 2012 with the nonprofit Common Crawl Foundation starting to crawl the Web and making large Web corpora accessible to anyone on Amazon S3. Still, a corpus of several billion HTML pages is not exactly the right input for many mining algorithms and a fair amount of pre-processing and...

    https://www.kdnuggets.com/2015/05/webdatacommons-data-web-scale-mining.html

  • KDnuggets™ News 15:n12, Apr 22: Predictive Analytics Future? Top LinkedIn Groups; Preventing Overfitting

    ...mazon recently announced Amazon Machine Learning, a cloud machine learning solution for Amazon Web Services. Able to pull data effortlessly from RDS, S3 and Redshift, the product could pose a significant threat to Microsoft Azure ML and IBM Watson Analytics. Cartoon: A solution for Data Scientists...

    https://www.kdnuggets.com/2015/n12.html

  • Top 20 R packages by popularity

    ...Curl General network (HTTP/FTP/…) client interface for R. (340530 downloads, 4.2/5 by 11 users) bitops Bitwise Operations (322743 downloads) zoo S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations) (302052 downloads, 3.8/5 by 11 users) knitr A...

    https://www.kdnuggets.com/2015/06/top-20-r-packages.html

  • Bot or Not: an end-to-end data analysis in Python

    …el for future development in scikit-learn. Bio: Erin Shellman is a statistician + programmer working as a research scientist at Amazon Web Services – S3. Before joining AWS, she was a Data Scientist in the Nordstrom Data Lab where she worked in the area of personalization, building product…

    https://www.kdnuggets.com/2015/11/bot-not-data-analysis-python.html

  • Python Data Science with Pandas vs Spark DataFrame: Key Differences

    ...lar professional formats, like JSON files, Parquet files, Hive tables — be it from local file systems, distributed file systems (HDFS), cloud storage (S3), or external relational database systems. But CSV is not supported natively by Spark. You have to use a separate library: spark-csv. pandasDF =...

    https://www.kdnuggets.com/2016/01/python-data-science-pandas-spark-dataframe-differences.html
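    The excerpt above notes that pandas reads CSV natively while Spark (at 1.x, when the article was written) required the external spark-csv package. A minimal sketch of the pandas side, assuming pandas is installed; the Spark equivalents are shown only as comments since they need a running Spark installation, and the file name `scores.csv` is purely illustrative:

    ```python
    import io
    import pandas as pd

    # pandas reads CSV natively, with header inference by default:
    csv_data = io.StringIO("name,score\nalice,1\nbob,2\n")
    pandas_df = pd.read_csv(csv_data)
    print(pandas_df.shape)  # (2, 2)

    # Spark 1.x needed the external spark-csv package:
    # spark_df = sqlContext.read.format("com.databricks.spark.csv") \
    #     .option("header", "true").load("scores.csv")
    # Spark 2.0+ later added native CSV support:
    # spark_df = spark.read.csv("scores.csv", header=True)
    ```

    The contrast is the article's point: the pandas call is one line against an in-memory buffer, while the Spark path historically went through a separately packaged data source.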

  • Quad Analytix: Extraction Architect

    ...s or Masters in Computer Science Python: celery, urllib2, lxml, selenium, eventlet, nltk, matplotlib, scrapbook extensions. Amazon Web Services (EC2, S3/Glacier, VPC) Devops tools like Puppet and Fabric. Knowledge of Nutch, Heritrix. NoSQL Databases such as MongoDB and Hadoop-Hbase. Statsd &...

    https://www.kdnuggets.com/jobs/16/01-08-quadanalytix-extraction-architect.html

  • Big Data Key Terms, Explained

    ...top, Apache Hadoop, Apache Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Apache Cassandra, Apache HBase, and S3. (From Denny Lee and Jules Damji’s Apache Spark Key Terms, Explained) 18. Internet of Things The Internet of Things (IoT) is a...

    https://www.kdnuggets.com/2016/08/big-data-key-terms-explained.html
