- 5 Data Science Open-source Projects You Should Consider Contributing to - Jun 7, 2021.
As you prepare to interview for a position in data science or look to jump to the next level, now is the time to enhance your skills and your resume by working on real, open-source projects. Here, we suggest a great selection of projects you can contribute to and help build something awesome, so all you need to do is choose one and tackle it head on.
- How to organize your data science project in 2021 - Apr 19, 2021.
Maintaining proper organization of all your data science projects will increase your productivity, minimize errors, and increase your development efficiency. This tutorial will guide you through a framework on how to keep everything in order on your local machine and in the cloud.
- KDnuggets™ News 21:n04, Jan 27: The Ultimate Scikit-Learn Machine Learning Cheatsheet; Building a Deep Learning Based Reverse Image Search - Jan 27, 2021.
The Ultimate Scikit-Learn Machine Learning Cheatsheet; Building a Deep Learning Based Reverse Image Search; Data Engineering — the Cousin of Data Science, is Troublesome; Going Beyond the Repo: GitHub for Career Growth in AI & Machine Learning; Popular Machine Learning Interview Questions
- Going Beyond the Repo: GitHub for Career Growth in AI & Machine Learning - Jan 21, 2021.
Many online tools and platforms exist to help you establish a clear and persuasive online profile for potential employers to review. Have you considered how your go-to online code repository could also help you land your next job?
- Build a Data Science Portfolio that Stands Out Using These Platforms - Jan 19, 2021.
Making your big break into the data science profession means standing out to potential employers in a crowd of tough competition. An important way to showcase your skills and experience is through the presentation of a portfolio. Following these recommendations for developing your portfolio will help you network effectively and stay on top of an ever-changing field.
- 5 Most Useful Machine Learning Tools every lazy full-stack data scientist should use - Nov 18, 2020.
If you consider yourself a Data Scientist who can take any project from data curation to solution deployment, then you know there are many tools available today to help you get the job done. The trouble is that there are too many choices. Here is a review of five sets of tools that should turn you into the most efficient full-stack data scientist possible.
- Learn to build an end to end data science project - Nov 11, 2020.
Appreciating the process you must work through for any Data Science project is valuable before you land your first job in this field. With a well-honed strategy, such as the one outlined in this example project, you will remain productive and consistently deliver valuable machine learning models.
- 6 Lessons Learned in 6 Months as a Data Scientist - Oct 8, 2020.
When transitioning into a Data Science career, a new mindset toward collaboration, data, and reporting is required. Learn from these recommendations on approaches you should consider to successfully develop into your dream job.
- Getting Started in AI Research - Oct 5, 2020.
A guide on how to help confirm the reproducibility of some of the most recent papers and join open-source research.
- 4 Tools to Speed Up Your Data Science Writing - Sep 9, 2020.
This article covers how you can achieve your writing goals with these 4 tools.
- Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills - Sep 8, 2020.
We analyze the results of the Data Science Skills poll, including 8 categories of skills, 13 core skills that over 50% of respondents have, the emerging/hot skills that data scientists want to learn, and the top skill that data scientists want to learn.
- GitHub is the Best AutoML You Will Ever Need - Aug 12, 2020.
This article uses PyCaret 2.0, an open source, low-code machine learning library in Python to develop a simple AutoML solution and deploy it as a Docker container using GitHub actions.
- A Complete guide to Google Colab for Deep Learning - Jun 16, 2020.
Google Colab is a widely popular cloud service for machine learning that features free access to GPU and TPU computing. Follow this detailed guide to help you get up and running fast to develop your next deep learning algorithms with Colab.
- Interactive Machine Learning Experiments - May 26, 2020.
Dive into experimenting with machine learning techniques using this open-source collection of interactive demos built on multilayer perceptrons, convolutional neural networks, and recurrent neural networks. Each package consists of ready-to-try web browser interfaces and fully-developed notebooks for you to fine tune the training for better performance.
- Made With ML: Discover, build, and showcase machine learning projects - Mar 23, 2020.
This is a short introduction to Made With ML, a useful resource for machine learning engineers looking to get ideas for projects to build, and for those looking to share innovative portfolio projects once built.
- The Most Useful Machine Learning Tools of 2020 - Mar 13, 2020.
This article outlines 5 sets of tools every lazy full-stack data scientist should use.
- Top 5 must-have Data Science skills for 2020 - Jan 8, 2020.
The standard job description for a Data Scientist has long highlighted skills in R, Python, SQL, and Machine Learning. With the field evolving, these core competencies are no longer enough to stay competitive in the job market.
- GitHub Repo Raider and the Automation of Machine Learning - Nov 18, 2019.
Since X never, ever marks the spot, this article raids the GitHub repos in search of quality automated machine learning resources. Read on for projects and papers to help understand and implement AutoML.
- Automatic Version Control for Data Scientists - Sep 24, 2019.
How can you keep your machine learning models and data organized so you can collaborate effectively? Discover this new tool set available for better version control designed for the data scientist workflow.
- Top 10 Statistics Mistakes Made by Data Scientists - Jun 7, 2019.
The following are some of the most common statistics mistakes made by data scientists. Check this list often to make sure you are not making any of these while applying statistics to data science.
- PyViz: Simplifying the Data Visualisation Process in Python - Jun 6, 2019.
There are Python libraries suitable for basic data visualizations but not for complicated ones, and there are libraries suitable only for complex visualizations. Is there a single library that handles both these tasks efficiently? The answer is yes. It's PyViz.
- How to Automate Tasks on GitHub With Machine Learning for Fun and Profit - May 3, 2019.
Check out this tutorial on how to build a GitHub App that predicts and applies issue labels using TensorFlow and public datasets.
- Trending Deep Learning Github Repositories - Feb 1, 2019.
Check out this pair of resources for trending and top GitHub deep learning repositories for some new ideas on what to look out for.
- Papers with Code: A Fantastic GitHub Resource for Machine Learning - Dec 31, 2018.
Looking for papers with code? If so, this GitHub repository, a clearinghouse for research papers and their corresponding implementation code, is definitely worth checking out.
- Top 10 Python Data Science Libraries - Nov 16, 2018.
The third part of our series investigating the top Python Libraries across Machine Learning, AI, Deep Learning and Data Science.
- Top 13 Python Deep Learning Libraries - Nov 2, 2018.
Part 2 of a new series investigating the top Python Libraries across Machine Learning, AI, Deep Learning and Data Science.
- GitHub Python Data Science Spotlight: High Level Machine Learning & NLP, Ensembles, Command Line Viz & Docker Made Easy - Oct 16, 2018.
This post spotlights 5 data science projects, all of which are open source and are present on GitHub repositories, focusing on high level machine learning libraries and low level support tools.
- Top 8 Python Machine Learning Libraries - Oct 9, 2018.
Part 1 of a new series investigating the top Python Libraries across Machine Learning, AI, Deep Learning and Data Science.
- Visualising Geospatial data with Python using Folium - Sep 27, 2018.
Folium is a powerful data visualization library in Python that was built primarily to help people visualize geospatial data. With Folium, one can create a map of any location in the world if its latitude and longitude values are known. This guide will help you get started.
- Journey to Machine Learning – 100 Days of ML Code - Sep 7, 2018.
A personal account from Machine Learning enthusiast Avik Jain on his experiences of #100DaysOfMLCode, a challenge that encourages beginners to code and study machine learning for at least an hour, every day for 100 days.
- KDnuggets™ News 18:n31, Aug 15: Top 10 roles in AI and data science; Github Data Science Spotlight: Python tools for Machine Learning - Aug 15, 2018.
Also: A Practitioner Guide to NLP; Reinforcement Learning: The Business Use Case; Data Scientist guide for getting started with Docker
- GitHub Python Data Science Spotlight: AutoML, NLP, Visualization, ML Workflows - Aug 8, 2018.
This post includes a wide spectrum of data science projects, all of which are open source and are present on GitHub repositories.
- From Data to Viz: how to select the right chart for your data - Aug 1, 2018.
We offer an interactive, decision tree-style tool, which examines the data you have and proposes a set of potentially appropriate visualizations to represent your dataset.
- How To Create Natural Language Semantic Search For Arbitrary Objects With Deep Learning - Jun 13, 2018.
An end-to-end example of how to build a system that can search objects semantically.
- ioModel Machine Learning Research Platform – Open Source - Jun 5, 2018.
This article introduces ioModel, an open source research platform that ingests data and automatically generates descriptive statistics on that data.
- GANs in TensorFlow from the Command Line: Creating Your First GitHub Project - May 16, 2018.
In this article I will present the steps to create your first GitHub Project, using Generative Adversarial Networks as an example.
- Jupyter Notebook for Beginners: A Tutorial - May 1, 2018.
The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. Although it is possible to use many different programming languages within Jupyter Notebooks, this article will focus on Python as it is the most common use case.
- Top 16 Open Source Deep Learning Libraries and Platforms - Apr 24, 2018.
We bring to you the top 16 open source deep learning libraries and platforms. TensorFlow is out in front as the undisputed number one, with Keras and Caffe completing the top three.
- How I Unknowingly Contributed To Open Source - Apr 24, 2018.
This article explains what is meant by the term 'open source' and why all data scientists should be a part of it.
- How Do I Get My First Data Science Job? - Apr 2, 2018.
Here are the steps you need to obtain your first job in data science, including details on how to create a good portfolio, key networking tips, getting the right education and managing expectations.
- Ranking Popular Distributed Computing Packages for Data Science - Mar 20, 2018.
We examined 140 frameworks and distributed programming packages and came up with a list of the top 20 distributed computing packages useful for Data Science, based on a combination of GitHub, Stack Overflow, and Google results.
- Top 20 Python AI and Machine Learning Open Source Projects - Feb 20, 2018.
We update the top AI and Machine Learning projects in Python. TensorFlow has moved into first place with triple-digit growth in contributors. Scikit-learn dropped to 2nd place, but still has a very large base of contributors.
- Building a Daily Bitcoin Price Tracker with Coindeskr and Shiny in R - Feb 7, 2018.
This tutorial helps R users build their own Daily Bitcoin Price Tracker using three packages: Coindeskr, Shiny, and Dygraphs.
- Natural Language Processing Library for Apache Spark – free to use - Nov 28, 2017.
Introducing the Natural Language Processing Library for Apache Spark - and yes, you can actually use it for free! This post will give you a great overview of John Snow Labs NLP Library for Apache Spark.
- Search Millions of Documents for Thousands of Keywords in a Flash - Sep 1, 2017.
We present a Python library called FlashText that can search or replace keywords / synonyms in documents in O(n) – linear time.
- Deep Learning Zero to One: 5 Awe-Inspiring Demos with Code for Beginners, part 2 - Jul 1, 2017.
Here are deep learning examples and demos you can just download and run, including Spotify Artist Search using Speech APIs, Symbolic AI Speech Recognition, and Algorithmia API Photo Colorizer.
- Pitfalls in pseudo-random number sampling at scale with Apache Spark - Jun 27, 2017.
Large scale simulation of random number generation is possible with today’s high speed & scalable distributed computing frameworks. Let’s understand how it can be achieved using Apache Spark.
- Deep Learning Zero to One: 5 Awe-Inspiring Demos with Code for Beginners - Jun 26, 2017.
Here are deep learning demos and examples you can just download and run. No Math. No Theory. No Books.
- K-means Clustering with Tableau – Call Detail Records Example - Jun 16, 2017.
We show how to use the Tableau 10 clustering feature to create statistically based segments that provide insights about similarities within different groups and how the groups perform when compared to each other.
- How A Data Scientist Can Improve Productivity - May 25, 2017.
Data Science projects involve iterative processes and may need changes in data at every iteration. But data versioning, data pipelines, and data workflows make a data scientist’s life easier. Let’s see how.
- DataScience.com Releases Python Package for Interpreting the Decision-Making Processes of Predictive Models - May 24, 2017.
DataScience.com's new Python library, Skater, uses a combination of model interpretation algorithms to identify how models leverage data to make predictions.
- Data Version Control: iterative machine learning - May 11, 2017.
ML modeling is an iterative process, and it is extremely important to keep track of all the steps and dependencies between code and data. A new open-source tool helps you do that.
- DataScience Launches Interactive Tool For Exploring Data Science Trends - Apr 14, 2017.
DataScience Trends, a new interactive tool from DataScience Inc., gives users the ability to explore and visualize data across 2.8 million open source repositories without writing code.
- Machine Learning-driven Firewall - Feb 23, 2017.
Cyber security is always a hot topic in the IT industry, and machine learning is making security systems stronger. Here, a particular use case of machine learning in cyber security is explained in detail.
- RCloud – DevOps for Data Science - Nov 28, 2016.
After almost two decades of software development, the term DevOps was coined, officially giving importance to collaboration between the development and deployment of software systems. At this early stage of the Data Science field, the use of standardized and empirical practices like DevOps will definitely speed up its evolution.
- Top 20 Python Machine Learning Open Source Projects, updated - Nov 21, 2016.
Open source is at the heart of innovation and the rapid evolution of technologies these days. This article presents the Top 20 Python Machine Learning Open Source Projects of 2016, along with very interesting insights and trends found during the analysis.
- Top KDnuggets tweets, Oct 05-11: Most Active #DataScientists on #Github; Why Not So Hadoop? - Oct 12, 2016.
Most Active #DataScientists, Free Books, Notebooks & Tutorials on #Github; Why Not So Hadoop?; Free #MachineLearning text PDF, from theory to algorithms; Top @reddit #MachineLearning Posts September.
- New sequence learning data set - Sep 17, 2016.
A new data set for the study of sequence learning algorithms is available as of today. The data set consists of pen stroke sequences that represent handwritten digits, and was created based on the MNIST handwritten digit data set.
- Top KDnuggets tweets, May 25-31: 19 Free eBooks to learn #programming with #Python; Awesome collection of public datasets on Github - Jun 1, 2016.
Introducing Hybrid lda2vec Algorithm via Stitch Fix; #DeepLearning and Deep #Gaussian Processes - explainer; Awesome collection of public #datasets on Github; #DataScience foundations: 19 Free eBooks to learn #programming with #Python.
- Top 10 Open Dataset Resources on Github - May 31, 2016.
The top open dataset repositories on Github include a variety of data, freely available for use by researchers, practitioners, and students alike.
- A Data Science Approach to Writing a Good GitHub README - May 4, 2016.
The README is the first file every user will look for whenever they check out a code repository. Learn what you should write inside your README files, and analyze the effectiveness of your existing files.
- Top 10 IPython Notebook Tutorials for Data Science and Machine Learning - Apr 22, 2016.
A list of 10 useful Github repositories made up of IPython (Jupyter) notebooks, focused on teaching data science and machine learning. Python is the clear target here, but general principles are transferable.
- The MBA Data Science Toolkit: 8 resources to go from the spreadsheet to the command line - Apr 18, 2016.
A great guide for the MBA, or any relatively non-technical convert, for getting comfortable with the command line and other technical skills required to excel in data science.
- Top 10 Data Science Resources on Github - Mar 24, 2016.
The top 10 data science projects on Github are chiefly composed of a number of tutorials and educational resources for learning and doing data science. Have a look at the resources others are using and learning from.
- Top 10 Data Visualization Projects on Github - Feb 22, 2016.
Github provides a number of open source data visualization options for data scientists and application developers integrating quality visuals. This is a list and description of the top project offerings available, based on the number of stars.
- Embedding Open Cognitive Analytics at the IoT’s Edge - Feb 19, 2016.
Cognitive computing is penetrating more aspects of the IoT as algorithms enable edge devices and applications. Understand how unstructured data captured by IoT edge devices is distilled into actionable insights with the help of cognitive algorithms.
- Top 10 Deep Learning Projects on Github - Jan 13, 2016.
The top 10 deep learning projects on Github include a number of libraries, frameworks, and education resources. Have a look at the tools others are using, and the resources they are learning from.
- KDnuggets™ News 15:n41, Dec 16: Top 10 Machine Learning Projects on Github; How to use Python and R together - Dec 16, 2015.
Top 10 Machine Learning Projects on Github; Using Python and R together: 3 main approaches; Top 2015 KDnuggets Stories on Analytics, Big Data, Data Science; 22 Big Data experts predictions for 2016.
- Top 10 Machine Learning Projects on Github - Dec 14, 2015.
The top 10 machine learning projects on Github include a number of libraries, frameworks, and education resources. Have a look at the tools others are using, and the resources they are learning from.
- Top /r/DataScience Posts, November: Open source Plot.ly, Pokemon (?), Social analysis with R - Dec 3, 2015.
November on /r/DataScience: Plot.ly is open sourced, Pokemon and Big Data games, a new social network analysis package for R, insider information on landing a Google Data Scientist job, and a free data science curriculum.
- YCML Machine Learning library on Github - Aug 24, 2015.
YCML is a new Machine Learning library available on Github as an Open Source (GPLv3) project. It can be used in iOS and OS X applications, and includes Machine Learning and optimization algorithms.
- Continually Updated Data Science IPython Notebooks - Jul 13, 2015.
Continually updated Data Science IPython Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, and various command lines.
- Top 20 Python Machine Learning Open Source Projects - Jun 1, 2015.
We examine the top Python machine learning open source projects on GitHub, in terms of both contributors and commits, and identify the most popular and most active ones.
- Awesome Public Datasets on GitHub - Apr 6, 2015.
A long, categorized list of large datasets (available for public use) to try your analytics skills on. Which one would you pick?
- NYC Open Data Meetups in January - Jan 3, 2015.
Upcoming events including Python Machine learning class Demo Day, Data Science Bootcamp and more.
- Mirador, a free tool for visual exploration of complex datasets - Oct 1, 2014.
Mirador is an open-source tool for visual exploration of complex datasets, enabling users to discover correlation patterns and derive new hypotheses from the data. Download Windows and Mac OS X versions from Github.
- Interview: Sujee Maniyam, Elephant Scale on the Best Free Online Resources to Learn Hadoop - Aug 7, 2014.
We discuss the startup - Elephant Scale, DIY Hadoop learning, best free online resources for learning Hadoop, getting a good job in Big Data, and the experience of authoring a book - Hadoop Illuminated (available for free).
- Top KDnuggets tweets, Jul 16-17: An awesome list of Big Data frameworks - Jul 18, 2014.
An awesome GitHub list of #BigData frameworks, resources, and more; 15 interviews with 15 data scientists; 14 definitions of data scientist, from funny to serious; Revised standards for statistical evidence.
- Employee Churn 202: Good and Bad Churn - May 4, 2014.
This post extends the “quantitative scissors” approach to employee churn and examines the factors that underlie attrition cost.
- Top KDnuggets tweets, Apr 2-3: Data scientists need their GitHub; How to make Data Scientist job less tedious - Apr 4, 2014.
Also Top stories in March: Machine Learning in 7 Pictures; import.io adds authenticated APIs, command line crawlers.
- Employee Churn 201: Calculating Employee Value - Apr 4, 2014.
Much has been written about customer churn. This post examines employee churn, an equally important problem, and its unique dynamics.