Asel Mendis (@aselmendis) is a Data Scientist in Australia. He is originally from Sri Lanka and has been in the data space since 2019. As a Data Scientist, his objective is to use the relevant techniques and technology to create insights and value. He is a Contributing Editor at KDnuggets. He has interests in Location Intelligence, Spatial Analysis, Demographics, Machine Learning, Data Visualization and Statistics using R and Python. He has a Master of Analytics from the Royal Melbourne Institute of Technology, specializing in Applied Statistics, and as of 2021 is undertaking a PhD in Applied Statistics focusing on the topic of Black Spots on roads. He is also a member of the Statistical Society of Australia, holds the designation of Graduate Statistician (GStat), and is looking forward to becoming an Accredited Statistician (AStat.) in the future.
Databases are the houses of our data and data scientists HAVE TO HAVE A KEY! In this article, I discuss some lesser-known SQL concepts that many data scientists are not familiar with.
By using R, Flexdashboard and Leaflet, we can build a customized and branded web application to showcase location-based data interactively across the organization. Instead of crowding the application with many widgets, we use menu tabs and pages to separate the interactive aspects.
Geographic Information Systems Analysis is the analysis of spatial relationships and patterns. Spatial components are becoming ingrained in society with the advent of the Internet of Things (IoT), in which more data can be connected and is likely to have a spatio-temporal component as well.
Many statisticians in industry agree that blindly imputing the missing values in your dataset is a dangerous move, and that imputation should be avoided until you understand why the data is missing in the first place.
GIS has mostly been overshadowed by more popular buzzwords like machine learning and deep learning. Yet GIS has always been around us in the background, used in government, business, medicine, real estate, transport, manufacturing, etc.
Biostatisticians use statistical techniques that everyday data scientists have probably never heard of. This is a great example of how a lack of domain knowledge exposes you as someone who does not know what they are doing and is merely hopping on a trend.
Since Python and R are a must for today's data scientists, continuous learning is paramount. Online courses are arguably the best and most flexible way to upskill throughout one's career.
While the validation process cannot directly find what is wrong, it can sometimes show us that there is a problem with the stability of the model.
An estimated 8,650% growth in the volume of data, to 175 zettabytes, from 2010 to 2025 has created an enormous need for Data Engineers who can build an organization's big data platform to be fast, efficient and scalable.
Over the years, new alternative providers have arisen to provide a single data science environment, hosted on the cloud, where data scientists can analyze, host and share their work.