Extremely beginner-friendly guide to your data science career - Subhralina Nayak

[Image: An extremely useful dashboard that an extremely talented Data Scientist created!]

Before you roll your eyes, thinking this may be the millionth article promising a Data Science roadmap, please keep reading: this one is written for the absolute beginner.
This takes me back to my early days, when I used to feel overwhelmed by the forest of roadmaps from hundreds of influencers. There was too much out there, and I had too little time.
So if you stumble across this article as a beginner who is as overwhelmed as I was, I hope this write-up helps you.

My simple roadmap includes six basic building blocks: SQL, Python, Statistics, Data Analysis, Machine Learning, and Best Practices. Let me walk you through each of them.

SQL

This should be your first step. Period!!
SQL should be the holy grail for every Data Scientist. Data has to be fetched from different sources and manipulated into whatever form the task requires, and for data stored in databases there is no better tool than SQL.

Things to learn:

  • Basic CRUD operations: create, read (retrieve), update and delete data in a database

  • SQL clauses: select, from, where, group by and order by

  • Aggregate functions

  • Writing subqueries

  • Common Table Expressions

  • Window functions
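To make the list above concrete, here is a minimal sketch that runs each of these SQL ideas through Python's built-in sqlite3 module, so nothing extra needs to be installed. The `sales` table and its numbers are made up for illustration, and the window function RANK() assumes your Python is linked against SQLite 3.25 or newer, which is the case for most recent installs.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CRUD - Create: a hypothetical sales table with a few rows.
cur.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
cur.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    [("North", 2021, 120.0), ("North", 2021, 80.0), ("South", 2021, 250.0),
     ("South", 2022, 50.0), ("East", 2021, 30.0)],
)

# CRUD - Update and Delete.
cur.execute("UPDATE sales SET amount = 100.0 WHERE region = 'East'")
cur.execute("DELETE FROM sales WHERE year = 2022")

# CRUD - Read, using SELECT / FROM / WHERE / GROUP BY / ORDER BY
# together with the SUM aggregate function.
totals = cur.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE year = 2021
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()

# Subquery: rows whose amount is above the overall average amount.
above_avg = cur.execute(
    """
    SELECT region, amount
    FROM sales
    WHERE amount > (SELECT AVG(amount) FROM sales)
    ORDER BY amount DESC
    """
).fetchall()

# Common Table Expression plus a window function: rank regions by total.
ranked = cur.execute(
    """
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    )
    SELECT region, total, RANK() OVER (ORDER BY total DESC) AS rnk
    FROM region_totals
    ORDER BY rnk
    """
).fetchall()

conn.close()
print(totals)
print(above_avg)
print(ranked)
```

The same SQL text works in most relational databases; only the connection code would change.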

Resources:

This video covers the basics of SQL: https://m.youtube.com/watch?v=HXV3zeQKqGY&t=33s

Practice writing SQL for yourself: https://www.hackerrank.com/domains/sql

Python

This is as simple as it sounds. You must know at least one programming language, and Python will be your best bet.

Things to learn:

  • Reading common data file formats such as CSV, XML and Parquet

  • Basic data science libraries such as pandas, NumPy, scikit-learn, Matplotlib and Seaborn

  • Dataframe manipulation

  • Creating basic visuals such as bar charts and scatter plots
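Here is a minimal pandas sketch of reading a file and manipulating a DataFrame. The CSV is inlined as a string with made-up data so the example runs as-is; in practice you would pass a file path to `pd.read_csv`. XML and Parquet files can be read in a similar way with `pd.read_xml` and `pd.read_parquet`, although those usually need extra packages installed (for example lxml or pyarrow).

```python
import io

import pandas as pd

# A tiny, hypothetical CSV held in a string instead of a file on disk.
csv_text = """product,region,units,price
A,North,10,2.5
B,North,3,10.0
A,South,6,2.5
C,South,1,60.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Derive a new column from existing ones.
df["revenue"] = df["units"] * df["price"]

# Filter rows with a boolean mask.
north = df[df["region"] == "North"]

# Group, aggregate and sort: total revenue per product, highest first.
revenue_by_product = (
    df.groupby("product")["revenue"].sum().sort_values(ascending=False)
)

print(north)
print(revenue_by_product)
```

If Matplotlib is installed, `revenue_by_product.plot(kind="bar")` turns this result into a basic bar chart.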

Resources:

A really good video: https://m.youtube.com/watch?v=LHBE6Q9XlzI

Platform to practice Python: https://www.hackerrank.com/domains/python

Statistics

Basic statistics are important right from the beginning. They form the backbone of both Data Analysis and Machine Learning.

Things to learn:

  • Measures of central tendency: mean, median and mode

  • p-values

  • Central Limit Theorem

  • Data Distributions

  • Hypothesis Testing
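A minimal sketch of some of these ideas using only Python's built-in `statistics` and `random` modules. The sample values are made up. For the hypothesis test it uses a permutation test, which is one simple way to get a p-value by repeatedly shuffling group labels; it is not the only option, and a t-test is the textbook alternative.

```python
import random
import statistics

# Measures of central tendency on a small, hypothetical sample.
data = [2, 3, 3, 5, 7, 10]
center_mean = statistics.mean(data)
center_median = statistics.median(data)  # even length: average of the middle two
center_mode = statistics.mode(data)


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: both groups come from the same distribution, so the
    group labels are interchangeable. Returns the fraction of label shuffles
    whose absolute difference in means is at least as large as the observed
    one, which is an estimate of the p-value.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        first, second = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(first) / len(first) - sum(second) / len(second))
        if diff >= observed:
            hits += 1
    return hits / n_resamples


# Two hypothetical groups, for example test scores under two conditions.
group_a = [12, 14, 15, 16, 18]
group_b = [20, 22, 23, 25, 27]
p_value = permutation_test(group_a, group_b)

print(center_mean, center_median, center_mode)
print(p_value)
```

A small p-value (commonly below 0.05) means a difference this large would rarely appear if the labels really did not matter.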

Resources:

I would suggest this comprehensive video: https://m.youtube.com/watch?v=xxpc-HPKN28&t=240s

I also suggest following Josh Starmer on YouTube. He is the one person who has it all for Statistics.

Josh’s channel: https://m.youtube.com/c/joshstarmer

Data Analysis

Once you have a grip on SQL and Python, it is time to apply them and sharpen your problem-solving skills. The easiest way to start is to get hold of some data and answer questions such as “What were the average sales in 2021?” or “Is XYZ even a profitable product?”

Things to learn:

  • Identifying categorical and continuous variables

  • Imputing null values

  • Checking data distributions

  • Analyzing outliers
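Here is a minimal pandas sketch of the four checks above on a small, made-up housing-style table. It uses the median to impute the missing value and the common rule of flagging anything more than 1.5 times the interquartile range (IQR) beyond the quartiles as an outlier; both are reasonable defaults, not the only choices.

```python
import pandas as pd

# Hypothetical data: one missing area and one suspiciously high price.
df = pd.DataFrame({
    "neighborhood": ["A", "B", "A", "C", "B", "A"],
    "area_sqft": [850, 1200, None, 1000, 1100, 950],
    "price": [150_000, 210_000, 160_000, 180_000, 200_000, 2_000_000],
})

# 1. Categorical vs continuous: split columns by whether they are numeric.
categorical = df.select_dtypes(exclude="number").columns.tolist()
continuous = df.select_dtypes(include="number").columns.tolist()

# 2. Impute null values with the column median (robust to extreme values).
df["area_sqft"] = df["area_sqft"].fillna(df["area_sqft"].median())

# 3. Check the distribution: summary statistics and skewness.
price_summary = df["price"].describe()
skewness = df["price"].skew()  # > 0 means a long right tail

# 4. Flag outliers with the 1.5 * IQR rule.
q1 = df["price"].quantile(0.25)
q3 = df["price"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["price"] < lower) | (df["price"] > upper)]

print(categorical, continuous)
print(price_summary)
print(outliers)
```

Whether a flagged row is an error to fix or a genuine extreme value to keep is a judgment call that the numbers alone cannot make.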

Resources:

Kaggle is a very good source for all kinds of datasets. I suggest starting with the House Price Prediction dataset: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data

Machine Learning

You may have heard the buzz around Machine Learning. It can sound as if it is the only thing a Data Scientist needs to know. WRONG!!
It is one building block among several, and as a beginner you only need its basics to get started.

Things to learn:

  • Regression

  • Classification

  • Clustering

  • Model Evaluation Techniques

  • Cross Validation

  • Hyperparameter Tuning
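To see how regression, model evaluation, cross-validation and hyperparameter tuning fit together, here is a from-scratch sketch in plain Python. It fits a one-feature ridge regression (a straight line whose slope is penalized with a strength `lam`, which is the hyperparameter), scores it with mean squared error, and uses 5-fold cross-validation to choose `lam`. The data and candidate values are hypothetical; in practice scikit-learn tools such as `cross_val_score` and `GridSearchCV` would typically handle this for you.

```python
import random


def fit_ridge_1d(xs, ys, lam):
    """Fit y = w*x + b by minimizing squared error + lam * w**2.

    The intercept b is not penalized, which gives the closed form
    w = Sxy / (Sxx + lam) and b = mean(y) - w * mean(x).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + lam)
    b = y_mean - w * x_mean
    return w, b


def mse(xs, ys, w, b):
    """Model evaluation: mean squared error of the line w*x + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_cv_mse(xs, ys, lam, k=5, seed=0):
    """Average held-out MSE over k folds of shuffled indices."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w, b = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        scores.append(mse([xs[i] for i in fold], [ys[i] for i in fold], w, b))
    return sum(scores) / k


# Hypothetical data: y = 3x + 2 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]

# Hyperparameter tuning: keep the penalty with the lowest cross-validated MSE,
# then refit on all the data with that penalty.
candidates = [0.0, 1.0, 10.0, 100.0]
cv_scores = {lam: k_fold_cv_mse(xs, ys, lam) for lam in candidates}
best_lam = min(cv_scores, key=cv_scores.get)
best_w, best_b = fit_ridge_1d(xs, ys, best_lam)

print(cv_scores)
print(best_lam, best_w, best_b)
```

The key design point is that each fold's error is measured on data the model never saw during fitting, so the chosen `lam` reflects how well the model generalizes rather than how well it memorizes.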

Resources:

Here is a playlist of Josh’s videos: https://m.youtube.com/playlist?list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF

Best Practices

Whether you write a query in SQL or a piece of code in Python, it should be readable and well documented.

Things to learn:

  • The DRY principle (Don’t Repeat Yourself)

  • Naming conventions

  • Writing meaningful comments
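A small sketch of these three ideas in Python: a helper function replaces copy-pasted logic (DRY), variable names describe what they hold, and comments explain why something is done rather than restating the code. The function `fill_missing` and the data are hypothetical.

```python
# Repetitive version (avoid): the same logic is pasted once per column,
# so any fix has to be made in several places.
#   prices_clean = [p if p is not None else 0 for p in prices]
#   areas_clean = [a if a is not None else 0 for a in areas]


def fill_missing(values, default=0):
    """Return a copy of `values` with every None replaced by `default`."""
    return [default if v is None else v for v in values]


# Descriptive names make the data's meaning and units obvious.
house_prices = [150_000, None, 210_000]
living_area_sqft = [850, 1200, None]

# Zero is only a placeholder so later steps do not crash on None;
# a real analysis would impute these values properly.
clean_prices = fill_missing(house_prices)
clean_areas = fill_missing(living_area_sqft)

print(clean_prices, clean_areas)
```

The same habits carry over to SQL: a reusable view or CTE instead of a repeated subquery, clear table and column names, and comments that explain business rules.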

Resources:

You can find plenty of material on best practices in bits and pieces across the internet, but I have not come across one comprehensive guide on the topic.

Guess what? I am working on a guide to best practices. If this sounds interesting to you, leave a comment on this blog.

Everything above will serve as a starting point for your Data Science career. There will be more to learn along the way, but never let it overwhelm you. Take one step at a time.

Happy Learning!!!

