Extremely beginner-friendly guide to your data science career - Subhralina Nayak
An extremely useful Dashboard that an extremely talented Data Scientist created!
Before you roll your eyes, thinking this may be the millionth article promising a Data Science roadmap, please keep reading, because this one is for the absolute beginner.
It takes me back to my early days, when I used to feel overwhelmed looking at the forest of roadmaps from hundreds of influencers. There was too much out there, and I had too little time.
So if you have stumbled across this article as a beginner who feels as overwhelmed as I did, I hope this write-up helps.
My simple roadmap has six basic building blocks: SQL, Python, Statistics, Data Analysis, Machine Learning and Best Practices. Let me take you through each of them.
SQL
This should be your first step. Period!!
SQL should be the holy grail for every Data Scientist. Data has to be fetched from different sources and reshaped into whatever form the task requires. There is no better tool than SQL for this.
Things to learn:
Basic CRUD operations: create, read (retrieve), update and delete data in a database
SQL clauses: SELECT, FROM, WHERE, GROUP BY and ORDER BY
Aggregate functions
Writing subqueries
Common Table Expressions
Window functions
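To see all of these topics side by side, here is a minimal sketch that runs SQL through Python's built-in sqlite3 module, so nothing needs to be installed. The sales table and its rows are hypothetical, and window functions assume a Python build bundled with SQLite 3.25 or newer (most current builds are). Other databases such as PostgreSQL or MySQL use very similar syntax for these queries.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create (CRUD): a table of hypothetical sales records, then insert rows.
cur.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, year INTEGER, amount REAL)"
)
cur.executemany(
    "INSERT INTO sales (product, year, amount) VALUES (?, ?, ?)",
    [("A", 2021, 100.0), ("A", 2021, 150.0), ("B", 2021, 80.0),
     ("B", 2022, 120.0), ("C", 2021, 40.0)],
)
# Update and Delete (CRUD).
cur.execute("UPDATE sales SET amount = 90.0 WHERE product = 'B' AND year = 2021")
cur.execute("DELETE FROM sales WHERE product = 'C'")
conn.commit()

# Read (CRUD) with SELECT / FROM / WHERE / GROUP BY / ORDER BY and the SUM aggregate.
totals_2021 = cur.execute(
    """
    SELECT product, SUM(amount) AS total
    FROM sales
    WHERE year = 2021
    GROUP BY product
    ORDER BY total DESC
    """
).fetchall()

# Subquery: rows whose amount is above the overall average amount.
above_average = cur.execute(
    """
    SELECT id, amount
    FROM sales
    WHERE amount > (SELECT AVG(amount) FROM sales)
    ORDER BY id
    """
).fetchall()

# Common Table Expression: name an intermediate result, then query it.
yearly_totals = cur.execute(
    """
    WITH yearly AS (
        SELECT year, SUM(amount) AS total
        FROM sales
        GROUP BY year
    )
    SELECT year, total FROM yearly ORDER BY year
    """
).fetchall()

# Window function: running total per product, keeping every row (unlike GROUP BY).
running_totals = cur.execute(
    """
    SELECT id, product,
           SUM(amount) OVER (PARTITION BY product ORDER BY id) AS running_total
    FROM sales
    ORDER BY id
    """
).fetchall()

conn.close()
```

Here `totals_2021` comes out as `[("A", 250.0), ("B", 90.0)]`. Notice how the window function keeps all four rows while GROUP BY collapses them into one row per group; that difference is the main reason to learn both.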
Resources:
This video covers the basics of SQL: https://m.youtube.com/watch?v=HXV3zeQKqGY&t=33s
Practice writing SQL for yourself: https://www.hackerrank.com/domains/sql
Python
This is as simple as it sounds. You must know at least one programming language, and Python is your best bet.
Things to learn:
Reading common data file formats such as CSV, XML and Parquet
Basic Data Science libraries like pandas, NumPy, scikit-learn, Matplotlib and Seaborn
DataFrame manipulation
Creating basic visuals such as bar charts and scatter plots
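As a first taste of how these pieces fit together, here is a small sketch using pandas and Matplotlib: read a CSV, filter and aggregate the DataFrame, then draw a bar chart. The CSV content is hypothetical and read from a string so the example is self-contained; in practice you would pass a file path. pandas also provides `pd.read_xml` and `pd.read_parquet`, but those typically need optional packages such as lxml or pyarrow, so they are left out here.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV content; normally this would be pd.read_csv("path/to/file.csv").
csv_text = """product,year,amount
A,2021,100
A,2021,150
B,2021,90
B,2022,120
"""
df = pd.read_csv(io.StringIO(csv_text))

# DataFrame manipulation: keep only 2021 rows, then total the amount per product.
sales_2021 = df[df["year"] == 2021]
totals = sales_2021.groupby("product")["amount"].sum()

# A basic bar chart of the totals, saved to an in-memory PNG instead of a file.
fig, ax = plt.subplots()
totals.plot(kind="bar", ax=ax)
ax.set_xlabel("Product")
ax.set_ylabel("Total amount")
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Replacing `fig.savefig(buf, ...)` with `plt.show()` in a notebook displays the chart directly.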
Resources
A really good video: https://m.youtube.com/watch?v=LHBE6Q9XlzI
Platform to practice Python: https://www.hackerrank.com/domains/python
Statistics
Basic statistics matter from the very beginning: they form the backbone of both Data Analysis and Machine Learning.
Things to learn:
Measures of central tendency: mean, median and mode
p-values
Central Limit Theorem
Data Distributions
Hypothesis Testing
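A few of these ideas can be tried in plain Python with only the standard library. The sketch below computes the three measures of central tendency, simulates the Central Limit Theorem with a skewed (exponential) distribution, and runs a permutation test to get a p-value. The permutation test is just one simple way to test a hypothesis, chosen because it needs no extra libraries; it is not necessarily the method used in the resources above, and all sample values are hypothetical.

```python
import random
import statistics

# Measures of central tendency on a small hypothetical sample.
data = [2, 3, 3, 5, 7, 10]
mean = statistics.mean(data)      # 30 / 6 = 5
median = statistics.median(data)  # average of the middle values 3 and 5 = 4.0
mode = statistics.mode(data)      # 3 appears most often

# Central Limit Theorem: means of samples from a skewed distribution
# (exponential with mean 1) are approximately normally distributed.
random.seed(0)
sample_size = 50
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(sample_size))
    for _ in range(2000)
]
# Theory predicts a mean near 1.0 and a spread near 1 / sqrt(50), about 0.141.
clt_mean = statistics.mean(sample_means)
clt_sd = statistics.stdev(sample_means)


def permutation_test(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in means; returns a p-value.

    Null hypothesis: both groups come from the same distribution, so the
    group labels are interchangeable.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # The +1 terms keep the estimated p-value strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical measurements from two groups.
group_a = [12.1, 11.8, 12.5, 12.9, 12.3, 12.7]
group_b = [10.9, 11.2, 11.5, 10.8, 11.1, 11.4]
p_value = permutation_test(group_a, group_b)
```

A small `p_value` (commonly below 0.05) means a difference this large would rarely appear if the null hypothesis were true, so you would reject it.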
Resources
I would suggest this comprehensive video: https://m.youtube.com/watch?v=xxpc-HPKN28&t=240s
I also suggest following Josh Starmer on YouTube. He has it all when it comes to Statistics.
Josh’s channel: https://m.youtube.com/c/joshstarmer
Data Analysis
Once you have a hold on SQL and Python, it is time to apply them and improve your problem-solving skills. An easy way to start is to get hold of some data and answer questions such as “What were the average sales in 2021?” or “Is XYZ even a profitable product?”
Things to learn:
Identifying categorical and continuous variables
Imputing null values
Checking the distribution of each variable
Analyzing outliers
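Here is a minimal pandas sketch of these four steps on a tiny hypothetical table that loosely resembles housing data (it is not the Kaggle dataset). Splitting columns by dtype, median/mode imputation and the 1.5 × IQR outlier rule are common beginner choices, not the only correct ones.

```python
import numpy as np
import pandas as pd

# Hypothetical housing-style data with missing values and one extreme price.
df = pd.DataFrame({
    "neighborhood": ["North", "South", "North", None, "East", "South"],
    "area_sqft": [1200, 1500, np.nan, 1100, 1300, 1400],
    "price": [200_000, 250_000, 210_000, 190_000, 230_000, 900_000],
})

# 1. Categorical vs continuous: a simple split by dtype. This is a simplification;
#    numeric codes such as zip codes can still be categorical.
continuous = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(exclude="number").columns.tolist()

# 2. Impute nulls: median for continuous columns, most frequent value for categorical ones.
for col in continuous:
    df[col] = df[col].fillna(df[col].median())
for col in categorical:
    df[col] = df[col].fillna(df[col].mode()[0])

# 3. Distribution: summary statistics and skewness (a long right tail gives positive skew).
summary = df["price"].describe()
skewness = df["price"].skew()

# 4. Outliers: flag values more than 1.5 * IQR beyond the first or third quartile.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]
```

Whether an outlier should be removed, capped or kept is a judgment call that depends on the question you are answering; the code only finds candidates.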
Resources:
Kaggle is a very good source for all kinds of datasets. I suggest starting with the House Price Prediction dataset. This would be the one: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data
Machine Learning
You may have heard the buzz around Machine Learning. It sounds as if it is the only thing a Data Scientist needs to know. WRONG!!
You only need the basics to get started.
Things to learn:
Regression
Classification
Clustering
Model Evaluation Techniques
Cross Validation
Hyperparameter Tuning
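To connect these topics to code, here is a compact scikit-learn sketch. It assumes scikit-learn is installed and uses its bundled iris dataset plus synthetic data from `make_regression`; linear regression, logistic regression and k-means are picked as simple representatives of regression, classification and clustering, not as the best models for any particular problem.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Regression: fit a linear model to synthetic data and score it with R^2 on held-out rows.
X_reg, y_reg = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, random_state=0)
reg = LinearRegression().fit(Xr_train, yr_train)
r2 = r2_score(yr_test, reg.predict(Xr_test))

# Classification on the iris dataset, with a stratified train/test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Cross validation: 5-fold accuracy estimate computed on the training set only.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)

# Hyperparameter tuning: try several regularization strengths C, each scored with 5-fold CV.
search = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1.0, 10.0]}, cv=5
)
search.fit(X_train, y_train)

# Model evaluation: accuracy of the tuned model on the untouched test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))

# Clustering: group the iris measurements into 3 clusters without looking at the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_labels = kmeans.labels_
```

The key habit shown here is keeping the test set out of both cross validation and tuning, so the final accuracy is an honest estimate.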
Resources:
Here is a playlist of Josh’s videos: https://m.youtube.com/playlist?list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF
Best Practices
Whether you write a query in SQL or a piece of code in Python, it should be readable and well documented.
Things to learn:
DRY principle (Don’t Repeat Yourself)
Naming conventions
Writing meaningful comments
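As a small illustration of all three ideas at once, the sketch below replaces a cleaning step that would otherwise be copied for every column with one descriptively named function that has a docstring. The function name `normalize_text_columns` and the data are hypothetical, made up for this example.

```python
import pandas as pd

# Hypothetical raw data with inconsistently formatted text.
raw_addresses = pd.DataFrame({
    "city": ["  New York", "boston ", "CHICAGO"],
    "state": [" NY", "ma ", "Il"],
})


def normalize_text_columns(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with surrounding whitespace removed and text lower-cased.

    Writing the cleaning steps once, in a named function, avoids repeating the
    same .str.strip().str.lower() chain for every column (DRY).
    """
    cleaned = df.copy()  # leave the caller's DataFrame unchanged
    for column in columns:
        cleaned[column] = cleaned[column].str.strip().str.lower()
    return cleaned


clean_addresses = normalize_text_columns(raw_addresses, ["city", "state"])
```

If the cleaning rule ever changes, it now changes in one place, and the function name tells a reader what happens without them having to decode the chain of string methods.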
Resources:
You can find resources on best practices in bits and pieces across the internet, but I have not come across one comprehensive guide on the topic.
Guess what? I am working on creating a guide on best practices. If this sounds interesting to you, leave a comment on this blog.
Everything above will serve as a starting point for your Data Science career. There will be more to learn along the way, but don’t overwhelm yourself. Take one step at a time.
Happy Learning!!!