R Programming Tutorial: Great Learning
1. What is R Programming Language? Introduction & Basics.
R is a programming language developed by Ross Ihaka and Robert Gentleman at the University of Auckland.
It is a programming language and environment commonly used in statistical computing, data analytics, and scientific research.
It is widely used among statisticians and data miners for developing statistical software and analyzing data. It is easy to use, which has helped it grow in popularity in recent years.
R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows, and macOS.
2. Why use R for statistical computing and graphics?
It runs on all platforms.
It is a popular programming language, and its use continues to grow.
It is an open-source and free programming language.
It is being used by the biggest tech giants.
R and its libraries implement a wide variety of statistical and graphical techniques.
It is easily extensible through functions and extensions.
The R community is noted for its active contributions in terms of packages.
3. Features of R Programming
The R programming language is an open-source scripting language for predictive analytics and data visualization. Users typically access it through a command-line interpreter.
It is extensively used by software programmers, statisticians, data scientists, and data miners. It is one of the most popular analytics tools used in data analytics and business analytics.
It is a well-developed, simple, and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities.
4. Application of R programming in the real world
Social Media: Behavior analysis, Sentiment Analysis
IT: Business Intelligence Software Development and Machine learning Product
Finance: Stock Market Modeling and Fraud Detection
Government: Weather Forecasting and Record-Keeping
Research and Academic
E-Commerce
Banking Sector
Health Care
Manufacturing Industry
5. How to download & install R, RStudio, and Anaconda on Mac or Windows
R can be downloaded for free from https://www.r-project.org/
RStudio allows the user to run R in a more user-friendly environment. It is open-source (i.e. free) and available at https://rstudio.com/products/rstudio/download/
Anaconda: With Anaconda, you can easily install the R programming language and over 6,000 commonly used R packages for data science. You can also create and share your custom R packages.
To download Anaconda: https://docs.anaconda.com/anaconda/install/
System requirements
License: Free use and redistribution under the terms of the End User License Agreement.
Operating system: Windows 7 or newer, 64-bit macOS 10.13+, or Linux, including Ubuntu, RedHat, CentOS 6+, and others.
If your operating system is older than what is currently supported, you can find older versions of the Anaconda installers in the Anaconda archive that might work for you. See "Using Anaconda on older operating systems" in the Anaconda documentation for version recommendations.
System architecture: Windows: 64-bit x86, 32-bit x86; macOS: 64-bit x86; Linux: 64-bit x86, 64-bit Power8/Power9.
Minimum 5 GB disk space to download and install.
6. R data Types
a) Vector
A vector is the basic data structure in R.
Its elements can be of type logical, integer, double, character, complex, or raw.
Example: use typeof() to check a vector's type and length() to check how many elements it has.
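As a minimal sketch of the two checks mentioned above, the following creates a few vectors and inspects them. The variable names (nums, ints, flags, words) are made up for illustration.

```r
# Atomic vectors: every element of a vector has the same type.
nums  <- c(1.5, 2, 3)        # double
ints  <- c(1L, 2L, 3L)       # integer (note the L suffix)
flags <- c(TRUE, FALSE)      # logical
words <- c("a", "b", "c")    # character

typeof(nums)    # "double"
typeof(ints)    # "integer"
typeof(flags)   # "logical"
typeof(words)   # "character"
length(words)   # 3
```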
b) Lists
Almost all lists in R are internally generic vectors, which means their elements may be of different types. Traditional dotted pair lists (as in LISP) remain available but are rarely seen by users (except as formals of functions).
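A short sketch of an ordinary (generic vector) list may help here. The list contents and the name person are invented for the example.

```r
# A list can hold elements of different types and lengths.
person <- list(name = "Ada", age = 36L, scores = c(90, 85.5))

typeof(person)          # "list"
length(person)          # 3
person$name             # "Ada"
person[["scores"]][2]   # 85.5
is.vector(person)       # TRUE: a list is itself a (generic) vector
```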
c) Matrices
In R, matrices are an extension of numeric or character vectors. matrix() creates a matrix from the given set of values, as.matrix() attempts to turn its argument into a matrix, and is.matrix() tests whether its argument is a (strict) matrix.
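The three functions can be tried on a small example. This is only a sketch; the values and the names m and v are arbitrary.

```r
# matrix() fills column by column unless byrow = TRUE is given.
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)         # 2 3
m[2, 3]        # 6 (row 2, column 3)
is.matrix(m)   # TRUE

# as.matrix() turns a plain vector into a one-column matrix.
v <- as.matrix(1:3)
dim(v)         # 3 1
is.matrix(1:3) # FALSE: a plain vector has no dim attribute
```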
d) Arrays
An array in R can have one, two, or more dimensions. It is simply a vector that is stored with additional attributes giving the dimensions (attribute “dim”) and optionally names for those dimensions (attribute “dimnames”).
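To make the "vector plus dim attribute" idea concrete, here is a small sketch with a three-dimensional array. The dimension names are invented for illustration.

```r
# An array is a vector with a "dim" attribute.
a <- array(1:24, dim = c(2, 3, 4))
dim(a)         # 2 3 4
a[1, 2, 3]     # 15

# Optional names for the first two dimensions ("dimnames" attribute).
dimnames(a) <- list(c("r1", "r2"), c("c1", "c2", "c3"), NULL)
a["r2", "c3", 4]   # 24
```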
e) Factors
The function factor is used to encode a vector as a factor (the terms ‘category’ and ‘enumerated type’ are also used for factors). If the argument ordered is TRUE, the factor levels are assumed to be ordered. For compatibility with S there is also a function ordered.
is.factor, is.ordered, as.factor and as.ordered are the membership and coercion functions for these classes.
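The following sketch shows an unordered and an ordered factor built from the same values. The size labels are made-up example data.

```r
sizes <- c("small", "large", "medium", "small")

# Unordered factor: levels are sorted alphabetically by default.
f <- factor(sizes)
levels(f)       # "large" "medium" "small"
is.factor(f)    # TRUE
is.ordered(f)   # FALSE

# Ordered factor with an explicit level order.
o <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
is.ordered(o)   # TRUE
o[1] < o[2]     # TRUE: "small" comes before "large"
as.integer(o)   # 1 3 2 1 (the underlying level codes)
```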
7. Data Frames
The function data.frame() creates data frames, tightly coupled collections of variables that share many of the properties of matrices and lists, used as the fundamental data structure by most of R’s modeling software.
A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.
Example: data.frame(df, stringsAsFactors = TRUE)
df: a matrix to convert to a data frame, or a collection of variables to join.
stringsAsFactors: whether character columns are converted to factors. The default is FALSE since R 4.0.0 (it was TRUE in earlier versions).
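A minimal sketch of both uses of data.frame() follows: joining variables and converting a matrix. The column names and values are invented for the example.

```r
# Join two variables into a data frame; strings become factors on request.
df <- data.frame(
  name  = c("Ann", "Bob", "Cy"),
  score = c(90, 72.5, 88),
  stringsAsFactors = TRUE
)
dim(df)           # 3 2
class(df$name)    # "factor"
class(df$score)   # "numeric"

# Select the scores above 80.
df$score[df$score > 80]   # 90 88

# Convert a matrix; unnamed columns are called X1, X2, ...
from_matrix <- data.frame(matrix(1:4, nrow = 2))
names(from_matrix)        # "X1" "X2"
```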
8. R Variables
A basic concept in (statistical) programming is the variable.
A variable allows you to store a value (e.g. 6) or an object (e.g. a function description) in R. You can then later use this variable’s name to easily access the value or the object that is stored within this variable.
Let’s assign the value 29 to a variable my_var with the command:
my_var <- 29
Suppose you have a basket with six oranges. As a data analyst, you want to store the number of oranges in a variable with the name my_oranges.
my_oranges <- 6
Every tasty fruit basket needs apples, so you decide to add six apples. As a data analyst, your reflex is to immediately create the variable my_apples and assign the value 6 to it. Next, you want to calculate how many pieces of fruit you have in total. Since you have given meaningful names to these values, you can now explicitly code this:
my_apples + my_oranges
Put together, with the total stored in a new variable my_fruits:
my_apples <- 6
my_oranges <- 6
my_fruits <- my_apples + my_oranges
my_fruits
[1] 12
9. Arithmetic & Logical Operators with Example
Arithmetic Operators:
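These operators are used to carry out mathematical operations like addition and multiplication. The arithmetic operators available in R are + (addition), - (subtraction), * (multiplication), / (division), ^ (exponentiation), %% (modulus), and %/% (integer division).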
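The arithmetic operators can be tried directly at the console. The values of x and y below are arbitrary.

```r
x <- 7
y <- 2

x + y     # 9     addition
x - y     # 5     subtraction
x * y     # 14    multiplication
x / y     # 3.5   division
x ^ y     # 49    exponentiation
x %% y    # 1     modulus (remainder)
x %/% y   # 3     integer division

# Operators work element by element on vectors.
c(1, 2, 3) * 2   # 2 4 6
```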
Logical Operators:
Logical operators in R (!, &, |, && and ||) work only on the basic data types logical, numeric, and complex, and on vectors of these types.
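A brief sketch of the logical operators in use. The vectors a and b are example data; & and | work element by element, while && and || compare single values.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

!a       # FALSE TRUE FALSE   negation
a & b    # TRUE FALSE FALSE   element-wise AND
a | b    # TRUE TRUE TRUE     element-wise OR

# && and || take single values, as used in if() conditions.
a[1] && b[3]   # FALSE
a[2] || b[2]   # TRUE

# Numeric values are treated as FALSE when zero, TRUE otherwise.
c(0, 2) & TRUE   # FALSE TRUE
```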
10. Flow Control Statements
These are the basic control-flow constructs of the R language. They function in much the same way as control statements in any Algol-like language.
There are eight types of control structures in R:
if
if-else
for
nested loops
while
repeat and break
next
return
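The constructs listed above can be sketched in one short script. All names (total, pairs, n, k, sign_label) are made up for the example.

```r
# for + if + next: sum the odd numbers from 1 to 9.
total <- 0
for (i in 1:9) {
  if (i %% 2 == 0) next   # skip even numbers
  total <- total + i
}
total   # 25

# Nested loops: count every (i, j) combination.
pairs <- 0
for (i in 1:2) {
  for (j in 1:3) {
    pairs <- pairs + 1
  }
}
pairs   # 6

# while: repeat as long as the condition holds.
n <- 0
while (n < 3) {
  n <- n + 1
}

# repeat + break: loop until told to stop.
k <- 0
repeat {
  k <- k + 1
  if (k >= 5) break
}

# if-else and return inside a function.
sign_label <- function(x) {
  if (x > 0) {
    return("positive")
  } else if (x < 0) {
    return("negative")
  }
  "zero"
}
sign_label(-2)   # "negative"
```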
11. R Functions
A function, in a programming environment, is a set of instructions.
A programmer builds a function to avoid repeating the same task or to reduce complexity.
Example:
fahrenheit_to_celsius <- function(temp_F) {
  # Convert a temperature from degrees Fahrenheit to degrees Celsius.
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}
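The reverse conversion can be sketched in the same way. The name celsius_to_fahrenheit is hypothetical and does not appear in the original example. It also shows that an R function returns its last evaluated expression, so return() is optional, and that the function works on whole vectors.

```r
# Hypothetical companion function: Celsius to Fahrenheit.
celsius_to_fahrenheit <- function(temp_C) {
  temp_C * 9 / 5 + 32   # last expression is the return value
}

celsius_to_fahrenheit(100)            # 212
celsius_to_fahrenheit(c(0, 37, 100))  # 32.0 98.6 212.0
```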
12. R Strings
Any value written within a pair of single quotes (' ') or double quotes (" ") in R is termed a 'string'.
Example (str_length() comes from the stringr package; the base R equivalent is nchar()):
library(stringr)
str_length("abc")
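For readers without extra packages installed, a few string operations can be sketched with base R functions only. The example strings are arbitrary.

```r
s <- "Hello, R"

nchar(s)           # 8: number of characters
toupper(s)         # "HELLO, R"
substr(s, 1, 5)    # "Hello"

# Single and double quotes create the same kind of string.
identical("abc", 'abc')           # TRUE
paste("abc", 'def', sep = "-")    # "abc-def"
```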
13. R Packages
A package is a collection of R functions and data sets.
A few standard ones come with the R installation; others have to be downloaded from http://cran.r-project.org/
Alternatively, packages can be installed using install.packages("package name")
Once installed, a package must be loaded when needed using library("package name")
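A common pattern, sketched below, is to install a package only when it is missing and then load it. The package name "stats" is just a placeholder that ships with every R installation, so the install step is skipped here; in practice you would substitute the name of a CRAN package.

```r
pkg <- "stats"   # placeholder name; replace with the package you need

# Install only if the package is not already available.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE lets library() accept the name from a variable.
library(pkg, character.only = TRUE)
```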
14. Data Reshaping
R data reshaping is all about changing how data is organized into rows and columns. Most data processing in R takes its input as a data frame, but a problem arises when we need the data frame in a format different from the one in which we received it.
R provides many functions to merge, split, and change the rows to columns and vice-versa in a data frame.
Transpose a Matrix
Joining Rows and Columns in a Data Frame
Merging Data Frames
Melting and Casting
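The first three operations can be sketched with base R functions alone. The data frames left and right are invented example data. Melting and casting are usually done with an add-on package such as reshape2 and are not shown here.

```r
# Transpose a matrix: rows become columns.
m <- matrix(1:6, nrow = 2)
dim(t(m))   # 3 2

left  <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
right <- data.frame(id = c(2, 3, 4), score = c(72.5, 88, 60))

# Join a column with cbind() and a row with rbind().
wide <- cbind(left, flag = c(TRUE, FALSE, TRUE))
tall <- rbind(left, data.frame(id = 4, name = "Di"))

# Merge two data frames on a shared key (inner join by default).
merged <- merge(left, right, by = "id")
merged$id   # 2 3: only ids present in both data frames
```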
15. R vs Python
R :
R’s first release came in 1995.
Open-source programming language.
Easy to get primary results.
New libraries or tools are added continuously to their respective catalog.
Used by statisticians, engineers, and scientists without computer programming skills. It is popular in academia, finance, pharmaceuticals, media, and marketing.
If you have no coding experience, then R may be easier to learn.
R has been used primarily in academics and research and is great for exploratory data analysis. In recent years, enterprise usage has rapidly expanded.
R’s analysis-oriented community has developed open-source packages for specific complex models that a data scientist would otherwise have to build from scratch.
Disadvantages: slow; steep learning curve; dependencies between libraries.
Advantages:
Graphs are made to talk, and R makes them beautiful.
Large catalog for data analysis.
GitHub interface.
RMarkdown.
Shiny.
Example: library(readr)
xyz <- read_csv("xyz_2020.csv")
Python:
Python was first released in 1991.
Open-source programming language.
Well suited to deploying algorithms.
New libraries or tools are added continuously to their respective catalog.
Python is a production-ready language, meaning it can be a single tool that integrates with every part of your workflow.
If you have no coding experience, then Python may be easier to learn.
Python is used by programmers that want to delve into data analysis or apply statistical techniques, and by developers and programmers that turn to data science.
Python’s suite of specialized deep learning and other machine learning libraries includes popular tools like scikit-learn, Keras, and TensorFlow.
Disadvantages: not as many libraries as R.
Advantages:
Jupyter notebook: notebooks help to share data with colleagues.
Mathematical computation.
Deployment.
Code readability.
Speed.
Functions in Python.
Example: import pandas
XYZ = pandas.read_csv("xyz_2020.csv")
Python is a powerful, versatile language that programmers can use for a variety of tasks in computer science. Learning Python will help you build a broad data science toolkit, and you can pick it up fairly easily even as a non-programmer.
On the other hand, R is a programming environment specifically designed for data analysis that is very popular in the data science community. You’ll need to understand R if you want to make it far in your data science career.
The reality is that learning both tools and using them for their respective strengths can only improve you as a data scientist. Versatility and flexibility are traits of any data scientist at the top of their field. The Python vs R debate confines you to one programming language. You should look beyond it and embrace both tools for their respective strengths. Using more tools will only make you better as a data scientist.