The statistical software we will be using in this class is called R. R is free, open source, and widely used in data and business analytics applications. We will use RStudio as the user interface for R. RStudio is also free and open source. You should download both R (the latest release) and RStudio Desktop and install both on your computer.
Once you have installed R and RStudio, open RStudio. You should see this:
Click the window button in the upper right corner of the 'Console' pane. The panes will resize and you will see this:
The screen is divided into four panels. The upper right panel holds your Global Environment, which contains your data sets and other objects. The bottom right panel contains your Files, Plots, Packages and Help tabs. The bottom left panel is called the Console; this is where R displays its results. If you want to give R just one command, you can also type it directly at the '>' prompt in the Console. The upper left panel, which for now shows a tab named 'Untitled1', is where you will type your R Script or your R Markdown file.
An R Script is a sequence of R commands that we write to control what R does. R executes these commands when you click the 'Source' button or, more frequently, when you highlight a portion of the code and click the 'Run' button. To execute just the line where your cursor is, click the 'Run' button or hold the 'Control' key and press 'Enter' (Cmd+Enter on a Mac). R Scripts are used to carry out a data analysis from start to finish: loading and manipulating the data, then creating tables and graphs.
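To make this concrete, here is a short illustrative script you could paste into the Script panel and run line by line. It is only a sketch: it uses the built-in mtcars data set so that nothing has to be downloaded, and the object names (efficient, avg_hp) are made up for this example.

```r
# Illustrative R Script using only base R (no packages needed).
# mtcars is a small data set that ships with R.
data(mtcars)

# Manipulate the data: keep only cars that get more than 25 miles per gallon
efficient <- subset(mtcars, mpg > 25)
print(efficient[, c("mpg", "cyl", "hp")])

# Make a table: average horsepower for each number of cylinders
avg_hp <- aggregate(hp ~ cyl, data = mtcars, FUN = mean)
print(avg_hp)

# Make a graph: car weight against fuel economy (appears in the Plots tab)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Weight vs. fuel economy")
```

Running the whole file with 'Source' executes every line in order; highlighting one block and clicking 'Run' executes only that block.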
An increasingly popular way of conducting data analysis is using R Markdown. R Markdown integrates the data analysis with the writing of the document that discusses it (e.g. a research paper or a report). Instead of doing the analysis in statistical software (e.g. R, Stata, SAS) and then copying the results into a word processor (Word, Google Docs, LaTeX), R Markdown does both the analysis and the writing of the document. We will use R Markdown to produce lab reports and your final project.
We can open a new R Markdown file by clicking on the "+" icon and selecting R Markdown. A window will ask us for the title and type of the document; we can just go with the defaults. The new, untitled R Markdown file already contains some text, which is a basic introduction to R Markdown. Once we are familiar with R Markdown, we will delete it and supply our own text. Save the untitled R Markdown file in a folder for this class on your computer. The R Markdown file needs to be processed ("knitted") to generate a nice-looking report or paper. This is done by clicking the 'Knit HTML' button. This produces an HTML document, but we can click on the triangle next to 'Knit HTML' and ask for a PDF or a Word document instead.
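Behind the scenes, RStudio's Knit button runs rmarkdown::render(), so knitting can also be started from code. The sketch below is illustrative only: the file name hello.Rmd and its contents are made up, and it assumes the rmarkdown package and pandoc (both bundled with RStudio) are available. The file is written to a temporary folder so nothing on your computer is overwritten.

```r
# Write a tiny, hypothetical R Markdown file: a YAML header, some text, and one R chunk.
rmd_file <- file.path(tempdir(), "hello.Rmd")
writeLines(c(
  "---",
  "title: \"Hello\"",
  "output: html_document",
  "---",
  "",
  "Some text written in Markdown.",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd_file)

# Knit it, the same way the Knit button does. Use "pdf_document" or
# "word_document" for other formats (PDF output also needs LaTeX installed).
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, output_format = "html_document")
}
```

The finished HTML file is saved next to the .Rmd file, just as it is when you knit from the button.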
R packages are add-ons to base R, and they have to be installed before you can use them. You can install them by clicking on the 'Packages' tab in the bottom right panel of RStudio and then on the 'Install Packages' button. In the window that opens, type the name of the package and click the 'Install' button.
The package that is particularly useful for this class is tidyverse. The tidyverse is actually a collection of other packages, including ggplot2. You should install tidyverse on your computer.
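The point-and-click installation above also has a code equivalent, install.packages(), which you can type at the Console prompt. This is a sketch assuming an internet connection; the repos argument names one common CRAN mirror and mainly matters outside RStudio, which usually sets a mirror for you.

```r
# Install tidyverse only if it is not already installed (needed once per computer).
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse", repos = "https://cloud.r-project.org")
}

# Load tidyverse in every new session or R Markdown file that uses it.
# This also attaches ggplot2, so ggplot() is available right away.
library(tidyverse)

# Quick check that ggplot2 works: scatter plot from the built-in mtcars data
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
```

Installing is done once; library() has to be called again each time you start R or knit a document.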
Read Chapter 1 (Introduction) in R for Data Science. Also read the article entitled "The Sexiest Job of the 21st Century" posted on Nexus.
Install the tidyverse package on your computer.
Open a new R Markdown document. Type in a greeting to me (e.g. "Hello Professor! This is George!"). Knit it into an HTML or PDF file.