Getting Started with R

Statistical Computation

It is fair to say that statistics is a computationally intensive area of mathematics: it invariably involves a lot of number-crunching.

Prior to the advent of computers, it was necessary to do all of the computations involved by hand or with desk calculators.

Fortunately, a typical personal computer today can automate most common statistical procedures. While one must always be aware of which techniques are being used and what assumptions they make, it is usually not necessary to get into the low-level details of the computations.
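
To make the point concrete, here is a minimal sketch in R (installation is covered later in this document) that computes a sample standard deviation twice: once step by step from its definition, and once with the built-in sd() function. The data values are made up purely for illustration.

```r
# A small made-up sample of measurements (illustrative data only).
x <- c(4.1, 5.3, 6.0, 5.8, 4.9)

# "By hand": the sample standard deviation worked out from its definition.
n <- length(x)
x_bar <- sum(x) / n
sd_by_hand <- sqrt(sum((x - x_bar)^2) / (n - 1))

# Built-in: the same quantity in a single call.
sd_builtin <- sd(x)

print(sd_by_hand)
print(sd_builtin)
```

The two printed values should agree. You still need to know that sd() uses the n - 1 divisor, but you do not have to carry out the arithmetic yourself.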

Why R?

Commercial Statistical Software

With the increased availability and processing power of personal computers, there has been a proliferation of software packages that can perform statistical computation.

Most of these are expensive commercial products. In addition to paying license fees that often run into thousands or tens of thousands of dollars, you may have to deal with software that manages the license. In effect, this software decides whether or not you have paid for the product and disables it if it decides you have not, whether you have actually paid or not. The product also may not run on the platform you have (Macintosh or Linux, for example), or the license may restrict you to a specific platform.

The license agreement may also specify restrictions on which computers can have the software installed, who can use it, and for what purposes it may be used.

The college licenses several commercial statistical packages, including SPSS and JMP, and you will find these programs on many lab computers on campus. Most of the license agreements (JMP's being an exception) do not allow individual students to copy the software to their own computers. None that I am aware of allows you to take the software with you when you graduate.

The Open Source Alternative

A different approach is to make use of software that is available for free, often known as "open source" software because the authors make the source code freely available.

Usually, open source software is available at no charge and anyone can legally download it, install it, and use it for any purpose they choose. Depending on which of several open source licenses the software is released under, you may even be allowed to modify it and sell the modified version, as long as you include the source code and say where it came from.

Contrary to the "you get what you pay for" mindset, the quality of open source software often equals or exceeds that of equivalent commercial software. That said, in my experience commercial products are often easier to use, since they usually include some kind of graphical interface. Many open source programs, R included, use a command line interface, but if you can get past this hurdle you can accomplish whatever statistical analysis you want with open source software. Graphical interface bigots often deride command line interfaces as antiquated, but many sophisticated users actually prefer them.

R

R is an open source implementation of the S language, which was developed at Bell Labs beginning in the 1970s. It is available for all common platforms (Macintosh, Windows, Unix variants) as a free download from the R project website or any one of its mirror sites.

When you download and install R, you also install extensive documentation. This is fortunate because R allows contributions from outside the development team and over time the system has grown to be quite large, with new packages and routines being added all the time.
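
As a rough sketch of how you might find your way around that documentation once R is running, the following uses only functions that ship with base R. The function median() is just an arbitrary example topic, and the help calls are wrapped in interactive() because they open a help viewer.

```r
# Find objects whose names contain "median" in the attached packages.
matches <- apropos("median")
print(matches)

# List the names of the packages installed in your R library.
pkgs <- rownames(installed.packages())
print(head(pkgs))

# These open help pages, so run them only from an interactive R session.
if (interactive()) {
  help("median")                     # help page for one function; ?median is shorthand
  help.search("standard deviation")  # keyword search of the installed documentation
}
```

apropos() is handy when you only remember part of a function's name. help.search() is the better choice when you know the concept but not what the function is called.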

Downloading and Installing R

The starting point is the R project website, www.r-project.org:
[Screenshot: rinstall_1.JPG]

Click on the CRAN link under the Download menu item and scroll down the list of mirror sites to the ones in the USA. Any mirror will work, but it is considered proper etiquette to pick the one geographically closest to you.
[Screenshot: rinstall_2.JPG]

One possibility is to select the Statlib mirror at Carnegie Mellon University. The following page should appear:
[Screenshot: rinstall_3.JPG]

Note that R runs on a variety of platforms. Most open source software is developed on Unix or Unix variants like Linux or FreeBSD (Mac OS X is itself built in part on code from FreeBSD). The reason for this is that free, high-quality development tools are available on these platforms. Once developed, open source software may be ported to proprietary operating systems like Windows or Mac OS. With apologies to Unix and Mac users, we will illustrate the installation on the Windows platform, so the next screenshot is the result of clicking on the Windows link.
[Screenshot: rinstall_4.JPG]

For starters, we will select the base packages to install. It is very easy to add extra packages once R is installed.
[Screenshot: rinstall_5.JPG]

The following screen should appear. It is worth taking a look at the README.R- link for any instructions specific to this version or platform:
[Screenshot: rinstall_6.JPG]

Next, go back to the Windows install page and select the install binary, whose name usually ends with -win32.exe. The following dialog should appear:
[Screenshot: rinstall_7.JPG]

Usually you want to choose the Save File option, which will download the install binary to your computer. Once the download is complete, you should see something like this:
[Screenshot: rinstall_8.JPG]

If you download to your Desktop folder, you should see an icon for the R setup program:
[Screenshot: rinstall_9.JPG]

Double-click the icon to start the setup program.
The following dialog should appear:
[Screenshot: rinstall_10.JPG]

Select the language you want and click OK.
[Screenshot: rinstall_11.JPG]

Click Next > and the statement of the GNU General Public License (GPL) should appear. Clicking Next > implies acceptance of the license terms. Basically, it says you can use the software for any purpose, make copies, and even modify and sell it, as long as you include the source code.
[Screenshot: rinstall_12.JPG]

Next, select a directory to install R into. Usually you can take the default, but in this example the C: partition is nearly full, so R is going to be installed on the D: partition instead.
[Screenshot: rinstall_13.JPG]

Select the components to install, or take the defaults:
[Screenshot: rinstall_14.JPG]

Take the default startup options in most cases:
[Screenshot: rinstall_15.JPG]

Take the default folder for the startup icons:
[Screenshot: rinstall_16.JPG]

Take the defaults for these additional Windows-specific tasks:
[Screenshot: rinstall_17.JPG]

Now the setup program should begin copying files to your computer, which will take a few minutes depending on the speed of your machine.
[Screenshot: rinstall_18.JPG]

If all goes well, you should get a dialog box with a Finish button, and an icon for starting the R system should appear on your desktop:
[Screenshot: rinstall_19.JPG]

Double-click the icon to start R:
[Screenshot: rinstall_21.JPG]

Type in help.start() and hit Enter to start a web browser window for the R help pages.
[Screenshot: rinstall_22.JPG]

The following messages should appear in the R window:
[Screenshot: rinstall_23.JPG]

And a browser window that looks something like this should open:
[Screenshot: rinstall_24.JPG]

Your R environment should now be installed and configured.
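
If you want a quick check that the installation works, a few commands like the following can be typed at the R prompt. This is only a sketch: the version string you see depends on the release you installed, and "somePackage" is a placeholder rather than a real package name.

```r
# Report which version of R is running (the exact text depends on your install).
r_version <- R.version.string
print(r_version)

# Sanity check with a dataset that ships with R: 'cars' records speed and
# stopping distance for 50 cars.
print(summary(cars))

# help.start() opens the HTML help pages in a browser, so call it only
# from an interactive session.
if (interactive()) {
  help.start()
}

# Adding an extra package later is a single call. "somePackage" is a
# placeholder name, so the line is left commented out.
# install.packages("somePackage")
```

If the version string and the summary table print without errors, the base system is working.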