Introduction to Data Science with R (Social Data Analytics Workshop Series)

Date 10/12/17 12:00pm to 1:00pm
Location Sparks B001 (The DataBasement)

Workshop 5: Data Input/Output and R Packages 

In this workshop, we will finally move beyond the basic functionality provided by default in R and learn how to extend that functionality by using R packages. We will make use of one such package (rio) to read data into R, and write it back out, in many different formats (SAS, Stata, Excel, Minitab, SPSS, etc.).

 Materials for this workshop, including the script Matt will work through, have been posted to the workshop website:

If you have missed previous workshops, you will benefit from catching up using the materials on the website. In particular, you should install and configure R and RStudio on your laptop (this process is described in the pictorial for Workshop 1) and download (or git clone / fetch) the R scripts for each workshop.

General Information about the Workshop Series

Do you want to develop the skills to program and manage data using R? If so, this workshop series is for you! We will be meeting (almost) weekly for an hour throughout the semester to cover everything from basic R programming up through big data analytics and high performance computing. This workshop series will start with several weeks introducing R and basic R programming, so no prior experience is required (only a laptop). We will then move on to a series of workshops on reading in, cleaning, transforming, and combining multiple, complex datasets (including text and social network data), using our newfound R programming skills. Once we have the basics of data management down, we will cover web-based data collection, both from traditional web pages and from the Twitter API. Finally, we will get into performance and scalability issues, and go over the steps for accessing the ICS cluster resources at Penn State.

