Introduction to Data Science with R (Social Data Analytics Workshop Series)
Date: 10/12/17 12:00pm to 1:00pm
Location: Sparks B001 (The DataBasement)
Workshop 5: Data Input/Output and R Packages
In this workshop, we will finally move beyond the basic functionality provided by default in R and learn how to extend that functionality using R packages. We will use one such package (rio) to read data into R, and write it back out, in many different formats (SAS, Stata, Excel, Minitab, SPSS, etc.).
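As a rough preview of the kind of workflow described above, here is a minimal sketch using rio's `import()`, `export()`, and `convert()` functions. It is not taken from the workshop script: the `survey` data frame and the file names are made up for illustration, and the example sticks to CSV and TSV files so that no optional format-specific packages are needed.

```r
# install.packages("rio")  # one-time installation from CRAN (assumed not yet installed)
library(rio)

# A small, made-up data frame used only for illustration.
survey <- data.frame(id = 1:3, score = c(4.5, 3.0, 5.0))

# Write the data to a temporary CSV file.
# rio chooses the output format from the file extension.
csv_path <- file.path(tempdir(), "survey.csv")
export(survey, csv_path)

# Read the file back in; again the format is inferred from the extension.
survey_csv <- import(csv_path)

# Convert directly from one file format to another without
# handling the intermediate data frame yourself.
tsv_path <- file.path(tempdir(), "survey.tsv")
convert(csv_path, tsv_path)
survey_tsv <- import(tsv_path)

print(survey_csv)
print(survey_tsv)
```

The same `import()` and `export()` calls should work for other extensions such as `.dta`, `.sav`, or `.xlsx`, although some formats may require rio's optional supporting packages to be installed first.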
Materials for this workshop, including the script Matt will work through, have been posted to the workshop website: https://github.com/matthewjdenny/SoDA-Workshop-Series-Introduction-to-Data-Science
If you have missed previous workshops, you will benefit from catching up using the materials on the website. In particular, you should install and configure R and RStudio on your laptop (this process is described in the pictorial for Workshop 1) and download (or git clone / fetch) the R scripts for each workshop.
Do you want to develop the skills to program and manage data using R? If so, this workshop series is for you! We will be meeting (almost) weekly for an hour throughout the semester to cover everything from basic R programming up through big data analytics and high performance computing. This workshop series will start with several weeks introducing R and basic R programming, so no prior experience is required (only a laptop). We will then move on to a series of workshops on reading in, cleaning, transforming, and combining multiple, complex datasets (including text and social network data) -- using our newfound R programming skills. Once we have the basics of data management down, we will cover web-based data collection, both from traditional web pages and from the Twitter API. Finally, we will get into performance and scalability issues, and go over the steps for accessing the ICS cluster resources at Penn State.
These workshops will be offered (most) Thursdays during the Fall 2017 semester from 12:00pm to 1:00pm in Sparks B001 (The DataBasement). Directions to the DataBasement are here: http://bdss.psu.edu/pdf-folder/finding-the-sparks-databasement
The instructor for the workshop is Matt Denny, who can be contacted at email@example.com.
Materials (including slides, video tutorials, pictorial tutorials, scripts, ...) for previous workshops are available on the workshop website: https://github.com/matthewjdenny/SoDA-Workshop-Series-Introduction-to-Data-Science.
This workshop series is sponsored by the Big Data Social Science IGERT.
For more information about BDSS-IGERT and SoDA, visit bdss.psu.edu.