Tuesday, 23 October 2012

Coursera: Data Programming in R, post II

I've finished my coursework for Computing for Data Analysis on Coursera, so I thought I'd take the time to do a quick review.

Overall:
I'm glad I took the course. A structured learning timeline with specific targets and an active discussion board is very valuable. It's far better than learning in isolation through random web tutorials and linked resources. 

The course is not for everyone:
If you haven't done any programming, R is not a good first language, and this course will not be a friendly introduction to programming. R is quirky and the command syntax is difficult to read. Many commands have similar names but subtly different behaviors. The help files are opaque and the examples frequently esoteric. The default R programming environment lacks some very basic coding tools such as code completion, although these are probably available in other environments such as RStudio or ESS. If you want to learn basic programming, take a course in Python.
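To give a flavour of the "similar names, subtly different behaviors" problem, here is a small sketch using lapply and sapply from base R. The data is made up for illustration and is not from the course.

```r
# Made-up example data: a named list of integer vectors.
values <- list(a = 1:3, b = 4:6)

# lapply always returns a list, one element per input element.
as_list <- lapply(values, sum)

# sapply tries to simplify: one value per element becomes a named vector...
as_vector <- sapply(values, sum)

# ...and equal-length results per element become a matrix instead.
as_matrix <- sapply(values, range)

str(as_list)
str(as_vector)
str(as_matrix)
```

The same function applied to the same data comes back as a list, a vector, or a matrix, depending on which of the two near-identical commands you picked and what the results happen to look like.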

R will be a lot easier to digest if you are comfortable with statistics or matrix algebra, can read mathematical notation without difficulty, and have done a bit of programming. If you want flexible data analysis and publication-quality output from free software, you'll be very happy. Most of what you want to do can be done with the core functions of R. It's probably best to learn what they can do before reaching for a package. This may save a lot of time later on when the package gets superseded by another one. The core of R will still be there, unchanged. That said, R appears to have a pretty good package management system, so installing packages that depend on other packages seems to work very well.

If you are coming from an object-oriented language such as Java, Ruby or C++, the scoping rules are a leetle different. They seem to be very powerful when used well, but they're mind bending. I haven't really managed to bend my mind around this one enough yet.
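Here is a minimal sketch of the kind of scoping behavior I mean. The function names (f, g, make_counter) are my own, made up for illustration. R looks up a free variable in the environment where the function was defined, and a function can carry that environment around with it.

```r
x <- 10

# f has no local x, so x is looked up where f was defined (the global environment).
f <- function() x

g <- function() {
  x <- 20  # local to g; f does not see it
  f()
}

g()  # 10, not 20

# A function factory: each call creates a fresh environment holding `count`.
make_counter <- function() {
  count <- 0
  function() {
    # `<<-` assigns to `count` in the enclosing environment, not a new local.
    count <<- count + 1
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

counter_a()  # 1
counter_a()  # 2
counter_b()  # 1, because counter_b has its own `count`
```

The two counters keep separate state without any class or object being declared, which is the part that takes some getting used to.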

Level of the course: 
It would definitely help to have some programming and problem-solving background going in. Students needed good problem-solving skills to do the exercises. The basics of the language were taught in PowerPoint slides. The information in the videos was enough to get through the quizzes, but the exercises required more: R-help, R-bloggers and Stack Overflow were very useful.
There was very little discussion of speed optimisation or the tradeoffs in using different programming approaches in R. The final exercise could be solved with for loops, and judging by the forums, many students resorted to them. There was no penalty for this in the grading. The exercises, however, were thoughtfully put together, and did provide a good platform for learning to leverage the language, with a little creativity and perseverance.
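To illustrate the loop-versus-vectorised tradeoff, here is a rough sketch on made-up data. It is not the course exercise, and the data frame and variable names are hypothetical. Both versions compute a mean per group; the second leans on tapply from base R.

```r
# Hypothetical data: one measurement per row, grouped by site.
readings <- data.frame(
  site  = c("a", "b", "a", "c", "b", "a"),
  value = c(1, 4, 3, 10, 6, 5)
)

# Loop version: build the result one group at a time.
loop_means <- numeric(0)
for (s in unique(readings$site)) {
  loop_means[s] <- mean(readings$value[readings$site == s])
}

# Vectorised version: tapply splits `value` by `site` and applies mean to each piece.
apply_means <- tapply(readings$value, readings$site, mean)

print(loop_means)
print(apply_means)
```

Both give the same per-site means. The tapply version is shorter and says what it does in one line, which is the kind of idiom the exercises reward if you go looking for it.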

There was an introduction to the differences between S3 and S4 classes, but no discussion of broader software engineering practices such as refactoring, unit testing, version control, or documentation. I saw one reference to Software Carpentry on the forums, but there was no reference to how to incorporate these methods in R specifically. Function prototypes were provided for the exercises, and these included useful comments, setting a good standard. However, there was no mention of RUnit (testing, TDD), Git (version control), or .Rd files or roxygen (creating documentation). So if you want to learn how to incorporate these into your work with R, don't look here.
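For anyone who hasn't met these tools, here is a rough sketch of what a documented and tested R function can look like. The function count_complete and its data are made up for illustration. The #' comments follow the roxygen style, and the check uses stopifnot from base R as a stand-in for a proper unit testing package such as RUnit.

```r
#' Count the non-missing values in each column of a data frame.
#'
#' @param df A data frame.
#' @return A named integer vector with one count per column.
count_complete <- function(df) {
  stopifnot(is.data.frame(df))
  # vapply insists that every column yields exactly one integer.
  vapply(df, function(col) sum(!is.na(col)), integer(1))
}

# A minimal test: known input, known expected output.
sample_df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
result <- count_complete(sample_df)
stopifnot(identical(result, c(x = 2L, y = 2L)))
```

In a real project the test would live in its own file and be run by the testing package, and the #' comments would be turned into .Rd help files, but the basic shape is the same.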

Time Commitment: 
The course website suggests 2-4 hours / week. I was able to fit the video viewing and exercises into that time frame, though only just. I spent an extra couple of hours reading and commenting on the forums, and digging further to hone more satisfying solutions to the exercises.

What next?
R is quirky. I won't remember much of what I learned beyond a month or two at most. There is clearly a steep learning curve here, and thus a big difference between introduction, competence, and mastery.  At this point, I've had an introduction. In order to progress, I'll need some projects to work on.  

Resources for the future:
... which is just a tiny tip of the iceberg. Let me know about more in the comments.

