Tuesday 23 October 2012

Coursera: Data Programming in R, post II

I've finished my course work for Computing for Data Analysis, on Coursera, so I thought I'd take the time to do a quick review.

Overall:
I'm glad I took the course. A structured learning timeline with specific targets and an active discussion board is very valuable. It's far better than learning in isolation through random web tutorials and linked resources. 

The course is not for everyone:
If you haven't done any programming, R is not a good first language. This course will not be a friendly introduction to programming. In particular, R is quirky and the command syntax is difficult to read. Many commands have similar names but subtly different behaviors. The help files are opaque and the examples frequently esoteric. The R programming environment lacks some very basic coding tools such as code completion, although these are probably available in other environments such as RStudio or ESS. If you want to learn basic programming, take a course in Python.

R will be a lot easier to digest if you are comfortable with statistics or matrix algebra, can read mathematical notation without difficulty, and have done a bit of programming. If you want flexible data analysis and publication quality output from free software, you'll be very happy. Most of what you want to do can be done with the core functions of R. It's probably best to learn what they can do before reaching for a package. This may save a lot of time later on when the package gets superseded by another one. The core of R will still be there, unchanged. That said, R appears to have a pretty good package management system, so incorporating packages that rely on packages seems to work very well.

If you are coming from an object oriented language such as Java, Ruby or C++, the scoping rules are a leetle different. This seems to be very powerful when used well, but it's mind bending. I haven't really managed to bend my mind around this one enough yet.

Level of the course: 
It would definitely help to have some programming and problem solving background going in. Students needed good problem solving skills to do the exercises. The basics of the language were taught in PowerPoint slides. The information in the videos was enough to get through the quizzes, but the exercises required more: R-help, R-bloggers, and Stack Overflow were very useful.
There was very little discussion of speed optimisation or the tradeoffs in using different programming approaches in R. The final exercise could be solved with for loops, and judging by the forums, many students resorted to them. There was no penalty for this in the grading. The exercises, however, were thoughtfully put together, and did provide a good platform for learning to leverage the language, with a little creativity and perseverance.

There was an introduction to the differences between S3 and S4 classes, but no discussion of more advanced practices such as refactoring, unit testing, version control, or documentation. I saw one reference to Software Carpentry on the forums, but there was no reference to how to incorporate these methods in R specifically. Function prototypes were provided for the exercises, and these included useful comments, setting a good standard. However, there was no mention of RUnit (testing, TDD), Git (version control), or .Rd files or Roxygen (creating documentation). So if you want to learn how to incorporate these into your work with R, don't look here.

Time Commitment: 
The course website suggests 2-4 hours / week. I was able to fit the video viewing and exercises into that time frame, on the outside. I spent an extra couple of hours reading and commenting on the forums and looking further / honing more satisfying solutions to the exercises.

What next?
R is quirky. I won't remember much of what I learned beyond a month or two at most. There is clearly a steep learning curve here, and thus a big difference between introduction, competence, and mastery.  At this point, I've had an introduction. In order to progress, I'll need some projects to work on.  

Resources for the future:
... which is just a tiny tip of the iceberg. Let me know about more in the comments.


Tuesday 2 October 2012

Spirograph mania!

According to RetroWow, among other sources, Spirograph was invented by Denys Fisher in 1965. It was first intended as a drafting tool, but was marketed as a toy. There were multiple versions, and there is a modern remake from Hasbro.

 We've tried out three versions. The new version didn't stay in the house. It was too difficult to use because the gears kept slipping underneath the outer template. The pocket version shown below works OK, but the designs are somewhat limited (no epitrochoids). The antique Kenner version is our favorite. This is getting a lot of use right now. It's great at 8 or 9 years old and up, but your six year old would have to be very adept with a pen to enjoy it for long.
Mom! Can I get out the Spirograph?

The old pens were dry, so I did get a wonderful set of Stabilos, which are working perfectly. The pens need to be narrow enough to fit through the holes in the gears. Felt tips are a bit softer than roller balls, so they don't seem to make holes in the paper as easily, although the ink sometimes runs a bit.

Once you do a few patterns, it's nice to be able to predict what the wheels will do. There is a handy chart on the inside of the box lid, but the math is rather fun, too. We're not quite up to common denominators and gear ratios yet, but predicting the number of points on a spirograph pattern will be a good tool when we get there:
  • Outer wheel: 96 teeth
  • Inner wheel: 60 teeth
  • Step 1: Factor the number of teeth
    • 96 = 32 x 3 = 2 x 2 x 2 x 2 x 2 x 3
    • 60 = 20 x 3 = 5 x 2 x 2 x 3
  • Step 2: Compare the numbers. Find factors that are the same.
    • 96 = 32 x 3 = 2 x 2 x 2 x 2 x 2 x 3
    • 60 = 10 x 6 = 5 x 2 x 2 x 3
  • Step 3: Multiply the shared factors to get the greatest common factor (GCF):
    • = 2 x 2 x 3 = 12
  • Step 4: Divide the number of teeth by the GCF to get the gear ratio:
    • 96 / 12 = 8, 
    • 60 / 12 = 5 
    • for a gear ratio of 8/5
    So it will take 5 round trips inside of the larger wheel to create a pattern with 8 points.
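    The four steps above can also be checked with a few lines of code. This is only a sketch in Python, not anything that came with the toy: the function name gear_ratio and its argument names are made up for illustration, and math.gcd does the factoring of Steps 1 to 3 for us.

```python
from math import gcd


def gear_ratio(ring_teeth, wheel_teeth):
    """Reduce ring:wheel teeth to lowest terms.

    Returns (points, trips): the number of points on the pattern and the
    number of round trips the wheel makes inside the ring before the
    pattern closes. (Hypothetical helper, named here for illustration.)
    """
    common = gcd(ring_teeth, wheel_teeth)  # Steps 1-3: largest shared factor
    return ring_teeth // common, wheel_teeth // common  # Step 4


points, trips = gear_ratio(96, 60)
print(f"{points} points after {trips} round trips")
```

    Trying other wheels from the box against the 96-tooth ring is a quick way to compare the result with the chart on the lid.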
    When the wheel is rolled around the inside of the fixed ring, as in this pattern, the result is called a hypotrochoid. 'Hypo' is a commonly used Greek root for 'under' or 'inner' as in hypoglycaemia for low blood sugar and hypoxia for lack of oxygen. If the wheel is instead rolled around the outside of a fixed circle, the pattern is an epitrochoid. I think of 'epi' as a Greek root meaning 'surface' or 'outer' as in the medical name for the outer skin - epidermis.

    Of course these geometric forms have equations which can be used to describe them. And these equations have been implemented as interactive demos on the web. I don't find these as much fun as the physical drawing. Partly this is because the process of drawing a hypotrochoid has a rather pleasant loopy-loop feeling. Also, though, the restriction to an integer number of teeth on the gears makes for a restriction on the variety of patterns. As in flowers, we don't really notice that the number of petals is restricted to a multiple of 2, 3, or 5. We just find it pleasing.
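
    For the curious, here is a rough sketch of those equations in Python, using the standard parametric form of a hypotrochoid. It rests on some assumptions of mine: the tooth counts stand in for the radii (which should be fine as long as both gears have the same tooth spacing), and the pen offset of 40 is an arbitrary made-up value, not a measurement from a real wheel. The function and argument names are hypothetical.

```python
from math import cos, sin, pi, gcd, isclose


def hypotrochoid(ring, wheel, pen, steps=2000):
    """Return (x, y) points for a wheel rolling inside a fixed ring.

    ring, wheel: tooth counts, used here in place of radii (assumption).
    pen: distance of the pen hole from the wheel's centre (hypothetical).
    """
    trips = wheel // gcd(ring, wheel)  # round trips before the curve closes
    k = (ring - wheel) / wheel         # how fast the wheel spins as it travels
    points = []
    for i in range(steps + 1):
        t = 2 * pi * trips * i / steps  # angle of the wheel's centre
        x = (ring - wheel) * cos(t) + pen * cos(k * t)
        y = (ring - wheel) * sin(t) - pen * sin(k * t)
        points.append((x, y))
    return points


curve = hypotrochoid(96, 60, 40)
start, end = curve[0], curve[-1]
# The pen should be back where it started after the 5 round trips.
print(isclose(start[0], end[0], abs_tol=1e-6),
      isclose(start[1], end[1], abs_tol=1e-6))
```

    Handing the list of points to any plotting tool should give the 8-pointed pattern from the 96 and 60 tooth example, although it still isn't as much fun as the pens.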