Monday 30 September 2013

Coursera: Introduction to Data Science (Course Review)

I finished this a few months ago, but it will probably be offered again, so here's a review. This course was taught by Bill Howe at the University of Washington and offered as a MOOC on Coursera.

Course Description: 
The Coursera description promises newbie to data ninja in 8 weeks, with a workload of 8-10 hours/week. Those of us who have finished our statistical mechanics homework at 2 AM know that such promises are not merely empty; they all but guarantee an over-ambitious course. As the description implies, this is an overview course that tries to do too much. Every student should realise this at the outset: an introductory course that claims to cover everything is certain to be a rough ride. (I know. I've taught some.)

Lectures: 
The lectures covered some really interesting content, and the lecturer appears to know the industry very well, particularly the Microsoft perspective and tools. I assume that many of his continuing education students are Microsoft employees who want to update particular skills. Such students are not run-of-the-mill for UW.

When I was a TA at UW, the undergraduates needed slow feeding with very small spoons. Lecture halls were filled with slumped bodies under baseball caps. The evening courses, in contrast, were filled with lively, interested adults who learned independently and came to class with lists of questions. This course is aimed at those active, interested adult learners. That said, the number of hours listed for this class is a gross underestimate for the material covered and the assignments given.

Professor Howe's lecture style is not always engaging, and a lot of material is covered. There were often over 3 hours' worth of lecture material to review during the week. Along with following links and reading supporting papers, this left very little time for the assignments themselves. Prof. Howe did a good job of introducing and comparing a range of current technology choices (particularly the comparison of different database technologies). As a data science newbie, I would have liked a bit more information and emphasis on use cases for different types of databases.

The database parts of the course were well presented, and this covered subjects that I hadn't seen in my other work. The data analysis elements were not so clearly taught, though, and there are better (slower) ways to learn this material on Coursera. If you have time, Jeff Leek's course 'Data Analysis' covers this much more thoroughly. Andrew Ng's now legendary Machine Learning course is also good, although more mathematically oriented, with less emphasis on organisation, data munging techniques, and communicating results.

Later lectures in this Intro. to Data Science course appeared to have incorrect answers in the in-lecture questions. I got bored of trying to keep track of the errors and inconsistencies in the course. The material needs a thorough editing before the next showing.

Assignments:  
The lectures were not particularly good preparation for the homework assignments. A lot of independent learning was required to make progress in the course. The assignments were also relatively difficult compared to what I expected from the course description. The first assignment was a sentiment analysis of a Tweet stream, to be written in Python. I have a pretty good programming background, having started with Basic back in 1983, visiting Fortran, MATLAB, Igor, Unix utilities, C, Ruby and continuing to Objective-C, R and functional programming in Scala. I pick things up quickly. The course description did not require a programming background, yet I had to spend several hours learning the ins and outs of Python from Codecademy before I could get a handle on the assignments.

The level of the first assignments was not commensurate with expectations from the course description. I learned a lot, more than I expected, in fact, and I can now implement a matrix multiplication in Python, SQL, or MapReduce based on the homework assignments. The auto-grader for the first assignment never did accept my answer for the final part. It also didn't give sufficient feedback for me to solve the problem, which probably had something to do with text encoding but was very frustrating nonetheless. This sort of issue doesn't teach much. Save your perseverance for things that matter.
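
For readers wondering what the MapReduce version of that matrix multiplication looks like, here is a minimal sketch in plain Python. It is not the course's reference solution, and it does not use the course's MapReduce framework: the record layout `(matrix, row, col, value)`, the function names `mapper`, `reducer` and `multiply`, and the in-memory grouping step are all assumptions made for illustration.

```python
from collections import defaultdict


def mapper(record, n_rows_a, n_cols_b):
    """Emit (output_cell, tagged_value) pairs for one sparse matrix entry.

    `record` is a hypothetical tuple: (matrix_name, row, col, value).
    """
    matrix, i, j, value = record
    if matrix == "a":
        # A[i][j] contributes to every cell in row i of the product.
        for k in range(n_cols_b):
            yield (i, k), ("a", j, value)
    else:
        # B[i][j] contributes to every cell in column j of the product.
        for k in range(n_rows_a):
            yield (k, j), ("b", i, value)


def reducer(cell, tagged_values):
    """Pair up A and B entries that share an inner index and sum the products."""
    a_vals, b_vals = {}, {}
    for tag, inner, value in tagged_values:
        (a_vals if tag == "a" else b_vals)[inner] = value
    total = sum(v * b_vals[inner] for inner, v in a_vals.items() if inner in b_vals)
    return cell, total


def multiply(records, n_rows_a, n_cols_b):
    """Simulate the shuffle phase in memory, then reduce each output cell."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record, n_rows_a, n_cols_b):
            groups[key].append(value)
    return dict(reducer(cell, values) for cell, values in groups.items())
```

The map step copies each entry to every output cell it can affect, and the reduce step computes one dot product per cell. That is simple to reason about but wasteful, which is roughly the trade-off the assignment makes you notice.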

Overall, the assignments were challenging. I learned a lot, but not always what the assignment was meant to teach. I think there were a lot of complaints (more than normal) about the difficulty of the assignments, and later assignments were quite a bit easier than earlier ones. Assignments covered:

  • Python:   Tweet stream sentiment analysis
  • SQL:   Queries, tables and matrix multiplication
  • Tableau Visualization:    FAA Bird Strike Data 
    • write-up and peer assessment
    • note: I had to use Tableau via Amazon Web Services as it only runs on Windows.
  • MapReduce:   data joins, basic network analysis, matrix multiplication
  • Kaggle: Take part in a competition (I did facial keypoints detection)
    • write-up and peer assessment: ranking on the leaderboard did not matter.

A couple of the assignment deadlines were changed after the deadline had passed. This is very unfair to people who have worked hard to make the deadline, although it was reasonable in the case of the MapReduce homework, where we were using a new web system that was supposed to be able to handle the volume of students. This is a continuing problem with MOOCs that have more than 100k students enrolled. Any time the professor sets an assignment that will run on new technology, be prepared for a very frustrating experience. In my opinion, new web-based technology should not be used for graded assignments in MOOCs. It should be tested first in an optional or staged assignment so that 100k students are not accessing it in the same week.

Overall Recommendation:
Students:  I hope that the professor will offer this course in a pared-down form. As it is, if you're already awesome at Python and SQL, go ahead and dive in. Everyone else should consider this a taster course and audit only, at least with the current assignments. Be selective about which parts you choose to look at. If you experience slowdowns or poor behaviour with particular technologies in the assignments, put them aside and try again when the course is over or the deadline has passed. It seems like a class, but it's a free platform and you get what you pay for. A lot of professors are using this to try out new technologies, so don't expect it all to work as advertised.
