Not that I'm complaining. I had a great time. There were some really talented people there, and some very committed parents, doctors, programmers and just plain technically minded people. It was a really interesting weekend.
The hackers present were a pleasant mix of odd-ball grad student types, young doctors, programmers, developers and random 'IT' people, all with an interest in trying to contribute something to the efficiency, ability and smooth running of the NHS. The elephant in the room is whether any of these projects will ever become useful. Several applications had very useful ideas. The winning app 'Waitless' aimed to provide an SMS service in which people could send an SMS and get information about the distance to local NHS services and the likely waiting time once they get there. This way, someone with an earache could make an informed choice to go to their local walk-in clinic instead of the A&E department depending on wait times, opening hours, and distance.
The progress made on these apps over the course of the weekend was astounding, with several nearly becoming usable services, and certainly good proof of principle demos by the time of presentation at 3pm on Sunday. It is truly amazing what a team of 8 can get done in a weekend with modern developer tools, available APIs, open source software, and online services. Wow.
And me? Well, not so much...
I ended up working on the aptly named FAIL project: 'Fatal Accident Inquiry Learning', which attempted to apply machine learning techniques to Scottish Fatal Accident Inquiry (FAI) Reports. Unfortunately, I spent most of the first day struggling to get NLTK and various supporting technologies installed on my system, and most of the second day learning the very basics of working with them. Carl had somewhat more success in getting some snippets of the reports into Carrot2, but the results were less than impressive.
Challenges:
1) I need more experience with Python
Everything I know about Python, I learned from Codecademy and from my homework for 'Intro to Data Science' on Coursera. The homework was a pretty good introduction for this project, as it involved sentiment analysis of a tweet stream. I was able to do some basic filtering and reuse the structure of the base homework code. It was difficult, though, to carry what I had learned from analysing short tweets over to these much larger, richer records, and I struggled to create a workflow between the parts of the analysis.
2) I need more time with natural language processing
The FAIs are long legal text documents. It is possible to extract text from them, but it isn't easy. The text has some consistent elements, but is not in a consistent form. Some dates are in 'day month year' format, while others are given as 'month day, year' and still others as 'day of the month of year'. This makes it somewhat difficult to extract even basic information such as the age of the victim. It should be possible to get this by using a grammar with NLTK, but, well, I didn't manage to come to grips with it in the 45 minutes I gave it. Perhaps not surprising. The same goes for bi-grams, tri-grams, collocations and pointwise mutual information (PMI), which I looked at as ways to collect information on phrases and unusual words. I learned a lot, but wasn't able to put much of it into use. Yet.
...hopefully I'll find some time to try out topic modelling on this data at some point.
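For the date problem described above, here is a minimal sketch of what a first pass might look like using plain regular expressions rather than an NLTK grammar. It is not what we built at the weekend. The function names (`find_dates`, `age_on`), the exact patterns and the sample sentence are my own assumptions, and I am guessing that 'day of the month of year' means something like '3rd of May 1950'.

```python
import re
from datetime import date

# Month names mapped to their numbers; matching is case-insensitive.
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}
MONTH_RE = "|".join(MONTHS)

# Day-first forms: "12 March 2011", "12th March 2011", "3rd of May 1950",
# "3rd day of May 1950" (assumed variants, not taken from real reports).
DAY_FIRST = re.compile(
    rf"\b(?P<day>\d{{1,2}})(?:st|nd|rd|th)?\s+(?:day\s+)?(?:of\s+)?"
    rf"(?P<month>{MONTH_RE}),?\s+(?:of\s+)?(?P<year>\d{{4}})\b",
    re.IGNORECASE)

# Month-first form: "June 14, 2011"
MONTH_FIRST = re.compile(
    rf"\b(?P<month>{MONTH_RE})\s+(?P<day>\d{{1,2}})(?:st|nd|rd|th)?,?\s+"
    rf"(?P<year>\d{{4}})\b",
    re.IGNORECASE)


def find_dates(text):
    """Return every recognised date in text, in order of appearance."""
    hits = []
    for pattern in (DAY_FIRST, MONTH_FIRST):
        for m in pattern.finditer(text):
            try:
                d = date(int(m["year"]), MONTHS[m["month"].lower()], int(m["day"]))
            except ValueError:
                continue  # skip impossible dates such as "31 February 2011"
            hits.append((m.start(), d))
    return [d for _, d in sorted(hits)]


def age_on(born, died):
    """Age in whole years at the date of death."""
    had_birthday = (died.month, died.day) >= (born.month, born.day)
    return died.year - born.year - (0 if had_birthday else 1)


# Made-up sentence for illustration only.
sample = ("Born on 3rd of May 1950, he died on June 14, 2011. "
          "The inquiry opened 2 February 2012.")
dates = find_dates(sample)
print(dates)
print(age_on(dates[0], dates[1]))
```

This only normalises the dates; deciding which date is a birth, a death or a hearing would still need the surrounding words, which is where a proper grammar would earn its keep.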
3) Relevance of the project
Ideally, we'd like to analyse these reports and make some inferences that are relevant for the NHS and that could lead to improvements in quality of care. Unfortunately, the data set we are looking at is not like hospital episode statistics -- it is not a statistical sample. Although there were some 1652 fatal accidents in Scotland in 2011, only 28 FAI reports were published that year. Our dataset consists of the 82 such published inquiry results from the last few years. Some inquiries are published long after the incident, so the yearly counts don't line up exactly, but this suggests that inquiries are held for less than 2% of fatal accidents.
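As a quick check on that 'less than 2%' figure, the arithmetic can be sketched in a few lines of Python. The variable names are mine; the numbers are the 2011 figures quoted above.

```python
fatal_accidents_2011 = 1652  # fatal accidents in Scotland in 2011, as quoted above
fai_reports_2011 = 28        # FAI reports published in 2011, as quoted above

# Share of fatal accidents that were followed by a published inquiry report.
share = fai_reports_2011 / fatal_accidents_2011
print(f"{share:.1%}")  # prints 1.7%
```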
Inquiries can be called whenever there are unusual circumstances. They are required in some circumstances, such as when a death occurs in custody. By definition, then, these accidents are the outliers. Some of them are candidates for the 'Darwin Awards': tragedies begot by stupidity. Others are simply tragedies.
The Scottish authorities hold these inquiries with an eye toward preventing further accidents, and such investigations do have impact on our daily lives. Protocols for how often the highway lines are repainted, police guidelines for how people in custody are transported, and yes, even those ubiquitous labels: this is not a toy; not for children under 3 yrs of age; do not play on or around. FAIs are the fault-checking analyses that lead to health and safety advice.
So we tried various approaches to extracting information and comparing text in the reports, but ultimately we did not come up with a truly compelling use case for the data or inferences from it.
Observations on the data:
- Accident statistics are interesting reading.
- Each of these accidents is a story of its own.
- Men are in more fatal accidents than women. Across all ages, nearly 65% of accident victims are men; among victims under 65 years of age, the share climbs to nearly 75%.
- Fatal accidents are more common in older people. In 2011, 57% of the male accident victims were over 65 yo. At the same time, among female victims, 76% were over 65 yo.
- In younger age groups, poisoning is the most common cause of accidental fatalities. In 2011, there were no poisonings in children < 15 yo. The statistics include alcohol poisoning.
- Falls are the most common fatal accident type for people over 65 years, and over 60% of the victims are women, showing a sharp contrast with all other accident types and ages.
Tips for next time:
- don't forget your coffee cup
- wander around and see what different groups are doing -- don't wait until the pub afterwards to find the person with a degree in computational linguistics!
- what's your goal? If it's social, be social. If it's coding, join a group
- you will probably learn more from a larger team with more varied skills
- how competitive do you want to be?
- a video of a good use case is impressive
- it's often more efficient to ask for help
Thanks! Great account. Are you coming to Cardiff?
I'd love to, but sadly won't make it.