Tuesday, 11 January 2011

#LAK11 what about #privacy when using #data analytics to create #knowledge?

LAK11, the course on learning analytics and knowledge, has started, and the discussions immediately touched on a sensitive and complex issue: privacy. While we were talking about educational data mining, many participants raised it.

First, a quick note on educational data mining: data mining, also called Knowledge Discovery in Databases (KDD), is the field of discovering novel and potentially useful information from large amounts of data. With many social media apps gathering massive amounts of data, this field is of growing interest to companies as well as researchers. Educational data mining (EDM) has a different aim: not to make money, but to map, increase and analyse educational knowledge creation and awareness. What makes educational data mining so interesting is that it increasingly uses models drawn from the psychometrics literature in education.

The idea behind EDM is a great one: modeling individual differences between students enables software to respond to those differences, which can, for example, significantly improve student learning. Another key area of application of EDM methods has been the study of pedagogical support (both in learning software and in other domains, such as collaborative learning behaviors), with the aim of discovering which types of pedagogical support are most effective, either overall or for different groups of students or in different situations. The only question is: how do you cope with the privacy issues?

Privacy can be good or bad
Gathered data, just like anything else, can be used for good or for bad. An early form of data gathering was getting all the Jews registered in Germany. Once that was done (for purely administrative purposes at first), it was a small step to gather the Jews themselves for … processing. There is also a recent initiative: Nepal has added a third sex to its next demographic surveys. This is a wonderful thing (I for one do not fit into many 'either/or' categories), but it also makes abuse possible.

To me, privacy is more relevant to minorities and vulnerable groups than to any other groups in society. If Facebook sold gay-related data to the Ugandan government, I would not like to be a gay person in Uganda, for you would risk being seriously harassed if not killed (at this point in time). Privacy should always be an option that is respected, one that can be disconnected from your own online identity. Surely you, as a user, should be able to use a ‘non-traceable’ account for certain reasons?

For instance, by using disconnect.

Out of the crowdsourcing cloud, out of the real world?
One way to keep your own data private is to use disconnect. But if all the people from a specific group use disconnect, their data will no longer be in the crowd and will, in effect, be 'non-existent', which makes vulnerable groups even more invisible and less catered to.
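The effect described above can be shown with a toy count. This is only a minimal sketch: the records, the group labels and the helper name `visible_group_counts` are all made up for illustration and do not come from any real tool or dataset.

```python
from collections import Counter

# Hypothetical records: (user_id, group, opted_out). All values are invented.
records = [
    ("u1", "majority", False),
    ("u2", "majority", False),
    ("u3", "majority", True),
    ("u4", "minority", True),
    ("u5", "minority", True),
]


def visible_group_counts(rows):
    """Count users per group, skipping everyone who opted out of tracking."""
    return Counter(group for _, group, opted_out in rows if not opted_out)


counts = visible_group_counts(records)
print(counts["majority"])  # 2
print(counts["minority"])  # 0
```

In this made-up data, every member of the minority group has opted out, so the group does not appear in the aggregate at all, while the majority group only shrinks slightly.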

Perspective of a LAK11 participant
Sarah Haavind, a participant of LAK11, mentions two interesting points on privacy as well:
First: Facebook and Twitter users no longer cling to privacy (is this really true, and has anyone researched this or the reasons behind it? I do mind, and I adjust my privacy settings).
Second, she adds to the link between privacy and commerce: “What if the data-mining marketers know what I like? Isn't it a positive to be bombarded with ads for products I'm interested in rather than a random assortment, given their presence in our lives?”.
She has a point there, but… how can small, very targeted businesses surface through the data analytics that are available?
If I had to choose, I would also like smaller companies to be able to take part in this ad-data world. I am not interested in Starbucks; I am interested in the local coffee shop roasting its own coffee beans, simply because I like a variety of tastes. If I go to Ethiopia, I want to listen to local contemporary music; I am not interested in the rehashed music that can be found everywhere. So I would prefer to be able to choose localized data, through a ‘local business’ setting in my profile, rather than the big mainstream stuff. But do I need to open up my private life to get access to such time-saving and potentially interesting stuff?

Having said this, gathering and mining educational data might go a long way toward giving us quick access to some sustainable outcomes (learner preferences, learners' critical thinking skills…). But of course, as with all data mining, the proof is in the algorithm. If the algorithm is based on a wrong hypothesis, will the results still be useful? And if data is sold to the highest bidder, then the ethics of that highest bidder had better be in sync with my interpretation of ethics.