
Beware Of Big Data Biases

Companies large and small are rushing to corral the abundant, cheap data available on social media outlets and to dig into readily available information about what people are thinking, feeling, doing and even intending to do, in the hope of improving corporate decisions and campaigns and making more money.

Two computer scientists, one from Canada’s McGill University and the other from Carnegie Mellon University in the United States, are warning that these huge datasets can be “misleading.”


McGill’s Derek Ruths and Juergen Pfeffer of Carnegie Mellon cautioned, in an article published in the Nov. 28 issue of the journal Science, that big data users need to figure out how to correct for biases inherent in information gathered from Facebook posts, tweets, and other social media output.

Ruths is an assistant professor in the School of Computer Science at McGill.

Pfeffer is an assistant research professor at the Institute for Software Research in the School of Computer Science at Carnegie Mellon.

The two pointed out that thousands of research papers based on data collected from social media are published each year.

“Many of these papers are used to inform and justify decisions and investments among the public and in industry and government,” Ruths said in an article on the McGill Web site.

“Not everything that can be labelled as ‘Big Data’ is automatically great,” said Pfeffer, who was quoted in an article on the Carnegie Mellon Web site. “... the old adage of behavioural research still applies: Know Your Data.”

Their research highlighted several issues with using big data. The McGill article posted some of the issues and the ways to address them:

  • Different social media platforms attract different users – Pinterest, for example, is dominated by females aged 25-34 – yet researchers rarely correct for the distorted picture these populations can produce.
  • Publicly available data feeds used in social media research don’t always provide an accurate representation of the platform’s overall data – and researchers are generally in the dark about when and how social media providers filter their data streams.
  • The design of social media platforms can dictate how users behave and, therefore, what behaviour can be measured. For instance, on Facebook the absence of a “dislike” button makes negative responses to content harder to detect than positive “likes”.
  • Large numbers of spammers and bots, which masquerade as normal users on social media, get mistakenly incorporated into many measurements and predictions of human behaviour.
  • Researchers often report results for groups of easy-to-classify users, topics, and events, making new methods seem more accurate than they actually are. For instance, efforts to infer political orientation of Twitter users achieve barely 65 per cent accuracy for typical users – even though studies (focusing on politically active users) have claimed 90 per cent accuracy.
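
The first issue on the list, a platform whose users do not mirror the wider population, is one that analysts often try to correct by reweighting. The sketch below shows one common approach, post-stratification; it is not necessarily the method Ruths and Pfeffer recommend. The group names, population shares, and survey values are made up for illustration, and the function names are hypothetical.

```python
def raw_mean(sample):
    """Unweighted mean of the observed values."""
    return sum(value for _, value in sample) / len(sample)


def poststratified_mean(sample, population_share):
    """Reweight per-group means by each group's share of the target population.

    sample: list of (group, value) pairs observed on the platform.
    population_share: dict mapping group -> share of the target population
    (assumed to sum to 1).
    """
    totals, counts = {}, {}
    for group, value in sample:
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1

    # A group with no observations cannot be reweighted.
    missing = set(population_share) - set(counts)
    if missing:
        raise ValueError(f"no observations for groups: {sorted(missing)}")

    return sum(
        share * totals[group] / counts[group]
        for group, share in population_share.items()
    )


# Illustrative data only: 80 of 100 sampled users are women,
# but the target population is assumed to be split 50/50.
sample = (
    [("women", 1.0)] * 60 + [("women", 0.0)] * 20
    + [("men", 1.0)] * 5 + [("men", 0.0)] * 15
)
shares = {"women": 0.5, "men": 0.5}

print(raw_mean(sample))                     # skewed toward the over-represented group
print(poststratified_mean(sample, shares))  # each group counted by its population share
```

With these made-up numbers the unweighted figure overstates support because women are over-represented in the sample. Reweighting only helps when the group shares are known and every group has at least some observations, and it does nothing about the other issues on the list, such as bots or filtered data feeds.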

