Big Data

Research questions:

Fundamental questions that can be asked about Big Data: “Will large-scale analysis of DNA help cure diseases? Or will it usher in a new wave of medical inequality? Will data analytics help make people’s access to information more efficient and effective? Or will it be used to track protesters in the streets of major cities? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what ‘research’ means? Some or all of the above?” (Boyd/Crawford 2011: 1)

“With the increased automation of data collection and analysis – as well as algorithms that can extract and inform us of massive patterns in human behavior – it is necessary to ask which systems are driving these practices, and which are regulating them. In Code, Lawrence Lessig (1999) argues that systems are regulated by four forces: the market, the law, social norms, and architecture – or, in the case of technology, code.” (ibid.: 2)

“We must ask difficult questions of Big Data’s models of intelligibility before they crystallize into new orthodoxies.” (ibid.: 4)

Limitation due to the time dimension: “[…] the specialized tools of Big Data also have their own inbuilt limitations and restrictions. One is the issue of time. ‘Big Data is about exactly right now, with no historical context that is predictive’ […]. For example, Twitter and Facebook are examples of Big Data sources that offer very poor archiving and search functions, where researchers are much more likely to focus on something in the present or immediate past – tracking reactions to an election, TV finale or natural disaster – because of the sheer difficulty or impossibility of accessing older data.” (ibid.)

“When researchers approach a dataset, they need to understand – and publicly account for – not only the limits of the dataset, but also the limits of which questions they can ask of a dataset and what interpretations are appropriate.” (ibid.: 7)

Sample selection:

Big Data is interesting not because of its size but because of its relationality to other data. The value of the data comes from the patterns and structures that become visible when it is connected with other data (e.g. data about individuals, networks, and groups). (cf. ibid.: 1-2)

Large datasets are not automatically better. Quantity does not mean quality. Big Data ≠ whole data.

Twitter is a popular source in Big Data research, but using Twitter data raises many methodological challenges that researchers only rarely address. Example Twitter: it does not represent all people, even though many journalists and researchers treat it as standing for the general population. Twitter users ≠ all people. “Due to uncertainties about what an account represents and what engagement looks like, it is standing on precarious ground to sample Twitter accounts and make claims about people and users. Twitter Inc. can make claims about all accounts or all tweets or a random sample thereof as they have access to the central database. Even so, they cannot easily account for lurkers, people who have multiple accounts or groups of people who all access one account. Additionally, the central database is also prone to outages, and tweets are frequently lost and deleted.” (ibid.: 6)

“Given uncertainty, it is difficult for researchers to make claims about the quality of the data that they are analyzing. Is the data representative of all tweets? No, because it excludes tweets from protected accounts. Is the data representative of all public tweets? Perhaps, but not necessarily.” (ibid.: 7)

“Research insights can be found at any level, including at very modest scales. In some cases, focusing just on a single individual can be extraordinarily valuable.” (ibid.: 8)

Methodology:

Data errors: Large datasets from the internet are often unreliable because they are prone to gaps in the data. These errors and gaps become a particular problem when multiple datasets are linked to one another. “This requires understanding the properties and limits of a dataset, regardless of its size. A dataset may have many millions of pieces of data, but this does not mean it is random or representative. To make statistical claims about a dataset, we need to know where data is coming from; it is similarly important to know and account for the weaknesses in that data. Furthermore, researchers must be able to account for the biases in their interpretation of the data. To do so requires recognizing that one’s identity and perspective informs one’s analysis.” (ibid.: 5)
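
The point about linked datasets can be made concrete with a small sketch. It assumes two hypothetical datasets that are joined on a shared record ID. All IDs and gap positions are invented for illustration and do not come from the source. Each dataset on its own covers most of the population, yet the linked result covers less than either, because a record survives the join only if it is present in both.

```python
# Hypothetical record IDs; none of these values come from the source.
population = set(range(1, 11))       # the 10 accounts we would like to study

# Each dataset is missing some records (e.g. outages, deleted entries).
dataset_a = population - {2, 7}      # 8 of 10 records present
dataset_b = population - {4, 7, 9}   # 7 of 10 records present


def coverage(records, universe):
    """Return the share of the universe that is present in records."""
    return len(records & universe) / len(universe)


# Linking on the ID keeps only records that appear in BOTH datasets.
linked = dataset_a & dataset_b

print(f"dataset A: {coverage(dataset_a, population):.0%}")
print(f"dataset B: {coverage(dataset_b, population):.0%}")
print(f"linked:    {coverage(linked, population):.0%}")
```

With these invented numbers the coverage falls from 80% and 70% to 60% after linking. The sketch says nothing about whether the missing records are random, which is the harder question the quotation raises.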

Network analyses based on “behavioral and articulated networks”.

Articulated networks: email or cell phone address books, instant messaging buddy lists, ‘Friends’ lists on social network sites, and ‘Follower’ lists on other social media genres.

Behavioral networks: people who text message one another, those who are tagged in photos together on Facebook, people who email one another, and people who are physically in the same space.

Problems: “Fascinating network analysis can be done with behavioral and articulated networks. But there is a risk in an era of Big Data of treating every connection as equivalent to every other connection, of assuming frequency of contact is equivalent to strength of relationship, and of believing that an absence of connection indicates a relationship should be made. Data is not generic. There is value to analyzing data abstractions, yet the context remains critical.”
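
A minimal sketch of why the two network types should not be treated as interchangeable. The names, ties, and messages below are hypothetical and serve only as an illustration. The articulated network is built from explicitly listed ties, the behavioral network from observed contacts. Comparing the two edge sets shows that they need not coincide.

```python
from collections import Counter

# Hypothetical people and ties, invented for illustration only.
# Articulated network: ties that people have explicitly listed.
articulated = {
    frozenset(pair)
    for pair in [("ana", "ben"), ("ana", "cem"), ("ben", "dia")]
}

# Behavioral network: observed contacts, one entry per message sent.
messages = [
    ("ana", "ben"), ("ben", "ana"), ("ana", "ben"),
    ("cem", "dia"), ("ana", "ben"),
]
contact_counts = Counter(frozenset(msg) for msg in messages)
behavioral = set(contact_counts)

# Ties that appear in only one of the two networks.
listed_but_silent = articulated - behavioral
active_but_unlisted = behavioral - articulated


def as_pairs(edges):
    """Turn a set of frozenset edges into a sorted list of name pairs."""
    return sorted(tuple(sorted(edge)) for edge in edges)


print("listed but no contact:", as_pairs(listed_but_silent))
print("contact but not listed:", as_pairs(active_but_unlisted))
print("contacts per pair:", {pair: contact_counts[frozenset(pair)]
                             for pair in as_pairs(behavioral)})
```

The contact counts are frequencies only. Reading the pair with the most messages as the strongest relationship would be exactly the inference the quotation warns against, so any such interpretation needs context that is not in the data.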

Analysis and interpretation of results:

Passage on objectivity: “claims to objectivity are necessarily made by subjects and are based on subjective observations and choices.” (ibid.: 5) All researchers automatically interpret data by working with it. A model may be mathematically sound and an experiment may be valid, but as soon as a researcher tries to understand what the data means, the process of interpretation has begun. (cf. ibid.: 5)

Data cleaning is subjective: “in the case of social media data, there is a ‘data cleaning’ process: making decisions about what attributes and variables will be counted, and which will be ignored. This process is inherently subjective.” (ibid.)

“‘As a large mass of raw information, Big Data is not self-explanatory. And yet the specific methodologies for interpreting the data are open to all sorts of philosophical debate. Can the data represent an “objective truth” or is any interpretation necessarily biased by some subjective filter or the way that data is “cleaned?”’ (2010, p. 13)” (ibid.)

The interpretation of the data is at the center of data analysis, regardless of the size of the dataset. (cf. ibid.: 6)

From: Soziale Medien als Gegenstand und Instrument sozialwissenschaftlicher Forschung

Hypothesis formation:

Hypothesis formation cannot be replaced by statistical analysis and the mere collection of data. Classical research is tied to the development of research questions, with the help of which a phenomenon can be examined in a targeted and detailed way. (

Missing data and scope of investigation: The classic problems persist with larger or multiple datasets. Data collection is still not random, and the data likewise capture only a partial slice. Researchers must therefore be clear about where their scope of investigation lies and where it ends. In addition, the data in large datasets must be checked just as carefully.

Ethical considerations:

Big Data raises new questions and standards of research ethics. Personal responsibility: Researchers bear personal responsibility for anything unforeseen that may arise in the course of the study, and are obliged to inform about risks and about the right to refuse.