Seemingly omnipresent in the business and tech world and passionately debated, Big Data has had a profound impact on today’s increasingly digitized economy. While advocates underscore the many applications of this amassed data, more and more critics are voicing concerns about privacy. Revelations from whistleblowers like Edward Snowden have shed new light on the surveillance practices of government agencies like the NSA, leaving many users wary of programs and applications that ask them to share personal information. As a result, many citizens have come to regard the term ‘Big Data’ with caution and skepticism. But a closer look reveals that there’s much more to the practice than what’s described in newspaper and magazine headlines.
What is Big Data?
‘Big Data’ refers to data volumes that are so large or complex that conventional data-processing software and hardware can no longer handle them. At its core, Big Data is therefore a neutral term: it also describes harmless quantities of data generated in research or other non-commercial environments. However, because such data can also include personal information, like the communication or consumer behavior of internet users, the term often carries negative connotations. Opponents of the practice worry that collecting and evaluating this data could lead to abuses of personal rights.
How ‘big’ is Big Data?
The term ‘Big Data’ doesn’t refer to any specific amount of data. There’s no clear threshold at which a quantity of data should be classified as ‘Big Data’. Typically, however, the term refers to quantities so large that gigabytes are no longer a practical unit of measurement and terabytes, petabytes, or larger units are used instead.
How is Big Data accrued?
Data volume has reached immense proportions: in 2014, it reportedly took only 10 minutes to collect the same amount of data that humans had produced from the beginning of mankind up until 2002. According to some forecasts, this vast quantity of data will keep growing, doubling every two years. The flood of data has mostly been driven by the increasing digitization of daily life. Big Data is created by combining different data sources, like:
- Mobile internet use
- Social media
- Vital-sign measurements
- Media streaming
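To make the doubling forecast mentioned above more concrete, the following minimal sketch computes how a data volume grows if it doubles at a fixed interval. The function name `projected_volume` and the starting value of 1.0 (an arbitrary unit) are assumptions chosen for illustration; the two-year doubling period is the only figure taken from the text.

```python
def projected_volume(initial_volume: float, years: float,
                     doubling_period: float = 2.0) -> float:
    """Return the volume after `years` if it doubles every `doubling_period` years."""
    return initial_volume * 2 ** (years / doubling_period)


# Growth relative to an arbitrary starting volume of 1.0.
for years in (2, 4, 10):
    print(years, projected_volume(1.0, years))
# prints:
# 2 2.0
# 4 4.0
# 10 32.0
```

Under this assumption, ten years of growth means a 32-fold increase, which illustrates why storage and processing capacity quickly become the limiting factor.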
Big Data doesn’t refer solely to the collected data; using and analyzing this information is also part of the definition. The goal is to find patterns and relationships and to place them in the right context. The challenge lies not only in the enormous volume of data but also in the speed at which it arrives and the diversity of its formats. Data flows continuously into an unstructured pool, where it needs to be gathered, stored, and processed, in real time if possible. Interpreting this data properly and putting it into context requires a sophisticated data infrastructure.
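The idea of processing data as it arrives, rather than waiting for a complete data set, can be sketched in a few lines. This is only an illustration under simplifying assumptions: the `event_stream` function is a hypothetical stand-in for a continuous feed, the event fields `source` and `bytes` are invented for the example, and a real system would spread this work across many machines.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def event_stream() -> Iterator[Dict]:
    """Hypothetical stand-in for a continuous data feed (real feeds never end)."""
    yield {"source": "social_media", "bytes": 1200}
    yield {"source": "media_streaming", "bytes": 50000}
    yield {"source": "social_media", "bytes": 800}
    yield {"source": "mobile_internet", "bytes": 300}


def running_totals(events: Iterable[Dict]) -> Iterator[Counter]:
    """Update per-source byte totals as each event arrives."""
    totals: Counter = Counter()
    for event in events:
        totals[event["source"]] += event["bytes"]
        # A snapshot is available immediately after every event,
        # without storing or re-reading the earlier events.
        yield totals.copy()


snapshots = list(running_totals(event_stream()))
print(snapshots[-1]["social_media"])  # prints 2000
```

The design point is that only the running totals are kept in memory, so the amount of state does not grow with the number of events.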
How to deal with Big Data
By definition, the data volumes involved in Big Data are so extensive that conventional software and hardware simply can’t cope with them. Working at this scale places special technical demands on the software, and analysis is only possible with the help of specialized frameworks. The software needs to process as many data sets as possible as quickly as possible and must be able to import large quantities of data rapidly. It should also make the data available to users in real time and, if necessary, answer multiple database requests simultaneously.
Hadoop is a popular open-source solution, but because it is complex to implement, using it often requires the support of a data scientist. Fortunately, cloud computing options are available that don’t require such expertise.
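Hadoop’s best-known processing model is MapReduce, which splits a job into a map step, a grouping step, and a reduce step so that the work can be distributed. The following sketch imitates that model in plain Python on a single machine. It does not use any Hadoop API; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample records are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


records = ["big data is big", "data flows in real time"]
mapped = (pair for record in records for pair in map_phase(record))
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts["big"], counts["data"])  # prints 2 2
```

In a real cluster, the map and reduce calls would run in parallel on different nodes; here they run sequentially, which is enough to show the shape of the computation.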
Examples of Big Data applications
Big Data has a diverse range of applications across many fields. Even very simple, everyday features that are familiar to internet users worldwide are based on this technology. A well-known example can be seen on the sites of large online retailers: the small box that shows which additional products were bought by other customers who viewed the item you’re currently looking at. These recommendations are generated by evaluating the purchasing data of other customers.
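One simple way to produce such recommendations is to count how often two products appear in the same order. The sketch below is a deliberately simplified illustration, not how any particular retailer does it: it uses only purchase baskets (no viewing data), and the sample `orders`, the product names, and the function names are all invented for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List, Set

# Hypothetical purchase history: each set is one customer's order.
orders = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"laptop", "usb hub"},
    {"mouse", "mouse pad"},
]


def build_co_purchase_counts(baskets: Iterable[Set[str]]) -> Dict[str, Counter]:
    """Count, for every item, how often each other item was bought with it."""
    co_counts: Dict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        for item, other in permutations(basket, 2):
            co_counts[item][other] += 1
    return co_counts


def recommend(co_counts: Dict[str, Counter], item: str, top_n: int = 1) -> List[str]:
    """Return the items most frequently bought together with `item`."""
    return [other for other, _ in co_counts[item].most_common(top_n)]


co_counts = build_co_purchase_counts(orders)
print(recommend(co_counts, "laptop"))  # prints ['mouse']
```

With millions of orders, the counting step is exactly the kind of job that gets distributed across many machines, but the underlying idea stays the same.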
Further areas that profit from Big Data:
- Medical research: by evaluating large volumes of data, doctors can develop the best possible therapies and treatment plans for their patients.
- Manufacturing: companies can monitor the data from their machines and thereby increase the efficiency and sustainability of their production.
- Business: Big Data helps companies get to know their customers and allows them to better match their offers to customers’ wishes.
- Energy: in order to tailor energy supply to individual use, consumption rates need to be known. Collected user data enables a more sustainable energy supply in the long run.
- Marketing: Big Data is often used in retargeting efforts. The goal here is to improve customer relationships.
- Fighting crime: governments and defense departments around the world are using Big Data to support their efforts in the fight against terrorism.
Criticism of Big Data
Much of the criticism of Big Data concerns data protection. Large data sets offer great potential for companies and brands: thanks to Big Data, marketing measures can be adjusted more easily. For targeting purposes, however, the same data can be used to create a precise profile of an individual user. Data protection activists and civil liberties organizations view these measures as an intrusion into individuals’ privacy.
Another controversial issue is the virtually complete control some companies have over such data. Where there’s Big Data, there’s often big money to be made, and this potential for vast profits creates an environment where big players like Google call the shots. The monopoly-like power held by large search engine providers is widely criticized. Without clear rules and regulations for protecting and anonymizing data, abuse can never be ruled out completely.
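As a rough illustration of what protecting personal data can look like in practice, the sketch below replaces a direct identifier with a keyed hash and coarsens an exact age into a band. Note that this is pseudonymization and generalization, not full anonymization: whoever holds the key can still link records, and the remaining fields may allow re-identification. The key, the field names, and the sample record are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored securely, never in source code.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize_user_id(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC) of it."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_age(age: int, band: int = 10) -> str:
    """Coarsen an exact age into a band such as '30-39'."""
    lower = (age // band) * band
    return f"{lower}-{lower + band - 1}"


record = {"user_id": "alice@example.com", "age": 34, "purchase": "laptop"}
safe_record = {
    "user_token": pseudonymize_user_id(record["user_id"]),
    "age_band": generalize_age(record["age"]),
    "purchase": record["purchase"],
}
print(safe_record["age_band"])  # prints 30-39
```

The same input always produces the same token, so analyses across records remain possible without exposing the original identifier to the analyst.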
Responsible use of Big Data
Despite these criticisms, Big Data can be very useful if the technology is implemented correctly. Important advances in cancer research would not have been possible without the power of Big Data. Data gathered from power supply and traffic systems is evaluated and used to optimize existing structures. But despite all the potential it offers to medicine, traffic control, and the business world, there are still plenty of ethical questions to be dealt with. Predicting certain events, such as the probability that an individual will develop a certain illness, is an unsettling prospect for many people. A sound strategy is needed to ensure that the rights of private individuals are respected without losing sight of the goal of a Big Data project.