The Life(cycle) of Big Data


The amount of data generated by internet-connected devices is increasing exponentially. Global internet traffic amounted to just 100 gigabytes per hour in 1997 and had reached more than 120,000 gigabytes per hour by 2007. This year, estimates indicate that total internet traffic, including mobile data, will exceed 1.6 million gigabytes per hour. This exponential increase in the amount of data being produced and traveling over networks is the backbone of the Big Data buzz, and many believe that data will not only improve decision-making in companies but also enable artificially intelligent systems to make better decisions for us.
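To put the 1997 and 2007 figures in perspective, the short calculation below derives the average yearly growth factor they imply. It is only a back-of-the-envelope sketch: it assumes growth was smooth between the two data points, and the variable names are illustrative.

```python
import math

# Figures quoted in the text (gigabytes of internet traffic per hour)
gb_per_hour_1997 = 100
gb_per_hour_2007 = 120_000
years = 2007 - 1997

# Assumption: steady compound growth between the two data points
annual_factor = (gb_per_hour_2007 / gb_per_hour_1997) ** (1 / years)

# Time needed for traffic to double at that rate
doubling_time_years = math.log(2) / math.log(annual_factor)

print(f"Average yearly growth factor: {annual_factor:.2f}")
print(f"Approximate doubling time: {doubling_time_years:.2f} years")
```

Under that assumption, traffic roughly doubled every year over the decade.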

But unlocking the value of Big Data requires ever smarter ways of gathering, storing and analyzing this rising tide of data, so that it can be converted into the insights that will solve the pressing problems of business today and in the future. As this new type of value chain evolves, it will become increasingly important for companies to understand how all the links fit together. 

Step 1: Generating Big Data

While personal computers were previously the primary tool to collect and transmit data, the Big Data revolution is being powered by an ever-increasing array of devices with embedded sensors and internet connections that send, receive, and react to data.

This “Internet of Things” (IoT), which includes wearable fitness devices, industrial equipment, and home monitoring sensors, stands in stark contrast to the manual data entry behind traditional company databases. Even now, companies devote enormous resources to soliciting and recording consumer feedback on their products or services through surveys.

Dr. Huang Ying, Vice President of Research & Technology at Lenovo, described society as “transitioning from an era of personal computing to pervasive computing.” From wearable tech to smart home appliances, data is being generated from an increasingly diverse set of sources. The Internet of Things and smart devices now track and quantify basic information such as heart rate, daily step count, pollution levels, and other patterns of daily life. This data is generated automatically in the background on our devices, sometimes without the explicit knowledge of users who have forgotten the permissions they granted when, for example, installing an app.

One example of how companies collect data is the popular American entertainment streaming company Netflix, which operates in 190 countries (but not Mainland China). Netflix has over 100 million subscribers who stream a total of 125 million hours of TV or movies per day. The company collects information on individual users, including clicks, pauses or fast forwarding, and time of day, and then correlates that data with demographic data such as the age, gender, and genre preferences of the user. This collection of user data enables the company to tailor a customizable experience to individuals: “Netflix has a culture of experimentation and data-driven decision making that allows new ideas to be tested in production so we get data and feedback from our members,” wrote Nirmal Govind, Director, Streaming Science & Algorithms at Netflix.

Step 2: Storing Big Data

Large data sets used to be stored in-house on bulky mainframes that were connected to an internal network; later, third-party servers off-site were the storage method of choice. But the rapid increase in data generation has forced a similarly rapid move toward cloud computing. “There is no more shift to the cloud, the cloud is here,” says Reagen Li, Marketing Manager at Aryaka Networks, a Silicon Valley-based network connectivity service provider and new AmCham China member. “Both foreign and domestic enterprises in China are rapidly adopting cloud storage solutions for Big Data to gain the upper hand in business information, but connecting to cloud storage and Big Data adds pressure on enterprise’s existing IT infrastructure.”

Cloud solutions are relatively low cost, easy to install, and allow on-demand flexibility for a mobile workforce. The cloud is accessible from virtually any internet-connected device, and is more secure because data is automatically backed up. Microsoft’s OneDrive, for example, is a file-hosting service that enables users to store, back up, and share documents, photos, and videos online. The free basic plan includes 5 GB of storage and works on all devices. Businesses can use this cloud storage solution to collaborate on Microsoft Office Online documents and receive real-time notifications of edits from colleagues.

Apple recently announced that it was opening its first data center in China. The US technology company will partner with a Chinese data management firm to build the new storage facility in Guizhou province. "The addition of this data center will allow us to improve the speed and reliability of our products and services while also complying with newly passed regulations," Apple said in a statement. The data center is part of a US $1 billion investment in the region, and will also comply with storage localization requirements and cross-border data transfer regulations outlined in China’s Cybersecurity Law.

Step 3: Analyzing Big Data

The value for business is not necessarily in collecting and storing the data itself, but rather in understanding what the data reveals. Big Data analysis enables companies to better understand their products, customers or internal processes.

Before Big Data, decision-makers relied on data from surveys, experiments, or anecdotal evidence, but this type of data is less reliable because of inherent human biases. In contrast, Big Data is generated automatically, in the background, and on a massive scale, so the resulting samples are larger, more dynamic, and more specific.

Big Data is multi-dimensional and requires advanced computational analysis. Software such as Microsoft’s Power BI and IBM’s Watson Analytics runs advanced analytics (regression analysis and algorithm modeling) on multivariate data to help analysts make connections and spot trends within the data set. Machine learning and artificial intelligence (AI) are now able to create and implement algorithms based on Big Data sets automatically, but we still rely on humans to provide context and draw conclusions that make the information digestible.
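For readers curious what regression analysis looks like at its simplest, the sketch below fits a straight line to a handful of data points using ordinary least squares. The data and the function name fit_line are made up for illustration; commercial tools such as those named above work on far larger, multivariate data sets, and this is not a description of how they are implemented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: marketing spend (x) versus units sold (y)
spend = [1, 2, 3, 4, 5]
units = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(spend, units)
print(f"Each extra unit of spend is associated with about {slope:.2f} more units sold")
```

The slope summarizes the trend in the data; deciding whether that trend is meaningful for the business is still the analyst's job.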


According to Dr. Ding Wei, co-chair of AmCham China’s ICT Forum, “Big Data is the gas for artificial intelligence and can power other innovative automation solutions going forward.” Given that China is already a large manufacturing hub, the analysis of Big Data involved in logistics and supply chain operations can create significant advantages for companies that move quickly. 

Step 4: Utilizing Big Data

In the past, brick-and-mortar retailers tracked product purchases and used previously collected data to anticipate what inventory needed to be ordered for the future. As retail has shifted to the internet, companies such as Amazon have gained more data about what consumers click on and how long users stay on a given webpage. Other companies, like Netflix, use data about users to suggest content that their algorithms predict a given user will enjoy. (For example, Netflix knows that I enjoy romantic comedies from the early 2000s featuring Jennifer Garner and that I do not enjoy military action and adventure movies. The company then uses that data to display a personalized home page of suggested movies and TV shows that I will enjoy.)

Today, businesses that properly utilize Big Data have a competitive advantage over those relying on outdated techniques. The insights gleaned from Big Data analysis support better decision-making because they allow businesses to customize their offerings more closely. Big Data can also be used to transform applications and drive new functionality based on past trends.

Big Data can also be made available to the public. Open source data is particularly powerful because it enables more people to analyze it and can generate many different conclusions. Over the past three years, Chinese courts have posted more than 29 million court judgments online. This mass release of court data is changing how courts judge cases as well as how litigants and lawyers navigate the legal system. In Henan province alone, 1,058,986 cases from 184 Henan courts have been posted online and are searchable. Access to this data helps curb judicial wrongdoing, standardizes outcomes, and offers predictability through precedent to all companies seeking to do business in the province.

Big Data can also be predictive; it has the potential to identify problems before they occur by spotting anomalies in the data. As companies gain greater insight into how their products or services are used and into the complex relationships between even seemingly disparate variables, they can begin to anticipate and respond in real time to user behavior.
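One simple way to spot anomalies of the kind described above is to flag readings that sit unusually far from the average. The sketch below uses a z-score test on made-up sensor readings; the function name, the threshold of 2.0, and the data are all assumptions for illustration, and production systems typically use more sophisticated methods.

```python
from statistics import mean, stdev

def find_anomalies(readings, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [
        (i, x)
        for i, x in enumerate(readings)
        if abs(x - mu) / sigma > threshold
    ]

# Hypothetical hourly temperature readings from one machine
readings = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2, 85.4, 70.1, 69.7, 70.0]

print(find_anomalies(readings))  # flags the spike at index 6
```

A flagged reading like the spike here could prompt a maintenance check before the machine actually fails.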

Unlocking Value

Networking experts at Cisco estimate that the world’s devices will produce roughly 66 million gigabytes of internet traffic every hour by 2021, continuing the trend of exponential growth. As the technology products that enable Big Data become even more ubiquitous, this cycle will become a more familiar feature of the business landscape. Owing to developments in sensors and mobile connections, companies are increasingly capable of measuring devices in real time, storing the data in an accessible way, analyzing that information for critical insights, and incorporating those results back into their business practices. As the lifecycle and the processes within each segment are further refined, savvy companies will look for ways to unlock even more value from Big Data.