Hadoop Tutorials.CO.IN

Big Data Analytics - What Is It?

by Dipayan Dev

According to a recent IBM estimate, 2.5 quintillion bytes of data are created every day, so much that 90% of the data in the world today was created in the last two years. It is a mind-boggling figure, and the irony is that we feel less informed despite having more information available than ever.

The surprising growth in data volumes has hit today's businesses hard. Online users create content such as blog posts, tweets, social network interactions and photos, and servers continuously log messages about what those users are doing.

This online data comes from posts on social media sites such as Facebook and Twitter, YouTube videos, cell phone call records and so on. Such data is called Big Data.


The term Big Data refers to datasets that keep growing until they become difficult to manage with existing database management concepts and tools. The difficulty may lie in data capture, storage, search, sharing, analytics or visualization.

Big Data spans three dimensions: Volume, Velocity and Variety.

  • Volume - The data is very large, measured in terabytes and petabytes.
  • Velocity - The data must be processed as it streams into the enterprise in order to maximize its value to the business. Timing is critical here.
  • Variety - The data extends beyond structured data to include unstructured data of all kinds: text, audio, video, posts, log files etc.


When an enterprise can leverage all of the information available to it rather than just a subset of its data, it has a powerful advantage over its market competitors. Big Data can help it gain insights and make better decisions.

Big Data presents an opportunity to create unprecedented business advantage and better service delivery. It also requires new infrastructure and a new way of thinking about how business and the IT industry work. The concept of Big Data is going to change the way we do things today.

An International Data Corporation (IDC) study predicts that overall data will grow 50-fold by 2020, driven in large part by more embedded systems such as sensors in clothing, medical devices and structures like buildings and bridges. The study also determined that unstructured information, such as files, email and video, will account for 90% of all data created over the next decade. But the number of IT professionals available to manage all that data will grow to only 1.5 times today's level.
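To see what those two growth rates imply together, here is a minimal Python sketch. It uses only the 50x and 1.5x multipliers quoted above; the function name is hypothetical, and the calculation assumes both multipliers are measured against the same baseline.

```python
# Growth multipliers quoted from the IDC study in the article.
DATA_GROWTH = 50.0   # overall data: 50 times today's level
STAFF_GROWTH = 1.5   # IT professionals: 1.5 times today's level


def data_per_professional_multiplier(data_growth, staff_growth):
    """How many times more data each IT professional would have to manage.

    Hypothetical helper: assumes both multipliers share the same baseline.
    """
    return data_growth / staff_growth


multiplier = data_per_professional_multiplier(DATA_GROWTH, STAFF_GROWTH)
print(f"Data per IT professional grows about {multiplier:.1f}x")
```

In other words, if the predictions hold, each IT professional would be responsible for roughly 33 times as much data as today.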

The digital universe is 1.8 trillion gigabytes in size, stored in 500 quadrillion files, and it more than doubles in size every two years. Compared with our physical universe, the digital universe contains nearly as many bits of information as there are stars.
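The figures in this paragraph can be put into more familiar units with a little arithmetic. The Python sketch below is only illustrative: it assumes decimal units (1 gigabyte = 10**9 bytes), and the projection function (a hypothetical helper) assumes the size exactly doubles every two years, which is a lower bound since the text says "more than double".

```python
# Figures quoted in the article; decimal units are an assumption.
GIGABYTE = 10**9
ZETTABYTE = 10**21

universe_bytes = 18 * 10**11 * GIGABYTE  # 1.8 trillion gigabytes
file_count = 500 * 10**15                # 500 quadrillion files


def projected_size(size_bytes, years, doubling_period_years=2):
    """Size after `years`, assuming exact doubling every doubling period."""
    return size_bytes * 2 ** (years / doubling_period_years)


size_in_zettabytes = universe_bytes / ZETTABYTE
average_file_bytes = universe_bytes // file_count
size_in_ten_years = projected_size(universe_bytes, 10) / ZETTABYTE

print(f"Total size: {size_in_zettabytes} ZB")
print(f"Average file size: {average_file_bytes} bytes")
print(f"Projected size after 10 years: {size_in_ten_years:.1f} ZB")
```

Under these assumptions the digital universe is 1.8 zettabytes, and the average file works out to only about 3,600 bytes, which suggests that most of those files are very small.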


A Big Data platform should provide a solution designed specifically with the needs of the enterprise in mind. The following are the basic features of a Big Data offering:

  • Comprehensive - It should offer a broad platform and address all three dimensions of the Big Data challenge: Volume, Variety and Velocity.
  • Enterprise-ready - It should include performance, security, usability and reliability features.
  • Integrated - It should simplify and accelerate the introduction of Big Data technology to the enterprise. It should enable integration with the information supply chain, including databases, data warehouses and business intelligence applications.
  • Open source based - It should be built on open source technology, with enterprise-class functionality and integration.
  • Low latency reads and updates
  • Robust and fault-tolerant
  • Scalability
  • Extensible
  • Allows ad hoc queries
  • Minimal maintenance


The main challenges of Big Data are data variety, volume, analytical workload complexity and agility. Many organizations are struggling to deal with increasing volumes of data. To solve this problem, they need to reduce the amount of data being stored and exploit new storage techniques that can further improve performance and storage utilization.


Big Data is the new gold rush and a key enabler for the social business. Without Big Data analytics, a large or medium-sized company can neither make sense of all the user-generated content online nor collaborate effectively with customers, suppliers and partners on social media channels. Collaboration with customers and insights from user-generated online content are critical for success in the age of social media.

In a study, McKinsey's Business Technology Office and the McKinsey Global Institute (MGI) calculated that the U.S. faces a shortage of 140,000 to 190,000 people with analytical expertise and of 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of Big Data.

The biggest gap is the lack of skilled managers who can make decisions based on that analysis: at 1.5 million, it is roughly ten times the shortfall in analytical experts. Growing talent and building teams that make analytics-based decisions is the key to realizing the value of Big Data.

Thank you for reading. Happy Learning !!

