Importance of data in Machine Learning

Bits, Bytes, Kilobytes, Megabytes, Gigabytes, Terabytes. We all hear these units of digital information on a daily basis. But what about Petabytes, Exabytes, Zettabytes, and so on?

Did you know that approximately 2.5 quintillion bytes of data is generated every day? To put that into perspective, that’s around 2.5 million Terabytes of data. Mind-boggling, isn’t it? 
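For anyone who wants to check that conversion, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units, where each step up is a factor of 1,000, and the short-scale meaning of quintillion (10^18). The names `UNITS` and `convert` are made up for this illustration.

```python
# Decimal (SI) byte units: each step up the list is a factor of 1,000.
# (Assumption: SI units, not the binary 1,024-based ones.)
UNITS = [
    "bytes", "kilobytes", "megabytes", "gigabytes",
    "terabytes", "petabytes", "exabytes", "zettabytes",
]

def convert(num_bytes, unit):
    """Express a byte count in the given unit (hypothetical helper)."""
    power = UNITS.index(unit)
    return num_bytes / 1000 ** power

# 2.5 quintillion bytes, the daily figure quoted above.
DAILY_BYTES = 2_500_000_000_000_000_000

print(convert(DAILY_BYTES, "terabytes"))  # daily volume in Terabytes
print(convert(DAILY_BYTES, "exabytes"))   # daily volume in Exabytes
```

The same daily figure comes out as 2.5 Exabytes, which is why the larger units start to matter at this scale.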

Where does this humongous amount of data come from? Listed below are a few of the sources:

  1. Internet (web queries/searches from the 4.4 billion internet users, broadband data, etc.)
  2. Social Media (photos, videos, comments, likes on Instagram, Facebook, Snapchat, YouTube, etc.)
  3. Communication (text messages, emails, video calls, etc.)
  4. Internet of Things (IoT) (connected cars, machines, meters, wearables, and other consumer electronics)
  5. And much, much more!

Big data is so enormous that humans can’t reasonably visualize it without help, so recognizing patterns and making predictions unaided is a long shot. Machine learning helps humans make sense of big data and put it to use.

Machine Learning, a subset of Artificial Intelligence, is built on algorithms. An algorithm, at its most basic, is a systematic set of operations (a procedure) performed on a given data set. Machine Learning algorithms are used to identify patterns within vast amounts of data. This vast amount of data, or Big Data, is what enables a computer to perform work based on pattern recognition.

[Image: a real-life application of big data]

And here’s the thing about machine learning algorithms: they generally become more effective when they are fed more data. So the more data, the merrier. The more data that is fed to the algorithm, the more it can “learn”, i.e., recognize patterns in the data in order to make predictions. Are you starting to connect the dots here?
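To see the "more data, better learning" idea in action, here is a small toy sketch, not a real machine learning pipeline. It assumes a made-up hidden pattern (`y = 3x` plus random noise) and uses a plain least-squares fit to recover the slope from different amounts of data. The names `TRUE_SLOPE`, `fit_slope`, and `mean_error` are invented for this example.

```python
import random

TRUE_SLOPE = 3.0  # the hidden "pattern" to recover (made-up value)

def fit_slope(n, rng):
    """Least-squares slope estimated from n noisy samples of y = TRUE_SLOPE * x."""
    xs = [rng.random() for _ in range(n)]
    ys = [TRUE_SLOPE * x + rng.gauss(0, 1) for x in xs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def mean_error(n, trials=200, seed=0):
    """Average absolute error of the estimated slope over many repeated fits."""
    rng = random.Random(seed)
    total = sum(abs(fit_slope(n, rng) - TRUE_SLOPE) for _ in range(trials))
    return total / trials

# Compare how far off the learned slope is, on average, as the data grows.
for n in (10, 100, 1000):
    print(n, "samples -> average error", round(mean_error(n), 3))
```

Averaging over many trials matters here: a single lucky fit on 10 points can occasionally beat a fit on 1,000, but on average the error shrinks as the sample grows.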

Is big data the same as a large database full of random information?

Well, no.

Yes, big data implies lots of data, but it also includes the idea of complexity and depth. Big data is essentially huge amounts of data with depth and complexity from which meaningful patterns can be drawn. 

Sources of Big Data: 

  1. Corporate-owned Databases
  2. Public Sources (Governments, Universities, Non-Profit Organizations)
  3. Private Sources (Amazon, Google)
  4. Creating new data from existing data


I’m going to publish articles about what I learn on AI, ML, and everything around them. If you want to join me on my learning journey, feel free to subscribe.
