Have you ever been fascinated by the origin of the world, or wondered how it all started? This curiosity is fundamental to human existence, and over the years many of us have spent our lives looking for answers. Although a definitive answer has not yet been found, scientists and researchers have done a tremendous amount of work and generated a remarkable body of knowledge. Among the recent breakthroughs, the discovery of the Higgs boson, the so-called “God particle”, announced in July 2012 and confirmed as a Higgs boson in March 2013, has certainly brought us a leap closer to uncovering those secrets.
The discovery was made possible by the Large Hadron Collider (LHC), the mammoth machine that produces about 600 million particle collisions per second. Experiments of this size generate around 30 petabytes of data per year, even after the least useful 99% has been filtered out. Converted to text on paper, that amount would fill as many as 600 million standard filing cabinets; in video terms, it equals about 9 million high-definition movies. Big data, albeit from a single source, does not get much bigger than this.
The natural question, then, is how this much data is managed, since no single device yet invented can store and process it on its own. The answer lies in computing facilities spread across, and connected around, the globe. They are accessed by research groups, scientists, students, and independent analysts, many of whom have built their careers on the data available from these sources. Interestingly, CERN has a long history here: the World Wide Web, without which you would not be reading this article, was invented at CERN in 1989 by Tim Berners-Lee to help scientists share information.
To bring together massive storage facilities, global networking, computing power, and funding, the Worldwide LHC Computing Grid was established to give a community of about 10,000 physicists near-real-time access to LHC data. The grid “is a distributed computing infrastructure arranged in tiers.” It handles the data of the four main experiments conducted at CERN and uses powerful processing and visualization tools flexibly to deliver the data users request. The distributed design has other benefits as well: if something goes wrong at one site, the data can still be retrieved from the several other locations where it is mirrored.
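The grid's real software is far more elaborate, but the benefit of mirroring can be illustrated with a toy example. The following is a minimal Python sketch, assuming hypothetical site names and an in-memory dictionary standing in for each site's storage: a request tries the sites in tier order and simply falls through to the next mirror when a site is offline.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical grid site holding replicas of some datasets."""
    name: str
    tier: int  # lower number = closer to the original data source
    online: bool = True
    datasets: dict = field(default_factory=dict)  # dataset name -> bytes


def fetch(dataset, sites):
    """Return (site name, data) from the first reachable site with a replica.

    Sites are tried in tier order; an offline site is skipped, so an
    outage at one location falls through to the next mirror.
    """
    for site in sorted(sites, key=lambda s: s.tier):
        if not site.online:
            continue  # something went wrong at this site: try the next one
        if dataset in site.datasets:
            return site.name, site.datasets[dataset]
    raise LookupError(f"no reachable replica of {dataset!r}")


if __name__ == "__main__":
    sites = [
        Site("tier0-example", 0, online=False, datasets={"run-001": b"events"}),
        Site("tier1-example", 1, datasets={"run-001": b"events"}),
        Site("tier2-example", 2),
    ]
    print(fetch("run-001", sites))  # ('tier1-example', b'events')
```

Even though the tier-0 copy is unavailable in this example, the request is still served from the tier-1 mirror, which is the redundancy the paragraph above describes.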
So that is how big data is being used to answer the fundamental questions of human existence. Among CERN's finest achievements, its innovations in big data have made numerous scientific breakthroughs possible, bringing us, as a species, closer to understanding why the universe is the way it is and how our Earth came into being.
Have you ever come across an application of big data that boggled your mind? Share your experiences in the comments section below.