Big data has become an important subject as devices such as smartphones, PCs and laptops, and game consoles proliferate, all of which gather information that must be stored somewhere. Large companies need a place not only to store the incoming data but also to analyze it for specific purposes as quickly as possible. Many providers offer this kind of service; this paper discusses one way Google handles data using its own purpose-built platform.
The amount of data being stored today is enormous. Companies must deal with this abundance daily, both storing the data and analyzing it as fast as they can. Google is one such company: it not only stores data but also analyzes data from the users of its products. The platform Google uses for this kind of data management is called BigQuery, which runs in the cloud and provides real-time information. This survey gives a brief overview of the inner workings of BigQuery to show how the platform does the job it is meant to do.
Role of BIG DATA
Big data is a vast amount of information, both structured and unstructured, that is commonly characterized along three main dimensions:
• Volume: Large amounts of data are collected from a variety of sources, such as social media or machine-to-machine data.
• Velocity: Data streams in at unprecedented speed.
• Variety: Data comes in many formats, including documents, audio, and email.
The value of big data lies in how it is used. Uses of the stored information include, but are not limited to, reducing costs, saving time, and making smarter decisions based on the results of data analysis.
By the way – WHAT IS THE CLOUD?
The cloud, or cloud computing (essentially internet-based computing), is a network-based resource that allows computing resources to be shared without the need for local servers. It also does not rely on the user's own hard drive, so whatever is saved does not take up local disk space.
The benefit of storing data in the cloud instead of on a local hard drive or local server is that the files can be accessed from anywhere, on any device with an active internet connection. A cloud can be set up as either private or public, depending on how much priority a company or individual places on the safety of the data. Although there is a common theme, there is no single cloud computing network; there are many, with businesses such as Google being among the primary providers. Cloud offerings are usually grouped into three service categories: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).
BigQuery works on the IaaS model, which is explained in more detail later on.
Using the cloud does, however, require software from a provider. Well-known examples include Google's Google Drive, Microsoft's OneDrive, and Dropbox.
As mentioned before, BigQuery is a platform designed by Google in 2010. It works with Google Cloud Storage (not Google Drive) to perform large-scale data analytics. A user can work with datasets of up to billions of rows, and the platform is easy to use. It can be accessed through a web UI, through a command-line tool, or by making calls to the BigQuery REST API. Client libraries for the API are available in languages such as Java, .NET, and Python. Third-party tools are also available for tasks including, but not limited to, data visualization.
Client libraries are collections of prewritten code, available in different programming languages, that give access to APIs; in this case they allow BigQuery to interact with Google Cloud Storage.
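To make the REST route more concrete, the following is a minimal sketch of what a call to the BigQuery REST API could look like when prepared with nothing but the Python standard library. It only builds the HTTP request for the API's query method and does not send it. The project name, table name, and access token are hypothetical placeholders, and the helper name build_query_request is an illustration only; a client library would normally hide these details.

```python
import json
import urllib.request

# Base URL of version 2 of the BigQuery REST API.
BIGQUERY_API = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id, sql, access_token):
    """Prepare (but do not send) a POST request that asks BigQuery to run a query."""
    url = f"{BIGQUERY_API}/projects/{project_id}/queries"
    body = {"query": sql, "useLegacySql": False}  # use standard SQL
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical project, table, and token, for illustration only.
request = build_query_request(
    "my-project",
    "SELECT name, COUNT(*) AS n FROM `my-project.my_dataset.users` GROUP BY name",
    "ACCESS_TOKEN_PLACEHOLDER",
)
print(request.get_method(), request.full_url)
```

Sending the request would require a valid OAuth access token, which is why the sketch stops before the network call. The web UI and the command-line tool are other ways of reaching the same service.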
Before any of BigQuery's features can be used to analyze data, the data must be loaded into BigQuery itself through a process called a job. A job is simply an action requested by the user and executed by BigQuery on the user's behalf.
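As an illustration of the idea of a job, the sketch below builds the description of a load job as a plain Python dictionary, in the shape the REST API expects for a job resource, and checks a job's reported status. The project, dataset, table, and bucket names are hypothetical, and the helper names build_load_job and succeeded are assumptions made for this example rather than part of any library.

```python
def build_load_job(project_id, dataset_id, table_id, source_uri):
    """Describe a job that loads a CSV file from Cloud Storage into a table."""
    return {
        "configuration": {
            "load": {
                "sourceUris": [source_uri],
                "sourceFormat": "CSV",
                "skipLeadingRows": 1,  # skip the header row of the CSV file
                "autodetect": True,    # let BigQuery infer the schema
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


def succeeded(job):
    """A job is finished when its state is DONE; errorResult marks a failure."""
    status = job.get("status", {})
    return status.get("state") == "DONE" and "errorResult" not in status


# Hypothetical names, for illustration only.
job = build_load_job("my-project", "my_dataset", "users", "gs://my-bucket/users.csv")
print(job["configuration"]["load"]["destinationTable"])
```

The user only describes what should happen; BigQuery carries out the work and reports progress through the job's state, which moves from PENDING to RUNNING to DONE.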
Data can be exported in several formats. Each exported file is limited to about 1 GB, but up to a thousand exports are allowed per day.
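The effect of the per-file limit can be worked out with a little arithmetic. The sketch below, which takes the figure of about 1 GB per file from the text as an assumption, estimates the minimum number of files an export needs and chooses a destination name accordingly. A wildcard (*) in the destination name lets BigQuery split one export across several files. The function names and the bucket name are hypothetical.

```python
# Approximate per-file export limit, taken from the text above (an assumption).
MAX_BYTES_PER_FILE = 1024 ** 3


def min_export_files(table_bytes):
    """Smallest number of files needed if no file may exceed the limit."""
    # Integer ceiling division avoids floating-point rounding issues.
    return max(1, (table_bytes + MAX_BYTES_PER_FILE - 1) // MAX_BYTES_PER_FILE)


def destination_uri(bucket, table_id, table_bytes):
    """Use a single file when the table fits, otherwise a wildcard pattern."""
    if min_export_files(table_bytes) == 1:
        return f"gs://{bucket}/{table_id}.csv"
    return f"gs://{bucket}/{table_id}-*.csv"


# A hypothetical 5 GB table needs at least five files.
print(min_export_files(5 * 1024 ** 3))
print(destination_uri("my-bucket", "users", 5 * 1024 ** 3))
```

The actual number of files BigQuery writes may be larger than this minimum, since the service decides how to split the data.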
BigQuery is an IaaS web service. IaaS is a form of cloud computing that provides virtualized computing resources over the internet. The provider hosts software, servers, and other infrastructure on behalf of its users, and also hosts applications that handle system maintenance and backup.